Integer overflow during FastText training with corpus_file
#2258
Comments
OverflowError: value too large to convert to int on FastText training with corpus_file
I see a similar error in Doc2Vec. I can verify that total_words is larger than a 32-bit integer. There is no easy workaround, since training on a corpus_file throws a different exception if total_words isn't present.
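For anyone who wants to check their own corpus before training, here is a rough sketch of the comparison described above. It assumes that words are whitespace-separated tokens and that the counter is a signed 32-bit C int; the helper names count_words and overflows_int32 are made up for illustration and are not part of gensim.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit C int can hold


def count_words(lines):
    """Count whitespace-separated tokens across an iterable of text lines."""
    return sum(len(line.split()) for line in lines)


def overflows_int32(total_words):
    """Return True if total_words cannot be stored in a signed 32-bit int."""
    return total_words > INT32_MAX


# Tiny in-memory sample; a real corpus would be streamed from an open file.
sample = ["the quick brown fox", "jumps over the lazy dog"]
print(count_words(sample))                   # 9
print(overflows_int32(count_words(sample)))  # False
print(overflows_int32(3_000_000_000))        # True
```

Multiply the word count by the number of epochs if the value you pass as total_words covers all passes over the corpus.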
Thanks for the report @joelkuiper!
@menshikh-iv Since this is tagged "easy", I'm guessing the fix is to replace the int declaration here with something like a long?
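To illustrate why a wider type helps, here is a small sketch using Python's struct module as a stand-in for the C types. This is not the gensim code: it only assumes that the failing declaration is a 4-byte signed int, and the helper name fits is hypothetical. Note that a plain C long is still 32 bits on 64-bit Windows, which is one reason to prefer long long.

```python
import struct


def fits(fmt, value):
    """Return True if value can be packed into the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


total_words = 2**31  # one past the signed 32-bit maximum

# "=i" is a standard-size 4-byte signed int, "=q" an 8-byte signed long long.
print(fits("=i", total_words))  # False
print(fits("=q", total_words))  # True
```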
@mpenkov yes, something like this (
I am experiencing the same bug when training Word2Vec with a large corpus. A pull request for this bug has been open here for a couple of months. Would you please fix this one? Thanks.
…iskvorky#2258) * replace `int` by `long long`
Description
model = FastText(corpus_file="sentences_norm.txt.gz", workers=14, iter=5, size=200, sg=1, hs=1)
with the following sizes
yields OverflowError: value too large to convert to int
on all workers. Note that the sg and hs parameters seem to have no relation to this; it also happens without them.
Steps to reproduce
from gensim.models import FastText
model = FastText(corpus_file="sentences_norm.txt.gz", workers=14, iter=5, size=200)
Expected Results
The model should train without errors.
Actual Results
Exception thrown, no further output.
Versions
Python 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16)
[GCC 7.3.0]
NumPy 1.15.3
SciPy 1.1.0
gensim 3.6.0
On Ubuntu 16.04
Edit: this seems to work fine when passing in a LineSentence object.
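A minimal sketch of that workaround, assuming the gensim 3.x parameter names used in this report (size and iter; later gensim releases renamed them). The wrapper name train_from_line_sentence is made up for illustration.

```python
def train_from_line_sentence(path, **params):
    """Train FastText by streaming sentences instead of using corpus_file."""
    # Imported lazily so this sketch can be loaded without gensim installed.
    from gensim.models import FastText
    from gensim.models.word2vec import LineSentence

    # LineSentence yields one tokenized sentence per line of the file,
    # so training goes through the sentences= code path.
    return FastText(sentences=LineSentence(path), **params)
```

With the file from this report, the call would be train_from_line_sentence("sentences_norm.txt.gz", workers=14, iter=5, size=200). Expect this path to be slower than corpus_file with many workers, since the sentences are read in Python.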