Problems loading glove.840B.300d/glove.840B.300d.txt (GloVe{:en}, 6) #24
Comments
I have reproduced this. What seems to be happening is that somewhere between the 5×10^4 and 6×10^4 entries in that file there is an invalid line.
I guess the solution is to change this line (Line 62 in f12e909) to call some function that does glove_float_parse(x) = x == "." ? 0f0 : parse(Float32, x) or maybe
A PR would be welcome if you care to investigate further.
Right, found it.
The word is So my fix above is wrong.
Sounds almost easier to "patch" this specific file than to introduce potentially slower parsing overall to handle this?
Nah, the parsing change in #24 will actually speed it up.
Ok, nice. :)
Only split on spaces in GloVe, not on any whitespace; fixes #24
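To illustrate why splitting only on spaces matters, here is a minimal sketch. It assumes the problematic word contains a non-breaking space (U+00A0); the thread does not say which character is actually involved, so the sample line and the helper name `parse_glove_line` are hypothetical and are not the package's actual code.

```julia
# Hypothetical GloVe-style line: a word followed by space-separated floats.
# The word is assumed to contain a non-breaking space (U+00A0).
line = "foo\u00a0bar 0.1 -0.2 0.3"

# Julia's default `split` uses `isspace` as the delimiter, which also matches
# Unicode space separators such as U+00A0, so the word is cut in two.
by_whitespace = split(line)       # 5 fields: the word became two tokens

# Splitting on the ASCII space only keeps the word in one piece.
by_space = split(line, ' ')       # 4 fields: word plus three numbers

# Hypothetical helper: parse one line into (word, vector),
# splitting on ' ' only.
function parse_glove_line(line::AbstractString)
    fields = split(line, ' ')
    word = String(fields[1])
    vector = parse.(Float32, fields[2:end])
    return word, vector
end

word, vector = parse_glove_line(line)
```

With the default `split`, the second field would be part of the word rather than a number, so `parse(Float32, ...)` would throw on it. Passing a single `Char` delimiter is also a simpler test per character than `isspace`, which fits the remark above that the change should not slow parsing down.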
Thanks for the Embeddings.jl package; it is great! I'm building a Word Mover's Distance implementation using Sinkhorn Distance approximation, on top of it.
I also wanted to try one of the larger embeddings, so I tried this on my 2015 MacBook Pro with Julia 1.2:
but after the long download I then get:
When I load, for example, load_embeddings(GloVe{:en}, 4) there is no problem. Has anyone else had a similar problem with glove.840B.300d, and is there a workaround?
I also wonder if it would be possible to load the default Word2Vec embeddings lazily, since that could cut the time taken when first executing using Embeddings. It would simplify testing and use in "downstream" packages, which might only optionally use the embeddings.