
Lazily loading embeddings #25

Open
oxinabox opened this issue Nov 12, 2019 · 0 comments


oxinabox commented Nov 12, 2019

In #24, @robertfeldt said:

I also wonder if it would be possible to do the loading also of the Word2Vec default embeddings lazily since that could take down the time when first executing using Embeddings. Would simplify testing and use in "downstream" packages which might only optionally use the embeddings.

In theory, rather than storing the array itself,
we could store a lazy array that is only instantiated when it is first accessed.
We'd still need to process the whole file to get the vocabulary.
For Word2Vec I don't think it would gain much, as we have those in a binary format.
But some of the others, like FastText, we have in a text format,
and so parsing takes some time.
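
To make the idea concrete, here is a rough sketch of what such a lazy array could look like. Everything in it is hypothetical: `LazyMatrix`, `materialize!` and `fake_loader` are illustrative names, not part of Embeddings.jl, and the loader here just builds a small matrix instead of parsing a file. It assumes the dimensions are known up front (e.g. from a file header), so `size` can be answered without loading anything.

```julia
# Hypothetical sketch: a matrix whose contents are only produced on first element access.
mutable struct LazyMatrix{T} <: AbstractMatrix{T}
    loader::Function                  # zero-argument function returning a Matrix{T}
    dims::Tuple{Int,Int}              # assumed to be known without loading the data
    data::Union{Nothing,Matrix{T}}    # `nothing` until the first access
end

# Convenience constructor: start in the "not yet loaded" state.
LazyMatrix{T}(loader, dims::Tuple{Int,Int}) where {T} = LazyMatrix{T}(loader, dims, nothing)

# Run the loader at most once and cache its result.
function materialize!(A::LazyMatrix{T}) where {T}
    if A.data === nothing
        A.data = A.loader()
    end
    return A.data::Matrix{T}
end

# Minimal AbstractArray interface: `size` never triggers loading, `getindex` does.
Base.size(A::LazyMatrix) = A.dims
Base.getindex(A::LazyMatrix, i::Int, j::Int) = materialize!(A)[i, j]

# Stand-in for an expensive parse; counts how many times it is called.
load_count = Ref(0)
fake_loader() = (load_count[] += 1; ones(Float32, 3, 2))

lazy = LazyMatrix{Float32}(fake_loader, (3, 2))
size(lazy)     # answered from `dims`; the loader has not run yet
lazy[1, 1]     # first element access runs the loader and caches the matrix
lazy[2, 2]     # served from the cache; the loader is not called again
```

Note that anything that touches the elements, including displaying the array at the REPL, would trigger the load, and the vocabulary would still have to be read eagerly as described above.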

Note: this would not change how long using Embeddings takes,
as no embeddings are actually loaded when you do that. You need to call load_embeddings before anything is loaded.

I won't have time to work on this any time soon, but I would review PRs.
