Lazily loading embeddings #25

oxinabox · 2019-11-12T19:12:25Z

> I also wonder if it would be possible to do the loading also of the Word2Vec default embeddings lazily since that could take down the time when first executing `using Embeddings`. Would simplify testing and use in "downstream" packages which might only optionally use the embeddings.

In theory, rather than storing the array,
we could store some lazy array that is only instantiated when it is accessed.
We'd still need to process the whole file to get the vocabulary.
For Word2Vec I don't think it would gain much, as we have those in a binary format.
But some of the others, like FastText, we have in a text format,
so parsing takes some time.
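
To make the lazy idea above concrete, here is a minimal sketch for a text-format file. It is only an illustration under stated assumptions: `LazyTextEmbeddings`, `load_lazy`, and `vector` are hypothetical names that do not exist in Embeddings.jl, and the input is assumed to be plain `word v1 v2 ...` lines with no header line. The whole file is still read once to collect the vocabulary, but the numeric part of each line is only parsed the first time that word's vector is requested.

```julia
# Hypothetical sketch: none of these names are part of Embeddings.jl.
struct LazyTextEmbeddings
    vocab::Vector{String}              # words, collected eagerly
    rawlines::Vector{String}           # unparsed numeric part of each line
    cache::Dict{Int, Vector{Float32}}  # vectors that have been parsed so far
end

# Read every line (needed for the vocabulary) but do not parse any numbers yet.
# Assumes each non-empty line looks like "word v1 v2 ..." with no header line.
function load_lazy(io::IO)
    vocab = String[]
    rawlines = String[]
    for line in eachline(io)
        isempty(line) && continue
        word, rest = split(line, ' '; limit=2)
        push!(vocab, String(word))
        push!(rawlines, String(rest))
    end
    return LazyTextEmbeddings(vocab, rawlines, Dict{Int, Vector{Float32}}())
end

# Parse the i-th vector on first access and remember the result.
function vector(emb::LazyTextEmbeddings, i::Integer)
    get!(emb.cache, i) do
        parse.(Float32, split(emb.rawlines[i]))
    end
end

# Usage with an in-memory stand-in for an embeddings file.
emb = load_lazy(IOBuffer("cat 0.5 1.0\ndog 0.25 2.0\n"))
println(emb.vocab)       # vocabulary is available immediately
println(vector(emb, 2))  # the numbers for "dog" are parsed only now
```

This trades memory for parse time, since the raw text stays in memory until it is parsed. A real implementation would more likely wrap this in an `AbstractArray` subtype so that it can stand in for the stored array.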

Note: this would not change how long `using Embeddings` takes,
as no embeddings are actually loaded when you do that; you need to call `load_embeddings` before anything is loaded.

I won't have time to work on this any time soon, but I would review PRs.
