Memory management for training on large data sets #137
Labels: feature
alanakbik pushed commits that referenced this issue on Oct 10–11, 2018.
will be part of release-0.3 and activated by default for
In use cases where training data sets are large or there is little available RAM, language model embeddings cannot be stored in memory (see #135).
**Current solution:** The only way to still train a model in such cases is to set the `embeddings_in_memory` flag to `False` in the trainer classes (`TextClassifierTrainer` or `SequenceTaggerTrainer`). With this flag, embeddings are generated on the fly at each epoch and immediately discarded after use, which solves the memory issue but is computationally expensive, since already computed embeddings are never re-used (a minimal usage sketch is shown below).

**Idea:** Use a key-value store to persist embeddings to disk and enable quick lookup of already computed embeddings (see the cache sketch at the end). A nice side effect is that if we run several experiments on the same dataset, embeddings from earlier runs can be re-used, thus speeding up parameter-sweep experiments.
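And a minimal sketch of the key-value idea itself: an on-disk store keyed by sentence text, with pickled embedding tensors as values. The class name `EmbeddingCache` and the sqlite3/pickle choice are hypothetical illustrations of the proposal, not flair's actual implementation.

```python
import pickle
import sqlite3


class EmbeddingCache:
    """Hypothetical on-disk key-value store for computed embeddings.

    Keys are sentence strings; values are pickled embedding tensors.
    Illustrates the proposed idea only -- not flair's implementation.
    """

    def __init__(self, path: str = 'embedding_cache.sqlite'):
        self.db = sqlite3.connect(path)
        self.db.execute(
            'CREATE TABLE IF NOT EXISTS embeddings '
            '(key TEXT PRIMARY KEY, value BLOB)')

    def get(self, sentence: str):
        # Return the cached embedding, or None on a cache miss.
        row = self.db.execute(
            'SELECT value FROM embeddings WHERE key = ?', (sentence,)).fetchone()
        return pickle.loads(row[0]) if row else None

    def put(self, sentence: str, embedding) -> None:
        # Persist the embedding so later epochs (and later runs) can re-use it.
        self.db.execute(
            'INSERT OR REPLACE INTO embeddings VALUES (?, ?)',
            (sentence, pickle.dumps(embedding)))
        self.db.commit()
```

A trainer would call `get()` before embedding a sentence and `put()` on a miss, so the expensive language model forward pass runs at most once per sentence, and later experiments on the same dataset start from a warm cache.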