Adding new tags in doctag_vectors in #3262
Comments
Expanding the set of known doctags hasn't been supported; the work allowing expanding the …
Note that even if supported, such incremental expansions of a model are fraught with difficult tradeoffs. To the extent a new batch contains a different mix of words, word-senses, & topics than earlier data (& if it didn't, why bother with more training?), it will only "drag" parts of the model towards new weights, leaving others untouched. That risks degrading the model's overall usefulness unless you're carefully considering the mixes/balances between older & newer training data, & monitoring for ill-effects. (You can't assume incremental batches of new training are always improving things.)
The surest way to ensure balance between all training data is to re-train everything in one session. That is, when new data arrives, add it to the full corpus, train again on the full corpus, & use the later model's values instead of any earlier model's (the later model's coordinates may not be compatible with the earlier ones).
But if you thought you really needed to just do smaller updates, other options could include:
(There might be other options, depending on the details of how you're using the model/doc-vectors downstream.)
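The full-retrain approach described above can be sketched roughly as follows. This is a minimal illustration, not a recommended configuration: it assumes gensim 4.x (where doc-vectors live in `model.dv`), and the helper names `merge_corpora` and `retrain_on_full_corpus`, the toy documents, and the parameter values are all made up for the example.

```python
from typing import Iterable, List, Tuple

# A document here is (tokens, tags). This alias is illustrative, not a gensim type.
Doc = Tuple[List[str], List[str]]


def merge_corpora(old_docs: Iterable[Doc], new_docs: Iterable[Doc]) -> List[Doc]:
    """Combine earlier and newly arrived documents into one full corpus."""
    return list(old_docs) + list(new_docs)


def retrain_on_full_corpus(docs: List[Doc], vector_size: int = 50, epochs: int = 20):
    """Train a fresh Doc2Vec model on every document, old and new together."""
    # Imported lazily so the corpus helper above works without gensim installed.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = [TaggedDocument(words=words, tags=tags) for words, tags in docs]
    # min_count=1 only because the toy corpus is tiny; an arbitrary choice.
    model = Doc2Vec(vector_size=vector_size, min_count=1, epochs=epochs)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    return model


# Toy data: two documents from the earlier batch, one with a new tag.
old_docs = [
    (["human", "machine", "interface"], ["doc_0"]),
    (["graph", "of", "trees"], ["doc_1"]),
]
new_docs = [(["survey", "of", "user", "response", "time"], ["doc_2"])]

full_corpus = merge_corpora(old_docs, new_docs)
```

Calling `model = retrain_on_full_corpus(full_corpus)` then gives a model that knows all three tags, so `model.dv["doc_2"]` should be available. Vectors from this model replace those of any earlier model rather than being mixed with them, since the two coordinate spaces may not be compatible.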
Hello!
I am training a doc2vec model on a tagged docset.
I need to update it on new sets that contain new tags. Is there a way to update docvectors in gensim.doc2vec? How can I do it?
There is an old issue #1019 on the same topic, but it didn't help me because gensim has changed a lot since then. Maybe there is another way?