-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix 1779 #1843
Fix 1779 #1843
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @saroufimc1.
Need to merge #1777 (refactoring of *2vec API) and fix this code after.
@manneshiva please have a look!
@@ -1,8 +1,8 @@ | |||
#!/usr/bin/env python | |||
# -*- coding: utf-8 -*- | |||
# | |||
# Author: Jayant Jain <[email protected]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why you remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't sure about the header. Feel free to take the code and do whatever you want with it.
@saroufimc1 #1777 merged, please resolve merge conflict first |
Ping @saroufimc1, when you plan to finish this PR? |
Unfortunately, you merged conflict incorrectly (look at current state of Are you planning to finish PR? |
@menshikh-iv Sorry about that, I am not familiar with all the changes of the source code that happened in the meantime. Is it possible that you take my code and do the merging yourself since you are more familiar with the changes? It would be much faster I guess. By the way, I am still not convinced that the way to remove all subwords not used etc. This slows down the loading a lot. |
@saroufimc1 no problem, I close this with "almost complete" label, we will return to this later and fix it, thanks! CC: @manneshiva |
Addressing issue #1779. Correct the assignment of model.wv.syn0_ngrams after trimming of unused ngrams (model.wv.syn0_ngrams.shape[0] <= self.bucket).
It is 2x slower on pre-trained Wikipedia models from Facebook's fastText than if we do not trim unused ngrams. Therefore, I recommend dealing with #1261.
For the test case, we need an actual small Facebook FastText model saved in bin format and make sure that we have model.wv.syn0_ngrams.shape[0] <= self.bucket.
Related to case #1787.