Fix 1779 #1843

saroufimc1 · 2018-01-17T22:24:08Z

Addressing issue #1779. Correct the assignment of model.wv.syn0_ngrams after trimming of unused ngrams (model.wv.syn0_ngrams.shape[0] <= self.bucket).
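To illustrate the trimming step, here is a minimal NumPy-only sketch. It is not the gensim implementation: `trim_unused_ngrams`, `used_hashes`, and `hash2index` are hypothetical names, and the sketch assumes the full matrix has one row per hash bucket.

```python
import numpy as np


def trim_unused_ngrams(ngram_matrix, used_hashes):
    """Keep only the rows whose bucket index occurs in `used_hashes`.

    Returns the trimmed matrix and a dict mapping each original
    bucket index to its row in the trimmed matrix.
    """
    kept = sorted(set(used_hashes))                   # unique bucket indices, stable order
    hash2index = {h: i for i, h in enumerate(kept)}   # old bucket index -> new row
    trimmed = ngram_matrix[kept]                      # integer-list indexing returns a copy
    return trimmed, hash2index


# Toy data: 10 buckets, 4-dimensional vectors (illustrative sizes only).
bucket, dim = 10, 4
full = np.arange(bucket * dim, dtype=np.float32).reshape(bucket, dim)
trimmed, hash2index = trim_unused_ngrams(full, [7, 2, 7, 5])
```

After trimming, any lookup has to go through the remapping (`hash2index` here) rather than the raw bucket index, because the row positions no longer match the hash values.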

With trimming, loading pre-trained Wikipedia models from Facebook's fastText is 2x slower than when unused ngrams are not trimmed. Therefore, I recommend dealing with #1261.

For the test case, we need an actual small Facebook fastText model saved in bin format, and we need to make sure that model.wv.syn0_ngrams.shape[0] <= self.bucket.
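A rough sketch of the check such a test could make, under stated assumptions: `assert_ngrams_trimmed` is a hypothetical helper, the `model.bucket` and `model.wv.syn0_ngrams` attribute layout is taken from the description above, and `fake_model` is a stand-in object because no small .bin model is available here.

```python
from types import SimpleNamespace

import numpy as np


def assert_ngrams_trimmed(model):
    """Fail if the ngram matrix has more rows than there are hash buckets."""
    n_rows = model.wv.syn0_ngrams.shape[0]
    assert n_rows <= model.bucket, (n_rows, model.bucket)


# Stand-in for a loaded model; a real test would load a small .bin file instead.
fake_model = SimpleNamespace(
    bucket=100,
    wv=SimpleNamespace(syn0_ngrams=np.zeros((37, 5), dtype=np.float32)),
)
assert_ngrams_trimmed(fake_model)
```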

Related to case #1787.

menshikh-iv

Thanks @saroufimc1.

We need to merge #1777 (refactoring of the *2vec API) first and then fix this code.

@manneshiva please have a look!

menshikh-iv · 2018-01-19T08:47:33Z

gensim/models/wrappers/fasttext.py

@@ -1,8 +1,8 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 #
-# Author: Jayant Jain <[email protected]>


Why did you remove this?

Wasn't sure about the header. Feel free to take the code and do whatever you want with it.

menshikh-iv · 2018-02-05T10:52:46Z

@saroufimc1 #1777 is merged, please resolve the merge conflict first

menshikh-iv · 2018-02-14T09:35:01Z

Ping @saroufimc1, when do you plan to finish this PR?

menshikh-iv · 2018-02-16T06:12:05Z

Unfortunately, you resolved the merge conflict incorrectly (look at the current state of gensim/models/wrappers/fasttext.py on develop); your changes should go in https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/deprecated/fasttext_wrapper.py

Are you planning to finish the PR?

saroufimc1 · 2018-02-16T07:10:05Z

@menshikh-iv Sorry about that, I am not familiar with all the changes to the source code that happened in the meantime. Would it be possible for you to take my code and do the merging yourself, since you are more familiar with the changes? It would be much faster, I guess.

By the way, I am still not convinced that removing all unused subwords is the right approach. It slows down loading a lot.

menshikh-iv · 2018-02-19T04:52:33Z

@saroufimc1 no problem, I'll close this with the "almost complete" label; we will return to this later and fix it. Thanks!

CC: @manneshiva

saroufimc1 added 2 commits December 14, 2017 17:02

Bug Fix 1771

5a61594

Bug Fix 1771

e5adf87

menshikh-iv reviewed Jan 19, 2018

Merge branch 'develop' into fix_1771

746ccaa

menshikh-iv added the almost complete label Feb 19, 2018

menshikh-iv closed this Feb 19, 2018
