Doc2vec fails to train when using build_vocab_from_freq() #2083
Labels
bug: Issue described a bug
difficulty medium: Medium issue: required good gensim understanding & python skills
Description
I have a Doc2Vec model trained using the build_vocab_from_freq() function. This is so I can include a <PAD> token manually at index 0. This token does not appear in the original dataset, but is needed further down my program.
Steps/Code/Corpus to Reproduce
Here is a simple example of what I am trying to achieve:
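A minimal sketch of the vocabulary-preparation step, using only the standard library. The toy corpus, the variable names, and the count given to <PAD> are all illustrative assumptions, not the original reproduction. Giving <PAD> the highest count is meant to place it first, assuming the vocabulary is ordered by descending frequency.

```python
from collections import Counter

# Hypothetical toy corpus: three tokenized documents, one integer tag each.
corpus = [
    (["human", "interface", "computer"], 0),
    (["survey", "user", "computer", "system"], 1),
    (["graph", "minors", "survey"], 2),
]

# Count how often each word occurs across all documents.
word_freq = Counter(word for words, _tag in corpus for word in words)

# Add the <PAD> token by hand. It never occurs in the data, so its count is
# invented; making it the largest count is an assumption intended to put it
# first when the vocabulary is sorted by descending frequency.
word_freq["<PAD>"] = max(word_freq.values()) + 1

# Number of distinct document tags, i.e. the value model.docvecs.count is
# expected to report after training on this corpus.
expected_doc_count = len({tag for _words, tag in corpus})

print(word_freq.most_common(1))
print(expected_doc_count)
```

The resulting word_freq mapping is the kind of word-to-count dictionary that would be handed to build_vocab_from_freq(). Note that it carries word counts only and no information about the document tags.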
Expected Results
Expected size of model.docvecs.count is 3 (not 0).
Actual Results
Actual size of model.docvecs.count is 0:
print(model.docvecs.count)
-> 0
Versions
Linux-3.19.0-82-generic-x86_64-with-Ubuntu-15.04-vivid
('Python', '2.7.9 (default, Apr 2 2015, 15:33:21) \n[GCC 4.9.2]')
('NumPy', '1.14.3')
('SciPy', '1.1.0')
('gensim', '3.4.0')
('FAST_VERSION', 1)
Now my questions are:
build_vocab_from_freq()
to get a valid model?