Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Doc2Vec documentation: how tags are assigned in corpus_file mode #2320

Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 6 additions & 3 deletions gensim/models/doc2vec.py
Original file line number Diff line number Diff line change
Expand Up @@ -487,7 +487,8 @@ def __init__(self, documents=None, corpus_file=None, dm_mean=None, dm=1, dbow_wo
corpus_file : str, optional
Path to a corpus file in :class:`~gensim.models.word2vec.LineSentence` format.
You may use this argument instead of `sentences` to get performance boost. Only one of `sentences` or
`corpus_file` arguments need to be passed (or none of them).
`corpus_file` arguments need to be passed (or none of them). Documents' tags are assigned automatically
and are equal to line number, as in :class:`~gensim.models.doc2vec.TaggedLineDocument`.
dm : {1,0}, optional
Defines the training algorithm. If `dm=1`, 'distributed memory' (PV-DM) is used.
Otherwise, `distributed bag of words` (PV-DBOW) is employed.
Expand Down Expand Up @@ -761,7 +762,8 @@ def train(self, documents=None, corpus_file=None, total_examples=None, total_wor
corpus_file : str, optional
Path to a corpus file in :class:`~gensim.models.word2vec.LineSentence` format.
You may use this argument instead of `sentences` to get performance boost. Only one of `sentences` or
`corpus_file` arguments need to be passed (not both of them).
`corpus_file` arguments need to be passed (not both of them). Documents' tags are assigned automatically
and are equal to line number, as in :class:`~gensim.models.doc2vec.TaggedLineDocument`.
total_examples : int, optional
Count of sentences.
total_words : int, optional
Expand Down Expand Up @@ -1140,7 +1142,8 @@ def build_vocab(self, documents=None, corpus_file=None, update=False, progress_p
corpus_file : str, optional
Path to a corpus file in :class:`~gensim.models.word2vec.LineSentence` format.
You may use this argument instead of `sentences` to get performance boost. Only one of `sentences` or
`corpus_file` arguments need to be passed (not both of them).
`corpus_file` arguments need to be passed (not both of them). Documents' tags are assigned automatically
and are equal to a line number, as in :class:`~gensim.models.doc2vec.TaggedLineDocument`.
update : bool
If true, the new words in `sentences` will be added to model's vocab.
progress_per : int
Expand Down