Add build_vocab to poincare model #2505

Merged: 18 commits merged on Jul 7, 2019
Changes from 2 commits
gensim/models/poincare.py: 30 changes (28 additions, 2 deletions)
@@ -173,8 +173,34 @@ def __init__(self, train_data, size=50, alpha=0.1, negative=10, workers=1, epsil
self._loss_grad = None
self.build_vocab(train_data)

def build_vocab(self, relations=None, update=False):
"""Load relations from the train data and build vocab."""
def build_vocab(self, relations, update=False):
"""Build vocabulary from relations.
Each relation must be a tuple of unicode strings.

Parameters
----------
relations : list of tuples
List of tuples of positive examples of the form (node_1_index, node_2_index).
update : bool
If True, the new nodes in `relations` will be added to the model's vocab.

Owner

But relations doesn't contain nodes. Its description above says it contains "node indexes" (btw where does the user find those?).

Also, what happens if False (the default)?

Contributor Author

> But relations doesn't contain nodes. Its description above says it contains "node indexes" (btw where does the user find those?).

I modified the description of `relations` to match the one in `__init__`.

> Also, what happens if False (the default)?

If `update=False`, the embeddings are initialized with random values, which means any previously trained embeddings are discarded.

Owner

OK, I don't really understand that, but that information should appear in the documentation.

Contributor Author

I added a description of `update=False` to `build_vocab` 👍
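The thread distinguishes `update=False` (all embeddings re-initialized randomly) from `update=True` (new nodes added to the existing vocab). The snippet below is a minimal, hypothetical sketch of those semantics in plain Python, not gensim's implementation: `rebuild_embeddings` and `init_rows` are illustrative names, embeddings are lists of lists rather than a NumPy array, and the `(-0.001, 0.001)` initialization range is an assumption. `old_len` plays the role of `old_index2word_len` in the diff: it marks where the rows for newly added nodes begin.

```python
import random


def init_rows(n_rows, size, rng, low=-0.001, high=0.001):
    """Return `n_rows` vectors of length `size`, drawn uniformly from [low, high]."""
    return [[rng.uniform(low, high) for _ in range(size)] for _ in range(n_rows)]


def rebuild_embeddings(embeddings, old_len, new_len, size, update, rng):
    """Hypothetical sketch of build_vocab's effect on the embedding table.

    update=False: every row is re-initialized, so trained vectors are lost.
    update=True: rows [0, old_len) are kept and only new nodes get random rows.
    """
    if not update:
        return init_rows(new_len, size, rng)
    # Keep (copies of) the trained rows and append fresh rows for the new nodes.
    return [list(row) for row in embeddings[:old_len]] + init_rows(new_len - old_len, size, rng)


# Usage: 3 nodes were already trained; relations_2 adds one new node.
rng = random.Random(0)
trained = [[0.5, 0.5], [0.1, -0.2], [0.3, 0.0]]
updated = rebuild_embeddings(trained, old_len=3, new_len=4, size=2, update=True, rng=rng)
reset = rebuild_embeddings(trained, old_len=3, new_len=4, size=2, update=False, rng=rng)
print(updated[:3] == trained, len(updated))  # True 4
print(reset[0] == trained[0])                # False
```

Recording the old vocab size before adding nodes is what lets the update path leave the trained rows untouched while still initializing the new ones.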


Examples
--------
Train a model and update vocab for online training:

.. sourcecode:: pycon

>>> from gensim.models.poincare import PoincareModel
>>> relations_1 = [('kangaroo', 'marsupial'), ('kangaroo', 'mammal')]
>>> relations_2 = [('striped_skunk', 'mammal')]
>>>
>>> # Build the vocab from relations_1 and train from scratch.
>>> model = PoincareModel(relations_1, negative=1)
>>> model.train(epochs=50)
>>>
>>> # Add the new node 'striped_skunk' to the existing vocab and continue training.
>>> model.build_vocab(relations_2, update=True)
>>> model.train(epochs=50)

"""
old_index2word_len = len(self.kv.index2word)

logger.info("loading relations from train data..")
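The docstring describes `relations` both as tuples of unicode strings and as pairs of node indexes, which is what the reviewer asks about. The following is a minimal, hypothetical sketch of how string relations can be turned into index pairs: each new node name gets the next free index in first-seen order (an assumption about the ordering), and passing an existing mapping extends it instead of rebuilding it. `index_relations` is an illustrative name, not a gensim function; the diff itself reads `self.kv.index2word`, which in gensim holds node names in index order.

```python
def index_relations(relations, vocab=None):
    """Map (node_1, node_2) string pairs to (node_1_index, node_2_index) pairs.

    `vocab` maps node name -> index. Unknown names get the next free index,
    so passing the vocab from an earlier call extends it rather than rebuilding it.
    """
    vocab = dict(vocab or {})  # copy so the caller's mapping is not mutated
    pairs = []
    for node_1, node_2 in relations:
        for node in (node_1, node_2):
            if node not in vocab:
                vocab[node] = len(vocab)
        pairs.append((vocab[node_1], vocab[node_2]))
    return vocab, pairs


relations_1 = [('kangaroo', 'marsupial'), ('kangaroo', 'mammal')]
relations_2 = [('striped_skunk', 'mammal')]

vocab, pairs_1 = index_relations(relations_1)
print(pairs_1)  # [(0, 1), (0, 2)]
vocab, pairs_2 = index_relations(relations_2, vocab)
print(pairs_2)  # [(3, 2)]

# index2word-style lookup: position i holds the name of node i.
index2word = sorted(vocab, key=vocab.get)
print(index2word)  # ['kangaroo', 'marsupial', 'mammal', 'striped_skunk']
```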