Reduce Phraser memory usage (drop frequencies) #2208

Merged
merged 12 commits into from
Jan 11, 2019
12 changes: 6 additions & 6 deletions gensim/models/phrases.py
@@ -209,12 +209,12 @@ def load(cls, *args, **kwargs):
         model = super(PhrasesTransformation, cls).load(*args, **kwargs)
         # update older models
         # if value in phrasegrams dict is a tuple, load only the scores.
-        try:
-            for components, scores in model.__dict__['phrasegrams'].items():
-                if isinstance(scores, tuple):
-                    model.__dict__['phrasegrams'][components] = scores[1]
-        except KeyError:
-            pass
+        if model.phrasegrams:
+            components = model.phrasegrams.keys()
+            for component in components:
Owner commented:

These two lines are better merged into one (so the temporary variable is released as soon as it's not needed).

+                score = model.phrasegrams[component]
+                if isinstance(score, tuple):
+                    model.phrasegrams[component] = score[1]
Owner commented:

Deserves a comment: what is score[1]?

Or even better, unroll the tuple into properly named variables (x, y, z = score) and then assign that.
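
A minimal sketch of how the two review suggestions might look when combined: iterate over the dict directly instead of holding a temporary `components` variable, and unpack the tuple into named variables instead of indexing it. It assumes old-style values are `(frequency, score)` tuples, which is inferred from `score[1]` and the PR title and is not confirmed by the diff. The helper name `keep_scores_only` and the sample data are hypothetical.

```python
def keep_scores_only(phrasegrams):
    """Replace old-style tuple values with the bare score, in place."""
    # Iterating the dict yields its keys, so no temporary key list is needed.
    # Reassigning values of existing keys does not change the dict's size,
    # so it is safe to do while iterating.
    for component in phrasegrams:
        value = phrasegrams[component]
        if isinstance(value, tuple):
            # ASSUMPTION: old-style values are (frequency, score) pairs.
            _frequency, score = value
            phrasegrams[component] = score
    return phrasegrams


# Hypothetical sample data: one old-style entry, one already-converted entry.
old_style = {
    (b'new', b'york'): (12, 9.5),
    (b'machine', b'learning'): 7.25,
}
converted = keep_scores_only(old_style)
```

Unpacking fails loudly with a `ValueError` if a tuple does not have exactly two elements, which may be preferable to silently picking index 1 from an unexpected layout.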


         # if no scoring parameter, use default scoring
         if not hasattr(model, 'scoring'):