Phraser requires unnecessary memory #2189
Labels: difficulty easy, feature, Hacktoberfest, performance
Currently, Phraser objects (= the trimmed-down version of the full bigram finder Phrases) contain the actual bigrams in an internal attribute called phrasegrams. This is the biggest and most memory-intensive part of a Phraser object. phrasegrams is a dict of {tuple of strings => (frequency [int], score [float])}. But the int (the frequency count of that particular bigram) is unused. This means we're constructing that int, plus the wrapping tuple, for no good reason, inflating the necessary RAM. See also mailing list discussion.

Task:
1. Remove the int from Phraser values, leaving only the float.
2. Rename the .vocab attribute of Phrases to something more appropriate, for example bigram_counts.
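To illustrate the first task, here is a minimal sketch of the idea in plain Python. It does not touch gensim itself: the sample dict, `strip_counts` and `values_size` are hypothetical names made up for this example, and the sizes come from `sys.getsizeof`, so they are only a rough, CPython-specific estimate.

```python
import sys


def strip_counts(phrasegrams):
    """Return a new dict mapping each bigram to its score only, dropping the count."""
    return {bigram: score for bigram, (_count, score) in phrasegrams.items()}


def values_size(mapping):
    """Rough size in bytes of the dict's values, including the contents of tuples."""
    total = 0
    for value in mapping.values():
        total += sys.getsizeof(value)
        if isinstance(value, tuple):
            total += sum(sys.getsizeof(item) for item in value)
    return total


# Hypothetical data in the layout described above:
# {tuple of strings => (frequency [int], score [float])}
phrasegrams = {
    ("new", "york"): (1200, 35.7),
    ("machine", "learning"): (870, 52.1),
    ("ice", "cream"): (450, 18.4),
}

slim = strip_counts(phrasegrams)
print(values_size(phrasegrams), "bytes ->", values_size(slim), "bytes")
```

The keys are left untouched; only the per-entry tuple and int are dropped, which is where the saving described in this issue would come from.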