Problem while passing wlocal function in tfidf model #2444
Comments
@Witiko could you please have a look? Many thanks.
@piskvorky This appears to be a regression from 0bfb9da, which is part of #1791 (Gensim 3.3.0). Since then, One way to regain the original behaviour would be to use An alternate solution would be to document the new behavior, although this is arguably a breaking change.
I don't really understand what that means. What are these corners @xmedved1 pointed out in his fix? What are the actual code contracts and invariants now? @Witiko Can you please suggest a fix to the documentation (plus a unit test to capture any future regressions), if this is indeed a problem with documentation? CC @markroxor as author of #1791. TFIDF is a simple model, and I'm strongly -1 on introducing any bloat or complexity that obscures its basic use cases for the sake of some obscure, maybe-useful options.
@xmedved1's fix and changing I agree with @xmedved1 that it would be better to document rather than fix this, since a As for the update of the documentation, changing the current text:
to the following:
should clear any misunderstanding. However, it could be considered a breaking change, since it is a departure from the behaviour of Gensim before 3.3.0. Semver and consideration for other developers dictate that a breaking change should be postponed until Gensim 4. A unit test can consist of a simple
Perhaps we can show one of the SMART functions as an example:
Currently, @piskvorky Would you like me to make the above changes in #2420?
Good practical examples are definitely preferable to (though not precluding) complex type descriptions. My view here is entirely pragmatic: I don't understand myself how I should make use of this functionality. This hints at a missing tutorial / separate "example usage" section (with focus on motivation and context, not just parameter types), rather than only extending a function docstring here and there. More tests are always welcome, yes. Thanks.
Problem description
I tried to apply the math.sqrt function to the term frequency when computing a TF-IDF model, as described in the script documentation:
========================================
Gensim implementation:
Error:
======================================
Fix if I want to use math.sqrt:
==========================================
gensim version 3.7.2
==========================================
I don't know if this is a problem to fix, because in the Gensim implementation I can do some normalization that includes all items in the list (something like tf = n_i_j / sum(n_k_j), where i and k are tokens, i != k, and j is the document; this is not possible with my fix). So I think the problem is the documentation of the wlocal parameter.
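To make the difference concrete, here is a minimal sketch in plain Python. It assumes, as described above, that wlocal is called once per document with all of that document's term frequencies rather than once per frequency. The helper `apply_wlocal` is a hypothetical stand-in for that contract, not Gensim code, and `elementwise` and `relative_frequency` are illustrative names only.

```python
import math


def elementwise(scalar_fn):
    """Lift a scalar function so that it accepts a whole sequence of term frequencies."""
    def wrapper(frequencies):
        return [scalar_fn(tf) for tf in frequencies]
    return wrapper


def relative_frequency(frequencies):
    """Divide each term frequency by the document total (needs the whole sequence)."""
    total = sum(frequencies)
    return [tf / total for tf in frequencies]


def apply_wlocal(wlocal, bow):
    """Hypothetical stand-in, not Gensim code: call wlocal once per document."""
    term_ids = [term_id for term_id, _ in bow]
    weights = wlocal([tf for _, tf in bow])
    return list(zip(term_ids, weights))


bow = [(0, 1), (1, 4), (2, 9)]  # (term id, term frequency) pairs

# A scalar-only function fails when it is handed the whole sequence.
try:
    apply_wlocal(math.sqrt, bow)
    scalar_call_failed = False
except TypeError:
    scalar_call_failed = True

# Wrapping the scalar function restores the per-term behaviour.
sqrt_weights = apply_wlocal(elementwise(math.sqrt), bow)

# A whole-document normalization only works under the per-document contract.
relative_weights = apply_wlocal(relative_frequency, bow)
```

Under this assumption, a per-term function has to be lifted to work on sequences, while a normalization over the whole document can be passed directly. If the frequencies arrive as a NumPy array, a vectorized function such as numpy.sqrt would presumably play the role of `elementwise(math.sqrt)`.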
Best regards M