Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement TopP, TopK and Beam samplers #652
Implement TopP, TopK and Beam samplers #652
Changes from 12 commits
26fd509
7e4c651
28bcfe1
9757f4d
f7508cb
b658b61
76c430c
bb430dd
afd3082
273a6a5
31ad970
de2ac9c
950ad43
631afe4
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chenmoneygithub I really don't think these arg blocks match the rest of the library's style. I don't think there's a perfect answer but I'd prefer it not to be obvious who wrote what 🥗
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I checked our code base, it appears we have a few code having this style:
I feel like giving these numbers an explicit meaning makes the code longer but kinda easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a totally valid opinion @chenmoneygithub but it's also important for the code to have a unified style rather than each contributor producing different looking code. If there's an example that's unclear without named params I'm open to trying something different but otherwise I'm hoping we can compromise!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking steps towards a more unified style (and then reflecting that in our style guide) sgtm. What are the main places this differs, besides the constants at the top?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically what I asked for in #658 was the current thought. A bit less script-like and a little more "drop this line in colab and see what we're talking about".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see us replacing the "model" with something like
tf.random.uniform(shape, minval=-1, maxval=1)
. It is kind of weird to me that we show a whole model, that is trainable but randomly initialized (so results will be random anyway), and not even sequence aware so would never really perform even if your trained it. For a new user this seems a bit of a red herring.Would be more concise to do something like:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to keep the model part so that the example is closer to real use cases.
I will unify the docstring to move those hypers inline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me the model falls into an "uncanny valley" of code examples. It's not something that will actually work, yet also not clearly a random dummy data. As a newbie I worry I would not understand, first, that results will be random, and second, that this is a "bad model" for the task.
Fine to merge as is, but I hope we can play around with some improvements down the road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return immediately