BPE tokenizer #389
Thanks! Will take a look! One note: it might be nice to add Jesse as a co-author on the commit. He did some incredible work on this and we should make sure to credit it.
@mattdangerw Definitely! Will add in the next commit.
Thanks!
Took a high-level pass and left some suggestions for overall readability (particularly splitting out a util file for the BPE algorithm itself).
23d6eb0 to 205efa0
This looks good to me! Left some docstring comments to fix up. Still plenty of follow-ups to do, but I think landing and iterating is a good plan.
Let's make sure to fix the co-author so GitHub properly registers this.
Let's also open some follow-up issues once this is in:
- Any outstanding places we know our output differs should for sure be tracked in an issue.
- An issue for removing the `tf.function`; we shouldn't force compile when running eagerly.
- An issue for adding some testing that does not require network (and works off small simplified vocabs).
Thanks!
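For the last follow-up above, a network-free test could look roughly like the sketch below. It is not the implementation in this PR: `bpe_merge`, `TINY_MERGES`, and `test_tiny_vocab` are hypothetical names, and the pure-Python merge loop only stands in for the tokenizer so the idea of testing against a small in-memory merge list is concrete.

```python
# Hypothetical sketch: a tiny in-memory merge list replaces a downloaded
# vocab/merges file. None of these names come from the PR itself.
TINY_MERGES = [("l", "o"), ("lo", "w"), ("e", "r")]


def bpe_merge(word, merges):
    """Apply BPE merge rules to one word; earlier rules have higher priority."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        # Collect adjacent pairs and pick the one with the best (lowest) rank.
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no applicable merge rule is left
        # Merge every non-overlapping occurrence of the best pair, left to right.
        merged = []
        i = 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def test_tiny_vocab():
    assert bpe_merge("lower", TINY_MERGES) == ["low", "er"]
    assert bpe_merge("low", TINY_MERGES) == ["low"]
    # A word with no matching merge rules stays split into characters.
    assert bpe_merge("xyz", TINY_MERGES) == ["x", "y", "z"]
```

A real test would call the tokenizer layer with the same small merge list instead of `bpe_merge`; the point is only that the expected splits can be worked out by hand, with no download.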
Add more test cases.
Co-authored-by: jessechancy <[email protected]>
add merge file
Make cache a tf module
Delete testdata
address comments
address comments
fix docstring
fix docstring
883dcd6 to 2cc9977
This PR is a rework of #303.
I recreated the PR instead of editing the original directly, for clearer remote-local tracking.