Add Python 3.12 to the CI and update Ubuntu #1701
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@              Coverage Diff               @@
##           releases/1.10.0    #1701   +/-   ##
================================================
  Coverage          81.14%   81.14%
================================================
  Files                284      284
  Lines              32891    32891
  Branches            5299     5299
================================================
+ Hits               26688    26690     +2
+ Misses              4751     4750     -1
+ Partials            1452     1451     -1
* Also replace torchtext with tokenizers library.
Thanks a lot for the PR. I only have some minor comments.
Somewhat out of scope for this PR, but my concern with this implementation is that it may be unclear to users when to pass just the tokenizer and when to pass both a tokenizer and a vocab. I think a proper docstring would address this.
Adding the complete docstring is out of scope, but maybe you could add a brief docstring or comment explaining that the vocab arg is not needed when the tokenizer already returns token ids instead of raw tokens, and that this depends on which NLP framework is used. One or two sentences would be sufficient. It's mainly to help whoever adds the full docstring for this class in the future.
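To make the distinction concrete, here is a minimal sketch of the kind of brief docstring meant above. It is not the class touched by this PR: TextEncoder, its encode method, and the unk_id parameter are hypothetical names chosen only for illustration, and the sketch assumes a tokenizer is any callable that takes a string and returns either raw string tokens or integer token ids.

```python
from typing import Callable, Dict, List, Optional, Sequence, Union

# Hypothetical type alias: a tokenizer is assumed to be any callable that
# returns either raw string tokens or integer token ids.
Tokenizer = Callable[[str], Sequence[Union[str, int]]]


class TextEncoder:
    """Encode text as a list of token ids (illustrative class, not the project's API).

    Pass only ``tokenizer`` when it already returns token ids. Pass ``vocab``
    as well when the tokenizer returns raw string tokens that still have to be
    mapped to ids. Which case applies depends on the NLP framework in use.
    """

    def __init__(
        self,
        tokenizer: Tokenizer,
        vocab: Optional[Dict[str, int]] = None,
        unk_id: int = 0,
    ) -> None:
        self.tokenizer = tokenizer
        self.vocab = vocab
        self.unk_id = unk_id  # id used for tokens missing from the vocab

    def encode(self, text: str) -> List[int]:
        tokens = self.tokenizer(text)
        if self.vocab is None:
            # No vocab given: the tokenizer must already produce ids.
            if not all(isinstance(t, int) for t in tokens):
                raise TypeError("tokenizer returned raw tokens; a vocab is required")
            return list(tokens)
        # Vocab given: map each raw token to its id, falling back to unk_id.
        return [self.vocab.get(t, self.unk_id) for t in tokens]
```

With a whitespace tokenizer such as str.split, a vocab is required. A tokenizer that returns ids directly can be passed on its own.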
Done, but I think the construction of those classes needs to be refactored and simplified. The parameters of the to_framework function depend on which framework is used, which makes it difficult to document.
Yes, I fully agree. We'll address this at a later point.
Summary
This PR adds support for Python 3.12 in the project configuration and CI. I have left the minimum Python version unchanged (3.9).
How to test
Checklist
License
Feel free to contact the maintainers if that's a concern.