Fix Trainer and Args to mention AdamW, not Adam. #9685
Conversation
Thanks for opening the PR. As it stands, this would break every existing script leveraging the parameters defined, so renaming the parameters is probably not the way to go. @sgugger, your insight on this would be very welcome.
There is no reason to change all the names of the parameters indeed, and it would be too heavy a breaking change.
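For context, these are the optimizer hyperparameter names that `TrainingArguments` already exposes and that downstream scripts depend on. A minimal sketch (the values shown are the library defaults; `output_dir` is just a placeholder):

```python
from transformers import TrainingArguments

# These argument names are part of the public API; renaming them to
# adamw_beta1/adamw_beta2/adamw_epsilon would break any script that sets them.
args = TrainingArguments(
    output_dir="out",
    adam_beta1=0.9,     # beta1 for the AdamW optimizer
    adam_beta2=0.999,   # beta2 for the AdamW optimizer
    adam_epsilon=1e-8,  # epsilon for the AdamW optimizer
)
```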
Hi @LysandreJik @sgugger. Thanks for your comments, I'll change the variables back. I apologize if this is too silly a question, but how can I build the docs and preview them in a browser after making changes?
You should check this page for all the information on generating/writing the documentation :-) |
I have updated it and also added that by default, the weight decay is applied to all layers except bias and LayerNorm weights while training. |
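As a rough sketch of what that exclusion means in practice, here is a condensed version of the parameter grouping that `Trainer` performs. The toy model and hyperparameter values are illustrative, and `torch.optim.AdamW` stands in for the AdamW implementation the library uses:

```python
import torch

# Toy module whose parameter names mirror the transformers convention
# (an attribute literally named `LayerNorm`, so names contain "LayerNorm.weight").
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.dense = torch.nn.Linear(4, 4)
        self.LayerNorm = torch.nn.LayerNorm(4)

model = TinyModel()

# Parameters whose names match these substrings are excluded from weight decay.
no_decay = ["bias", "LayerNorm.weight"]
grouped_parameters = [
    {
        "params": [p for n, p in model.named_parameters()
                   if not any(nd in n for nd in no_decay)],
        "weight_decay": 0.01,
    },
    {
        "params": [p for n, p in model.named_parameters()
                   if any(nd in n for nd in no_decay)],
        "weight_decay": 0.0,
    },
]
optimizer = torch.optim.AdamW(grouped_parameters, lr=5e-5,
                              betas=(0.9, 0.999), eps=1e-8)
```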
This looks good to me this way, thanks!
@sgugger My code passed only 3 out of 12 checks; I was unable to run CircleCI properly. Can you point out why this happened?
We are trying to get support from them to understand why, but the checks on your PR were all cancelled. I couldn't retrigger them from our interface either.
Hi @sgugger, I believe a possible reason could be the steps I followed, but I'm not sure how CircleCI works, so this is just a wild guess.
This PR fixes the docs and labels in the Trainer and TrainingArguments classes for AdamW; the current version mentions Adam in several places.
Fixes #9628
The Trainer class in trainer.py uses AdamW as the default optimizer, but the TrainingArguments class mentions it as Adam in the documentation, which was confusing. I have also changed the variable names to adamw_beta1, adamw_beta2, and adamw_epsilon in trainer.py.
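To make the Adam/AdamW distinction concrete, here is a minimal schematic of the decoupled weight decay that gives AdamW its name (following Loshchilov & Hutter); this illustrates the idea only and is not the Trainer code:

```python
import torch

p = torch.nn.Parameter(torch.ones(3))
p.grad = torch.full((3,), 0.5)
lr, wd = 1e-3, 0.01

# Adam with classic L2 regularization folds the decay into the gradient,
# so it then passes through the adaptive moment scaling:
#   g <- g + wd * p
# AdamW instead decouples the decay and applies it directly to the
# weights, alongside the usual Adam step:
with torch.no_grad():
    p -= lr * wd * p  # the decoupled weight-decay term of AdamW
```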
@LysandreJik