-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DPO] DPOConfig class #1554
[DPO] DPOConfig class #1554
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the refactor and new config @kashif ! Overall LGTM with some nits about moving all args from the trainer init to the config in the script example
As with the SFT trainer, it would be good to have a small integration test that checks we have the same trainer if it is initialised in both ways (via args
and via init) to ensure backwards compat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating @kashif ! I left one turbo nit and then this can be merged after addressing it :)
Co-authored-by: lewtun <[email protected]>
Co-authored-by: lewtun <[email protected]>
Co-authored-by: lewtun <[email protected]>
Add a DPOConfig training argument dataclass together with deprecation warnings to the DPOTrainer