Add RPO, DPOP losses, add lambda_dpop to basic DPO loss #2035
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2035
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 Unrelated Failures)
As of commit aa8f365 with merge base aa8f365:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
A few notes: I've added DPOP both as a separate loss and as an optional penalty in the basic DPO loss, where it is not activated by default. RPO is added as a separate loss. I've also documented 1e-5 as the minimum value for the penalty weight coefficient, based on empirical experiments. I did not add a check for this, because it might produce UB.
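To make the "optional penalty" idea concrete, here is a minimal scalar sketch of a DPO loss with a DPOP-style term, based on my understanding of the DPO-Positive formulation. It is not the code from this PR: the function name `dpo_loss_with_dpop`, the argument names, and the default `beta` are hypothetical, and it works on plain floats rather than tensors. The only name taken from the PR title is `lambda_dpop`, and the default of 0.0 mirrors the "not activated by default" behavior described above.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)) for a scalar."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss_with_dpop(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,          # assumed default, not taken from the PR
    lambda_dpop: float = 0.0,   # 0.0 disables the penalty (plain DPO)
) -> float:
    """Hypothetical per-example DPO loss with an optional DPOP penalty."""
    # Log-ratios of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # The penalty is non-zero only when the policy assigns the chosen
    # response a lower log-probability than the reference model does.
    penalty = max(0.0, ref_chosen_logp - policy_chosen_logp)

    logits = chosen_logratio - rejected_logratio - lambda_dpop * penalty
    return -log_sigmoid(beta * logits)
```

With `lambda_dpop=0.0` this reduces to the standard DPO objective. With a positive value, the loss grows for examples where the policy's log-probability of the chosen response has dropped below the reference, even if the chosen/rejected margin still looks favorable.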
Would love to get a review here.
Hey @krammnic, thanks for opening this PR. We've discussed a bit in general about defining better criteria for adding new features to the repo and I wanted to share it with you. We want to encourage the community to add new features but at the same time be selective in only the features that are most impactful and necessary in the field of fine-tuning. That being said, we arrived at two criteria:
With those in mind, I'd like to hold off on this PR for now. I know in the past we haven't put a lot of consideration into the new features we add but this is something we'd like to rectify. This one is a bit on me since I posted the issue. We're working on a better process for novel features because we value contributors' time and we don't want you to spend effort to put up a PR when there hasn't been alignment to add a feature. So we're thinking of:
Open to any thoughts you may have. As always, you've been a valuable contributor @krammnic, and we don't want to discourage you from continuing to contribute :)
@RdoubleA Thanks for the answer! I see your points. Regarding correctness, I can probably find some reference implementations to compare against, and I'm also open to doing several full runs with this new method. I agree that we should definitely discuss the structure of the RLHF module in general before merging PRs like this one. I'd be interested in comments from @salman here, or we can discuss these points offline.
Context
What is the purpose of this PR? Is it to
Please link to any issues this PR addresses.
Changelog
What are the changes made in this PR?
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.
pre-commit install
pytest tests
pytest tests -m integration_test
UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example