Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RPO, DPOP losses, add lambda_dpop to basic DPO loss #2035

Closed
wants to merge 0 commits into from

Conversation

krammnic
Copy link
Contributor

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Please link to any issues this PR addresses.

Changelog

What are the changes made in this PR?

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.

  • run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
  • add unit tests for any new functionality
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • manually run any new or modified recipes with sufficient proof of correctness
  • include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example

  • I did not change any public API
  • I have added an example to docs or docstrings

Copy link

pytorch-bot bot commented Nov 20, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2035

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit aa8f365 with merge base aa8f365 (image):

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 20, 2024
@krammnic
Copy link
Contributor Author

Few notes: I've added DPOP as separate loss and penalty to DPO which is not activated by default. RPO, as separate loss. Also I've mentioned minimum penalty weight coefficient value as 1e-5 as results of empirical experiments and I did not add check on this, because it might produce UB.

@krammnic
Copy link
Contributor Author

Would love to get review here.

@RdoubleA
Copy link
Contributor

Hey @krammnic, thanks for opening this PR. We've discussed a bit in general about defining better criteria for adding new features to the repo and I wanted to share it with you. We want to encourage the community to add new features but at the same time be selective in only the features that are most impactful and necessary in the field of fine-tuning. That being said, we arrived at two criteria:

  1. Correctness. We'd want to see a reference implementation either in another fine-tuning library or an official research repo. This is to ensure we do not unintentionally onboard a feature that is broken, as the maintainers are not all experts in every new feature and can verify the implementation. We've had this recently happen with IPOLoss, which we've had to remove because it was not correct. Removing a broken feature is worst case scenario and we'd like to avoid that.
  2. Prevalence. This one is a bit harder to gauge but one quantifiable metric could be citations on the paper with the method. Or perhaps a separate repo that is gaining popularity. Again, this is to filter out features that will not maintain relevance.

With those in mind, I'd like to hold off on this PR for now. I know in the past we haven't put a lot of consideration into the new features we add but this is something we'd like to rectify. This one is a bit on me since I posted the issue.

We're working on a better process for novel features because we value contributors' time and we don't want you to spend effort to put up a PR when there hasn't been alignment to add a feature. So we're thinking of:

  1. Discuss a new feature on an issue first, get explicit approval from a maintainer, before opening a PR
  2. Encourage and outline how contributors can create example repos or host our own for cases such as DPOP that we still need to understand it's importance before onboarding.

Open to any thoughts you may have, and as always you've been a valuable contributor @krammnic, don't want to discourage you from continuing to do so :)

@krammnic
Copy link
Contributor Author

@RdoubleA Thanks for the answer! I see your points. Speaking about correctness, probably I can find some reference implementations and compare, also I'm open to do several full runs with this new method. I agree that definitely we should speak on structure of RLHF module in general before merge such PRs and here I'm interested in some comments of @salman probably or we can discuss this points offline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants