Add RPO, DPOP losses, add lambda_dpop to basic DPO loss #2035

krammnic · 2024-11-20T13:40:19Z

Context

What is the purpose of this PR? Is it to

add a new feature
fix a bug
update tests and/or documentation
other (please add here)

Please link to any issues this PR addresses.

Changelog

What are the changes made in this PR?

Add NLL and DPOP weighting to DPO losses #2032

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.

run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
add unit tests for any new functionality
update docstrings for any new or updated methods or classes
run unit tests via pytest tests
run recipe tests via pytest tests -m integration_test
manually run any new or modified recipes with sufficient proof of correctness
include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example

I did not change any public API
I have added an example to docs or docstrings

pytorch-bot · 2024-11-20T13:40:22Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2035

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit aa8f365 with merge base aa8f365:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Regression Tests / regression_test (3.11, nightly) (gh) (trunk failure)
tests/regression_tests/test_llama2_7b.py::TestLoRA7BDistributedFinetuneEval::test_finetune_and_eval
Regression Tests / regression_test (3.11, stable) (gh) (trunk failure)
tests/regression_tests/test_llama2_7b.py::TestLoRA7BDistributedFinetuneEval::test_finetune_and_eval

This comment was automatically generated by Dr. CI and updates every 15 minutes.

krammnic · 2024-11-20T13:45:25Z

A few notes: I've added DPOP both as a separate loss and as a penalty term in DPO that is not activated by default. RPO is added as a separate loss. I've also documented a minimum penalty weight coefficient of 1e-5, based on empirical experiments, but I did not add a check for it, because it might produce UB.
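For readers unfamiliar with these variants, here is a minimal scalar sketch of how a DPOP penalty and an NLL term could be layered on top of the standard DPO loss. It is not the code from this PR. It assumes the DPO-Positive formulation, where a penalty `max(0, ref_chosen_logp - policy_chosen_logp)` scaled by `lambda_dpop` is subtracted inside the sigmoid, and it assumes that "RPO" here means DPO plus a weighted NLL term on the chosen response, as the linked issue title suggests. Apart from `lambda_dpop`, every function and parameter name is hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
    lambda_dpop: float = 0.0,  # 0.0 keeps plain DPO (penalty off by default)
) -> float:
    """DPO loss for one preference pair, with an optional DPOP penalty.

    Inputs are summed sequence log-probabilities under the policy and the
    frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Assumed DPOP penalty: active only when the policy assigns the chosen
    # response a lower log-probability than the reference model does.
    penalty = max(0.0, ref_chosen_logp - policy_chosen_logp)
    logits = chosen_logratio - rejected_logratio - lambda_dpop * penalty
    return -log_sigmoid(beta * logits)


def rpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    chosen_nll: float,  # NLL of the chosen response, normalized as the caller sees fit
    beta: float = 0.1,
    alpha: float = 1.0,  # hypothetical weight on the NLL term
) -> float:
    """Assumed RPO-style loss: plain DPO plus a weighted NLL term."""
    base = dpo_loss(
        policy_chosen_logp,
        policy_rejected_logp,
        ref_chosen_logp,
        ref_rejected_logp,
        beta=beta,
    )
    return base + alpha * chosen_nll
```

With `lambda_dpop=0.0` the first function reduces to ordinary DPO, which matches the "not activated by default" behavior described above. A real implementation would operate on batched tensors rather than Python floats.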

krammnic · 2024-11-21T07:50:01Z

Would love to get review here.

RdoubleA · 2024-11-24T23:14:56Z

Hey @krammnic, thanks for opening this PR. We've been discussing how to define better criteria for adding new features to the repo, and I wanted to share them with you. We want to encourage the community to add new features, but at the same time be selective and take on only the features that are most impactful and necessary in the field of fine-tuning. That being said, we arrived at two criteria:

Correctness. We'd want to see a reference implementation, either in another fine-tuning library or in an official research repo. This is to ensure we do not unintentionally onboard a feature that is broken, as the maintainers are not all experts in every new feature and cannot always verify the implementation. This happened recently with IPOLoss, which we had to remove because it was not correct. Removing a broken feature is the worst-case scenario and we'd like to avoid that.
Prevalence. This one is a bit harder to gauge but one quantifiable metric could be citations on the paper with the method. Or perhaps a separate repo that is gaining popularity. Again, this is to filter out features that will not maintain relevance.

With those in mind, I'd like to hold off on this PR for now. I know in the past we haven't put a lot of consideration into the new features we add but this is something we'd like to rectify. This one is a bit on me since I posted the issue.

We're working on a better process for novel features because we value contributors' time and we don't want you to spend effort to put up a PR when there hasn't been alignment to add a feature. So we're thinking of:

Discuss a new feature on an issue first and get explicit approval from a maintainer before opening a PR.
Encourage contributors to create example repos (and outline how), or host our own, for cases such as DPOP where we still need to understand its importance before onboarding.

Open to any thoughts you may have, and as always you've been a valuable contributor @krammnic, don't want to discourage you from continuing to do so :)

krammnic · 2024-11-25T06:17:29Z

@RdoubleA Thanks for the answer! I see your points. Regarding correctness, I can probably find some reference implementations and compare against them; I'm also open to doing several full runs with this new method. I agree that we should definitely discuss the structure of the RLHF module in general before merging such PRs. I'd be interested in comments from @salman here, or we can discuss these points offline.

facebook-github-bot added the CLA Signed label Nov 20, 2024

krammnic closed this Dec 21, 2024

krammnic force-pushed the main branch from e883a14 to aa8f365 Compare December 21, 2024 23:19

krammnic mentioned this pull request Dec 25, 2024

Custom losses redesign in alignment section #2206

Open

krammnic mentioned this pull request Feb 9, 2025

Custom DPO losses support #2292

Open

13 tasks
