[DPO] add 'bco_pair' loss_type #1524

Merged
merged 2 commits into from
Apr 22, 2024

@seanexp seanexp commented Apr 11, 2024

add Binary Classifier Optimization (BCO) loss function from https://arxiv.org/abs/2404.04656

Implemented the BCE loss and the reward shift described in the paper.

I will open a separate PR for the unpaired version of BCO after rebasing and polishing.
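
For readers unfamiliar with the paper, here is a minimal per-example sketch of what a paired BCE loss with a reward shift could look like. It is not the code from this PR. The function names, the default `beta`, and passing the shift `delta` in as a plain number are all assumptions made for illustration. In the paper the shift is, as far as I understand it, tracked as a moving average of the rewards.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bco_pair_loss_sketch(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,   # assumed default, for illustration only
    delta: float = 0.0,  # reward shift; assumed to be supplied by the caller
) -> float:
    """Hypothetical paired BCE loss: the chosen completion is treated as a
    positive example and the rejected completion as a negative example."""
    # Implicit rewards are scaled log-ratios between policy and reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Binary cross-entropy on the shifted rewards.
    positive_term = -log_sigmoid(chosen_reward - delta)
    negative_term = -log_sigmoid(-(rejected_reward - delta))
    return positive_term + negative_term


# When the policy equals the reference and delta is 0, both terms equal log(2).
baseline = bco_pair_loss_sketch(-2.0, -2.0, -2.0, -2.0)
# Raising the chosen log-prob and lowering the rejected one reduces the loss.
improved = bco_pair_loss_sketch(-1.0, -3.0, -2.0, -2.0)
```

Going by the PR title, the real implementation is selected through the `'bco_pair'` value of `loss_type` in the DPO trainer.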

@kashif kashif self-requested a review April 12, 2024 09:59

kashif commented Apr 12, 2024

Thanks @seanexp, perhaps let's also add some description in the DPO docs too?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

seanexp commented Apr 12, 2024

@kashif

Ah yes! I'll work on it.

seanexp commented Apr 12, 2024

@kashif

Just added BCO description. 5439c90

@younesbelkada younesbelkada left a comment

Thanks for this addition!

@younesbelkada younesbelkada merged commit c050ebc into huggingface:main Apr 22, 2024
9 checks passed
kashif pushed a commit to kashif/trl that referenced this pull request Apr 23, 2024
* add 'bco_pair' loss_type

* add BCO description to DPO doc

---------

Co-authored-by: sean.jung <[email protected]>
4 participants