[KTOTrainer] add BCO (reward shift and underlying distribution matching) #1599

Merged 11 commits into huggingface:main on Apr 30, 2024

Conversation

@seanexp (Contributor) commented Apr 29, 2024

Add the Binary Classifier Optimization (BCO) loss function from https://arxiv.org/abs/2404.04656

Implemented the BCE loss, reward shift, and underlying distribution matching.

Also added an example script at examples/scripts/bco.py
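
For reference, a minimal sketch of the BCO objective with the reward shift (variable names are illustrative, not the exact TRL implementation; underlying distribution matching, which reweights examples via a logistic-regression classifier over prompt embeddings and is what motivates the scikit-learn discussion below, is omitted):

```python
import torch
import torch.nn.functional as F

def bco_loss(chosen_logratios, rejected_logratios, delta, beta=0.1):
    """Sketch of the BCO objective (reward-shift part only).

    chosen_logratios / rejected_logratios: log pi_theta(y|x) - log pi_ref(y|x)
    delta: running mean of the implied rewards beta * logratio, used as the
           reward shift.
    """
    # Each implied reward is treated as the logit of a binary classifier:
    # desirable completions are the positive class, undesirable the negative.
    chosen_losses = -F.logsigmoid(beta * chosen_logratios - delta)
    rejected_losses = -F.logsigmoid(-(beta * rejected_logratios - delta))
    return torch.cat([chosen_losses, rejected_losses]).mean()
```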

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@kashif (Collaborator) commented Apr 30, 2024

@seanexp can you kindly run:

pre-commit run --all-files

in the root folder of TRL to clean up the formatting?

@seanexp (Contributor, Author) commented Apr 30, 2024

@kashif

My bad. Applied pre-commit at a9a7a9c

@kashif (Collaborator) commented Apr 30, 2024

Thanks @seanexp, one question: would it make sense in the example script to have an option to try different embedding models, or are things hard-coded for nomic-embed?

@kashif (Collaborator) commented Apr 30, 2024

@seanexp you might need to add the scikit-learn dep here: https://github.com/huggingface/trl/blob/main/setup.py#L72

@kashif (Collaborator) commented Apr 30, 2024

Oh sorry @seanexp, I thought scikit-learn was only in the tests, but it's in the main trainer... so we need to add it as a regular dependency! 🙇🏽

@seanexp (Contributor, Author) commented Apr 30, 2024

@kashif

I found that adding scikit-learn is redundant, as it is already a required package of transformers.

Should I still add scikit-learn as a regular dependency?

@kashif (Collaborator) commented Apr 30, 2024

@seanexp yeah, let's remove the dep from the test then... you can see here that even though transformers is installed, the tests fail: https://github.com/huggingface/trl/actions/runs/8891165233/job/24412685192#step:5:2843

@seanexp (Contributor, Author) commented Apr 30, 2024

Seems like the dep in the test is necessary; without it, the tests fail. Shall we leave it as is? @kashif

@kashif (Collaborator) commented Apr 30, 2024

@seanexp ok, let's leave the dependency in the tests and add a helper here for scikit-learn: https://github.com/huggingface/trl/blob/main/trl/import_utils.py. Then, in the trainer, we can check whether it's available when BCO is the loss type and, if not, ask the user to install it via pip install ...
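
For illustration, such a helper and guard might look like this (a sketch only; `require_sklearn_for_bco` is a hypothetical name, not the code that landed):

```python
import importlib.util

def is_sklearn_available() -> bool:
    # In the style of the existing is_*_available helpers in trl/import_utils.py.
    return importlib.util.find_spec("sklearn") is not None

def require_sklearn_for_bco(loss_type: str) -> None:
    # Hypothetical guard the trainer could call when BCO is selected.
    if loss_type == "bco" and not is_sklearn_available():
        raise ImportError(
            "loss_type='bco' requires scikit-learn: `pip install scikit-learn`."
        )
```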

@kashif (Collaborator) commented Apr 30, 2024

@seanexp something like this in the KTOConfig: https://github.com/huggingface/trl/blob/main/trl/trainer/ddpo_config.py#L116-L120
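
Applied to KTOConfig, the referenced pattern is a validation hook in the config's `__post_init__`. It might look roughly like this (field names assumed for illustration):

```python
import importlib.util
from dataclasses import dataclass

from transformers import TrainingArguments

def is_sklearn_available() -> bool:
    # Same availability check as the helper sketched above.
    return importlib.util.find_spec("sklearn") is not None

@dataclass
class KTOConfig(TrainingArguments):
    """Sketch: validation hook in the style of DDPOConfig's __post_init__."""

    loss_type: str = "kto"  # "kto" or "bco"; field name assumed for illustration

    def __post_init__(self):
        # Keep TrainingArguments' own initialization, then add the BCO check.
        super().__post_init__()
        if self.loss_type == "bco" and not is_sklearn_available():
            raise ImportError(
                "loss_type='bco' requires scikit-learn: `pip install scikit-learn`."
            )
```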

@seanexp (Contributor, Author) commented Apr 30, 2024

@kashif

done at ddbbbdf :)

@seanexp (Contributor, Author) commented Apr 30, 2024

> Thanks @seanexp, one question: would it make sense in the example script to have an option to try different embedding models, or are things hard-coded for nomic-embed?

Users can try different embedding models. Please note that users have to modify embedding_func (and embed_prompt) accordingly, as embedding models require different post-processing methods (e.g. mean/max pooling, or using only a subset of features). @kashif
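
To make that concrete, a mean-pooling variant of embedding_func for a generic encoder might look like this (illustrative only; the exact signatures in examples/scripts/bco.py may differ):

```python
import torch

def embedding_func(embedding_model, input_ids, attention_mask):
    # Mean pooling over non-padding tokens. Other models may instead want
    # CLS pooling, max pooling, or only a subset of the hidden features.
    outputs = embedding_model(input_ids=input_ids, attention_mask=attention_mask)
    hidden = outputs.last_hidden_state                    # (batch, seq, dim)
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # (batch, seq, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
```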

@kashif (Collaborator) left a comment

LGTM!

@younesbelkada (Contributor) left a comment

Thanks so much for this great contribution @seanexp! Overall it looks very clean, but some CI is failing:

=========================== short test summary info ============================
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_lora_save - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_0_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_1_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_2_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_3_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_4_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_5_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_6_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_7_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_bco_udm - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_without_providing_ref_model - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_without_providing_ref_model_with_lora - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_tokenize_and_process_tokens - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_cli.py::test_sft_cli - AssertionError: An error occured while running the CLI, please double check

Can you have a look before we merge it? 🙏 Thanks!

@seanexp (Contributor, Author) commented Apr 30, 2024

@younesbelkada

Hmm... I'll take a look in a few hours!

@seanexp (Contributor, Author) commented Apr 30, 2024

Not calling super().__post_init__() was the problem. Fixed at 5c58065

The tests now run properly on my machine :) @younesbelkada

@kashif (Collaborator) commented Apr 30, 2024

good catch @seanexp 🥇

@younesbelkada (Contributor) left a comment

Woah, great catch! Thanks again for adding this nice feature!

@younesbelkada merged commit d1aa0b6 into huggingface:main on Apr 30, 2024. 9 checks passed.