[KTOTrainer] add BCO (reward shift and underlying distribution matching) #1599
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@seanexp can you kindly run the formatting command in the root folder of TRL to clean up the formatting?
thanks @seanexp! One question: would it make sense in the example script to have an option to try different embedding models? Or are things hard-coded for nomic-embed?
@seanexp you might need to add the scikit-learn dep here: https://github.com/huggingface/trl/blob/main/setup.py#L72
oh sorry @seanexp, I thought scikit-learn was only used in the tests, but it's used in the main trainer... so we need to add it as a regular dependency! 🙇🏽
I found that adding it works. Should I still add it?
@seanexp yeah, let's remove the dep from the tests then... you can see here that even though transformers is installed, the tests fail: https://github.com/huggingface/trl/actions/runs/8891165233/job/24412685192#step:5:2843
Seems like the dep in the tests is necessary. Without it, the tests fail. Shall we leave it as is? @kashif
@seanexp ok, let's leave the dependency in the tests and add a helper here: https://github.com/huggingface/trl/blob/main/trl/import_utils.py for scikit-learn. Then in the trainer we can check whether it is available when BCO is the loss type and, if not, ask the user to install it.
@seanexp something like this in the KTOConfig: https://github.com/huggingface/trl/blob/main/trl/trainer/ddpo_config.py#L116-L120
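A minimal sketch of what such an availability check could look like, following the `is_*_available` pattern used elsewhere in `trl/import_utils.py` (the function names and the `"bco"` loss-type string here are illustrative assumptions, not the merged implementation):

```python
import importlib.util


def is_sklearn_available() -> bool:
    """Return True if scikit-learn can be imported (illustrative helper)."""
    return importlib.util.find_spec("sklearn") is not None


def require_sklearn_for_bco(loss_type: str) -> None:
    """Raise a helpful error when BCO is requested without scikit-learn.

    The "bco" loss-type string is an assumption for this sketch.
    """
    if loss_type == "bco" and not is_sklearn_available():
        raise ImportError(
            "The BCO loss requires scikit-learn for underlying distribution "
            "matching. Install it with `pip install scikit-learn`."
        )
```

Guarding the optional dependency this way keeps scikit-learn out of the core install while still failing fast, with an actionable message, for users who select the BCO loss.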
Users can try different embedding models. Please note that users have to modify the embedding-related parts of the script accordingly.
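One way the example script could expose the embedding model as an option instead of hard-coding nomic-embed is a small argument dataclass. This is a hypothetical sketch; the field names and defaults below are assumptions, not the actual arguments of `examples/scripts/bco.py`:

```python
from dataclasses import dataclass, field


@dataclass
class EmbeddingArguments:
    """Hypothetical CLI arguments for selecting the UDM embedding model."""

    embedding_model_name: str = field(
        default="nomic-ai/nomic-embed-text-v1",  # assumed default for this sketch
        metadata={"help": "Hub id of the model used to embed prompts for UDM."},
    )
    trust_remote_code: bool = field(
        default=True,
        metadata={"help": "Some embedding models ship custom modeling code."},
    )
```

A dataclass like this can be parsed with `HfArgumentParser`, so switching the embedding model becomes a command-line flag rather than an edit to the script.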
LGTM!
Thanks so much for this great contribution @seanexp ! Overall looks very clean, some CI is failing though:
```
=========================== short test summary info ============================
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_lora_save - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_0_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_1_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_2_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_3_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_4_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_5_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_6_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_7_gpt2 - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_bco_udm - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_without_providing_ref_model - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_kto_trainer_without_providing_ref_model_with_lora - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_kto_trainer.py::KTOTrainerTester::test_tokenize_and_process_tokens - AttributeError: 'NoneType' object has no attribute 'gradient_accumulation_kwargs'
FAILED tests/test_cli.py::test_sft_cli - AssertionError: An error occured while running the CLI, please double check
```
Can you have a look before we merge it? 🙏 Thanks !
hmm... I'll take a look in a few hours!
Not calling that fixed it. The tests now run properly on my machine :) @younesbelkada
good catch @seanexp 🥇
Woah, great catch! Thanks again for adding this nice feature!
Add the Binary Classifier Optimization (BCO) loss function from https://arxiv.org/abs/2404.04656.
Implements the BCE loss, reward shift, and underlying distribution matching.
Also adds an example script at examples/scripts/bco.py.
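For reference, the BCO objective from the paper treats alignment as binary classification of the implicit reward: desirable completions are labeled 1, undesirable ones 0, and the reward is shifted by a tracked mean before applying binary cross-entropy. A simplified pure-Python sketch of that idea (the actual trainer operates on batched tensors and tracks the shift as a running mean):

```python
import math


def bco_loss(policy_logps, ref_logps, labels, beta=0.1, delta=0.0):
    """Binary cross-entropy over shifted implicit rewards (simplified sketch).

    policy_logps / ref_logps: per-sequence log-probs under policy and reference.
    labels: 1 for desirable completions, 0 for undesirable ones.
    delta: the reward shift; in practice a running mean of observed rewards.
    """
    total = 0.0
    for lp, ref_lp, y in zip(policy_logps, ref_logps, labels):
        reward = beta * (lp - ref_lp) - delta  # shifted implicit reward
        z = reward if y == 1 else -reward      # flip sign for the 0 label
        total += math.log1p(math.exp(-z))      # -log(sigmoid(z)), stable for z > 0
    return total / len(labels)
```

The loss falls as the policy assigns relatively more likelihood than the reference to desirable completions and relatively less to undesirable ones, which is the classification view of alignment that BCO optimizes.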