[KTO] fix interleaving, reporting, hanging bugs #1499

kawine · 2024-04-01T01:09:09Z

remove interleaving in KTOTrainer, since it is no longer needed for KL estimation and artificially duplicates data to force an even number of positive and negative examples (see [Question] desirable_weight and undesirable_weight in KTOTrainer #1467)
do not report statistics for good/bad data if there are zero good/bad examples in the current batch (currently NaNs are returned, which are sometimes converted to floats before being sent to wandb, leading to reward spikes; see https://twitter.com/_lewtun/status/1771493512262942801)
make the same changes as in [KTO]: Fix nan losses and crashing job #1472 to prevent hanging, but still aggregate metrics from across the batch (instead of only reporting from the main process)
fix arithmetic errors in metric aggregation introduced in [KTO] fix minor bugs with data loading and reporting #1476 when trying to patch the NaNs issue
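
To illustrate the second and third points, here is a minimal sketch of how a metric can be aggregated across processes without producing NaNs. It is not the code from this PR: the helper names `aggregate_rewards` and `build_metrics`, the `rewards/rejected` key, and the list of `(sum, count)` pairs standing in for values gathered from each process are all assumptions made for illustration.

```python
def aggregate_rewards(per_process):
    """Combine (reward_sum, example_count) pairs, one pair per process.

    Returns the mean over all examples, or None when no process saw
    any example of this kind (hypothetical helper, not a TRL function).
    """
    total = sum(reward_sum for reward_sum, _ in per_process)
    count = sum(n for _, n in per_process)
    if count == 0:
        return None  # nothing to report; avoids computing 0 / 0
    return total / count


def build_metrics(chosen, rejected):
    """Build the metrics dict, leaving out any key that has no examples."""
    metrics = {}
    for name, parts in (("rewards/chosen", chosen), ("rewards/rejected", rejected)):
        value = aggregate_rewards(parts)
        if value is not None:
            metrics[name] = value
    return metrics


# Two simulated processes: both saw chosen examples, neither saw rejected ones.
metrics = build_metrics(
    chosen=[(3.0, 2), (1.0, 2)],
    rejected=[(0.0, 0), (0.0, 0)],
)
print(metrics)
```

Summing the per-process sums and counts before dividing weights every example equally, and dropping the key when the count is zero means the logger never receives a NaN for that step.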

cc @kashif @lewtun

HuggingFaceDocBuilderDev · 2024-04-01T08:00:35Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

kashif · 2024-04-02T10:53:50Z

@kawine are you disabling the WANDB on your side because of some local issue?

kawine · 2024-04-02T18:03:09Z

just removed the disabling of wandb

kawine · 2024-04-03T20:58:20Z

@kashif i reverted the changes to the tests -- are we good to merge?

PhilipMay · 2024-04-08T09:47:44Z

Hi @kawine and @kashif thanks for this PR and for merging it.

Nevertheless, I find it a bit unusual to take the fixes from an older PR by other contributors and put them into your own, much larger PR instead of actively helping out in the original PR. 🙁

see #1472

kashif · 2024-04-08T09:51:09Z

@PhilipMay apologies! yes it's my bad I believe... let me see how to attribute Clara properly.

claralp · 2024-04-08T10:02:05Z

@kashif I am mostly fine with the version implemented here now.
However, one minor issue still exists with this version:
When logging metrics after multiple iterations, e.g. mean(mean(rewards/chosen in iteration1), mean(rewards/chosen in iteration2)), the result is not the overall mean, because the number of chosen samples differs between iterations.
@kawine fixed this for a differing number of samples in one batch across multiple devices, but the same applies when averaging across multiple iterations.
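
A small numeric sketch of the issue described above, assuming made-up reward values and a hypothetical `RunningMean` accumulator (neither comes from the TRL code base):

```python
class RunningMean:
    """Accumulate a sum and a count so every sample is weighted equally."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, values):
        self.total += sum(values)
        self.count += len(values)

    def mean(self):
        # None signals "no samples seen", rather than a NaN from 0 / 0.
        return self.total / self.count if self.count else None


# Illustrative rewards/chosen values: iteration 1 has three chosen samples,
# iteration 2 has only one.
iteration1 = [1.0, 1.0, 1.0]
iteration2 = [5.0]

# Averaging the per-iteration means gives each iteration equal weight.
mean_of_means = (
    sum(iteration1) / len(iteration1) + sum(iteration2) / len(iteration2)
) / 2

# Accumulating sums and counts gives each sample equal weight.
acc = RunningMean()
acc.update(iteration1)
acc.update(iteration2)
overall_mean = acc.mean()

print(mean_of_means, overall_mean)
```

With these values the mean of means is 3.0 while the overall mean is 2.0, so the single sample in the second iteration is over-weighted unless counts are tracked alongside sums.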

* add warning for imbalanced data * update documentation * update script commands to be same as in dpo * use batch_size KL examples and batch_size target examples to calculate batch_size losses * fix deepspeed issue * speed up forward with no_grad for KL * add some removed metrics * Update trl/trainer/kto_trainer.py * Update trl/trainer/kto_trainer.py * Update trl/trainer/kto_trainer.py add reference to paper Co-authored-by: lewtun <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * add more detailed comments * convert assert to ValueError * Update kto_trainer.py * precommit formatting * remove nans in metrics by gathering across machines * fix formatting * fix choice of mismatched examples for KL term * describe weights * fix hanging issue in distributed training * linting * move metrics to cpu * Update trl/trainer/kto_trainer.py Co-authored-by: lewtun <[email protected]> * Update trl/trainer/kto_trainer.py * Update trl/trainer/kto_trainer.py * fix tokenization error: lack of bos * change user warning for weight hyperparams * minor update to docs * reshape attention mask * 
reformat * add test for bos/eos tokens * move dependency location * Update tests/test_kto_trainer.py * don't report nan metrics * don't report nan metrics and remove data interleaving * fix bugs in calculating metrics * no need to gather KL term * minor changes * use nanmean for losses * remove disabling of wandb * revert changes --------- Co-authored-by: Clara Luise Pohland <[email protected]> Co-authored-by: Kashif Rasul <[email protected]> Co-authored-by: lewtun <[email protected]>

claralp · 2024-04-08T14:38:23Z

@kashif see #1514

kawine · 2024-04-09T01:22:44Z

@PhilipMay apologies for the confusion! i had an earlier PR #1476 that had to be closed bc it conflicted with the proposed changes in #1472, but #1472 had some separate issues with logging, which is why i created this PR (#1499). users were actively blocked because the code was hanging, so fixing that asap was high priority

clara's contributions are very much appreciated and i'm happy to defer to you folks on crediting everyone properly

kawine and others added 30 commits February 24, 2024 18:19

add warning for imbalanced data

6ee3be4

update documentation

22dd810

update script commands to be same as in dpo

8d14930

use batch_size KL examples and batch_size target examples to calculate batch_size losses

8a490af

fix deepspeed issue

f826600

speed up forward with no_grad for KL

688ed6c

Merge branch 'huggingface:main' into main

587517b

add some removed metrics

e128f09

Update trl/trainer/kto_trainer.py

2d860b8

Update trl/trainer/kto_trainer.py

48d25ff

Update trl/trainer/kto_trainer.py

392bcc0

add reference to paper Co-authored-by: lewtun <[email protected]>

Update trl/trainer/kto_trainer.py

a42049f

Co-authored-by: Kashif Rasul <[email protected]>

Update trl/trainer/kto_trainer.py

5696814

Co-authored-by: Kashif Rasul <[email protected]>

Update trl/trainer/kto_trainer.py

000d5d8

Co-authored-by: Kashif Rasul <[email protected]>

Update trl/trainer/kto_trainer.py

2738d1f

Co-authored-by: Kashif Rasul <[email protected]>

Update trl/trainer/kto_trainer.py

d7f63c5

Co-authored-by: Kashif Rasul <[email protected]>

Update trl/trainer/kto_trainer.py

824da55

Co-authored-by: Kashif Rasul <[email protected]>

Update trl/trainer/kto_trainer.py

4399af4

Co-authored-by: Kashif Rasul <[email protected]>

Update trl/trainer/kto_trainer.py

69094be

Co-authored-by: Kashif Rasul <[email protected]>

Update trl/trainer/kto_trainer.py

73f7ed7

Co-authored-by: Kashif Rasul <[email protected]>

Update trl/trainer/kto_trainer.py

5b95aca

Co-authored-by: Kashif Rasul <[email protected]>

Update trl/trainer/kto_trainer.py

3102901

Co-authored-by: Kashif Rasul <[email protected]>

add more detailed comments

ca68f24

convert assert to ValueError

94fb375

Update kto_trainer.py

8f7e788

precommit formatting

ed19ed5

Merge branch 'main' of https://github.com/kawine/trl into main

310bd97

Merge branch 'huggingface:main' into main

639f4de

remove nans in metrics by gathering across machines

ee7d6a4

fix formatting

7ae95c2

kawine added 8 commits March 24, 2024 01:49

don't report nan metrics

bbd5715

Merge branch 'main' of https://github.com/kawine/trl into main

f603aeb

don't report nan metrics and remove data interleaving

856b796

merge latest changes in trl

7f0bea8

fix bugs in calculating metrics

8cf28a6

no need to gather KL term

aef50f1

minor changes

3e10bae

use nanmean for losses

2a38b15

kawine mentioned this pull request Apr 1, 2024

[KTO]: Fix nan losses and crashing job #1472

Closed

kawine added 2 commits April 2, 2024 11:00

Merge branch 'huggingface:main' into main

e1b6132

remove disabling of wandb

7130212

kawine added 2 commits April 3, 2024 13:49

revert changes

2fb641f

Merge branch 'main' of https://github.com/kawine/trl into main

0b44e42

kashif approved these changes Apr 3, 2024

kashif merged commit 4f8057a into huggingface:main Apr 3, 2024
8 of 9 checks passed

younesbelkada mentioned this pull request Apr 8, 2024

[Question] desirable_weight and undesirable_weight in KTOTrainer #1467

Closed

claralp mentioned this pull request Apr 8, 2024

[KTO] fix metric logging #1514

Merged

claralp mentioned this pull request Apr 11, 2024

KTO training produces NaN rewards #1447

Closed
