[XPO] xpo trainer #1943

kashif · 2024-08-18T10:50:00Z

Implementation of the XPO trainer from https://huggingface.co/papers/2405.21046.

HuggingFaceDocBuilderDev · 2024-08-18T10:53:41Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

trl/trainer/online_dpo_trainer.py

trl/trainer/xpo_trainer.py

qgallouedec · 2024-09-09T11:58:07Z

kl isn't logged in XPO. Is it possible to have it logged?

qgallouedec · 2024-09-10T08:50:28Z

> Is it possible to have it logged?

From @kashif:

> In the online-dpo what is the "kl" being logged?

"kl" is the approximate KL divergence from the ref model distribution to the trained model distribution. It is approximated as:

trl/trl/trainer/online_dpo_trainer.py, line 389 in 72f19c3:

kl = logprobs - ref_logprobs

Your calculation in abdf1a5 seems correct to me:

        # Calculate KL divergence for model and ref data
        kl_model_data = model_logprobs_model_data - ref_logprobs_model_data
        kl_ref_data = model_logprobs_ref_data - ref_logprobs_ref_data
        mean_kl = (kl_model_data.sum(1) + kl_ref_data.sum(1)).mean() / 2
        self.stats["objective/kl"].append(gather_mean(mean_kl))
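
To make the averaging in that snippet concrete, here is a minimal, dependency-free sketch in plain Python rather than tensors. It assumes each argument is a list of sequences, where each sequence is a list of per-token log-probabilities, and it ignores padding masks. The function names `summed_log_ratio` and `mean_kl` are hypothetical and are not part of TRL.

```python
def summed_log_ratio(logprobs, ref_logprobs):
    """Sum the per-token log-ratio (model minus ref) over each sequence.

    Mirrors `(logprobs - ref_logprobs).sum(1)` for nested lists.
    """
    return [
        sum(lp - ref_lp for lp, ref_lp in zip(seq, ref_seq))
        for seq, ref_seq in zip(logprobs, ref_logprobs)
    ]


def mean_kl(model_lp_model_data, ref_lp_model_data,
            model_lp_ref_data, ref_lp_ref_data):
    """Average the summed log-ratios over both completion sources.

    Assumption: both sources hold the same number of sequences, so the
    per-example sums can be added element-wise before the batch mean.
    """
    kl_model_data = summed_log_ratio(model_lp_model_data, ref_lp_model_data)
    kl_ref_data = summed_log_ratio(model_lp_ref_data, ref_lp_ref_data)
    per_example = [a + b for a, b in zip(kl_model_data, kl_ref_data)]
    # Batch mean, then halve because two sources were added together.
    return sum(per_example) / len(per_example) / 2


# Toy batch: 2 sequences of 2 tokens for each completion source.
value = mean_kl(
    [[-1.0, -2.0], [-0.5, -0.5]],  # model log-probs on model completions
    [[-1.5, -2.5], [-1.0, -1.0]],  # ref log-probs on model completions
    [[-2.0, -2.0], [-3.0, -1.0]],  # model log-probs on ref completions
    [[-1.0, -1.0], [-2.0, -1.0]],  # ref log-probs on ref completions
)
print(value)
```

In this toy batch the log-ratio on the reference completions is negative, so the combined value can fall below zero; the logged quantity is an approximation, not an exact divergence.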

Xpo changes

examples/scripts/xpo.py

trl/trainer/xpo_trainer.py

qgallouedec

Very nice work @kashif

Co-authored-by: Quentin Gallouédec <[email protected]>

initial xpo trainer

41d546d

kashif marked this pull request as draft August 18, 2024 10:50

kashif added 5 commits August 19, 2024 10:04

compute rewards and ref log probs in smaller batches

c4a5c4c

add logging

b59afe1

initial log docs

2adc1c4

fix global_step increment

7506efc

fix metric descriptions

fd6221d

kashif mentioned this pull request Aug 20, 2024

[ODPO] Fix global step for consistent checkpointing with global updates #1950

Merged

kashif added 12 commits August 20, 2024 17:15

Merge branch 'main' into xpo

25802fa

Merge branch 'main' into xpo

23ad98f

use messages API

1c898fa

Merge branch 'main' into xpo

564738b

use training_step API

b82774f

Merge branch 'main' into xpo

7c6df04

fix logs

62ea60b

add test

6b6ae87

add back max_new_tokens

14eeecd

use max_new_tokens

5b6772a

Merge branch 'main' into xpo

cb2fbf7

refactor

59af961

kashif marked this pull request as ready for review August 31, 2024 13:41

kashif added 8 commits September 3, 2024 13:34

top_k is an int

d6b0f60

Merge branch 'main' into xpo

778f9d1

fix formatting

b0428f3

fix the loss

21483e9

fix logging

0b185ae

fix logging

28a030f

fix logging

7f4cc39

fix loss

85807ea

kashif added 3 commits September 7, 2024 09:44

do not log loss again

f623f6f

fix docs

6a22d83

Merge branch 'main' into xpo

e2c921b

kashif requested a review from qgallouedec September 7, 2024 10:06

kashif added 2 commits September 8, 2024 13:23

add disable_dropout_in_model via flag

4b49824

comments

066f2b8

kashif mentioned this pull request Sep 9, 2024

Disable the dropout by default in Online DPO #2042

Closed

qgallouedec linked an issue Sep 9, 2024 that may be closed by this pull request

Disable the dropout by default in Online DPO #2042

Closed

qgallouedec reviewed Sep 9, 2024

trl/trainer/online_dpo_trainer.py (resolved)

qgallouedec reviewed Sep 9, 2024

trl/trainer/xpo_trainer.py (outdated, resolved)

qgallouedec added 4 commits September 9, 2024 12:20

revert doc change

61f2030

rm empty cache in online dpo

811a282

improve doc xpo config

45cf130

some comment

11300fa

kashif changed the title from "[XPO] initial xpo trainer" to "[XPO] xpo trainer" on Sep 10, 2024

fix loggings stats

adbf1a5

kashif added 3 commits September 10, 2024 10:52

fix docs

5b5855f

save the model

e316156

Merge pull request #2 from huggingface/xpo_changes

4b9ccc8

Xpo changes

qgallouedec reviewed Sep 10, 2024

examples/scripts/xpo.py (outdated, resolved)

fix model and reward model

8f34169

qgallouedec reviewed Sep 10, 2024

trl/trainer/xpo_trainer.py (outdated, resolved)

qgallouedec approved these changes Sep 10, 2024

kashif and others added 4 commits September 10, 2024 11:13

Update trl/trainer/xpo_trainer.py

719849f

Co-authored-by: Quentin Gallouédec <[email protected]>

Merge branch 'main' into xpo

9b4cf4f

Merge branch 'main' into xpo

7df80de

Merge branch 'main' into xpo

9b898d2

kashif merged commit 3511856 into huggingface:main Sep 10, 2024
9 checks passed
