Handle last token from generation prompt #1153

pablovicente · 2023-12-28T16:50:32Z

Some tokenizers, such as the Llama BPE tokenizer, may merge adjacent tokens into one, for example when a space is followed by a character such as $.

In a previous PR we handled this case for chosen/rejected. One small improvement is to remove the extra space token from the end of the prompt used for generation in get_batch_samples. It should be left to the model whether the next generated token is just the space token or one that combines the space with another token. To do so, we need to manually check prompt_ids as tokenized together with chosen, together with rejected, and by itself, and choose the shortest, knowing that they can only differ by 1 token.
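The selection step described above could look roughly like the sketch below. It is only an illustration, not the code in dpo_trainer: the function name pick_generation_prompt_ids and the token ids are hypothetical, and the candidates are assumed to be plain lists of ids for the prompt as obtained from each tokenization.

```python
from typing import List, Sequence


def pick_generation_prompt_ids(*candidates: Sequence[int]) -> List[int]:
    """Return the shortest candidate prompt (hypothetical helper).

    The candidates are the prompt ids obtained when tokenizing the prompt
    by itself, together with chosen, and together with rejected.
    """
    if not candidates:
        raise ValueError("at least one candidate prompt is required")

    lengths = [len(c) for c in candidates]
    # The candidates are expected to differ by at most one token
    # (the trailing space that may get merged into the response).
    if max(lengths) - min(lengths) > 1:
        raise ValueError(f"prompt lengths differ by more than one token: {lengths}")

    # The shortest candidate leaves the trailing space for the model to generate.
    return list(min(candidates, key=len))


# Made-up ids: 9 stands in for the trailing space token.
prompt_alone = [1, 5, 6, 7, 9]
prompt_from_chosen = [1, 5, 6, 7, 9]   # space kept as its own token
prompt_from_rejected = [1, 5, 6, 7]    # space merged into the "$" token

generation_prompt = pick_generation_prompt_ids(
    prompt_alone, prompt_from_chosen, prompt_from_rejected
)
print(generation_prompt)  # [1, 5, 6, 7]
```

Raising on a length gap larger than one is a design choice of this sketch; it makes a violated assumption visible instead of silently truncating the prompt.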

Given the following example, prompt + chosen will leave the space after [/INST] unchanged, but that won't be the case for prompt + rejected, since rejected starts with $.

"prompt": "[INST] How is the stock price? [/INST] "
"chosen": "46 as of 10am EST"
"rejected": "$46 as of 10am EST"
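To make the example concrete without downloading a real tokenizer, here is a toy illustration. The toy_tokenize function is hypothetical and is not BPE: it only assumes that a space followed by $ becomes a single token, while any other space stays a token of its own. Real Llama token boundaries will differ.

```python
import re
from typing import List


def toy_tokenize(text: str) -> List[str]:
    """Toy tokenizer (assumption, not real BPE).

    " $" is one merged token; otherwise each run of non-space characters
    and each single space is its own token.
    """
    return re.findall(r" \$|\S+| ", text)


prompt = "[INST] How is the stock price? [/INST] "
chosen = "46 as of 10am EST"
rejected = "$46 as of 10am EST"

prompt_tokens = toy_tokenize(prompt)
n = len(prompt_tokens)

# Does the prompt survive unchanged as a prefix of the full sequence?
chosen_keeps_space = toy_tokenize(prompt + chosen)[:n] == prompt_tokens
rejected_keeps_space = toy_tokenize(prompt + rejected)[:n] == prompt_tokens

print(chosen_keeps_space)    # True
print(rejected_keeps_space)  # False
print(repr(toy_tokenize(prompt + rejected)[n - 1]))  # ' $'
```

With rejected, the last prompt position holds the merged token rather than the bare space, which is why the prompt ids derived from prompt + rejected end up one token shorter.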

@kashif

kashif · 2023-12-28T18:59:13Z

@pablovicente are the tests passing for you?

pablovicente · 2023-12-28T20:17:48Z

> @pablovicente are the tests passing for you?

The test_dpo_trainer.py tests pass. I see issues in the sft_trainer tests but have not made changes there. Do the dpo_trainer tests pass for you?

HuggingFaceDocBuilderDev · 2023-12-28T21:05:07Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

younesbelkada

@kashif we can merge this no? Wdyt? 🙏

* Handle last token from generation prompt * Remove prints * Reformat dpo_trainer file

pablovicente added 2 commits December 28, 2023 17:31

Handle last token from generation prompt

cc8ea95

Remove prints

a44eec1

kashif self-requested a review December 28, 2023 16:51

kashif added the 🏋 DPO Related to DPO label Dec 28, 2023

Reformat dpo_trainer file

91fe8fe

younesbelkada reviewed Jan 8, 2024

kashif approved these changes Jan 8, 2024

kashif merged commit d5910b0 into huggingface:main Jan 8, 2024
9 checks passed

jondurbin pushed a commit to jondurbin/trl that referenced this pull request Jan 8, 2024

Handle last token from generation prompt (huggingface#1153)

1ce29e2

* Handle last token from generation prompt * Remove prints * Reformat dpo_trainer file

lapp0 pushed a commit to lapp0/trl that referenced this pull request May 10, 2024

Handle last token from generation prompt (huggingface#1153)

e8c7d50

* Handle last token from generation prompt * Remove prints * Reformat dpo_trainer file
