Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle last token from generation prompt #1153

Merged
merged 3 commits into from
Jan 8, 2024

Conversation

pablovicente
Copy link
Contributor

Some tokenizers, like Llama BPE tokenizer, might merge tokens into one such as the case where there is a space follow by some characters like $.

In a previous PR we handled the case for chosen/rejected but one small improvement is to remove the extra space token from the end of the prompt used for generation in get_batch_samples. It should left to the model if the next generated token is just the space token or one such that it combines the space with other token into one. To do so, we need to manually check prompt_ids for chosen/rejected and by itself and chose the shortest knowing that they can only differ by 1 token.

Given the following example, prompt + chosen will leave the space after [\INST] unchanged but it wont be the case for prompt+rejected since it starts with $.

"prompt": "[INST] How is the stock price? [/INST] "
"chosen": "46 as of 10am EST"
"rejected":  "$46 as of 10am EST"

@kashif

@kashif kashif self-requested a review December 28, 2023 16:51
@kashif kashif added the 🏋 DPO Related to DPO label Dec 28, 2023
@kashif
Copy link
Collaborator

kashif commented Dec 28, 2023

@pablovicente are the tests passing for you?

@pablovicente
Copy link
Contributor Author

@pablovicente are the tests passing for you?

test_dpo_trainer.py tests pass. I see issues on the sft_trainer but have not made changes there. Are dpo_trainer tests pass for you?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Contributor

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kashif we can merge this no? Wdyt? 🙏

@kashif kashif merged commit d5910b0 into huggingface:main Jan 8, 2024
9 checks passed
jondurbin pushed a commit to jondurbin/trl that referenced this pull request Jan 8, 2024
* Handle last token from generation prompt

* Remove prints

* Reformat dpo_trainer file
lapp0 pushed a commit to lapp0/trl that referenced this pull request May 10, 2024
* Handle last token from generation prompt

* Remove prints

* Reformat dpo_trainer file
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏋 DPO Related to DPO
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants