-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Anthropic HH new dataset format repeats the prompt #1582
Comments
Fully agree. The current data processing code (e.g., tokenize_row function in DPOTrainer: ----------------------------------------------- Print out the detailed tokenized data format (using the first data sample in Anthropic HH in v0.8.2 dataset for example) --------------------------------------------
Execution results: |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
I believe the dataset still has this issue. |
Has this problem been fixed? |
With the new data format of Anthropic HH in v0.8.2 (for example, see https://huggingface.co/datasets/trl-internal-testing/hh-rlhf-trl-style vs. the older https://huggingface.co/datasets/trl-internal-testing/Anthropic-hh-rlhf-processed), I think the samples for DPO training end up repeating the first message of the chat. For example, if the original row of the dataset is:
Then the processed sample (for chosen) will look like:
The text was updated successfully, but these errors were encountered: