eos_token config in PPOTrainer #2387
Comments
Thanks for reporting it. Would you like to open a PR to fix it?
@kechunFIVE, @qgallouedec, @dame-cell It seems that the current code is inspired by https://iclr-blogposts.github.io/2024/blog/the-n-implementation-details-of-rlhf-with-ppo/, section "General implementation details", 4.2. The authors tried to recreate results from early OpenAI work, but they say:
Reading
The option to control the stop token should be added to all online trainer configs, imho.
Can we also fix GRPO? Or is there a convenient way to fix all online algorithms at once?
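As a sketch of what a per-trainer fix could look like: the `stop_token` field below is an assumption, not a confirmed API; check the `PPOConfig` reference for your installed TRL version before relying on it.

```python
# Hypothetical sketch: passing a stop-token option to an online trainer
# config in TRL. The `stop_token` field is an assumption; verify it against
# the PPOConfig documentation for your TRL version.
from trl import PPOConfig

config = PPOConfig(
    output_dir="ppo-out",
    stop_token="eos",  # truncate completions at the tokenizer's EOS token
)
```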
Feature request
It appears that the generation config in PPOTrainer does not set an eos_token, so each generation continues until it reaches the maximum length before stopping, which is quite time-consuming.
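The effect described above can be illustrated with a self-contained toy decode loop (the model and token ids are made up for illustration; this is not TRL's actual generation code): with no EOS id configured, the loop always runs to `max_new_tokens`, while with one set it stops as soon as the model emits it.

```python
# Toy sketch: why a missing EOS token id forces generation to max length.

def generate(next_token, prompt, max_new_tokens, eos_token_id=None):
    """Greedy decode loop; stops early only if eos_token_id is set."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        tokens.append(tok)
        if eos_token_id is not None and tok == eos_token_id:
            break
    return tokens

# Toy "model": emits token 1 until the sequence has 4 tokens, then id 0.
def toy_next_token(tokens):
    return 1 if len(tokens) < 4 else 0

with_eos = generate(toy_next_token, [7], max_new_tokens=100, eos_token_id=0)
without_eos = generate(toy_next_token, [7], max_new_tokens=100)
# with_eos stops right after token 0; without_eos runs all 100 steps.
```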
Motivation
If the eos_token is set, it will significantly reduce the time spent during the generation phase.
Your contribution
I'm sorry