eos_token config in PPOTrainer #2387
Comments
Thanks for reporting it. Would you like to open a PR to fix it?
@kechunFIVE, @qgallouedec, @dame-cell It seems that the current code is inspired by https://iclr-blogposts.github.io/2024/blog/the-n-implementation-details-of-rlhf-with-ppo/, section "General implementation details", 4.2. The authors tried to recreate results from early OpenAI work, but they say:
Reading
The option to control the stop token should be added to all online trainer configs, imho.
Can we also fix GRPO? Or is there a convenient way to fix all online algorithms at once?
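As a sketch of what a per-trainer fix could look like: the `stop_token` field below is an assumption, not a confirmed API; check the `PPOConfig` reference for your installed TRL version before relying on it.

```python
# Hypothetical sketch: passing a stop-token option to an online trainer
# config in TRL. The `stop_token` field is an assumption; verify it against
# the PPOConfig documentation for your TRL version.
from trl import PPOConfig

config = PPOConfig(
    output_dir="ppo-out",
    stop_token="eos",  # truncate completions at the tokenizer's EOS token
)
```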
Feature request
It appears that the generation config in PPOTrainer does not set an eos_token, so each generation continues until it reaches the maximum length before stopping, which is quite time-consuming.
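The effect described above can be illustrated with a self-contained toy decode loop (the model and token ids are made up for illustration; this is not TRL's actual generation code): with no EOS id configured, the loop always runs to `max_new_tokens`, while with one set it stops as soon as the model emits it.

```python
# Toy sketch: why a missing EOS token id forces generation to max length.

def generate(next_token, prompt, max_new_tokens, eos_token_id=None):
    """Greedy decode loop; stops early only if eos_token_id is set."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        tokens.append(tok)
        if eos_token_id is not None and tok == eos_token_id:
            break
    return tokens

# Toy "model": emits token 1 until the sequence has 4 tokens, then id 0.
def toy_next_token(tokens):
    return 1 if len(tokens) < 4 else 0

with_eos = generate(toy_next_token, [7], max_new_tokens=100, eos_token_id=0)
without_eos = generate(toy_next_token, [7], max_new_tokens=100)
# with_eos stops right after token 0; without_eos runs all 100 steps.
```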
Motivation
If the eos_token is set, it will significantly reduce the time spent during the generation phase.
Your contribution
I'm sorry