adds early stopping #238

Merged: 5 commits merged into main from early-stopping on Mar 23, 2023

@edbeeching (Collaborator) commented on Mar 21, 2023:

Adds early stopping to the PPO loop. Fixes #232

I used a value of 0.1 as the threshold because I observed an initial KL spike of 0.2 before we had instabilities with gpt2-xl earlier this week (the accompanying screenshot is not preserved).

Note that RL4LM uses a value of 0.5.

My only other concern is gradient accumulation: when we see a spike in KL, is it better to zero the accumulated gradients immediately, or to leave them as they are? I have zeroed them for now, but feedback on this would be very welcome.
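A minimal sketch of the logic described above, assuming a KL estimate is available per minibatch and that the check runs after each backward pass but before the optimizer step. `approx_kl`, `ToyOptimizer`, `RunResult`, and `run_minibatches` are hypothetical names for illustration, not the TRL API, and the KL estimator (mean of old minus new log-probs) is one common choice, not necessarily the one this PR uses.

```python
from dataclasses import dataclass
from typing import Sequence


def approx_kl(old_logprobs: Sequence[float], new_logprobs: Sequence[float]) -> float:
    """Estimate KL(old || new) as the mean of (log p_old - log p_new) over sampled tokens."""
    if len(old_logprobs) != len(new_logprobs) or not old_logprobs:
        raise ValueError("log-prob sequences must be non-empty and of equal length")
    return sum(o - n for o, n in zip(old_logprobs, new_logprobs)) / len(old_logprobs)


class ToyOptimizer:
    """Hypothetical stand-in for an optimizer that accumulates gradients."""

    def __init__(self) -> None:
        self.accumulated = 0.0  # gradient mass waiting for the next step
        self.steps = 0          # number of optimizer steps taken

    def accumulate(self, grad: float) -> None:
        self.accumulated += grad

    def step(self) -> None:
        self.steps += 1

    def zero_grad(self) -> None:
        self.accumulated = 0.0


@dataclass
class RunResult:
    minibatches_seen: int
    stopped_early: bool


def run_minibatches(kl_per_minibatch, optimizer, target_kl=0.1, accumulation_steps=2):
    """Process minibatches until one exceeds target_kl (0.1, the threshold chosen in this PR)."""
    seen = 0
    for kl in kl_per_minibatch:
        optimizer.accumulate(1.0)  # pretend backward() added this minibatch's gradient
        seen += 1
        if kl > target_kl:
            # KL spike: discard partially accumulated gradients and stop the epoch.
            optimizer.zero_grad()
            return RunResult(seen, stopped_early=True)
        if seen % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
    return RunResult(seen, stopped_early=False)
```

For example, with per-minibatch KL values `[0.01, 0.02, 0.05, 0.2, 0.01]` and accumulation over two minibatches, the loop takes one optimizer step, stops at the fourth minibatch, and discards the half-accumulated gradient from minibatches three and four. Leaving the gradients in place instead would let the next optimizer step include the update from the spiking minibatch.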

@edbeeching requested a review from @lvwerra on March 21, 2023 15:06

HuggingFaceDocBuilderDev commented on Mar 21, 2023:

The documentation is not available anymore because the PR was closed or merged.

@lvwerra (Member) left a comment:

Generally looks nice and clean to me! My main worry is that this could break logging: if we stop early, the logged values might not all have the same shape. Did you double-check that? If not, could you run a quick test?

@edbeeching (Collaborator, Author) replied:

Good point. I ran a benchmark with early stopping enabled and a threshold of 0.001, and the logging seems to work OK.
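One way to reason about the logging concern: if every stat for a minibatch is recorded together, stopping early only shortens all per-stat lists by the same amount, so aggregation still works. This is a minimal sketch assuming stats are plain dicts of floats per minibatch; `collect_stats`, `summarize`, and the stat names are hypothetical and do not reflect TRL's actual logging code.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def collect_stats(minibatch_stats: Iterable[Dict[str, float]]) -> Dict[str, List[float]]:
    """Group per-minibatch stat dicts into one list per stat name."""
    collected: Dict[str, List[float]] = defaultdict(list)
    for stats in minibatch_stats:
        for name, value in stats.items():
            collected[name].append(value)
    return dict(collected)


def summarize(collected: Dict[str, List[float]]) -> Dict[str, float]:
    """Average each stat, refusing to proceed if the lists have different lengths."""
    lengths = {len(values) for values in collected.values()}
    if len(lengths) > 1:
        raise ValueError(f"inconsistent stat lengths: {sorted(lengths)}")
    return {name: mean(values) for name, values in collected.items()}


# Four minibatches were planned, but early stopping ended the epoch after two.
recorded = [
    {"policy/kl": 0.01, "loss": 1.0},
    {"policy/kl": 0.03, "loss": 0.5},
]
summary = summarize(collect_stats(recorded))  # summary["loss"] == 0.75
```

The length check is what would surface the shape mismatch raised in the review: it only fires if some stats were appended for a minibatch and others were not, for example if the early-stopping check sat between two logging calls.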

@lvwerra merged commit 1620da3 into main on Mar 23, 2023
@lvwerra deleted the early-stopping branch on March 23, 2023 14:24
Linked issue (may be closed by merging this pull request): Feature request: PPO early stopping, important for training stability