t5-sentiment example collapses on master #256
Comments
@younesbelkada can you reproduce? Negative KL is very suspicious!
I was able to reproduce, will investigate!
#262 should fix the issue!
@younesbelkada Understood the bug. I should have checked more diligently.
No problem at all @GauravVirmani! Don't worry, it can happen to anyone! It is also my fault, as I did not flag that the KL was negative when running the experiment.
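(For context on why a negative KL is suspicious: the KL divergence between two distributions is non-negative by Gibbs' inequality, so a Monte Carlo estimate that averages per-token log-ratios under the current policy should converge to a value at or above zero. A minimal sketch with made-up toy distributions, not the actual TRL code:)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy categorical "policies" over a 3-token vocabulary
# (hypothetical numbers, not from the actual experiment).
p = np.array([0.5, 0.3, 0.2])   # current policy
q = np.array([0.4, 0.4, 0.2])   # reference policy

# Sample tokens from the *current* policy and average the log-ratio,
# which is roughly how a per-token KL penalty is estimated in PPO-style RLHF.
samples = rng.choice(len(p), size=100_000, p=p)
kl_estimate = np.mean(np.log(p[samples]) - np.log(q[samples]))

# The exact KL divergence, non-negative by Gibbs' inequality.
kl_true = np.sum(p * np.log(p / q))

print(kl_true)      # small positive number
print(kl_estimate)  # close to kl_true
```

Individual sampled log-ratios can be negative, but a persistently negative *average* suggests the two log-probabilities are being computed over mismatched tokens, which is consistent with a bug in batched generation rather than genuine divergence.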
I was following the t5-sentiment example in order to run RL training on a custom dataset with a custom metric, and it also showed negative KL. So I looked into this issue and the associated pull request, which led me to rerun my experiments the same way @younesbelkada did in the pull request:
Unfortunately, none of this fixed the issue. My knowledge of PPO is limited, so I cannot contribute much to the discussion about the underlying cause, but I hope this information is useful. I would also be grateful if you could point out any error I might have made. Thanks a lot for your effort on this amazing library!
Hi @chizhikchi |
Hi @younesbelkada, thank you for the suggestions! I then ran the same experiment on version 0.4.1. The KL wasn't negative this time, so the problem seems to be related to batched generation. My model didn't improve much, though, but I think that's more a problem of the reward definition and the complexity of the task. Hope this information helps! Have a nice day :)
Interesting experiment. Does this code generate results individually instead of using batched generation?
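(One plausible way batched generation can produce a negative "KL", sketched with hypothetical numbers: if padded positions are included when averaging the per-token log-ratio, the arbitrary scores the models assign to pad tokens can drag the estimate below zero. This illustrates the general pitfall only; it is not necessarily the actual bug fixed in the linked PR:)

```python
import numpy as np

# Hypothetical per-token log-probs for one sequence of 3 real tokens,
# right-padded to length 5. mask = 1 marks real tokens, 0 marks padding.
logp_policy = np.array([-1.0, -0.5, -0.8, -9.0, -9.0])
logp_ref    = np.array([-1.1, -0.6, -0.7, -2.0, -2.0])
mask        = np.array([1, 1, 1, 0, 0])

# Correct: average the log-ratio only over real tokens.
kl_masked = np.sum((logp_policy - logp_ref) * mask) / mask.sum()

# Buggy: including pad positions, where the two models give wildly
# different scores, can flip the estimate negative.
kl_unmasked = np.mean(logp_policy - logp_ref)

print(kl_masked)    # small positive value
print(kl_unmasked)  # negative
```

Generating sequences one at a time avoids padding entirely, which would explain why per-sample generation did not show the negative KL.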
I just reran the t5-sentiment example, and right now on master it shows negative KL divergence and does not learn in general.
This does not seem to have been the case in the v0.4.1 release.
The t5-sentiment.py script itself does not seem to be the culprit: I tested master with the script reverted to its v0.4.1 version, and the behavior is identical.