t5-sentiment example collapses on master #256
Comments
@younesbelkada can you reproduce? Negative KL is very suspicious!
I was able to reproduce, will investigate!
#262 should fix the issue!
@younesbelkada Understood the bug. I should have checked more diligently.
No problem at all @GauravVirmani! Don't worry, it can happen to anyone! It is also my fault, as I did not flag that the KL was negative when running the experiment.
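(For context on why a negative KL is suspicious: the KL divergence between two distributions is non-negative by Gibbs' inequality, so a Monte Carlo estimate that averages per-token log-ratios under the current policy should converge to a value at or above zero. A minimal sketch with made-up toy distributions, not the actual TRL code:)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy categorical "policies" over a 3-token vocabulary
# (hypothetical numbers, not from the actual experiment).
p = np.array([0.5, 0.3, 0.2])   # current policy
q = np.array([0.4, 0.4, 0.2])   # reference policy

# Sample tokens from the *current* policy and average the log-ratio,
# which is roughly how a per-token KL penalty is estimated in PPO-style RLHF.
samples = rng.choice(len(p), size=100_000, p=p)
kl_estimate = np.mean(np.log(p[samples]) - np.log(q[samples]))

# The exact KL divergence, non-negative by Gibbs' inequality.
kl_true = np.sum(p * np.log(p / q))

print(kl_true)      # small positive number
print(kl_estimate)  # close to kl_true
```

Individual sampled log-ratios can be negative, but a persistently negative *average* suggests the two log-probabilities are being computed over mismatched tokens, which is consistent with a bug in batched generation rather than genuine divergence.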
I was following the t5-sentiment example in order to run RL training on a custom dataset with a custom metric, and it also showed negative KL. So I looked into this issue and the associated pull request, which led me to rerun my experiments the same way @younesbelkada did in the pull request:
Unfortunately, none of this fixed the issue. My knowledge of PPO is limited, so I cannot contribute much to the discussion about the underlying cause, but I hope this information is useful. I would also be grateful if you could point out any error I might have made. Thanks a lot for your effort on this amazing library!
Hi @chizhikchi |
Hi @younesbelkada, thank you for the suggestions! I then ran the same experiment on version 0.4.1. The KL wasn't negative this time, so the problem seems to be related to batched generation. My model didn't improve much, though, but I think that's more a problem of the reward definition and the complexity of the task. Hope this information helps! Have a nice day :)
Interesting experiment. Does this code generate results individually instead of using batched generation?
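(One plausible way batched generation can produce a negative "KL", sketched with hypothetical numbers: if padded positions are included when averaging the per-token log-ratio, the arbitrary scores the models assign to pad tokens can drag the estimate below zero. This illustrates the general pitfall only; it is not necessarily the actual bug fixed in the linked PR:)

```python
import numpy as np

# Hypothetical per-token log-probs for one sequence of 3 real tokens,
# right-padded to length 5. mask = 1 marks real tokens, 0 marks padding.
logp_policy = np.array([-1.0, -0.5, -0.8, -9.0, -9.0])
logp_ref    = np.array([-1.1, -0.6, -0.7, -2.0, -2.0])
mask        = np.array([1, 1, 1, 0, 0])

# Correct: average the log-ratio only over real tokens.
kl_masked = np.sum((logp_policy - logp_ref) * mask) / mask.sum()

# Buggy: including pad positions, where the two models give wildly
# different scores, can flip the estimate negative.
kl_unmasked = np.mean(logp_policy - logp_ref)

print(kl_masked)    # small positive value
print(kl_unmasked)  # negative
```

Generating sequences one at a time avoids padding entirely, which would explain why per-sample generation did not show the negative KL.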
I just reran the t5-sentiment example, and right now on master it shows negative KL divergence and does not learn in general.
This does not seem to have been the case in the v0.4.1 release.
The t5-sentiment.py script itself does not seem to be the culprit: I tested master with the script reverted to its v0.4.1 version, and the behavior is identical.