Trouble with the PPO example #1618
I ran `ppo.py` from the examples with `deepspeed_zero3.yaml`. Apart from switching the dataset and the model to local ones, I did not modify any other code, and I got this error at runtime: `IndexError: pop from an empty deque`. The full log and the output of `pip list` are attached to the issue. Can anyone help me?

Comments
I rolled back the changes in #1483 and the code seems to work as expected, but it results in low GPU utilization and slow training. Can we have an implementation that balances stability and speed?
Additionally, I tried to run PPO on a 70B model, which requires far more memory than a single-GPU machine can provide. I've spent a few days trying to find a way to distribute the model across different nodes, but accelerate always seems to want to load one copy of the model per GPU. Is there a good way to solve this problem?
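For what it's worth, plain data parallelism in accelerate does keep a full model replica per GPU; sharding the weights across devices requires something like DeepSpeed ZeRO-3. Here is a minimal sketch assuming accelerate's `DeepSpeedPlugin` API; the offload setting is illustrative, not something from this thread:

```python
# Minimal sketch: shard a large model across GPUs/nodes with ZeRO stage 3
# instead of replicating it on every GPU. The offload option below is an
# illustrative assumption, not a requirement.
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

ds_plugin = DeepSpeedPlugin(
    zero_stage=3,                # partition params, grads, optimizer states
    offload_param_device="cpu",  # optionally push parameters to CPU RAM
)
accelerator = Accelerator(deepspeed_plugin=ds_plugin)

# Model, optimizer, and dataloader are then wrapped the usual way:
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```

The same effect can be had by pointing `accelerate launch --config_file` at a ZeRO-3 config such as the repo's `deepspeed_zero3.yaml`.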
I also hit this issue when using trl/examples/accelerate_configs/deepspeed_zero3.yaml. The model seems to be distributed among several GPUs, but the input tensors may not be, which causes an error saying they are not on the same device.
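That device-mismatch error usually means the input batch was left on the CPU (or on another GPU) while the forward pass runs on this process's device. A minimal sketch, assuming the batch is a dict of tensors coming out of a DataLoader:

```python
# Minimal sketch: move input tensors to this process's device before the
# forward pass. Under ZeRO-3 the parameters are sharded automatically, but
# inputs still have to live on the local GPU.
from accelerate import Accelerator

accelerator = Accelerator()

def move_to_device(batch):
    # batch: dict of torch.Tensor from a DataLoader (an assumption here)
    return {k: v.to(accelerator.device) for k, v in batch.items()}
```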
Please give the new PPOv2Trainer a try :) https://huggingface.co/docs/trl/ppov2_trainer
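For context, a hedged sketch of what a PPOv2Trainer setup might look like, based on the linked docs; the model name, the toy dataset, and the exact argument names are assumptions and may differ across trl versions:

```python
# Hedged sketch of a PPOv2Trainer setup per the linked docs; the backbone
# model, toy dataset, and config values are placeholders, not from the thread.
from datasets import Dataset
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)
from trl import PPOv2Config, PPOv2Trainer

name = "EleutherAI/pythia-160m"  # placeholder policy/reward backbone
tokenizer = AutoTokenizer.from_pretrained(name)
tokenizer.pad_token = tokenizer.eos_token  # pythia has no pad token by default

policy = AutoModelForCausalLM.from_pretrained(name)
ref_policy = AutoModelForCausalLM.from_pretrained(name)
reward_model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=1)
value_model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=1)

# Toy prompt dataset; PPOv2Trainer consumes tokenized prompts (assumption).
train_dataset = Dataset.from_dict(
    {"input_ids": [tokenizer.encode("Hello, world!")] * 8}
)

trainer = PPOv2Trainer(
    config=PPOv2Config(output_dir="ppo_out"),
    tokenizer=tokenizer,
    policy=policy,
    ref_policy=ref_policy,
    reward_model=reward_model,
    value_model=value_model,
    train_dataset=train_dataset,
)
trainer.train()
```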
I've decided to use a different training framework, but thank you anyway!
I've run into a similar issue. Do you have any recommendations for other training frameworks? That would be very helpful.
I used DeepSpeed-Chat from DeepSpeedExamples before. More recently I used DPO training from trl and did not hit similar problems, so checking the version or the code might be enough to solve this.
I've heard of DeepSpeed-Chat before but haven't tried it; that may be a good choice. I tried another framework, but its training speed with ZeRO-3 is too slow, so I've decided to look into the PPOTrainer code again, since DPO works well. Thank you for your reply.
Which framework do you think is better?