
Have trouble in ppo example #1618

Closed
Shiguang-Guo opened this issue May 3, 2024 · 10 comments

Comments

@Shiguang-Guo

I ran ppo.py from the examples with deepspeed_zero3.yaml. Apart from switching the dataset and model to local ones, I did not modify any other code, and I got this error while running: `IndexError: pop from an empty deque`.
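For context, my loop is essentially the standard one from the example. The snippet below is a rough reconstruction rather than my exact script, and the model/dataset paths are placeholders for my local ones:

```python
# Rough reconstruction of the relevant part of the example (not the exact
# script); model and dataset paths are placeholders for my local ones.
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, pipeline
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

config = PPOConfig(model_name="path/to/local-model", batch_size=16, mini_batch_size=4)

tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

def build_dataset():
    ds = load_dataset("json", data_files="path/to/local-data.json", split="train")

    def tokenize(sample):
        sample["input_ids"] = tokenizer.encode(sample["text"])[:8]
        sample["query"] = tokenizer.decode(sample["input_ids"])
        return sample

    ds = ds.map(tokenize)
    ds.set_format(type="torch")
    return ds

def collator(data):
    # Keep variable-length query tensors as lists instead of padding them.
    return {key: [d[key] for d in data] for key in data[0]}

model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ppo_trainer = PPOTrainer(config, model, ref_model=None, tokenizer=tokenizer,
                         dataset=build_dataset(), data_collator=collator)
reward_pipe = pipeline("sentiment-analysis", model="path/to/local-reward-model")

gen_kwargs = {"min_length": -1, "top_k": 0.0, "top_p": 1.0, "do_sample": True,
              "pad_token_id": tokenizer.eos_token_id, "max_new_tokens": 32}

for batch in ppo_trainer.dataloader:
    query_tensors = batch["input_ids"]

    # Generate responses with the current policy.
    response_tensors = ppo_trainer.generate(query_tensors, return_prompt=False, **gen_kwargs)
    batch["response"] = tokenizer.batch_decode(response_tensors)

    # Score the (query, response) pairs with the reward model.
    texts = [q + r for q, r in zip(batch["query"], batch["response"])]
    pipe_outputs = reward_pipe(texts, return_all_scores=True)
    rewards = [torch.tensor(out[1]["score"]) for out in pipe_outputs]

    # This is the call that raises "IndexError: pop from an empty deque" under ZeRO-3.
    stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
    ppo_trainer.log_stats(stats, batch, rewards)
```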

Here is the whole log:

Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


  0%|          | 0/24 [00:00<?, ?it/s]
  0%|          | 0/24 [00:00<?, ?it/s]You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
homepath/.conda/envs/evolve/lib/python3.11/site-packages/transformers/pipelines/text_classification.py:104: UserWarning: `return_all_scores` is now deprecated,  if want a similar functionality use `top_k=None` instead of `return_all_scores=True` or `top_k=1` instead of `return_all_scores=False`.
  warnings.warn(
  4%|▍         | 1/24 [00:22<08:42, 22.73s/it]
  8%|▊         | 2/24 [00:44<08:08, 22.19s/it]

  8%|▊         | 2/24 [01:05<11:56, 32.56s/it]
[rank7]: Traceback (most recent call last):
[rank7]:   File "homepath/code/evolve/example/ppo.py", line 189, in <module>
[rank7]:     stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
[rank7]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/contextlib.py", line 81, in inner
[rank7]:     return func(*args, **kwds)
[rank7]:            ^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/code/evolve/TRL_GITHUB/trl/trainer/ppo_trainer.py", line 721, in step
[rank7]:     all_logprobs, logits_or_none, values, masks = self.batched_forward_pass(
[rank7]:                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/contextlib.py", line 81, in inner
[rank7]:     return func(*args, **kwds)
[rank7]:            ^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/code/evolve/TRL_GITHUB/trl/trainer/ppo_trainer.py", line 994, in batched_forward_pass
[rank7]:     logits, _, values = model(**input_kwargs)
[rank7]:                         ^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]:     return forward_call(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank7]:     ret_val = func(*args, **kwargs)
[rank7]:               ^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1855, in forward
[rank7]:     loss = self.module(*inputs, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1595, in _call_impl
[rank7]:     hook_result = hook(self, args, result)
[rank7]:                   ^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank7]:     ret_val = func(*args, **kwargs)
[rank7]:               ^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 232, in _end_of_forward_hook
[rank7]:     self.get_param_coordinator(training=False).reset_step()
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 216, in reset_step
[rank7]:     self.construct_parameter_trace_from_module_trace()
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 202, in construct_parameter_trace_from_module_trace
[rank7]:     self.record_parameters(sub_module)
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 194, in record_parameters
[rank7]:     step_id = self.__step_id_module_fetched_for[sub_module.id].popleft()
[rank7]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: IndexError: pop from an empty deque
....

And here is the result of `pip list`:

Package            Version     Editable project location
------------------ ----------- -----------------------------------------------------------------------------------------------
accelerate         0.29.3
aiohttp            3.9.5
aiosignal          1.3.1
annotated-types    0.6.0
anyio              4.3.0
appdirs            1.4.4
attrs              23.2.0
Brotli             1.0.9
certifi            2024.2.2
charset-normalizer 2.0.4
click              8.1.7
datasets           2.19.0
deepspeed          0.14.2
dill               0.3.8
docker-pycreds     0.4.0
docopt             0.6.2
docstring_parser   0.16
einops             0.8.0
fairscale          0.4.13
fastapi            0.110.2
filelock           3.13.1
flash-attn         2.5.8
frozenlist         1.4.1
fschat             0.2.36
fsspec             2024.3.1
gitdb              4.0.11
GitPython          3.1.43
gmpy2              2.1.2
h11                0.14.0
hjson              3.1.0
hope               3.6.6.1
httpcore           1.0.5
httpx              0.27.0
huggingface-hub    0.22.2
idna               3.4
Jinja2             3.1.3
markdown-it-py     3.0.0
markdown2          2.4.13
MarkupSafe         2.1.3
mdurl              0.1.2
mkl-fft            1.3.8
mkl-random         1.2.4
mkl-service        2.4.0
mpmath             1.3.0
multidict          6.0.5
multiprocess       0.70.16
networkx           3.1
nh3                0.2.17
ninja              1.11.1.1
numpy              1.26.4
packaging          24.0
pandas             2.2.2
pexpect            4.9.0
pillow             10.2.0
pip                23.3.1
prettytable        3.10.0
prompt-toolkit     3.0.43
protobuf           4.25.3
psutil             5.9.8
ptyprocess         0.7.0
py-cpuinfo         9.0.0
pyarrow            16.0.0
pyarrow-hotfix     0.6
pydantic           2.7.1
pydantic_core      2.18.2
Pygments           2.17.2
pynvml             11.5.0
PySocks            1.7.1
python-dateutil    2.9.0.post0
pytz               2024.1
PyYAML             6.0.1
regex              2024.4.16
requests           2.31.0
rich               13.7.1
safetensors        0.4.3
sentencepiece      0.2.0
sentry-sdk         2.0.1
setproctitle       1.3.3
setuptools         68.2.2
shortuuid          1.0.13
shtab              1.7.1
six                1.16.0
smmap              5.0.1
sniffio            1.3.1
starlette          0.37.2
svgwrite           1.4.3
sympy              1.12
thrift             0.20.0
tiktoken           0.6.0
tokenizers         0.19.1
torch              2.3.0
torchaudio         2.3.0
torchvision        0.18.0
tqdm               4.66.2
transformers       4.40.1
trl                0.8.7.dev0  homepath/code/evolve/TRL_GITHUB
typing_extensions  4.9.0
tyro               0.8.3
tzdata             2024.1
urllib3            2.1.0
uvicorn            0.29.0
wandb              0.16.6
wavedrom           2.0.3.post3
wcwidth            0.2.13
wheel              0.41.2
xxhash             3.4.1
yarl               1.9.4

Can anyone help me?

@Shiguang-Guo
Author

I rolled back the changes from #1483 and the code seems to work as expected, but GPU utilization is low and training is slow. Could we have an implementation that balances stability and speed?

@Shiguang-Guo
Author

Additionally, I tried to run PPO on a 70B model, which requires a large amount of memory and cannot be done on a single GPU machine. I've spent a few days trying to find a way to distribute the model across different nodes, but accelerate always seems to want to load one full copy of the model per GPU. Is there a good way to solve this?
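In case it helps clarify what I'm after: my understanding (an assumption on my side, not something I have verified at 70B scale) is that with ZeRO-3 each rank should only hold a shard of the weights, provided the DeepSpeed config is visible to transformers before `from_pretrained`. A minimal sketch of that pattern:

```python
# Sketch of my current understanding (an assumption, not a verified recipe):
# with ZeRO-3, each rank should only hold a shard of the weights, provided the
# DeepSpeed config is visible to transformers *before* from_pretrained so that
# zero.Init is used at load time. Run this under `accelerate launch` or
# `deepspeed` so that the distributed process group exists.
import torch
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu"},   # optional CPU offload for very large models
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 1,
}

# Keep this object alive: transformers checks for a live HfDeepSpeedConfig to
# decide whether to shard weights across ranks during from_pretrained.
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModelForCausalLM.from_pretrained(
    "path/to/70B-model",            # placeholder path
    torch_dtype=torch.bfloat16,
)
```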

@yananchen1989

I also face this issue when using trl/examples/accelerate_configs/deepspeed_zero3.yaml. The model seems to be distributed among several GPUs, but the input tensors may not be, which causes an error saying they are not on the same device.
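As a workaround (an assumption on my side, not a confirmed fix), I would try moving the batch tensors onto the trainer's device before calling `step`, roughly:

```python
# Workaround assumption, not a confirmed fix: make sure every tensor handed to
# PPOTrainer.step lives on the trainer's compute device. Assumes the
# `ppo_trainer`, `query_tensors`, `response_tensors` and `rewards` variables
# from the example's training loop.
device = ppo_trainer.accelerator.device

query_tensors = [q.to(device) for q in query_tensors]        # list of 1-D LongTensors
response_tensors = [r.to(device) for r in response_tensors]
rewards = [r.to(device) for r in rewards]                    # 0-D reward tensors

stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
```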

@vwxyzjn
Contributor

vwxyzjn commented May 15, 2024

Would you like to give #1540 a try? It has a DS3 example. I had encountered that `IndexError: pop from an empty deque` issue before, but #1540 fixes it.

vwxyzjn closed this as completed Jun 5, 2024
@vwxyzjn
Contributor

vwxyzjn commented Jun 5, 2024

Please give the new PPOv2Trainer a try :) https://huggingface.co/docs/trl/ppov2_trainer
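Roughly, switching looks like the sketch below, based on the docs above. Class and argument names follow the PPOv2 example at the time and may differ across trl versions; the model and dataset are placeholders:

```python
# Hedged sketch based on the PPOv2 docs/example at the time; class and argument
# names may differ across trl versions, and the model/dataset are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoModelForSequenceClassification,
                          AutoTokenizer)
from trl import PPOv2Config, PPOv2Trainer

base = "EleutherAI/pythia-1b-deduped"     # placeholder policy/value/reward base

tokenizer = AutoTokenizer.from_pretrained(base, padding_side="left")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

policy = AutoModelForCausalLM.from_pretrained(base)
ref_policy = AutoModelForCausalLM.from_pretrained(base)
value_model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=1)
reward_model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=1)

# PPOv2Trainer expects prompts already tokenized into input_ids.
raw = load_dataset("imdb", split="train[:1%]")   # placeholder dataset
def tokenize(sample):
    return {"input_ids": tokenizer(sample["text"], truncation=True, max_length=64)["input_ids"]}
train_dataset = raw.map(tokenize, remove_columns=raw.column_names)

config = PPOv2Config(output_dir="ppov2_out", per_device_train_batch_size=1,
                     total_episodes=1000)

trainer = PPOv2Trainer(
    config=config,
    tokenizer=tokenizer,
    policy=policy,
    ref_policy=ref_policy,
    reward_model=reward_model,
    value_model=value_model,
    train_dataset=train_dataset,
)
trainer.train()
```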

@Shiguang-Guo
Author

I've decided to use a different training framework, but thank you anyway!

@zhuyuzy

zhuyuzy commented Jul 11, 2024

> I've decided to use a different training framework, but thank you anyway!

I've run into a similar issue. Do you have any recommendations for other training frameworks? That would be very helpful.

@Shiguang-Guo
Author

I used DeepSpeed-Chat from DeepSpeedExamples before. More recently I used DPO training from trl and did not run into similar problems, so maybe checking the version or the code is enough to solve this.
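For reference, my DPO setup was essentially the following sketch (trl 0.8.x-era API with placeholder paths; newer versions move these arguments into `DPOConfig`):

```python
# Hedged sketch of the DPO setup that worked for me (trl 0.8.x-era API; newer
# versions move beta and length settings into DPOConfig). Paths are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_path = "path/to/local-model"                 # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_path)

# The dataset needs "prompt", "chosen" and "rejected" text columns.
dataset = load_dataset("json", data_files="path/to/preference-data.json", split="train")

training_args = TrainingArguments(output_dir="dpo_out", per_device_train_batch_size=1,
                                  gradient_accumulation_steps=8, bf16=True,
                                  remove_unused_columns=False)

trainer = DPOTrainer(
    model=model,
    ref_model=None,       # a frozen reference copy is created internally when None
    args=training_args,
    beta=0.1,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```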

@zhuyuzy

zhuyuzy commented Jul 11, 2024

> I used DeepSpeed-Chat from DeepSpeedExamples before. More recently I used DPO training from trl and did not run into similar problems, so maybe checking the version or the code is enough to solve this.

I've heard of DeepSpeed-Chat before but haven't tried it; maybe that's a good choice. I tried another framework, but training with ZeRO-3 was too slow, so I've decided to check the PPOTrainer code again since DPO works well. Thank you for your reply.

@yiyepiaoling0715

> I've decided to use a different training framework, but thank you anyway!

Which framework do you think is better?
