
Have trouble in ppo example #1618

Closed
Shiguang-Guo opened this issue May 3, 2024 · 10 comments

Comments

@Shiguang-Guo

I ran ppo.py from the examples with deepspeed_zero3.yaml. Apart from switching the dataset and model to local ones, I did not modify any other code, and I got this error while running: `IndexError: pop from an empty deque`.
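For context, my loop is essentially the standard one from the example. The snippet below is a rough reconstruction rather than my exact script, and the model/dataset paths are placeholders for my local ones:

```python
# Rough reconstruction of the relevant part of the example (not the exact
# script); model and dataset paths are placeholders for my local ones.
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, pipeline
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

config = PPOConfig(model_name="path/to/local-model", batch_size=16, mini_batch_size=4)

tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

def build_dataset():
    ds = load_dataset("json", data_files="path/to/local-data.json", split="train")

    def tokenize(sample):
        sample["input_ids"] = tokenizer.encode(sample["text"])[:8]
        sample["query"] = tokenizer.decode(sample["input_ids"])
        return sample

    ds = ds.map(tokenize)
    ds.set_format(type="torch")
    return ds

def collator(data):
    # Keep variable-length query tensors as lists instead of padding them.
    return {key: [d[key] for d in data] for key in data[0]}

model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ppo_trainer = PPOTrainer(config, model, ref_model=None, tokenizer=tokenizer,
                         dataset=build_dataset(), data_collator=collator)
reward_pipe = pipeline("sentiment-analysis", model="path/to/local-reward-model")

gen_kwargs = {"min_length": -1, "top_k": 0.0, "top_p": 1.0, "do_sample": True,
              "pad_token_id": tokenizer.eos_token_id, "max_new_tokens": 32}

for batch in ppo_trainer.dataloader:
    query_tensors = batch["input_ids"]

    # Generate responses with the current policy.
    response_tensors = ppo_trainer.generate(query_tensors, return_prompt=False, **gen_kwargs)
    batch["response"] = tokenizer.batch_decode(response_tensors)

    # Score the (query, response) pairs with the reward model.
    texts = [q + r for q, r in zip(batch["query"], batch["response"])]
    pipe_outputs = reward_pipe(texts, return_all_scores=True)
    rewards = [torch.tensor(out[1]["score"]) for out in pipe_outputs]

    # This is the call that raises "IndexError: pop from an empty deque" under ZeRO-3.
    stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
    ppo_trainer.log_stats(stats, batch, rewards)
```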

Here is the whole log:

Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


  0%|          | 0/24 [00:00<?, ?it/s]
  0%|          | 0/24 [00:00<?, ?it/s]You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
homepath/.conda/envs/evolve/lib/python3.11/site-packages/transformers/pipelines/text_classification.py:104: UserWarning: `return_all_scores` is now deprecated,  if want a similar functionality use `top_k=None` instead of `return_all_scores=True` or `top_k=1` instead of `return_all_scores=False`.
  warnings.warn(
  4%|▍         | 1/24 [00:22<08:42, 22.73s/it]
  8%|▊         | 2/24 [00:44<08:08, 22.19s/it]

  8%|▊         | 2/24 [01:05<11:56, 32.56s/it]
[rank7]: Traceback (most recent call last):
[rank7]:   File "homepath/code/evolve/example/ppo.py", line 189, in <module>
[rank7]:     stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
[rank7]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/contextlib.py", line 81, in inner
[rank7]:     return func(*args, **kwds)
[rank7]:            ^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/code/evolve/TRL_GITHUB/trl/trainer/ppo_trainer.py", line 721, in step
[rank7]:     all_logprobs, logits_or_none, values, masks = self.batched_forward_pass(
[rank7]:                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/contextlib.py", line 81, in inner
[rank7]:     return func(*args, **kwds)
[rank7]:            ^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/code/evolve/TRL_GITHUB/trl/trainer/ppo_trainer.py", line 994, in batched_forward_pass
[rank7]:     logits, _, values = model(**input_kwargs)
[rank7]:                         ^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]:     return forward_call(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank7]:     ret_val = func(*args, **kwargs)
[rank7]:               ^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1855, in forward
[rank7]:     loss = self.module(*inputs, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1595, in _call_impl
[rank7]:     hook_result = hook(self, args, result)
[rank7]:                   ^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank7]:     ret_val = func(*args, **kwargs)
[rank7]:               ^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 232, in _end_of_forward_hook
[rank7]:     self.get_param_coordinator(training=False).reset_step()
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 216, in reset_step
[rank7]:     self.construct_parameter_trace_from_module_trace()
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 202, in construct_parameter_trace_from_module_trace
[rank7]:     self.record_parameters(sub_module)
[rank7]:   File "homepath/.conda/envs/evolve/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 194, in record_parameters
[rank7]:     step_id = self.__step_id_module_fetched_for[sub_module.id].popleft()
[rank7]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: IndexError: pop from an empty deque
....

And here is the result of `pip list`:

Package            Version     Editable project location
------------------ ----------- -----------------------------------------------------------------------------------------------
accelerate         0.29.3
aiohttp            3.9.5
aiosignal          1.3.1
annotated-types    0.6.0
anyio              4.3.0
appdirs            1.4.4
attrs              23.2.0
Brotli             1.0.9
certifi            2024.2.2
charset-normalizer 2.0.4
click              8.1.7
datasets           2.19.0
deepspeed          0.14.2
dill               0.3.8
docker-pycreds     0.4.0
docopt             0.6.2
docstring_parser   0.16
einops             0.8.0
fairscale          0.4.13
fastapi            0.110.2
filelock           3.13.1
flash-attn         2.5.8
frozenlist         1.4.1
fschat             0.2.36
fsspec             2024.3.1
gitdb              4.0.11
GitPython          3.1.43
gmpy2              2.1.2
h11                0.14.0
hjson              3.1.0
hope               3.6.6.1
httpcore           1.0.5
httpx              0.27.0
huggingface-hub    0.22.2
idna               3.4
Jinja2             3.1.3
markdown-it-py     3.0.0
markdown2          2.4.13
MarkupSafe         2.1.3
mdurl              0.1.2
mkl-fft            1.3.8
mkl-random         1.2.4
mkl-service        2.4.0
mpmath             1.3.0
multidict          6.0.5
multiprocess       0.70.16
networkx           3.1
nh3                0.2.17
ninja              1.11.1.1
numpy              1.26.4
packaging          24.0
pandas             2.2.2
pexpect            4.9.0
pillow             10.2.0
pip                23.3.1
prettytable        3.10.0
prompt-toolkit     3.0.43
protobuf           4.25.3
psutil             5.9.8
ptyprocess         0.7.0
py-cpuinfo         9.0.0
pyarrow            16.0.0
pyarrow-hotfix     0.6
pydantic           2.7.1
pydantic_core      2.18.2
Pygments           2.17.2
pynvml             11.5.0
PySocks            1.7.1
python-dateutil    2.9.0.post0
pytz               2024.1
PyYAML             6.0.1
regex              2024.4.16
requests           2.31.0
rich               13.7.1
safetensors        0.4.3
sentencepiece      0.2.0
sentry-sdk         2.0.1
setproctitle       1.3.3
setuptools         68.2.2
shortuuid          1.0.13
shtab              1.7.1
six                1.16.0
smmap              5.0.1
sniffio            1.3.1
starlette          0.37.2
svgwrite           1.4.3
sympy              1.12
thrift             0.20.0
tiktoken           0.6.0
tokenizers         0.19.1
torch              2.3.0
torchaudio         2.3.0
torchvision        0.18.0
tqdm               4.66.2
transformers       4.40.1
trl                0.8.7.dev0  homepath/code/evolve/TRL_GITHUB
typing_extensions  4.9.0
tyro               0.8.3
tzdata             2024.1
urllib3            2.1.0
uvicorn            0.29.0
wandb              0.16.6
wavedrom           2.0.3.post3
wcwidth            0.2.13
wheel              0.41.2
xxhash             3.4.1
yarl               1.9.4

Can anyone help me?

@Shiguang-Guo
Author

I rolled back the changes from #1483 and the code seems to work as expected, but GPU utilization is low and training is slow. Could we have an implementation that balances stability and speed?

@Shiguang-Guo
Author

Additionally, I tried to run PPO on a 70B model, which requires a large amount of memory and cannot be done on a single GPU machine. I've spent a few days trying to find a way to distribute the model across different nodes, but accelerate always seems to want to load one full copy of the model per GPU. Is there a good way to solve this?
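In case it helps clarify what I'm after: my understanding (an assumption on my side, not something I have verified at 70B scale) is that with ZeRO-3 each rank should only hold a shard of the weights, provided the DeepSpeed config is visible to transformers before `from_pretrained`. A minimal sketch of that pattern:

```python
# Sketch of my current understanding (an assumption, not a verified recipe):
# with ZeRO-3, each rank should only hold a shard of the weights, provided the
# DeepSpeed config is visible to transformers *before* from_pretrained so that
# zero.Init is used at load time. Run this under `accelerate launch` or
# `deepspeed` so that the distributed process group exists.
import torch
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu"},   # optional CPU offload for very large models
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 1,
}

# Keep this object alive: transformers checks for a live HfDeepSpeedConfig to
# decide whether to shard weights across ranks during from_pretrained.
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModelForCausalLM.from_pretrained(
    "path/to/70B-model",            # placeholder path
    torch_dtype=torch.bfloat16,
)
```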

@yananchen1989

I also face this issue when using trl/examples/accelerate_configs/deepspeed_zero3.yaml. The model seems to be distributed among several GPUs, but the input tensors may not be, which causes an error saying they are not on the same device.
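As a workaround (an assumption on my side, not a confirmed fix), I would try moving the batch tensors onto the trainer's device before calling `step`, roughly:

```python
# Workaround assumption, not a confirmed fix: make sure every tensor handed to
# PPOTrainer.step lives on the trainer's compute device. Assumes the
# `ppo_trainer`, `query_tensors`, `response_tensors` and `rewards` variables
# from the example's training loop.
device = ppo_trainer.accelerator.device

query_tensors = [q.to(device) for q in query_tensors]        # list of 1-D LongTensors
response_tensors = [r.to(device) for r in response_tensors]
rewards = [r.to(device) for r in rewards]                    # 0-D reward tensors

stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
```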

@vwxyzjn
Contributor

vwxyzjn commented May 15, 2024

Would you like to give #1540 a try? It has a DS3 example. I had encountered that `IndexError: pop from an empty deque` issue before, but #1540 fixes it.

vwxyzjn closed this as completed Jun 5, 2024
@vwxyzjn
Contributor

vwxyzjn commented Jun 5, 2024

Please give the new PPOv2Trainer a try :) https://huggingface.co/docs/trl/ppov2_trainer
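Roughly, switching looks like the sketch below, based on the docs above. Class and argument names follow the PPOv2 example at the time and may differ across trl versions; the model and dataset are placeholders:

```python
# Hedged sketch based on the PPOv2 docs/example at the time; class and argument
# names may differ across trl versions, and the model/dataset are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoModelForSequenceClassification,
                          AutoTokenizer)
from trl import PPOv2Config, PPOv2Trainer

base = "EleutherAI/pythia-1b-deduped"     # placeholder policy/value/reward base

tokenizer = AutoTokenizer.from_pretrained(base, padding_side="left")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

policy = AutoModelForCausalLM.from_pretrained(base)
ref_policy = AutoModelForCausalLM.from_pretrained(base)
value_model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=1)
reward_model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=1)

# PPOv2Trainer expects prompts already tokenized into input_ids.
raw = load_dataset("imdb", split="train[:1%]")   # placeholder dataset
def tokenize(sample):
    return {"input_ids": tokenizer(sample["text"], truncation=True, max_length=64)["input_ids"]}
train_dataset = raw.map(tokenize, remove_columns=raw.column_names)

config = PPOv2Config(output_dir="ppov2_out", per_device_train_batch_size=1,
                     total_episodes=1000)

trainer = PPOv2Trainer(
    config=config,
    tokenizer=tokenizer,
    policy=policy,
    ref_policy=ref_policy,
    reward_model=reward_model,
    value_model=value_model,
    train_dataset=train_dataset,
)
trainer.train()
```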

@Shiguang-Guo
Author

I've decided to use a different training framework, but thank you anyway!

@zhuyuzy

zhuyuzy commented Jul 11, 2024

> I've decided to use a different training framework, but thank you anyway!

I've run into a similar issue. Do you have any recommendations for other training frameworks? That would be very helpful.

@Shiguang-Guo
Author

I used DeepSpeed-Chat from DeepSpeedExamples before. More recently I used DPO training from trl and did not run into similar problems, so maybe checking the version or the code is enough to solve this.
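For reference, my DPO setup was essentially the following sketch (trl 0.8.x-era API with placeholder paths; newer versions move these arguments into `DPOConfig`):

```python
# Hedged sketch of the DPO setup that worked for me (trl 0.8.x-era API; newer
# versions move beta and length settings into DPOConfig). Paths are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_path = "path/to/local-model"                 # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_path)

# The dataset needs "prompt", "chosen" and "rejected" text columns.
dataset = load_dataset("json", data_files="path/to/preference-data.json", split="train")

training_args = TrainingArguments(output_dir="dpo_out", per_device_train_batch_size=1,
                                  gradient_accumulation_steps=8, bf16=True,
                                  remove_unused_columns=False)

trainer = DPOTrainer(
    model=model,
    ref_model=None,       # a frozen reference copy is created internally when None
    args=training_args,
    beta=0.1,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```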

@zhuyuzy

zhuyuzy commented Jul 11, 2024

> I used DeepSpeed-Chat from DeepSpeedExamples before. More recently I used DPO training from trl and did not run into similar problems, so maybe checking the version or the code is enough to solve this.

I've heard of DeepSpeed-Chat before but haven't tried it; maybe that's a good choice. I tried another framework, but training with ZeRO-3 was too slow, so I've decided to check the PPOTrainer code again since DPO works well. Thank you for your reply.

@yiyepiaoling0715

> I've decided to use a different training framework, but thank you anyway!

Which framework do you think is better?
