data_collator is not assigned in PPOTrainer #143

Closed

DaehanKim opened this issue Feb 10, 2023 · 6 comments

@DaehanKim

DaehanKim commented Feb 10, 2023

https://github.com/lvwerra/trl/blob/main/trl/trainer/ppo_trainer.py#L217
Here the dataloader is defined using the passed data_collator function.

https://github.com/lvwerra/trl/blob/main/trl/trainer/ppo_trainer.py#L231
but self.data_collator is always set to DataCollatorForLanguageModeling(self.tokenizer, mlm=False) regardless of the passed data_collator argument.

I think it should be something like

        if data_collator is None:
            self.data_collator = DataCollatorForLanguageModeling(self.tokenizer, mlm=False)
        else:
            self.data_collator = data_collator
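
For reference, here is a minimal sketch of the kind of collator one could then pass for the dataloader (modeled on the sentiment example script; the PPOTrainer call is shown only as a comment and the names are illustrative):

    # Simple collator for the PPO training dataloader: turns a list of
    # per-sample dicts into a dict of lists, leaving tensors untouched.
    def collator(data):
        return {key: [d[key] for d in data] for key in data[0]}

    # ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer,
    #                          dataset=dataset, data_collator=collator)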

@DaehanKim
Author

I found this because the sentiment-control example notebook didn't run in my environment.

I use an A6000 GPU and only added os.environ['CUDA_VISIBLE_DEVICES']='5' at the start of the script, and got this stack trace:

RuntimeError                              Traceback (most recent call last)
Cell In[15], line 24
     22 #### Run PPO training 
     23 t = time.time()
---> 24 stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
     26 for cs in ctrl_str:
     27     key = 'env/reward_'+cs.strip('[]')

File /workspace/trl/trl/trainer/ppo_trainer.py:441, in PPOTrainer.step(self, queries, responses, scores)
    437 t0 = time.time()
    439 t = time.time()
--> 441 logprobs, ref_logprobs, values = self.batched_forward_pass(queries, responses)
    442 timing["time/ppo/forward_pass"] = time.time() - t
    444 t = time.time()

File /workspace/trl/trl/trainer/ppo_trainer.py:587, in PPOTrainer.batched_forward_pass(self, queries, responses)
    584 response_batch = responses[i * fbs : (i + 1) * fbs]
    586 with torch.no_grad():
--> 587     logits, _, v = self.model(**input_kwargs)
    588     ref_logits, _, _ = self.ref_model(**input_kwargs)
    590 if self.is_encoder_decoder:

File /opt/conda/envs/trl-tutorial/lib/python3.8/site-packages/torch/nn/modules/module.py:1212, in Module._call_impl(self, *input, **kwargs)
   1209     bw_hook = hooks.BackwardHook(self, full_backward_hooks)
...
   2208     # remove once script supports set_grad_enabled
   2209     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2210 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

I saw that all inputs (queries, responses) are properly on CUDA devices, but in the forward batching process they seem to be mapped to CPU. Maybe it's related to the recent forward batching update #139 (I'm not sure). I'll update when I figure something out.
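
As a quick sanity check (illustrative only, using the objects from the notebook), one can verify where the tensors live right before calling step():

    # Confirm every query/response tensor sits on the same device as the model.
    model_device = next(ppo_trainer.model.parameters()).device
    assert all(t.device == model_device
               for t in list(query_tensors) + list(response_tensors)), \
        "found input tensors on a different device than the model"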

@younesbelkada
Contributor

Hi @DaehanKim,

Thanks for the report! I think this makes sense and should be replaced as suggested ;)
Do you know if adding your patch fixes your issue?
Also, I would recommend not adding CUDA_VISIBLE_DEVICES, since device assignment should be taken care of by accelerate.

@DaehanKim
Author

DaehanKim commented Feb 10, 2023

Unfortunately, fixing the collator alone didn't solve the problem.
It gives another error stack trace:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[33], line 24
     22 #### Run PPO training 
     23 t = time.time()
---> 24 stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
     26 for cs in ctrl_str:
     27     key = 'env/reward_'+cs.strip('[]')

File /workspace/trl/trl/trainer/ppo_trainer.py:441, in PPOTrainer.step(self, queries, responses, scores)
    437 t0 = time.time()
    439 t = time.time()
--> 441 logprobs, ref_logprobs, values = self.batched_forward_pass(queries, responses)
    442 timing["time/ppo/forward_pass"] = time.time() - t
    444 t = time.time()

File /workspace/trl/trl/trainer/ppo_trainer.py:587, in PPOTrainer.batched_forward_pass(self, queries, responses)
    584 response_batch = responses[i * fbs : (i + 1) * fbs]
    586 with torch.no_grad():
--> 587     logits, _, v = self.model(**input_kwargs)
    588     ref_logits, _, _ = self.ref_model(**input_kwargs)
    590 if self.is_encoder_decoder:

File /opt/conda/envs/trl-tutorial/lib/python3.8/site-packages/torch/nn/modules/module.py:1212, in Module._call_impl(self, *input, **kwargs)
   1209     bw_hook = hooks.BackwardHook(self, full_backward_hooks)
...
--> 767     input_shape = input_ids.size()
    768     input_ids = input_ids.view(-1, input_shape[-1])
    769     batch_size = input_ids.shape[0]

AttributeError: 'list' object has no attribute 'size'

This happens because input_ids becomes a list of individual input_id tensors, due to this part:
https://github.com/lvwerra/trl/blob/main/trl/trainer/ppo_trainer.py#L569-L576
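
For illustration only (not the trl code itself), the model forward expects a single 2-D tensor, so a list of 1-D input_id tensors has to be padded into one batch before the call, e.g.:

    import torch
    from torch.nn.utils.rnn import pad_sequence

    # A list of variable-length 1-D input_id tensors, one per sample, ...
    input_id_list = [torch.tensor([15496, 995]), torch.tensor([15496])]

    # ... cannot be fed to the model directly (hence "'list' object has no
    # attribute 'size'"); it must first be padded to a (batch, seq) tensor.
    # Here 0 stands in for the tokenizer's actual pad token id.
    input_ids = pad_sequence(input_id_list, batch_first=True, padding_value=0)
    attention_mask = (input_ids != 0).long()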

@DaehanKim
Author

I think #144 partially fixes the problem, if we assume self.data_collator is always set to DataCollatorForLanguageModeling(self.tokenizer, mlm=False).

@lvwerra
Member

lvwerra commented Feb 13, 2023

Hi @DaehanKim, I think there's some confusion about the data_collator, since there are actually two :)

  1. The data_collator passed to PPOTrainer will be used by self.dataloader. This is used to give you data in your training loop: https://github.com/lvwerra/trl/blob/ccd2641b51fb12424e46fff82b150e830f87800c/examples/sentiment/scripts/gpt2-sentiment.py#L136

  2. The second collator (self.data_collator) is used internally only and is used to batch the inputs passed to PPOTrainer.step in batched_forward_pass here: https://github.com/lvwerra/trl/blob/ccd2641b51fb12424e46fff82b150e830f87800c/trl/trainer/ppo_trainer.py#L555-L557

Indeed, for the second collator you have no way of overwriting it and we always assume a CLM objective, but for the first one, the one you pass will take effect.
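
To make the distinction concrete, here is a rough, self-contained sketch of what the second (internal) collator does; exact keys and padding behavior depend on the transformers version:

    from transformers import AutoTokenizer, DataCollatorForLanguageModeling

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

    # Collator 2: pads a list of per-sample feature dicts into batched tensors
    # so that batched_forward_pass can run one model call per mini-batch.
    internal_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
    features = [
        {"input_ids": tokenizer("hello world")["input_ids"]},
        {"input_ids": tokenizer("hi")["input_ids"]},
    ]
    batch = internal_collator(features)
    # batch["input_ids"] and batch["attention_mask"] are now 2-D tensors.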

For your latest error: is it possible that you pass a list of lists rather than a list of torch tensors as queries/responses in ppo_trainer.step(query_tensors, response_tensors, rewards)?
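
For comparison, the expected inputs to ppo_trainer.step are one 1-D tensor per sample plus one scalar reward tensor per sample, roughly like this (a sketch with made-up token ids and rewards):

    import torch

    # One 1-D LongTensor of token ids per sample, not plain Python lists.
    query_tensors = [torch.tensor([15496, 995]), torch.tensor([15496])]
    response_tensors = [torch.tensor([314, 588]), torch.tensor([3763])]

    # One scalar reward tensor per sample.
    rewards = [torch.tensor(0.7), torch.tensor(-0.2)]

    # stats = ppo_trainer.step(query_tensors, response_tensors, rewards)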

@DaehanKim
Author

@lvwerra Thank you for the correction! I didn't know these two are used in different ways.
My latest error occurred because I forcefully overwrote self.data_collator with what was passed at PPOTrainer initialization. Everything works fine if I don't do that.

I think things are wrapped up, so I'm closing the issue.
