data_collator is not assigned in PPOTrainer #143

Closed

DaehanKim opened this issue Feb 10, 2023 · 6 comments

@DaehanKim

DaehanKim commented Feb 10, 2023

https://github.com/lvwerra/trl/blob/main/trl/trainer/ppo_trainer.py#L217
Here the dataloader is defined using the passed data_collator function.

https://github.com/lvwerra/trl/blob/main/trl/trainer/ppo_trainer.py#L231
but self.data_collator is always set to DataCollatorForLanguageModeling(self.tokenizer, mlm=False) regardless of the passed data_collator argument.

I think it should be something like

        if data_collator is None:
            self.data_collator = DataCollatorForLanguageModeling(self.tokenizer, mlm=False)
        else:
            self.data_collator = data_collator
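
For reference, here is a minimal sketch of the kind of collator one could then pass for the dataloader (modeled on the sentiment example script; the PPOTrainer call is shown only as a comment and the names are illustrative):

    # Simple collator for the PPO training dataloader: turns a list of
    # per-sample dicts into a dict of lists, leaving tensors untouched.
    def collator(data):
        return {key: [d[key] for d in data] for key in data[0]}

    # ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer,
    #                          dataset=dataset, data_collator=collator)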

@DaehanKim
Author

I found this because the sentiment-control example notebook didn't run in my environment.

I use an A6000 GPU and only added os.environ['CUDA_VISIBLE_DEVICES']='5' at the start of the script, and got this stack trace:

RuntimeError                              Traceback (most recent call last)
Cell In[15], line 24
     22 #### Run PPO training 
     23 t = time.time()
---> 24 stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
     26 for cs in ctrl_str:
     27     key = 'env/reward_'+cs.strip('[]')

File /workspace/trl/trl/trainer/ppo_trainer.py:441, in PPOTrainer.step(self, queries, responses, scores)
    437 t0 = time.time()
    439 t = time.time()
--> 441 logprobs, ref_logprobs, values = self.batched_forward_pass(queries, responses)
    442 timing["time/ppo/forward_pass"] = time.time() - t
    444 t = time.time()

File /workspace/trl/trl/trainer/ppo_trainer.py:587, in PPOTrainer.batched_forward_pass(self, queries, responses)
    584 response_batch = responses[i * fbs : (i + 1) * fbs]
    586 with torch.no_grad():
--> 587     logits, _, v = self.model(**input_kwargs)
    588     ref_logits, _, _ = self.ref_model(**input_kwargs)
    590 if self.is_encoder_decoder:

File /opt/conda/envs/trl-tutorial/lib/python3.8/site-packages/torch/nn/modules/module.py:1212, in Module._call_impl(self, *input, **kwargs)
   1209     bw_hook = hooks.BackwardHook(self, full_backward_hooks)
...
   2208     # remove once script supports set_grad_enabled
   2209     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2210 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

I saw that all inputs (queries, responses) are properly on CUDA devices, but in the forward batching process they seem to be mapped to CPU. Maybe it's related to the recent forward batching update #139 (I'm not sure). I'll update when I figure something out.
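
As a quick sanity check (illustrative only, using the objects from the notebook), one can verify where the tensors live right before calling step():

    # Confirm every query/response tensor sits on the same device as the model.
    model_device = next(ppo_trainer.model.parameters()).device
    assert all(t.device == model_device
               for t in list(query_tensors) + list(response_tensors)), \
        "found input tensors on a different device than the model"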

@younesbelkada
Contributor

Hi @DaehanKim,

Thanks for the report! I think this makes sense and should be replaced as suggested ;)
Do you know if adding your patch fixes your issue?
Also, I would recommend not adding CUDA_VISIBLE_DEVICES, since device assignment should be taken care of by accelerate.

@DaehanKim
Author

DaehanKim commented Feb 10, 2023

Unfortunately, fixing the collator alone didn't solve the problem.
It gives another error stack trace:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[33], line 24
     22 #### Run PPO training 
     23 t = time.time()
---> 24 stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
     26 for cs in ctrl_str:
     27     key = 'env/reward_'+cs.strip('[]')

File /workspace/trl/trl/trainer/ppo_trainer.py:441, in PPOTrainer.step(self, queries, responses, scores)
    437 t0 = time.time()
    439 t = time.time()
--> 441 logprobs, ref_logprobs, values = self.batched_forward_pass(queries, responses)
    442 timing["time/ppo/forward_pass"] = time.time() - t
    444 t = time.time()

File /workspace/trl/trl/trainer/ppo_trainer.py:587, in PPOTrainer.batched_forward_pass(self, queries, responses)
    584 response_batch = responses[i * fbs : (i + 1) * fbs]
    586 with torch.no_grad():
--> 587     logits, _, v = self.model(**input_kwargs)
    588     ref_logits, _, _ = self.ref_model(**input_kwargs)
    590 if self.is_encoder_decoder:

File /opt/conda/envs/trl-tutorial/lib/python3.8/site-packages/torch/nn/modules/module.py:1212, in Module._call_impl(self, *input, **kwargs)
   1209     bw_hook = hooks.BackwardHook(self, full_backward_hooks)
...
--> 767     input_shape = input_ids.size()
    768     input_ids = input_ids.view(-1, input_shape[-1])
    769     batch_size = input_ids.shape[0]

AttributeError: 'list' object has no attribute 'size'

This happens because input_ids becomes a list of individual input_id tensors, due to this part:
https://github.com/lvwerra/trl/blob/main/trl/trainer/ppo_trainer.py#L569-L576
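
For illustration only (not the trl code itself), the model forward expects a single 2-D tensor, so a list of 1-D input_id tensors has to be padded into one batch before the call, e.g.:

    import torch
    from torch.nn.utils.rnn import pad_sequence

    # A list of variable-length 1-D input_id tensors, one per sample, ...
    input_id_list = [torch.tensor([15496, 995]), torch.tensor([15496])]

    # ... cannot be fed to the model directly (hence "'list' object has no
    # attribute 'size'"); it must first be padded to a (batch, seq) tensor.
    # Here 0 stands in for the tokenizer's actual pad token id.
    input_ids = pad_sequence(input_id_list, batch_first=True, padding_value=0)
    attention_mask = (input_ids != 0).long()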

@DaehanKim
Author

I think #144 partially fixes the problem, if we assume self.data_collator is always set to DataCollatorForLanguageModeling(self.tokenizer, mlm=False).

@lvwerra
Member

lvwerra commented Feb 13, 2023

Hi @DaehanKim, I think there's some confusion about the data_collator, since there are actually two :)

  1. The data_collator passed to PPOTrainer will be used by self.dataloader. This is used to give you data in your training loop: https://github.com/lvwerra/trl/blob/ccd2641b51fb12424e46fff82b150e830f87800c/examples/sentiment/scripts/gpt2-sentiment.py#L136

  2. The second collator (self.data_collator) is used internally only and is used to batch the inputs passed to PPOTrainer.step in batched_forward_pass here: https://github.com/lvwerra/trl/blob/ccd2641b51fb12424e46fff82b150e830f87800c/trl/trainer/ppo_trainer.py#L555-L557

Indeed, for the second collator you have no way of overwriting it and we always assume a CLM objective, but for the first one, the one you pass will take effect.
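
To make the distinction concrete, here is a rough, self-contained sketch of what the second (internal) collator does; exact keys and padding behavior depend on the transformers version:

    from transformers import AutoTokenizer, DataCollatorForLanguageModeling

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

    # Collator 2: pads a list of per-sample feature dicts into batched tensors
    # so that batched_forward_pass can run one model call per mini-batch.
    internal_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
    features = [
        {"input_ids": tokenizer("hello world")["input_ids"]},
        {"input_ids": tokenizer("hi")["input_ids"]},
    ]
    batch = internal_collator(features)
    # batch["input_ids"] and batch["attention_mask"] are now 2-D tensors.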

For your latest error: is it possible that you pass a list of lists rather than a list of torch tensors as queries/responses in ppo_trainer.step(query_tensors, response_tensors, rewards)?
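
For comparison, the expected inputs to ppo_trainer.step are one 1-D tensor per sample plus one scalar reward tensor per sample, roughly like this (a sketch with made-up token ids and rewards):

    import torch

    # One 1-D LongTensor of token ids per sample, not plain Python lists.
    query_tensors = [torch.tensor([15496, 995]), torch.tensor([15496])]
    response_tensors = [torch.tensor([314, 588]), torch.tensor([3763])]

    # One scalar reward tensor per sample.
    rewards = [torch.tensor(0.7), torch.tensor(-0.2)]

    # stats = ppo_trainer.step(query_tensors, response_tensors, rewards)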

@DaehanKim
Author

@lvwerra Thank you for the correction! I didn't know these two are used in different ways.
My latest error occurred because I forcefully overwrote self.data_collator with what was passed at PPOTrainer initialization. Everything works fine if I don't do that.

I think things are wrapped up, so I'm closing the issue.
