Proposing to add a new configuration parameter `precompute_ref_batch_size` that would let users specify a different (likely larger) batch size specifically for the reference model precomputation phase. This would:
- Speed up the precomputation phase by processing more examples per batch
- Make better use of available GPU memory since no gradients need to be stored
- Maintain backward compatibility by defaulting to the current behavior if not specified
The change would affect:
- `DPOConfig`: Add a new optional parameter `precompute_ref_batch_size`
- `get_train_dataloader()` and `get_eval_dataloader()`: Use the new batch size when precomputing reference log probabilities
This is particularly useful for large-scale DPO training where the precomputation phase can be a significant bottleneck.
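A minimal sketch of the idea, assuming the new parameter defaults to `None` and falls back to the current behavior; the simplified `DPOConfig` and the helper name `_ref_precompute_batch_size` below are illustrative, not the actual TRL implementation:

```python
# Sketch only: simplified stand-ins for the real DPOConfig / DPOTrainer logic.
from dataclasses import dataclass
from typing import Optional


@dataclass
class DPOConfig:  # the real DPOConfig has many more fields
    per_device_train_batch_size: int = 8
    per_device_eval_batch_size: int = 8
    precompute_ref_log_probs: bool = False
    # Proposed: optional larger batch size for reference log-prob precomputation.
    # None keeps today's behavior (reuse the per-device train/eval batch size).
    precompute_ref_batch_size: Optional[int] = None


def _ref_precompute_batch_size(config: DPOConfig, train: bool = True) -> int:
    """Batch size used by get_train_dataloader()/get_eval_dataloader()
    when precomputing reference log probs (hypothetical helper)."""
    if config.precompute_ref_batch_size is not None:
        return config.precompute_ref_batch_size
    return config.per_device_train_batch_size if train else config.per_device_eval_batch_size
```

Because the fallback reuses the existing per-device batch sizes, omitting the new parameter leaves current configs unaffected.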
Motivation
Currently, when `precompute_ref_log_probs=True`, the `DPOTrainer` class uses `per_device_train_batch_size` or `per_device_eval_batch_size` (during training or evaluation, respectively) to generate the reference log probs.
However, this batch size can be larger than the one used for training, since this step requires no gradient computation and no gradient storage in memory. Adding a configurable `precompute_ref_batch_size` parameter would let users optimize this preprocessing step by using larger batch sizes while maintaining memory efficiency.
Your contribution
Yes, I can help by submitting a PR following the contribution guidelines.
Thanks for this suggestion @SwayamInSync!
Do you have any idea of the gain in speed?
If you have a working implementation, feel free to submit a PR so that we can test and discuss the code.
Made a PR at #2426
From a quick test in my setup, I can only fit a training batch size of up to 8 (I get OOM otherwise), but with this new parameter the reference precomputation batch size can go up to 32 instead of being tied to the training batch size, so it is a clear improvement.
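For reference, a hypothetical usage sketch assuming the parameter lands as proposed (the field name and defaults come from this proposal, not the released API):

```python
# Hypothetical usage once the proposed parameter is available: train with a
# small batch size, but precompute reference log probs with a larger one.
from trl import DPOConfig

config = DPOConfig(
    output_dir="dpo-output",
    per_device_train_batch_size=8,   # limited by gradient/optimizer memory
    precompute_ref_log_probs=True,   # precompute reference log probs up front
    precompute_ref_batch_size=32,    # proposed: larger batch, no gradients needed
)
```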