Adding precompute batch size argument in DPOTrainer for reference model #2421

Closed
SwayamInSync opened this issue Dec 1, 2024 · 2 comments · Fixed by #2426
Labels
🏋 DPO (Related to DPO) · ✨ enhancement (New feature or request)

Comments

@SwayamInSync
Contributor

Feature request

I propose adding a new configuration parameter, precompute_ref_batch_size, that lets users specify a different (typically larger) batch size for the reference-model precomputation phase. This would:

  1. Speed up the precomputation phase by processing more examples per batch
  2. Make better use of available GPU memory since no gradients need to be stored
  3. Maintain backward compatibility by defaulting to current behavior if not specified

The change would affect:

  • DPOConfig: Add new optional parameter precompute_ref_batch_size
  • get_train_dataloader() and get_eval_dataloader(): Use the new batch size when precomputing reference log probabilities

This is particularly useful for large-scale DPO training where the precomputation phase can be a significant bottleneck.
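For illustration, here is a minimal usage sketch, assuming the parameter lands in DPOConfig under the name proposed here (precompute_ref_batch_size may not exist in your installed trl version); the other arguments are existing DPOConfig/TrainingArguments fields:

```python
# Sketch only: precompute_ref_batch_size is the parameter proposed in this
# issue and may not be available in the trl version you have installed.
from trl import DPOConfig

config = DPOConfig(
    output_dir="dpo-output",
    per_device_train_batch_size=8,   # constrained by gradients/optimizer state
    precompute_ref_log_probs=True,   # run the reference model once, up front
    precompute_ref_batch_size=32,    # proposed: larger batch for the no-grad pass
)
```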

Motivation

Currently, when precompute_ref_log_probs=True, DPOTrainer uses per_device_train_batch_size (during training) or per_device_eval_batch_size (during evaluation) to generate the reference log probs.
For efficiency, this batch size can be larger than the training one, since this step requires no gradient computation or storage. Adding a configurable precompute_ref_batch_size parameter would let users speed up this preprocessing step by using larger batch sizes while keeping memory usage under control.
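As a generic illustration of why this works (a self-contained PyTorch sketch, not TRL's actual implementation), the precompute pass is forward-only, so it can run under torch.no_grad() with a batch size larger than the training one:

```python
# Generic, self-contained sketch (not TRL code) of the idea: the reference
# pass is forward-only, so it can use a larger batch size than training.
import torch
from torch.utils.data import DataLoader, TensorDataset

ref_model = torch.nn.Linear(128, 2)          # stand-in for the frozen reference model
dataset = TensorDataset(torch.randn(1024, 128))

precompute_batch_size = 32                   # larger than the train batch size (e.g. 8)
loader = DataLoader(dataset, batch_size=precompute_batch_size)

ref_model.eval()
ref_log_probs = []
with torch.no_grad():                        # no activations kept for backprop
    for (batch,) in loader:
        logits = ref_model(batch)
        ref_log_probs.append(torch.log_softmax(logits, dim=-1))
ref_log_probs = torch.cat(ref_log_probs)     # cached once, reused during training
```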

Your contribution

Yes, I can help by submitting a PR following the contribution guidelines.

@qgallouedec
Member

Thanks for this suggestion @SwayamInSync!
Do you have any idea of the gain in speed?
If you have a working implementation, feel free to submit a PR so that we can test and discuss the code.

@SwayamInSync
Contributor Author

Made a PR at #2426
From a quick test with my settings, I can only fit a batch size of up to 8 (I get OOM otherwise), but with this new parameter the reference-model inference batch size can go up to 32 instead of being tied to the training batch size, so it's a solid improvement.
