[DPOTrainer] Fix peft + DPO + bf16 if one uses generate_during_eval or pre-computed logits #1203
What does this PR do?
Fixes #1202, which is a regression.

#1143 introduced `peft_module_casting_to_bf16` for `DPOTrainer`, which worked fine when `generate_during_eval` is set to `False`. `peft_module_casting_to_bf16` casts the LayerNorm layers to fp32 for stability purposes, giving smoother training in case one uses peft + 4-bit quantization (QLoRA).
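For illustration, here is a minimal sketch of what such selective casting can look like; the helper name `cast_peft_model_for_bf16` and the module checks are simplified assumptions, not the exact TRL implementation:

```python
import torch
import torch.nn as nn

def cast_peft_model_for_bf16(model: nn.Module) -> nn.Module:
    """Hypothetical sketch: push adapter weights to bf16 while keeping
    every normalization layer in fp32, the usual QLoRA stability trick."""
    for name, module in model.named_modules():
        if isinstance(module, nn.LayerNorm) or "norm" in name.lower():
            module.to(torch.float32)   # norms stay in full precision
        elif "lora" in name.lower():
            module.to(torch.bfloat16)  # adapter weights go to bf16
    return model
```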
If one uses `bf16=True` in the `TrainingArguments`, `Trainer` will automatically compute the loss under the `torch.cuda.amp.autocast` regime, making the forward pass work without any dtype mismatch issue. However, we need to make sure all other methods inside `DPOTrainer` that call the model's forward pass also use that context manager, so that there is no dtype mismatch (a sketch of the pattern follows below).

Since this is a regression, we will publish a patch release after merging this PR.
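As a rough illustration of the fix, a hedged sketch of the pattern: wrap any extra forward/generate call in the same autocast context the `Trainer` uses for the loss. The helper `amp_context` and the usage lines are assumptions for illustration, not the actual `DPOTrainer` code:

```python
import contextlib
import torch

def amp_context(use_bf16: bool):
    """Return the context an extra forward pass should run under:
    torch.cuda.amp.autocast when bf16 training is on, a no-op otherwise."""
    if use_bf16:
        return torch.cuda.amp.autocast(dtype=torch.bfloat16)
    return contextlib.nullcontext()

# Usage sketch inside an eval-time generation or logit pre-computation path:
# with amp_context(use_bf16=training_args.bf16):
#     generations = model.generate(**inputs, max_new_tokens=64)
```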
cc @kashif @pacman100 @lvwerra