Apply quantization during DPO QLoRA #115
Conversation
@@ -1,12 +1,12 @@
 # Model arguments
 model_name_or_path: alignment-handbook/zephyr-7b-sft-qlora
-torch_dtype: float16
+torch_dtype: bfloat16
It turns out that using bfloat16 makes a non-trivial difference to downstream perf! cc @nathan-az :)
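As a quick aside (not from this PR), one reason the dtype choice can matter: bfloat16 keeps float32's exponent range, so values that would overflow float16 during training survive.

```python
import torch

# float16 tops out around 6.5e4, while bfloat16 shares float32's exponent
# range, so large activations/gradients are far less likely to overflow.
print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e38
```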
model_kwargs = dict(
    revision=model_args.base_model_revision,
    trust_remote_code=model_args.trust_remote_code,
    use_flash_attention_2=model_args.use_flash_attention_2,
    torch_dtype=torch_dtype,
    use_cache=False if training_args.gradient_checkpointing else True,
    device_map=get_kbit_device_map() if quantization_config is not None else None,
    quantization_config=quantization_config,
)
Note that this approach of quantizing and then merging in DPOTrainer is what Tim Dettmers suggests: https://twitter.com/Tim_Dettmers/status/1694654191325573456
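For context, a rough sketch of how a 4-bit quantization config can be built and passed alongside model_kwargs like the ones above. The handbook constructs this via its own helpers, so the exact BitsAndBytesConfig fields and the base model name below are illustrative assumptions, not the repo's exact settings.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative QLoRA-style 4-bit config (fields are assumptions).
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# The quantized base model is then loaded with kwargs like the ones above.
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # base model name is an assumption for this sketch
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
    device_map="auto",
)
```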
 # LoRA arguments
 use_peft: true
 load_in_4bit: true
-lora_r: 16
-lora_alpha: 16
+lora_r: 128
Tuning these hparams was necessary to get close to zephyr-7b-beta
perf on MT-Bench
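For reference, a rough sketch of the PEFT LoraConfig these YAML values feed into; everything other than r=128 (alpha, dropout, target modules) is an assumption for illustration rather than the handbook's exact recipe.

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=128,                      # lora_r from the config above
    lora_alpha=128,             # assumption; alpha is often scaled with r
    lora_dropout=0.05,          # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
```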
LGTM
This PR fixes a bug where we weren't quantising the base model with QLoRA during DPO and thus were actually doing LoRA instead.

Now we first quantise the base model in 4-bit and load the SFT adapter (which later gets merged within the DPOTrainer). Although this isn't as memory efficient as loading two adapters in a single base model (example), it does provide the flexibility to customise the QLoRA config.

I find that with these settings MT-Bench yields a score of 7.212, which is ~0.1 lower than zephyr-7b-beta and could likely be improved with a bit more tuning of the hparams.
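To make the described flow concrete, here is a hedged sketch of quantising the base model, attaching the SFT adapter, and handing everything to DPOTrainer, using the TRL API from around the time of this PR. Model names, hyperparameters, and the training arguments are assumptions; the handbook's run_dpo.py wires this up differently in detail, and the dataset preparation is not shown.

```python
import torch
from peft import LoraConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import DPOTrainer

# 1. Quantise the base model in 4-bit (base model name is an assumption).
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-sft-qlora")

# 2. Load the SFT adapter on top of the quantised base.
model = PeftModel.from_pretrained(base_model, "alignment-handbook/zephyr-7b-sft-qlora")

# 3. DPOTrainer merges the SFT adapter and trains a fresh DPO adapter described
#    by peft_config; with PEFT, ref_model can stay None and the frozen weights
#    (adapter disabled) provide the reference logits.
peft_config = LoraConfig(r=128, lora_alpha=128, task_type="CAUSAL_LM")  # as sketched above
training_args = TrainingArguments(output_dir="zephyr-7b-dpo-qlora", bf16=True)  # illustrative
train_dataset = ...  # a dataset with "prompt", "chosen", "rejected" columns (not shown)

trainer = DPOTrainer(
    model,
    ref_model=None,
    args=training_args,
    beta=0.1,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```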