Lora seems to be invalid when using vsft_llava.py #1786

Closed
shijian2001 opened this issue Jun 28, 2024 · 5 comments · Fixed by #1865
Labels
👁️ VLM Related to Visual Language Models

Comments

shijian2001 commented Jun 28, 2024

I used two 40 GB A100s to fine-tune llava-7b with LoRA. When I ran the LoRA VSFT command you provided, I still got a CUDA out of memory error, so it seems that LoRA did not take effect.
My command is as follows; I only modified the dataset path:

python examples/scripts/vsft_llava.py \
    --dataset_name="../subset/aug_llava_instruct_mix_vsft" \    
    --model_name_or_path="llava-hf/llava-1.5-7b-hf" \
    --report_to="wandb" \
    --learning_rate=1.4e-5 \
    --per_device_train_batch_size=8 \
    --gradient_accumulation_steps=1 \
    --output_dir="../logs/checkpoints/aug-vsft-llava-1.5-7b-hf" \
    --logging_steps=5 \
    --num_train_epochs=1 \
    --push_to_hub \
    --gradient_checkpointing \
    --remove_unused_columns=False \
    --torch_dtype=float16 \
    --fp16=True \
    --use_peft=True \
    --lora_r=64 \
    --lora_alpha=16 \
    --lora_target_modules="all-linear"
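
For reference, here is a minimal sketch (not the script itself) of the PEFT setup these flags presumably map to, assuming vsft_llava.py builds a standard LoraConfig from --lora_r, --lora_alpha and --lora_target_modules:

import torch
from peft import LoraConfig, get_peft_model
from transformers import LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf", torch_dtype=torch.float16
)
lora_config = LoraConfig(
    r=64,                         # --lora_r
    lora_alpha=16,                # --lora_alpha
    target_modules="all-linear",  # --lora_target_modules
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction should be trainable if LoRA is active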
shijian2001 changed the title from "Lora seems to be invalid when using vsft_llava.pyLora" to "Lora seems to be invalid when using vsft_llava.py" on Jun 28, 2024
kashif commented Jun 28, 2024

cc @qgallouedec

qgallouedec commented:
Thanks for reporting @shijian2001. I've encountered this error too. I will provide a fix asap. Feel free to open a PR if you manage to fix it.

shijian2001 commented Jun 29, 2024

@qgallouedec Sorry, I haven't located the specific bug yet.
After debugging, I don't think the construction of the PEFT model is the problem. After the forward pass, my two 40 GB A100s each used about 15 GB of VRAM (30 GB total), but during the backward pass the VRAM was not enough.
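
A rough way to check where the memory goes (just a sketch; `model` and `batch` are placeholders for the PEFT-wrapped LLaVA model and one collated batch that includes labels):

import torch

torch.cuda.reset_peak_memory_stats()
out = model(**batch)                                   # forward pass
fwd_peak = torch.cuda.max_memory_allocated() / 2**30
out.loss.backward()                                    # backward pass (activations + gradients)
bwd_peak = torch.cuda.max_memory_allocated() / 2**30
print(f"peak after forward: {fwd_peak:.1f} GiB, after backward: {bwd_peak:.1f} GiB")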

qgallouedec commented Jul 3, 2024

@shijian2001 can you double-check your command? When running it I get another error:

python examples/scripts/vsft_llava.py \
    --dataset_name="HuggingFaceH4/llava-instruct-mix-vsft" \
    --model_name_or_path="llava-hf/llava-1.5-7b-hf" \
    --per_device_train_batch_size=8 \
    --gradient_accumulation_steps=1 \
    --output_dir="../logs/checkpoints/aug-vsft-llava-1.5-7b-hf" \
    --gradient_checkpointing \
    --remove_unused_columns=False \
    --torch_dtype=float16 \
    --fp16=True \
    --use_peft=True \
    --lora_r=64 \
    --lora_alpha=16 \
    --lora_target_modules="all-linear"
Traceback (most recent call last):
  File "/fsx/qgallouedec/trl-2/examples/scripts/vsft_llava.py", line 206, in <module>
    trainer.train()
  File "/fsx/qgallouedec/trl-2/trl/trainer/sft_trainer.py", line 440, in train
    output = super().train(*args, **kwargs)
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/transformers/trainer.py", line 1932, in train
    return inner_training_loop(
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/transformers/trainer.py", line 2314, in _inner_training_loop
    _grad_norm = self.accelerator.clip_grad_norm_(
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/accelerate/accelerator.py", line 2269, in clip_grad_norm_
    self.unscale_gradients()
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/accelerate/accelerator.py", line 2219, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 307, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 229, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")

Related: #1785 (comment)

Removing --fp16=True solves the issue:

python examples/scripts/vsft_llava.py \
    --dataset_name="HuggingFaceH4/llava-instruct-mix-vsft" \
    --model_name_or_path="llava-hf/llava-1.5-7b-hf" \
    --per_device_train_batch_size=8 \
    --gradient_accumulation_steps=1 \
    --output_dir="../logs/checkpoints/aug-vsft-llava-1.5-7b-hf" \
    --gradient_checkpointing \
    --remove_unused_columns=False \
    --torch_dtype=float16 \
    --use_peft=True \
    --lora_r=64 \
    --lora_alpha=16 \
    --lora_target_modules="all-linear"

It requires around 48 GB of VRAM. If you get an OOM error, try reducing the batch size.
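
The traceback comes from the fp16 GradScaler, which refuses to unscale gradients that are themselves stored in fp16 (all parameters were loaded with torch_dtype=float16). Besides dropping --fp16, a common workaround (a sketch only, not necessarily what the fix in #1865 does) is to upcast the trainable LoRA parameters to fp32 while the frozen base weights stay in fp16:

for param in model.parameters():       # `model` is the PEFT-wrapped model
    if param.requires_grad:            # only the LoRA adapter weights are trainable
        param.data = param.data.float()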

@qgallouedec qgallouedec added the 👁️ VLM Related to Visual Language Models label Jul 4, 2024
shijian2001 commented:
@qgallouedec Thank you! However, when I followed your command and set per_device_train_batch_size to 1, I still got an OOM error on the 40 GB A100s.
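
One way to fit this on a 40 GB card (a QLoRA-style sketch, assuming bitsandbytes is installed; not one of vsft_llava.py's documented options) is to quantize the frozen base weights to 4-bit so only the LoRA adapters and activations compete for memory:

import torch
from transformers import BitsAndBytesConfig, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf", quantization_config=bnb_config
)
model = prepare_model_for_kbit_training(model)  # enables gradient checkpointing / input grads
model = get_peft_model(
    model, LoraConfig(r=64, lora_alpha=16, target_modules="all-linear")
)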
