Merging LoRA weights into a quantized model is not supported #2795
Comments
The code in LLaMA-Factory-main/src/llmtuner/model/adapter.py unifies the fine-tuned model and the base model, which is quite elegant: `if adapter_to_resume is not None:  # resume lora training`
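As a hedged sketch of what that unification amounts to (variable and function names here are illustrative, not the repository's exact code): PEFT wraps the base model with the previously saved adapter so that training can resume on the combined module.

```python
def resume_with_adapter(base_model, adapter_to_resume):
    """Attach a saved LoRA adapter to a base model for continued training.

    Sketch only: mirrors the `if adapter_to_resume is not None` branch
    quoted above. Requires `pip install peft` when an adapter is given.
    """
    if adapter_to_resume is not None:  # resume lora training
        from peft import PeftModel  # deferred import; heavy dependency

        base_model = PeftModel.from_pretrained(
            base_model, adapter_to_resume, is_trainable=True
        )
    return base_model
```

With no adapter to resume, the base model passes through unchanged; with one, the returned `PeftModel` routes the forward pass through both the frozen base weights and the trainable LoRA matrices.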
@hiyouga Why train on a quantized model at all? Because training a quantized version of a large model keeps precision acceptable, makes the loss drop quickly, and uses far less GPU memory.
can vllm support loading lora?

jvmncs commented on Feb 6: @simon-mo Using LoRA adapters:

```python
from huggingface_hub import snapshot_download

sql_lora_path = snapshot_download(repo_id="yard1/llama-2-7b-sql-lora-test")

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest  # completed: needed for the call below

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
sampling_params = SamplingParams()  # arguments were truncated in the original comment
prompts = [...]  # prompt list truncated in the original comment
outputs = llm.generate(
    prompts,
    sampling_params,
    lora_request=LoRARequest("sql_adapter", 1, sql_lora_path),  # completed per the vLLM LoRA docs
)
```
vllm: Support LoRAs on quantized models
@hiyouga
But QLoRA training only works on a single GPU.
Same question here: how do you merge a GPTQ-quantized model with its LoRA adapter?
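A widely used workaround (hedged: the helper name, model id, and paths below are illustrative, not from this thread) is to merge the adapter into the full-precision base model rather than the GPTQ checkpoint, then re-quantize the merged export afterwards if needed. The GPTQ weights themselves are packed integers and cannot absorb the LoRA deltas, which is why the export step refuses.

```python
def merge_lora_into_fp16_base(base_id: str, adapter_dir: str, out_dir: str) -> None:
    """Merge a LoRA adapter into the full-precision base model.

    Sketch under stated assumptions: requires `transformers` and `peft`.
    Imports are deferred so the sketch stays importable without them.
    """
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # Load the fp16 base (e.g. "Qwen/Qwen-14B-Chat"), NOT the -Int4 checkpoint.
    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_dir)
    model = model.merge_and_unload()  # folds the LoRA deltas into the base weights
    model.save_pretrained(out_dir)   # re-quantize this export separately if desired
```

Note the caveat: an adapter trained against the quantized base is merged here into fp16 weights, so the result is an approximation of the model that was actually fine-tuned.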
Reminder
Reproduction
python src/export_model.py \
    --model_name_or_path ../../../workspace/Llama/Qwen-14B-Chat-Int4 \
    --adapter_name_or_path ../LLaMA-Factory-main-bk/path_to_sft14bint4_checkpoint/checkpoint-7000 \
    --template default \
    --finetuning_type lora \
    --export_dir export_sft14bint4 \
    --export_size 2 \
    --export_legacy_format False
ValueError: Cannot merge adapters to a quantized model
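For context, a minimal sketch of the guard that produces this error (not the actual peft or LLaMA-Factory source): merging is refused whenever the base model carries a quantization config.

```python
def merge_adapter(model_config: dict) -> str:
    """Pretend-merge that raises the same error the export path does
    when the base model is quantized. Illustrative only."""
    if "quantization_config" in model_config:
        raise ValueError("Cannot merge adapters to a quantized model")
    return "merged"
```

An fp16 checkpoint has no `quantization_config` entry and merges fine; a GPTQ-Int4 checkpoint carries one and is rejected up front.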
1. Confirmed: `Cannot merge adapters to a quantized model` is raised.
2. However, running

python src/web_demo.py \
    --model_name_or_path ../../../workspace/Llama/Qwen-14B-Chat-Int4 \
    --adapter_name_or_path path_to_sft14bint4_checkpoint/checkpoint-7000/ \
    --template qwen \
    --finetuning_type lora

does load the quantized model together with the LoRA fine-tuned adapter. I verified this myself with your web_demo.py, but inference is slow.
3. So your end-to-end pipeline works: you can train on a quantized model and then load the result. The remaining problem is the slow inference speed.

a. Along those lines: vLLM can already load a quantized model on its own, so could vLLM also implement your combined loading (quantized base plus LoRA adapter)?
b. Wouldn't that combined loading effectively work around the "Cannot merge adapters to a quantized model" error?

Why I keep pressing on this: you have already implemented both training on and loading of quantized models, but it is slow. If the speed problem were solved, most users could train quantized 14B (or even larger) models and actually deploy them, which would be a big win for real-world use. I will keep studying your code.
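The combined load sketched in point (a) might look like this, assuming a vLLM version that supports LoRA on quantized models (the vLLM feature request referenced in the comments above suggests this was not yet supported at the time). The model id, adapter path, and helper name are illustrative:

```python
def serve_quantized_base_with_lora(prompts):
    """Run a GPTQ-quantized base model with a LoRA adapter attached
    per request, instead of merging the adapter into the weights.

    Sketch only; requires vLLM with LoRA-on-quantized-model support
    and a GPU, so the heavy imports are deferred.
    """
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    llm = LLM(
        model="Qwen/Qwen-14B-Chat-Int4",  # GPTQ-quantized base (illustrative id)
        quantization="gptq",
        enable_lora=True,
        trust_remote_code=True,  # Qwen checkpoints ship custom modeling code
    )
    return llm.generate(
        prompts,
        SamplingParams(max_tokens=128),
        lora_request=LoRARequest(
            "sft_adapter", 1, "path_to_sft14bint4_checkpoint/checkpoint-7000"
        ),
    )
```

If this path works, no merge is needed at all: the adapter stays separate, but vLLM's kernels handle the LoRA matmuls efficiently, which would address the speed complaint about web_demo.py.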
Expected behavior
No response
System Info
No response
Others
No response