
Merging LoRA weights into a quantized model is not supported #2795

Closed
1 task done
sunjunlishi opened this issue Mar 12, 2024 · 7 comments
Labels
wontfix This will not be worked on

Comments

@sunjunlishi

Reminder

  • I have read the README and searched the existing issues.

Reproduction

python src/export_model.py
--model_name_or_path ../../../workspace/Llama/Qwen-14B-Chat-Int4
--adapter_name_or_path ../LLaMA-Factory-main-bk/path_to_sft14bint4_checkpoint/checkpoint-7000
--template default
--finetuning_type lora
--export_dir export_sft14bint4
--export_size 2
--export_legacy_format False
ValueError: Cannot merge adapters to a quantized model

1. Indeed: "Cannot merge adapters to a quantized model".
2. However, with python src/web_demo.py
--model_name_or_path ../../../workspace/Llama/Qwen-14B-Chat-Int4
--adapter_name_or_path path_to_sft14bint4_checkpoint/checkpoint-7000/
--template qwen
--finetuning_type lora
your web_demo.py can load the quantized model together with the LoRA fine-tuned adapter. I have tested this myself and it works, but it is slow.
3. So your overall pipeline does work: you can train on a quantized model and then load the result. What remains is solving the speed problem.
a. Here is my thought: vLLM can load a quantized model on its own, so could vLLM also be used to implement your combined loading?
b. Wouldn't your combined loading effectively sidestep the "Cannot merge adapters to a quantized model" problem?

Why do I keep pressing this issue? You have already implemented training on a quantized model and loading the result, but it is slow. If the speed problem can be solved, roughly 70% of users could train with a quantized 14B model, or even larger quantized models, and load them afterwards, which would be a big benefit for real-world deployment. I will keep studying your code.
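For context on the error itself: I assume the merge step amounts to folding the LoRA deltas back into the base weights (something like peft's merge_and_unload), which only works when the base model is loaded in full precision. A rough, untested sketch, not LLaMA-Factory's actual export code, with a hypothetical adapter path:

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Full-precision base model (assumption: the unquantized Qwen-14B-Chat checkpoint)
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-14B-Chat", trust_remote_code=True)
# Hypothetical adapter path, for illustration only
model = PeftModel.from_pretrained(base, "path_to_sft_checkpoint")
merged = model.merge_and_unload()  # folds W + B*A back into the Linear weights
merged.save_pretrained("merged_model")

On a GPTQ-quantized base the packed int4 weights cannot be updated in place, which I assume is why the ValueError is raised.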

Expected behavior

No response

System Info

No response

Others

No response

@sunjunlishi
Author

The code here in LLaMA-Factory-main/src/llmtuner/model/adapter.py is what ties the fine-tuned adapter and the base model together; it is quite impressive.

if adapter_to_resume is not None:  # resume lora training
    print('to resume....')
    model = PeftModel.from_pretrained(model, adapter_to_resume, is_trainable=is_trainable)
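A minimal sketch (not the project's web_demo.py; paths taken from this issue) of what that line amounts to at inference time: the LoRA adapter is attached on top of the already-quantized base, nothing is merged.

from transformers import AutoModelForCausalLM
from peft import PeftModel

# GPTQ-quantized base model from this issue
base = AutoModelForCausalLM.from_pretrained(
    "../../../workspace/Llama/Qwen-14B-Chat-Int4",
    device_map="auto",
    trust_remote_code=True,
)
# Attach the LoRA checkpoint; the quantized weights stay untouched and
# the adapter's A/B matrices run alongside them at inference time.
model = PeftModel.from_pretrained(
    base,
    "path_to_sft14bint4_checkpoint/checkpoint-7000",
    is_trainable=False,
)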

@hiyouga hiyouga added the pending This problem is yet to be addressed label Mar 12, 2024
@sunjunlishi
Author

sunjunlishi commented Mar 12, 2024

@hiyouga Why train with a quantized model? Because training on the quantized version of a large model gives good accuracy, the loss drops quickly, and it uses much less GPU memory. Right now the whole pipeline works, both training and running the demo; the only issue is that it is somewhat slow.
Training on a 4-bit quantized 14B model gives better results than training on an unquantized 7B model.

@sunjunlishi
Author

sunjunlishi commented Mar 15, 2024

can vllm support loading lora?

vllm-project/vllm#2710

jvmncs commented on Feb 6
Have a look at this example: https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py

@simon-mo
Collaborator
simon-mo commented 2 weeks ago
And documentation here: https://docs.vllm.ai/en/latest/models/lora.html

Using LoRA adapters
This document shows you how to use LoRA adapters with vLLM on top of a base model. Adapters can be efficiently served on a per request basis with minimal overhead. First we download the adapter(s) and save them locally with

from huggingface_hub import snapshot_download

sql_lora_path = snapshot_download(repo_id="yard1/llama-2-7b-sql-lora-test")
Then we instantiate the base model and pass in the enable_lora=True flag:

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
We can now submit the prompts and call llm.generate with the lora_request parameter. The first parameter of LoRARequest is a human identifiable name, the second parameter is a globally unique ID for the adapter and the third parameter is the path to the LoRA adapter.

sampling_params = SamplingParams(
    temperature=0,
    max_tokens=256,
    stop=["[/assistant]"]
)

prompts = [
    "[user] Write a SQL query to answer the question based on the table schema.\n\n context: CREATE TABLE table_name_74 (icao VARCHAR, airport VARCHAR)\n\n question: Name the ICAO for lilongwe international airport [/user] [assistant]",
    "[user] Write a SQL query to answer the question based on the table schema.\n\n context: CREATE TABLE table_name_11 (nationality VARCHAR, elector VARCHAR)\n\n question: When Anchero Pantaleone was the elector what is under nationality? [/user] [assistant]",
]

outputs = llm.generate(
    prompts,
    sampling_params,
    lora_request=LoRARequest("sql_adapter", 1, sql_lora_path)
)
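For completeness (this part is not in the quoted docs), the returned RequestOutput objects can then be inspected like this:

for output in outputs:
    # each RequestOutput carries the original prompt and the generated completions
    print(output.prompt)
    print(output.outputs[0].text)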

@sunjunlishi
Author

vllm-project/vllm#2828

vLLM: Support LoRAs on quantized models
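If that lands, loading the GPTQ base together with the LoRA adapter in vLLM might look roughly like this. This is an untested sketch; whether quantization and enable_lora can be combined depends on the vLLM version, and the paths are the ones from this issue.

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="../../../workspace/Llama/Qwen-14B-Chat-Int4",  # GPTQ-quantized base from this issue
    quantization="gptq",
    enable_lora=True,
    trust_remote_code=True,
)

outputs = llm.generate(
    ["hello"],  # placeholder prompt
    SamplingParams(temperature=0, max_tokens=128),
    lora_request=LoRARequest(
        "sft_adapter",  # human-readable adapter name
        1,              # globally unique adapter id
        "path_to_sft14bint4_checkpoint/checkpoint-7000",  # LoRA checkpoint from this issue
    ),
)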

@hiyouga hiyouga added wontfix This will not be worked on and removed pending This problem is yet to be addressed labels Mar 25, 2024
@hiyouga hiyouga closed this as not planned Mar 25, 2024
@sunjunlishi
Author

@hiyouga
Merging LoRA weights into a quantized model is not supported.
I see that QLoRA can train a quantized model, so, author, can a QLoRA adapter then be merged into the quantized model? I would train with QLoRA and then merge.

@world2025

But QLoRA training only supports a single GPU.

@lebronjamesking

@hiyouga Merging LoRA weights into a quantized model is not supported. I see that QLoRA can train a quantized model, so, author, can a QLoRA adapter then be merged into the quantized model? I would train with QLoRA and then merge.

Same question here: how can a GPTQ-quantized model be merged?
