
[Usage]: How to run inference on a quantized base model with LoRA weights? #3580

Closed
thincal opened this issue Mar 23, 2024 · 3 comments
Labels: usage (How to use vllm)

@thincal commented Mar 23, 2024

Your current environment

Installed vLLM + GPU environment.

How would you like to use vllm

I have finished QLoRA training on an AWQ-quantized base model. Is it possible to use vLLM to load the AWQ base model and run inference with the LoRA weights directly, without merging them?

thincal added the usage (How to use vllm) label Mar 23, 2024
@jeejeelee (Collaborator)

The current vLLM does not support QLoRA yet.

@thincal (Author) commented Mar 23, 2024

Is there any plan for QLoRA inference?

@jeejeelee (Collaborator)

@thincal Perhaps this PR can help you: #2828
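For reference, a minimal sketch of how the unmerged-adapter workflow could be expressed with vLLM's LoRA API, assuming a vLLM build where LoRA on an AWQ-quantized base model is supported (the model name and adapter path below are placeholders, not taken from this thread):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the AWQ-quantized base model with LoRA support enabled.
# NOTE: at the time of this issue, LoRA on quantized base models was not
# supported; this sketch assumes a later vLLM version where it is.
llm = LLM(
    model="TheBloke/Llama-2-7B-AWQ",  # placeholder AWQ base model
    quantization="awq",
    enable_lora=True,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

# Apply the (unmerged) QLoRA adapter per request via LoRARequest.
outputs = llm.generate(
    ["What is quantization?"],
    sampling_params,
    lora_request=LoRARequest("qlora-adapter", 1, "/path/to/lora_adapter"),
)

for output in outputs:
    print(output.outputs[0].text)
```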

thincal closed this as completed Mar 26, 2024