
[Usage]: How to run inference on a quantized base model with LoRA weights? #3580

Closed
thincal opened this issue Mar 23, 2024 · 3 comments
Labels: usage (How to use vllm)

@thincal commented Mar 23, 2024

Your current environment

Installed vLLM + GPU environment.

How would you like to use vllm

I have finished QLoRA training on an AWQ-quantized base model. Is it possible to use vLLM to load the AWQ base model and run inference with the LoRA weights directly, without merging them?

thincal added the usage (How to use vllm) label Mar 23, 2024
@jeejeelee (Collaborator)

The current vLLM does not support QLoRA yet.

@thincal (Author) commented Mar 23, 2024

Is there any plan for QLoRA inference?

@jeejeelee (Collaborator)

@thincal Perhaps this PR can help you: #2828
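For reference, a minimal sketch of how the unmerged-adapter workflow could be expressed with vLLM's LoRA API, assuming a vLLM build where LoRA on an AWQ-quantized base model is supported (the model name and adapter path below are placeholders, not taken from this thread):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the AWQ-quantized base model with LoRA support enabled.
# NOTE: at the time of this issue, LoRA on quantized base models was not
# supported; this sketch assumes a later vLLM version where it is.
llm = LLM(
    model="TheBloke/Llama-2-7B-AWQ",  # placeholder AWQ base model
    quantization="awq",
    enable_lora=True,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

# Apply the (unmerged) QLoRA adapter per request via LoRARequest.
outputs = llm.generate(
    ["What is quantization?"],
    sampling_params,
    lora_request=LoRARequest("qlora-adapter", 1, "/path/to/lora_adapter"),
)

for output in outputs:
    print(output.outputs[0].text)
```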

thincal closed this as completed Mar 26, 2024