-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SLIM][AWQ] AWQ GEMM support (#1362)
Previously #1229, we only supported loading INT4-GEMV-awq weights quantized by [the original repo](https://github.com/mit-han-lab/llm-awq#usage). This pr supports loading INT4-GEMM format weights quantized by [AutoAWQ](https://github.com/casper-hansen/AutoAWQ). Here is an example that loads from [TheBloke/Llama-2-7b-Chat-AWQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-AWQ) and runs the benchmark. ```bash MODEL_PATH=/opt/models/llama-2/llama-2-7b-chat-hf/ OUTPUT_PATH=./dist/new-llama-awq/ QUANT=q4f16_autoawq python -m mlc_chat gen_config $MODEL_PATH/config.json --quantization $QUANT -o $OUTPUT_PATH --conv-template llama-2 python -m mlc_chat compile $OUTPUT_PATH -o $OUTPUT_PATH/llama.so python -m mlc_chat convert_weight $MODEL_PATH --quantization $QUANT -o $OUTPUT_PATH --source-format awq --source ../Llama-2-7B-Chat-AWQ/model.safetensors python -m mlc_chat.cli.benchmark --model $OUTPUT_PATH --model-lib $OUTPUT_PATH/llama.so --device "cuda:0" --prompt "What is the meaning of life?" --generate-length 256 ``` Note: Q: What is difference between INT4-GEMV and INT4-GEMM? A: https://github.com/casper-hansen/AutoAWQ#int4-gemm-vs-int4-gemv-vs-fp16
- Loading branch information
1 parent
76a77bc
commit 26d348a
Showing
4 changed files
with
28 additions
and
21 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters