# Add quantized PyTorch models in model builder (#600)
### Description

This PR adds support for building optimized and quantized final ONNX models
from PyTorch models that have already been quantized.

### Motivation and Context

Quantization methods supported for already-quantized PyTorch models are
[GPTQ](https://github.com/AutoGPTQ/AutoGPTQ) and
[AWQ](https://github.com/casper-hansen/AutoAWQ). Currently, only INT4
precision is supported.
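
GPTQ- and AWQ-quantized checkpoints in the Hugging Face layout record their settings under the `quantization_config` key of the checkpoint's `config.json`, including a `quant_method` field. A minimal sketch of telling the two apart from disk (the helper `detect_quant_method` is hypothetical, not part of this PR):

```python
import json
import os
import tempfile

def detect_quant_method(model_dir):
    """Read config.json and return the recorded quant_method ("gptq", "awq", ...) or None."""
    with open(os.path.join(model_dir, "config.json")) as f:
        config = json.load(f)
    return config.get("quantization_config", {}).get("quant_method")

# Demo against a synthetic GPTQ-style config written to a temporary folder.
model_dir = tempfile.mkdtemp()
with open(os.path.join(model_dir, "config.json"), "w") as f:
    json.dump({
        "model_type": "llama",
        "quantization_config": {"quant_method": "gptq", "bits": 4, "group_size": 128},
    }, f)
print(detect_quant_method(model_dir))  # prints: gptq
```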
Authored by kunal-vaishnavi on Jun 18, 2024 (commit c622cc1, parent 25e135e)
Showing 3 changed files with 841 additions and 62 deletions.
13 changes: 13 additions & 0 deletions src/python/py/models/README.md
@@ -10,6 +10,7 @@ This folder contains the model builder for quickly creating optimized and quanti
- [Original PyTorch Model from Hugging Face](#original-pytorch-model-from-hugging-face)
- [Original PyTorch Model from Disk](#original-pytorch-model-from-disk)
- [Customized or Finetuned PyTorch Model](#customized-or-finetuned-pytorch-model)
- [Quantized PyTorch Model](#quantized-pytorch-model)
- [GGUF Model](#gguf-model)
- [Extra Options](#extra-options)
- [Config Only](#config-only)
@@ -82,6 +83,18 @@ python3 -m onnxruntime_genai.models.builder -i path_to_local_folder_on_disk -o p
python3 builder.py -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files
```

### Quantized PyTorch Model

This scenario is where your PyTorch model is one of the currently supported model architectures, has already been quantized to INT4 precision, and can be loaded in the Hugging Face style via [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ) or [AutoAWQ](https://github.com/casper-hansen/AutoAWQ).

```bash
# From wheel:
python3 -m onnxruntime_genai.models.builder -i path_to_local_folder_on_disk -o path_to_output_folder -p int4 -e execution_provider -c cache_dir_to_store_temp_files
# From source:
python3 builder.py -i path_to_local_folder_on_disk -o path_to_output_folder -p int4 -e execution_provider -c cache_dir_to_store_temp_files
```
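
After the builder finishes, one quick sanity check is to confirm the output folder contains the expected artifacts. This is a sketch only; the artifact names (`model.onnx`, `genai_config.json`) are assumptions about typical model builder output, not guaranteed by this PR:

```python
import os
import tempfile

# Assumed artifact names for a model builder output folder (hypothetical list).
EXPECTED_ARTIFACTS = ("model.onnx", "genai_config.json")

def missing_artifacts(output_folder):
    """Return the expected files that are absent from the output folder."""
    return [name for name in EXPECTED_ARTIFACTS
            if not os.path.exists(os.path.join(output_folder, name))]

# Demo: a folder containing only model.onnx is missing genai_config.json.
out = tempfile.mkdtemp()
open(os.path.join(out, "model.onnx"), "w").close()
print(missing_artifacts(out))  # prints: ['genai_config.json']
```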

### GGUF Model

This scenario is where your float16/float32 GGUF model is already on disk.
