# Add quantized PyTorch models in model builder (#600)
### Description

This PR adds support for building optimized and quantized final ONNX models
from PyTorch models that have already been quantized.

### Motivation and Context

Quantization methods supported for already-quantized PyTorch models are
[GPTQ](https://github.com/AutoGPTQ/AutoGPTQ) and
[AWQ](https://github.com/casper-hansen/AutoAWQ). Currently, only INT4
precision is supported.
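
GPTQ- and AWQ-quantized checkpoints in the Hugging Face layout record their settings under the `quantization_config` key of the checkpoint's `config.json`, including a `quant_method` field. A minimal sketch of telling the two apart from disk (the helper `detect_quant_method` is hypothetical, not part of this PR):

```python
import json
import os
import tempfile

def detect_quant_method(model_dir):
    """Read config.json and return the recorded quant_method ("gptq", "awq", ...) or None."""
    with open(os.path.join(model_dir, "config.json")) as f:
        config = json.load(f)
    return config.get("quantization_config", {}).get("quant_method")

# Demo against a synthetic GPTQ-style config written to a temporary folder.
model_dir = tempfile.mkdtemp()
with open(os.path.join(model_dir, "config.json"), "w") as f:
    json.dump({
        "model_type": "llama",
        "quantization_config": {"quant_method": "gptq", "bits": 4, "group_size": 128},
    }, f)
print(detect_quant_method(model_dir))  # prints: gptq
```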
Authored by kunal-vaishnavi on Jun 18, 2024 (commit c622cc1, parent 25e135e)
Showing 3 changed files with 841 additions and 62 deletions.
13 changes: 13 additions & 0 deletions src/python/py/models/README.md
@@ -10,6 +10,7 @@ This folder contains the model builder for quickly creating optimized and quanti
- [Original PyTorch Model from Hugging Face](#original-pytorch-model-from-hugging-face)
- [Original PyTorch Model from Disk](#original-pytorch-model-from-disk)
- [Customized or Finetuned PyTorch Model](#customized-or-finetuned-pytorch-model)
- [Quantized PyTorch Model](#quantized-pytorch-model)
- [GGUF Model](#gguf-model)
- [Extra Options](#extra-options)
- [Config Only](#config-only)
@@ -82,6 +83,18 @@ python3 -m onnxruntime_genai.models.builder -i path_to_local_folder_on_disk -o p
python3 builder.py -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files
```

### Quantized PyTorch Model

This scenario is where your PyTorch model is one of the currently supported model architectures, has already been quantized to INT4 precision, and can be loaded in the Hugging Face style via [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ) or [AutoAWQ](https://github.com/casper-hansen/AutoAWQ).

```bash
# From wheel:
python3 -m onnxruntime_genai.models.builder -i path_to_local_folder_on_disk -o path_to_output_folder -p int4 -e execution_provider -c cache_dir_to_store_temp_files
# From source:
python3 builder.py -i path_to_local_folder_on_disk -o path_to_output_folder -p int4 -e execution_provider -c cache_dir_to_store_temp_files
```
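
After the builder finishes, one quick sanity check is to confirm the output folder contains the expected artifacts. This is a sketch only; the artifact names (`model.onnx`, `genai_config.json`) are assumptions about typical model builder output, not guaranteed by this PR:

```python
import os
import tempfile

# Assumed artifact names for a model builder output folder (hypothetical list).
EXPECTED_ARTIFACTS = ("model.onnx", "genai_config.json")

def missing_artifacts(output_folder):
    """Return the expected files that are absent from the output folder."""
    return [name for name in EXPECTED_ARTIFACTS
            if not os.path.exists(os.path.join(output_folder, name))]

# Demo: a folder containing only model.onnx is missing genai_config.json.
out = tempfile.mkdtemp()
open(os.path.join(out, "model.onnx"), "w").close()
print(missing_artifacts(out))  # prints: ['genai_config.json']
```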

### GGUF Model

This scenario is where your float16/float32 GGUF model is already on disk.
