Update quant overview for 021 #3857

Merged 1 commit on Jun 5, 2024
22 changes: 22 additions & 0 deletions docs/source/quantization-overview.md
@@ -14,3 +14,25 @@ Backend developers will need to implement their own ``Quantizer`` to express how
Modeling users will use the ``Quantizer`` specific to their target backend to quantize their model, e.g. ``XNNPACKQuantizer``.

For an example quantization flow with ``XNNPACKQuantizer``, along with more documentation and tutorials, please see the ``Performing Quantization`` section in the [ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial).
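
For orientation, below is a minimal sketch of that export-based (PT2E) flow. It assumes placeholder `model` and `example_inputs` variables, and the graph-capture API shown here has moved between PyTorch releases, so treat the linked tutorial as authoritative.

```python
# Minimal sketch of export-based quantization with XNNPACKQuantizer (PT2E flow).
# `model` and `example_inputs` are assumed placeholders.
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

# Configure the backend-specific quantizer.
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())

# Capture the model, insert observers, calibrate, then convert to a quantized graph.
captured = capture_pre_autograd_graph(model, example_inputs)
prepared = prepare_pt2e(captured, quantizer)
prepared(*example_inputs)  # calibration with representative data
quantized = convert_pt2e(prepared)
```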

## Source Quantization: Int8DynActInt4WeightQuantizer

In addition to export-based quantization (described above), ExecuTorch also highlights source-based quantization, accomplished via [torchao](https://github.com/pytorch/ao). Unlike export-based quantization, source-based quantization directly modifies the model prior to export. One specific example is `Int8DynActInt4WeightQuantizer`.

This scheme represents 4-bit weight quantization with 8-bit dynamic quantization of activations during inference.

Imported with ``from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer``, this class uses a quantizer instance constructed with a specified dtype precision and group size to mutate a provided ``nn.Module`` in place.

```python
# Source Quant
from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer

# precision (torch_dtype) and groupsize (group_size) are user-chosen placeholders,
# e.g. torch.float32 and 128.
model = Int8DynActInt4WeightQuantizer(precision=torch_dtype, groupsize=group_size).quantize(model)

# Export to ExecuTorch
from executorch.exir import to_edge
from torch.export import export

exported_model = export(model, ...)
et_program = to_edge(exported_model, ...).to_executorch(...)
```
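
As a follow-up not shown in the snippet above, the lowered program is typically serialized to a ``.pte`` file for the on-device runtime; a minimal sketch, assuming the ``et_program`` from above and a hypothetical output path:

```python
# Serialize the lowered program so the ExecuTorch runtime can load it on device.
with open("model.pte", "wb") as f:
    f.write(et_program.buffer)
```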