Commit

jerryzh168 committed Nov 13, 2024
1 parent 1c26fbc commit f10990a
Showing 2 changed files with 3 additions and 80 deletions.
76 changes: 3 additions & 73 deletions docs/source/contributor_guide.rst
@@ -19,7 +19,7 @@ First we want to lay out the torchao stack::

Quantization Algorithms/Flows: weight only/dynamic/static quantization, hqq, awq, gptq etc.
---------------------------------------------------------------------------------------------
-Quantized Tensors (derived dtypes): AffineQuantizedTensor, CoodbookQuantizedTensor
+Quantized Tensors (derived dtypes): AffineQuantizedTensor, CodebookQuantizedTensor
---------------------------------------------------------------------------------------------
Quantization Primitive Ops/Efficient Kernels: matmul, quantize, dequantize
---------------------------------------------------------------------------------------------
@@ -209,6 +209,8 @@ Quantized Training
******************
Similar to low-bit optimizers, we have a quantized training prototype in `main/torchao/prototype/quantized_training <https://github.com/pytorch/ao/tree/main/torchao/prototype/quantized_training>`__, and we could extend AffineQuantizedTensor to support training as well. Initial enablement is in progress, but a lot of follow-up work is needed, including making it work with different kernels.

+You can also check out the tutorial for `Quantized Training <https://github.com/pytorch/ao/blob/main/tutorials/developer_api_guide/my_trainable_tensor_subclass.py>`__, which shows how to make a dtype tensor subclass trainable.

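As a quick orientation, hooking the prototype into a model looks roughly like the sketch below; the ``int8_weight_only_quantized_training`` entry point and the subclass-friendly ``_AdamW`` optimizer are assumed from the prototype and, being prototype APIs, may change::

    import torch
    from torchao import quantize_
    from torchao.prototype.quantized_training import int8_weight_only_quantized_training
    from torchao.prototype.low_bit_optim import _AdamW  # plain AdamW variant that tolerates tensor subclasses

    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.ReLU(),
        torch.nn.Linear(1024, 1024),
    ).cuda()

    # Swap each nn.Linear weight for an int8 tensor subclass that still
    # participates in autograd: storage is int8, gradients stay high precision.
    quantize_(model, int8_weight_only_quantized_training())

    optim = _AdamW(model.parameters(), lr=3e-4)
    loss = model(torch.randn(16, 1024, device="cuda")).sum()
    loss.backward()
    optim.step()
    optim.zero_grad()
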
Case Study: How int4 weight only quantization works in torchao?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To connect everything together, here is a more detailed walkthrough of how int4 weight-only quantization is implemented in torchao.
@@ -600,75 +602,3 @@ Note: llama model (llama2/llama3) is our representative model for memory bound models
Please check out the ``--help`` option for each script to see the supported options; e.g., you can use ``--profile=profile_path`` to get a `chrome trace <https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html#using-tracing-functionality>`__ of the run for more detailed profiling.

Please let us know if there are any new important models that make sense to be added to the torchao model benchmark/eval folder.
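
As a companion to the int4 case study above: the user-facing entry point is the ``quantize_`` API together with ``int4_weight_only``. A minimal usage sketch, assuming a CUDA device and a bfloat16 model (which the tinygemm int4 kernel expects)::

    import torch
    from torchao.quantization import quantize_, int4_weight_only

    model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).cuda()

    # Each nn.Linear weight becomes an int4 AffineQuantizedTensor in the
    # tinygemm layout; a smaller group_size means more scales and better accuracy.
    quantize_(model, int4_weight_only(group_size=32))

    out = model(torch.randn(8, 1024, dtype=torch.bfloat16, device="cuda"))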

7 changes: 0 additions & 7 deletions torchao/quantization/__init__.py
@@ -90,7 +90,6 @@
"DEFAULT_AUTOQUANT_CLASS_LIST",
"DEFAULT_INT4_AUTOQUANT_CLASS_LIST",
"OTHER_AUTOQUANT_CLASS_LIST",
-
# top level API - manual
"quantize_",
"int8_dynamic_activation_int4_weight",
@@ -103,7 +102,6 @@
"float8_static_activation_float8_weight",
"uintx_weight_only",
"fpx_weight_only",
-
# smooth quant - subject to change
"swap_conv2d_1x1_to_linear",
"get_scale",
@@ -113,13 +111,11 @@
"smooth_fq_linear_to_inference",
"set_smooth_fq_attribute",
"compute_error",
-
# building blocks
"to_linear_activation_quantized",
"to_weight_tensor_with_linear_activation_scale_metadata",
"AffineQuantizedMinMaxObserver",
"AffineQuantizedObserverBase",
-
# quant primitive ops
"choose_qparams_affine",
"choose_qparams_affine_with_min_max",
@@ -131,11 +127,9 @@
"choose_qparams_and_quantize_affine_hqq",
"fake_quantize_affine",
"fake_quantize_affine_cachemask",
-
# operators/kernels
"safe_int_mm",
"int_scaled_matmul",
-
# dataclasses and types
"MappingType",
"ZeroPointDomain",
@@ -145,7 +139,6 @@
"PerGroup",
"PerRow",
"PerToken",
-
"LinearActivationQuantizedTensor",
"Int4WeightOnlyGPTQQuantizer",
"Int4WeightOnlyQuantizer",

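To see how the ``# quant primitive ops`` exports above compose, here is a minimal quantize/dequantize round-trip sketch, assuming the ``choose_qparams_affine`` / ``quantize_affine`` / ``dequantize_affine`` signatures at this commit::

    import torch
    from torchao.quantization import (
        MappingType,
        choose_qparams_affine,
        dequantize_affine,
        quantize_affine,
    )

    x = torch.randn(2, 64)
    block_size = (1, 32)  # one (scale, zero_point) pair per group of 32 columns

    scale, zero_point = choose_qparams_affine(
        x, MappingType.ASYMMETRIC, block_size, torch.int8,
        quant_min=-128, quant_max=127,
    )
    xq = quantize_affine(x, block_size, scale, zero_point, torch.int8,
                         quant_min=-128, quant_max=127)
    xdq = dequantize_affine(xq, block_size, scale, zero_point, torch.int8,
                            quant_min=-128, quant_max=127)
    # Round-trip error should be on the order of one quantization step
    print((x - xdq).abs().max())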