Commit

jerryzh168 committed Nov 13, 2024
1 parent 1c26fbc commit f10990a
Showing 2 changed files with 3 additions and 80 deletions.
76 changes: 3 additions & 73 deletions docs/source/contributor_guide.rst
@@ -19,7 +19,7 @@ First we want to lay out the torchao stack::

Quantization Algorithms/Flows: weight only/dynamic/static quantization, hqq, awq, gptq etc.
---------------------------------------------------------------------------------------------
-Quantized Tensors (derived dtypes): AffineQuantizedTensor, CoodbookQuantizedTensor
+Quantized Tensors (derived dtypes): AffineQuantizedTensor, CodebookQuantizedTensor
---------------------------------------------------------------------------------------------
Quantization Primitive Ops/Efficient Kernels: matmul, quantize, dequantize
---------------------------------------------------------------------------------------------
@@ -209,6 +209,8 @@ Quantized Training
******************
Similar to low-bit optimizers, we have a quantized training prototype in `main/torchao/prototype/quantized_training <https://github.com/pytorch/ao/tree/main/torchao/prototype/quantized_training>`__, and we could extend AffineQuantizedTensor to support training as well. Initial enablement is in progress, but a lot of follow-up work is needed, including making it work with different kernels.

+You can also check out the tutorial for `Quantized Training <https://github.com/pytorch/ao/blob/main/tutorials/developer_api_guide/my_trainable_tensor_subclass.py>`__, which shows how to make a dtype tensor subclass trainable.

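As a quick orientation, hooking the prototype into a model looks roughly like the sketch below; the ``int8_weight_only_quantized_training`` entry point and the subclass-friendly ``_AdamW`` optimizer are assumed from the prototype and, being prototype APIs, may change::

    import torch
    from torchao import quantize_
    from torchao.prototype.quantized_training import int8_weight_only_quantized_training
    from torchao.prototype.low_bit_optim import _AdamW  # plain AdamW variant that tolerates tensor subclasses

    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.ReLU(),
        torch.nn.Linear(1024, 1024),
    ).cuda()

    # Swap each nn.Linear weight for an int8 tensor subclass that still
    # participates in autograd: storage is int8, gradients stay high precision.
    quantize_(model, int8_weight_only_quantized_training())

    optim = _AdamW(model.parameters(), lr=3e-4)
    loss = model(torch.randn(16, 1024, device="cuda")).sum()
    loss.backward()
    optim.step()
    optim.zero_grad()
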
Case Study: How int4 weight only quantization works in torchao?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To connect everything together, here is a more detailed walkthrough of how int4 weight-only quantization is implemented in torchao.
@@ -600,75 +602,3 @@ Note: llama model (llama2/llama3) is our representative model for memory bound models
Please check out the ``--help`` option for each script to see the supported options; e.g., you can use ``--profile=profile_path`` to get a `chrome trace <https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html#using-tracing-functionality>`__ of the run for more detailed profiling.

Please let us know if there are any new important models that make sense to be added to the torchao model benchmark/eval folder.
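
As a companion to the int4 case study above: the user-facing entry point is the ``quantize_`` API together with ``int4_weight_only``. A minimal usage sketch, assuming a CUDA device and a bfloat16 model (which the tinygemm int4 kernel expects)::

    import torch
    from torchao.quantization import quantize_, int4_weight_only

    model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).cuda()

    # Each nn.Linear weight becomes an int4 AffineQuantizedTensor in the
    # tinygemm layout; a smaller group_size means more scales and better accuracy.
    quantize_(model, int4_weight_only(group_size=32))

    out = model(torch.randn(8, 1024, dtype=torch.bfloat16, device="cuda"))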

7 changes: 0 additions & 7 deletions torchao/quantization/__init__.py
@@ -90,7 +90,6 @@
"DEFAULT_AUTOQUANT_CLASS_LIST",
"DEFAULT_INT4_AUTOQUANT_CLASS_LIST",
"OTHER_AUTOQUANT_CLASS_LIST",
-
# top level API - manual
"quantize_",
"int8_dynamic_activation_int4_weight",
@@ -103,7 +102,6 @@
"float8_static_activation_float8_weight",
"uintx_weight_only",
"fpx_weight_only",
-
# smooth quant - subject to change
"swap_conv2d_1x1_to_linear",
"get_scale",
@@ -113,13 +111,11 @@
"smooth_fq_linear_to_inference",
"set_smooth_fq_attribute",
"compute_error",
-
# building blocks
"to_linear_activation_quantized",
"to_weight_tensor_with_linear_activation_scale_metadata",
"AffineQuantizedMinMaxObserver",
"AffineQuantizedObserverBase",
-
# quant primitive ops
"choose_qparams_affine",
"choose_qparams_affine_with_min_max",
@@ -131,11 +127,9 @@
"choose_qparams_and_quantize_affine_hqq",
"fake_quantize_affine",
"fake_quantize_affine_cachemask",
-
# operators/kernels
"safe_int_mm",
"int_scaled_matmul",
-
# dataclasses and types
"MappingType",
"ZeroPointDomain",
@@ -145,7 +139,6 @@
"PerGroup",
"PerRow",
"PerToken",
-
"LinearActivationQuantizedTensor",
"Int4WeightOnlyGPTQQuantizer",
"Int4WeightOnlyQuantizer",

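To see how the ``# quant primitive ops`` exports above compose, here is a minimal quantize/dequantize round-trip sketch, assuming the ``choose_qparams_affine`` / ``quantize_affine`` / ``dequantize_affine`` signatures at this commit::

    import torch
    from torchao.quantization import (
        MappingType,
        choose_qparams_affine,
        dequantize_affine,
        quantize_affine,
    )

    x = torch.randn(2, 64)
    block_size = (1, 32)  # one (scale, zero_point) pair per group of 32 columns

    scale, zero_point = choose_qparams_affine(
        x, MappingType.ASYMMETRIC, block_size, torch.int8,
        quant_min=-128, quant_max=127,
    )
    xq = quantize_affine(x, block_size, scale, zero_point, torch.int8,
                         quant_min=-128, quant_max=127)
    xdq = dequantize_affine(xq, block_size, scale, zero_point, torch.int8,
                            quant_min=-128, quant_max=127)
    # Round-trip error should be on the order of one quantization step
    print((x - xdq).abs().max())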