diff --git a/docs/.buildinfo b/docs/.buildinfo
index eafc4f405..ee11fdd63 100644
--- a/docs/.buildinfo
+++ b/docs/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 463b5d411b812fb296a8f7bff970d1cf
+config: e8534f6a2f0b425ce862dbeb0800af00
tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/_sources/api/api_docs/classes/GradientPTQConfig.rst.txt b/docs/_sources/api/api_docs/classes/GradientPTQConfig.rst.txt
index c14ec1e7c..711c4f2d1 100644
--- a/docs/_sources/api/api_docs/classes/GradientPTQConfig.rst.txt
+++ b/docs/_sources/api/api_docs/classes/GradientPTQConfig.rst.txt
@@ -8,7 +8,7 @@ GradientPTQConfig Class
=================================
-**The following API can be used to create a GradientPTQConfig instance which can be used for post training quantization using knowledge distillation from a teacher (float Keras model) to a student (the quantized Keras model)**
+**The following API can be used to create a GradientPTQConfig instance which can be used for post training quantization using knowledge distillation from a teacher (float model) to a student (the quantized model)**
.. autoclass:: model_compression_toolkit.gptq.GradientPTQConfig
:members:
@@ -30,3 +30,22 @@ RoundingType
.. autoclass:: model_compression_toolkit.gptq.RoundingType
:members:
+
+
+=====================================
+GradualActivationQuantizationConfig
+=====================================
+
+**The following API can be used to configure the gradual activation quantization when using GPTQ.**
+
+.. autoclass:: model_compression_toolkit.gptq.GradualActivationQuantizationConfig
+ :members:
+
+
+=====================================
+QFractionLinearAnnealingConfig
+=====================================
+
+.. autoclass:: model_compression_toolkit.gptq.QFractionLinearAnnealingConfig
+ :members:
+
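As a hedged illustration of how the two classes documented above might be combined: the field names (initial_q_fraction, target_q_fraction, start_step, end_step, q_fraction_scheduler_policy) are taken from the class signatures rendered by the autoclass directives and should be verified there.

.. code-block:: python

   import model_compression_toolkit as mct

   # Sketch: anneal the quantized fraction of activations linearly from
   # 0 to 1 over the first 1000 optimization steps (illustrative values,
   # not the library's canonical example).
   annealing_cfg = mct.gptq.QFractionLinearAnnealingConfig(
       initial_q_fraction=0.0,
       target_q_fraction=1.0,
       start_step=0,
       end_step=1000)
   gradual_cfg = mct.gptq.GradualActivationQuantizationConfig(
       q_fraction_scheduler_policy=annealing_cfg)

   # The config is consumed through the GPTQ helper's
   # 'gradual_activation_quantization' argument.
   gptq_cfg = mct.gptq.get_keras_gptq_config(
       n_epochs=5, gradual_activation_quantization=gradual_cfg)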
diff --git a/docs/_sources/api/api_docs/index.rst.txt b/docs/_sources/api/api_docs/index.rst.txt
index 0c4433163..cd78a4b5c 100644
--- a/docs/_sources/api/api_docs/index.rst.txt
+++ b/docs/_sources/api/api_docs/index.rst.txt
@@ -106,9 +106,9 @@ keras_load_quantized_model
- :ref:`keras_load_quantized_model`: A function to load a quantized keras model.
-target_platform
-================
-- :ref:`target_platform`: Module to create and model hardware-related settings to optimize the model according to, by the hardware the optimized model will use during inference.
+target_platform_capabilities
+==============================
+- :ref:`target_platform_capabilities`: Module to create and model hardware-related settings that the optimization process uses to tailor the model to the hardware it will run on during inference.
- :ref:`get_target_platform_capabilities`: A function to get a target platform model for Tensorflow and Pytorch.
- :ref:`DefaultDict`: Util class for creating a TargetPlatformCapabilities.
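A minimal usage sketch of the function referenced above, assuming the documented 'tensorflow'/'pytorch' framework names and the 'default' platform name:

.. code-block:: python

   import model_compression_toolkit as mct

   # Fetch the default target platform capabilities for Keras models.
   tpc = mct.get_target_platform_capabilities(fw_name='tensorflow',
                                              target_platform_name='default')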
diff --git a/docs/_sources/api/api_docs/methods/get_target_platform_capabilities.rst.txt b/docs/_sources/api/api_docs/methods/get_target_platform_capabilities.rst.txt
index cc623b66a..e8346a359 100644
--- a/docs/_sources/api/api_docs/methods/get_target_platform_capabilities.rst.txt
+++ b/docs/_sources/api/api_docs/methods/get_target_platform_capabilities.rst.txt
@@ -4,7 +4,7 @@
=======================================
-Get TargetPlatformCapabilities
+Get FrameworkQuantizationCapabilities
=======================================
.. autofunction:: model_compression_toolkit.get_target_platform_capabilities
diff --git a/docs/_sources/api/api_docs/modules/layer_filters.rst.txt b/docs/_sources/api/api_docs/modules/layer_filters.rst.txt
index 2279e54b6..f21836e08 100644
--- a/docs/_sources/api/api_docs/modules/layer_filters.rst.txt
+++ b/docs/_sources/api/api_docs/modules/layer_filters.rst.txt
@@ -15,30 +15,30 @@ one may use the next filters to check if a layer configuration holds the created
Attribute Filters
==================
-.. autoclass:: model_compression_toolkit.target_platform.AttributeFilter
+.. autoclass:: model_compression_toolkit.target_platform_capabilities.AttributeFilter
|
-.. autoclass:: model_compression_toolkit.target_platform.Eq
+.. autoclass:: model_compression_toolkit.target_platform_capabilities.Eq
|
-.. autoclass:: model_compression_toolkit.target_platform.NotEq
+.. autoclass:: model_compression_toolkit.target_platform_capabilities.NotEq
|
-.. autoclass:: model_compression_toolkit.target_platform.Greater
+.. autoclass:: model_compression_toolkit.target_platform_capabilities.Greater
|
-.. autoclass:: model_compression_toolkit.target_platform.GreaterEq
+.. autoclass:: model_compression_toolkit.target_platform_capabilities.GreaterEq
|
-.. autoclass:: model_compression_toolkit.target_platform.Smaller
+.. autoclass:: model_compression_toolkit.target_platform_capabilities.Smaller
|
-.. autoclass:: model_compression_toolkit.target_platform.SmallerEq
+.. autoclass:: model_compression_toolkit.target_platform_capabilities.SmallerEq
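A hedged sketch of how these filters might be used, assuming they are importable from the package path used in the directives above and that filters compose with & and |:

.. code-block:: python

   from model_compression_toolkit.target_platform_capabilities import Eq, Greater

   # Match layers whose 'activation' attribute equals 'relu' AND whose
   # 'units' attribute is greater than 512 (the attribute names are
   # illustrative, not fixed by the API).
   relu_filter = Eq('activation', 'relu')
   wide_filter = Greater('units', 512)
   combined = relu_filter & wide_filter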
diff --git a/docs/_sources/api/api_docs/modules/qat_config.rst.txt b/docs/_sources/api/api_docs/modules/qat_config.rst.txt
index 9583aee88..c7dfcc9ea 100644
--- a/docs/_sources/api/api_docs/modules/qat_config.rst.txt
+++ b/docs/_sources/api/api_docs/modules/qat_config.rst.txt
@@ -10,10 +10,7 @@ qat_config Module
TrainingMethod
================================
-**Select a QAT training method:**
-
-.. autoclass:: model_compression_toolkit.qat.TrainingMethod
-
+To select a training method, please visit the :ref:`trainable_infrastructure API.`
|
diff --git a/docs/_sources/api/api_docs/modules/target_platform.rst.txt b/docs/_sources/api/api_docs/modules/target_platform_capabilities.rst.txt
similarity index 51%
rename from docs/_sources/api/api_docs/modules/target_platform.rst.txt
rename to docs/_sources/api/api_docs/modules/target_platform_capabilities.rst.txt
index c393cb21a..5e0dd9252 100644
--- a/docs/_sources/api/api_docs/modules/target_platform.rst.txt
+++ b/docs/_sources/api/api_docs/modules/target_platform_capabilities.rst.txt
@@ -1,11 +1,11 @@
:orphan:
-.. _ug-target_platform:
+.. _ug-target_platform_capabilities:
-=================================
-target_platform Module
-=================================
+=====================================
+target_platform_capabilities Module
+=====================================
MCT can be configured to quantize and optimize models for different hardware settings.
For example, when using qnnpack backend for Pytorch model inference, Pytorch `quantization
@@ -14,7 +14,7 @@ uses `per-tensor weights quantization `_.
-This can be addressed in MCT by using the target_platform module, that can configure different
+This can be addressed in MCT by using the target_platform_capabilities module, which can configure different
parameters that are hardware-related, and the optimization process will use this to optimize the model accordingly.
Models for IMX500, TFLite and qnnpack can be observed `here `_, and can be used via the :ref:`get_target_platform_capabilities function`.
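For orientation, a minimal sketch of how a fetched platform model is typically passed to a quantization entry point (the model and representative_data_gen are assumed to exist):

.. code-block:: python

   import model_compression_toolkit as mct

   # Fetch an IMX500 target platform model for TensorFlow and use it in PTQ.
   tpc = mct.get_target_platform_capabilities('tensorflow', 'imx500')
   quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(
       model, representative_data_gen, target_platform_capabilities=tpc)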
@@ -27,7 +27,7 @@ Models for IMX500, TFLite and qnnpack can be observed `here `.
-
-
-TargetPlatformCapabilities
-=============================
-.. autoclass:: model_compression_toolkit.target_platform.TargetPlatformCapabilities
-
-
-
+.. autoclass:: model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema.OperatorSetGroup
diff --git a/docs/_sources/api/api_docs/modules/trainable_infrastructure.rst.txt b/docs/_sources/api/api_docs/modules/trainable_infrastructure.rst.txt
index c514a3307..42541f22f 100644
--- a/docs/_sources/api/api_docs/modules/trainable_infrastructure.rst.txt
+++ b/docs/_sources/api/api_docs/modules/trainable_infrastructure.rst.txt
@@ -36,6 +36,15 @@ It adds to the base quantizer a get_config and from_config functions to enable l
.. autoclass:: model_compression_toolkit.trainable_infrastructure.BasePytorchTrainableQuantizer
+
+
+TrainingMethod
+================================
+**Select a training method:**
+
+.. autoclass:: model_compression_toolkit.trainable_infrastructure.TrainingMethod
+
+
TrainableQuantizerWeightsConfig
=================================
This configuration object contains the necessary attributes for configuring a weights trainable quantizer.
@@ -46,7 +55,7 @@ For example, we can set a trainable weights quantizer with the following configu
.. code-block:: python
- from model_compression_toolkit.target_platform_capabilities.target_platform import QuantizationMethod
+ from model_compression_toolkit.target_platform_capabilities.target_platform_capabilities import QuantizationMethod
from model_compression_toolkit.constants import THRESHOLD, MIN_THRESHOLD
TrainableQuantizerWeightsConfig(weights_quantization_method=QuantizationMethod.SYMMETRIC,
@@ -70,7 +79,7 @@ For example, we can set a trainable activation quantizer with the following conf
.. code-block:: python
- from model_compression_toolkit.target_platform_capabilities.target_platform import QuantizationMethod
+ from model_compression_toolkit.target_platform_capabilities.target_platform_capabilities import QuantizationMethod
from model_compression_toolkit.constants import THRESHOLD, MIN_THRESHOLD
TrainableQuantizerActivationConfig(activation_quantization_method=QuantizationMethod.UNIFORM,
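By analogy with the weights example above, a hedged sketch of a full activation configuration; the empty activation_quantization_params dict is a placeholder (see the class docs for the expected keys), and the QuantizationMethod import path follows the one shown in this diff:

.. code-block:: python

   from model_compression_toolkit.trainable_infrastructure import TrainableQuantizerActivationConfig
   from model_compression_toolkit.target_platform_capabilities.target_platform_capabilities import QuantizationMethod

   TrainableQuantizerActivationConfig(
       activation_quantization_method=QuantizationMethod.UNIFORM,
       activation_n_bits=8,
       activation_quantization_params={},  # placeholder; fill per the docs
       enable_activation_quantization=True)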
diff --git a/docs/_sources/api/api_docs/notes/tpc_note.rst.txt b/docs/_sources/api/api_docs/notes/tpc_note.rst.txt
index 39558f42a..7ced4a5d6 100644
--- a/docs/_sources/api/api_docs/notes/tpc_note.rst.txt
+++ b/docs/_sources/api/api_docs/notes/tpc_note.rst.txt
@@ -1,7 +1,7 @@
.. note::
- For now, some fields of :class:`~model_compression_toolkit.target_platform.OpQuantizationConfig` are ignored during
+ For now, some fields of :class:`~model_compression_toolkit.target_platform_capabilities.OpQuantizationConfig` are ignored during
the optimization process such as quantization_preserving, fixed_scale, and fixed_zero_point.
- - MCT will use more information from :class:`~model_compression_toolkit.target_platform.OpQuantizationConfig`, in the future.
+ - MCT will use more information from :class:`~model_compression_toolkit.target_platform_capabilities.OpQuantizationConfig`, in the future.
diff --git a/docs/api/api_docs/classes/BitWidthConfig.html b/docs/api/api_docs/classes/BitWidthConfig.html
index ffb941926..f69e01d18 100644
--- a/docs/api/api_docs/classes/BitWidthConfig.html
+++ b/docs/api/api_docs/classes/BitWidthConfig.html
@@ -7,7 +7,7 @@
- BitWidthConfig — MCT Documentation: ver 2.2.0
+ BitWidthConfig — MCT Documentation: ver 2.3.0
@@ -31,7 +31,7 @@
-The following API can be used to create a GradientPTQConfig instance which can be used for post training quantization using knowledge distillation from a teacher (float Keras model) to a student (the quantized Keras model)
+The following API can be used to create a GradientPTQConfig instance which can be used for post training quantization using knowledge distillation from a teacher (float model) to a student (the quantized model)
Configuration to use for quantization with GradientPTQ.
-Initialize a GradientPTQConfig.
Parameters:
-n_epochs (int) – Number of representative dataset epochs to train.
-optimizer (Any) – Optimizer to use.
-optimizer_rest (Any) – Optimizer to use for bias and quantizer parameters.
-loss (Callable) – The loss to use. should accept 6 lists of tensors. 1st list of quantized tensors, the 2nd list is the float tensors,
-the 3rd is a list of quantized weights, the 4th is a list of float weights, the 5th and 6th lists are the mean and std of the tensors
-accordingly. see example in multiple_tensors_mse_loss
-log_function (Callable) – Function to log information about the GPTQ process.
-train_bias (bool) – Whether to update the bias during the training or not.
-rounding_type (RoundingType) – An enum that defines the rounding type.
-use_hessian_based_weights (bool) – Whether to use Hessian-based weights for weighted average loss.
-optimizer_quantization_parameter (Any) – Optimizer to override the rest optimizer for quantizer parameters.
-optimizer_bias (Any) – Optimizer to override the rest optimizer for bias.
-regularization_factor (float) – A floating point number that defines the regularization factor.
-hessian_weights_config (GPTQHessianScoresConfig) – A configuration that include all necessary arguments to run a computation of Hessian scores for the GPTQ loss.
-gptq_quantizer_params_override (dict) – A dictionary of parameters to override in GPTQ quantizer instantiation. Defaults to None (no parameters).
+n_epochs – Number of representative dataset epochs to train.
+loss – The loss to use. See ‘multiple_tensors_mse_loss’ for the expected interface.
+optimizer – Optimizer to use.
+optimizer_rest – Default optimizer to use for bias and quantizer parameters.
+train_bias – Whether to update the bias during the training or not.
+hessian_weights_config – A configuration that includes all necessary arguments to run a computation of
+Hessian scores for the GPTQ loss.
+gradual_activation_quantization_config – A configuration for Gradual Activation Quantization.
+regularization_factor – A floating point number that defines the regularization factor.
+rounding_type – An enum that defines the rounding type.
+optimizer_quantization_parameter – Optimizer to override the rest optimizer for quantizer parameters.
+optimizer_bias – Optimizer to override the rest optimizer for bias.
+log_function – Function to log information about the GPTQ process.
+gptq_quantizer_params_override – A dictionary of parameters to override in GPTQ quantizer instantiation.
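In practice, the framework helpers construct this object; a hedged PyTorch sketch overriding a few of the fields listed above:

.. code-block:: python

   import model_compression_toolkit as mct

   # get_pytorch_gptq_config fills the remaining GradientPTQConfig fields
   # (loss, optimizers, Hessian config) with defaults.
   gptq_cfg = mct.gptq.get_pytorch_gptq_config(
       n_epochs=15,
       regularization_factor=0.01,
       use_hessian_sample_attention=True)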
-Class to wrap all different parameters the library quantize the input model according to.
-
-Parameters:
-
-activation_error_method (QuantizationErrorMethod) – Which method to use from QuantizationErrorMethod for activation quantization threshold selection.
-weights_error_method (QuantizationErrorMethod) – Which method to use from QuantizationErrorMethod for activation quantization threshold selection.
-relu_bound_to_power_of_2 (bool) – Whether to use relu to power of 2 scaling correction or not.
-weights_bias_correction (bool) – Whether to use weights bias correction or not.
-weights_second_moment_correction (bool) – Whether to use weights second_moment correction or not.
-input_scaling (bool) – Whether to use input scaling or not.
-softmax_shift (bool) – Whether to use softmax shift or not.
-shift_negative_activation_correction (bool) – Whether to use shifting negative activation correction or not.
-activation_channel_equalization (bool) – Whether to use activation channel equalization correction or not.
-z_threshold (float) – Value of z score for outliers removal.
-min_threshold (float) – Minimum threshold to use during thresholds selection.
-l_p_value (int) – The p value of L_p norm threshold selection.
-block_collapsing (bool) – Whether to collapse block one to another in the input network
-shift_negative_ratio (float) – Value for the ratio between the minimal negative value of a non-linearity output to its activation threshold, which above it - shifting negative activation should occur if enabled.
-shift_negative_threshold_recalculation (bool) – Whether or not to recompute the threshold after shifting negative activation.
-shift_negative_params_search (bool) – Whether to search for optimal shift and threshold in shift negative activation.
+A class that encapsulates all the different parameters used by the library to quantize a model.
Examples
-One may create a quantization configuration to quantize a model according to.
-For example, to quantize a model’s weights and activation using thresholds, such that
-weights threshold selection is done using MSE, activation threshold selection is done using NOCLIPPING (min/max),
-enabling relu_bound_to_power_of_2, weights_bias_correction,
-one can instantiate a quantization configuration:
+You can create a quantization configuration to apply to a model. For example, to quantize a model’s weights and
+activations using thresholds, with weight threshold selection based on MSE and activation threshold selection
+using NOCLIPPING (min/max), while enabling relu_bound_to_power_of_2 and weights_bias_correction,
+you can instantiate a quantization configuration like this:
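The configuration promised by the paragraph above would look roughly like this (mct.core paths assumed from the public API):

.. code-block:: python

   import model_compression_toolkit as mct

   qc = mct.core.QuantizationConfig(
       activation_error_method=mct.core.QuantizationErrorMethod.NOCLIPPING,
       weights_error_method=mct.core.QuantizationErrorMethod.MSE,
       relu_bound_to_power_of_2=True,
       weights_bias_correction=True)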
-weights_memory – Memory of a model’s weights in bytes. Note that this includes only coefficients that should be quantized (for example, the kernel of Conv2D in Keras will be affected by this value, while the bias will not).
-activation_memory – Memory of a model’s activation in bytes, according to the given activation resource utilization metric.
-total_memory – The sum of model’s activation and weights memory in bytes, according to the given total resource utilization metric.
-bops – The total bit-operations in the model.
-
-
-
+weights_memory: Memory of a model’s weights in bytes.
+activation_memory: Memory of a model’s activation in bytes.
+total_memory: The sum of model’s activation and weights memory in bytes.
+bops: The total bit-operations in the model.
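A short sketch of constraining a mixed-precision search with these fields (the byte values are illustrative):

.. code-block:: python

   import model_compression_toolkit as mct

   # Limit quantized weights to ~1 MB and activations to ~2 MB (bytes).
   ru = mct.core.ResourceUtilization(weights_memory=1_000_000,
                                     activation_memory=2_000_000)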
-target_platform: Module to create and model hardware-related settings to optimize the model according to, by the hardware the optimized model will use during inference.
+target_platform_capabilities: Module to create and model hardware-related settings that the optimization process uses to tailor the model to the hardware it will run on during inference.
diff --git a/docs/api/api_docs/methods/get_keras_data_generation_config.html b/docs/api/api_docs/methods/get_keras_data_generation_config.html
index acacb952d..098ecf3e0 100644
--- a/docs/api/api_docs/methods/get_keras_data_generation_config.html
+++ b/docs/api/api_docs/methods/get_keras_data_generation_config.html
@@ -7,7 +7,7 @@
- Get DataGenerationConfig for Keras Models — MCT Documentation: ver 2.2.0
+ Get DataGenerationConfig for Keras Models — MCT Documentation: ver 2.3.0
@@ -31,7 +31,7 @@
diff --git a/docs/api/api_docs/methods/get_keras_gptq_config.html b/docs/api/api_docs/methods/get_keras_gptq_config.html
index 37cbf9274..43b773633 100644
--- a/docs/api/api_docs/methods/get_keras_gptq_config.html
+++ b/docs/api/api_docs/methods/get_keras_gptq_config.html
@@ -7,7 +7,7 @@
- Get GradientPTQConfig for Keras Models — MCT Documentation: ver 2.2.0
+ Get GradientPTQConfig for Keras Models — MCT Documentation: ver 2.3.0
@@ -31,7 +31,7 @@
Create a GradientPTQConfig instance for Keras models.
Parameters:
@@ -58,10 +58,12 @@
use_hessian_based_weights (bool) – Whether to use Hessian-based weights for weighted average loss.
regularization_factor (float) – A floating point number that defines the regularization factor.
hessian_batch_size (int) – Batch size for Hessian computation in Hessian-based weights GPTQ.
+use_hessian_sample_attention (bool) – Whether to use Sample-Layer Attention score for weighted loss.
+gradual_activation_quantization (bool, GradualActivationQuantizationConfig) – If False, GradualActivationQuantization is disabled. If True, GradualActivationQuantization is enabled with the default settings. A GradualActivationQuantizationConfig object can be passed to use non-default settings.
Returns:
-a GradientPTQConfigV2 object to use when fine-tuning the quantized model using gptq.
+a GradientPTQConfig object to use when fine-tuning the quantized model using gptq.
Create a GradientPTQConfig instance for Pytorch models.
Parameters:
n_epochs (int) – Number of epochs for running the representative dataset for fine-tuning.
optimizer (Optimizer) – Pytorch optimizer to use for fine-tuning for auxiliary variable.
optimizer_rest (Optimizer) – Pytorch optimizer to use for fine-tuning of the bias variable.
-loss (Callable) – loss to use during fine-tuning. should accept 4 lists of tensors. 1st list of quantized tensors, the 2nd list is the float tensors, the 3rd is a list of quantized weights and the 4th is a list of float weights.
+loss (Callable) – loss to use during fine-tuning. See the default loss function for the exact interface.
log_function (Callable) – Function to log information about the gptq process.
use_hessian_based_weights (bool) – Whether to use Hessian-based weights for weighted average loss.
regularization_factor (float) – A floating point number that defines the regularization factor.
hessian_batch_size (int) – Batch size for Hessian computation in Hessian-based weights GPTQ.
+use_hessian_sample_attention (bool) – Whether to use Sample-Layer Attention score for weighted loss.
+gradual_activation_quantization (bool, GradualActivationQuantizationConfig) – If False, GradualActivationQuantization is disabled. If True, GradualActivationQuantization is enabled with the default settings. A GradualActivationQuantizationConfig object can be passed to use non-default settings.
Returns:
-a GradientPTQConfigV2 object to use when fine-tuning the quantized model using gptq.
+a GradientPTQConfig object to use when fine-tuning the quantized model using gptq.
Examples
-Import MCT and Create a GradientPTQConfigV2 to run for 5 epochs:
+Import MCT and create a GradientPTQConfig to run for 5 epochs:
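A sketch of such a call, based on the documented signature (the optimizer choice is illustrative):

>>> import model_compression_toolkit as mct
>>> import tensorflow as tf
>>> gptq_conf = mct.gptq.get_keras_gptq_config(n_epochs=5, optimizer=tf.keras.optimizers.Adam())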
Get a TargetPlatformCapabilities by the target platform model name and the framework name.
-For now, it supports frameworks ‘tensorflow’ and ‘pytorch’. For both of them
-the target platform model can be ‘default’, ‘imx500’, ‘tflite’, or ‘qnnpack’.
+This is a degenerate function that only returns the MCT default TargetPlatformCapabilities object, to comply with the
+existing TPC API.
Parameters:
-fw_name – Framework name of the TargetPlatformCapabilities.
+fw_name – Framework name of the FrameworkQuantizationCapabilities.
target_platform_name – Target platform model name the model will use for inference.
For now, some fields of OpQuantizationConfig are ignored during
the optimization process such as quantization_preserving, fixed_scale, and fixed_zero_point.
diff --git a/docs/api/api_docs/methods/keras_gradient_post_training_quantization.html b/docs/api/api_docs/methods/keras_gradient_post_training_quantization.html
index 62eb33a89..5c2492cae 100644
--- a/docs/api/api_docs/methods/keras_gradient_post_training_quantization.html
+++ b/docs/api/api_docs/methods/keras_gradient_post_training_quantization.html
@@ -7,7 +7,7 @@
- Keras Gradient Based Post Training Quantization — MCT Documentation: ver 2.2.0
+ Keras Gradient Based Post Training Quantization — MCT Documentation: ver 2.3.0
@@ -31,7 +31,7 @@
diff --git a/docs/api/api_docs/methods/keras_kpi_data.html b/docs/api/api_docs/methods/keras_kpi_data.html
index e39322e8c..e31bb0dee 100644
--- a/docs/api/api_docs/methods/keras_kpi_data.html
+++ b/docs/api/api_docs/methods/keras_kpi_data.html
@@ -7,7 +7,7 @@
- Get Resource Utilization information for Keras Models — MCT Documentation: ver 2.2.0
+ Get Resource Utilization information for Keras Models — MCT Documentation: ver 2.3.0
@@ -31,7 +31,7 @@
diff --git a/docs/api/api_docs/methods/keras_post_training_quantization.html b/docs/api/api_docs/methods/keras_post_training_quantization.html
index 9cc04bf05..023c8a43e 100644
--- a/docs/api/api_docs/methods/keras_post_training_quantization.html
+++ b/docs/api/api_docs/methods/keras_post_training_quantization.html
@@ -7,7 +7,7 @@
- Keras Post Training Quantization — MCT Documentation: ver 2.2.0
+ Keras Post Training Quantization — MCT Documentation: ver 2.3.0
@@ -31,7 +31,7 @@
diff --git a/docs/api/api_docs/methods/keras_quantization_aware_training_finalize_experimental.html b/docs/api/api_docs/methods/keras_quantization_aware_training_finalize_experimental.html
index f19e24b23..11d911a65 100644
--- a/docs/api/api_docs/methods/keras_quantization_aware_training_finalize_experimental.html
+++ b/docs/api/api_docs/methods/keras_quantization_aware_training_finalize_experimental.html
@@ -7,7 +7,7 @@
- Keras Quantization Aware Training Model Finalize — MCT Documentation: ver 2.2.0
+ Keras Quantization Aware Training Model Finalize — MCT Documentation: ver 2.3.0
@@ -31,7 +31,7 @@
diff --git a/docs/api/api_docs/methods/keras_quantization_aware_training_init_experimental.html b/docs/api/api_docs/methods/keras_quantization_aware_training_init_experimental.html
index b5a47c72a..54ee5b454 100644
--- a/docs/api/api_docs/methods/keras_quantization_aware_training_init_experimental.html
+++ b/docs/api/api_docs/methods/keras_quantization_aware_training_init_experimental.html
@@ -7,7 +7,7 @@
- Keras Quantization Aware Training Model Init — MCT Documentation: ver 2.2.0
+ Keras Quantization Aware Training Model Init — MCT Documentation: ver 2.3.0
@@ -31,7 +31,7 @@
diff --git a/docs/api/api_docs/methods/pytorch_gradient_post_training_quantization.html b/docs/api/api_docs/methods/pytorch_gradient_post_training_quantization.html
index 3db8f08ed..5d41c6976 100644
--- a/docs/api/api_docs/methods/pytorch_gradient_post_training_quantization.html
+++ b/docs/api/api_docs/methods/pytorch_gradient_post_training_quantization.html
@@ -7,7 +7,7 @@
- Pytorch Gradient Based Post Training Quantization — MCT Documentation: ver 2.2.0
+ Pytorch Gradient Based Post Training Quantization — MCT Documentation: ver 2.3.0
@@ -31,7 +31,7 @@
Quantize a trained Pytorch module using post-training quantization.
By default, the module is quantized using a symmetric constraint quantization thresholds
-(power of two) as defined in the default TargetPlatformCapabilities.
+(power of two) as defined in the default FrameworkQuantizationCapabilities.
The module is first optimized using several transformations (e.g. BatchNormalization folding to
preceding layers). Then, using a given dataset, statistics (e.g. min/max, histogram, etc.) are
being collected for each layer’s output (and input, depends on the quantization configuration).
@@ -69,7 +69,7 @@
core_config (CoreConfig) – Configuration object containing parameters of how the model should be quantized, including mixed precision parameters.
gptq_config (GradientPTQConfig) – Configuration for using gptq (e.g. optimizer).
gptq_representative_data_gen (Callable) – Dataset used for GPTQ training. If None defaults to representative_data_gen
-target_platform_capabilities (TargetPlatformCapabilities) – TargetPlatformCapabilities to optimize the PyTorch model according to.
+target_platform_capabilities (Union[TargetPlatformCapabilities, str]) – TargetPlatformCapabilities to optimize the PyTorch model according to.
diff --git a/docs/api/api_docs/methods/pytorch_kpi_data.html b/docs/api/api_docs/methods/pytorch_kpi_data.html
index 6913f0e11..f7a639a62 100644
--- a/docs/api/api_docs/methods/pytorch_kpi_data.html
+++ b/docs/api/api_docs/methods/pytorch_kpi_data.html
@@ -7,7 +7,7 @@
- Get Resource Utilization information for PyTorch Models — MCT Documentation: ver 2.2.0
+ Get Resource Utilization information for PyTorch Models — MCT Documentation: ver 2.3.0
@@ -31,7 +31,7 @@
Quantize a trained Pytorch module using post-training quantization.
By default, the module is quantized using a symmetric constraint quantization thresholds
-(power of two) as defined in the default TargetPlatformCapabilities.
+(power of two) as defined in the default FrameworkQuantizationCapabilities.
The module is first optimized using several transformations (e.g. BatchNormalization folding to
preceding layers). Then, using a given dataset, statistics (e.g. min/max, histogram, etc.) are
being collected for each layer’s output (and input, depends on the quantization configuration).
@@ -64,7 +64,7 @@
representative_data_gen (Callable) – Dataset used for calibration.
target_resource_utilization (ResourceUtilization) – ResourceUtilization object to limit the search of the mixed-precision configuration as desired.
core_config (CoreConfig) – Configuration object containing parameters of how the model should be quantized, including mixed precision parameters.
-target_platform_capabilities (TargetPlatformCapabilities) – TargetPlatformCapabilities to optimize the PyTorch model according to.
+target_platform_capabilities (Union[TargetPlatformCapabilities, str]) – TargetPlatformCapabilities to optimize the PyTorch model according to.
-target_platform_capabilities (TargetPlatformCapabilities) – TargetPlatformCapabilities to optimize the Pytorch model according to.
+target_platform_capabilities (Union[TargetPlatformCapabilities, str]) – TargetPlatformCapabilities to optimize the Pytorch model according to.
Returns:
@@ -85,9 +85,7 @@
>>> model=mobilenet_v2(pretrained=True)
-
-Create a random dataset generator, for required number of calibration iterations (num_calibration_batches):
-In this example a random dataset of 10 batches each containing 4 images is used.
+Create a random dataset generator, for required number of calibration iterations (num_calibration_batches). In this example, a random dataset of 10 batches each containing 4 images is used:
-Create a MCT core config, containing the quantization configuration:
+Create a MCT core config, containing the quantization configuration:
>>> config=mct.core.CoreConfig()
-Pass the model, the representative dataset generator, the configuration and the target resource utilization to get a
-quantized model. Now the model contains quantizer wrappers for fine tunning the weights:
+Pass the model, the representative dataset generator, the configuration and the target resource utilization to get a quantized model. Now the model contains quantizer wrappers for fine-tuning the weights:
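A sketch of the random dataset generator described above (the tensor shape is illustrative for a 224x224 RGB model):

>>> import torch
>>> num_calibration_batches = 10
>>> def repr_datagen():
...     for _ in range(num_calibration_batches):
...         yield [torch.randn(4, 3, 224, 224)]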
-used. (a default MixedPrecisionQuantizationConfig is) –
+mixed_precision_config (MixedPrecisionQuantizationConfig) – Config for mixed precision quantization.
+If None, a default MixedPrecisionQuantizationConfig is used.
bit_width_config (BitWidthConfig) – Config for manual bit-width selection.
debug_config (DebugConfig) – Config for debugging and editing the network quantization process.
analyze_similarity (bool) – Whether to plot similarity figures within TensorBoard (when logger is
enabled) or not. Can be used to pinpoint problematic layers in the quantization process.
network_editor (List[EditRule]) – A list of rules and actions to edit the network for quantization.
-simulate_scheduler (bool) – Simulate scheduler behaviour to compute operators order and cuts.
+simulate_scheduler (bool) – Simulate scheduler behavior to compute operators’ order and cuts.
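A hedged sketch of a CoreConfig that overrides the mixed-precision settings named above (num_of_images is illustrative):

.. code-block:: python

   import model_compression_toolkit as mct

   core_cfg = mct.core.CoreConfig(
       mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(
           num_of_images=32))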
Wrap a key, value and an operation to filter a layer’s configuration according to.
If the layer’s configuration has the key, and its value matches when applying the operator,
the configuration matches the AttributeFilter.
Filter configurations such that it matches configurations
that have an attribute with a value that is greater than or equal to the value that GreaterEq holds.
Filter configurations such that it matches configurations that have an attribute with a value that is smaller than or equal to the value that SmallerEq holds.
-activation_training_method (TrainingMethod) – Training method for activation quantizers:
+weight_training_method (TrainingMethod) – Training method for weight quantizers
+activation_training_method (TrainingMethod) – Training method for activation quantizers
weight_quantizer_params_override – A dictionary of parameters to override in weight quantization quantizer instantiation. Defaults to None (no parameters)
activation_quantizer_params_override – A dictionary of parameters to override in activation quantization quantizer instantiation. Defaults to None (no parameters)
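A hedged sketch combining these parameters, assuming TrainingMethod is imported from trainable_infrastructure as described earlier in this changeset:

.. code-block:: python

   import model_compression_toolkit as mct
   from model_compression_toolkit.trainable_infrastructure import TrainingMethod

   qat_cfg = mct.qat.QATConfig(
       weight_training_method=TrainingMethod.STE,
       activation_training_method=TrainingMethod.STE)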
-This can be addressed in MCT by using the target_platform module, that can configure different
-parameters that are hardware-related, and the optimization process will use this to optimize the model accordingly.
-Models for IMX500, TFLite and qnnpack can be observed here, and can be used using get_target_platform_capabilities function.
-
-
-
-
-
Note
-For now, some fields of OpQuantizationConfig are ignored during
-the optimization process such as quantization_preserving, fixed_scale, and fixed_zero_point.
-OpQuantizationConfig is a class to configure the quantization parameters of an operator.
-
-Parameters:
-
-default_weight_attr_config (AttributeQuantizationConfig) – A default attribute quantization configuration for the operation.
-attr_weights_configs_mapping (Dict[str, AttributeQuantizationConfig]) – A mapping between an op attribute name and its quantization configuration.
-activation_quantization_method (QuantizationMethod) – Which method to use from QuantizationMethod for activation quantization.
-activation_n_bits (int) – Number of bits to quantize the activations.
-supported_input_activation_n_bits (int or Tuple[int]) – Number of bits that operator accepts as input.
-enable_activation_quantization (bool) – Whether to quantize the model activations or not.
-quantization_preserving (bool) – Whether quantization parameters should be the same for an operator’s input and output.
-fixed_scale (float) – Scale to use for an operator quantization parameters.
-fixed_zero_point (int) – Zero-point to use for an operator quantization parameters.
-simd_size (int) – Per op integer representing the Single Instruction, Multiple Data (SIMD) width of an operator. It indicates the number of data elements that can be fetched and processed simultaneously in a single instruction.
-signedness (bool) – Set activation quantization signedness.
-Modeling of the hardware the quantized model will use during inference.
-The model contains definition of operators, quantization configurations of them, and
-fusing patterns so that multiple operators will be combined into a single operator.
-
-Parameters:
-
-default_qco (QuantizationConfigOptions) – Default QuantizationConfigOptions to use for operators that their QuantizationConfigOptions are not defined in the model.
-add_metadata (bool) – Whether to add metadata to the model or not.
-Fusing defines a list of operators that should be combined and treated as a single operator,
-hence no quantization is applied between them.
-
-Parameters:
-
-operator_groups_list (List[Union[OperatorsSet, OperatorSetConcat]]) – A list of operator groups, each being either an OperatorSetConcat or an OperatorsSet.
-name (str) – The name for the Fusing instance. If not provided, it’s generated from the operator groups’ names.
-Gather multiple OperationsSetToLayers to represent mapping of framework’s layers to TargetPlatformModel OperatorsSet.
-
-Parameters:
-op_sets_to_layers (List[OperationsSetToLayers]) – List of OperationsSetToLayers where each of them maps an OperatorsSet name to a list of layers that represents the OperatorsSet.
-Associate an OperatorsSet to a list of framework’s layers.
-
-Parameters:
-
-op_set_name (str) – Name of OperatorsSet to associate with layers.
-layers (List[Any]) – List of layers/FilterLayerParams to associate with OperatorsSet.
-attr_mapping (Dict[str, DefaultDict]) – A mapping between a general attribute name to a DefaultDict that maps a layer type to the layer’s framework name of this attribute.
+This can be addressed in MCT by using the target_platform_capabilities module, which can configure different
+parameters that are hardware-related, and the optimization process will use this to optimize the model accordingly.
+Models for IMX500, TFLite and qnnpack can be observed here, and can be used via the get_target_platform_capabilities function.
+
+
+
+
+
Note
+For now, some fields of OpQuantizationConfig are ignored during
+the optimization process such as quantization_preserving, fixed_scale, and fixed_zero_point.
+
+MCT will use more information from OpQuantizationConfig in the future.
+
+
+
+
+
+The object MCT receives is called TargetPlatformCapabilities (or TPC for short).
+This diagram demonstrates the main components:
+
+
+Now, we will detail the different components.
+OpQuantizationConfig is a class to configure the quantization parameters of an operator.
+
+Parameters:
+
+default_weight_attr_config (AttributeQuantizationConfig) – A default attribute quantization configuration for the operation.
+attr_weights_configs_mapping (Dict[str, AttributeQuantizationConfig]) – A mapping between an op attribute name and its quantization configuration.
+activation_quantization_method (QuantizationMethod) – Which method to use from QuantizationMethod for activation quantization.
+activation_n_bits (int) – Number of bits to quantize the activations.
+supported_input_activation_n_bits (Union[int, Tuple[int, ...]]) – Number of bits that operator accepts as input.
+enable_activation_quantization (bool) – Whether to quantize the model activations or not.
+quantization_preserving (bool) – Whether quantization parameters should be the same for an operator’s input and output.
+fixed_scale (Optional[float]) – Scale to use for an operator quantization parameters.
+fixed_zero_point (Optional[int]) – Zero-point to use for an operator quantization parameters.
+simd_size (Optional[int]) – Per op integer representing the Single Instruction, Multiple Data (SIMD) width of an operator. It indicates the number of data elements that can be fetched and processed simultaneously in a single instruction.
+signedness (Signedness) – Set activation quantization signedness.
+
+
+
+Create a new model by parsing and validating input data from keyword arguments.
+Raises ValidationError if the input data cannot be parsed to form a valid model.
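A heavily hedged construction sketch; the import paths follow the schema module referenced earlier in this diff (mct_current_schema), and default-constructing AttributeQuantizationConfig is an assumption to be checked against the schema:

.. code-block:: python

   from model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema import (
       AttributeQuantizationConfig, OpQuantizationConfig, Signedness)
   from mct_quantizers import QuantizationMethod

   # Hypothetical 8-bit operator config; the field names follow the
   # parameter list documented above.
   op_cfg = OpQuantizationConfig(
       default_weight_attr_config=AttributeQuantizationConfig(),
       attr_weights_configs_mapping={},
       activation_quantization_method=QuantizationMethod.POWER_OF_TWO,
       activation_n_bits=8,
       supported_input_activation_n_bits=8,
       enable_activation_quantization=True,
       quantization_preserving=False,
       fixed_scale=None,
       fixed_zero_point=None,
       simd_size=32,
       signedness=Signedness.AUTO)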
This class is a base quantizer which validates provided quantization config and defines an abstract function which any quantizer needs to implement.
This class adds to the base quantizer a get_config and from_config functions to enable loading and saving the keras model.
+This class is a base quantizer which validates the provided quantization config and defines an abstract function which any quantizer needs to implement.
Parameters:
-quantization_config – quantizer config class contains all the information about a quantizer configuration.
+
+quantization_config – quantizer config class contains all the information about the quantizer configuration.
+freeze_quant_params – whether to freeze all learnable quantization parameters during training.
-This class is a base Pytorch quantizer which validates the provided quantization config and defines an
-abstract function which any quantizer needs to implement.
+This class is a base quantizer which validates the provided quantization config and defines an abstract function which any quantizer needs to implement.
Parameters:
-quantization_config – quantizer config class contains all the information about the quantizer configuration.
+
+quantization_config – quantizer config class contains all the information about the quantizer configuration.
+freeze_quant_params – whether to freeze all learnable quantization parameters during training.
For now, some fields of OpQuantizationConfig are ignored during
the optimization process such as quantization_preserving, fixed_scale, and fixed_zero_point.