Update TF QAT docs. Deprecate TF create_compressed_model method #3217

Merged: 19 commits, Feb 5, 2025
README.md: 3 additions & 0 deletions
@@ -201,6 +201,9 @@ def transform_fn(data_item):
calibration_dataset = nncf.Dataset(val_dataset, transform_fn)
# Step 3: Run the quantization pipeline
quantized_model = nncf.quantize(model, calibration_dataset)
# Step 4: Remove auxiliary layers and operations added during the quantization process,
# resulting in a clean, fully quantized model ready for deployment.
stripped_model = nncf.strip(quantized_model)
```

</details>
@@ -1,7 +1,7 @@
# Use NNCF for Quantization Aware Training

This is a step-by-step tutorial on how to integrate the NNCF package into an existing PyTorch or TensorFlow project.
The use case implies that the user already has a training pipeline that reproduces training of the model in floating-point precision, as well as a pretrained model.
The task is to prepare this model for accelerated inference by simulating compression at training time.
Please refer to this [document](/docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md) for details of the implementation.

@@ -11,11 +11,24 @@

Quantize the model using the [Post Training Quantization](../../post_training_compression/post_training_quantization/Usage.md) method.

<details open><summary><b>PyTorch</b></summary>

```python
model = TorchModel() # instance of torch.nn.Module
quantized_model = nncf.quantize(model, ...)
```

</details>

<details><summary><b>TensorFlow</b></summary>

```python
model = TensorFlowModel() # instance of tf.keras.Model
quantized_model = nncf.quantize(model, ...)
```

</details>
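
The ellipsis in the snippets above stands for the remaining `nncf.quantize()` arguments, most importantly the calibration dataset. A minimal PyTorch sketch, assuming a validation `DataLoader` named `val_loader` and a `transform_fn` that extracts the model input from a data item (both names are illustrative):

```python
import nncf

model = TorchModel()  # instance of torch.nn.Module with pretrained weights

def transform_fn(data_item):
    # Extract the model input from an (images, labels) batch
    images, _ = data_item
    return images

# val_loader is a regular torch.utils.data.DataLoader over validation data
calibration_dataset = nncf.Dataset(val_loader, transform_fn)
quantized_model = nncf.quantize(model, calibration_dataset)
```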

### Step 2: Run the training pipeline

At this point, NNCF is fully integrated into your training pipeline.
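
For the PyTorch case, a minimal sketch of such a fine-tuning loop (`train_loader`, `num_epochs`, and the small learning rate are illustrative choices, not prescribed by NNCF):

```python
import torch

# quantized_model comes from Step 1; fine-tune it with a regular training loop.
optimizer = torch.optim.Adam(quantized_model.parameters(), lr=1e-5)
criterion = torch.nn.CrossEntropyLoss()

quantized_model.train()
for epoch in range(num_epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(quantized_model(images), labels)
        loss.backward()
        optimizer.step()
```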
@@ -27,27 +40,46 @@ Important points you should consider when training your networks with compression

### Step 3: Export the compressed model

After the compressed model has been fine-tuned to acceptable accuracy and has reached the desired compression stage, you can export it.

<details open><summary><b>PyTorch</b></summary>

Trace the model via inference in framework operations.

```python
# To OpenVINO format
import openvino as ov
ov_quantized_model = ov.convert_model(quantized_model.cpu(), example_input=dummy_input)
```

</details>

<details><summary><b>TensorFlow</b></summary>

```python
# To OpenVINO format
import openvino as ov

# Removes auxiliary layers and operations added during the quantization process,
# resulting in a clean, fully quantized model ready for deployment.
stripped_model = nncf.strip(quantized_model)

ov_quantized_model = ov.convert_model(stripped_model, share_weights=False)
```

</details>
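
To persist the converted model for deployment, it can be serialized to OpenVINO IR. A minimal sketch, assuming the `ov_quantized_model` object from either snippet above and an arbitrary output path:

```python
import openvino as ov

# Serialize the converted model to OpenVINO IR (an .xml file plus a .bin file)
ov.save_model(ov_quantized_model, "quantized_model.xml")
```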

## Saving and loading compressed models

<details open><summary><b>PyTorch</b></summary>

The complete information about compression is defined by a compressed model and an NNCF config.
The model characterizes the weights and topology of the network; the NNCF config describes how to restore the additional modules introduced by NNCF.
The NNCF config can be obtained via `quantized_model.nncf.get_config()` when saving the model and passed to the
`nncf.torch.load_from_config` helper function to restore the additional modules from the given NNCF config.
Saving a quantized model this way allows the quantized modules to be loaded into the target model in a new Python process;
it requires only an example input for the target module, the corresponding NNCF config, and the quantized model state dict.


```python
# save part
quantized_model = nncf.quantize(model, calibration_dataset)
# ...
```

@@ -70,10 +102,14 @@ quantized_model.load_state_dict(state_dict)

You can save the `compressed_model` object with `torch.save` as usual, via its `state_dict` and `load_state_dict` methods.
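
Putting the pieces above together, a minimal end-to-end sketch of the save/load flow; `TorchModel`, `example_input`, the checkpoint path, and the argument order shown for `load_from_config` (target model, NNCF config, example input) follow the description above and are illustrative rather than exact:

```python
import torch

import nncf
import nncf.torch

# Saving: keep the model weights together with the NNCF config
quantized_model = nncf.quantize(model, calibration_dataset)
# ... fine-tune the quantized model ...
checkpoint = {
    "state_dict": quantized_model.state_dict(),
    "nncf_config": quantized_model.nncf.get_config(),
}
torch.save(checkpoint, "qat_checkpoint.pth")

# Loading in a new Python process: restore the NNCF modules, then the weights
checkpoint = torch.load("qat_checkpoint.pth")
restored_model = nncf.torch.load_from_config(
    TorchModel(), checkpoint["nncf_config"], example_input
)
restored_model.load_state_dict(checkpoint["state_dict"])
```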

</details>

## Advanced usage

### Compression of custom modules

<details open><summary><b>PyTorch</b></summary>

Without modifications to the target model code, NNCF only supports compression of the trainable parameters (weights) of native PyTorch modules, such as `torch.nn.Conv2d`.
If your model contains a custom, non-PyTorch standard module with trainable weights that should be compressed, you can register it using the `@nncf.register_module` decorator:
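
The concrete registration example is collapsed in the diff view above; the following is an illustrative sketch of what such a registration can look like (the `MyModule` definition is hypothetical):

```python
import torch
import nncf

@nncf.register_module()
class MyModule(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Custom trainable weight that NNCF should be able to compress
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features))

    def forward(self, x):
        return torch.nn.functional.linear(x, self.weight)
```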

@@ -91,4 +127,9 @@ If the registered module should be ignored by specific algorithms, use `ignored_algorithms`

In the example above, NNCF-compressed models that contain instances of `MyModule` will have the corresponding modules extended with functionality that allows NNCF to quantize the `weight` parameter of `MyModule` before it takes part in `MyModule`'s `forward` calculation.

</details>

## Examples

- See a PyTorch [example](/examples/quantization_aware_training/torch/resnet18/README.md) for a **Quantization** compression scenario on the Tiny ImageNet-200 dataset.
- See a TensorFlow [example](/examples/quantization_aware_training/tensorflow/mobilenet_v2/README.md) for a **Quantization** compression scenario on the `imagenette/320px-v2` dataset.
nncf/tensorflow/helpers/model_creation.py: 7 additions & 0 deletions
@@ -18,6 +18,7 @@
from nncf import NNCFConfig
from nncf.api.compression import CompressionAlgorithmController
from nncf.common.compression import BaseCompressionAlgorithmController as BaseController
from nncf.common.deprecation import deprecated
from nncf.common.utils.api_marker import api
from nncf.config.extractors import extract_algorithm_names
from nncf.config.telemetry_extractors import CompressionStartedFromConfig
@@ -62,6 +63,12 @@ def create_compression_algorithm_builder(config: NNCFConfig, should_init: bool)
CompressionStartedFromConfig(argname="config"),
],
)
@deprecated(
msg="Consider using the 'nncf.quantize()' method instead. "
"Please refer to the documentation for guidance on migration.",
start_version="2.14.2",
end_version="2.14.3",
)
def create_compressed_model(
model: tf.keras.Model, config: NNCFConfig, compression_state: Optional[Dict[str, Any]] = None
) -> Tuple[CompressionAlgorithmController, tf.keras.Model]:
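
The deprecation message above points users to `nncf.quantize()`. A rough migration sketch for a TensorFlow model (the `tf_val_dataset` and `transform_fn` names are illustrative):

```python
import nncf

# Before (deprecated):
#   compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# After: build a calibration dataset and call the quantization API directly
calibration_dataset = nncf.Dataset(tf_val_dataset, transform_fn)
quantized_model = nncf.quantize(model, calibration_dataset)

# Before deployment, remove the auxiliary layers added during quantization
stripped_model = nncf.strip(quantized_model)
```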