Update TF QAT docs. Deprecate TF create_compressed_model method #3217

Merged: 19 commits, Feb 5, 2025
README.md: 3 additions & 0 deletions
@@ -201,6 +201,9 @@ def transform_fn(data_item):
calibration_dataset = nncf.Dataset(val_dataset, transform_fn)
# Step 3: Run the quantization pipeline
quantized_model = nncf.quantize(model, calibration_dataset)
# Step 4: Remove auxiliary layers and operations added during the quantization process,
# resulting in a clean, fully quantized model ready for deployment.
stripped_model = nncf.strip(quantized_model)
```

</details>
@@ -1,7 +1,7 @@
# Use NNCF for Quantization Aware Training

This is a step-by-step tutorial on how to integrate the NNCF package into an existing PyTorch or TensorFlow project.
The use case implies that the user already has a training pipeline that reproduces training of the model in floating-point precision, as well as a pretrained model.
The task is to prepare this model for accelerated inference by simulating compression at training time.
Please refer to this [document](/docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md) for details of the implementation.

@@ -11,11 +11,24 @@

Quantize the model using the [Post Training Quantization](../../post_training_compression/post_training_quantization/Usage.md) method.

<details open><summary><b>PyTorch</b></summary>

```python
model = TorchModel() # instance of torch.nn.Module
quantized_model = nncf.quantize(model, ...)
```

</details>

<details><summary><b>TensorFlow</b></summary>

```python
model = TensorFlowModel() # instance of tf.keras.Model
quantized_model = nncf.quantize(model, ...)
```

</details>
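
The ellipsis in the snippets above stands for the remaining `nncf.quantize()` arguments, most importantly the calibration dataset. A minimal PyTorch sketch, assuming a validation `DataLoader` named `val_loader` and a `transform_fn` that extracts the model input from a data item (both names are illustrative):

```python
import nncf

model = TorchModel()  # instance of torch.nn.Module with pretrained weights

def transform_fn(data_item):
    # Extract the model input from an (images, labels) batch
    images, _ = data_item
    return images

# val_loader is a regular torch.utils.data.DataLoader over validation data
calibration_dataset = nncf.Dataset(val_loader, transform_fn)
quantized_model = nncf.quantize(model, calibration_dataset)
```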

### Step 2: Run the training pipeline

At this point, NNCF is fully integrated into your training pipeline.
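
For the PyTorch case, a minimal sketch of such a fine-tuning loop (`train_loader`, `num_epochs`, and the small learning rate are illustrative choices, not prescribed by NNCF):

```python
import torch

# quantized_model comes from Step 1; fine-tune it with a regular training loop.
optimizer = torch.optim.Adam(quantized_model.parameters(), lr=1e-5)
criterion = torch.nn.CrossEntropyLoss()

quantized_model.train()
for epoch in range(num_epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(quantized_model(images), labels)
        loss.backward()
        optimizer.step()
```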
@@ -27,27 +40,46 @@ Important points you should consider when training your networks with compression

### Step 3: Export the compressed model

After the compressed model has been fine-tuned to acceptable accuracy and has reached the desired compression stage, you can export it.

<details open><summary><b>PyTorch</b></summary>

Trace the model via inference in framework operations.

```python
# To OpenVINO format
import openvino as ov
ov_quantized_model = ov.convert_model(quantized_model.cpu(), example_input=dummy_input)
```

</details>

<details><summary><b>TensorFlow</b></summary>

```python
# To OpenVINO format
import openvino as ov

# Removes auxiliary layers and operations added during the quantization process,
# resulting in a clean, fully quantized model ready for deployment.
stripped_model = nncf.strip(quantized_model)

ov_quantized_model = ov.convert_model(stripped_model, share_weights=False)
```

</details>
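
To persist the converted model for deployment, it can be serialized to OpenVINO IR. A minimal sketch, assuming the `ov_quantized_model` object from either snippet above and an arbitrary output path:

```python
import openvino as ov

# Serialize the converted model to OpenVINO IR (an .xml file plus a .bin file)
ov.save_model(ov_quantized_model, "quantized_model.xml")
```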

## Saving and loading compressed models

<details open><summary><b>PyTorch</b></summary>

The complete information about compression is defined by a compressed model and an NNCF config.
The model characterizes the weights and topology of the network; the NNCF config describes how to restore the additional modules introduced by NNCF.
The NNCF config can be obtained via `quantized_model.nncf.get_config()` when saving the model and passed to the
`nncf.torch.load_from_config` helper function to restore the additional modules from the given NNCF config.
Saving a quantized model this way allows the quantized modules to be loaded into the target model in a new Python process;
it requires only an example input for the target module, the corresponding NNCF config, and the quantized model state dict.


```python
# save part
quantized_model = nncf.quantize(model, calibration_dataset)
# ...
```

@@ -70,10 +102,14 @@ quantized_model.load_state_dict(state_dict)

You can save the `compressed_model` object with `torch.save` as usual, via its `state_dict` and `load_state_dict` methods.
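
Putting the pieces above together, a minimal end-to-end sketch of the save/load flow; `TorchModel`, `example_input`, the checkpoint path, and the argument order shown for `load_from_config` (target model, NNCF config, example input) follow the description above and are illustrative rather than exact:

```python
import torch

import nncf
import nncf.torch

# Saving: keep the model weights together with the NNCF config
quantized_model = nncf.quantize(model, calibration_dataset)
# ... fine-tune the quantized model ...
checkpoint = {
    "state_dict": quantized_model.state_dict(),
    "nncf_config": quantized_model.nncf.get_config(),
}
torch.save(checkpoint, "qat_checkpoint.pth")

# Loading in a new Python process: restore the NNCF modules, then the weights
checkpoint = torch.load("qat_checkpoint.pth")
restored_model = nncf.torch.load_from_config(
    TorchModel(), checkpoint["nncf_config"], example_input
)
restored_model.load_state_dict(checkpoint["state_dict"])
```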

</details>

## Advanced usage

### Compression of custom modules

<details open><summary><b>PyTorch</b></summary>

Without modifications to the target model code, NNCF only supports compression of the trainable parameters (weights) of native PyTorch modules, such as `torch.nn.Conv2d`.
If your model contains a custom, non-PyTorch standard module with trainable weights that should be compressed, you can register it using the `@nncf.register_module` decorator:
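
The concrete registration example is collapsed in the diff view above; the following is an illustrative sketch of what such a registration can look like (the `MyModule` definition is hypothetical):

```python
import torch
import nncf

@nncf.register_module()
class MyModule(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Custom trainable weight that NNCF should be able to compress
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features))

    def forward(self, x):
        return torch.nn.functional.linear(x, self.weight)
```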

@@ -91,4 +127,9 @@ If the registered module should be ignored by specific algorithms, use `ignored_algorithms`

In the example above, NNCF-compressed models that contain instances of `MyModule` will have the corresponding modules extended with functionality that allows NNCF to quantize the `weight` parameter of `MyModule` before it takes part in `MyModule`'s `forward` calculation.

</details>

## Examples

- See a PyTorch [example](/examples/quantization_aware_training/torch/resnet18/README.md) for a **Quantization** compression scenario on the Tiny ImageNet-200 dataset.
- See a TensorFlow [example](/examples/quantization_aware_training/tensorflow/mobilenet_v2/README.md) for a **Quantization** compression scenario on the `imagenette/320px-v2` dataset.
nncf/tensorflow/helpers/model_creation.py: 7 additions & 0 deletions
@@ -18,6 +18,7 @@
from nncf import NNCFConfig
from nncf.api.compression import CompressionAlgorithmController
from nncf.common.compression import BaseCompressionAlgorithmController as BaseController
from nncf.common.deprecation import deprecated
from nncf.common.utils.api_marker import api
from nncf.config.extractors import extract_algorithm_names
from nncf.config.telemetry_extractors import CompressionStartedFromConfig
@@ -62,6 +63,12 @@ def create_compression_algorithm_builder(config: NNCFConfig, should_init: bool)
CompressionStartedFromConfig(argname="config"),
],
)
@deprecated(
msg="Consider using the 'nncf.quantize()' method instead. "
"Please refer to the documentation for guidance on migration.",
start_version="2.14.2",
end_version="2.14.3",
)
def create_compressed_model(
model: tf.keras.Model, config: NNCFConfig, compression_state: Optional[Dict[str, Any]] = None
) -> Tuple[CompressionAlgorithmController, tf.keras.Model]:
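
The deprecation message above points users to `nncf.quantize()`. A rough migration sketch for a TensorFlow model (the `tf_val_dataset` and `transform_fn` names are illustrative):

```python
import nncf

# Before (deprecated):
#   compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# After: build a calibration dataset and call the quantization API directly
calibration_dataset = nncf.Dataset(tf_val_dataset, transform_fn)
quantized_model = nncf.quantize(model, calibration_dataset)

# Before deployment, remove the auxiliary layers added during quantization
stripped_model = nncf.strip(quantized_model)
```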