Model optimization documentation update #11072

Merged
Changes from 49 commits
Commits
53 commits
a8e7795
Fixed Model Optimization Guide and NNCF docs
AlexKoff88 Mar 18, 2022
e667a72
Fixed the link to Optimum
AlexKoff88 Mar 18, 2022
c8457df
Updated installation guide
AlexKoff88 Mar 19, 2022
75609a4
Changed API description
AlexKoff88 Mar 19, 2022
b4d5e31
Changed quantization documents
AlexKoff88 Mar 21, 2022
1a1395f
Fixed links in the relevant components
AlexKoff88 Mar 21, 2022
6805712
Fixed API description
AlexKoff88 Mar 21, 2022
1b08109
Revised CLI document
AlexKoff88 Mar 21, 2022
3b1e412
Fixed formatting bugs in the main document
AlexKoff88 Mar 21, 2022
dfb3374
Fixed formatting bugs in the main document
AlexKoff88 Mar 21, 2022
5a14334
Changed the structure. Added Default quantization usage via API
AlexKoff88 Mar 21, 2022
3343dae
Fixed E2E CLI example
AlexKoff88 Mar 21, 2022
db64ba9
Added AccuracyAware usage description
AlexKoff88 Mar 21, 2022
418b021
Revised structure and examples
AlexKoff88 Mar 22, 2022
7884330
Fixed a link to POT intro
AlexKoff88 Mar 22, 2022
71f44b3
Changed the structure for algorithms
AlexKoff88 Mar 22, 2022
c5951a1
Fixed links
AlexKoff88 Mar 22, 2022
e278da6
Additional fixes of the links
AlexKoff88 Mar 22, 2022
509e77b
Revised Ranger documentation
AlexKoff88 Mar 22, 2022
ea6920d
Some fixes
AlexKoff88 Mar 22, 2022
dddc2f9
Revised Best Practices
AlexKoff88 Mar 22, 2022
7d07ddc
Fixed descriptions
AlexKoff88 Mar 22, 2022
d0ffcde
Fixed section names
AlexKoff88 Mar 22, 2022
551fbd4
Changed the workflow one more time
AlexKoff88 Mar 22, 2022
d2cfa07
Additional fixes to the model structure
AlexKoff88 Mar 22, 2022
247bf14
Fixed AA usage
AlexKoff88 Mar 22, 2022
ba9d448
Added DefaultQuantization flow image
AlexKoff88 Mar 22, 2022
deace17
Fixed many issues
AlexKoff88 Mar 22, 2022
b8f9e76
Fixed many issues
AlexKoff88 Mar 22, 2022
e6e7c47
Applied many comments
AlexKoff88 Mar 23, 2022
5f1a4d4
Additional fixes
AlexKoff88 Mar 23, 2022
db311d7
Fixed examples and provided links to them
AlexKoff88 Mar 23, 2022
1b80a47
Changed DataLoader Example. Fixed FAQ
AlexKoff88 Mar 23, 2022
636db2f
Changed the main README for GitHub
AlexKoff88 Mar 23, 2022
7a87136
Fixed E2E CLI example
AlexKoff88 Mar 24, 2022
dedc3ca
Fixed links and code of DataLoader
AlexKoff88 Mar 24, 2022
5d9a190
Merged with upstream
AlexKoff88 Mar 24, 2022
741977e
Fixed build issues
AlexKoff88 Mar 24, 2022
01096fa
Fixed more links
AlexKoff88 Mar 24, 2022
591f8a4
Fixed one more documentation build issue
AlexKoff88 Mar 24, 2022
b82797c
Fixed more links
AlexKoff88 Mar 24, 2022
ab9b7e7
Fixed code example
AlexKoff88 Mar 25, 2022
29ae43e
Add multiple data loaders
AlexKoff88 Mar 28, 2022
831c3e4
Add audio example
AlexKoff88 Mar 28, 2022
2114423
Minor fixes in the code of sample loaders
AlexKoff88 Mar 28, 2022
68fa7b0
Merge with upstream. Resolved conflicts
AlexKoff88 Mar 28, 2022
b098fce
Add descriptions of dataloaders. Changed the behaviour of text loader
AlexKoff88 Mar 28, 2022
2a85f11
Fixed typos
AlexKoff88 Mar 29, 2022
5525025
Added a new item into the FAQ
AlexKoff88 Mar 29, 2022
e5fb837
Apply wording corrections
tsavina Mar 29, 2022
5d0dc16
Update docs/OV_Runtime_UG/supported_plugins/CPU.md
AlexKoff88 Mar 29, 2022
e70e2a7
Fixed comments
AlexKoff88 Mar 29, 2022
81c7295
Fixed merge conflicts
AlexKoff88 Mar 29, 2022
2 changes: 1 addition & 1 deletion README.md
@@ -44,7 +44,7 @@ Please report questions, issues and suggestions using:
[Open Model Zoo]:https://github.com/openvinotoolkit/open_model_zoo
[OpenVINO™ Runtime]:https://docs.openvino.ai/latest/openvino_docs_OV_UG_OV_Runtime_User_Guide.html
[Model Optimizer]:https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html
[Post-Training Optimization Tool]:https://docs.openvino.ai/latest/pot_README.html
[Post-Training Optimization Tool]:https://docs.openvino.ai/latest/pot_introduction.html
[Samples]:https://github.com/openvinotoolkit/openvino/tree/master/samples
[tag on StackOverflow]:https://stackoverflow.com/search?q=%23openvino

2 changes: 1 addition & 1 deletion docs/IE_PLUGIN_DG/QuantizedNetworks.md
@@ -9,7 +9,7 @@ For more details about low-precision model representation please refer to this [
During the model load each plugin can interpret quantization rules expressed in *FakeQuantize* operations:
- Independently based on the definition of *FakeQuantize* operation.
- Using a special library of low-precision transformations (LPT) which applies common rules for generic operations,
such as Convolution, Fully-Connected, Eltwise, etc., and translates "fake-quantized" models into the models with low-precision operations. For more information about low-precision flow please refer to the following [document](../OV_Runtime_UG/Int8Inference.md).
such as Convolution, Fully-Connected, Eltwise, etc., and translates "fake-quantized" models into the models with low-precision operations.

Here we provide only a high-level overview of the interpretation rules of FakeQuantize.
At runtime each FakeQuantize can be split into two independent operations: **Quantize** and **Dequantize**.
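The decomposition can be sketched in NumPy as follows (a simplified per-tensor reference that assumes the attribute names from the FakeQuantize-1 specification; real plugin implementations also handle broadcasting, per-channel ranges, and rounding modes):

```python
import numpy as np

def fake_quantize(x, input_low, input_high, output_low, output_high, levels):
    """Reference FakeQuantize expressed as the two stages described above."""
    x = np.clip(x, input_low, input_high)
    # Quantize: map the input range onto the integer grid [0, levels - 1].
    q = np.round((x - input_low) / (input_high - input_low) * (levels - 1))
    # Dequantize: map the grid points back onto the output range.
    return q / (levels - 1) * (output_high - output_low) + output_low

# Example: 256-level (8-bit) quantization of activations in the range [0, 6].
x = np.array([-1.0, 0.1, 2.5, 7.0], dtype=np.float32)
print(fake_quantize(x, 0.0, 6.0, 0.0, 6.0, levels=256))
```

In the low-precision flow, the Quantize part typically stays in front of the operation, while the Dequantize part is propagated further down the graph by the LPT transformations.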
Changes in another file (file name not shown):
@@ -72,11 +72,7 @@ For example, if you would like to infer a model with `Convolution` operation in
> There are several supported quantization approaches on activations and on weights. All supported approaches are described in the [Quantization approaches](#quantization-approaches) section below. The demonstrated model uses the [FakeQuantize operation quantization](#fakequantize-operation) approach.

### Low precision tools
There are two tools to quantize a model:
1. [Post-Training Optimization Toolkit](@ref pot_docs_LowPrecisionOptimizationGuide) (POT)
2. [Neural Network Compression Framework](https://github.com/openvinotoolkit/nncf) (NNCF)

Additionally, low precision transformations can handle ONNX quantized models.
For more details on how to get a quantized model, refer to the [Model Optimization](@ref openvino_docs_model_optimization_guide) document.

## Quantization approaches
LPT transformations support two quantization approaches:
2 changes: 1 addition & 1 deletion docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md
@@ -42,7 +42,7 @@ The IR is a pair of files describing the model:

* <code>.bin</code> - Contains the weights and biases binary data.

> **NOTE**: The generated IR can be additionally optimized for inference by [Post-training Optimization tool](../../tools/pot/README.md)
> **NOTE**: The generated IR can be additionally optimized for inference by [Post-training optimization](../../tools/pot/docs/Introduction.md)
> that applies post-training quantization methods.

> **TIP**: You also can work with the Model Optimizer inside the OpenVINO™ [Deep Learning Workbench](https://docs.openvino.ai/latest/workbench_docs_Workbench_DG_Introduction.html) (DL Workbench).
2 changes: 1 addition & 1 deletion docs/MO_DG/prepare_model/FP16_Compression.md
@@ -17,4 +17,4 @@ although for the majority of models accuracy degradation is negligible. For deta
compressed `FP16` models refer to [Working with devices](../../OV_Runtime_UG/supported_plugins/Device_Plugins.md) page.

> **NOTE**: `FP16` compression is sometimes used as initial step for `INT8` quantization, please refer to
> [Post-Training Optimization tool](../../../tools/pot/README.md) for more information about that.
> [Post-training optimization](../../../tools/pot/docs/Introduction.md) for more information about that.
Changes in another file (file name not shown):
@@ -3,11 +3,11 @@
## Introduction

OpenVINO Runtime CPU and GPU devices can infer models in low precision.
For details, refer to [Low Precision Inference on the CPU](../../../OV_Runtime_UG/Int8Inference.md).
For details, refer to [Model Optimization Guide](@ref openvino_docs_model_optimization_guide).

Intermediate Representation (IR) should be specifically formed to be suitable for low precision inference.
Such an IR is called a Low Precision IR and you can generate it in two ways:
- [Quantize regular IR with the Post-Training Optimization tool](@ref pot_README)
- [Quantize regular IR with the Post-Training Optimization tool](@ref pot_introduction)
- Use the Model Optimizer for a model pretrained for Low Precision inference: TensorFlow\* pre-TFLite models (`.pb` model file with `FakeQuantize*` operations) and ONNX\* quantized models.
Both TensorFlow and ONNX quantized models could be prepared by [Neural Network Compression Framework](https://github.com/openvinotoolkit/nncf/blob/develop/README.md).
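As a quick illustration of the NNCF route mentioned above, a minimal quantization-aware training setup for a PyTorch model might look like this (a hedged sketch: the `nncf_config.json` file enabling the `quantization` algorithm and the dummy dataset are assumptions; consult the NNCF repository for the authoritative API):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18
from nncf import NNCFConfig
from nncf.torch import create_compressed_model, register_default_init_args

# Any torch.nn.Module works; random tensors stand in for the real training data here.
model = resnet18()
train_loader = DataLoader(
    TensorDataset(torch.randn(8, 3, 224, 224), torch.zeros(8, dtype=torch.long)),
    batch_size=4,
)

nncf_config = NNCFConfig.from_json("nncf_config.json")               # assumed config enabling "quantization"
nncf_config = register_default_init_args(nncf_config, train_loader)  # data for quantization range initialization

# Wrap the model with FakeQuantize operations; fine-tune it as usual afterwards.
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)
# ... regular fine-tuning loop over train_loader goes here ...

# Export to ONNX with the quantization operations preserved, ready for the Model Optimizer.
compression_ctrl.export_model("resnet18_int8.onnx")
```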

10 changes: 4 additions & 6 deletions docs/OV_Runtime_UG/Int8Inference.md
@@ -1,4 +1,4 @@
# Low-Precision 8-bit Integer Inference {#openvino_docs_OV_UG_Int8Inference}
# Low-Precision 8-bit Integer Inference

## Disclaimer

@@ -14,9 +14,7 @@ Low-precision 8-bit inference is optimized for:

## Introduction

For 8-bit integer computation, a model must be quantized. You can use a quantized model from [OpenVINO™ Toolkit Intel's Pre-Trained Models](@ref omz_models_group_intel) or quantize a model yourself. For quantization, you can use the following:
- [Post-Training Optimization Tool](@ref pot_docs_LowPrecisionOptimizationGuide) delivered with the Intel® Distribution of OpenVINO™ toolkit release package
- [Neural Network Compression Framework](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/openvino-nncf.html) available on GitHub: https://github.com/openvinotoolkit/nncf
For 8-bit integer computation, a model must be quantized. You can use a quantized model from [OpenVINO™ Toolkit Intel's Pre-Trained Models](@ref omz_models_group_intel) or quantize a model yourself. For more details on how to get a quantized model, refer to the [Model Optimization](@ref openvino_docs_model_optimization_guide) document.

The quantization process adds [FakeQuantize](../ops/quantization/FakeQuantize_1.md) layers on activations and weights for most layers. Read more about mathematical computations in the [Uniform Quantization with Fine-Tuning](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md).

@@ -46,10 +44,10 @@ If you infer the model with the OpenVINO™ CPU plugin and collect performance c

## Low-Precision 8-bit Integer Inference Workflow

For 8-bit integer computations, a model must be quantized. Quantized models can be downloaded from [Overview of OpenVINO™ Toolkit Intel's Pre-Trained Models](@ref omz_models_group_intel). If the model is not quantized, you can use the [Post-Training Optimization Tool](@ref pot_README) to quantize the model. The quantization process adds [FakeQuantize](../ops/quantization/FakeQuantize_1.md) layers on activations and weights for most layers. Read more about mathematical computations in the [Uniform Quantization with Fine-Tuning](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md).
For 8-bit integer computations, a model must be quantized. Quantized models can be downloaded from [Overview of OpenVINO™ Toolkit Intel's Pre-Trained Models](@ref omz_models_group_intel). If the model is not quantized, you can use the [Post-Training Optimization Tool](@ref pot_introduction) to quantize the model. The quantization process adds [FakeQuantize](../ops/quantization/FakeQuantize_1.md) layers on activations and weights for most layers. Read more about mathematical computations in the [Uniform Quantization with Fine-Tuning](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md).

8-bit inference pipeline includes two stages (also refer to the figure below):
1. *Offline stage*, or *model quantization*. During this stage, [FakeQuantize](../ops/quantization/FakeQuantize_1.md) layers are added before most layers to have quantized tensors before layers in a way that low-precision accuracy drop for 8-bit integer inference satisfies the specified threshold. The output of this stage is a quantized model. Quantized model precision is not changed, quantized tensors are in original precision range (`fp32`). `FakeQuantize` layer has `levels` attribute which defines quants count. Quants count defines precision which is used during inference. For `int8` range `levels` attribute value has to be 255 or 256. To quantize the model, you can use the [Post-Training Optimization Tool](@ref pot_README) delivered with the Intel® Distribution of OpenVINO™ toolkit release package.
1. *Offline stage*, or *model quantization*. During this stage, [FakeQuantize](../ops/quantization/FakeQuantize_1.md) layers are added before most layers to have quantized tensors before layers in a way that low-precision accuracy drop for 8-bit integer inference satisfies the specified threshold. The output of this stage is a quantized model. Quantized model precision is not changed, quantized tensors are in original precision range (`fp32`). `FakeQuantize` layer has `levels` attribute which defines quants count. Quants count defines precision which is used during inference. For `int8` range `levels` attribute value has to be 255 or 256. To quantize the model, you can use the [Post-Training Optimization Tool](@ref pot_introduction) delivered with the Intel® Distribution of OpenVINO™ toolkit release package.

When you pass the quantized IR to the OpenVINO™ plugin, the plugin automatically recognizes it as a quantized model and performs 8-bit inference. Note, if you pass a quantized model to another plugin that does not support 8-bit inference but supports all operations from the model, the model is inferred in precision that this plugin supports.
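For reference, the offline quantization stage with the POT Python API usually follows the pattern sketched below (a hedged example: the model paths, the random calibration data, and the exact `(data, annotation)` return convention of `DataLoader.__getitem__` should be checked against the POT documentation for your release):

```python
import numpy as np
from openvino.tools.pot import (DataLoader, IEEngine, load_model, save_model,
                                compress_model_weights, create_pipeline)

class RandomDataLoader(DataLoader):
    """Stand-in calibration dataset: random tensors shaped like the model input."""
    def __init__(self, shape=(1, 3, 224, 224), count=300):
        self._shape, self._count = shape, count
    def __len__(self):
        return self._count
    def __getitem__(self, index):
        data = np.random.rand(*self._shape).astype(np.float32)
        return data, None  # (data, annotation); DefaultQuantization does not use the annotation

model = load_model({"model_name": "model", "model": "model.xml", "weights": "model.bin"})
engine = IEEngine(config={"device": "CPU"}, data_loader=RandomDataLoader())
algorithms = [{"name": "DefaultQuantization",
               "params": {"target_device": "ANY", "preset": "performance", "stat_subset_size": 300}}]

pipeline = create_pipeline(algorithms, engine)
quantized_model = pipeline.run(model)
compress_model_weights(quantized_model)  # optional: reduces the size of the resulting .bin file
save_model(quantized_model, save_path="quantized_model")
```

In a real pipeline, the random loader is replaced with one that reads a representative calibration dataset.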

2 changes: 1 addition & 1 deletion docs/OV_Runtime_UG/supported_plugins/CPU.md
@@ -47,7 +47,7 @@ CPU plugin supports the following data types as inference precision of internal
Selected precision of each primitive depends on the operation precision in IR, quantization primitives, and available hardware capabilities.
u1/u8/i8 data types are used for quantized operations only, i.e. those are not selected automatically for non-quantized operations.

See [low-precision optimization guide](@ref pot_docs_LowPrecisionOptimizationGuide) for more details on how to get quantized model.
See [low-precision optimization guide](@ref openvino_docs_model_optimization_guide) for more details on how to get quantized model.

> **NOTE**: Platforms that do not support Intel® AVX512-VNNI have a known "saturation issue" which in some cases leads to reduced computational accuracy for u8/i8 precision calculations.
> See [saturation (overflow) issue section](@ref pot_saturation_issue) to get more information on how to detect such issues and possible workarounds.
4 changes: 2 additions & 2 deletions docs/OV_Runtime_UG/supported_plugins/GNA.md
@@ -90,7 +90,7 @@ can cause the user's request to be executed on CPU, thereby unnecessarily increa

Intel® GNA essentially operates in the low-precision mode which represents a mix of 8-bit (`i8`), 16-bit (`i16`), and 32-bit (`i32`) integer computations.

GNA plugin users are encouraged to use the [Post-Training Optimization Tool](@ref pot_README) to get a model with quantization hints based on statistics for the provided dataset.
GNA plugin users are encouraged to use the [Post-Training Optimization Tool](@ref pot_introduction) to get a model with quantization hints based on statistics for the provided dataset.

Unlike other plugins supporting low-precision execution, the GNA plugin can calculate quantization factors at the model loading time, so you can run a model without calibration. However, this mode may not provide satisfactory accuracy because the internal quantization algorithm is based on heuristics whose efficiency depends on the model and the dynamic range of input data; this mode is also going to be deprecated soon.

Expand All @@ -101,7 +101,7 @@ GNA plugin supports the following data types as inference precision of internal

[Hello Query Device C++ Sample](@ref openvino_inference_engine_samples_hello_query_device_README) can be used to print out supported data types for all detected devices.

[POT API Usage sample for GNA](@ref pot_sample_speech_README) demonstrates how a model can be quantized for GNA using POT API in 2 modes:
[POT API Usage sample for GNA](@ref pot_example_speech_README) demonstrates how a model can be quantized for GNA using POT API in 2 modes:
* Accuracy (i16 weights)
* Performance (i8 weights)
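As a rough sketch, the two modes typically differ only in the algorithm configuration passed to POT (parameter names follow the DefaultQuantization documentation; treat the exact presets as an assumption to verify against the GNA sample):

```python
# Hedged sketch: GNA "accuracy" vs. "performance" modes expressed as POT algorithm configs.
accuracy_mode = [{
    "name": "DefaultQuantization",
    "params": {"target_device": "GNA", "preset": "accuracy", "stat_subset_size": 300},
}]
performance_mode = [{
    "name": "DefaultQuantization",
    "params": {"target_device": "GNA", "preset": "performance", "stat_subset_size": 300},
}]
# Either list is passed to openvino.tools.pot.create_pipeline(algorithms, engine),
# exactly as in the generic DefaultQuantization sketch shown earlier in this diff.
```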

4 changes: 2 additions & 2 deletions docs/OV_Runtime_UG/supported_plugins/GPU.md
@@ -109,7 +109,7 @@ GPU plugin supports the following data types as inference precision of internal

Selected precision of each primitive depends on the operation precision in IR, quantization primitives, and available hardware capabilities.
u1/u8/i8 data types are used for quantized operations only, i.e. those are not selected automatically for non-quantized operations.
See [low-precision optimization guide](@ref pot_docs_LowPrecisionOptimizationGuide) for more details on how to get quantized model.
See the [Model Optimization](@ref openvino_docs_model_optimization_guide) document for more details on how to get a quantized model.

Floating-point precision of a GPU primitive is selected based on operation precision in IR except [compressed f16 IR form](../../MO_DG/prepare_model/FP16_Compression.md) which is executed in f16 precision.

@@ -298,7 +298,7 @@ The behavior depends on specific parameters of the operations and hardware confi

## GPU Performance Checklist: Summary <a name="gpu-checklist"></a>
Since OpenVINO relies on the OpenCL&trade; kernels for the GPU implementation, many general OpenCL tips apply:
- Prefer `FP16` inference precision over `FP32`, as the Model Optimizer can generate both variants and the `FP32` is default. Also, consider [int8 inference](../Int8Inference.md)
- Prefer `FP16` inference precision over `FP32`, as the Model Optimizer can generate both variants and the `FP32` is default. Also, consider [int8 inference](@ref openvino_docs_model_optimization_guide).
- Try to group individual infer jobs by using [automatic batching](../automatic_batching.md)
- Consider [caching](../Model_caching_overview.md) to minimize model load time
- If your application is simultaneously using the inference on the CPU or otherwise loads the host heavily, make sure that the OpenCL driver threads do not starve. You can use [CPU configuration options](./CPU.md) to limit number of inference threads for the CPU plugin.
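The first three checklist items can be expressed with the OpenVINO 2.0 Python API roughly as follows (property names such as `PERFORMANCE_HINT` and `CACHE_DIR` are assumed from the runtime configuration documentation; verify them for your release):

```python
from openvino.runtime import Core

core = Core()
# FP16 IR from the Model Optimizer, or an INT8-quantized IR produced by POT/NNCF.
model = core.read_model("model.xml")

compiled_model = core.compile_model(
    model, "GPU",
    config={
        "PERFORMANCE_HINT": "THROUGHPUT",  # lets the runtime choose streams/batching where supported
        "CACHE_DIR": "model_cache",        # caches compiled kernels to shorten subsequent load times
    },
)
infer_request = compiled_model.create_infer_request()
```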
2 changes: 1 addition & 1 deletion docs/documentation.md
@@ -96,7 +96,7 @@ With the [Model Downloader](@ref omz_tools_downloader) and [Model Optimizer](MO_
The [OpenVINO™ Runtime User Guide](./OV_Runtime_UG/openvino_intro.md) explains the process of creating your own application that runs inference with the OpenVINO™ toolkit. The [API Reference](./api_references.html) defines the OpenVINO Runtime API for Python, C++, and C. The OpenVINO Runtime API is what you'll use to create an OpenVINO™ inference application, use enhanced operations sets and other features. After writing your application, you can use the [Deployment with OpenVINO](./OV_Runtime_UG/deployment/deployment_intro.md) for deploying to target devices.

## Tuning for Performance
The toolkit provides a [Performance Optimization Guide](optimization_guide/dldt_optimization_guide.md) and utilities for squeezing the best performance out of your application, including [Accuracy Checker](@ref omz_tools_accuracy_checker), [Post-Training Optimization Tool](@ref pot_README), and other tools for measuring accuracy, benchmarking performance, and tuning your application.
The toolkit provides a [Performance Optimization Guide](optimization_guide/dldt_optimization_guide.md) and utilities for squeezing the best performance out of your application, including [Accuracy Checker](@ref omz_tools_accuracy_checker), [Post-Training Optimization Tool](@ref pot_introduction), and other tools for measuring accuracy, benchmarking performance, and tuning your application.

## Graphical Web Interface for OpenVINO™ Toolkit
You can choose to use the [OpenVINO™ Deep Learning Workbench](@ref workbench_docs_Workbench_DG_Introduction), a web-based tool that guides you through the process of converting, measuring, optimizing, and deploying models. This tool also serves as a low-effort introduction to the toolkit and provides a variety of useful interactive charts for understanding performance.
2 changes: 1 addition & 1 deletion docs/install_guides/pypi-openvino-dev.md
@@ -11,7 +11,7 @@ OpenVINO™ toolkit is a comprehensive toolkit for quickly developing applicatio
| [Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) | `mo` |**Model Optimizer** imports, converts, and optimizes models that were trained in popular frameworks to a format usable by OpenVINO components. <br>Supported frameworks include Caffe\*, TensorFlow\*, MXNet\*, PaddlePaddle\*, and ONNX\*. |
| [Benchmark Tool](../../tools/benchmark_tool/README.md)| `benchmark_app` | **Benchmark Application** allows you to estimate deep learning inference performance on supported devices for synchronous and asynchronous modes. |
| [Accuracy Checker](@ref omz_tools_accuracy_checker) and <br> [Annotation Converter](@ref omz_tools_accuracy_checker_annotation_converters) | `accuracy_check` <br> `convert_annotation` |**Accuracy Checker** is a deep learning accuracy validation tool that allows you to collect accuracy metrics against popular datasets. The main advantages of the tool are the flexibility of configuration and a set of supported datasets, preprocessing, postprocessing, and metrics. <br> **Annotation Converter** is a utility that prepares datasets for evaluation with Accuracy Checker. |
| [Post-Training Optimization Tool](../../tools/pot/README.md)| `pot` |**Post-Training Optimization Tool** allows you to optimize trained models with advanced capabilities, such as quantization and low-precision optimizations, without the need to retrain or fine-tune models. Optimizations are also available through the [API](../../tools/pot/openvino/tools/pot/api/README.md). |
| [Post-Training Optimization Tool](../../tools/pot/docs/pot_introduction.md)| `pot` |**Post-Training Optimization Tool** allows you to optimize trained models with advanced capabilities, such as quantization and low-precision optimizations, without the need to retrain or fine-tune models. |
| [Model Downloader and other Open Model Zoo tools](@ref omz_tools_downloader)| `omz_downloader` <br> `omz_converter` <br> `omz_quantizer` <br> `omz_info_dumper`| **Model Downloader** is a tool for getting access to the collection of high-quality and extremely fast pre-trained deep learning [public](@ref omz_models_group_public) and [Intel](@ref omz_models_group_intel)-trained models. These free pre-trained models can be used to speed up the development and production deployment process without training your own models. The tool downloads model files from online sources and, if necessary, patches them to make them more usable with Model Optimizer. A number of additional tools are also provided to automate the process of working with downloaded models:<br> **Model Converter** is a tool for converting Open Model Zoo models that are stored in an original deep learning framework format into the OpenVINO Intermediate Representation (IR) using Model Optimizer. <br> **Model Quantizer** is a tool for automatic quantization of full-precision models in the IR format into low-precision versions using the Post-Training Optimization Tool. <br> **Model Information Dumper** is a helper utility for dumping information about the models to a stable, machine-readable format.

The developer package also installs the OpenVINO™ Runtime package as a dependency.