
Commit

Merge remote-tracking branch 'upstream/releases/2022/1' into title-updates
ilya-lavrenov committed Mar 23, 2022
2 parents 787764e + dd0038b commit 9802547
Showing 78 changed files with 1,146 additions and 877 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -1,5 +1,5 @@
# OpenVINO™ Toolkit
[![Stable release](https://img.shields.io/badge/version-2021.4.2-green.svg)](https://github.com/openvinotoolkit/openvino/releases/tag/2021.4.2)
[![Stable release](https://img.shields.io/badge/version-2022.1-green.svg)](https://github.com/openvinotoolkit/openvino/releases/tag/2022.1)
[![Apache License Version 2.0](https://img.shields.io/badge/license-Apache_2.0-green.svg)](LICENSE)
![GitHub branch checks state](https://img.shields.io/github/checks-status/openvinotoolkit/openvino/master?label=GitHub%20checks)
![Azure DevOps builds (branch)](https://img.shields.io/azure-devops/build/openvinoci/b2bab62f-ab2f-4871-a538-86ea1be7d20f/13?label=Public%20CI)
2 changes: 1 addition & 1 deletion docs/Extensibility_UG/add_openvino_ops.md
@@ -51,7 +51,7 @@ OpenVINO™ operation contains two constructors:

@snippet template_extension/new/identity.cpp op:visit_attributes

### `evaluate()` and `has_evaluate()`
### evaluate() and has_evaluate()

`ov::Node::evaluate` method enables you to apply constant folding to an operation.
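
As a rough, hedged illustration (separate from the official `template_extension` snippets), the two overrides for a simple pass-through operation could look like the sketch below; the `Identity` class name is illustrative and the remaining `ov::op::Op` boilerplate (type info, shape inference, cloning) is omitted for brevity:

```cpp
#include <cstring>
#include <openvino/op/op.hpp>

// Illustrative pass-through operation; only the evaluation-related methods are shown.
class Identity : public ov::op::Op {
public:
    // Computes the operation on the given tensors; this is what allows
    // constant folding when all inputs are constant.
    bool evaluate(ov::TensorVector& outputs, const ov::TensorVector& inputs) const override {
        const auto& in = inputs[0];
        auto& out = outputs[0];
        out.set_shape(in.get_shape());                            // output matches the input shape
        std::memcpy(out.data(), in.data(), in.get_byte_size());   // identity: raw byte copy
        return true;
    }

    // Reports that evaluate() is implemented for this operation.
    bool has_evaluate() const override {
        return true;
    }
};
```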

2 changes: 1 addition & 1 deletion docs/Extensibility_UG/frontend_extensions.md
@@ -86,7 +86,7 @@ Previous sections cover the case when a single operation is mapped to a single o

In case one-to-one mapping is not possible, *decomposition into multiple operations* should be considered. It is achieved by using the more verbose and less automated `ConversionExtension` class, which enables writing arbitrary code to replace a single framework operation with multiple connected OpenVINO operations, constructing a dependency graph of any complexity.

`ConversionExtension` maps a single operation to a function which builds a graph using OpenVINO operation classes. Follow chapter Build a Model in OpenVINO Runtime” in [](../OV_Runtime_UG/model_representation.md) to learn how to use OpenVINO operation classes to build a fragment of model for replacement.
`ConversionExtension` maps a single operation to a function which builds a graph using OpenVINO operation classes. Follow chapter [Build a Model in OpenVINO Runtime](@ref ov_ug_build_model) to learn how to use OpenVINO operation classes to build a fragment of model for replacement.

The next example illustrates using `ConversionExtension` for conversion of “ThresholdedRelu” from ONNX according to the formula: `ThresholdedRelu(x, alpha) -> Multiply(x, Convert(Greater(x, alpha), type=float))`.
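
For orientation, a minimal sketch of such a conversion is shown below; it assumes the `ov::frontend::ConversionExtension` registration pattern and `opset8` operation classes, and is not a verbatim copy of the snippet used in this guide:

```cpp
#include <openvino/openvino.hpp>
#include <openvino/frontend/extension.hpp>
#include <openvino/opsets/opset8.hpp>

int main() {
    ov::Core core;

    // Map the ONNX "ThresholdedRelu" operation to a small OpenVINO sub-graph:
    // ThresholdedRelu(x, alpha) -> Multiply(x, Convert(Greater(x, alpha), f32))
    core.add_extension(ov::frontend::ConversionExtension(
        "ThresholdedRelu",
        [](const ov::frontend::NodeContext& node) {
            auto x = node.get_input(0);
            // "alpha" is read from the framework node attributes.
            auto alpha = ov::opset8::Constant::create(
                ov::element::f32, ov::Shape{}, {node.get_attribute<float>("alpha")});
            auto greater = std::make_shared<ov::opset8::Greater>(x, alpha);
            auto mask = std::make_shared<ov::opset8::Convert>(greater, ov::element::f32);
            auto result = std::make_shared<ov::opset8::Multiply>(x, mask);
            return ov::OutputVector{result};
        }));

    // The extension takes effect for subsequent core.read_model(...) calls.
    return 0;
}
```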

39 changes: 27 additions & 12 deletions docs/MO_DG/prepare_model/Getting_performance_numbers.md
@@ -9,7 +9,7 @@ When evaluating performance of your model with the OpenVINO Runtime, you must me

- Track separately the operations that happen outside the OpenVINO Runtime, like video decoding.

> **NOTE**: Some image pre-processing can be baked into the IR and accelerated accordingly. For more information, refer to [Embedding the Preprocessing](Additional_Optimizations.md). Also consider [_runtime_ preprocessing optimizations](../../optimization_guide/dldt_deployment_optimization_common).
> **NOTE**: Some image pre-processing can be baked into the IR and accelerated accordingly. For more information, refer to [Embedding the Preprocessing](Additional_Optimizations.md). Also consider [Runtime Optimizations of the Preprocessing](../../optimization_guide/dldt_deployment_optimization_common).
## Tip 2. Getting Credible Performance Numbers

@@ -53,22 +53,37 @@ When comparing the OpenVINO Runtime performance with the framework or another re
Further, finer-grained insights into inference performance breakdown can be achieved with device-specific performance counters and/or execution graphs.
Both [C++](../../../samples/cpp/benchmark_app/README.md) and [Python](../../../tools/benchmark_tool/README.md) versions of the `benchmark_app` support a `-pc` command-line parameter that outputs the internal execution breakdown.

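The same per-layer data can also be retrieved programmatically through the OpenVINO Runtime API; below is a minimal, hedged sketch (the `model.xml` path is a placeholder, and profiling is assumed to be enabled via the `ov::enable_profiling` property so that `ov::InferRequest::get_profiling_info()` returns meaningful counters):

```cpp
#include <iostream>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // "model.xml" is a placeholder path; enabling profiling makes per-layer counters available.
    auto compiled = core.compile_model("model.xml", "CPU", ov::enable_profiling(true));
    auto request = compiled.create_infer_request();
    request.infer();

    // Each entry roughly corresponds to one row of the table printed by `-pc`.
    for (const auto& info : request.get_profiling_info()) {
        std::cout << info.node_name << " (" << info.node_type << ", " << info.exec_type << "): "
                  << info.real_time.count() << " us, cpu " << info.cpu_time.count() << " us\n";
    }
    return 0;
}
```
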
Below is an example of CPU plugin output for a network (since the device is CPU, the layers' wall clock `realTime` and the `cpu` time are the same):
For example, below is a part of the performance counters table for quantized [TensorFlow* implementation of ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model inference on the [CPU Plugin](../../OV_Runtime_UG/supported_plugins/CPU.md).
Notice that since the device is CPU, the layers' wall clock `realTime` and the `cpu` time are the same. Information about layer precision is also stored in the performance counters.

| layerName | execStatus | layerType | execType | realTime (ms) | cpuTime (ms) |
| --------------------------------------------------------- | ---------- | ------------ | -------------------- | ------------- | ------------ |
| resnet\_model/batch\_normalization\_15/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_1x1\_I8 | 0.377 | 0.377 |
| resnet\_model/conv2d\_16/Conv2D/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
| resnet\_model/batch\_normalization\_16/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_I8 | 0.499 | 0.499 |
| resnet\_model/conv2d\_17/Conv2D/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
| resnet\_model/batch\_normalization\_17/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_1x1\_I8 | 0.399 | 0.399 |
| resnet\_model/add\_4/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
| resnet\_model/add\_4 | NOT\_RUN | Eltwise | undef | 0 | 0 |
| resnet\_model/add\_5/fq\_input\_1 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |


The `execStatus` column of the table includes the following possible values:
- `EXECUTED` - the layer was executed by a standalone primitive,
- `NOT_RUN` - the layer was not executed by a standalone primitive or was fused with another operation and executed within another layer's primitive.

The `execType` column of the table includes inference primitives with specific suffixes. The layers have the following marks:
* Suffix `I8` for layers that had 8-bit data type input and were computed in 8-bit precision
* Suffix `FP32` for layers computed in 32-bit precision

All `Convolution` layers are executed in int8 precision. The remaining layers are fused into Convolutions using the post-operations optimization technique, which is described in [Internal CPU Plugin Optimizations](../../OV_Runtime_UG/supported_plugins/CPU.md).
Below is another example of the per-layer output, which lists the layer name (as seen in the IR), the layer type, and execution statistics:

```
conv1 EXECUTED layerType: Convolution realTime: 706 cpu: 706 execType: jit_avx2
conv2_1_x1 EXECUTED layerType: Convolution realTime: 137 cpu: 137 execType: jit_avx2_1x1
fc6 EXECUTED layerType: Convolution realTime: 233 cpu: 233 execType: jit_avx2_1x1
fc6_nChw8c_nchw EXECUTED layerType: Reorder realTime: 20 cpu: 20 execType: reorder
out_fc6 EXECUTED layerType: Output realTime: 3 cpu: 3 execType: unknown
relu5_9_x2 OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: undef
```
Notice the `OPTIMIZED_OUT` status, which indicates that the particular activation was fused into the adjacent convolution.
Both benchmark_app versions also support the `exec_graph_path` command-line option, which instructs OpenVINO to output the same per-layer execution statistics in the form of a plugin-specific, [Netron-viewable](https://netron.app/) graph written to the specified file.

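Programmatically, a comparable view of the executed graph is available through `ov::CompiledModel::get_runtime_model()`; the sketch below (placeholder path, output format chosen purely for illustration) simply walks the runtime graph:

```cpp
#include <iostream>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // "model.xml" is a placeholder path.
    auto compiled = core.compile_model("model.xml", "CPU");

    // The runtime ("executable") graph reflects what the plugin actually executes;
    // plugin-specific details are attached to its nodes as runtime attributes.
    auto runtime_model = compiled.get_runtime_model();
    for (const auto& op : runtime_model->get_ops()) {
        std::cout << op->get_friendly_name() << " : " << op->get_type_info().name << "\n";
    }
    return 0;
}
```
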
Notice that on some devices, collecting the execution graphs/counters may add noticeable overhead.
Also, especially when performance-debugging the [latency case](../../optimization_guide/dldt_deployment_optimization_latency.md), keep in mind that the counters do not reflect the time spent in the plugin/device/driver/etc. queues. If the sum of the counters differs too much from the latency of an inference request, consider testing with fewer inference requests. For example, running a single [OpenVINO stream](../../optimization_guide/dldt_deployment_optimization_tput.md) with multiple requests would produce nearly identical counters to running a single inference request, yet the actual latency can be quite different.

Finally, the performance statistics from both performance counters and execution graphs are averaged, so such data for [dynamically-shaped inputs](../../OV_Runtime_UG/ov_dynamic_shapes.md) should be measured carefully (ideally by isolating the specific shape and executing it multiple times in a loop to gather reliable data).

OpenVINO in general and individual plugins are heavily instrumented with Intel® Instrumentation and Tracing Technology (ITT), so another option is to compile OpenVINO from the source code with ITT enabled and use tools like [Intel® VTune™ Profiler](https://software.intel.com/en-us/vtune) to get a detailed inference performance breakdown and additional insights into the application-level performance on the timeline view.
@@ -1,4 +1,4 @@
# Converting a ONNX* Model {#openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_ONNX}
# Converting an ONNX Model {#openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_ONNX}

## Introduction to ONNX
[ONNX*](https://github.com/onnx/onnx) is a representation format for deep learning models. ONNX allows AI developers to easily transfer models between different frameworks, which helps them choose the best combination of tools. Today, PyTorch\*, Caffe2\*, Apache MXNet\*, Microsoft Cognitive Toolkit\* and other tools are developing ONNX support.
@@ -3,7 +3,7 @@
This tutorial explains how to convert the RetinaNet model to the Intermediate Representation (IR).

[Public RetinaNet model](https://github.com/fizyr/keras-retinanet) does not contain pretrained TensorFlow\* weights.
To convert this model to the TensorFlow\* format, you can use [Reproduce Keras* to TensorFlow* Conversion tutorial](https://docs.openvino.ai/latest/omz_models_model_retinanet_tf.html).
To convert this model to the TensorFlow\* format, you can use [Reproduce Keras* to TensorFlow* Conversion tutorial](@ref omz_models_model_retinanet_tf).

After you convert the model to TensorFlow* format, run the Model Optimizer command below:
```sh
27 changes: 0 additions & 27 deletions docs/OV_Runtime_UG/Int8Inference.md
@@ -59,30 +59,3 @@ For 8-bit integer computations, a model must be quantized. Quantized models can

![int8_flow]

## Performance Counters

Information about layer precision is stored in the performance counters that are available from the OpenVINO Runtime API. For example, a part of the performance counters table for quantized [TensorFlow* implementation of ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model inference on the [CPU Plugin](supported_plugins/CPU.md) looks as follows:


| layerName | execStatus | layerType | execType | realTime (ms) | cpuTime (ms) |
| --------------------------------------------------------- | ---------- | ------------ | -------------------- | ------------- | ------------ |
| resnet\_model/batch\_normalization\_15/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_1x1\_I8 | 0.377 | 0.377 |
| resnet\_model/conv2d\_16/Conv2D/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
| resnet\_model/batch\_normalization\_16/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_I8 | 0.499 | 0.499 |
| resnet\_model/conv2d\_17/Conv2D/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
| resnet\_model/batch\_normalization\_17/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_1x1\_I8 | 0.399 | 0.399 |
| resnet\_model/add\_4/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
| resnet\_model/add\_4 | NOT\_RUN | Eltwise | undef | 0 | 0 |
| resnet\_model/add\_5/fq\_input\_1 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |


The `execStatus` column of the table includes the following possible values:
- `EXECUTED` - the layer was executed by a standalone primitive,
- `NOT_RUN` - the layer was not executed by a standalone primitive or was fused with another operation and executed within another layer's primitive.

The `execType` column of the table includes inference primitives with specific suffixes. The layers have the following marks:
* Suffix `I8` for layers that had 8-bit data type input and were computed in 8-bit precision
* Suffix `FP32` for layers computed in 32-bit precision

All `Convolution` layers are executed in int8 precision. The remaining layers are fused into Convolutions using the post-operations optimization technique, which is described in [Internal CPU Plugin Optimizations](supported_plugins/CPU.md).
18 changes: 10 additions & 8 deletions docs/OV_Runtime_UG/auto_device_selection.md
@@ -23,22 +23,24 @@ The best device is chosen using the following logic:
3. Select the first device capable of supporting the given precision, as presented in the table below.
4. If the model’s precision is FP32 but there is no device capable of supporting it, offload the model to a device supporting FP16.

@sphinxdirective
+----------+------------------------------------------------------+-------------------------------------+
| Choice || Supported || Supported |
| Priority || Device || model precision |
+==========+======================================================+=====================================+
| 1 || dGPU | FP32, FP16, INT8, BIN |
| || (e.g. Intel® Iris® Xe MAX) | |
| 1 || dGPU || FP32, FP16, INT8, BIN |
| || (e.g. Intel® Iris® Xe MAX) || |
+----------+------------------------------------------------------+-------------------------------------+
| 2 || iGPU | FP32, FP16, BIN |
| || (e.g. Intel® UHD Graphics 620 (iGPU)) | |
| 2 || iGPU || FP32, FP16, BIN |
| || (e.g. Intel® UHD Graphics 620 (iGPU)) || |
+----------+------------------------------------------------------+-------------------------------------+
| 3 || Intel® Movidius™ Myriad™ X VPU | FP16 |
| || (e.g. Intel® Neural Compute Stick 2 (Intel® NCS2)) | |
| 3 || Intel® Movidius™ Myriad™ X VPU || FP16 |
| || (e.g. Intel® Neural Compute Stick 2 (Intel® NCS2)) || |
+----------+------------------------------------------------------+-------------------------------------+
| 4 || Intel® CPU | FP32, FP16, INT8, BIN |
| || (e.g. Intel® Core™ i7-1165G7) | |
| 4 || Intel® CPU || FP32, FP16, INT8, BIN |
| || (e.g. Intel® Core™ i7-1165G7) || |
+----------+------------------------------------------------------+-------------------------------------+
@endsphinxdirective

What is important, **AUTO starts inference with the CPU by default unless the priority list is set and there is no CPU in it**. CPU provides very low latency and can start inference with no additional delays. While it performs inference, the Auto-Device plugin continues to load the model to the device best suited for the purpose and transfers the task to it when ready. This way, the devices which are much slower in compiling the model, GPU being the best example, do not impede inference at its initial stages.

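For reference, a minimal sketch of selecting AUTO from the application code is shown below (the model path is a placeholder, and the explicit `ov::device::priorities` list is only an illustration of restricting and ordering the candidate devices):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path

    // Let AUTO pick the device according to the priority logic described above.
    auto compiled_default = core.compile_model(model, "AUTO");

    // Optionally restrict and order the candidate devices; as noted above, AUTO starts
    // on CPU by default unless a priority list that excludes CPU is provided.
    auto compiled_restricted = core.compile_model(model, "AUTO",
                                                  ov::device::priorities("GPU,CPU"));
    return 0;
}
```
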
8 changes: 5 additions & 3 deletions docs/OV_Runtime_UG/automatic_batching.md
@@ -98,6 +98,7 @@ To achieve the best performance with the Automatic Batching, the application sho
- Operate a number of inference requests that is a multiple of the batch size. In the above example, for batch size 4, the application should operate 4, 8, 12, 16, etc. requests.
- Use the requests, grouped by the batch size, together. For example, the first 4 requests are inferred, while the second group of the requests is being populated. Essentially, the Automatic Batching shifts the asynchronicity from the individual requests to the groups of requests that constitute the batches.
- Balance the 'timeout' value vs the batch size. For example, in many cases a smaller timeout value/batch size may yield better performance than a large batch size combined with a timeout value that is not large enough to accommodate the full number of the required requests.
- When the Automatic Batching is enabled, the 'timeout' property of the `ov::CompiledModel` can be changed at any time, even after model loading/compilation. For example, setting the value to 0 effectively disables auto-batching, as the collection of requests would be omitted (see the sketch after this list).
- Apply auto-batching to pipelines carefully. For example, for the conventional video-sources->detection->classification flow, it is most beneficial to do auto-batching over the inputs to the detection stage, whereas the resulting number of detections usually fluctuates, which makes auto-batching less applicable for the classification stage.

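The following minimal sketch illustrates the timeout adjustment mentioned in the list above; it assumes the `ov::auto_batch_timeout` property (value in milliseconds) and uses the throughput performance hint as one way to let the plugin enable Automatic Batching; the model path is a placeholder:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path

    // The throughput hint is one way to let the GPU plugin enable Automatic Batching implicitly.
    auto compiled = core.compile_model(
        model, "GPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

    // The batch-collection timeout can be changed even after compilation...
    compiled.set_property(ov::auto_batch_timeout(100));

    // ...and setting it to 0 effectively disables the collection of requests.
    compiled.set_property(ov::auto_batch_timeout(0));
    return 0;
}
```
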
The following are limitations of the current implementations:
@@ -119,11 +120,12 @@ Following the OpenVINO convention for devices names, the *batching* device is na
### Testing Automatic Batching Performance with the Benchmark_App
The `benchmark_app`, which exists in both [C++](../../samples/cpp/benchmark_app/README.md) and [Python](../../tools/benchmark_tool/README.md) versions, is the best way to evaluate the performance of the Automatic Batching:
- The most straightforward way is using the performance hints:
- - benchmark_app **-hint tput** -d GPU -m 'path to your favorite model'
- benchmark_app **-hint tput** -d GPU -m 'path to your favorite model'
- Overriding the strict rules of implicit reshaping by the batch dimension via the explicit device notion:
- - benchmark_app **-hint none -d BATCH:GPU** -m 'path to your favorite model'
- benchmark_app **-hint none -d BATCH:GPU** -m 'path to your favorite model'
- Finally, overriding the automatically-deduced batch size as well:
- - $benchmark_app -hint none -d **BATCH:GPU(16)** -m 'path to your favorite model'
- $benchmark_app -hint none -d **BATCH:GPU(16)** -m 'path to your favorite model'
- Note that some shell versions (e.g. `bash`) may require adding quotes around complex device names, e.g. `-d "BATCH:GPU(16)"`.

The last example is also applicable to the CPU or any other device that generally supports the batched execution.
