
Commit

Merge remote-tracking branch 'upstream/releases/2022/1' into title-updates
ilya-lavrenov committed Mar 23, 2022
2 parents 787764e + dd0038b commit 9802547
Showing 78 changed files with 1,146 additions and 877 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -1,5 +1,5 @@
# OpenVINO™ Toolkit
[![Stable release](https://img.shields.io/badge/version-2021.4.2-green.svg)](https://github.com/openvinotoolkit/openvino/releases/tag/2021.4.2)
[![Stable release](https://img.shields.io/badge/version-2022.1-green.svg)](https://github.com/openvinotoolkit/openvino/releases/tag/2022.1)
[![Apache License Version 2.0](https://img.shields.io/badge/license-Apache_2.0-green.svg)](LICENSE)
![GitHub branch checks state](https://img.shields.io/github/checks-status/openvinotoolkit/openvino/master?label=GitHub%20checks)
![Azure DevOps builds (branch)](https://img.shields.io/azure-devops/build/openvinoci/b2bab62f-ab2f-4871-a538-86ea1be7d20f/13?label=Public%20CI)
2 changes: 1 addition & 1 deletion docs/Extensibility_UG/add_openvino_ops.md
@@ -51,7 +51,7 @@ OpenVINO™ operation contains two constructors:

@snippet template_extension/new/identity.cpp op:visit_attributes

### `evaluate()` and `has_evaluate()`
### evaluate() and has_evaluate()

`ov::Node::evaluate` method enables you to apply constant folding to an operation.
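
As a rough, hedged illustration (separate from the official `template_extension` snippets), the two overrides for a simple pass-through operation could look like the sketch below; the `Identity` class name is illustrative and the remaining `ov::op::Op` boilerplate (type info, shape inference, cloning) is omitted for brevity:

```cpp
#include <cstring>
#include <openvino/op/op.hpp>

// Illustrative pass-through operation; only the evaluation-related methods are shown.
class Identity : public ov::op::Op {
public:
    // Computes the operation on the given tensors; this is what allows
    // constant folding when all inputs are constant.
    bool evaluate(ov::TensorVector& outputs, const ov::TensorVector& inputs) const override {
        const auto& in = inputs[0];
        auto& out = outputs[0];
        out.set_shape(in.get_shape());                            // output matches the input shape
        std::memcpy(out.data(), in.data(), in.get_byte_size());   // identity: raw byte copy
        return true;
    }

    // Reports that evaluate() is implemented for this operation.
    bool has_evaluate() const override {
        return true;
    }
};
```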

2 changes: 1 addition & 1 deletion docs/Extensibility_UG/frontend_extensions.md
@@ -86,7 +86,7 @@ Previous sections cover the case when a single operation is mapped to a single o

In case one-to-one mapping is not possible, *decomposition into multiple operations* should be considered. It is achieved by using the more verbose and less automated `ConversionExtension` class, which enables writing arbitrary code to replace a single framework operation with multiple connected OpenVINO operations, constructing a dependency graph of any complexity.

`ConversionExtension` maps a single operation to a function which builds a graph using OpenVINO operation classes. Follow chapter Build a Model in OpenVINO Runtime” in [](../OV_Runtime_UG/model_representation.md) to learn how to use OpenVINO operation classes to build a fragment of model for replacement.
`ConversionExtension` maps a single operation to a function which builds a graph using OpenVINO operation classes. Follow chapter [Build a Model in OpenVINO Runtime](@ref ov_ug_build_model) to learn how to use OpenVINO operation classes to build a fragment of model for replacement.

The next example illustrates using `ConversionExtension` for conversion of “ThresholdedRelu” from ONNX according to the formula: `ThresholdedRelu(x, alpha) -> Multiply(x, Convert(Greater(x, alpha), type=float))`.
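
For orientation, a minimal sketch of such a conversion is shown below; it assumes the `ov::frontend::ConversionExtension` registration pattern and `opset8` operation classes, and is not a verbatim copy of the snippet used in this guide:

```cpp
#include <openvino/openvino.hpp>
#include <openvino/frontend/extension.hpp>
#include <openvino/opsets/opset8.hpp>

int main() {
    ov::Core core;

    // Map the ONNX "ThresholdedRelu" operation to a small OpenVINO sub-graph:
    // ThresholdedRelu(x, alpha) -> Multiply(x, Convert(Greater(x, alpha), f32))
    core.add_extension(ov::frontend::ConversionExtension(
        "ThresholdedRelu",
        [](const ov::frontend::NodeContext& node) {
            auto x = node.get_input(0);
            // "alpha" is read from the framework node attributes.
            auto alpha = ov::opset8::Constant::create(
                ov::element::f32, ov::Shape{}, {node.get_attribute<float>("alpha")});
            auto greater = std::make_shared<ov::opset8::Greater>(x, alpha);
            auto mask = std::make_shared<ov::opset8::Convert>(greater, ov::element::f32);
            auto result = std::make_shared<ov::opset8::Multiply>(x, mask);
            return ov::OutputVector{result};
        }));

    // The extension takes effect for subsequent core.read_model(...) calls.
    return 0;
}
```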

39 changes: 27 additions & 12 deletions docs/MO_DG/prepare_model/Getting_performance_numbers.md
@@ -9,7 +9,7 @@ When evaluating performance of your model with the OpenVINO Runtime, you must me

- Track separately the operations that happen outside the OpenVINO Runtime, like video decoding.

> **NOTE**: Some image pre-processing can be baked into the IR and accelerated accordingly. For more information, refer to [Embedding the Preprocessing](Additional_Optimizations.md). Also consider [_runtime_ preprocessing optimizations](../../optimization_guide/dldt_deployment_optimization_common).
> **NOTE**: Some image pre-processing can be baked into the IR and accelerated accordingly. For more information, refer to [Embedding the Preprocessing](Additional_Optimizations.md). Also consider [Runtime Optimizations of the Preprocessing](../../optimization_guide/dldt_deployment_optimization_common).
## Tip 2. Getting Credible Performance Numbers

@@ -53,22 +53,37 @@ When comparing the OpenVINO Runtime performance with the framework or another re
Further, finer-grained insights into inference performance breakdown can be achieved with device-specific performance counters and/or execution graphs.
Both [C++](../../../samples/cpp/benchmark_app/README.md) and [Python](../../../tools/benchmark_tool/README.md) versions of the `benchmark_app` support a `-pc` command-line parameter that outputs the internal execution breakdown.

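The same per-layer data can also be retrieved programmatically through the OpenVINO Runtime API; below is a minimal, hedged sketch (the `model.xml` path is a placeholder, and profiling is assumed to be enabled via the `ov::enable_profiling` property so that `ov::InferRequest::get_profiling_info()` returns meaningful counters):

```cpp
#include <iostream>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // "model.xml" is a placeholder path; enabling profiling makes per-layer counters available.
    auto compiled = core.compile_model("model.xml", "CPU", ov::enable_profiling(true));
    auto request = compiled.create_infer_request();
    request.infer();

    // Each entry roughly corresponds to one row of the table printed by `-pc`.
    for (const auto& info : request.get_profiling_info()) {
        std::cout << info.node_name << " (" << info.node_type << ", " << info.exec_type << "): "
                  << info.real_time.count() << " us, cpu " << info.cpu_time.count() << " us\n";
    }
    return 0;
}
```
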
Below is an example of CPU plugin output for a network (since the device is CPU, the layers' wall clock `realTime` and the `cpu` time are the same):
For example, below is a part of the performance counters table for quantized [TensorFlow* implementation of ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model inference on the [CPU Plugin](../../OV_Runtime_UG/supported_plugins/CPU.md).
Notice that since the device is CPU, the layers' wall clock `realTime` and the `cpu` time are the same. Information about layer precision is also stored in the performance counters.

| layerName | execStatus | layerType | execType | realTime (ms) | cpuTime (ms) |
| --------------------------------------------------------- | ---------- | ------------ | -------------------- | ------------- | ------------ |
| resnet\_model/batch\_normalization\_15/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_1x1\_I8 | 0.377 | 0.377 |
| resnet\_model/conv2d\_16/Conv2D/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
| resnet\_model/batch\_normalization\_16/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_I8 | 0.499 | 0.499 |
| resnet\_model/conv2d\_17/Conv2D/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
| resnet\_model/batch\_normalization\_17/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_1x1\_I8 | 0.399 | 0.399 |
| resnet\_model/add\_4/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
| resnet\_model/add\_4 | NOT\_RUN | Eltwise | undef | 0 | 0 |
| resnet\_model/add\_5/fq\_input\_1 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |


The `execStatus` column of the table includes the following possible values:
- `EXECUTED` - the layer was executed by a standalone primitive,
- `NOT_RUN` - the layer was not executed by a standalone primitive or was fused with another operation and executed within another layer's primitive.

The `execType` column of the table includes inference primitives with specific suffixes. The layers have the following marks:
* Suffix `I8` for layers that had 8-bit data type input and were computed in 8-bit precision
* Suffix `FP32` for layers computed in 32-bit precision

All `Convolution` layers are executed in int8 precision. The remaining layers are fused into Convolutions using the post-operations optimization technique, which is described in [Internal CPU Plugin Optimizations](../../OV_Runtime_UG/supported_plugins/CPU.md).
Below is another example of the per-layer output, which lists the layer name (as seen in the IR), the layer type, and execution statistics:

```
conv1 EXECUTED layerType: Convolution realTime: 706 cpu: 706 execType: jit_avx2
conv2_1_x1 EXECUTED layerType: Convolution realTime: 137 cpu: 137 execType: jit_avx2_1x1
fc6 EXECUTED layerType: Convolution realTime: 233 cpu: 233 execType: jit_avx2_1x1
fc6_nChw8c_nchw EXECUTED layerType: Reorder realTime: 20 cpu: 20 execType: reorder
out_fc6 EXECUTED layerType: Output realTime: 3 cpu: 3 execType: unknown
relu5_9_x2 OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: undef
```
Notice the `OPTIMIZED_OUT` status, which indicates that the particular activation was fused into the adjacent convolution.
Both benchmark_app versions also support the `exec_graph_path` command-line option, which instructs OpenVINO to output the same per-layer execution statistics in the form of a plugin-specific, [Netron-viewable](https://netron.app/) graph written to the specified file.

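Programmatically, a comparable view of the executed graph is available through `ov::CompiledModel::get_runtime_model()`; the sketch below (placeholder path, output format chosen purely for illustration) simply walks the runtime graph:

```cpp
#include <iostream>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // "model.xml" is a placeholder path.
    auto compiled = core.compile_model("model.xml", "CPU");

    // The runtime ("executable") graph reflects what the plugin actually executes;
    // plugin-specific details are attached to its nodes as runtime attributes.
    auto runtime_model = compiled.get_runtime_model();
    for (const auto& op : runtime_model->get_ops()) {
        std::cout << op->get_friendly_name() << " : " << op->get_type_info().name << "\n";
    }
    return 0;
}
```
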
Notice that on some devices, collecting the execution graphs/counters may add noticeable overhead.
Also, especially when performance-debugging the [latency case](../../optimization_guide/dldt_deployment_optimization_latency.md), keep in mind that the counters do not reflect the time spent in the plugin/device/driver/etc. queues. If the sum of the counters differs too much from the latency of an inference request, consider testing with fewer inference requests. For example, running a single [OpenVINO stream](../../optimization_guide/dldt_deployment_optimization_tput.md) with multiple requests would produce nearly identical counters to running a single inference request, yet the actual latency can be quite different.

Finally, the performance statistics from both performance counters and execution graphs are averaged, so such data for [dynamically-shaped inputs](../../OV_Runtime_UG/ov_dynamic_shapes.md) should be measured carefully (ideally by isolating the specific shape and executing it multiple times in a loop to gather reliable data).

OpenVINO in general and individual plugins are heavily instrumented with Intel® Instrumentation and Tracing Technology (ITT), so another option is to compile OpenVINO from the source code with ITT enabled and use tools like [Intel® VTune™ Profiler](https://software.intel.com/en-us/vtune) to get a detailed inference performance breakdown and additional insights into the application-level performance on the timeline view.
@@ -1,4 +1,4 @@
# Converting a ONNX* Model {#openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_ONNX}
# Converting an ONNX Model {#openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_ONNX}

## Introduction to ONNX
[ONNX*](https://github.com/onnx/onnx) is a representation format for deep learning models. ONNX allows AI developers to easily transfer models between different frameworks, which helps them choose the best combination of tools. Today, PyTorch\*, Caffe2\*, Apache MXNet\*, Microsoft Cognitive Toolkit\* and other tools are developing ONNX support.
@@ -3,7 +3,7 @@
This tutorial explains how to convert the RetinaNet model to the Intermediate Representation (IR).

[Public RetinaNet model](https://github.com/fizyr/keras-retinanet) does not contain pretrained TensorFlow\* weights.
To convert this model to the TensorFlow\* format, you can use [Reproduce Keras* to TensorFlow* Conversion tutorial](https://docs.openvino.ai/latest/omz_models_model_retinanet_tf.html).
To convert this model to the TensorFlow\* format, you can use [Reproduce Keras* to TensorFlow* Conversion tutorial](@ref omz_models_model_retinanet_tf).

After you convert the model to TensorFlow* format, run the Model Optimizer command below:
```sh
27 changes: 0 additions & 27 deletions docs/OV_Runtime_UG/Int8Inference.md
@@ -59,30 +59,3 @@ For 8-bit integer computations, a model must be quantized. Quantized models can

![int8_flow]

## Performance Counters

Information about layer precision is stored in the performance counters that are available from the OpenVINO Runtime API. For example, a part of the performance counters table for quantized [TensorFlow* implementation of ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model inference on the [CPU Plugin](supported_plugins/CPU.md) looks as follows:


| layerName | execStatus | layerType | execType | realTime (ms) | cpuTime (ms) |
| --------------------------------------------------------- | ---------- | ------------ | -------------------- | ------------- | ------------ |
| resnet\_model/batch\_normalization\_15/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_1x1\_I8 | 0.377 | 0.377 |
| resnet\_model/conv2d\_16/Conv2D/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
| resnet\_model/batch\_normalization\_16/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_I8 | 0.499 | 0.499 |
| resnet\_model/conv2d\_17/Conv2D/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
| resnet\_model/batch\_normalization\_17/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_1x1\_I8 | 0.399 | 0.399 |
| resnet\_model/add\_4/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |
| resnet\_model/add\_4 | NOT\_RUN | Eltwise | undef | 0 | 0 |
| resnet\_model/add\_5/fq\_input\_1 | NOT\_RUN | FakeQuantize | undef | 0 | 0 |


The `execStatus` column of the table includes the following possible values:
- `EXECUTED` - the layer was executed by a standalone primitive,
- `NOT_RUN` - the layer was not executed by a standalone primitive or was fused with another operation and executed within another layer's primitive.

The `execType` column of the table includes inference primitives with specific suffixes. The layers have the following marks:
* Suffix `I8` for layers that had 8-bit data type input and were computed in 8-bit precision
* Suffix `FP32` for layers computed in 32-bit precision

All `Convolution` layers are executed in int8 precision. The remaining layers are fused into Convolutions using the post-operations optimization technique, which is described in [Internal CPU Plugin Optimizations](supported_plugins/CPU.md).
18 changes: 10 additions & 8 deletions docs/OV_Runtime_UG/auto_device_selection.md
@@ -23,22 +23,24 @@ The best device is chosen using the following logic:
3. Select the first device capable of supporting the given precision, as presented in the table below.
4. If the model’s precision is FP32 but there is no device capable of supporting it, offload the model to a device supporting FP16.

@sphinxdirective
+----------+------------------------------------------------------+-------------------------------------+
| Choice || Supported || Supported |
| Priority || Device || model precision |
+==========+======================================================+=====================================+
| 1 || dGPU | FP32, FP16, INT8, BIN |
| || (e.g. Intel® Iris® Xe MAX) | |
| 1 || dGPU || FP32, FP16, INT8, BIN |
| || (e.g. Intel® Iris® Xe MAX) || |
+----------+------------------------------------------------------+-------------------------------------+
| 2 || iGPU | FP32, FP16, BIN |
| || (e.g. Intel® UHD Graphics 620 (iGPU)) | |
| 2 || iGPU || FP32, FP16, BIN |
| || (e.g. Intel® UHD Graphics 620 (iGPU)) || |
+----------+------------------------------------------------------+-------------------------------------+
| 3 || Intel® Movidius™ Myriad™ X VPU | FP16 |
| || (e.g. Intel® Neural Compute Stick 2 (Intel® NCS2)) | |
| 3 || Intel® Movidius™ Myriad™ X VPU || FP16 |
| || (e.g. Intel® Neural Compute Stick 2 (Intel® NCS2)) || |
+----------+------------------------------------------------------+-------------------------------------+
| 4 || Intel® CPU | FP32, FP16, INT8, BIN |
| || (e.g. Intel® Core™ i7-1165G7) | |
| 4 || Intel® CPU || FP32, FP16, INT8, BIN |
| || (e.g. Intel® Core™ i7-1165G7) || |
+----------+------------------------------------------------------+-------------------------------------+
@endsphinxdirective

What is important, **AUTO starts inference with the CPU by default unless the priority list is set and there is no CPU in it**. CPU provides very low latency and can start inference with no additional delays. While it performs inference, the Auto-Device plugin continues to load the model to the device best suited for the purpose and transfers the task to it when ready. This way, the devices which are much slower in compiling the model, GPU being the best example, do not impede inference at its initial stages.

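For reference, a minimal sketch of selecting AUTO from the application code is shown below (the model path is a placeholder, and the explicit `ov::device::priorities` list is only an illustration of restricting and ordering the candidate devices):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path

    // Let AUTO pick the device according to the priority logic described above.
    auto compiled_default = core.compile_model(model, "AUTO");

    // Optionally restrict and order the candidate devices; as noted above, AUTO starts
    // on CPU by default unless a priority list that excludes CPU is provided.
    auto compiled_restricted = core.compile_model(model, "AUTO",
                                                  ov::device::priorities("GPU,CPU"));
    return 0;
}
```
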
8 changes: 5 additions & 3 deletions docs/OV_Runtime_UG/automatic_batching.md
@@ -98,6 +98,7 @@ To achieve the best performance with the Automatic Batching, the application sho
- Operate a number of inference requests that is a multiple of the batch size. In the above example, for batch size 4, the application should operate 4, 8, 12, 16, etc. requests.
- Use the requests, grouped by the batch size, together. For example, the first 4 requests are inferred, while the second group of the requests is being populated. Essentially, the Automatic Batching shifts the asynchronicity from the individual requests to the groups of requests that constitute the batches.
- Balance the 'timeout' value vs the batch size. For example, in many cases a smaller timeout value/batch size may yield better performance than a large batch size combined with a timeout value that is not large enough to accommodate the full number of the required requests.
- When the Automatic Batching is enabled, the 'timeout' property of the `ov::CompiledModel` can be changed at any time, even after model loading/compilation. For example, setting the value to 0 effectively disables auto-batching, as the collection of requests would be omitted (see the sketch after this list).
- Apply auto-batching to pipelines carefully. For example, for the conventional video-sources->detection->classification flow, it is most beneficial to do auto-batching over the inputs to the detection stage, whereas the resulting number of detections usually fluctuates, which makes auto-batching less applicable for the classification stage.

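The following minimal sketch illustrates the timeout adjustment mentioned in the list above; it assumes the `ov::auto_batch_timeout` property (value in milliseconds) and uses the throughput performance hint as one way to let the plugin enable Automatic Batching; the model path is a placeholder:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path

    // The throughput hint is one way to let the GPU plugin enable Automatic Batching implicitly.
    auto compiled = core.compile_model(
        model, "GPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

    // The batch-collection timeout can be changed even after compilation...
    compiled.set_property(ov::auto_batch_timeout(100));

    // ...and setting it to 0 effectively disables the collection of requests.
    compiled.set_property(ov::auto_batch_timeout(0));
    return 0;
}
```
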
The following are limitations of the current implementations:
@@ -119,11 +120,12 @@ Following the OpenVINO convention for devices names, the *batching* device is na
### Testing Automatic Batching Performance with the Benchmark_App
The `benchmark_app`, which exists in both [C++](../../samples/cpp/benchmark_app/README.md) and [Python](../../tools/benchmark_tool/README.md) versions, is the best way to evaluate the performance of the Automatic Batching:
- The most straightforward way is using the performance hints:
- - benchmark_app **-hint tput** -d GPU -m 'path to your favorite model'
- benchmark_app **-hint tput** -d GPU -m 'path to your favorite model'
- Overriding the strict rules of implicit reshaping by the batch dimension via the explicit device notion:
- - benchmark_app **-hint none -d BATCH:GPU** -m 'path to your favorite model'
- benchmark_app **-hint none -d BATCH:GPU** -m 'path to your favorite model'
- Finally, overriding the automatically-deduced batch size as well:
- - $benchmark_app -hint none -d **BATCH:GPU(16)** -m 'path to your favorite model'
- $benchmark_app -hint none -d **BATCH:GPU(16)** -m 'path to your favorite model'
- Note that some shell versions (e.g. `bash`) may require adding quotes around complex device names, e.g. `-d "BATCH:GPU(16)"`.

The last example is also applicable to the CPU or any other device that generally supports the batched execution.
