From 3d82fd29bb90ec7636b900c84cc0611711c26aa9 Mon Sep 17 00:00:00 2001 From: Savina Date: Wed, 7 Jul 2021 10:09:04 +0300 Subject: [PATCH 01/10] install docs fixes --- docs/get_started/get_started_dl_workbench.md | 28 +++++++++++++++---- .../installing-openvino-linux.md | 12 ++++---- .../installing-openvino-macos.md | 21 +++++++------- .../installing-openvino-windows.md | 16 ++++++----- 4 files changed, 50 insertions(+), 27 deletions(-) diff --git a/docs/get_started/get_started_dl_workbench.md b/docs/get_started/get_started_dl_workbench.md index b867dcf94282a9..e1c948e9e8da01 100644 --- a/docs/get_started/get_started_dl_workbench.md +++ b/docs/get_started/get_started_dl_workbench.md @@ -14,8 +14,6 @@ Start working with the OpenVINO™ toolkit right from your browser: import a mod * Analyze the quality of your model and visualize output. * Use preconfigured JupyterLab\* environment to learn OpenVINO™ workflow. - - ## Run DL Workbench You can [run DL Workbench](@ref workbench_docs_Workbench_DG_Install) on your local system or in the Intel® DevCloud for the Edge. Ensure that you have met the [prerequisites](@ref workbench_docs_Workbench_DG_Prerequisites). @@ -28,12 +26,32 @@ Once DL Workbench is set up, open the http://127.0.0.1:5665 link. ![](./dl_workbench_img/active_projects_page.png) -Watch the video to learn more detailed information on how to run DL Workbench: - - Congratulations, you have installed DL Workbench. Your next step is to [Get Started with DL Workbench](@ref workbench_docs_Workbench_DG_Work_with_Models_and_Sample_Datasets) and create your first project. +## Videos + +\htmlonly + + + + + + + + + +
+\endhtmlonly + +\htmlonly + +\endhtmlonly + +\htmlonly +
What is the OpenVINO™ toolkit DL Workbench.
Duration: 1:31
How to Install the OpenVINO™ toolkit DL Workbench.
Duration: 8:20.
+\endhtmlonly + ## See Also * [Get Started with DL Workbench](@ref workbench_docs_Workbench_DG_Work_with_Models_and_Sample_Datasets) * [DL Workbench Overview](@ref workbench_docs_Workbench_DG_Introduction) diff --git a/docs/install_guides/installing-openvino-linux.md b/docs/install_guides/installing-openvino-linux.md index 5f8cbfe4bd8085..50f63e62c34072 100644 --- a/docs/install_guides/installing-openvino-linux.md +++ b/docs/install_guides/installing-openvino-linux.md @@ -6,11 +6,13 @@ > - CentOS and Yocto installations will require some modifications that are not covered in this guide. > - An internet connection is required to follow the steps in this guide. -> **TIP**: You can quick start with the Model Optimizer inside the OpenVINO™ [Deep Learning Workbench](@ref -> openvino_docs_get_started_get_started_dl_workbench) (DL Workbench). -> [DL Workbench](@ref workbench_docs_Workbench_DG_Introduction) is an OpenVINO™ UI that enables you to -> import a model, analyze its performance and accuracy, visualize the outputs, optimize and prepare the model for -> deployment on various Intel® platforms. + +> **TIP**: If you want to [quick start with OpenVINO™ toolkit](@ref +> openvino_docs_get_started_get_started_dl_workbench), you can use +> the OpenVINO™ [Deep Learning Workbench](@ref workbench_docs_Workbench_DG_Introduction) (DL Workbench). DL Workbench is the OpenVINO™ toolkit UI +> that enables you to import a +> model, analyze its performance and accuracy, visualize the outputs, optimize and prepare the model for deployment +> on various Intel® platforms. ## Introduction diff --git a/docs/install_guides/installing-openvino-macos.md b/docs/install_guides/installing-openvino-macos.md index d5982602b0ab17..44196a46c031e3 100644 --- a/docs/install_guides/installing-openvino-macos.md +++ b/docs/install_guides/installing-openvino-macos.md @@ -4,17 +4,18 @@ > - The Intel® Distribution of OpenVINO™ is supported on macOS\* 10.15.x versions. > - An internet connection is required to follow the steps in this guide. If you have access to the Internet through the proxy server only, please make sure that it is configured in your OS environment. -> **TIP**: You can quick start with the Model Optimizer inside the OpenVINO™ [Deep Learning Workbench](@ref -> openvino_docs_get_started_get_started_dl_workbench) (DL Workbench). -> [DL Workbench](@ref workbench_docs_Workbench_DG_Introduction) is an OpenVINO™ UI that enables you to -> import a model, analyze its performance and accuracy, visualize the outputs, optimize and prepare the model for -> deployment on various Intel® platforms. +> **TIP**: If you want to [quick start with OpenVINO™ toolkit](@ref +> openvino_docs_get_started_get_started_dl_workbench), you can use +> the OpenVINO™ [Deep Learning Workbench](@ref workbench_docs_Workbench_DG_Introduction) (DL Workbench). DL Workbench is the OpenVINO™ toolkit UI +> that enables you to import a +> model, analyze its performance and accuracy, visualize the outputs, optimize and prepare the model for deployment +> on various Intel® platforms. ## Introduction The Intel® Distribution of OpenVINO™ toolkit quickly deploys applications and solutions that emulate human vision. Based on Convolutional Neural Networks (CNN), the toolkit extends computer vision (CV) workloads across Intel® hardware, maximizing performance. 
-The Intel® Distribution of OpenVINO™ toolkit for macOS* includes the Inference Engine, OpenCV* libraries and Model Optimizer tool to deploy applications for accelerated inference on Intel® CPUs and Intel® Neural Compute Stick 2. +The Intel® Distribution of OpenVINO™ toolkit for macOS* includes the Inference Engine, OpenCV* libraries, Sample Applications, Demos, Model Optimizer and other additional tools to deploy applications for accelerated inference on Intel® CPUs and Intel® Neural Compute Stick 2. The Intel® Distribution of OpenVINO™ toolkit for macOS*: @@ -53,7 +54,7 @@ The development and target platforms have the same requirements, but you can sel **Software Requirements** -* CMake 3.10 or higher +* CMake 3.13 or higher + [Install](https://cmake.org/download/) (choose "macOS 10.13 or later") + Add `/Applications/CMake.app/Contents/bin` to path (for default install) * Python 3.6 - 3.7 @@ -108,7 +109,7 @@ The disk image is mounted to `/Volumes/m_openvino_toolkit_p_` and autom 5. Click **Next** and follow the instructions on your screen. -6. If you are missing external dependencies, you will see a warning screen. Take note of any dependencies you are missing. After installing the Intel® Distribution of OpenVINO™ toolkit core components, you will need to install the missing dependencies. For example, the screen example below indicates you are missing two dependencies: +6. If you are missing external dependencies, you will see a warning screen. Take note of any dependencies you are missing. After installing the Intel® Distribution of OpenVINO™ toolkit core components, you will need to install the missing dependencies. For example, the screen example below indicates you are missing a dependency: ![](../img/openvino-install-macos-02.png) 7. Click **Next**. @@ -118,7 +119,7 @@ The disk image is mounted to `/Volumes/m_openvino_toolkit_p_` and autom By default, the Intel® Distribution of OpenVINO™ is installed to the following directory, referred to as ``: * For root or administrator: `/opt/intel/openvino_/` - * For regular users: `/home//intel/openvino_/` + * For regular users: `/home//intel/openvino_/` For simplicity, a symbolic link to the latest installation is also created: `/home//intel/openvino_2021/`. 9. If needed, click **Customize** to change the installation directory or the components you want to install: @@ -273,7 +274,7 @@ Now you are ready to get started. To continue, see the following pages: Follow the steps below to uninstall the Intel® Distribution of OpenVINO™ Toolkit from your system: -1. From the the installation directory (by default, `/opt/intel/openvino_2021`), locate and open `openvino_toolkit_uninstaller.app`. +1. From the installation directory (by default, `/opt/intel/openvino_2021`), locate and open `openvino_toolkit_uninstaller.app`. 2. Follow the uninstallation wizard instructions. 3. When uninstallation is complete, click **Finish**. diff --git a/docs/install_guides/installing-openvino-windows.md b/docs/install_guides/installing-openvino-windows.md index 35720fe0784449..239afbb4935aa1 100644 --- a/docs/install_guides/installing-openvino-windows.md +++ b/docs/install_guides/installing-openvino-windows.md @@ -1,13 +1,14 @@ # Install Intel® Distribution of OpenVINO™ toolkit for Windows* 10 {#openvino_docs_install_guides_installing_openvino_windows} -> **NOTES**: +> **NOTE**: > - This guide applies to Microsoft Windows\* 10 64-bit. For Linux* OS information and instructions, see the [Installation Guide for Linux](installing-openvino-linux.md). 
-> **TIP**: You can quick start with the Model Optimizer inside the OpenVINO™ [Deep Learning Workbench](@ref -> openvino_docs_get_started_get_started_dl_workbench) (DL Workbench). -> [DL Workbench](@ref workbench_docs_Workbench_DG_Introduction) is an OpenVINO™ UI that enables you to -> import a model, analyze its performance and accuracy, visualize the outputs, optimize and prepare the model for -> deployment on various Intel® platforms. +> **TIP**: If you want to [quick start with OpenVINO™ toolkit](@ref +> openvino_docs_get_started_get_started_dl_workbench), you can use +> the OpenVINO™ [Deep Learning Workbench](@ref workbench_docs_Workbench_DG_Introduction) (DL Workbench). DL Workbench is the OpenVINO™ toolkit UI +> that enables you to import a +> model, analyze its performance and accuracy, visualize the outputs, optimize and prepare the model for deployment +> on various Intel® platforms. ## Introduction @@ -270,7 +271,7 @@ To perform inference on Intel® Vision Accelerator Design with Intel® Movidius 1. Download and install Visual C++ Redistributable for Visual Studio 2017 2. Check with a support engineer if your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs card requires SMBUS connection to PCIe slot (most unlikely). Install the SMBUS driver only if confirmed (by default, it's not required): 1. Go to the `\deployment_tools\inference-engine\external\hddl\drivers\SMBusDriver` directory, where `` is the directory in which the Intel Distribution of OpenVINO toolkit is installed. - 2. Right click on the `hddlsmbus.inf` file and choose **Install** from the pop up menu. + 2. Right click on the `hddlsmbus.inf` file and choose **Install** from the pop-up menu. You are done installing your device driver and are ready to use your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs. @@ -332,6 +333,7 @@ To learn more about converting deep learning models, go to: - [Convert Your TensorFlow* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md) - [Convert Your MXNet* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md) - [Convert Your ONNX* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md) +- [Convert Your Kaldi* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md) ## Additional Resources From ce0c28db7bc7d704d337fabda70106a71e2bdf0c Mon Sep 17 00:00:00 2001 From: Savina Date: Wed, 7 Jul 2021 13:06:34 +0300 Subject: [PATCH 02/10] changed video width --- docs/get_started/get_started_dl_workbench.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/get_started/get_started_dl_workbench.md b/docs/get_started/get_started_dl_workbench.md index e1c948e9e8da01..7282c3cb69283d 100644 --- a/docs/get_started/get_started_dl_workbench.md +++ b/docs/get_started/get_started_dl_workbench.md @@ -36,18 +36,18 @@ Congratulations, you have installed DL Workbench. Your next step is to [Get Star \endhtmlonly - + \htmlonly \endhtmlonly - + \htmlonly What is the OpenVINO™ toolkit DL Workbench.
Duration: 1:31 - How to Install the OpenVINO™ toolkit DL Workbench.
Duration: 8:20. + How to Install the OpenVINO™ toolkit DL Workbench.
Duration: 8:20 \endhtmlonly From 23bbf868647e3caf183e198ea077f38a36815f7d Mon Sep 17 00:00:00 2001 From: Savina Date: Wed, 7 Jul 2021 18:52:37 +0300 Subject: [PATCH 03/10] CMake reference added --- docs/IE_DG/Samples_Overview.md | 2 +- docs/install_guides/installing-openvino-macos.md | 2 -- docs/install_guides/installing-openvino-windows.md | 5 ++--- 3 files changed, 3 insertions(+), 6 deletions(-) diff --git a/docs/IE_DG/Samples_Overview.md b/docs/IE_DG/Samples_Overview.md index db39cbfc5b4cf8..1332e30d3faad5 100644 --- a/docs/IE_DG/Samples_Overview.md +++ b/docs/IE_DG/Samples_Overview.md @@ -109,7 +109,7 @@ for the debug configuration — in `/intel64/Debug/`. The recommended Windows* build environment is the following: * Microsoft Windows* 10 -* Microsoft Visual Studio* 2017, or 2019 +* Microsoft Visual Studio* 2017, or 2019. Make sure that C++ CMake tools for Windows is [enabled](https://docs.microsoft.com/en-us/cpp/build/cmake-projects-in-visual-studio?view=msvc-160#:~:text=The%20Visual%20C%2B%2B%20Tools%20for,Visual%20Studio%20generators%20are%20supported). * CMake* version 3.10 or higher > **NOTE**: If you want to use Microsoft Visual Studio 2019, you are required to install CMake 3.14. diff --git a/docs/install_guides/installing-openvino-macos.md b/docs/install_guides/installing-openvino-macos.md index 44196a46c031e3..f4b3f177713939 100644 --- a/docs/install_guides/installing-openvino-macos.md +++ b/docs/install_guides/installing-openvino-macos.md @@ -15,8 +15,6 @@ The Intel® Distribution of OpenVINO™ toolkit quickly deploys applications and solutions that emulate human vision. Based on Convolutional Neural Networks (CNN), the toolkit extends computer vision (CV) workloads across Intel® hardware, maximizing performance. -The Intel® Distribution of OpenVINO™ toolkit for macOS* includes the Inference Engine, OpenCV* libraries, Sample Applications, Demos, Model Optimizer and other additional tools to deploy applications for accelerated inference on Intel® CPUs and Intel® Neural Compute Stick 2. - The Intel® Distribution of OpenVINO™ toolkit for macOS*: - Enables CNN-based deep learning inference on the edge diff --git a/docs/install_guides/installing-openvino-windows.md b/docs/install_guides/installing-openvino-windows.md index 239afbb4935aa1..7b99292e32688f 100644 --- a/docs/install_guides/installing-openvino-windows.md +++ b/docs/install_guides/installing-openvino-windows.md @@ -96,9 +96,8 @@ The following components are installed by default: - Microsoft Windows\* 10 64-bit **Software** -- [Microsoft Visual Studio* with C++ **2019 or 2017** with MSBuild](http://visualstudio.microsoft.com/downloads/) -- [CMake **3.10 or higher** 64-bit](https://cmake.org/download/) - > **NOTE**: If you want to use Microsoft Visual Studio 2019, you are required to install CMake 3.14. +- [Microsoft Visual Studio* with C++ **2019 or 2017** with MSBuild](http://visualstudio.microsoft.com/downloads/). Make sure that C++ CMake tools for Windows is [enabled](https://docs.microsoft.com/en-us/cpp/build/cmake-projects-in-visual-studio?view=msvc-160#:~:text=The%20Visual%20C%2B%2B%20Tools%20for,Visual%20Studio%20generators%20are%20supported). +- [CMake **3.10 or higher** 64-bit](https://cmake.org/download/). If you want to use Microsoft Visual Studio 2019, you are required to install CMake 3.14. 
- [Python **3.6** - **3.8** 64-bit](https://www.python.org/downloads/windows/) ## Installation Steps From 809f05a20a4e54c0907f151e45996fc35163300e Mon Sep 17 00:00:00 2001 From: Savina Date: Thu, 8 Jul 2021 20:32:05 +0300 Subject: [PATCH 04/10] fixed table --- docs/IE_DG/supported_plugins/CPU.md | 4 ++-- docs/IE_DG/supported_plugins/GPU.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/IE_DG/supported_plugins/CPU.md b/docs/IE_DG/supported_plugins/CPU.md index 8f75a792adeeb2..1ff09616e6e4c0 100644 --- a/docs/IE_DG/supported_plugins/CPU.md +++ b/docs/IE_DG/supported_plugins/CPU.md @@ -110,8 +110,8 @@ These are general options, also supported by other plugins: CPU-specific settings: -| Parameter name | Parameter values | Default | Description | -| :--- | :--- | :--- | :--- | +| Parameter Name | Parameter Values | Default | Description | +| :---------------------------------| :---------------------------------------------------------| :-----------| :------------------------------------------------------------------------| | KEY_CPU_THREADS_NUM | positive integer values| 0 | Specifies the number of threads that CPU plugin should use for inference. Zero (default) means using all (logical) cores| | KEY_CPU_BIND_THREAD | YES/NUMA/NO | YES | Binds inference threads to CPU cores. 'YES' (default) binding option maps threads to cores - this works best for static/synthetic scenarios like benchmarks. The 'NUMA' binding is more relaxed, binding inference threads only to NUMA nodes, leaving further scheduling to specific cores to the OS. This option might perform better in the real-life/contended scenarios. Note that for the latency-oriented cases (number of the streams is less or equal to the number of NUMA nodes, see below) both YES and NUMA options limit number of inference threads to the number of hardware cores (ignoring hyper-threading) on the multi-socket machines. | | KEY_CPU_THROUGHPUT_STREAMS | KEY_CPU_THROUGHPUT_NUMA, KEY_CPU_THROUGHPUT_AUTO, or positive integer values| 1 | Specifies number of CPU "execution" streams for the throughput mode. Upper bound for the number of inference requests that can be executed simultaneously. All available CPU cores are evenly distributed between the streams. The default value is 1, which implies latency-oriented behavior for single NUMA-node machine, with all available cores processing requests one by one. On the multi-socket (multiple NUMA nodes) machine, the best latency numbers usually achieved with a number of streams matching the number of NUMA-nodes.
KEY_CPU_THROUGHPUT_NUMA creates as many streams as needed to accommodate NUMA and avoid associated penalties.
KEY_CPU_THROUGHPUT_AUTO creates bare minimum of streams to improve the performance; this is the most portable option if you don't know how many cores your target machine has (and what would be the optimal number of streams). Note that your application should provide enough parallel slack (for example, run many inference requests) to leverage the throughput mode.
Non-negative integer value creates the requested number of streams. If a number of streams is 0, no internal streams are created and user threads are interpreted as stream master threads.| diff --git a/docs/IE_DG/supported_plugins/GPU.md b/docs/IE_DG/supported_plugins/GPU.md index cc12be98a121e1..e09b5f542dd917 100644 --- a/docs/IE_DG/supported_plugins/GPU.md +++ b/docs/IE_DG/supported_plugins/GPU.md @@ -99,8 +99,8 @@ The plugin supports the configuration parameters listed below. All parameters must be set before calling InferenceEngine::Core::LoadNetwork() in order to take effect. When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. -| Parameter Name | Parameter Values | Default | Description | -|---------------------|-----------------------------|-----------------|-----------------------------------------------------------| +| Parameter Name | Parameter Values | Default | Description | +| :---------------------------------| :---------------------------------------------------------| :-----------| :------------------------------------------------------------------------| | `KEY_CACHE_DIR` | `""` | `""` | Specifies a directory where compiled OCL binaries can be cached. First model loading generates the cache, and all subsequent LoadNetwork calls use precompiled kernels which significantly improves load time. If empty - caching is disabled | | `KEY_PERF_COUNT` | `YES` / `NO` | `NO` | Collect performance counters during inference | | `KEY_CONFIG_FILE` | `" [ ...]"` | `""` | Load custom layer configuration files | From d1905d4a991b97ecb898d6a117e8a2a6df1042e8 Mon Sep 17 00:00:00 2001 From: Savina Date: Thu, 8 Jul 2021 20:47:33 +0300 Subject: [PATCH 05/10] added backtics and table formating --- docs/IE_DG/supported_plugins/CPU.md | 16 ++++++++-------- docs/IE_DG/supported_plugins/GPU.md | 4 ++-- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/IE_DG/supported_plugins/CPU.md b/docs/IE_DG/supported_plugins/CPU.md index 1ff09616e6e4c0..b2842d0243034e 100644 --- a/docs/IE_DG/supported_plugins/CPU.md +++ b/docs/IE_DG/supported_plugins/CPU.md @@ -105,17 +105,17 @@ These are general options, also supported by other plugins: | Parameter name | Parameter values | Default | Description | | :--- | :--- | :--- | :----------------------------------------------------------------------------------------------------------------------------| -| KEY_EXCLUSIVE_ASYNC_REQUESTS | YES/NO | NO | Forces async requests (also from different executable networks) to execute serially. This prevents potential oversubscription| -| KEY_PERF_COUNT | YES/NO | NO | Enables gathering performance counters | +| `KEY_EXCLUSIVE_ASYNC_REQUESTS` | `YES`/`NO` | `NO` | Forces async requests (also from different executable networks) to execute serially. This prevents potential oversubscription| +| `KEY_PERF_COUNT` | `YES`/`NO` | `NO` | Enables gathering performance counters | CPU-specific settings: -| Parameter Name | Parameter Values | Default | Description | -| :---------------------------------| :---------------------------------------------------------| :-----------| :------------------------------------------------------------------------| -| KEY_CPU_THREADS_NUM | positive integer values| 0 | Specifies the number of threads that CPU plugin should use for inference. Zero (default) means using all (logical) cores| -| KEY_CPU_BIND_THREAD | YES/NUMA/NO | YES | Binds inference threads to CPU cores. 
'YES' (default) binding option maps threads to cores - this works best for static/synthetic scenarios like benchmarks. The 'NUMA' binding is more relaxed, binding inference threads only to NUMA nodes, leaving further scheduling to specific cores to the OS. This option might perform better in the real-life/contended scenarios. Note that for the latency-oriented cases (number of the streams is less or equal to the number of NUMA nodes, see below) both YES and NUMA options limit number of inference threads to the number of hardware cores (ignoring hyper-threading) on the multi-socket machines. | -| KEY_CPU_THROUGHPUT_STREAMS | KEY_CPU_THROUGHPUT_NUMA, KEY_CPU_THROUGHPUT_AUTO, or positive integer values| 1 | Specifies number of CPU "execution" streams for the throughput mode. Upper bound for the number of inference requests that can be executed simultaneously. All available CPU cores are evenly distributed between the streams. The default value is 1, which implies latency-oriented behavior for single NUMA-node machine, with all available cores processing requests one by one. On the multi-socket (multiple NUMA nodes) machine, the best latency numbers usually achieved with a number of streams matching the number of NUMA-nodes.
KEY_CPU_THROUGHPUT_NUMA creates as many streams as needed to accommodate NUMA and avoid associated penalties.
KEY_CPU_THROUGHPUT_AUTO creates bare minimum of streams to improve the performance; this is the most portable option if you don't know how many cores your target machine has (and what would be the optimal number of streams). Note that your application should provide enough parallel slack (for example, run many inference requests) to leverage the throughput mode.
Non-negative integer value creates the requested number of streams. If a number of streams is 0, no internal streams are created and user threads are interpreted as stream master threads.| -| KEY_ENFORCE_BF16 | YES/NO| YES | The name for setting to execute in bfloat16 precision whenever it is possible. This option lets plugin know to downscale the precision where it sees performance benefits from bfloat16 execution. Such option does not guarantee accuracy of the network, you need to verify the accuracy in this mode separately, based on performance and accuracy results. It should be your decision whether to use this option or not. | +| Parameter name | Parameter values | Default | Description | +| :--- | :--- | :--- | :----------------------------------------------------------------------------------------------------------------------------| +| `KEY_CPU_THREADS_NUM` | `positive integer values`| `0` | Specifies the number of threads that CPU plugin should use for inference. Zero (default) means using all (logical) cores| +| `KEY_CPU_BIND_THREAD` | `YES`/`NUMA`/`NO` | `YES` | Binds inference threads to CPU cores. 'YES' (default) binding option maps threads to cores - this works best for static/synthetic scenarios like benchmarks. The 'NUMA' binding is more relaxed, binding inference threads only to NUMA nodes, leaving further scheduling to specific cores to the OS. This option might perform better in the real-life/contended scenarios. Note that for the latency-oriented cases (number of the streams is less or equal to the number of NUMA nodes, see below) both YES and NUMA options limit number of inference threads to the number of hardware cores (ignoring hyper-threading) on the multi-socket machines. | +| `KEY_CPU_THROUGHPUT_STREAMS` | `KEY_CPU_THROUGHPUT_NUMA`, `KEY_CPU_THROUGHPUT_AUTO`, or `positive integer values`| `1` | Specifies number of CPU "execution" streams for the throughput mode. Upper bound for the number of inference requests that can be executed simultaneously. All available CPU cores are evenly distributed between the streams. The default value is 1, which implies latency-oriented behavior for single NUMA-node machine, with all available cores processing requests one by one. On the multi-socket (multiple NUMA nodes) machine, the best latency numbers usually achieved with a number of streams matching the number of NUMA-nodes.
KEY_CPU_THROUGHPUT_NUMA creates as many streams as needed to accommodate NUMA and avoid associated penalties.
KEY_CPU_THROUGHPUT_AUTO creates bare minimum of streams to improve the performance; this is the most portable option if you don't know how many cores your target machine has (and what would be the optimal number of streams). Note that your application should provide enough parallel slack (for example, run many inference requests) to leverage the throughput mode.
Non-negative integer value creates the requested number of streams. If a number of streams is 0, no internal streams are created and user threads are interpreted as stream master threads.| +| `KEY_ENFORCE_BF16` | `YES`/`NO`| `YES` | The name for setting to execute in bfloat16 precision whenever it is possible. This option lets plugin know to downscale the precision where it sees performance benefits from bfloat16 execution. Such option does not guarantee accuracy of the network, you need to verify the accuracy in this mode separately, based on performance and accuracy results. It should be your decision whether to use this option or not. | > **NOTE**: To disable all internal threading, use the following set of configuration parameters: `KEY_CPU_THROUGHPUT_STREAMS=0`, `KEY_CPU_THREADS_NUM=1`, `KEY_CPU_BIND_THREAD=NO`. diff --git a/docs/IE_DG/supported_plugins/GPU.md b/docs/IE_DG/supported_plugins/GPU.md index e09b5f542dd917..2f877775dcf32f 100644 --- a/docs/IE_DG/supported_plugins/GPU.md +++ b/docs/IE_DG/supported_plugins/GPU.md @@ -99,8 +99,8 @@ The plugin supports the configuration parameters listed below. All parameters must be set before calling InferenceEngine::Core::LoadNetwork() in order to take effect. When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. -| Parameter Name | Parameter Values | Default | Description | -| :---------------------------------| :---------------------------------------------------------| :-----------| :------------------------------------------------------------------------| +| Parameter name | Parameter values | Default | Description | +| :--- | :--- | :--- | :----------------------------------------------------------------------------------------------------------------------------| | `KEY_CACHE_DIR` | `""` | `""` | Specifies a directory where compiled OCL binaries can be cached. First model loading generates the cache, and all subsequent LoadNetwork calls use precompiled kernels which significantly improves load time. If empty - caching is disabled | | `KEY_PERF_COUNT` | `YES` / `NO` | `NO` | Collect performance counters during inference | | `KEY_CONFIG_FILE` | `" [ ...]"` | `""` | Load custom layer configuration files | From 8b8513f3f73e1d9ce5ecd0dbb36a465bf6e9db84 Mon Sep 17 00:00:00 2001 From: Savina Date: Fri, 9 Jul 2021 09:32:09 +0300 Subject: [PATCH 06/10] new table changes --- docs/IE_DG/supported_plugins/CPU.md | 5 +++-- docs/IE_DG/supported_plugins/GPU.md | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/IE_DG/supported_plugins/CPU.md b/docs/IE_DG/supported_plugins/CPU.md index b2842d0243034e..5e335a4fffc880 100644 --- a/docs/IE_DG/supported_plugins/CPU.md +++ b/docs/IE_DG/supported_plugins/CPU.md @@ -110,8 +110,9 @@ These are general options, also supported by other plugins: CPU-specific settings: -| Parameter name | Parameter values | Default | Description | -| :--- | :--- | :--- | :----------------------------------------------------------------------------------------------------------------------------| + +| Parameter name | Parameter values | Default | Description | +| :--- | :--- | :--- |:-----------------------------------------------------------------------------| | `KEY_CPU_THREADS_NUM` | `positive integer values`| `0` | Specifies the number of threads that CPU plugin should use for inference. Zero (default) means using all (logical) cores| | `KEY_CPU_BIND_THREAD` | `YES`/`NUMA`/`NO` | `YES` | Binds inference threads to CPU cores. 
'YES' (default) binding option maps threads to cores - this works best for static/synthetic scenarios like benchmarks. The 'NUMA' binding is more relaxed, binding inference threads only to NUMA nodes, leaving further scheduling to specific cores to the OS. This option might perform better in the real-life/contended scenarios. Note that for the latency-oriented cases (number of the streams is less or equal to the number of NUMA nodes, see below) both YES and NUMA options limit number of inference threads to the number of hardware cores (ignoring hyper-threading) on the multi-socket machines. | | `KEY_CPU_THROUGHPUT_STREAMS` | `KEY_CPU_THROUGHPUT_NUMA`, `KEY_CPU_THROUGHPUT_AUTO`, or `positive integer values`| `1` | Specifies number of CPU "execution" streams for the throughput mode. Upper bound for the number of inference requests that can be executed simultaneously. All available CPU cores are evenly distributed between the streams. The default value is 1, which implies latency-oriented behavior for single NUMA-node machine, with all available cores processing requests one by one. On the multi-socket (multiple NUMA nodes) machine, the best latency numbers usually achieved with a number of streams matching the number of NUMA-nodes.
KEY_CPU_THROUGHPUT_NUMA creates as many streams as needed to accommodate NUMA and avoid associated penalties.
KEY_CPU_THROUGHPUT_AUTO creates bare minimum of streams to improve the performance; this is the most portable option if you don't know how many cores your target machine has (and what would be the optimal number of streams). Note that your application should provide enough parallel slack (for example, run many inference requests) to leverage the throughput mode.
Non-negative integer value creates the requested number of streams. If a number of streams is 0, no internal streams are created and user threads are interpreted as stream master threads.| diff --git a/docs/IE_DG/supported_plugins/GPU.md b/docs/IE_DG/supported_plugins/GPU.md index 2f877775dcf32f..fe8e8c66c641e3 100644 --- a/docs/IE_DG/supported_plugins/GPU.md +++ b/docs/IE_DG/supported_plugins/GPU.md @@ -100,7 +100,7 @@ All parameters must be set before calling InferenceEngine::Core::LoadNetwo When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. | Parameter name | Parameter values | Default | Description | -| :--- | :--- | :--- | :----------------------------------------------------------------------------------------------------------------------------| +| :--- | :--- | :--- | :----------------------------------------------------------------------------------------------------------------------------      | | `KEY_CACHE_DIR` | `""` | `""` | Specifies a directory where compiled OCL binaries can be cached. First model loading generates the cache, and all subsequent LoadNetwork calls use precompiled kernels which significantly improves load time. If empty - caching is disabled | | `KEY_PERF_COUNT` | `YES` / `NO` | `NO` | Collect performance counters during inference | | `KEY_CONFIG_FILE` | `" [ ...]"` | `""` | Load custom layer configuration files | From 40f34373ddf5a636115884b58691e0abefa29c4c Mon Sep 17 00:00:00 2001 From: Savina Date: Fri, 9 Jul 2021 09:57:20 +0300 Subject: [PATCH 07/10] GPU table changes --- docs/IE_DG/supported_plugins/GPU.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/IE_DG/supported_plugins/GPU.md b/docs/IE_DG/supported_plugins/GPU.md index fe8e8c66c641e3..e8fa9fec9acc05 100644 --- a/docs/IE_DG/supported_plugins/GPU.md +++ b/docs/IE_DG/supported_plugins/GPU.md @@ -99,8 +99,8 @@ The plugin supports the configuration parameters listed below. All parameters must be set before calling InferenceEngine::Core::LoadNetwork() in order to take effect. When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. -| Parameter name | Parameter values | Default | Description | -| :--- | :--- | :--- | :----------------------------------------------------------------------------------------------------------------------------      | +| Parameter Name | Parameter Values | Default | Description | +| :--- | :--- | :--- | :--- | | `KEY_CACHE_DIR` | `""` | `""` | Specifies a directory where compiled OCL binaries can be cached. First model loading generates the cache, and all subsequent LoadNetwork calls use precompiled kernels which significantly improves load time. If empty - caching is disabled | | `KEY_PERF_COUNT` | `YES` / `NO` | `NO` | Collect performance counters during inference | | `KEY_CONFIG_FILE` | `" [ ...]"` | `""` | Load custom layer configuration files | @@ -114,9 +114,9 @@ When specifying key values as raw strings (that is, when using Python API), omit | `KEY_GPU_ENABLE_LOOP_UNROLLING` | `YES` / `NO` | `YES` | Enables recurrent layers such as TensorIterator or Loop with fixed iteration count to be unrolled. It is turned on by default. Turning this key on will achieve better inference performance for loops with not too many iteration counts (less than 16, as a rule of thumb). Turning this key off will achieve better performance for both graph loading time and inference time with many iteration counts (greater than 16). 
Note that turning this key on will increase the graph loading time in proportion to the iteration counts. Thus, this key should be turned off if graph loading time is considered to be most important target to optimize. | | `KEY_CLDNN_PLUGIN_PRIORITY` | `<0-3>` | `0` | OpenCL queue priority (before usage, make sure your OpenCL driver supports appropriate extension)
Higher value means higher priority for OpenCL queue. 0 disables the setting. **Deprecated**. Please use KEY_GPU_PLUGIN_PRIORITY | | `KEY_CLDNN_PLUGIN_THROTTLE` | `<0-3>` | `0` | OpenCL queue throttling (before usage, make sure your OpenCL driver supports appropriate extension)
Lower value means lower driver thread priority and longer sleep time for it. 0 disables the setting. **Deprecated**. Please use KEY_GPU_PLUGIN_THROTTLE | -| `KEY_CLDNN_GRAPH_DUMPS_DIR` | `""` | `""` | clDNN graph optimizer stages dump output directory (in GraphViz format) **Deprecated**. Will be removed in the next release | -| `KEY_CLDNN_SOURCES_DUMPS_DIR` | `""` | `""` | Final optimized clDNN OpenCL sources dump output directory. **Deprecated**. Will be removed in the next release | -| `KEY_DUMP_KERNELS` | `YES` / `NO` | `NO` | Dump the final kernels used for custom layers. **Deprecated**. Will be removed in the next release | +| `KEY_CLDNN_GRAPH_DUMPS_DIR` | `""` | `""` | clDNN graph optimizer stages dump output directory (in GraphViz format) **Deprecated**. Will be removed in the next release | +| `KEY_CLDNN_SOURCES_DUMPS_DIR` | `""` | `""` | Final optimized clDNN OpenCL sources dump output directory. **Deprecated**. Will be removed in the next release | +| `KEY_DUMP_KERNELS` | `YES` / `NO` | `NO` | Dump the final kernels used for custom layers. **Deprecated**. Will be removed in the next release | | `KEY_TUNING_MODE` | `TUNING_DISABLED`
`TUNING_CREATE`
`TUNING_USE_EXISTING` | `TUNING_DISABLED` | Disable inference kernel tuning
Create tuning file (expect much longer runtime)
Use an existing tuning file. **Deprecated**. Will be removed in the next release | | `KEY_TUNING_FILE` | `""` | `""` | Tuning file to create / use. **Deprecated**. Will be removed in the next release | From 34e88e54a964379905d166835555c03a4875e77a Mon Sep 17 00:00:00 2001 From: Savina Date: Fri, 9 Jul 2021 16:05:50 +0300 Subject: [PATCH 08/10] added more backtics and changed table format --- docs/IE_DG/supported_plugins/CPU.md | 2 +- docs/IE_DG/supported_plugins/GPU.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/IE_DG/supported_plugins/CPU.md b/docs/IE_DG/supported_plugins/CPU.md index 5e335a4fffc880..12b005099ba092 100644 --- a/docs/IE_DG/supported_plugins/CPU.md +++ b/docs/IE_DG/supported_plugins/CPU.md @@ -115,7 +115,7 @@ CPU-specific settings: | :--- | :--- | :--- |:-----------------------------------------------------------------------------| | `KEY_CPU_THREADS_NUM` | `positive integer values`| `0` | Specifies the number of threads that CPU plugin should use for inference. Zero (default) means using all (logical) cores| | `KEY_CPU_BIND_THREAD` | `YES`/`NUMA`/`NO` | `YES` | Binds inference threads to CPU cores. 'YES' (default) binding option maps threads to cores - this works best for static/synthetic scenarios like benchmarks. The 'NUMA' binding is more relaxed, binding inference threads only to NUMA nodes, leaving further scheduling to specific cores to the OS. This option might perform better in the real-life/contended scenarios. Note that for the latency-oriented cases (number of the streams is less or equal to the number of NUMA nodes, see below) both YES and NUMA options limit number of inference threads to the number of hardware cores (ignoring hyper-threading) on the multi-socket machines. | -| `KEY_CPU_THROUGHPUT_STREAMS` | `KEY_CPU_THROUGHPUT_NUMA`, `KEY_CPU_THROUGHPUT_AUTO`, or `positive integer values`| `1` | Specifies number of CPU "execution" streams for the throughput mode. Upper bound for the number of inference requests that can be executed simultaneously. All available CPU cores are evenly distributed between the streams. The default value is 1, which implies latency-oriented behavior for single NUMA-node machine, with all available cores processing requests one by one. On the multi-socket (multiple NUMA nodes) machine, the best latency numbers usually achieved with a number of streams matching the number of NUMA-nodes.
KEY_CPU_THROUGHPUT_NUMA creates as many streams as needed to accommodate NUMA and avoid associated penalties.
KEY_CPU_THROUGHPUT_AUTO creates bare minimum of streams to improve the performance; this is the most portable option if you don't know how many cores your target machine has (and what would be the optimal number of streams). Note that your application should provide enough parallel slack (for example, run many inference requests) to leverage the throughput mode.
Non-negative integer value creates the requested number of streams. If a number of streams is 0, no internal streams are created and user threads are interpreted as stream master threads.| +| `KEY_CPU_THROUGHPUT_STREAMS` | `KEY_CPU_THROUGHPUT_NUMA`, `KEY_CPU_THROUGHPUT_AUTO`, or `positive integer values`| `1` | Specifies number of CPU "execution" streams for the throughput mode. Upper bound for the number of inference requests that can be executed simultaneously. All available CPU cores are evenly distributed between the streams. The default value is 1, which implies latency-oriented behavior for single NUMA-node machine, with all available cores processing requests one by one. On the multi-socket (multiple NUMA nodes) machine, the best latency numbers usually achieved with a number of streams matching the number of NUMA-nodes.
`KEY_CPU_THROUGHPUT_NUMA` creates as many streams as needed to accommodate NUMA and avoid associated penalties.
`KEY_CPU_THROUGHPUT_AUTO` creates a bare minimum of streams to improve performance; this is the most portable option if you don't know how many cores your target machine has (and what would be the optimal number of streams). Note that your application should provide enough parallel slack (for example, run many inference requests) to leverage the throughput mode.<br>
Non-negative integer value creates the requested number of streams. If a number of streams is 0, no internal streams are created and user threads are interpreted as stream master threads.| | `KEY_ENFORCE_BF16` | `YES`/`NO`| `YES` | The name for setting to execute in bfloat16 precision whenever it is possible. This option lets plugin know to downscale the precision where it sees performance benefits from bfloat16 execution. Such option does not guarantee accuracy of the network, you need to verify the accuracy in this mode separately, based on performance and accuracy results. It should be your decision whether to use this option or not. | > **NOTE**: To disable all internal threading, use the following set of configuration parameters: `KEY_CPU_THROUGHPUT_STREAMS=0`, `KEY_CPU_THREADS_NUM=1`, `KEY_CPU_BIND_THREAD=NO`. diff --git a/docs/IE_DG/supported_plugins/GPU.md b/docs/IE_DG/supported_plugins/GPU.md index e8fa9fec9acc05..d5a638bc2a0fe9 100644 --- a/docs/IE_DG/supported_plugins/GPU.md +++ b/docs/IE_DG/supported_plugins/GPU.md @@ -99,8 +99,8 @@ The plugin supports the configuration parameters listed below. All parameters must be set before calling InferenceEngine::Core::LoadNetwork() in order to take effect. When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. -| Parameter Name | Parameter Values | Default | Description | -| :--- | :--- | :--- | :--- | +| Parameter Name | Parameter Values | Default | Description | | +| :--- | :--- | :---: | :--- |:--- | | `KEY_CACHE_DIR` | `""` | `""` | Specifies a directory where compiled OCL binaries can be cached. First model loading generates the cache, and all subsequent LoadNetwork calls use precompiled kernels which significantly improves load time. If empty - caching is disabled | | `KEY_PERF_COUNT` | `YES` / `NO` | `NO` | Collect performance counters during inference | | `KEY_CONFIG_FILE` | `" [ ...]"` | `""` | Load custom layer configuration files | From 6949db02ddda4f9ab9f5b1bca7c76f99cb6d4739 Mon Sep 17 00:00:00 2001 From: Savina Date: Fri, 9 Jul 2021 16:22:02 +0300 Subject: [PATCH 09/10] gpu table changes --- docs/IE_DG/supported_plugins/GPU.md | 40 ++++++++++++++--------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/docs/IE_DG/supported_plugins/GPU.md b/docs/IE_DG/supported_plugins/GPU.md index d5a638bc2a0fe9..006a3850a239c4 100644 --- a/docs/IE_DG/supported_plugins/GPU.md +++ b/docs/IE_DG/supported_plugins/GPU.md @@ -99,26 +99,26 @@ The plugin supports the configuration parameters listed below. All parameters must be set before calling InferenceEngine::Core::LoadNetwork() in order to take effect. When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. -| Parameter Name | Parameter Values | Default | Description | | -| :--- | :--- | :---: | :--- |:--- | -| `KEY_CACHE_DIR` | `""` | `""` | Specifies a directory where compiled OCL binaries can be cached. First model loading generates the cache, and all subsequent LoadNetwork calls use precompiled kernels which significantly improves load time. If empty - caching is disabled | -| `KEY_PERF_COUNT` | `YES` / `NO` | `NO` | Collect performance counters during inference | -| `KEY_CONFIG_FILE` | `" [ ...]"` | `""` | Load custom layer configuration files | -| `KEY_GPU_PLUGIN_PRIORITY` | `<0-3>` | `0` | OpenCL queue priority (before usage, make sure your OpenCL driver supports appropriate extension)
Higher value means higher priority for OpenCL queue. 0 disables the setting. | -| `KEY_GPU_PLUGIN_THROTTLE` | `<0-3>` | `0` | OpenCL queue throttling (before usage, make sure your OpenCL driver supports appropriate extension)
Lower value means lower driver thread priority and longer sleep time for it. 0 disables the setting. | -| `KEY_CLDNN_ENABLE_FP16_FOR_QUANTIZED_MODELS` | `YES` / `NO` | `YES` | Allows using FP16+INT8 mixed precision mode, so non-quantized parts of a model will be executed in FP16 precision for FP16 IR. Does not affect quantized FP32 IRs | -| `KEY_GPU_NV12_TWO_INPUTS` | `YES` / `NO` | `NO` | Controls preprocessing logic for nv12 input. If it's set to YES, then device graph will expect that user will set biplanar nv12 blob as input wich will be directly passed to device execution graph. Otherwise, preprocessing via GAPI is used to convert NV12->BGR, thus GPU graph have to expect single input | -| `KEY_GPU_THROUGHPUT_STREAMS` | `KEY_GPU_THROUGHPUT_AUTO`, or positive integer| 1 | Specifies a number of GPU "execution" streams for the throughput mode (upper bound for a number of inference requests that can be executed simultaneously).
This option is can be used to decrease GPU stall time by providing more effective load from several streams. Increasing the number of streams usually is more effective for smaller topologies or smaller input sizes. Note that your application should provide enough parallel slack (e.g. running many inference requests) to leverage full GPU bandwidth. Additional streams consume several times more GPU memory, so make sure the system has enough memory available to suit parallel stream execution. Multiple streams might also put additional load on CPU. If CPU load increases, it can be regulated by setting an appropriate `KEY_GPU_PLUGIN_THROTTLE` option value (see above). If your target system has relatively weak CPU, keep throttling low.
The default value is 1, which implies latency-oriented behavior.
`KEY_GPU_THROUGHPUT_AUTO` creates bare minimum of streams to improve the performance; this is the most portable option if you are not sure how many resources your target machine has (and what would be the optimal number of streams).
A positive integer value creates the requested number of streams. | -| `KEY_EXCLUSIVE_ASYNC_REQUESTS` | `YES` / `NO` | `NO` | Forces async requests (also from different executable networks) to execute serially.| -| `KEY_GPU_MAX_NUM_THREADS` | `integer value` | `maximum # of HW threads available in host environment` | Specifies the number of CPU threads that can be used for GPU engine, e.g, JIT compilation of GPU kernels or cpu kernel processing within GPU plugin. The default value is set as the number of maximum available threads in host environment to minimize the time for LoadNetwork, where the GPU kernel build time occupies a large portion. Note that if the specified value is larger than the maximum available # of threads or less than zero, it is set as maximum available # of threads. It can be specified with a smaller number than the available HW threads according to the usage scenario, e.g., when the user wants to assign more CPU threads while GPU plugin is running. Note that setting this value with lower number will affect not only the network loading time but also the cpu layers of GPU networks that are optimized with multi-threading. | -| `KEY_GPU_ENABLE_LOOP_UNROLLING` | `YES` / `NO` | `YES` | Enables recurrent layers such as TensorIterator or Loop with fixed iteration count to be unrolled. It is turned on by default. Turning this key on will achieve better inference performance for loops with not too many iteration counts (less than 16, as a rule of thumb). Turning this key off will achieve better performance for both graph loading time and inference time with many iteration counts (greater than 16). Note that turning this key on will increase the graph loading time in proportion to the iteration counts. Thus, this key should be turned off if graph loading time is considered to be most important target to optimize. | -| `KEY_CLDNN_PLUGIN_PRIORITY` | `<0-3>` | `0` | OpenCL queue priority (before usage, make sure your OpenCL driver supports appropriate extension)
Higher value means higher priority for OpenCL queue. 0 disables the setting. **Deprecated**. Please use KEY_GPU_PLUGIN_PRIORITY | -| `KEY_CLDNN_PLUGIN_THROTTLE` | `<0-3>` | `0` | OpenCL queue throttling (before usage, make sure your OpenCL driver supports appropriate extension)
Lower value means lower driver thread priority and longer sleep time for it. 0 disables the setting. **Deprecated**. Please use KEY_GPU_PLUGIN_THROTTLE | -| `KEY_CLDNN_GRAPH_DUMPS_DIR` | `""` | `""` | clDNN graph optimizer stages dump output directory (in GraphViz format) **Deprecated**. Will be removed in the next release | -| `KEY_CLDNN_SOURCES_DUMPS_DIR` | `""` | `""` | Final optimized clDNN OpenCL sources dump output directory. **Deprecated**. Will be removed in the next release | -| `KEY_DUMP_KERNELS` | `YES` / `NO` | `NO` | Dump the final kernels used for custom layers. **Deprecated**. Will be removed in the next release | -| `KEY_TUNING_MODE` | `TUNING_DISABLED`
`TUNING_CREATE`
`TUNING_USE_EXISTING` | `TUNING_DISABLED` | Disable inference kernel tuning
Create tuning file (expect much longer runtime)
Use an existing tuning file. **Deprecated**. Will be removed in the next release | -| `KEY_TUNING_FILE` | `""` | `""` | Tuning file to create / use. **Deprecated**. Will be removed in the next release | +| Parameter Name | Parameter Values | Default | Description | | +| :--- | :--- | :---: | :--- |:--- | +| `KEY_CACHE_DIR` | `""` | `""` | Specifies a directory where compiled OCL binaries can be cached. First model loading generates the cache, and all subsequent LoadNetwork calls use precompiled kernels which significantly improves load time. If empty - caching is disabled || +| `KEY_PERF_COUNT` | `YES` / `NO` | `NO` | Collect performance counters during inference || +| `KEY_CONFIG_FILE` | `" [ ...]"` | `""` | Load custom layer configuration files || +| `KEY_GPU_PLUGIN_PRIORITY` | `<0-3>` | `0` | OpenCL queue priority (before usage, make sure your OpenCL driver supports appropriate extension)
Higher value means higher priority for OpenCL queue. 0 disables the setting. || +| `KEY_GPU_PLUGIN_THROTTLE` | `<0-3>` | `0` | OpenCL queue throttling (before usage, make sure your OpenCL driver supports appropriate extension)
Lower value means lower driver thread priority and longer sleep time for it. 0 disables the setting. ||
| `KEY_CLDNN_ENABLE_FP16_FOR_QUANTIZED_MODELS` | `YES` / `NO` | `YES` | Allows using FP16+INT8 mixed precision mode, so non-quantized parts of a model will be executed in FP16 precision for FP16 IR. Does not affect quantized FP32 IRs ||
| `KEY_GPU_NV12_TWO_INPUTS` | `YES` / `NO` | `NO` | Controls preprocessing logic for NV12 input. If it is set to YES, the device graph will expect that the user sets a biplanar NV12 blob as input, which will be directly passed to the device execution graph. Otherwise, preprocessing via GAPI is used to convert NV12->BGR, so the GPU graph has to expect a single input ||
| `KEY_GPU_THROUGHPUT_STREAMS` | `KEY_GPU_THROUGHPUT_AUTO`, or positive integer| 1 | Specifies a number of GPU "execution" streams for the throughput mode (upper bound for a number of inference requests that can be executed simultaneously).<br>
This option can be used to decrease GPU stall time by providing a more effective load from several streams. Increasing the number of streams is usually more effective for smaller topologies or smaller input sizes. Note that your application should provide enough parallel slack (e.g. running many inference requests) to leverage full GPU bandwidth. Additional streams consume several times more GPU memory, so make sure the system has enough memory available to suit parallel stream execution. Multiple streams might also put additional load on the CPU. If the CPU load increases, it can be regulated by setting an appropriate `KEY_GPU_PLUGIN_THROTTLE` option value (see above). If your target system has a relatively weak CPU, keep throttling low.<br>
The default value is 1, which implies latency-oriented behavior.
`KEY_GPU_THROUGHPUT_AUTO` creates a bare minimum of streams to improve performance; this is the most portable option if you are not sure how many resources your target machine has (and what would be the optimal number of streams).<br>
A positive integer value creates the requested number of streams. ||
| `KEY_EXCLUSIVE_ASYNC_REQUESTS` | `YES` / `NO` | `NO` | Forces async requests (also from different executable networks) to execute serially.||
| `KEY_GPU_MAX_NUM_THREADS` | `integer value` | `maximum # of HW threads available in host environment` | Specifies the number of CPU threads that can be used for the GPU engine, e.g., JIT compilation of GPU kernels or CPU kernel processing within the GPU plugin. The default value is set as the number of maximum available threads in the host environment to minimize the time for LoadNetwork, where the GPU kernel build time occupies a large portion. Note that if the specified value is larger than the maximum available # of threads or less than zero, it is set as the maximum available # of threads. It can be specified with a smaller number than the available HW threads according to the usage scenario, e.g., when the user wants to assign more CPU threads while the GPU plugin is running. Note that setting this value to a lower number will affect not only the network loading time but also the CPU layers of GPU networks that are optimized with multi-threading. ||
| `KEY_GPU_ENABLE_LOOP_UNROLLING` | `YES` / `NO` | `YES` | Enables recurrent layers such as TensorIterator or Loop with a fixed iteration count to be unrolled. It is turned on by default. Turning this key on will achieve better inference performance for loops with not too many iteration counts (less than 16, as a rule of thumb). Turning this key off will achieve better performance for both graph loading time and inference time with many iteration counts (greater than 16). Note that turning this key on will increase the graph loading time in proportion to the iteration counts. Thus, this key should be turned off if graph loading time is considered to be the most important target to optimize. ||
| `KEY_CLDNN_PLUGIN_PRIORITY` | `<0-3>` | `0` | OpenCL queue priority (before usage, make sure your OpenCL driver supports appropriate extension)<br>
Higher value means higher priority for OpenCL queue. 0 disables the setting. **Deprecated**. Please use KEY_GPU_PLUGIN_PRIORITY || +| `KEY_CLDNN_PLUGIN_THROTTLE` | `<0-3>` | `0` | OpenCL queue throttling (before usage, make sure your OpenCL driver supports appropriate extension)
Lower value means lower driver thread priority and longer sleep time for it. 0 disables the setting. **Deprecated**. Please use KEY_GPU_PLUGIN_THROTTLE || +| `KEY_CLDNN_GRAPH_DUMPS_DIR` | `""` | `""` | clDNN graph optimizer stages dump output directory (in GraphViz format) **Deprecated**. Will be removed in the next release || +| `KEY_CLDNN_SOURCES_DUMPS_DIR` | `""` | `""` | Final optimized clDNN OpenCL sources dump output directory. **Deprecated**. Will be removed in the next release || +| `KEY_DUMP_KERNELS` | `YES` / `NO` | `NO` | Dump the final kernels used for custom layers. **Deprecated**. Will be removed in the next release || +| `KEY_TUNING_MODE` | `TUNING_DISABLED`
`TUNING_CREATE`
`TUNING_USE_EXISTING` | `TUNING_DISABLED` | Disable inference kernel tuning
Create tuning file (expect much longer runtime)
Use an existing tuning file. **Deprecated**. Will be removed in the next release || +| `KEY_TUNING_FILE` | `""` | `""` | Tuning file to create / use. **Deprecated**. Will be removed in the next release || ## GPU Context and Video Memory Sharing RemoteBlob API From 7ec67788f95df5515c3bf10c775c842629deffe4 Mon Sep 17 00:00:00 2001 From: Andrey Zaytsev Date: Mon, 12 Jul 2021 13:54:39 +0300 Subject: [PATCH 10/10] Update get_started_dl_workbench.md --- docs/get_started/get_started_dl_workbench.md | 6 ------ 1 file changed, 6 deletions(-) diff --git a/docs/get_started/get_started_dl_workbench.md b/docs/get_started/get_started_dl_workbench.md index 7282c3cb69283d..0812f543495ea9 100644 --- a/docs/get_started/get_started_dl_workbench.md +++ b/docs/get_started/get_started_dl_workbench.md @@ -31,18 +31,13 @@ Congratulations, you have installed DL Workbench. Your next step is to [Get Star ## Videos -\htmlonly @@ -50,7 +45,6 @@ Congratulations, you have installed DL Workbench. Your next step is to [Get Star
-\endhtmlonly -\htmlonly -\endhtmlonly -\htmlonly
How to Install the OpenVINO™ toolkit DL Workbench.
Duration: 8:20
-\endhtmlonly ## See Also * [Get Started with DL Workbench](@ref workbench_docs_Workbench_DG_Work_with_Models_and_Sample_Datasets)