Commit
Merge branch 'master' into mvafin/pt_fe/rerun_traceing
andrei-kochin authored Sep 25, 2024
2 parents 4319333 + 81bd537 commit fa9a388
Showing 442 changed files with 19,547 additions and 14,971 deletions.
4 changes: 4 additions & 0 deletions .github/scripts/workflow_rerun/errors_to_look_for.json
@@ -78,5 +78,9 @@
{
"error_text": "lost communication with the server",
"ticket": 152565
},
{
"error_text": "Upload progress stalled",
"ticket": 152933
}
]
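
Each entry in ``errors_to_look_for.json`` pairs an ``error_text`` substring with the ticket tracking that transient failure, evidently so the workflow-rerun tooling can decide whether a failed job looks like a known flake worth retrying. A minimal sketch of how such a list could be consumed follows; the function and file names are illustrative, not the actual script API:

.. code-block:: python

   import json

   def find_known_errors(log_text: str, errors_file: str = "errors_to_look_for.json") -> list:
       """Return the entries whose error_text appears in the failed job log."""
       with open(errors_file, encoding="utf-8") as f:
           known_errors = json.load(f)
       # Case-insensitive substring match; each hit carries its tracking ticket.
       return [e for e in known_errors if e["error_text"].lower() in log_text.lower()]

   for entry in find_known_errors("ERROR: Upload progress stalled after 30 s"):
       print(f"Transient failure, rerun candidate (ticket {entry['ticket']})")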
2 changes: 1 addition & 1 deletion cmake/developer_package/compile_flags/os_flags.cmake
@@ -146,7 +146,7 @@ macro(ov_avx512_optimization_flags flags)
if(CMAKE_CXX_COMPILER_ID STREQUAL "MSVC")
set(${flags} /arch:AVX512)
elseif(OV_COMPILER_IS_INTEL_LLVM AND WIN32)
set(${flags} /QxCOMMON-AVX512)
set(${flags} /QxCORE-AVX512)
elseif(OV_COMPILER_IS_CLANG OR CMAKE_COMPILER_IS_GNUCXX OR (OV_COMPILER_IS_INTEL_LLVM AND UNIX))
set(${flags} -mavx512f -mavx512bw -mavx512vl -mfma -mf16c)
else()
@@ -19,22 +19,22 @@ CPU

.. tab-item:: Supported Hardware

* Intel® Core™ Ultra Series 1 and Series 2 (Windows only)
* Intel® Xeon® 6 processor (preview)
* Intel Atom® Processor X Series
* Intel Atom® processor with Intel® SSE4.2 support
* Intel® Pentium® processor N4200/5, N3350/5, N3450/5 with Intel® HD Graphics
* 6th - 14th generation Intel® Core™ processors
* Intel® Core™ Ultra Series 1 and Series 2 (Windows only)
* 1st - 5th generation Intel® Xeon® Scalable Processors
* ARM CPUs with armv7a and higher, ARM64 CPUs with arm64-v8a and higher, Apple® Mac with Apple silicon

.. tab-item:: Supported Operating Systems

* Windows 11, 64-bit
* Windows 10, 64-bit
* Ubuntu 24.04 long-term support (LTS), 64-bit (Kernel 6.8+) (preview support)
* Ubuntu 22.04 long-term support (LTS), 64-bit (Kernel 5.15+)
* Ubuntu 20.04 long-term support (LTS), 64-bit (Kernel 5.15+)
* Windows 10, 64-bit
* Windows 11, 64-bit
* macOS 12.6 and above, 64-bit and ARM64
* CentOS 7
* Red Hat Enterprise Linux 9.3-9.4, 64-bit
@@ -48,22 +48,22 @@ GPU

.. tab-item:: Supported Hardware

* Intel® Arc™ GPU Series
* Intel® HD Graphics
* Intel® UHD Graphics
* Intel® Iris® Pro Graphics
* Intel® Iris® Xe Graphics
* Intel® Iris® Xe Max Graphics
* Intel® Arc™ GPU Series
* Intel® Data Center GPU Flex Series
* Intel® Data Center GPU Max Series

.. tab-item:: Supported Operating Systems

* Windows 11, 64-bit
* Windows 10, 64-bit
* Ubuntu 24.04 long-term support (LTS), 64-bit
* Ubuntu 22.04 long-term support (LTS), 64-bit
* Ubuntu 20.04 long-term support (LTS), 64-bit
* Windows 10, 64-bit
* Windows 11, 64-bit
* CentOS 7
* Red Hat Enterprise Linux 9.3-9.4, 64-bit

@@ -19,26 +19,31 @@ Install required dependencies:
npu-env\Scripts\activate
pip install optimum-intel nncf==2.11 onnx==1.16.1
pip install --pre openvino openvino-tokenizers openvino-genai --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
Export an LLM model via Hugging Face Optimum-Intel
##################################################

A chat-tuned TinyLlama model is used in this example. The following conversion & optimization settings are recommended when using the NPU:
A chat-tuned TinyLlama model is used in this example. The following conversion & optimization
settings are recommended when using the NPU:

.. code-block:: python
optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --group-size 128 --ratio 1.0 TinyLlama
**For models exceeding 1 billion parameters, it is recommended to use remarkably effective channel-wise quantization.** For example, you can try the approach with the llama-2-7b-chat-hf model:
**For models exceeding 1 billion parameters**, it is recommended to use **channel-wise
quantization** that is remarkably effective. For example, you can try the approach with the
llama-2-7b-chat-hf model:

.. code-block:: python
optimum-cli export openvino -m meta-llama/Llama-2-7b-chat-hf --weight-format int4 --sym --group-size -1 --ratio 1.0 Llama-2-7b-chat-hf
Run generation using OpenVINO GenAI
###################################

It is recommended to install the latest available `driver <https://www.intel.com/content/www/us/en/download/794734/intel-npu-driver-windows.html>`__.
It is recommended to install the latest available
`driver <https://www.intel.com/content/www/us/en/download/794734/intel-npu-driver-windows.html>`__.

Use the following code snippet to perform generation with OpenVINO GenAI API:
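
As a minimal sketch of such a call — assuming the ``openvino_genai.LLMPipeline`` API and the ``TinyLlama`` directory exported above, rather than the exact snippet from the guide — generation on the NPU could look like this:

.. code-block:: python

   import openvino_genai as ov_genai

   # Load the model exported by optimum-cli above and compile it for the NPU.
   pipe = ov_genai.LLMPipeline("TinyLlama", "NPU")

   # Generate a bounded response; max_new_tokens caps the output length.
   print(pipe.generate("What is OpenVINO?", max_new_tokens=100))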

@@ -74,12 +79,19 @@ Additional configuration options
Prompt and response length options
++++++++++++++++++++++++++++++++++

The LLM pipeline for NPUs leverages the static shape approach, optimizing execution performance, while potentially introducing certain usage limitations. By default, the LLM pipeline supports input prompts up to 1024 tokens in length. It also ensures that the generated response contains at least 150 tokens, unless the generation encounters the end-of-sequence (EOS) token or the user explicitly sets a lower length limit for the response.
The LLM pipeline for NPUs leverages the static shape approach, optimizing execution performance,
while potentially introducing certain usage limitations. By default, the LLM pipeline supports
input prompts up to 1024 tokens in length. It also ensures that the generated response contains
at least 150 tokens, unless the generation encounters the end-of-sequence (EOS) token or the
user explicitly sets a lower length limit for the response.

You may configure both the 'maximum input prompt length' and 'minimum response length' using the following parameters:
You may configure both the 'maximum input prompt length' and 'minimum response length' using
the following parameters:

- ``MAX_PROMPT_LEN``: Defines the maximum number of tokens that the LLM pipeline can process for the input prompt (default: 1024).
- ``MIN_RESPONSE_LEN``: Defines the minimum number of tokens that the LLM pipeline will generate in its response (default: 150).
* ``MAX_PROMPT_LEN``: Defines the maximum number of tokens that the LLM pipeline can process
for the input prompt (default: 1024).
* ``MIN_RESPONSE_LEN``: Defines the minimum number of tokens that the LLM pipeline will generate
in its response (default: 150).

Use the following code snippet to change the default settings:
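
As a hedged sketch, assuming these options are forwarded to the NPU plugin as a properties dictionary when the pipeline is constructed (the exact form used in the guide may differ):

.. code-block:: python

   import openvino_genai as ov_genai

   # Assumed configuration keys, matching the parameters described above.
   pipeline_config = {"MAX_PROMPT_LEN": 1024, "MIN_RESPONSE_LEN": 512}

   # The dictionary is assumed to be accepted as additional pipeline properties.
   pipe = ov_genai.LLMPipeline("TinyLlama", "NPU", pipeline_config)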

@@ -107,4 +119,4 @@ Additional Resources

* :doc:`NPU Device <../../openvino-workflow/running-inference/inference-devices-and-modes/npu-device>`
* `OpenVINO GenAI Repo <https://github.com/openvinotoolkit/openvino.genai>`__
* `Neural Network Compression Framework <https://github.com/openvinotoolkit/nncf>`__
* `Neural Network Compression Framework <https://github.com/openvinotoolkit/nncf>`__
@@ -8,7 +8,7 @@ Run LLM Inference on OpenVINO with the GenAI Flavor
:maxdepth: 1
:hidden:

genai-guide-npu
NPU inference of LLMs <genai-guide-npu>


This guide will show you how to integrate the OpenVINO GenAI flavor into your application, covering
2 changes: 1 addition & 1 deletion docs/nbdoc/consts.py
@@ -6,7 +6,7 @@
repo_owner = "openvinotoolkit"
repo_name = "openvino_notebooks"
repo_branch = "tree/main"
artifacts_link = "http://repository.toolbox.iotg.sclab.intel.com/projects/ov-notebook/0.1.0-latest/20240827220813/dist/rst_files/"
artifacts_link = "http://repository.toolbox.iotg.sclab.intel.com/projects/ov-notebook/0.1.0-latest/20240923220849/dist/rst_files/"
blacklisted_extensions = ['.xml', '.bin']
notebooks_repo = "https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/"
notebooks_binder = "https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath="