Large Language Models Weight Compression Example

This example demonstrates how to optimize Large Language Models (LLMs) using the NNCF weight compression API. It applies 4/8-bit mixed-precision quantization to the weights of the Linear (fully connected) layers of the TinyLlama/TinyLlama-1.1B-Chat-v1.0 model, significantly reducing the model footprint and improving inference performance with OpenVINO.
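
Conceptually, the 4-bit part of the mixed-precision scheme maps each group of float weights to signed integers in a small range using a shared scale. The following pure-Python sketch illustrates that idea only; it is not NNCF's actual implementation, and the function names and the example weight values are made up for illustration:

```python
# Illustrative sketch of symmetric 4-bit weight quantization (hypothetical
# helper names, not the NNCF API): map float weights to integers in the
# signed 4-bit range [-8, 7] via a shared scale, then dequantize back.

def quantize_int4_sym(weights):
    """Quantize a list of float weights to 4-bit symmetric integers."""
    scale = max(abs(w) for w in weights) / 7  # largest magnitude maps to +/-7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]

weights = [0.12, -0.7, 0.33, 0.05]      # toy example values
q, scale = quantize_int4_sym(weights)
approx = dequantize(q, scale)
```

Storing `q` (4 bits per value) plus one scale per group is what shrinks the footprint; the reconstruction error per weight is bounded by half the scale, which is why NNCF keeps the most sensitive layers in 8-bit precision instead.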

Prerequisites

To use this example:

  • Create and activate a separate Python* virtual environment: python3 -m venv nncf_env && source nncf_env/bin/activate
  • Install the dependencies:
    pip install -U pip
    pip install -r requirements.txt
    pip install ../../../../

Run Example

To run the example:

python main.py

The script automatically downloads the dataset and the baseline model, compresses the model weights, and saves the resulting model.
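
As a rough, back-of-envelope illustration of the expected footprint reduction (assumed numbers, not measured output of this example): TinyLlama has about 1.1 billion parameters, an FP16 weight takes 2 bytes, and here we assume 80% of the weights land in 4-bit and the rest in 8-bit, ignoring the overhead of per-group scales and any layers kept in higher precision:

```python
# Back-of-envelope model-size estimate (assumed numbers, for illustration only).
params = 1.1e9                              # ~1.1B parameters in TinyLlama-1.1B
fp16_gb = params * 2 / 1e9                  # FP16 baseline: 2 bytes per weight

# Assumed 4/8-bit split: 80% of weights in 4-bit (0.5 B), 20% in 8-bit (1 B).
mixed_gb = (0.8 * params * 0.5 + 0.2 * params * 1) / 1e9
```

Under these assumptions the weights shrink from about 2.2 GB to about 0.66 GB, i.e. roughly a 3x reduction; the actual saving depends on the mixed-precision ratio NNCF selects and on quantization metadata.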