NF4+F8E4M3 support #1127
Conversation
@nikita-savelyevv, @AlexKoff88, could you please review it? Thanks.
```python
        return copy.deepcopy(self.__dict__)


class OVGeneralQuantizationConfig(QuantizationConfigMixin):
```
I propose the name OVMixedQuantizationConfig
If weight-compression or quantization-specific options are absent, the method's main purpose would be only to fully quantize the model or compress its weights. Does the name Mixed reflect this behaviour? I don't think so.
OK, please name it OVGenericQuantizationConfig then.
```python
        self,
        ignored_scope: Optional[Dict] = None,
        num_samples: Optional[int] = None,
        compress_weights_options: Optional[OVCompressWeightsOptions] = None,
```
Why didn't you use the existing OVWeightQuantizationConfig and OVQuantizationConfig?
Because OVWeightQuantizationConfig and OVQuantizationConfig contain general parameters like tokenizer or dataset that are not method-specific options. For parameters like sym or bits, it is not even clear how they should be used in the new approach.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
The issue with:
```python
from ...intel.openvino.configuration import (
    _DEFAULT_4BIT_CONFIG,
    OVCompressWeightsOptions,
    OVConfig,
```
New classes should be exposed in optimum.intel.__init__.
```python
        return copy.deepcopy(self.__dict__)


class OVQuantizeOptions:
```
Suggested change:
```diff
-class OVQuantizeOptions:
+class OVQuantizationOptions:
```
```python
@@ -775,3 +775,348 @@ def to_dict(self) -> Dict[str, Any]:

    def to_diff_dict(self) -> Dict[str, Any]:
        return self._to_dict_safe(to_diff_dict=True)


class OVCompressWeightsOptions:
```
Suggested change:
```diff
-class OVCompressWeightsOptions:
+class OVWeightsCompressionOptions:
```
Let's use these OVXXXOptions classes as base entities for the corresponding OVXXXConfigs.
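A sketch of the layering this suggests, under the assumption that "base entities" means inheritance: an OVXXXOptions class holds the nncf-level knobs, and the corresponding OVXXXConfig builds on top of it, adding user-facing parameters. Names and signatures here are hypothetical, not the actual optimum-intel classes.

```python
class OVQuantizeOptions:
    # nncf.quantize()-level options (illustrative subset)
    def __init__(self, mode=None, fast_bias_correction=True):
        self.mode = mode
        self.fast_bias_correction = fast_bias_correction


class OVQuantizationConfig(OVQuantizeOptions):
    # user-facing config layered on top of the options base
    def __init__(self, dataset=None, num_samples=None, **options):
        super().__init__(**options)
        self.dataset = dataset
        self.num_samples = num_samples


cfg = OVQuantizationConfig(dataset="wikitext2", num_samples=128, mode="fp8_e4m3")
```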
```python
class OVGeneralQuantizationConfig(QuantizationConfigMixin):
    def __init__(
        self,
```
The class constructor should be extended with dataset, tokenizer, and the rest of the options.
I've left several comments; some are minor, but there are a couple of major ones. The major ones are related to (1) the interaction of OVGeneralQuantizationConfig with OVConfig, e.g. dict-conversion support, and (2) the complexity of config creation with OVGeneralQuantizationConfig.
I've sketched out a proposal that implements a way to address some of the issues I've highlighted, but this is possibly not the only way. It supports dict-to-config-class conversion and inherits OVQuantizationConfigBase. I've also refactored the base class itself to contain dataset-related logic, assuming that compression and quantization will rely on the same dataset. Unfortunately, I don't have much time for this, so there are possibly still some rough points.
@KodiaqQ @AlexKoff88, please take a look: main...nikita-savelyevv:optimum-intel:ns/nf4_f8e4m3_proposal
```python
@@ -470,3 +473,36 @@ def run(self):
            library_name=library_name,
            # **input_shapes,
        )


def prepare_for_wc_config(args, default_configs):
```
These methods are helpful! Would you consider renaming them, though? IMO this method prepares the config itself rather than something for the config. Same for prepare_for_q_config.
Suggested change:
```diff
-def prepare_for_wc_config(args, default_configs):
+def prepare_wc_config(args, default_configs):
```
```python
    }


def prepare_for_q_config(args):
```
Suggested change:
```diff
-def prepare_for_q_config(args):
+def prepare_q_config(args):
```
```python
    Class containing specific nncf.quantize method's options.

    Args:
        mode (`str`, *optional*):
            Defines special quantization modes. Possible values: ['fp8_e4m3', 'fp8_e5m2'].
```
Why is "int8" not possible?
```python
        return copy.deepcopy(self.__dict__)


class OVGeneralQuantizationConfig(QuantizationConfigMixin):
```
Currently, compression and quantization configs can be set on ov_config.quantization_config either as config class instances or as dicts. There is a special method, _quantization_config_from_dict, that converts a dict into one of these instances. Ideally, OVGeneralQuantizationConfig should also be supported by this logic.
_quantization_config_from_dict contains hard-coded configuration detection and instantiation. Do you think it is a scalable approach to extend and support?
We currently support providing the quantization config as a dict for weight and activation quantization. I believe we should also support this for mixed-precision quantization.
This should also become useful when implementing ticket 157523: the plan there was to allow specifying the quantization config as a JSON file from the CLI, in which case dict-to-config-class conversion should be supported.
The _quantization_config_from_dict implementation may not be ideal, but it does the job. Feel free to suggest improvements if you see how it can be done better.
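One possible way to make dict-to-config conversion extensible rather than hard-coded is a small registry keyed by an explicit type tag. This is only a sketch of the idea; optimum-intel's actual _quantization_config_from_dict detects the config type differently, and the class body below is a hypothetical stand-in.

```python
from typing import Any, Dict, Type

_CONFIG_REGISTRY: Dict[str, Type] = {}


def register_config(cls: Type) -> Type:
    """Register a config class under its own name for dict round-tripping."""
    _CONFIG_REGISTRY[cls.__name__] = cls
    return cls


@register_config
class OVWeightQuantizationConfig:
    # illustrative stand-in, not the real optimum-intel class
    def __init__(self, bits: int = 8, sym: bool = False):
        self.bits = bits
        self.sym = sym


def config_from_dict(d: Dict[str, Any]):
    # The dict carries an explicit type tag, so no heuristic key sniffing is needed.
    d = dict(d)
    cls = _CONFIG_REGISTRY[d.pop("config_type")]
    return cls(**d)


cfg = config_from_dict({"config_type": "OVWeightQuantizationConfig", "bits": 4, "sym": True})
```

Adding a new config class then only requires decorating it; the conversion function itself never changes.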
```python
        return copy.deepcopy(self.__dict__)


class OVGeneralQuantizationConfig(QuantizationConfigMixin):
```
As I understand, the intended "general" usage of this class is as follows. For example, for weight compression:
```python
# Creation via optimum-style parameters
quantization_config = OVGeneralQuantizationConfig(
    compress_weights_options=OVCompressWeightsOptions.init_with_format(bits=8, sym=False)
)
```
or
```python
# Creation via NNCF-style parameters
quantization_config = OVGeneralQuantizationConfig(
    compress_weights_options=OVCompressWeightsOptions(mode="int8_asym")
)
```
My concern is that this looks much more cumbersome compared to the current options:
```python
# Current way via config class
quantization_config = OVWeightQuantizationConfig(bits=8, sym=False)
```
or even
```python
# Current way via dict (will be accepted by OVQuantizer.quantize() too)
quantization_config = {"bits": 8, "sym": False}
```
I believe the current API for compression/quantization is worth keeping. If we agree on this, then we need to decide whether the "generality" (the ability to instantiate it with only one of the two configs) of the proposed new class is really needed.
Do you have a better proposal?
Yes. I propose to use OVGeneralQuantizationConfig only for mixed-precision quantization. It can then be named OVMixedQuantizationConfig. I've linked an example of how it could look above.
```python
mode = None
if activation_format:
    mode_map = {
        "f8e4m3": "fp8_e4m3",
        "f8e5m2": "fp8_e5m2",
    }
    mode = mode_map[activation_format]
```
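A defensive variant of the snippet above, as a hypothetical helper: it raises a clear error for formats missing from the map instead of a bare KeyError. The function name and behaviour are assumptions for illustration, not part of the PR.

```python
from typing import Optional


def activation_format_to_mode(activation_format: Optional[str]) -> Optional[str]:
    # Mirrors the mapping in the snippet above, with explicit error handling
    # for unknown formats.
    mode_map = {
        "f8e4m3": "fp8_e4m3",
        "f8e5m2": "fp8_e5m2",
    }
    if activation_format is None:
        return None
    if activation_format not in mode_map:
        raise ValueError(f"Unsupported activation format: {activation_format!r}")
    return mode_map[activation_format]
```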
I don't see how mode can become an int-like type. Isn't this a problem?
```python
        return copy.deepcopy(self.__dict__)


class OVGeneralQuantizationConfig(QuantizationConfigMixin):
```
I believe it should inherit from OVQuantizationConfigBase, as other configs do, because ov_config.quantization_config currently supports such instances.
OVQuantizationConfigBase was introduced for weight-compression purposes. I don't understand why we should use such a specific "base" everywhere. Are there any reasons?
We currently rely on the quantization config class belonging to OVQuantizationConfigBase in some parts of the code. For example:
- https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/modeling_seq2seq.py#L1027
- https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/quantization.py#L728
This is useful for clarity when accessing shared properties of quantization configs.
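The isinstance-based pattern described above can be sketched as follows. Code that only needs shared properties (here, num_samples) accepts any subclass of a common base; the class names follow the discussion, but the bodies are minimal illustrative stand-ins, not the real optimum-intel classes.

```python
class OVQuantizationConfigBase:
    # common base holding properties shared by all quantization configs
    def __init__(self, num_samples=None):
        self.num_samples = num_samples


class OVWeightQuantizationConfig(OVQuantizationConfigBase):
    def __init__(self, bits=8, sym=False, num_samples=None):
        super().__init__(num_samples)
        self.bits = bits
        self.sym = sym


def needs_calibration(config) -> bool:
    # Shared-property access works uniformly for every config inheriting the base.
    if isinstance(config, OVQuantizationConfigBase):
        return config.num_samples is not None
    return False
```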
What does this PR do?
Implements 153357
Before submitting