
NF4+F8E4M3 support #1127

Closed
wants to merge 4 commits

Conversation

KodiaqQ (Contributor) commented Jan 23, 2025

What does this PR do?

Implements 153357

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@KodiaqQ KodiaqQ marked this pull request as ready for review January 24, 2025 09:07
KodiaqQ (Contributor Author) commented Jan 24, 2025

@nikita-savelyevv, @AlexKoff88, could you please review it? Thanks.

return copy.deepcopy(self.__dict__)


class OVGeneralQuantizationConfig(QuantizationConfigMixin):
Collaborator:

I propose the name OVMixedQuantizationConfig

Contributor Author:

If the weight-compression or quantization-specific options are absent, the method's main purpose would be only to fully quantize the model or compress its weights. Does the name Mixed reflect this behaviour? I don't think so.

Collaborator:

OK, please name it OVGenericQuantizationConfig then.

self,
ignored_scope: Optional[Dict] = None,
num_samples: Optional[int] = None,
compress_weights_options: Optional[OVCompressWeightsOptions] = None,
Collaborator:

Why didn't you use the existing OVWeightQuantizationConfig and OVQuantizationConfig?

Contributor Author:

Because OVWeightQuantizationConfig and OVQuantizationConfig contain general parameters, like tokenizer or dataset, that are not method-specific options. For parameters like sym or bits, it is not even clear how they should be used in the new approach.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

KodiaqQ (Contributor Author) commented Jan 24, 2025

The issue with:
NotImplementedError: NNCF is not yet supported OpenVINO data type: nf4.
has already been fixed on the NNCF side: openvinotoolkit/nncf#3209.

@KodiaqQ KodiaqQ requested a review from AlexKoff88 January 27, 2025 07:29
from ...intel.openvino.configuration import (
_DEFAULT_4BIT_CONFIG,
OVCompressWeightsOptions,
OVConfig,
Collaborator:

New classes should be exposed in optimum.intel.__init__.
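
For illustration, a minimal sketch of what exposing the new classes could look like; a plain re-export is shown for brevity, and the real optimum.intel.__init__ uses a lazy import structure, so the actual change would follow that pattern instead:

# Hypothetical sketch only; names mirror the classes added in this PR.
from .openvino.configuration import (
    OVCompressWeightsOptions,
    OVGeneralQuantizationConfig,
    OVQuantizeOptions,
)

__all__ = ["OVCompressWeightsOptions", "OVGeneralQuantizationConfig", "OVQuantizeOptions"]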

return copy.deepcopy(self.__dict__)


class OVQuantizeOptions:
Collaborator:

Suggested change
class OVQuantizeOptions:
class OVQuantizationOptions:

@@ -775,3 +775,348 @@ def to_dict(self) -> Dict[str, Any]:

def to_diff_dict(self) -> Dict[str, Any]:
return self._to_dict_safe(to_diff_dict=True)


class OVCompressWeightsOptions:
Collaborator:

Suggested change
class OVCompressWeightsOptions:
class OVWeightsCompressionOptions:

Collaborator:

Let's use these OVXXXOptions classes as base entities for the corresponding OVXXXConfigs.
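
One possible reading of this suggestion, sketched under assumptions (the parameter set is illustrative, and the real OVWeightQuantizationConfig signature is richer than shown here):

from typing import Any, Dict, Optional


class OVCompressWeightsOptions:
    # Thin container for nncf.compress_weights() keyword arguments.
    def __init__(
        self,
        mode: Optional[str] = None,
        ratio: Optional[float] = None,
        group_size: Optional[int] = None,
    ):
        self.mode = mode
        self.ratio = ratio
        self.group_size = group_size

    def to_nncf_dict(self) -> Dict[str, Any]:
        # Drop unset options so nncf.compress_weights() keeps its own defaults.
        return {k: v for k, v in self.__dict__.items() if v is not None}


class OVWeightQuantizationConfig(OVCompressWeightsOptions):
    # Config = method options (base class) + pipeline-level parameters.
    def __init__(self, dataset: Optional[str] = None, tokenizer: Optional[str] = None, **options):
        super().__init__(**options)
        self.dataset = dataset
        self.tokenizer = tokenizer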


class OVGeneralQuantizationConfig(QuantizationConfigMixin):
def __init__(
self,
Collaborator:

The class constructor should be extended with dataset, tokenizer and the rest of the options.
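
A hedged sketch of the extended constructor being requested; the dataset/tokenizer/processor parameters are the additions, and the remaining names follow the PR's signature only approximately:

from typing import Any, Dict, Optional


class OVGeneralQuantizationConfig:
    # Sketch only; the PR's class also inherits QuantizationConfigMixin.
    def __init__(
        self,
        ignored_scope: Optional[Dict[str, Any]] = None,
        num_samples: Optional[int] = None,
        dataset: Optional[str] = None,
        tokenizer: Optional[str] = None,
        processor: Optional[str] = None,
        compress_weights_options=None,
        quantize_options=None,
    ):
        self.ignored_scope = ignored_scope
        self.num_samples = num_samples
        self.dataset = dataset
        self.tokenizer = tokenizer
        self.processor = processor
        self.compress_weights_options = compress_weights_options
        self.quantize_options = quantize_options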

nikita-savelyevv (Collaborator) left a comment:

I've left several comments, some are minor, but there are a couple of major ones.

The major ones are related to (1) the interaction of OVGeneralQuantizationConfig with OVConfig, e.g. dict-conversion support, and (2) the complexity of config creation with OVGeneralQuantizationConfig.

I've sketched out a proposal that implements a way to address some of the issues I've highlighted, though it is possibly not the only way. It supports dict-to-config-class conversion and inherits OVQuantizationConfigBase. I've also refactored the base class itself to contain the dataset-related logic, assuming that compression and quantization will rely on the same dataset. Unfortunately, I don't have much time for this, so there may still be some rough points.

@KodiaqQ @AlexKoff88, please take a look: main...nikita-savelyevv:optimum-intel:ns/nf4_f8e4m3_proposal

@@ -470,3 +473,36 @@ def run(self):
library_name=library_name,
# **input_shapes,
)


def prepare_for_wc_config(args, default_configs):
Collaborator:

These methods are helpful! Would you consider renaming them, though? IMO this method prepares the config itself rather than something for the config. Same for prepare_for_q_config.

Suggested change
def prepare_for_wc_config(args, default_configs):
def prepare_wc_config(args, default_configs):

}


def prepare_for_q_config(args):
Collaborator:

Suggested change
def prepare_for_q_config(args):
def prepare_q_config(args):

Class containing specific nncf.quantize method's options.
Args:
mode (`str`, *optional*):
Defines special quantization modes. Possible values: ['fp8_e4m3', 'fp8_e5m2'].
Collaborator:

Why "int8" is not possible?


return copy.deepcopy(self.__dict__)


class OVGeneralQuantizationConfig(QuantizationConfigMixin):
Collaborator:

Currently, compression and quantization configs can be set to ov_config.quantization_config either as config class instances or as dicts. There is a special method, _quantization_config_from_dict, that converts a dict to one of these instances. Ideally, OVGeneralQuantizationConfig should also be supported by this logic.

KodiaqQ (Contributor Author) commented Jan 31, 2025:

_quantization_config_from_dict contains hard-coded configuration detection and instantiation. Do you think it is a scalable approach to extend and support?

Collaborator:

We currently support providing a quantization config as a dict for weight and activation quantization. I believe we should also support this for mixed-precision quantization.

This should also become useful when implementing ticket 157523. The plan there was to allow specifying the quantization config as a JSON file from the CLI, in which case dict-to-config-class conversion should be supported.

The _quantization_config_from_dict implementation may not be ideal, but it does the job. Feel free to suggest improvements if you see how it can be done better.
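
A hedged sketch of how the dict-to-config dispatch could be extended to cover the new class; the detection keys and the overall shape here are assumptions, not the actual _quantization_config_from_dict implementation in optimum-intel:

def _quantization_config_from_dict(config_dict: dict):
    # Hypothetical dispatch: nested option blocks mark a "general" config,
    # otherwise fall back to the existing weight/activation detection.
    # OVGeneralQuantizationConfig is the class added in this PR; the other two
    # already exist in optimum.intel.openvino.configuration.
    if "compress_weights_options" in config_dict or "quantize_options" in config_dict:
        return OVGeneralQuantizationConfig(**config_dict)
    if "bits" in config_dict or "sym" in config_dict:
        return OVWeightQuantizationConfig(**config_dict)
    return OVQuantizationConfig(**config_dict)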

return copy.deepcopy(self.__dict__)


class OVGeneralQuantizationConfig(QuantizationConfigMixin):
nikita-savelyevv (Collaborator) commented Jan 28, 2025:

As I understand it, the intended "general" usage of this class is as follows. For example, for weight compression:

# Creation via optimum-style parameters
quantization_config = OVGeneralQuantizationConfig(
    compress_weights_options=OVCompressWeightsOptions.init_with_format(bits=8, sym=False)
)

or

# Creation via NNCF-style parameters
quantization_config = OVGeneralQuantizationConfig(
    compress_weights_options=OVCompressWeightsOptions(mode="int8_asym")
)

My concern is that it looks much more cumbersome compared to the current options:

# Current way via config class
quantization_config = OVWeightQuantizationConfig(bits=8, sym=False)

or even

# Current way via dict (will be accepted by OVQuantizer.quantize() too)
quantization_config = {"bits": 8, "sym": False}

I believe the current API for compression/quantization is worth keeping. If we agree on this, then we need to decide whether the "generality" of the proposed new class (the ability to instantiate it with only one of the two configs) is really needed.

Contributor Author:

Do you have a better proposal?

Collaborator:

Yes. I propose to use OVGeneralQuantizationConfig only for mixed precision quantization. It can then be named OVMixedQuantizationConfig. I've linked an example of how it could look above.
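
To make the proposal concrete, a hedged usage sketch for the NF4 + F8E4M3 case; OVMixedQuantizationConfig is the class name proposed in this thread, and the dtype and *_quantization_config argument names are assumptions based on the linked proposal branch, not this PR's API:

from optimum.intel import OVModelForCausalLM, OVQuantizationConfig, OVWeightQuantizationConfig

# Hypothetical sketch; not the API implemented in this PR.
weight_config = OVWeightQuantizationConfig(bits=4, dtype="nf4")        # NF4 weight compression
full_config = OVQuantizationConfig(dtype="f8e4m3", num_samples=300)    # F8E4M3 quantization
mixed_config = OVMixedQuantizationConfig(
    weight_quantization_config=weight_config,
    full_quantization_config=full_config,
)

model = OVModelForCausalLM.from_pretrained(
    "model-id-placeholder",  # hypothetical model id
    quantization_config=mixed_config,
)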

Comment on lines +996 to +1002
mode = None
if activation_format:
mode_map = {
"f8e4m3": "fp8_e4m3",
"f8e5m2": "fp8_e5m2",
}
mode = mode_map[activation_format]
Collaborator:

I don't see how mode can become an int-like type. Isn't this a problem?
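
A hedged illustration of the point: with the mapping above, an int-like activation format (e.g. "int8") would either raise a KeyError or never set mode, so a fallback such as the following sketch might be needed (activation_format refers to the variable in the quoted snippet):

# Possible fallback, sketched: map only the fp8 formats explicitly and let
# other formats fall through to None so nncf.quantize() keeps its defaults.
mode_map = {
    "f8e4m3": "fp8_e4m3",
    "f8e5m2": "fp8_e5m2",
}
mode = mode_map.get(activation_format) if activation_format else None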


return copy.deepcopy(self.__dict__)


class OVGeneralQuantizationConfig(QuantizationConfigMixin):
Collaborator:

I believe it should inherit from OVQuantizationConfigBase as other configs do, because ov_config.quantization_config currently supports such instances.

Contributor Author:

OVQuantizationConfigBase was introduced for weight-compression purposes. I do not understand why we should use such a specific "base" everywhere. Are there any reasons?

Collaborator:

We currently rely on the quantization config class belonging to OVQuantizationConfigBase in some parts of the code, for example when accessing shared properties of quantization configs. This is useful for clarity.
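
An illustrative instance of this kind of reliance (a sketch only, not the exact code referenced):

# Sketch: code paths that handle quantization configs generically can rely on
# the shared base class when reading common attributes.
if isinstance(quantization_config, OVQuantizationConfigBase):
    num_samples = quantization_config.num_samples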

@KodiaqQ KodiaqQ marked this pull request as draft January 30, 2025 08:35
@KodiaqQ KodiaqQ closed this Jan 31, 2025