NF4+F8E4M3 support #1127
Conversation
@nikita-savelyevv, @AlexKoff88, could you please review it? Thanks.
```python
        return copy.deepcopy(self.__dict__)


class OVGeneralQuantizationConfig(QuantizationConfigMixin):
```
I propose the name OVMixedQuantizationConfig
If weight-compression or quantization-specific options are absent, the method's main purpose would be only to fully quantize the model or compress its weights. Does the name Mixed reflect this behaviour? I don't think so.
OK, please name it OVGenericQuantizationConfig then.
```python
        self,
        ignored_scope: Optional[Dict] = None,
        num_samples: Optional[int] = None,
        compress_weights_options: Optional[OVCompressWeightsOptions] = None,
```
Why didn't you use the existing OVWeightQuantizationConfig and OVQuantizationConfig?
Because OVWeightQuantizationConfig and OVQuantizationConfig contain general parameters like tokenizer or dataset that are not method-specific options. For parameters like sym or bits, it is not even clear how they should be used in the new approach.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
The issue with:
```python
from ...intel.openvino.configuration import (
    _DEFAULT_4BIT_CONFIG,
    OVCompressWeightsOptions,
    OVConfig,
```
New classes should be exposed in optimum.intel.__init__.
```python
        return copy.deepcopy(self.__dict__)


class OVQuantizeOptions:
```
Suggested change:
```diff
-class OVQuantizeOptions:
+class OVQuantizationOptions:
```
```python
@@ -775,3 +775,348 @@ def to_dict(self) -> Dict[str, Any]:

    def to_diff_dict(self) -> Dict[str, Any]:
        return self._to_dict_safe(to_diff_dict=True)


class OVCompressWeightsOptions:
```
Suggested change:
```diff
-class OVCompressWeightsOptions:
+class OVWeightsCompressionOptions:
```
Let's use these OVXXXOptions classes as base entities for the corresponding OVXXXConfigs.
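A sketch of the layering this suggests, under the assumption that "base entities" means inheritance: an OVXXXOptions class holds the nncf-level knobs, and the corresponding OVXXXConfig builds on top of it, adding user-facing parameters. Names and signatures here are hypothetical, not the actual optimum-intel classes.

```python
class OVQuantizeOptions:
    # nncf.quantize()-level options (illustrative subset)
    def __init__(self, mode=None, fast_bias_correction=True):
        self.mode = mode
        self.fast_bias_correction = fast_bias_correction


class OVQuantizationConfig(OVQuantizeOptions):
    # user-facing config layered on top of the options base
    def __init__(self, dataset=None, num_samples=None, **options):
        super().__init__(**options)
        self.dataset = dataset
        self.num_samples = num_samples


cfg = OVQuantizationConfig(dataset="wikitext2", num_samples=128, mode="fp8_e4m3")
```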
```python
class OVGeneralQuantizationConfig(QuantizationConfigMixin):
    def __init__(
        self,
```
The class constructor should be extended with dataset, tokenizer, and the rest of the options.
I've left several comments; some are minor, but there are a couple of major ones. The major ones are related to (1) the interaction of OVGeneralQuantizationConfig with OVConfig, e.g. dict-conversion support, and (2) the complexity of config creation with OVGeneralQuantizationConfig.
I've sketched out a proposal that implements a way to address some of the issues I've highlighted, but this is possibly not the only way. It supports dict-to-config-class conversion and inherits OVQuantizationConfigBase. I've also refactored the base class itself to contain dataset-related logic, assuming that compression and quantization will rely on the same dataset. Unfortunately, I don't have much time for this, so there are possibly still some rough points.
@KodiaqQ @AlexKoff88, please take a look: main...nikita-savelyevv:optimum-intel:ns/nf4_f8e4m3_proposal
```python
@@ -470,3 +473,36 @@ def run(self):
            library_name=library_name,
            # **input_shapes,
        )


def prepare_for_wc_config(args, default_configs):
```
These methods are helpful! Would you consider renaming them, though? IMO this method prepares the config itself rather than something for the config. Same for prepare_for_q_config.
Suggested change:
```diff
-def prepare_for_wc_config(args, default_configs):
+def prepare_wc_config(args, default_configs):
```
```python
    }


def prepare_for_q_config(args):
```
Suggested change:
```diff
-def prepare_for_q_config(args):
+def prepare_q_config(args):
```
```python
    Class containing specific nncf.quantize method's options.

    Args:
        mode (`str`, *optional*):
            Defines special quantization modes. Possible values: ['fp8_e4m3', 'fp8_e5m2'].
```
Why is "int8" not possible?
```python
        return copy.deepcopy(self.__dict__)


class OVGeneralQuantizationConfig(QuantizationConfigMixin):
```
Currently, compression and quantization configs can be set on ov_config.quantization_config either as config class instances or as dicts. There is a special method, _quantization_config_from_dict, that converts a dict into one of these instances. Ideally, OVGeneralQuantizationConfig should also be supported by this logic.
_quantization_config_from_dict contains hard-coded configuration detection and instantiation. Do you think it is a scalable approach to extend and support?
We currently support providing the quantization config as a dict for weight and activation quantization. I believe we should also support this for mixed-precision quantization.
This should also become useful when implementing ticket 157523: the plan there was to allow specifying the quantization config as a JSON file from the CLI, in which case dict-to-config-class conversion should be supported.
The _quantization_config_from_dict implementation may not be ideal, but it does the job. Feel free to suggest improvements if you see how it can be done better.
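One possible way to make dict-to-config conversion extensible rather than hard-coded is a small registry keyed by an explicit type tag. This is only a sketch of the idea; optimum-intel's actual _quantization_config_from_dict detects the config type differently, and the class body below is a hypothetical stand-in.

```python
from typing import Any, Dict, Type

_CONFIG_REGISTRY: Dict[str, Type] = {}


def register_config(cls: Type) -> Type:
    """Register a config class under its own name for dict round-tripping."""
    _CONFIG_REGISTRY[cls.__name__] = cls
    return cls


@register_config
class OVWeightQuantizationConfig:
    # illustrative stand-in, not the real optimum-intel class
    def __init__(self, bits: int = 8, sym: bool = False):
        self.bits = bits
        self.sym = sym


def config_from_dict(d: Dict[str, Any]):
    # The dict carries an explicit type tag, so no heuristic key sniffing is needed.
    d = dict(d)
    cls = _CONFIG_REGISTRY[d.pop("config_type")]
    return cls(**d)


cfg = config_from_dict({"config_type": "OVWeightQuantizationConfig", "bits": 4, "sym": True})
```

Adding a new config class then only requires decorating it; the conversion function itself never changes.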
```python
        return copy.deepcopy(self.__dict__)


class OVGeneralQuantizationConfig(QuantizationConfigMixin):
```
As I understand, the intended "general" usage of this class is as follows. For example, for weight compression:
```python
# Creation via optimum-style parameters
quantization_config = OVGeneralQuantizationConfig(
    compress_weights_options=OVCompressWeightsOptions.init_with_format(bits=8, sym=False)
)
```
or
```python
# Creation via NNCF-style parameters
quantization_config = OVGeneralQuantizationConfig(
    compress_weights_options=OVCompressWeightsOptions(mode="int8_asym")
)
```
My concern is that this looks much more cumbersome compared to the current options:
```python
# Current way via config class
quantization_config = OVWeightQuantizationConfig(bits=8, sym=False)
```
or even
```python
# Current way via dict (will be accepted by OVQuantizer.quantize() too)
quantization_config = {"bits": 8, "sym": False}
```
I believe the current API for compression/quantization is worth keeping. If we agree on this, then we need to decide whether the "generality" (the ability to instantiate it with only one of the two configs) of the proposed new class is really needed.
Do you have a better proposal?
Yes. I propose to use OVGeneralQuantizationConfig only for mixed-precision quantization. It can then be named OVMixedQuantizationConfig. I've linked an example of how it could look above.
```python
mode = None
if activation_format:
    mode_map = {
        "f8e4m3": "fp8_e4m3",
        "f8e5m2": "fp8_e5m2",
    }
    mode = mode_map[activation_format]
```
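A defensive variant of the snippet above, as a hypothetical helper: it raises a clear error for formats missing from the map instead of a bare KeyError. The function name and behaviour are assumptions for illustration, not part of the PR.

```python
from typing import Optional


def activation_format_to_mode(activation_format: Optional[str]) -> Optional[str]:
    # Mirrors the mapping in the snippet above, with explicit error handling
    # for unknown formats.
    mode_map = {
        "f8e4m3": "fp8_e4m3",
        "f8e5m2": "fp8_e5m2",
    }
    if activation_format is None:
        return None
    if activation_format not in mode_map:
        raise ValueError(f"Unsupported activation format: {activation_format!r}")
    return mode_map[activation_format]
```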
I don't see how mode can become an int-like type. Isn't this a problem?
```python
        return copy.deepcopy(self.__dict__)


class OVGeneralQuantizationConfig(QuantizationConfigMixin):
```
I believe it should inherit from OVQuantizationConfigBase, as other configs do, because ov_config.quantization_config currently supports such instances.
OVQuantizationConfigBase was introduced for weight-compression purposes. I don't understand why we should use such a specific "base" everywhere. Are there any reasons?
We currently rely on the quantization config class belonging to OVQuantizationConfigBase in some parts of the code. For example:
- https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/modeling_seq2seq.py#L1027
- https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/quantization.py#L728
This is useful for clarity when accessing shared properties of quantization configs.
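The isinstance-based pattern described above can be sketched as follows. Code that only needs shared properties (here, num_samples) accepts any subclass of a common base; the class names follow the discussion, but the bodies are minimal illustrative stand-ins, not the real optimum-intel classes.

```python
class OVQuantizationConfigBase:
    # common base holding properties shared by all quantization configs
    def __init__(self, num_samples=None):
        self.num_samples = num_samples


class OVWeightQuantizationConfig(OVQuantizationConfigBase):
    def __init__(self, bits=8, sym=False, num_samples=None):
        super().__init__(num_samples)
        self.bits = bits
        self.sym = sym


def needs_calibration(config) -> bool:
    # Shared-property access works uniformly for every config inheriting the base.
    if isinstance(config, OVQuantizationConfigBase):
        return config.num_samples is not None
    return False
```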
What does this PR do?
Implements 153357
Before submitting