
Fix issue with impossibility to remove uncompressed model in tmp after compression #925

Closed
wants to merge 9 commits

Conversation

@eaidova (Collaborator) commented Oct 4, 2024

What does this PR do?

Fixes openvinotoolkit/openvino#26844

Reworked the conversion step and the weight compression that follows it to avoid locking the tmp directory on Windows and an unnecessary extra read of the model.

Added a handler for the tmp dir that allows ignoring permission errors (this behavior is available natively in Python 3.10; the handler back-ports it to Python < 3.10, so we can remove it after dropping Python 3.9 support).
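
For reference, a minimal sketch of what such a handler could look like (illustrative names, not necessarily the code in this PR), assuming a shutil.rmtree onerror hook that clears the read-only bit, retries, and otherwise skips files that are still locked:

    import contextlib
    import os
    import shutil
    import stat
    import tempfile

    def _handle_remove_error(func, path, exc_info):
        # onerror hook for shutil.rmtree: clear the read-only bit and retry;
        # if the file is still locked (e.g. a memory-mapped model on Windows), skip it.
        try:
            os.chmod(path, stat.S_IWRITE)
            func(path)
        except PermissionError:
            pass

    @contextlib.contextmanager
    def temporary_directory():
        # Temporary directory whose cleanup tolerates permission errors,
        # mirroring Python 3.10's ignore_cleanup_errors behavior on older versions.
        path = tempfile.mkdtemp()
        try:
            yield path
        finally:
            shutil.rmtree(path, onerror=_handle_remove_error)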

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@eaidova eaidova mentioned this pull request Oct 4, 2024
@eaidova eaidova force-pushed the ea/fix_win_compression branch from 4048b60 to ff6bd41 on October 4, 2024 14:31
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@eaidova eaidova force-pushed the ea/fix_win_compression branch from f113f31 to 0b2a1ee on October 7, 2024 06:31
@eaidova eaidova force-pushed the ea/fix_win_compression branch from 0b2a1ee to b02753a on October 7, 2024 06:42
@eaidova eaidova added openvino-test Trigger OpenVINO slow tests and removed openvino-test Trigger OpenVINO slow tests labels Oct 7, 2024
@eaidova eaidova force-pushed the ea/fix_win_compression branch from 13c9356 to 06625ca on October 9, 2024 08:02
@nikita-savelyevv nikita-savelyevv added the openvino-test Trigger OpenVINO slow tests label Oct 9, 2024
@eaidova eaidova force-pushed the ea/fix_win_compression branch 2 times, most recently from 3d521c0 to ef74d0c on October 9, 2024 14:55
@eaidova eaidova force-pushed the ea/fix_win_compression branch from ef74d0c to 83a8800 on October 9, 2024 15:43
@@ -444,12 +439,14 @@ class StoreAttr(object):

from optimum.intel.openvino.quantization import _weight_only_quantization

submodel = core.read_model(submodel_path)
Member commented:

If the model is going to be read either way, why nest return_model_sizes and modify the behavior of all the export functions? Please explain the necessity of making the export functions return model sizes as well, because it doesn't make sense to me.

@eaidova (Collaborator, Author) commented Oct 10, 2024

The purpose of reading the model at this point is to prepare it for quantization; if the model is not going to be compressed, this step is unnecessary.
There are several cases covered by this:

  1. NNCF is not available (in this case we want to notify the user that, judging by its size, the model is eligible for quantization and they are losing the opportunity to optimize it, so calculating the model size is helpful).
  2. Quantization is disabled by the export config (e.g. load_in_8bit=False in from_pretrained or --weight-format fp16/fp32 in optimum-cli).
  3. The default decision on whether to compress the model is based on its size: by default we apply int8 weight compression only if the model has more than 1B parameters and no additional quantization config is provided (this is also why we need to calculate the model size and avoid an extra model read when we already know at that point that weight compression is not applicable); see the sketch after this comment.

In some cases, when the user exports via the CLI and does not plan to use the model on the same machine, reading the model is simply a waste of time and possibly memory when it is not needed (not to mention the additional issues, which we are trying to avoid, caused by having to take care of removing it from memory).
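
A minimal sketch of the size-based default described in case 3 above (hypothetical function name, not the PR's actual code):

    def should_apply_default_int8(num_parameters: int, quantization_config) -> bool:
        # Apply int8 weight compression by default only when no explicit
        # quantization config was provided and the model exceeds 1B parameters.
        if quantization_config is not None:
            return False  # the user made an explicit choice, so no size-based default
        return num_parameters > 1_000_000_000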

Collaborator commented:

I also agree that avoiding an extra model read when the model is not going to be quantized anyway is a good thing.

Collaborator commented:

Introducing return_model_sizes adds some complexity that I would prefer to avoid if possible. What do you think about moving this logic (applying quantization to each submodel depending on multiple criteria) from main_export to _save_model (or export_pytorch / export_pytorch_via_onnx / export_tensorflow)? Also, the ov_config could be updated to indicate whether we should check the model's size (in case quantization wasn't specified) or whether the model should or shouldn't be quantized (load_in_8bit set to True or False, for example). Wdyt @eaidova?

@eaidova (Collaborator, Author) commented Oct 17, 2024

No, this is also something we wanted to avoid: when we save the OV model, the PyTorch model is still in memory, and that memory cannot be deallocated until we return to the place where it was loaded (main_export). This leads to extra memory consumption during weight compression; see #878 for details.
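
To illustrate the intended flow (hypothetical helper names, not the PR's code): the PyTorch model is released back in main_export before the exported OV model is read again for compression, so its memory can be reclaimed:

    import gc

    def main_export_sketch(model_id, output_dir, quantization_config):
        model = load_pytorch_model(model_id)      # hypothetical loader
        export_to_openvino(model, output_dir)     # save .xml/.bin; torch model still alive here
        del model                                 # drop the PyTorch model ...
        gc.collect()                              # ... so its memory is freed before compression
        if quantization_config is not None:
            ov_model = read_openvino_model(output_dir)   # read back only when compressing
            compress_weights(ov_model, quantization_config)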

@nikita-savelyevv (Collaborator) commented:

The PR has been stale for a while. @IlyasMoutawwakil @echarlaix do you have any additional concerns regarding this PR?

@eaidova (Collaborator, Author) commented Oct 22, 2024

@echarlaix @IlyasMoutawwakil could you please merge these changes? The issue with the inability to remove the model from the tmp dir is blocking several customer requests, thanks.

@eaidova eaidova closed this Oct 22, 2024
Labels
openvino-test Trigger OpenVINO slow tests
6 participants