fix issue with impossibility to remove uncompressed model in tmp after compression #925
Conversation
Force-pushed from 4048b60 to ff6bd41
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed from f113f31 to 0b2a1ee
Force-pushed from 0b2a1ee to b02753a
Force-pushed from 13c9356 to 06625ca
Force-pushed from 3d521c0 to ef74d0c
Force-pushed from ef74d0c to 83a8800
@@ -444,12 +439,14 @@ class StoreAttr(object):
from optimum.intel.openvino.quantization import _weight_only_quantization
submodel = core.read_model(submodel_path)
If the model is going to be read either way, why introduce return_model_sizes and modify the behavior of all the export functions? Please explain the necessity of making the export functions return model sizes as well, because it doesn't make sense to me.
The purpose of reading the model at this point is to prepare it for quantization; if the model is not going to be compressed, this step is unnecessary.
There are several cases for this:
- nncf is not available (in this case we would like to notify the user that, if the model is a candidate for quantization by model size, they lose the opportunity to optimize it, so calculating the model size is helpful)
- quantization is disabled by the export config (e.g. load_in_8bit=False in from_pretrained or --weight-format fp16/fp32 in optimum-cli)
- the default decision on whether to compress the model is based on model size: by default we apply int8 weight compression only if the model has more than 1B parameters and no additional quantization config is provided (this is also why we need to calculate the model size and avoid an extra model read when we already know at that point that weight compression is not applicable); see the sketch below
In some cases, when the user runs the CLI and does not plan to use the model on the same machine, reading the model when it is not needed is just a waste of time and possibly memory (not to mention the additional issues caused by having to take care of removing it from memory, which is what we are trying to avoid).
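For illustration only, a minimal sketch of the size-based default described above; the helper name and config handling are hypothetical and this is not the actual optimum-intel code:

```python
# Hypothetical sketch of the default weight-compression rule described above;
# not the actual optimum-intel implementation.
def should_apply_int8_by_default(num_parameters: int, quantization_config=None) -> bool:
    if quantization_config is not None:
        # An explicit quantization config always takes precedence,
        # so the size-based default does not apply.
        return False
    # Default rule: apply int8 weight compression only above 1B parameters.
    return num_parameters > 1_000_000_000
```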
I also agree that avoiding an extra model read when the model is not going to be quantized anyway is a good thing.
Introducing return_model_sizes adds some complexity, so I would prefer to avoid it if possible. What do you think about moving this logic (applying quantization to each submodel depending on multiple criteria) from main_export to _save_model (or export_pytorch / export_pytorch_via_onnx / export_tensorflow)? Also, ov_config could be updated to know whether we should check the model's size (in case quantization wasn't specified) or whether the model should or shouldn't be quantized (load_in_8bit set to True or False, for example). wdyt @eaidova ?
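As a rough illustration of what such an ov_config flag could look like (purely hypothetical names, not an existing optimum-intel API):

```python
from enum import Enum

# Purely hypothetical tri-state flag illustrating the suggestion above.
class WeightCompressionMode(Enum):
    FORCE = "force"            # e.g. load_in_8bit=True
    SKIP = "skip"              # e.g. load_in_8bit=False or --weight-format fp16/fp32
    SIZE_BASED = "size_based"  # nothing specified: decide from the parameter count

def resolve_compression_mode(load_in_8bit=None) -> WeightCompressionMode:
    if load_in_8bit is True:
        return WeightCompressionMode.FORCE
    if load_in_8bit is False:
        return WeightCompressionMode.SKIP
    return WeightCompressionMode.SIZE_BASED
```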
No, this is also something we wanted to avoid: when we save the OV model, the PyTorch model is still in memory, and that memory cannot be deallocated until we return to the place where it was loaded (main_export), which leads to extra memory consumption during weight compression. See #878 for details.
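For context, a minimal sketch of the flow being described, with hypothetical helper names (this is not the code in this PR): the PyTorch model has to be released in the scope where it was loaded before the saved IR is read back for compression, otherwise both models sit in memory at the same time.

```python
import gc
import openvino as ov

# Hypothetical sketch of the flow described above; not the code in this PR.
def convert_and_compress(load_model_fn, export_fn, compress_fn, ir_path):
    model = load_model_fn()      # load the PyTorch model
    export_fn(model, ir_path)    # convert and save the OpenVINO IR to disk
    del model                    # drop the only reference to the torch model
    gc.collect()                 # reclaim its memory before compression starts
    ov_model = ov.Core().read_model(ir_path)
    return compress_fn(ov_model)
```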
The PR has been stale for a while. @IlyasMoutawwakil @echarlaix do you have any additional concerns regarding this PR?
@echarlaix @IlyasMoutawwakil could you please merge these changes? The issue with the inability to remove the model from the tmp dir is blocking several customer requests, thanks.
What does this PR do?
Fixes openvinotoolkit/openvino#26844
Reworked the conversion step and the weight compression that follows it to avoid the tmp directory lock on Windows and an unnecessary model read.
Added a handler for the tmp dir that allows ignoring permission errors (equivalent behavior is available natively since Python 3.10; this backport for Python <3.10 can be removed once Python 3.9 support is dropped).
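For illustration, a minimal sketch of such a backport, assuming it mirrors TemporaryDirectory(ignore_cleanup_errors=True) from Python 3.10; this is not the exact helper added in the PR:

```python
import shutil
import sys
import tempfile

# Hypothetical backport sketch: ignore permission errors on tmp-dir cleanup,
# mirroring TemporaryDirectory(ignore_cleanup_errors=True) from Python 3.10.
if sys.version_info >= (3, 10):
    def make_tmp_dir():
        return tempfile.TemporaryDirectory(ignore_cleanup_errors=True)
else:
    class _IgnoreErrorsTemporaryDirectory(tempfile.TemporaryDirectory):
        def cleanup(self):
            try:
                super().cleanup()
            except PermissionError:
                # A file may still be locked on Windows; fall back to
                # best-effort removal instead of raising.
                shutil.rmtree(self.name, ignore_errors=True)

    def make_tmp_dir():
        return _IgnoreErrorsTemporaryDirectory()
```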
Before submitting