[PT FE] Optimize memory usage of patch_model #27428
Merged
Details:
`no_jit_trace` was using extra memory to get and store the trace state, which contained the whole graph produced up to that point; this increased the memory consumption of tracing. For the FLUX model, for example, it used about 20 GB of extra memory.

`args` on the meta device didn't work without `no_jit_trace`. To work around this issue we now pass `args` directly to `forward` without saving them in `Trampoline`. This gives a cleaner flow for the arguments and reduces the memory needed to keep them around. However, it changes the behavior of `evaluate` in `ModuleExtension`: it now uses the args that were passed to `convert`, not the original args.
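A rough sketch of the pattern described above. The names `Trampoline` and `ModuleExtension.evaluate` mirror the description, but the classes below are illustrative stand-ins, not the actual OpenVINO implementation:

```python
class ModuleExtension:
    """Illustrative extension: by default just call the wrapped forward."""

    def evaluate(self, target, *args, **kwargs):
        # After the change, evaluate receives the args that were passed
        # in at conversion time, not a separately stored original copy.
        return target(*args, **kwargs)


class Trampoline:
    """Redirects a patched module's forward through an extension."""

    def __init__(self, target, extension):
        self.target = target
        self.extension = extension
        # Before the change, inputs were stashed on the instance
        # (e.g. self.stashed_args = args), which kept every input tensor
        # alive for the lifetime of the trampoline and inflated memory.

    def __call__(self, *args, **kwargs):
        # After the change, args flow straight through to evaluate, so no
        # extra references are held once the call returns.
        return self.extension.evaluate(self.target, *args, **kwargs)
```

Because the trampoline no longer holds references to the inputs, they can be freed as soon as the traced call finishes, which is where the memory saving comes from.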


optimum-cli for FLUX with `torch_dtype=torch.bfloat16` before the change: [screenshot]

optimum-cli for FLUX with `torch_dtype=torch.bfloat16` after the change: [screenshot]

Note: optimum doesn't yet support `torch_dtype=torch.bfloat16` for FLUX.

Tickets: