
Don't duplicate frozen parameters during predict() #20851

Merged (1 commit) on Feb 4, 2025

Conversation

mattdangerw (Member):

On the JAX backend we were not using donate_argnums during predict(). This is fine when a model is mostly trainable, but when a model is mostly or entirely frozen it results in a 2x memory spike (which is why we already use donate_argnums for fit() and evaluate()).

This change adds donate_argnums to the predict function to avoid the memory spike. Because JAX will then delete all incoming state (including the trainable variables), we also need to sync the trainable variables back out, much as in fit() and evaluate(). An alternative would be to change the predict_step signature so that we donate only the non-trainable variables, but that would be a breaking and confusing change.
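The donate-then-rebind pattern described above can be sketched outside of Keras like this. This is a minimal illustration, not the actual Keras trainer code: predict_step, the state layout, and the shapes are all hypothetical.

```python
import jax
import jax.numpy as jnp

def predict_step(state, x):
    # state is a (trainable, non_trainable) pair of pytrees.
    trainable, non_trainable = state
    y = x @ trainable["w"] + non_trainable["bias"]
    # Return the state alongside the outputs: once argument 0 is donated,
    # the caller's old buffers are invalid, so it must rebind these.
    return y, state

# donate_argnums=(0,) lets XLA reuse the state buffers for the outputs,
# avoiding a second in-memory copy of (possibly large, frozen) parameters.
jit_predict_step = jax.jit(predict_step, donate_argnums=(0,))

state = ({"w": jnp.eye(2)}, {"bias": jnp.zeros(2)})
x = jnp.ones((1, 2))
y, state = jit_predict_step(state, x)  # rebind the returned state
```

On platforms where donation is not supported, JAX falls back to copying and emits a warning, so the sketch stays correct either way; only the memory saving is lost.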

codecov-commenter commented Feb 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.25%. Comparing base (fc1b26d) to head (84a4897).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #20851      +/-   ##
==========================================
+ Coverage   82.04%   82.25%   +0.20%     
==========================================
  Files         559      559              
  Lines       52367    52374       +7     
  Branches     8096     8096              
==========================================
+ Hits        42964    43078     +114     
+ Misses       7427     7305     -122     
- Partials     1976     1991      +15     
Flag Coverage Δ
keras 82.06% <100.00%> (+0.20%) ⬆️
keras-jax 64.18% <100.00%> (-0.08%) ⬇️
keras-numpy 58.99% <0.00%> (+<0.01%) ⬆️
keras-openvino 32.55% <0.00%> (+2.73%) ⬆️
keras-tensorflow 64.82% <0.00%> (+<0.01%) ⬆️
keras-torch 64.15% <0.00%> (+<0.01%) ⬆️

Flags with carried-forward coverage won't be shown.


mattdangerw force-pushed the predict-memory-use-fix branch from e4eefb4 to 84a4897 on Feb 3, 2025, 19:36
mattdangerw (Member, Author):

Notably, this will come up for LoRA + transformers.

gemma = keras_hub.CausalLM.from_preset("gemma...")
gemma.backbone.enable_lora(4)
gemma.fit(...)
gemma.predict(example)  # double the optimal memory usage due to the duplicated frozen params
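Some back-of-the-envelope arithmetic shows why this bites LoRA models in particular. The sizes below are hypothetical (not real Gemma numbers): with a low-rank adapter, the frozen base kernel dwarfs the trainable parameters, so duplicating the frozen state roughly doubles peak memory.

```python
# Hypothetical dense layer under LoRA rank 4 (illustrative sizes only).
d_in, d_out, rank = 4096, 4096, 4

frozen = d_in * d_out              # base kernel, frozen after enable_lora
trainable = rank * (d_in + d_out)  # lora_a (d_in x rank) + lora_b (rank x d_out)

# Nearly all parameters are frozen, so a second copy of the frozen
# state costs almost as much as the whole model again.
frozen_fraction = frozen / (frozen + trainable)
print(frozen, trainable, round(frozen_fraction, 4))
```

Here frozen_fraction comes out above 0.99, which is why predict() without donation roughly doubled memory for LoRA-tuned models while barely affecting fully trainable ones.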

fchollet (Collaborator) left a comment:

LGTM, thanks for the fix!

fchollet merged commit 3b0d4de into keras-team:master on Feb 4, 2025
9 of 10 checks passed