
Sharded weights support #2218


Open

james77777778 wants to merge 2 commits into master from sharded_weights_support

Conversation

james77777778 (Collaborator) commented Apr 19, 2025

Please see the colab for an example using Gemma2 2B:
https://colab.research.google.com/drive/1iF_Psb6aEV2pkajT-q9ZBjpoO4RX4-Qa?usp=sharing

This PR adds support for sharded weights in KerasPresetSaver and KerasPresetLoader.
The default `max_shard_size` is 10 GB.
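Usage might look roughly like the sketch below; the `max_shard_size` argument on `save_to_preset` is an assumption based on this description, not a confirmed signature:

```python
import keras_hub

# Save a preset with sharded weights. Shards are capped at `max_shard_size`
# (in GB); per this PR the default is 10. (Assumed kwarg name and placement.)
backbone = keras_hub.models.GemmaBackbone.from_preset("gemma2_2b_en")
backbone.save_to_preset("./gemma2_2b_sharded", max_shard_size=2)

# Loading a sharded preset is transparent to the caller.
restored = keras_hub.models.GemmaBackbone.from_preset("./gemma2_2b_sharded")
```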

Kindly pinging @divyashreepathihalli @mattdangerw

Note: This feature requires the latest Keras (git+https://github.com/keras-team/keras.git). It is difficult to ensure backward compatibility.

Related to #2084

github-actions bot added the Gemma (Gemma model specific issues) label on Apr 19, 2025
james77777778 added the kokoro:force-run (Runs Tests on GPU) label on Apr 19, 2025
kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label on Apr 19, 2025
james77777778 force-pushed the sharded_weights_support branch from 9c92ba4 to bf9966a on Apr 20, 2025
james77777778 added the kokoro:force-run (Runs Tests on GPU) label on Apr 20, 2025
kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label on Apr 20, 2025
mattdangerw self-requested a review on Apr 20, 2025
mattdangerw (Member) commented Apr 20, 2025

@james77777778 thanks, will take a look! We don't need to be backwards compatible here; the error message you have, with an action the user can take, is as good as we can do here, I think.
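For illustration, an actionable error of that kind might look like the hypothetical guard below; the helper name, version cutoff, and message are made up for the sketch, not the PR's actual code:

```python
import keras


def _require_sharding_support():
    """Hypothetical guard: fail fast with an action the user can take."""
    major_minor = tuple(int(v) for v in keras.version().split(".")[:2])
    if major_minor < (3, 9):  # Illustrative cutoff, not the PR's actual check.
        raise ImportError(
            "Sharded weights require a newer Keras. Upgrade with "
            "`pip install git+https://github.com/keras-team/keras.git`."
        )
```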

mattdangerw (Member) left a comment

Thanks! Just a couple comments.

```python
dtype = keras.backend.standardize_dtype(dtype)
dtype_size = int(
    # Strip the alphabetic prefix, leaving the bit width, e.g. "bfloat16" -> "16".
    # (The continuation below is an assumed reconstruction of the truncated excerpt.)
    dtype.replace("bfloat", "").replace("float", "").replace("uint", "").replace("int", "")
    or 8  # An empty string (e.g. "bool") falls back to 8 bits.
)
```
mattdangerw (Member):

Can you explain what's going on here? Maybe flip this to a `dtype_size` function, just so you can add a quick docstring?

james77777778 (Collaborator, Author):

I have updated the code; it should be more explicit now that it uses a regex.
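For reference, a regex-based helper in the spirit of that change might look like this sketch (not necessarily the exact code that landed):

```python
import re


def dtype_size(dtype):
    """Return the number of bits used by `dtype`.

    e.g. "float32" -> 32, "bfloat16" -> 16, "int8" -> 8.
    """
    match = re.search(r"(\d+)$", dtype)
    if match is None:
        # Dtypes with no numeric suffix, e.g. "bool"; assumed to be 8 bits.
        return 8
    return int(match.group(1))
```

Matching the trailing digits states the intent directly, where the chained `str.replace` calls had to be read carefully to see it.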

"use_post_attention_norm": True,
"use_sliding_window_attention": True,
}
backbone = GemmaBackbone(**init_kwargs) # ~4.4MB
mattdangerw (Member):

Can we make this even smaller? Feel free to use bert or something simple if it's easier. Try to make this run as fast as possible while testing the business logic.

james77777778 (Collaborator, Author):

I have changed the config to make the backbone smaller (422 KB). The test now takes only 2.5 seconds from start to end.
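A rough sketch of the shape such a test might take; the tiny config values and the `max_shard_size` kwarg are illustrative assumptions:

```python
import keras_hub


def test_sharded_preset_round_trip(tmp_path):
    # A deliberately tiny config so the test runs in a couple of seconds.
    backbone = keras_hub.models.GemmaBackbone(
        vocabulary_size=256,
        num_layers=2,
        num_query_heads=2,
        num_key_value_heads=1,
        hidden_dim=16,
        intermediate_dim=32,
        head_dim=8,
    )
    preset_dir = str(tmp_path / "tiny_gemma")
    # A very small shard size forces multiple shard files (assumed kwarg).
    backbone.save_to_preset(preset_dir, max_shard_size=0.0001)
    restored = keras_hub.models.GemmaBackbone.from_preset(preset_dir)
    assert restored.count_params() == backbone.count_params()
```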

Labels: Gemma (Gemma model specific issues), stat:awaiting keras-eng
Projects: None yet
4 participants