Add a subtle fix for gemma 2 conversions
Gemma 2 uses different normalization constants for the query
depending on the model size.

9b = head_dim
27b = hidden_dim / num_query_heads

We need to slightly tweak our config conversion to account for this.
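As an illustration, here is a minimal sketch of the check the conversion performs. The config values shown for the two sizes are assumptions based on the published Hugging Face Gemma 2 configs, not part of this commit:

import math

# Assumed (illustrative) excerpts from the Hugging Face Gemma 2 configs.
gemma2_9b = {"hidden_size": 3584, "num_attention_heads": 16, "head_dim": 256, "query_pre_attn_scalar": 256}
gemma2_27b = {"hidden_size": 4608, "num_attention_heads": 32, "head_dim": 128, "query_pre_attn_scalar": 144}

for name, cfg in [("9b", gemma2_9b), ("27b", gemma2_27b)]:
    # Queries are scaled by 1 / sqrt(query_pre_attn_scalar) before the attention softmax.
    scale = 1.0 / math.sqrt(cfg["query_pre_attn_scalar"])
    # True for 9b (scalar == head_dim), False for 27b (scalar == hidden_size / num heads).
    query_head_dim_normalize = cfg["head_dim"] == cfg["query_pre_attn_scalar"]
    print(name, query_head_dim_normalize, round(scale, 4))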
mattdangerw committed Jul 19, 2024
1 parent b0c21b3 commit 8df5959
Showing 1 changed file with 4 additions and 1 deletion.
keras_nlp/src/utils/transformers/convert_gemma.py (4 additions, 1 deletion)
@@ -59,7 +59,10 @@ def load_gemma_backbone(cls, preset, load_weights):
         "hidden_dim": transformers_config["hidden_size"],
         "intermediate_dim": transformers_config["intermediate_size"] * 2,
         "head_dim": transformers_config["head_dim"],
-        "query_head_dim_normalize": False,
+        "query_head_dim_normalize": (
+            transformers_config["head_dim"]
+            == transformers_config["query_pre_attn_scalar"]
+        ),
         "use_post_ffw_norm": True,
         "use_post_attention_norm": True,
         "final_logit_soft_cap": transformers_config[
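Downstream, the converted flag decides which constant the Keras Gemma attention layer uses when scaling queries. A rough sketch of the intent, not the actual keras_nlp implementation:

import math

def query_scale(head_dim, hidden_dim, num_query_heads, query_head_dim_normalize):
    # With query_head_dim_normalize=True, queries are scaled by 1/sqrt(head_dim);
    # otherwise by 1/sqrt(hidden_dim / num_query_heads), as Gemma 2 27b expects.
    scalar = head_dim if query_head_dim_normalize else hidden_dim / num_query_heads
    return 1.0 / math.sqrt(scalar)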
