
Add a subtle fix for gemma 2 conversions #1701

Merged

Conversation

mattdangerw (Member)

Gemma 2 uses different normalization constants for the query depending on the model size.

9b = head_dim
27b = hidden_dim / num_query_heads

We need to slightly tweak our config conversion to account for this.
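A minimal sketch of what such a size-dependent conversion tweak could look like. The function and argument names here are illustrative, not the actual KerasNLP conversion code; the sizes in the usage comments are the published Gemma 2 config values.

```python
def query_norm_scalar(model_size, hidden_dim, num_query_heads, head_dim):
    """Pick the query normalization constant for a Gemma 2 variant.

    Illustrative sketch only: the 9b model normalizes queries by head_dim,
    while the 27b model uses hidden_dim / num_query_heads, so the
    conversion cannot assume one rule for both.
    """
    if model_size == "9b":
        return head_dim
    return hidden_dim // num_query_heads

# 9b:  head_dim = 256, but hidden_dim / num_query_heads = 3584 / 16 = 224
# 27b: hidden_dim / num_query_heads = 4608 / 32 = 144, but head_dim = 128
print(query_norm_scalar("9b", 3584, 16, 256))    # 256
print(query_norm_scalar("27b", 4608, 32, 128))   # 144
```

Because the two quantities only coincide for some configs, a converter that hard-codes either rule silently produces wrong attention scaling for the other size.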

@mattdangerw mattdangerw requested a review from grasskin July 19, 2024 20:57
@github-actions github-actions bot added the Gemma Gemma model specific issues label Jul 19, 2024
@SamanehSaadat SamanehSaadat (Member) left a comment


Thanks, Matt!

Just one comment: we may want to update the docstring here, since it says Gemma 2 always uses hidden_dim / num_query_heads.

@mattdangerw mattdangerw force-pushed the subtle-fix-for-gemma-2-conversion branch from 8df5959 to 3e5f3ff Compare July 19, 2024 21:57
@mattdangerw mattdangerw merged commit 3131ca9 into keras-team:master Jul 19, 2024
7 of 8 checks passed
Labels
Gemma Gemma model specific issues
2 participants