
Add a subtle fix for gemma 2 conversions #1701

Merged

Conversation

mattdangerw (Member)

Gemma 2 uses different normalization constants for the query depending on the model size.

9b = head_dim
27b = hidden_dim / num_query_heads

We need to slightly tweak our config conversion to account for this.
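A minimal sketch of what such a size-dependent conversion tweak could look like. The function and argument names here are illustrative, not the actual KerasNLP conversion code; the sizes in the usage comments are the published Gemma 2 config values.

```python
def query_norm_scalar(model_size, hidden_dim, num_query_heads, head_dim):
    """Pick the query normalization constant for a Gemma 2 variant.

    Illustrative sketch only: the 9b model normalizes queries by head_dim,
    while the 27b model uses hidden_dim / num_query_heads, so the
    conversion cannot assume one rule for both.
    """
    if model_size == "9b":
        return head_dim
    return hidden_dim // num_query_heads

# 9b:  head_dim = 256, but hidden_dim / num_query_heads = 3584 / 16 = 224
# 27b: hidden_dim / num_query_heads = 4608 / 32 = 144, but head_dim = 128
print(query_norm_scalar("9b", 3584, 16, 256))    # 256
print(query_norm_scalar("27b", 4608, 32, 128))   # 144
```

Because the two quantities only coincide for some configs, a converter that hard-codes either rule silently produces wrong attention scaling for the other size.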

@mattdangerw mattdangerw requested a review from grasskin July 19, 2024 20:57
@github-actions github-actions bot added the Gemma Gemma model specific issues label Jul 19, 2024
@SamanehSaadat SamanehSaadat (Member) left a comment


Thanks, Matt!

Just one comment: we may want to update the docstring here, since it says Gemma 2 always uses hidden_dim / num_query_heads.

@mattdangerw mattdangerw force-pushed the subtle-fix-for-gemma-2-conversion branch from 8df5959 to 3e5f3ff Compare July 19, 2024 21:57
@mattdangerw mattdangerw merged commit 3131ca9 into keras-team:master Jul 19, 2024
7 of 8 checks passed
Labels
Gemma Gemma model specific issues
2 participants