Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changed default TokenAndPositionEmbedding initializer to 'uniform' #1237

Merged
merged 1 commit into from
Sep 12, 2023

Conversation

jackd
Copy link
Contributor

@jackd jackd commented Sep 7, 2023

TokenAndPositionEmbedding advertises itself as a combination of a Embedding and PositionEmbedding, but the inconsistency in initializer leads to a surprising degradation of performance, e.g. replacing the custom layer in this example with TokenAndPositionEmbedding makes results significantly worse (unless you manually specify the initializer). AFAIK there is no theoretical reason why glorot initialization should be here.

@jackd
Copy link
Contributor Author

jackd commented Sep 8, 2023

I have no idea what Needs /gcbrun from a collaborator means...

@mattdangerw
Copy link
Member

I have no idea what Needs /gcbrun from a collaborator means...

This is some manually triggered GPU testing we maintainers will run for some PRs. No need on this one.

@mattdangerw
Copy link
Member

Thanks! Yeah we should definitely be using the same initializer as keras.layers.Embedding anyway for consistency, so this change looks good to me.

Nice catch!

@mattdangerw
Copy link
Member

Hmm, looks like this is causing some test failures where we asserted the default while testing configs, but I actually believe that will no longer be the case with #1238. I will just rebase this PR, I think everything should pass after that.

@mattdangerw mattdangerw force-pushed the tp-embeddings-initializer branch from f8d7675 to cf31bfe Compare September 11, 2023 23:43
@mattdangerw mattdangerw merged commit afe0432 into keras-team:master Sep 12, 2023
@jackd jackd deleted the tp-embeddings-initializer branch September 12, 2023 02:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants