Port albert transformer checkpoint #1767
Conversation
    hf_weight_key="albert.encoder.embedding_hidden_mapping_in.bias",
)

keras_prefix = backbone.get_layer("group_0_inner_layer_0")
this isn't really a prefix, it's a layer. maybe change the name?
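A minimal sketch of the suggested rename, using an illustrative name like transformer_layer (the actual name chosen in the PR may differ):

# get_layer() returns a layer object, not a string prefix, so a name like
# `transformer_layer` reads more accurately than `keras_prefix`.
transformer_layer = backbone.get_layer("group_0_inner_layer_0")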
)

keras_prefix = backbone.get_layer("group_0_inner_layer_0")
hf_prefix = "albert.encoder.albert_layer_groups.0.albert_layers.0."
Also, this seems too hardcoded. Both Keras and HF have config support for multiple groups, but this assumes that num_hidden_groups=1 and maybe that inner_group_num=1 too.
If it's easy to write this in such a way that we are not assuming those constants, we should just do that. Otherwise, we should at least throw if num_hidden_groups > 1 or inner_group_num > 1, which could be true for a user upload.
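If generalizing turns out to be hard, a minimal guard along these lines could at least fail loudly; hf_config here is an assumed name for the Hugging Face ALBERT config consulted during conversion, not the actual PR code:

# Sketch of a guard for the single-group assumption.
if hf_config["num_hidden_groups"] > 1 or hf_config["inner_group_num"] > 1:
    raise ValueError(
        "Conversion currently assumes num_hidden_groups=1 and inner_group_num=1, "
        f"got num_hidden_groups={hf_config['num_hidden_groups']} and "
        f"inner_group_num={hf_config['inner_group_num']}."
    )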
Yeah, makes sense. I have made changes to fix it. Does this approach work?
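A rough sketch of the kind of generalization being discussed, looping over every group and inner layer instead of hardcoding index 0; the layer naming pattern follows the snippet above, and port_layer_weights is a hypothetical helper, not the actual PR code:

# Iterate over all hidden groups and inner layers rather than assuming one group.
for group_idx in range(hf_config["num_hidden_groups"]):
    for inner_idx in range(hf_config["inner_group_num"]):
        keras_layer = backbone.get_layer(
            f"group_{group_idx}_inner_layer_{inner_idx}"
        )
        hf_prefix = (
            f"albert.encoder.albert_layer_groups.{group_idx}"
            f".albert_layers.{inner_idx}."
        )
        port_layer_weights(keras_layer, hf_prefix)  # hypothetical helper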
This looks good to me! I pushed some formatting fixes to limit line length, but will pull this in.
Just noticed while porting keras-team#1767 that the default learning rate for our classifier does not work for ALBERT pretrained checkpoints. Let's lower it for this model.
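For context, a hedged sketch of what lowering the learning rate looks like from the user side; the preset name and the 1e-5 value are illustrative, not the new default picked in the follow-up change:

import keras
import keras_nlp

# Recompile the ALBERT classifier with a smaller learning rate than the
# generic default, which reportedly does not train well for these checkpoints.
classifier = keras_nlp.models.AlbertClassifier.from_preset(
    "albert_base_en_uncased", num_classes=2
)
classifier.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)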
Here's a colab to test encoder-style checkpoints like this, if it's helpful: https://colab.research.google.com/gist/mattdangerw/a1fedc952d28c8395021a94b77e75872/albert-test.ipynb
* port albert
* update test
* resolve comments
* changed name
* minor formatting fixes

---------

Co-authored-by: Matt Watson <[email protected]>
Hi @mattdangerw @ariG23498,
Ported the ALBERT transformers checkpoint to KerasNLP. Please check. Thank you!