Port albert transformer checkpoint #1767
Conversation
    hf_weight_key="albert.encoder.embedding_hidden_mapping_in.bias",
)

keras_prefix = backbone.get_layer("group_0_inner_layer_0")
this isn't really a prefix, it's a layer. maybe change the name?
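A minimal sketch of the suggested rename, using an illustrative name like transformer_layer (the actual name chosen in the PR may differ):

# get_layer() returns a layer object, not a string prefix, so a name like
# `transformer_layer` reads more accurately than `keras_prefix`.
transformer_layer = backbone.get_layer("group_0_inner_layer_0")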
)

keras_prefix = backbone.get_layer("group_0_inner_layer_0")
hf_prefix = "albert.encoder.albert_layer_groups.0.albert_layers.0."
Also, this seems too hardcoded. Both Keras and HF have config support for multiple groups, but this assumes that num_hidden_groups=1 and maybe that inner_group_num=1 too.
If it's easy to write this in such a way that we are not assuming those constants, we should just do that. Otherwise, we should at least throw if num_hidden_groups > 1 or inner_group_num > 1, which could be true for a user upload.
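If generalizing turns out to be hard, a minimal guard along these lines could at least fail loudly; hf_config here is an assumed name for the Hugging Face ALBERT config consulted during conversion, not the actual PR code:

# Sketch of a guard for the single-group assumption.
if hf_config["num_hidden_groups"] > 1 or hf_config["inner_group_num"] > 1:
    raise ValueError(
        "Conversion currently assumes num_hidden_groups=1 and inner_group_num=1, "
        f"got num_hidden_groups={hf_config['num_hidden_groups']} and "
        f"inner_group_num={hf_config['inner_group_num']}."
    )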
Yeah, makes sense. I have made changes to fix it. Does this approach work?
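A rough sketch of the kind of generalization being discussed, looping over every group and inner layer instead of hardcoding index 0; the layer naming pattern follows the snippet above, and port_layer_weights is a hypothetical helper, not the actual PR code:

# Iterate over all hidden groups and inner layers rather than assuming one group.
for group_idx in range(hf_config["num_hidden_groups"]):
    for inner_idx in range(hf_config["inner_group_num"]):
        keras_layer = backbone.get_layer(
            f"group_{group_idx}_inner_layer_{inner_idx}"
        )
        hf_prefix = (
            f"albert.encoder.albert_layer_groups.{group_idx}"
            f".albert_layers.{inner_idx}."
        )
        port_layer_weights(keras_layer, hf_prefix)  # hypothetical helper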
This looks good to me! I pushed some formatting fixes to limit line length, but will pull this in.
Just noticed while porting keras-team#1767 that the default learning rate for our classifier does not work for ALBERT pretrained checkpoints. Let's lower it for this model.
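For context, a hedged sketch of what lowering the learning rate looks like from the user side; the preset name and the 1e-5 value are illustrative, not the new default picked in the follow-up change:

import keras
import keras_nlp

# Recompile the ALBERT classifier with a smaller learning rate than the
# generic default, which reportedly does not train well for these checkpoints.
classifier = keras_nlp.models.AlbertClassifier.from_preset(
    "albert_base_en_uncased", num_classes=2
)
classifier.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)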
Here's a colab to test encoder-style checkpoints like this, if it's helpful: https://colab.research.google.com/gist/mattdangerw/a1fedc952d28c8395021a94b77e75872/albert-test.ipynb
* port albert
* update test
* resolve comments
* changed name
* minor formatting fixes

---------

Co-authored-by: Matt Watson <[email protected]>
Hi @mattdangerw @ariG23498,
Ported the ALBERT transformers checkpoint to KerasNLP. Please check. Thank you!