Add FNet Backbone #643
Conversation
```python
def fnet_kernel_initializer(mode="fnet_default", **kwargs):
    if mode == "fnet_default":
```
Let's just go with `fnet_default` if that's the better one. This won't be exposed, so no need to stick a lot of options here.
The issue is that for the embedding projection layer, `flax_default` (https://github.com/keras-team/keras-nlp/pull/643/files#diff-2a64a80c1e1e4587b93364e7a5f6b2157075c7af63793e47696613305d66be08R146) is used, and `fnet_default` for the rest. That's why I've kept two modes here. It isn't a "switch on/off for all" kind of argument.
Can you link to the code? It sounds like this was just laziness in passing the initializer around fully. If so, this might be another place to ignore it too.

At a high level, for things like activations (or anything that affects pretrained checkpoints), we have to be 100% aligned with upstream. For things like initializers, we can afford to be a little more editorial and keep things simpler where possible.
Yeah, sure!

These are the defaults they use: https://github.com/google-research/google-research/blob/master/f_net/layers.py#L33-L36.

Embedding layers
- https://github.com/keras-team/keras-nlp/blob/d962a0aa2e45f50fe8572da168a0d9dc6c754fed/keras_nlp/models/fnet/fnet_backbone.py#L111-L129
- https://github.com/google-research/google-research/blob/master/f_net/layers.py#L364-L380

Embedding projection layer
- https://github.com/keras-team/keras-nlp/blob/d962a0aa2e45f50fe8572da168a0d9dc6c754fed/keras_nlp/models/fnet/fnet_backbone.py#L142-L149
- https://github.com/google-research/google-research/blob/master/f_net/layers.py#L386-L388
- Here, they forgot to pass the initializer, and hence use the Flax default: https://flax.readthedocs.io/en/latest/_modules/flax/linen/linear.html#Dense (`lecun_normal`, which is a special case of variance scaling).

Intermediate dense layer
- https://github.com/google-research/google-research/blob/master/f_net/layers.py#L74-L79 (they use their own defaults here)
- But for the output dense layer, they forgot to pass the bias initializer (and hence, I pass "zeros" to that layer): https://github.com/google-research/google-research/blob/master/f_net/layers.py#L81

Pooler layer
- https://github.com/google-research/google-research/blob/master/f_net/models.py#L85-L86
- Again, they forgot to pass their own bias initializer default, and hence I have "zeros" here.
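To make the mapping above concrete, here is a hedged sketch of what the two-mode helper could look like in Keras. The helper name comes from the diff; the exact initializer classes and values are assumptions about how the upstream defaults map to Keras, not a copy of the PR code.

```python
import keras


def fnet_kernel_initializer(mode="fnet_default", stddev=0.02):
    """Sketch of a two-mode kernel initializer helper (illustrative only)."""
    if mode == "fnet_default":
        # Upstream F-Net default: a normal initializer with a small stddev.
        return keras.initializers.RandomNormal(stddev=stddev)
    elif mode == "flax_default":
        # Flax's Dense default, lecun_normal, i.e. variance scaling with
        # scale=1.0, fan_in mode, and a truncated normal distribution.
        return keras.initializers.VarianceScaling(
            scale=1.0, mode="fan_in", distribution="truncated_normal"
        )
    raise ValueError(f"Unknown `mode`: {mode}")
```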
I can email the authors to confirm whether it was an oversight or whether there was any intention behind this (I have emailed them before, and they have replied promptly) :D
```python
dropout=dropout,
kernel_initializer=fnet_kernel_initializer(stddev=0.02),
bias_initializer=fnet_bias_initializer(),
bias_initializer_output_dense="zeros",
```
this is interesting, do we know why they do this?
I feel it's probably an oversight on their part. They forgot to pass the bias initializer, and the Flax default is "zeros". I doubt there is a reason for this, but I'm not sure.
If we think it's just an oversight, I might just ignore it, as otherwise this would be polluting our "modular" API. Thankfully this initializer stuff is totally out of the picture for the 99% use case of using checkpoints.

If someone trying to pretrain FNet discovers this is an issue and raises a bug with us, we can happily fix it down the road.
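For illustration, dropping the special case would leave the encoder layer configured with a single kernel/bias initializer pair. This is a hedged sketch using the already-released `keras_nlp.layers.FNetEncoder`; the argument values are illustrative, not taken from this PR.

```python
import keras
import keras_nlp

# One kernel initializer and one bias initializer shared by all dense layers,
# instead of a separate `bias_initializer_output_dense` special case.
encoder_layer = keras_nlp.layers.FNetEncoder(
    intermediate_dim=3072,
    dropout=0.1,
    kernel_initializer=keras.initializers.RandomNormal(stddev=0.02),
    bias_initializer="zeros",
)
```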
Looking good! Just two small comments
This class implements a bi-directional Fourier Transform-based encoder as
described in ["FNet: Mixing Tokens with Fourier Transforms"](https://arxiv.org/abs/2105.03824).
It includes the embedding lookups and FNet layers, but not the masked
FNet layers -> `keras_nlp.layers.FNetEncoder` layers

That will autogenerate a cross link in our docs, which might be nice.
The default constructor gives a fully customizable, randomly initialized FNet
encoder with any number of layers and embedding dimensions. To load
preset architectures and weights, use the `from_preset` constructor.
Note: unlike other models, FNet does not take in a "padding_mask" input; the "<pad>" token is handled equivalently to all other tokens in the input sequence.
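For reference, here's a hedged usage sketch matching the docstring above. The class name and constructor arguments follow the pattern of other keras_nlp backbones; the exact argument names in the final API are assumptions. Note the absence of a "padding_mask" input, per the note.

```python
import numpy as np
import keras_nlp

# Randomly initialized backbone (illustrative sizes).
backbone = keras_nlp.models.FNetBackbone(
    vocabulary_size=32000,
    num_layers=4,
    hidden_dim=256,
    intermediate_dim=512,
)

# Only "token_ids" and "segment_ids" -- no "padding_mask".
outputs = backbone(
    {
        "token_ids": np.ones((2, 12), dtype="int32"),
        "segment_ids": np.zeros((2, 12), dtype="int32"),
    }
)
```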
A few quick clarifications needed, thanks!
```python
    dtype=tf.float32,
)(x)

# Project the embedding to `hidden_dim`.
```
The embedding is already of size `hidden_dim`. Does this do anything?
Yeah, it's just a `(hidden_dim, hidden_dim)` linear layer. It's there in the official code: https://github.com/google-research/google-research/blob/master/f_net/layers.py#L386-L388.
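In other words, something like the following minimal sketch: a square Dense layer whose input and output widths are both `hidden_dim` (the value below is illustrative), with the kernel initializer matching the Flax default that the upstream code falls back to.

```python
import keras

hidden_dim = 768  # illustrative value

inputs = keras.Input(shape=(None, hidden_dim))
# (hidden_dim, hidden_dim) projection; it only remixes features, but is kept
# so converted checkpoints line up with the upstream implementation.
projected = keras.layers.Dense(
    hidden_dim,
    # Flax's Dense default (lecun_normal): variance scaling with scale=1.0,
    # fan_in mode, and a truncated normal distribution.
    kernel_initializer=keras.initializers.VarianceScaling(
        scale=1.0, mode="fan_in", distribution="truncated_normal"
    ),
    name="embedding_projection",
)(inputs)
```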
I'll remove the comment.
Weird, do you have any intuition here? I get why ALBERT has it, but this makes no sense.
I don't think the linear layer serves any specific purpose. In fact, they use this linear layer in their BERT implementation as well (they implemented BERT in order to do a comparative study between the two models).
Ok, as long as the checkpoints load!
Thank you!
Checkpoint Conversion Notebook: https://colab.research.google.com/drive/1VcLbisTI72yUhufLwxPmwGotNRMvhI4U?usp=sharing

Note: I've taken great care to make sure the kernel/bias initializers for every layer are correct. Please confirm that they are.
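For what it's worth, the kind of spot check I'd expect the notebook to run looks roughly like the sketch below: run the converted backbone on fixed inputs and compare against reference activations exported from the Flax model. The file name, tolerance, output key, and sizes here are assumptions for illustration, not taken from the notebook.

```python
import numpy as np
import keras_nlp

backbone = keras_nlp.models.FNetBackbone(
    vocabulary_size=32000,
    num_layers=12,
    hidden_dim=768,
    intermediate_dim=3072,
)
# In the real check, converted weights would be loaded first, e.g.:
# backbone.load_weights("converted_fnet_checkpoint.h5")  # hypothetical path

inputs = {
    "token_ids": np.ones((1, 16), dtype="int32"),
    "segment_ids": np.zeros((1, 16), dtype="int32"),
}
keras_sequence_output = backbone(inputs)["sequence_output"]

# Reference activations saved from the upstream Flax model for the same
# inputs (hypothetical file name).
flax_sequence_output = np.load("flax_reference_sequence_output.npy")
np.testing.assert_allclose(
    keras_sequence_output, flax_sequence_output, atol=1e-4
)
```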