
Rework model docstrings for progressive disclosure of complexity for f_net #879

Merged: 30 commits into keras-team:master on Apr 12, 2023

Conversation

@ADITYADAS1999 (Contributor) commented Mar 19, 2023

Rework model docstrings for progressive disclosure of complexity #867

  • Make sure to update any "custom vocabulary" examples to match the model's actual vocabulary type and special token requirements (these vary per model).
  • Test out all docstring snippets!
  • Make sure to follow our code style guidelines regarding indentation, etc.

cc: @mattdangerw @chenmoneygithub

@mattdangerw (Member) left a comment

This needs another careful pass; I may not have found all the issues. Make sure you have rewritten all the docstrings for the model in question, and tested them out.

@@ -55,7 +55,7 @@ class FNetClassifier(Task):
`None`, this model will not apply preprocessing, and inputs should
be preprocessed before calling the model.

Example usage:

Member:

This looks like it is missing most of the content from the BERT classifier; it may be worth another look.

second = tf.constant(["The fox tripped.", "Oh look, a whale."])
preprocessor((first, second))
```
Mapping with `tf.data.Dataset`.

Member:

newline before this heading

preprocessor = keras_nlp.models.BertPreprocessor(tokenizer)
preprocessor("The quick brown fox jumped.")
```
Mapping with `tf.data.Dataset`.

Member:

newline before

# Custom vocabulary.
vocab = ["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"]
vocab += ["The", "quick", "brown", "fox", "jumped", "."]
tokenizer = keras_nlp.models.BertTokenizer(vocabulary=vocab)

Member:

This still says BERT in a lot of places and uses the wrong vocabulary type for FNet. We will need to show SentencePiece here; you can look around at other PRs for examples of showing that type of vocabulary.

Contributor (Author):

> This still says BERT in a lot of places and uses the wrong vocabulary type for FNet. We will need to show SentencePiece here; you can look around at other PRs for examples of showing that type of vocabulary.

Can you suggest an example like this? I cannot find a proper one!
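
(For reference, other keras-nlp docstrings and tests build a small SentencePiece proto inline with the `sentencepiece` package. The sketch below follows that pattern; the toy corpus, vocab size, and special-token ids are illustrative, not the exact values a final docstring has to use.)

```python
import io

import sentencepiece
import tensorflow as tf

import keras_nlp

# Train a toy SentencePiece model on a tiny corpus, writing the proto to memory.
bytes_io = io.BytesIO()
vocab_data = tf.data.Dataset.from_tensor_slices(
    ["the quick brown fox", "the earth is round"]
)
sentencepiece.SentencePieceTrainer.train(
    sentence_iterator=vocab_data.as_numpy_iterator(),
    model_writer=bytes_io,
    vocab_size=12,
    model_type="WORD",
    pad_id=3,
    unk_id=0,
    bos_id=4,
    eos_id=5,
    pad_piece="<pad>",
    unk_piece="<unk>",
    bos_piece="[CLS]",
    eos_piece="[SEP]",
    user_defined_symbols="[MASK]",
)
# Pass the serialized proto (or a path to a saved .spm file) to the tokenizer.
tokenizer = keras_nlp.models.FNetTokenizer(proto=bytes_io.getvalue())
tokenizer("the quick brown fox")
```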

@chenmoneygithub (Contributor) left a comment

Thanks for the PR! In general, there are many examples erroring out; please test the modified examples. Thanks!

Mapping with `tf.data.Dataset`.
```python
preprocessor = keras_nlp.models.FNetMaskedLMPreprocessor.from_preset(
"bert_base_en_uncased"

Contributor:

Fix the preset: f_net_base_en

```python
# Load the preprocessor from a preset.
preprocessor = keras_nlp.models.FNetMaskedLMPreprocessor.from_preset(
"f_net_base_en"
"f_net_base_en_uncased"

Contributor:

Fix the preset: f_net_base_en
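
(With the suggested preset applied, the loading snippet would presumably read along these lines; a sketch, not the exact final docstring.)

```python
# Load the preprocessor from the FNet preset.
preprocessor = keras_nlp.models.FNetMaskedLMPreprocessor.from_preset(
    "f_net_base_en"
)
preprocessor("The quick brown fox jumped.")
```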

user_defined_symbols="[MASK]",
# Map sentence pairs.
ds = tf.data.Dataset.from_tensor_slices((first, second))
# Watch out for tf.data's default unpacking of tuples here!

Contributor:

Not something introduced by this PR, but I think it is worth calling out that `first` and `second` will be concatenated when calling the preprocessor this way. Right now the comment just says "watch out" without showing the output. Maybe we can add "sentence pairs are automatically packed before tokenization"? @mattdangerw, thoughts on this?

Member:

Ah, that is not quite the issue here.

The fact that the outputs are concatenated is not that surprising. The fact that tf.data handles tuples specially is! Basically, if you just called `ds = ds.map(preprocessor)` here, you would see your second input being passed as a label and not a feature. It's an annoying gotcha, but not ours to solve, I think.

It stems from the fact that these two calls are handled differently...

```python
tf.data.Dataset.from_tensor_slices([[1, 2, 3], [1, 2, 3]]).map(lambda x: x)  # OK
tf.data.Dataset.from_tensor_slices(([1, 2, 3], [1, 2, 3])).map(lambda x: x)  # ERROR
```

We can update this comment if we want, but I would not do it in this PR. I would do it in a separate PR, for all the models at once (so we don't forget to update this elsewhere).
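
(A minimal sketch of the gotcha in context, assuming the f_net_base_en preset referenced elsewhere in this review; the sentence-pair data is illustrative.)

```python
import tensorflow as tf

import keras_nlp

preprocessor = keras_nlp.models.FNetPreprocessor.from_preset("f_net_base_en")

first = tf.constant(["The quick brown fox jumped.", "Call me Ishmael."])
second = tf.constant(["The fox tripped.", "Oh look, a whale."])
ds = tf.data.Dataset.from_tensor_slices((first, second))

# A plain `ds.map(preprocessor)` would unpack the tuple and treat `second` as
# a label rather than a second feature; wrapping the call in a lambda keeps
# both tensors as features of one sentence pair.
ds = ds.map(
    lambda first, second: preprocessor((first, second)),
    num_parallel_calls=tf.data.AUTOTUNE,
)
```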

Contributor (Author):

I believe this should be fine if we open a separate PR covering all the models at once.

Member:

Yeah, the comment above was meant as an explainer. Let's stick to the language we have been using in other PRs verbatim for this PR.

Examples:

Directly calling the layer on data.
```python
tokenizer = keras_nlp.models.FNetTokenizer(proto="model.spm")

Contributor:

Change this to use from_preset()
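
(A minimal sketch of what that might look like, assuming the f_net_base_en preset used elsewhere in this review.)

```python
# Load a pretrained SentencePiece proto via a preset instead of a local file.
tokenizer = keras_nlp.models.FNetTokenizer.from_preset("f_net_base_en")
tokenizer("The quick brown fox jumped.")
```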

# Custom vocabulary.
vocab = ["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"]
vocab += ["The", "quick", "brown", "fox", "jumped", "."]
tokenizer = keras_nlp.models.FNetTokenizer(vocabulary=vocab)

Contributor:

This won't work; FNetTokenizer needs a SentencePiece proto.

@ADITYADAS1999 (Contributor, Author) commented Apr 5, 2023

> This won't work; FNetTokenizer needs a SentencePiece proto.

So, can we remove this custom vocabulary example here?

Mapping with `tf.data.Dataset`.
```python
preprocessor = keras_nlp.models.FNetPreprocessor.from_preset(
"bert_base_en_uncased"

Contributor:

wrong preset here

tokenizer(["the quick brown fox", "the earth is round"])
# Unbatched input.
tokenizer = keras_nlp.models.FNetTokenizer.from_preset(
"bert_base_en_uncased",

Contributor:

wrong preset

# Custom vocabulary.
vocab = ["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"]
vocab += ["The", "quick", "brown", "fox", "jumped", "."]
tokenizer = keras_nlp.models.FNetTokenizer(vocabulary=vocab)

Contributor:

This won't work either. We can just delete the custom vocab example.

@mattdangerw (Member)

/gcbrun

@chenmoneygithub (Contributor)

@ADITYADAS1999 There are still some pending comments unaddressed, could you give me a ping after fixing them? Thanks!

@ADITYADAS1999 (Contributor, Author)

> @ADITYADAS1999 There are still some pending comments unaddressed, could you give me a ping after fixing them? Thanks!

I will try to fix them ASAP and let you know 👍🏻

@ADITYADAS1999 (Contributor, Author) commented Apr 6, 2023

Hey @chenmoneygithub, can you check the fixes now?

@mattdangerw (Member)

/gcbrun

@mattdangerw merged commit 1c9ab0b into keras-team:master on Apr 12, 2023

@ADITYADAS1999 (Contributor, Author)

Thanks for reviewing, but the accelerator testing is still failing for some reason.
