Mlm mask generator docstring adding example #916

abuelnasr0 · 2023-03-24T17:41:51Z

For the contribution-welcome issue #120, I added a simple example showing the best practice for using MLMMaskGenerator to train a language model:

1. Prepare data: generate random strings or load a small dataset from TFDS (TensorFlow Datasets).
2. Instantiate a tokenizer and call tokenize on the data.
3. Apply MLMMaskGenerator to the tokenized data.
4. Feed the masked data to a dummy MLM model by calling model.fit().
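As a rough illustration of the steps above, here is a minimal pure-Python sketch. It does not use KerasNLP at all: the whitespace tokenizer, the `mask_tokens` helper, and the reserved ids are hypothetical stand-ins chosen for this example, and the final `fit()` step is only noted in a comment.

```python
import random

# Step 1: toy data (a stand-in for random strings or a small TFDS dataset).
sentences = ["the quick brown fox", "the lazy dog sleeps"]

# Step 2: a hypothetical whitespace tokenizer. Ids 0 and 1 are reserved
# here for padding and the mask token (an assumption of this sketch).
PAD_ID, MASK_ID = 0, 1
words = sorted({word for s in sentences for word in s.split()})
vocab = {word: index + 2 for index, word in enumerate(words)}


def tokenize(sentence):
    """Map each whitespace-separated word to its integer id."""
    return [vocab[word] for word in sentence.split()]


# Step 3: hide a fraction of the tokens and remember the originals.
def mask_tokens(token_ids, rate, rng):
    """Return (masked ids, masked positions, original ids at those positions)."""
    count = max(1, round(len(token_ids) * rate))
    positions = sorted(rng.sample(range(len(token_ids)), count))
    masked_ids = list(token_ids)
    labels = []
    for position in positions:
        labels.append(masked_ids[position])
        masked_ids[position] = MASK_ID
    return masked_ids, positions, labels


rng = random.Random(0)
examples = [mask_tokens(tokenize(s), rate=0.25, rng=rng) for s in sentences]

# Step 4 (not shown): a model would take the masked ids and positions as
# features and be trained to predict the labels.
```

The sketch always substitutes the mask token; a production mask generator may also swap in random tokens or leave some selected tokens unchanged.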

Link to a Google Colab notebook with the example:
https://colab.research.google.com/drive/19GmGy4nCe2-AgRsHAqdVIMmjxR_WxSkB#scrollTo=YdPZIknsopS_

Waiting for feedback.
Thank you.

chenmoneygithub

Thanks for the PR!

keras_nlp/layers/masked_lm_mask_generator.py

mattdangerw · 2023-03-30T20:22:57Z

keras_nlp/layers/masked_lm_mask_generator.py

+    An end-to-end masked language model training using masked language mask
+    generator.
+    ```python
+    train_data = tf.constant([


Some high-level comments...

I think showing a more fleshed-out usage is OK to do here, but I would scale it down significantly from what is in the current version.

We do not want to show keras_nlp.models usage here in our lower-level layers. And we probably do not want to show fit() either. We should generally try to keep our usages fairly local to the symbols in question, so we avoid a ton of cross-cutting dependencies when we update a symbol usage.

I might instead just show something that looks closer to a real-world example, but is still just focused on the layer itself.

    pad_id, cls_id, sep_id, mask_id = 0, 1, 2, 3
    batch = [
        [cls_id, 4, 5, 6, sep_id, 7, 8, sep_id, pad_id, pad_id],
        [cls_id, 4, 5, sep_id, 6, 7, 8, 9, sep_id, pad_id],
    ]
    # the rest of the MaskedLMMaskGenerator and invocation
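To make the point of that batch concrete, here is a hedged pure-Python sketch of masking it while leaving the special tokens alone. The `mask_batch` helper and its parameters are hypothetical and written only for this illustration; it is not the layer's implementation. To my knowledge the real `MaskedLMMaskGenerator` layer offers comparable behavior through an `unselectable_token_ids` argument, but check the layer's docstring rather than relying on this sketch.

```python
import random

pad_id, cls_id, sep_id, mask_id = 0, 1, 2, 3
batch = [
    [cls_id, 4, 5, 6, sep_id, 7, 8, sep_id, pad_id, pad_id],
    [cls_id, 4, 5, sep_id, 6, 7, 8, 9, sep_id, pad_id],
]
# Special tokens that must never be selected for masking.
unselectable = {pad_id, cls_id, sep_id}


def mask_batch(batch, mask_id, unselectable, rate, rng):
    """Replace a fraction of the selectable tokens in each row with mask_id."""
    masked_batch, positions_batch = [], []
    for row in batch:
        # Only ordinary tokens are candidates for masking.
        candidates = [i for i, token in enumerate(row) if token not in unselectable]
        count = max(1, round(len(candidates) * rate))
        positions = sorted(rng.sample(candidates, count))
        masked_batch.append(
            [mask_id if i in positions else token for i, token in enumerate(row)]
        )
        positions_batch.append(positions)
    return masked_batch, positions_batch


masked, positions = mask_batch(
    batch, mask_id, unselectable, rate=0.4, rng=random.Random(0)
)
```

Every `cls_id`, `sep_id`, and `pad_id` stays in place in `masked`, which is the property a real-world example focused on the layer would want to show.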

Like the last push?

Ah yes perhaps, sorry about that!

This may be me needing to sync up with @chenmoneygithub more. I do not think we want the examples here to depend on a totally separate part of our API, I will talk with @chenmoneygithub about it.

Just talked with @chenmoneygithub, apologies for all the confusion here. We are scaling up our contributors, so our wires may get crossed a few times :)

But this latest version looks good to me! Thanks very much!

No problem. Writing docstrings and making changes helps me get more familiar with the library. Thanks for the feedback.
One thing: should I run /gcbrun, or should a mentor run it?

@abuelnasr0 good question. We are playing with different options here, but /gcbrun would need to be added to the project by a maintainer. It is unclear whether we are going to require it long term, but no action should be needed from you!

mattdangerw

This LGTM!

chenmoneygithub

Thank you!

abuelnasr0 added 3 commits March 24, 2023 19:34

Add example to MLMMaskGenerator to show best Practice

d14d828

Add example to MLMMaskGenerator to show best Practice: format the code

0d621e4

Add example to masked language model generater: Edit

856977a

mattdangerw requested a review from chenmoneygithub March 29, 2023 00:09

mattdangerw assigned chenmoneygithub Mar 29, 2023

chenmoneygithub suggested changes Mar 29, 2023

Add Bert masked language model to the example

2e88a24

mattdangerw reviewed Mar 30, 2023

abuelnasr0 added 2 commits March 30, 2023 23:21

Add simple example: masking a batch that contains special tokens

b715978

Format the code

4a268d9

mattdangerw approved these changes Mar 31, 2023

abuelnasr0 requested a review from chenmoneygithub March 31, 2023 01:27

chenmoneygithub approved these changes Mar 31, 2023

chenmoneygithub merged commit 1795d10 into keras-team:master Mar 31, 2023

abuelnasr0 deleted the MLMMaskGenerator-docstring-adding-example branch March 31, 2023 02:02

abuelnasr0 restored the MLMMaskGenerator-docstring-adding-example branch March 31, 2023 02:02

abuelnasr0 deleted the MLMMaskGenerator-docstring-adding-example branch December 28, 2023 17:35


