RobertaMaskedLM task and preprocessor #653
Conversation
Removing reviewers while I fix tests!
Ok! This is ready for review again. There is still a test failure related to random seeds on tf.nightly that I will have to look into. If we don't think we can fully control randomness deterministically across tf versions, I can just re-write the asserts to be a little more general for preprocessing.
This is great, left some initial comments.
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""RoBERTa classification model."""
Update docstring
It doesn't seem updated?
Am I missing something?
Oops sorry! Dunno how I spaced on this!
Overall looks good!
    tokenizer=tokenizer,
    sequence_length=20,
)
preprocessor(" quick fox quick fox")
Shall we remove the leading space to keep the inputs consistent?
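For reference, a minimal sketch of the call being discussed, without the leading space. The class name, preset, and output structure below are assumptions based on the snippets quoted in this thread, not verbatim code from the PR:

```python
import keras_nlp

# Assumed API: a RoBERTa tokenizer preset plus the masked LM preprocessor
# under review; treat every name here as illustrative.
tokenizer = keras_nlp.models.RobertaTokenizer.from_preset("roberta_base_en")
preprocessor = keras_nlp.models.RobertaMaskedLMPreprocessor(
    tokenizer=tokenizer,
    sequence_length=20,
)
# The preprocessor is assumed to return (features, labels, sample_weights),
# where the features hold the masked token ids and the mask positions.
features, labels, sample_weights = preprocessor("quick fox quick fox")
```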
Adding some more comments. Haven't looked at the tests yet.
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""RoBERTa classification model."""
It doesn't seem updated?
It seems basically ready to me. Have you played around with this in colab? How is the UX?
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""RoBERTa classification model."""
Am I missing something?
Heh, UX is in the eye of the beholder, but in my biased view, this is correct for where we are at. Here's a quick colab -> The major usability gap I see is one that stands for causal language modeling as well: there is no way to do either of these with densely packed, fixed sequence length windowing, which is how this is done for both GPT2 and RoBERTa pretraining. I can open an issue next week with some musings and possible approaches, but let's not solve that on this PR.
Yes @mattdangerw please open an issue for data streaming! The code is only a toy until we can replicate the behavior and performance of the original pretraining.
Not quite data streaming, but data windowing. And I don't think the workflow shown here is a toy; it is just suited to certain datasets and problems. Put shortly, an offering that only did densely packed sequence windowing would be incomplete, and an offering that only did padded windowing is also incomplete. We only have the latter right now. But all of this is "streaming": it boils down to options in how the data preprocessing stream should function. Definitely look at the colab if you haven't yet! At the meta level, agreed that replicating pre-training is a good goal, but let's work toward that incrementally!
I've opened #701 to cover the discussion above. This is an important question we need to solve at the library level, I think, but let's not solve it on this PR. There are a lot of open questions to tackle there. And it may be that what we have here is sufficient, and what we need is simply a recipe for pretraining, where most of the preprocessing happens in a separate job entirely.
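As a rough illustration of the two strategies being contrasted above (a sketch only, not code from this PR): padded windowing emits one fixed-length, partially padded example per document, while dense packing concatenates documents into a single token stream and slices fully dense fixed-length windows.

```python
import tensorflow as tf

SEQ_LEN = 8  # illustrative window length

def gen():
    # Toy stand-in for a tokenized corpus of variable-length documents.
    yield [1, 2, 3]
    yield [4, 5, 6, 7, 8]
    yield [9, 10]

docs = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32)
)

# Padded windowing: one example per document, truncated/padded to SEQ_LEN.
def padded_window(token_ids):
    token_ids = token_ids[:SEQ_LEN]
    return tf.pad(token_ids, [[0, SEQ_LEN - tf.shape(token_ids)[0]]])

padded = docs.map(padded_window)

# Densely packed windowing (GPT-2/RoBERTa pretraining style): flatten all
# documents into one token stream, then slice dense windows with no padding.
packed = docs.flat_map(tf.data.Dataset.from_tensor_slices).batch(
    SEQ_LEN, drop_remainder=True
)
```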
Thank you, glad to see our task models expanding!!!
The test asserting random output seems a bit wonky; I would recommend mocking instead.
    mask_selection_length=4,
    sequence_length=12,
)
keras.utils.set_random_seed(42)
Is it good practice to assert random output as a function of a random seed? My understanding is that the standard approach is to mock the function with random output and then just have an overall integration test.
Let me poke around. The thing that generates the random output is kinda complex down in MaskedLMMaskGenerator, and when mapping with tf.data the whole call graph will be compiled. Mocking might be more trouble than it's worth, and would only apply in the non-compiled case.
In our tests there we just do a lot of shape assertions, and don't assert the exact structure. Maybe that is the move.
Regardless, I am down to remove the random seed setting.
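A sketch of what shape-only, seed-free assertions could look like; the key names, lengths, and preprocessor construction below are assumptions mirroring the snippet quoted above:

```python
import tensorflow as tf
import keras_nlp


class PreprocessorOutputShapeTest(tf.test.TestCase):
    def test_output_shapes(self):
        # Class name, preset, and arguments assumed from this thread.
        preprocessor = keras_nlp.models.RobertaMaskedLMPreprocessor(
            tokenizer=keras_nlp.models.RobertaTokenizer.from_preset(
                "roberta_base_en"
            ),
            sequence_length=12,
            mask_selection_length=4,
        )
        x, y, sw = preprocessor(["the quick brown fox jumped"])
        # Assert only shapes, so the test does not depend on which tokens
        # the (random) mask generator happens to select.
        self.assertEqual(x["token_ids"].shape, (1, 12))
        self.assertEqual(x["mask_positions"].shape, (1, 4))
        self.assertEqual(y.shape, (1, 4))
        self.assertEqual(sw.shape, (1, 4))
```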
OK, got some more deterministic seedless testing in. Did not go with the mock as I think that would have ended up being pretty fragile.
I also exposed a few more of our underlying "mask generator" options on the preprocessing layer. I think they will be useful to have (they certainly are for these tests).
My apologies, I didn't mean to pollute the API just for testing! I'd rather just have an integration test in the short term than change the API.
I do think it probably makes sense to add these options. It is very reasonable to want to only replace masked positions with the mask token (rather than the 80% mask token, 10% random token, 10% unchanged split used in the original BERT and RoBERTa, I believe). I had been on the fence about adding them and was going to wait for a contributor to ask for it. But after noticing how it simplified testing, I am down to add them in.
Apologies for misunderstanding, merge away!
This adds the underlying options for how masks are generated from the mask generator layer. This in turn allows us to write some tests for the preprocessor that are fully deterministic, while still testing the logic in the preprocessor layer itself.
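A sketch of how the exposed options could make a preprocessor test deterministic without a seed; the argument names below are assumptions mirroring the commit description, not verbatim API from this PR:

```python
import keras_nlp

# Names below are assumed, not verbatim from the PR.
tokenizer = keras_nlp.models.RobertaTokenizer.from_preset("roberta_base_en")
preprocessor = keras_nlp.models.RobertaMaskedLMPreprocessor(
    tokenizer=tokenizer,
    sequence_length=12,
    mask_selection_rate=1.0,    # consider every non-special token for masking
    mask_selection_length=4,
    mask_token_rate=1.0,        # always substitute the mask token...
    random_token_rate=0.0,      # ...never a random token
)
# For inputs short enough that all selected positions fit within
# mask_selection_length, the output is now fully deterministic.
```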
This exposes a task model for a mask token prediction task. This is the task used to pre-train RoBERTa, and it could be used for further fine-tuning.
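As a rough sketch of how the task model might be used for pre-training or fine-tuning on raw strings (class names, preset, and compile settings are assumptions, not verbatim API from this PR):

```python
import tensorflow as tf
import keras_nlp

# The attached preprocessor is assumed to turn raw strings into masked
# inputs, labels, and sample weights on the fly.
masked_lm = keras_nlp.models.RobertaMaskedLM.from_preset("roberta_base_en")

# Toy corpus of raw text; a real run would use a large dataset.
features = tf.constant(["the quick brown fox", "the slow brown fox"])
dataset = tf.data.Dataset.from_tensor_slices(features).batch(2)

masked_lm.compile(
    # from_logits=True assumes the masked LM head outputs logits.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="adam",
    weighted_metrics=["sparse_categorical_accuracy"],
)
masked_lm.fit(dataset, epochs=1)
```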