
Default compilation for Albert, Distilbert, Roberta MaskedLM #833

Merged
6 commits merged into keras-team:master from shivance:compilation-defaults on Mar 17, 2023

Conversation

shivance
Collaborator

Partially fixes #830

@shivance
Collaborator Author

Still experimenting with learning rates and convergence.

@mattdangerw
Member

@shivance thanks! That was going to be my first question.

Code looks good though!

self.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    metrics=keras.metrics.SparseCategoricalAccuracy(),
)
Member

@mattdangerw mattdangerw Mar 14, 2023


Actually, I think we want weighted_metrics here as we do pass sample weights for this task.

Collaborator Author


Sure Matt.
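
For reference, a minimal sketch of what the default compilation looks like after switching to weighted_metrics (the list wrapping and exact placement are assumptions here, not the literal diff):

self.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    # weighted_metrics so accuracy respects the per-token sample weights
    # that the MaskedLM task passes along with the labels.
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)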

@shivance
Collaborator Author

Hey @mattdangerw, could you please also review the keras-io semantic similarity tutorial when you get a chance? Thanks!

@shivance
Collaborator Author

shivance commented Mar 16, 2023

Notebook: https://www.kaggle.com/code/shivanshuman/does-tensorflow-task-converge

Roberta

Having a hard time with Roberta: even with a batch size of 16, I repeatedly get OOM errors. Still, across the three learning rates, lr=5e-5 outperforms the other two.

[plot: Roberta training curves for the three learning rates]

Albert

Something is not quite right here.
#831 mentioned the same issue with the Albert classifiers.

[plot: Albert training curves]

DistilBert

It does slightly better at 1e-4 than at 5e-5.

[plot: DistilBert training curves at 1e-4 vs 5e-5]

@shivance shivance requested a review from mattdangerw March 16, 2023 16:20
@mattdangerw
Member

Thank you! This is super helpful analysis; I really appreciate it!

Maybe let's go with 5e-5 everywhere? Performance is almost the same on DistilBert, lower rates are probably "more conservative" in terms of instability, and having the same number everywhere will be simpler for now.

@shivance
Collaborator Author

Sounds good, @mattdangerw, made the change.

@shivance
Collaborator Author

Also, the Albert run with AdamW just finished:

[plot: Albert training curve with AdamW]

AdamW is still in the experimental API of TensorFlow, so shall I go ahead and use the Adam optimizer for Albert?

@mattdangerw
Member

AdamW is still in the experimental API of TensorFlow, so shall I go ahead and use the Adam optimizer for Albert?

Great question. AdamW is flat broken on certain versions of TensorFlow (I believe 2.9), so we have been avoiding it for now. Nothing worse than trying to run our library against tf 2.9 and getting no error messages and nothing converging (which is what would happen).

But maybe we can find a way to check the tf version before we do the auto-compilation, or something along those lines? A little gross, but helpful to users.

Anyway, what you have looks perfect for now. We can handle the AdamW question down the road.
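
As a rough sketch of the version-check idea floated above (a hypothetical helper, not something this PR or keras-nlp adds): pick the optimizer based on the installed TensorFlow release and fall back to plain Adam where the experimental AdamW is known to be broken.

import tensorflow as tf
from tensorflow import keras

# Hypothetical helper, not in keras-nlp: gate AdamW on the TF version,
# falling back to Adam on releases (around 2.9) where AdamW misbehaves.
def default_masked_lm_optimizer(learning_rate=5e-5):
    major, minor = (int(v) for v in tf.__version__.split(".")[:2])
    if (major, minor) <= (2, 9):
        return keras.optimizers.Adam(learning_rate)
    return keras.optimizers.experimental.AdamW(learning_rate)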

Member

@mattdangerw mattdangerw left a comment


Great work! Thanks so much for doing all the extra testing behind the scenes here.

@mattdangerw mattdangerw merged commit cfe1fca into keras-team:master Mar 17, 2023
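
For anyone skimming the merged result, a minimal usage sketch (the preset name and toy data are illustrative, not taken from this PR): with default compilation in place, a MaskedLM task can be fit directly without an explicit compile() call.

import keras_nlp

# Illustrative only: the task now ships pre-compiled with the defaults
# discussed above (sparse categorical cross-entropy, Adam at 5e-5, and a
# weighted accuracy metric), so fit() can be called straight away.
masked_lm = keras_nlp.models.DistilBertMaskedLM.from_preset(
    "distil_bert_base_en_uncased"
)
masked_lm.fit(x=["The quick brown fox.", "Call me Ishmael."], batch_size=2)
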
shivance added a commit to shivance/keras-nlp that referenced this pull request Mar 26, 2023
…eam#833)

* adding compilation defaults

* Update albert_masked_lm.py

* Update roberta_masked_lm.py

* Update distil_bert_masked_lm.py

* Update distil_bert_masked_lm.py

* Update distil_bert_masked_lm.py
@shivance shivance deleted the compilation-defaults branch April 16, 2023 05:14
Successfully merging this pull request may close these issues.

Add compilation defaults for the MaskedLM task models