
Default compilation for Albert, Distilbert, Roberta MaskedLM #833

Merged
6 commits merged into keras-team:master from shivance:compilation-defaults on Mar 17, 2023

Conversation

shivance
Collaborator

Partially fixes #830

@shivance
Collaborator Author

Still experimenting with learning rates and convergence.

@mattdangerw
Member

@shivance thanks! That was going to be my first question.

Code looks good though!

self.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    metrics=keras.metrics.SparseCategoricalAccuracy(),
)
Member

@mattdangerw mattdangerw Mar 14, 2023


Actually, I think we want weighted_metrics here as we do pass sample weights for this task.

Collaborator Author


Sure Matt.
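
For reference, a minimal sketch of what the default compilation looks like after switching to weighted_metrics (the list wrapping and exact placement are assumptions here, not the literal diff):

self.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    # weighted_metrics so accuracy respects the per-token sample weights
    # that the MaskedLM task passes along with the labels.
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)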

@shivance
Collaborator Author

Hey @mattdangerw, could you please also review the keras-io semantic similarity tutorial when you get a chance? Thanks!

@shivance
Collaborator Author

shivance commented Mar 16, 2023

Notebook: https://www.kaggle.com/code/shivanshuman/does-tensorflow-task-converge

Roberta

Having a hard time with Roberta: even with a batch size of 16, I repeatedly get OOM errors. Still, across the three learning rates, lr=5e-5 outperforms the other two.

[plot: Roberta training curves for the three learning rates]

Albert

Something is not quite right here.
#831 mentioned the same issue with the Albert classifiers.

[plot: Albert training curves]

DistilBert

It does slightly better at 1e-4 than at 5e-5.

[plot: DistilBert training curves at 1e-4 vs 5e-5]

@shivance shivance requested a review from mattdangerw March 16, 2023 16:20
@mattdangerw
Member

Thank you! This is super helpful analysis; I really appreciate it!

Maybe let's go with 5e-5 everywhere? Performance is almost the same on DistilBert, lower rates are probably "more conservative" in terms of instability, and having the same number everywhere will be simpler for now.

@shivance
Collaborator Author

Sounds good, @mattdangerw, made the change.

@shivance
Collaborator Author

Also, the Albert run with AdamW just finished:

[plot: Albert training curve with AdamW]

AdamW is still in the experimental API of TensorFlow, so shall I go ahead and use the Adam optimizer for Albert?

@mattdangerw
Member

AdamW is still in the experimental API of TensorFlow, so shall I go ahead and use the Adam optimizer for Albert?

Great question. AdamW is flat broken on certain versions of TensorFlow (I believe 2.9), so we have been avoiding it for now. Nothing worse than trying to run our library against tf 2.9 and getting no error messages and nothing converging (which is what would happen).

But maybe we can find a way to check the tf version before we do the auto-compilation, or something along those lines? A little gross, but helpful to users.

Anyway, what you have looks perfect for now. We can handle the AdamW question down the road.
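
As a rough sketch of the version-check idea floated above (a hypothetical helper, not something this PR or keras-nlp adds): pick the optimizer based on the installed TensorFlow release and fall back to plain Adam where the experimental AdamW is known to be broken.

import tensorflow as tf
from tensorflow import keras

# Hypothetical helper, not in keras-nlp: gate AdamW on the TF version,
# falling back to Adam on releases (around 2.9) where AdamW misbehaves.
def default_masked_lm_optimizer(learning_rate=5e-5):
    major, minor = (int(v) for v in tf.__version__.split(".")[:2])
    if (major, minor) <= (2, 9):
        return keras.optimizers.Adam(learning_rate)
    return keras.optimizers.experimental.AdamW(learning_rate)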

Member

@mattdangerw mattdangerw left a comment


Great work! Thanks so much for doing all the extra testing behind the scenes here.

@mattdangerw mattdangerw merged commit cfe1fca into keras-team:master Mar 17, 2023
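
For anyone skimming the merged result, a minimal usage sketch (the preset name and toy data are illustrative, not taken from this PR): with default compilation in place, a MaskedLM task can be fit directly without an explicit compile() call.

import keras_nlp

# Illustrative only: the task now ships pre-compiled with the defaults
# discussed above (sparse categorical cross-entropy, Adam at 5e-5, and a
# weighted accuracy metric), so fit() can be called straight away.
masked_lm = keras_nlp.models.DistilBertMaskedLM.from_preset(
    "distil_bert_base_en_uncased"
)
masked_lm.fit(x=["The quick brown fox.", "Call me Ishmael."], batch_size=2)
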
shivance added a commit to shivance/keras-nlp that referenced this pull request Mar 26, 2023
…eam#833)

* adding compilation defaults

* Update albert_masked_lm.py

* Update roberta_masked_lm.py

* Update distil_bert_masked_lm.py

* Update distil_bert_masked_lm.py

* Update distil_bert_masked_lm.py
@shivance shivance deleted the compilation-defaults branch April 16, 2023 05:14
Successfully merging this pull request may close these issues.

Add compilation defaults for the MaskedLM task models