Default compilation for Albert, Distilbert, Roberta MaskedLM #833
Conversation
Still working on experimentation with LRs and convergence.
@shivance thanks! That was going to be my first question. Code looks good though!
self.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    metrics=keras.metrics.SparseCategoricalAccuracy(),
)
Actually, I think we want weighted_metrics here as we do pass sample weights for this task.
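A minimal sketch of what that suggested change might look like, reusing the defaults quoted above (the exact signature landed in the PR may differ):

    # Sketch only: pass accuracy via weighted_metrics so the per-token sample
    # weights (the masked positions) are applied when the metric is aggregated.
    self.compile(
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=keras.optimizers.Adam(5e-5),
        weighted_metrics=keras.metrics.SparseCategoricalAccuracy(),
    )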
Sure Matt.
Hey @mattdangerw, could you please also review the keras-io semantic similarity tutorial? Thanks!
Notebook: https://www.kaggle.com/code/shivanshuman/does-tensorflow-task-converge

Roberta: Having a hard time with RoBERTa; even with a batch size of 16, repeatedly getting OOM. Still tried three different LRs.
Albert: Something is not quite right here.
DistilBert: Does slightly better at 1e-4 than 5e-5.
Thank you! This is super helpful analysis, super appreciate it! Maybe let's go with 5e-5 everywhere, as performance is almost the same on DistilBERT? Lower rates are probably "more conservative" in terms of instability, and having the same number will be simpler for now.
Sounds good, @mattdangerw, made the change.
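For anyone following along: the shared 5e-5 is only a default, and a user can still override it with an explicit compile() call. A hedged sketch, assuming `masked_lm` is an already-constructed keras_nlp MaskedLM task:

    # Illustrative only: override the default learning rate on an existing task.
    masked_lm.compile(
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=keras.optimizers.Adam(1e-4),  # e.g. the rate that looked better for DistilBERT
        weighted_metrics=keras.metrics.SparseCategoricalAccuracy(),
    )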
Great question. Maybe we can find a way to check the tf version before we do the auto compilation, or something like that? A little gross, but helpful to users. Anyway, what you have looks perfect for now. We can handle the AdamW question down the road.
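One possible shape for that version check, purely as a sketch (the 2.11 cutoff and the fallback behavior are assumptions, not something decided in this PR):

    import tensorflow as tf
    from tensorflow import keras

    def default_optimizer(learning_rate=5e-5):
        # Sketch only: use AdamW when the installed Keras/TF ships it in the
        # stable namespace, otherwise fall back to plain Adam.
        major, minor = (int(x) for x in tf.__version__.split(".")[:2])
        if (major, minor) >= (2, 11) and hasattr(keras.optimizers, "AdamW"):
            return keras.optimizers.AdamW(learning_rate)
        return keras.optimizers.Adam(learning_rate)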
Great work! Thanks so much for doing all the extra testing behind the scenes here.
…eam#833)
* adding compilation defaults
* Update albert_masked_lm.py
* Update roberta_masked_lm.py
* Update distil_bert_masked_lm.py
* Update distil_bert_masked_lm.py
* Update distil_bert_masked_lm.py
Partially fixes #830
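For reference, the user-facing effect of the default compilation is that a MaskedLM task can be fit directly, with no explicit compile() call. A rough sketch only; the preset name is an assumption and the toy data is not meaningful training input:

    import keras_nlp

    # With compilation defaults in place, compile() is only needed to override them.
    masked_lm = keras_nlp.models.AlbertMaskedLM.from_preset(
        "albert_base_en_uncased"  # preset name is an assumption
    )
    features = ["The quick brown fox jumped.", "I forgot my homework."]
    masked_lm.fit(x=features, batch_size=2)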