Improved LAMB optimizer. #1334
Conversation
Using Fused Adam with weight decay kernel.
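(For reference, the per-layer LAMB update that such a fused Adam-with-weight-decay kernel would compute, roughly following the LAMB paper — $g_t$ is the gradient, $\lambda$ the weight decay, $\eta$ the learning rate; the zero-norm guards on the trust ratio are omitted here:)

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat m_t &= m_t / (1-\beta_1^t), \qquad \hat v_t = v_t / (1-\beta_2^t) \\
r_t &= \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon} + \lambda\, w_{t-1} \\
w_t &= w_{t-1} - \eta\, \frac{\lVert w_{t-1} \rVert}{\lVert r_t \rVert}\, r_t
\end{aligned}
```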
@spidydev thank you for the pull request! Could you provide the speedup as a percentage of total time, for both the CPU and the GPU version, with eager, no eager, and XLA? See #1156.

Would it be possible to keep the Python version too? See how we do it for activations. It lets the optimizer run on TPUs and ROCm. It's also useful because when we compile against a specific TensorFlow version, the result cannot be used with any other TensorFlow version (see #1317); for example, people compiling TF from source won't be able to use your implementation.

As you can see, using a custom op for something in TensorFlow Addons has a high UX and maintenance cost. We need to make sure the speedup is worth it before accepting to maintain a custom op.
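(To illustrate the "keep the Python version too" pattern referenced above, here is a minimal sketch of an activations-style fallback; the library and op names `_fused_lamb_ops.so` / `fused_apply_lamb` are hypothetical, not this PR's actual build targets:)

```python
import tensorflow as tf

# Hypothetical shared-object name; the real file would come from the
# Bazel build of the custom kernel in this PR.
try:
    _lamb_so = tf.load_op_library("_fused_lamb_ops.so")
except (tf.errors.NotFoundError, OSError):
    # No compiled kernel for this platform / TF build (TPU, ROCm,
    # TF compiled from source, ...): fall back to plain TF ops.
    _lamb_so = None


def apply_update(var, grad, lr):
    """Use the fused kernel when it loaded, else a pure-Python path."""
    if _lamb_so is not None:
        # Hypothetical op exposed by the compiled library.
        return _lamb_so.fused_apply_lamb(var.handle, grad, lr)
    # Fallback built from standard ops, so it runs anywhere; a plain
    # gradient step standing in for the full LAMB update.
    return var.assign_sub(lr * grad)
```

The point of the pattern is that the compiled path is an optimization rather than a hard dependency: users on unsupported platforms transparently get the portable path.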
@gabrieldemarmiesse Thanks for all the pointers. Will get back with more info/updates.
@gabrieldemarmiesse Added some numbers in the description. Thanks!
@tensorflow/sig-addons-maintainers is the speedup significant enough for merging? Even if we merge, is it worth having a C++ CPU version? Few people in their right mind would train a model on CPU and expect maximum speed. @spidydev, we'll vote in private and then tell you whether we think the speedup is worth the maintenance cost.
@spidydev, in the end we'll keep the proposed CUDA implementation, but we prefer to keep the pure Python implementation for the CPU. @Squadrick has volunteered to review your pull request.
This PR is blocked until that is resolved. |
@gabrieldemarmiesse @Squadrick Quick question about the comment "we'll prefer to keep the pure Python implementation for CPU": can you give some pointers on how to do this? AFAIK, if we do not register a CPU kernel, TF is going to throw the error "No kernel registered".
@spidydev, in theory we'd like to do that, but I have no idea how to do it in practice. I hope it's possible. We would need it for activation functions too.
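(For the record, a sketch of what the pure-Python CPU path could look like, built only from standard TF ops; hyperparameter plumbing is simplified compared to `tfa.optimizers.LAMB`, and `lamb_step` is an illustrative name, not the PR's API:)

```python
import tensorflow as tf

def lamb_step(var, m, v, grad, step, lr=1e-3, beta_1=0.9, beta_2=0.999,
              epsilon=1e-6, weight_decay=0.01):
    """One LAMB update from standard ops only, so it runs on any device."""
    # Adam-style first/second moments with bias correction.
    m.assign(beta_1 * m + (1.0 - beta_1) * grad)
    v.assign(beta_2 * v + (1.0 - beta_2) * tf.square(grad))
    m_hat = m / (1.0 - beta_1 ** step)
    v_hat = v / (1.0 - beta_2 ** step)
    # Adam direction plus decoupled weight decay.
    update = m_hat / (tf.sqrt(v_hat) + epsilon) + weight_decay * var
    # Layer-wise trust ratio ||w|| / ||update||, guarded against zero norms.
    w_norm = tf.norm(var)
    u_norm = tf.norm(update)
    trust = tf.where(w_norm > 0.0,
                     tf.where(u_norm > 0.0, w_norm / u_norm, 1.0), 1.0)
    var.assign_sub(lr * trust * update)

# Example: one step on a toy variable.
w = tf.Variable([1.0, -2.0])
m = tf.Variable(tf.zeros(2))
v = tf.Variable(tf.zeros(2))
lamb_step(w, m, v, grad=tf.constant([0.1, 0.3]), step=1.0)
```

Wrapped in `tf.function`, the same code runs in graph mode wherever the standard kernels are registered, which is what makes it a workable CPU/TPU/ROCm path.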
@spidydev Thank you for your contribution. @tomerk As this optimizer is used by the TensorFlow Model SIG, if they are not interested in reviewing these changes I will close the PR.
Thank you for the contribution. I suggest moving this PR to https://github.com/keras-team/keras-nlp
Results training BERT-Large (334,882,823 trainable params), execution time in seconds: