
Improved LAMB optimizer #1334

Closed · wants to merge 4 commits

Conversation

@rajanksin commented Mar 18, 2020

Using the fused Adam with weight decay kernel.

  • Observed savings of ~100 ms per step while training BERT-Large on a single GPU.

Results for training BERT-Large (334,882,823 trainable params; execution time in seconds):

|             | 1xGPU/XLA/eager/10-epochs | 1xGPU/XLA/non-eager/10-epochs | CPU/eager/1-epoch |
|-------------|---------------------------|-------------------------------|-------------------|
| PY LAMB     | 4936.97                   | 4825.38                       | 14165             |
| CUSTOM LAMB | 4332.60                   | 4345.20                       | 12249             |
| Speedup     | 12.24%                    | 9.95%                         | 13.53%            |
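For reference, the per-parameter math that the fused kernel collapses into a single launch is the standard LAMB update (You et al., 2019). Below is a minimal unfused sketch in plain TensorFlow; argument names are illustrative, and real implementations differ on details such as bias correction and which variables are excluded from the trust ratio:

```python
import tensorflow as tf

def lamb_step(param, grad, m, v, step, lr,
              beta_1=0.9, beta_2=0.999, epsilon=1e-6, weight_decay=0.01):
    """One unfused LAMB update (You et al., 2019). `step` is a Python int
    starting at 1; hyperparameter names are illustrative."""
    # First and second moment estimates, as in Adam.
    m = beta_1 * m + (1.0 - beta_1) * grad
    v = beta_2 * v + (1.0 - beta_2) * tf.square(grad)
    # Bias correction (some implementations omit this).
    m_hat = m / (1.0 - beta_1 ** step)
    v_hat = v / (1.0 - beta_2 ** step)
    # Adam direction plus decoupled weight decay.
    update = m_hat / (tf.sqrt(v_hat) + epsilon) + weight_decay * param
    # Layer-wise trust ratio ||w|| / ||update||, defaulting to 1.
    w_norm = tf.norm(param)
    u_norm = tf.norm(update)
    trust_ratio = tf.where((w_norm > 0.0) & (u_norm > 0.0),
                           w_norm / u_norm, 1.0)
    param = param - lr * trust_ratio * update
    return param, m, v
```

Each of these element-wise steps is a separate kernel launch in the pure-Python version; fusing them is where the per-step savings come from.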

@bot-of-gabrieldemarmiesse

@junjiek

You are an owner of some files modified in this pull request.
Would you kindly review the changes whenever you have the time?
Thank you very much.

@gabrieldemarmiesse (Member) commented Mar 18, 2020

@spidydev thank you for the pull request! You might want to modify tools/install_so_files.py to fix the pytest error.

Also, could you provide the speedup as a percentage of total time, for both the CPU and GPU versions, with eager, non-eager, and XLA? See #1156.

Would it be possible to keep the Python version too? See how we do it for activations. That makes it possible to run the optimizer on TPUs and ROCm. It is also useful because a custom op compiled against a specific TensorFlow version cannot be used with any other TensorFlow version (see #1317); for example, people compiling TF from source would not be able to use your implementation.
We should also test that the CUDA and C++ implementations give the same numbers as the Python one, to ensure everything stays compatible going forward (a possible fallback-and-parity pattern is sketched below).
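A minimal sketch of such a fallback, assuming a hypothetical `_lamb_ops.so` binary and a hypothetical pure-Python helper `_apply_lamb_python` (neither name is from this PR):

```python
import tensorflow as tf

try:
    # Hypothetical .so name; the real binary path depends on the build.
    _lamb_so = tf.load_op_library("_lamb_ops.so")
except (tf.errors.NotFoundError, OSError):
    # No compatible binary (e.g. TPU, ROCm, or a self-compiled TF).
    _lamb_so = None

def apply_lamb(var, grad, m, v, **kwargs):
    if _lamb_so is not None:
        return _lamb_so.apply_lamb(var, grad, m, v, **kwargs)  # fused kernel
    # Pure-Python fallback, defined elsewhere (hypothetical helper).
    return _apply_lamb_python(var, grad, m, v, **kwargs)
```

A parity test would then feed identical inputs through both paths and assert the outputs match within tolerance.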

As you can see, using a custom op for something in TensorFlow Addons has a high UX and maintenance cost. We need to make sure the speedup is worth it before accepting to maintain a custom op.

@rajanksin (Author)

@gabrieldemarmiesse Thanks for all the pointers. Will get back with more info/updates.

@rajanksin (Author)

@gabrieldemarmiesse Added some numbers in the description. Thanks!

@gabrieldemarmiesse (Member) commented Mar 22, 2020

@tensorflow/sig-addons-maintainers is the speedup significant enough for merging? Even if we merge, is it worth having a C++ version? Only a minority of people would train a model on CPU and expect maximum speed.

@spidydev, we'll do a vote in private and then tell you if we think the speedup is worth the maintenance cost.

@gabrieldemarmiesse (Member)

@spidydev, in the end we'll keep the proposed CUDA implementation, but we'd prefer to keep the pure Python implementation for CPU. @Squadrick has volunteered to review your pull request.

@Squadrick (Member)

training_op_helpers.* aren't public in TF. I've created a PR to make them public: tensorflow/tensorflow#37873.

This PR is blocked until that is resolved.

@Squadrick added the "blocked (pending something else's completion)" label on Mar 24, 2020
@rajanksin (Author)

@gabrieldemarmiesse @Squadrick Quick question about the comment "we'll prefer to keep the pure python implementation for CPU": can you give some pointers on how to do this? AFAIK, if we do not register a CPU kernel, TF will throw a "No kernel registered" error.

@gabrieldemarmiesse (Member)

@spidydev, in theory we'd like to do that, but I have no idea how to do it in practice. I hope it's possible. We would need it for activation functions too.
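One pattern that would avoid the "No kernel registered" error (a sketch of a common approach, not something this PR implements; `_lamb_so` and `_python_apply_dense` are illustrative names) is to dispatch per variable at the Python level, inside the Keras `OptimizerV2` dense-update hook:

```python
def _resource_apply_dense(self, grad, var, apply_state=None):
    # Route to the fused kernel only when the compiled op loaded and the
    # variable lives on a GPU; otherwise run the pure-Python update, so no
    # CPU kernel registration is ever required.
    if _lamb_so is not None and "GPU" in var.device.upper():
        m = self.get_slot(var, "m")
        v = self.get_slot(var, "v")
        return _lamb_so.apply_lamb(var, grad, m, v)
    # CPU / TPU / ROCm: pure-Python fallback (hypothetical helper).
    return self._python_apply_dense(grad, var, apply_state)
```

Because the decision happens in Python before any op is placed, TF never tries to look up a CPU kernel for the custom op.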

@bhack (Contributor) commented Mar 19, 2021

@spidydev Thanks for your contribution.

@tomerk As this optimizer is used by the TensorFlow Models SIG, if they are not interested in reviewing these changes I will close the PR.
It is one year old and we currently don't have the bandwidth to maintain another custom op.

@bhack (Contributor) commented May 10, 2022

Thank you for the contribution. I suggest moving this PR to https://github.com/keras-team/keras-nlp

@bhack closed this on May 10, 2022