Make both `torch.amp` and `apex.amp` available as backend for mixed precision training #91
Conversation
Users can now choose the backend for mixed precision training by specifying the keyword argument `amp_backend` to `LRFinder`.
…mulation is enabled

Since further advanced tricks for gradient accumulation can be done by overriding `LRFinder._train_batch()`, it seems unnecessary to implement them ourselves. Also, removing it leads to fewer surprises if an overflow occurs while training in lower precision. See also this section in the `apex` documentation: https://nvidia.github.io/apex/advanced.html#gradient-accumulation-across-iterations
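For context, the gradient-accumulation pattern described in that `apex` section looks roughly like the sketch below. This is not code from this repository; `accumulation_steps` and the training objects (`model`, `optimizer`, `criterion`, `train_loader`) are illustrative and assumed to be defined elsewhere.

```python
from apex import amp

# Assumed setup: model, optimizer, criterion and train_loader already exist.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

accumulation_steps = 4  # illustrative value
optimizer.zero_grad()
for i, (inputs, labels) in enumerate(train_loader):
    # Divide the loss so the accumulated gradients match a larger batch size.
    loss = criterion(model(inputs), labels) / accumulation_steps
    # apex scales the loss before backward so gradients stay representable in fp16.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    # Only step and reset gradients every `accumulation_steps` iterations.
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```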
According to actions/setup-python#544, it seems support for Python < 3.6 was dropped after the base image was updated to Ubuntu 22.04.
@NaleRaphael, thanks for the PR, looks good. I've updated the CI workflow, so it should be functional again. You will probably have to rebase this branch.
Sure, give me a moment and I'll update it.
Sync up with master branch for up-to-date CI settings.
updated: fix reference links

By the way, there is one thing I forgot to mention. Since the default value of the new keyword argument […], this might be a breaking change for users. However, the current behavior seems more reasonable to me now, because mixed precision training provided by […].
LG! Thanks @NaleRaphael
Hi @davidtvs, here is a summary of the changes made in this PR for issues #67 and #90.
Required changes to integrate `torch.amp`
- A `torch.cuda.amp.GradScaler` instance has to be passed into `LRFinder`, because it will be used in the following stages: […]
- `torch.amp.autocast()` needs to be called in the forward pass (see the sketch below).
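For reference, here is a minimal sketch of how these two pieces fit into a single training step. It assumes a CUDA device and a plain classification setup (`model`, `optimizer`, `criterion`, `inputs`, `labels` defined elsewhere) and is not the exact code in this PR.

```python
import torch

# Assumed setup: model, optimizer, criterion, inputs and labels already exist.
scaler = torch.cuda.amp.GradScaler()

optimizer.zero_grad()
# The forward pass runs under autocast so eligible ops use lower precision.
with torch.amp.autocast(device_type="cuda"):
    outputs = model(inputs)
    loss = criterion(outputs, labels)
# The scaler scales the loss for backward, then unscales before the optimizer step.
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```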
Proposed changes

3 new keyword arguments are added to `LRFinder.__init__()`:
- `amp_backend`: a string to select the AMP backend
- `amp_config`: a dict to store arguments required by `torch.amp.autocast()`
- `grad_scaler`: a `torch.cuda.amp.GradScaler` instance to be used in `LRFinder._train_batch()`

This should maximize the flexibility for users to control how AMP works with `LRFinder` (a usage sketch follows below). If there is a need to apply advanced tricks with `torch.amp` (e.g., for multiple GPUs/models/losses [1]), it's still achievable by overriding `LRFinder._train_batch()`. So we can focus on the current implementation for gradient accumulation without worrying about other variants.
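As a usage sketch: the keyword names follow the list above, while the exact keys accepted by `amp_config` and the surrounding `range_test()` call are assumptions based on the library's existing API, so they may differ slightly from the final implementation.

```python
import torch
from torch_lr_finder import LRFinder

# Assumed setup: model, optimizer, criterion and train_loader already exist.
grad_scaler = torch.cuda.amp.GradScaler()

lr_finder = LRFinder(
    model,
    optimizer,
    criterion,
    device="cuda",
    amp_backend="torch",                 # or "apex" to use apex.amp
    amp_config={"device_type": "cuda"},  # forwarded to torch.amp.autocast()
    grad_scaler=grad_scaler,             # used inside LRFinder._train_batch()
)
lr_finder.range_test(train_loader, end_lr=10, num_iter=100)
lr_finder.plot()
lr_finder.reset()
```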
Note

The new script `examples/mnist_with_amp.py` can be used to check the results produced by running `LRFinder` with different AMP backends. Here are the results produced on my machine with the command `$ python mnist_with_amp.py --batch_size=32 --tqdm --amp_backend=...`:

[Figures: LR range test results for the `apex.amp` and `torch.amp` backends]

* In these 3 figures, the suggested LRs are all the same: 2.42E-01
Package information:
- built from commit `2386a912164b0c5cfcd8be7a2b890fbac5607c82` (see also this comment)

As always, feel free to let me know if there is anything that can be improved.