[hotfix] ddp + manual_optimisation #4976
Conversation
…ng manual optimization
This reverts commit ccca6b6
…htning/pytorch-lightning into bug/fixfix_ddp_manual
Hello @tchaton! Thanks for updating this PR.
Comment last updated at 2020-12-07 18:24:01 UTC
Codecov Report
@@           Coverage Diff            @@
##           master   #4976    +/-   ##
========================================
  Coverage      93%     93%
========================================
  Files         130     130
  Lines        9527    9547    +20
========================================
+ Hits         8843    8871    +28
+ Misses        684     676     -8
…htning/pytorch-lightning into bug/fixfix_ddp_manual
can we add return types to the newly added methods?
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Thanks for getting this over the line @tchaton
The tests pass but the loss is NaN; please check the gradient flow in the model again.
make_manual_backward(loss_ones_gen, opt_dis)

# this will accumulate gradients for 2 batches and then call opt_gen.step()
opt_gen.step(closure=gen_closure, make_optimizer_step=batch_idx % 2 == 0, optim='sgd')
opt_gen.step(closure=gen_closure, make_optimizer_step=batch_idx % 2 == 0, optim='sgd')
Suggested change:
opt_gen.step(closure=gen_closure, make_optimizer_step=(batch_idx % 2 == 0), optim='sgd')
What is this extra argument optim="sgd"? This doesn't look right.
It is an arbitrary argument, just to make sure it is properly passed through to optimizer.step(*args, **kwargs).
Ok, understood. Could we name it something else, like **extra_kwargs or similar, so it isn't mistaken for a required arg?
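For readers following along, here is a rough, hypothetical sketch of the pattern being reviewed (helper names such as generator_loss are placeholders; it assumes the LightningOptimizer API exercised in this PR, i.e. step(closure=..., make_optimizer_step=...) together with manual_backward(loss, optimizer)):

```python
from pytorch_lightning import LightningModule


class ManualGANSketch(LightningModule):
    """Hypothetical sketch, not the PR's actual test code."""

    @property
    def automatic_optimization(self) -> bool:
        # manual optimization is assumed to be enabled for this module
        # (the exact mechanism varies across Lightning versions)
        return False

    def training_step(self, batch, batch_idx):
        opt_gen, opt_dis = self.optimizers()  # discriminator step omitted in this sketch

        def gen_closure():
            loss_gen = self.generator_loss(batch)  # hypothetical loss helper
            self.manual_backward(loss_gen, opt_gen)

        # accumulate gradients for 2 batches, then actually call the underlying step();
        # optim='sgd' is an arbitrary extra kwarg forwarded to optimizer.step(*args, **kwargs),
        # so the wrapped optimizer's step() must accept it (e.g. a custom optimizer in the tests)
        opt_gen.step(closure=gen_closure, make_optimizer_step=(batch_idx % 2 == 0), optim='sgd')
```

As I read the snippet, when make_optimizer_step is False the closure still runs (forward + manual_backward, accumulating gradients) and the real optimizer.step() is skipped, which is what makes the two-batch accumulation work.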
Thanks for the review @awaelchli, I think we can reduce the priority and focus on getting this right! Seems like we need to test a lot more.
Besides the already mentioned issues, I really like it. So no other requests from my side.
Just to put my thoughts for discussion:
We may not even need to worry about modifying the code much beyond defining the additional reduce hook function; for a first pass it would be good to check if calling …
In manual optimization we don't return the loss, so we get loss=nan in the progress bar. Need to take care of this. When calling self.log("train_loss", loss) it is fine :)
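A minimal sketch of that workaround, assuming the self.log API with prog_bar=True and the manual-optimization pattern of this era (compute_loss is a placeholder):

```python
from pytorch_lightning import LightningModule


class ManualLoggingSketch(LightningModule):
    """Hypothetical sketch: log the loss explicitly so the progress bar
    shows a value instead of nan when using manual optimization."""

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        loss = self.compute_loss(batch)  # hypothetical loss helper
        self.manual_backward(loss, opt)
        opt.step()
        opt.zero_grad()
        # the return value is not used for the progress bar in manual optimization,
        # so surface the loss explicitly
        self.log("train_loss", loss, prog_bar=True)
```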
…htning/pytorch-lightning into bug/fixfix_ddp_manual
@awaelchli After offline discussion I think we should merge this PR as it stands and create a follow-up issue to refactor the fix to be less intrusive whilst trying to sync upstream with the native DDP implementation. I think that is better in the interim, since DDP is currently broken with manual optim. Thoughts?
ok, I will remove the request for changes
Hi, when can we set find_unused_parameters to False for DDP when using manual optimization?
If your model/optimization needs it in PyTorch, it will need it in Lightning, and vice versa.
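For reference, a sketch of how find_unused_parameters is usually forwarded to DDP in Lightning around this release (the DDPPlugin import path and kwarg pass-through are assumptions and may differ between versions):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DDPPlugin  # import path may vary by version

# Forward find_unused_parameters=False straight to torch's DistributedDataParallel.
# Only safe if every parameter receives a gradient on every training step.
trainer = Trainer(
    gpus=2,
    accelerator="ddp",
    plugins=[DDPPlugin(find_unused_parameters=False)],
)
# trainer.fit(model)  # `model` is your LightningModule
```

If some parameters genuinely receive no gradient on some steps (common with manual optimization of GANs), DDP will error out with this setting, exactly as it would in plain PyTorch.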
What does this PR do?
Fixes #4953
SeanNaren EDIT:
There are two parts to this fix:
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃