Support different AMP & buffer configurations in one experiment, fix minor bugs #389
Conversation
@torch.no_grad()
def apply_accumulated_grads_(self, scale_by: Optional[float] = None):
    if self.reuse_grad_buffers:
This previously caused a bug where reuse=True peers would not be scaled by scale_by. As a result, if there was a mix of reuse=True and reuse=False peers, the reuse=True peers would have larger gradients and dominate the reuse=False peers.
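A minimal sketch of the corrected behavior, using a simplified stand-in class (the names _grad_buffers, _grads, and reuse_grad_buffers mirror the diff; everything else here is illustrative, not hivemind's actual implementation):

from typing import Optional
import torch

class GradAccumulatorSketch:
    """Illustrative stand-in for the relevant part of CollaborativeOptimizer."""

    def __init__(self, params, reuse_grad_buffers: bool):
        self.params = list(params)
        self.reuse_grad_buffers = reuse_grad_buffers
        # separate accumulators are only needed when reuse_grad_buffers=False
        self._grads = None if reuse_grad_buffers else [torch.zeros_like(p) for p in self.params]

    def _grad_buffers(self):
        yield from (p.grad for p in self.params)

    @torch.no_grad()
    def apply_accumulated_grads_(self, scale_by: Optional[float] = None):
        if self.reuse_grad_buffers:
            # the fix: reuse=True peers must also apply scale_by, otherwise their
            # gradients end up larger than those of reuse=False peers
            if scale_by is not None:
                for grad in self._grad_buffers():
                    grad.mul_(scale_by)
            return
        # reuse=False: copy the separately accumulated grads into param.grad, then scale
        for grad_buf, grad_acc in zip(self._grad_buffers(), self._grads):
            grad_buf.copy_(grad_acc)
            if scale_by is not None:
                grad_buf.mul_(scale_by)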
self._grads = [
    torch.zeros_like(grad, device=self.accumulate_grads_on) for grad in self._grad_buffers()
]
yield from self._grads
This would actually be an error with reuse_grad_buffers=True, but it worked because no one asked for more than len(grad_buffers) elements.
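For context, a sketch of what the fixed generator could look like (shown as a bare method for brevity; the method name and surrounding attributes are assumptions based on the diff above, not the exact hivemind code). With reuse_grad_buffers=True it yields the live grad buffers instead of allocating fresh zero tensors:

from typing import Iterator
import torch

def accumulated_grads(self) -> Iterator[torch.Tensor]:
    # reuse=True: gradients accumulate directly in param.grad, so yield those buffers
    if self.reuse_grad_buffers:
        yield from self._grad_buffers()
        return
    # reuse=False: lazily allocate dedicated accumulators on the configured device
    if self._grads is None:
        self._grads = [
            torch.zeros_like(grad, device=self.accumulate_grads_on)
            for grad in self._grad_buffers()
        ]
    yield from self._grads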
Codecov Report
@@            Coverage Diff             @@
##           master     #389      +/-   ##
==========================================
- Coverage   84.11%   83.37%   -0.75%
==========================================
  Files          70       71       +1
  Lines        6423     6497      +74
==========================================
+ Hits         5403     5417      +14
- Misses       1020     1080      +60
def _unscale_grads_(
    self, optimizer: Optimizer, inv_scale: torch.Tensor, found_inf: torch.Tensor, allow_fp16: bool
) -> Dict[torch.device, torch.Tensor]:
    return super()._unscale_grads_(optimizer, inv_scale, found_inf, allow_fp16=True)
Why does it ignore the `allow_fp16` value, always setting it to True?
That's the same trick that fairscale uses to allow training without master fp32 weights:
https://github.com/facebookresearch/fairscale/blob/main/fairscale/optim/grad_scaler.py
(I was referred there by @TimDettmers.)
Added a quick comment explaining that.
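Roughly, the trick looks like this in isolation (the subclass name and training loop below are illustrative, not hivemind's actual API): forcing allow_fp16=True lets the scaler unscale gradients of fp16 parameters in place, so a model can be trained without keeping master fp32 weights.

from typing import Dict
import torch
from torch.cuda.amp import GradScaler
from torch.optim import Optimizer

class FP16FriendlyGradScaler(GradScaler):
    """Illustrative subclass: ignore allow_fp16 and always pass True,
    mirroring the fairscale grad_scaler trick referenced above."""

    def _unscale_grads_(
        self, optimizer: Optimizer, inv_scale: torch.Tensor, found_inf: torch.Tensor, allow_fp16: bool
    ) -> Dict[torch.device, torch.Tensor]:
        return super()._unscale_grads_(optimizer, inv_scale, found_inf, allow_fp16=True)

# sketch of a pure-fp16 training step (requires a CUDA device)
model = torch.nn.Linear(16, 16).half().cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = FP16FriendlyGradScaler()

for _ in range(3):
    opt.zero_grad()
    out = model(torch.randn(4, 16, device="cuda", dtype=torch.float16))
    loss = out.float().pow(2).mean()
    scaler.scale(loss).backward()
    scaler.unscale_(opt)  # the stock scaler would raise "Attempting to unscale FP16 gradients."
    scaler.step(opt)
    scaler.update()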
hivemind/optim/collaborative.py
Outdated
elif self._grads is None:
    with torch.no_grad():
        return
else:
nit (suggestion): Since the `if` branch ends with `return`, we can remove `else:` and one indent level for the remaining code. This may make it look simpler.
I respectfully disagree here: it does reduce indentation, but:
- the code in the else clause is quite simple as it is
- crucially, last time the code was wrong because I accidentally removed the return and didn't think about the ramifications

If you do insist, please state that explicitly and I'll remove the else clause anyway.
I would say that `else:` after `return` is obviously redundant and seems like a bug, so I insist on choosing one of the two options:
- `return` without `else:` (preferred since it implies less indentation, more clarity)
- `else:` without `return`
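For illustration only, the preferred option would reshape that branch roughly like this (simplified; the method name is hypothetical and the attribute names are taken from the diff above):

def apply_if_accumulated(self):
    # preferred: early return, no `else:` and no extra indentation
    if self._grads is None:
        return
    for grad_buf, grad_acc in zip(self._grad_buffers(), self._grads):
        grad_buf.copy_(grad_acc)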
Co-authored-by: Alexander Borzunov <[email protected]>
New features:
The new behavior of CollaborativeOptimizer with fp16 is: