
Support different AMP & buffer configurations in one experiment, fix minor bugs #389

Merged: 20 commits merged from the fp16 branch into master on Sep 28, 2021

Conversation

justheuristic (Member):
New features:

  • CollaborativeOptimizer can now combine fp16=True and reuse_grad_buffers=True with a special scaler
  • CollaborativeOptimizer peers with reuse_grad_buffers=True and reuse_grad_buffers=False can now co-exist
  • CollaborativeOptimizer peers with and without AMP can now co-exist

The new behavior of CollaborativeOptimizer with fp16 is as follows (a usage sketch follows this list):

  • grad_scaler=None: regular fp32 behavior
  • reuse_grad_buffers=False with GradScaler: works as usual; un-scales each tensor independently before accumulation and does not affect the internal optimizer
  • reuse_grad_buffers=True with GradScaler: calling scaler.step(opt) raises an error explaining that this mode requires HivemindGradScaler
  • reuse_grad_buffers=False with HivemindGradScaler: applies unscale/update only around the global optimizer step

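A hedged usage sketch of the fp16 + reuse_grad_buffers combination. model, base_opt, dht and dataloader are placeholders, and the CollaborativeOptimizer arguments shown are illustrative rather than the complete or exact signature:

import torch
from hivemind import CollaborativeOptimizer
from hivemind.optim.grad_scaler import HivemindGradScaler

# model, base_opt (a torch.optim.Optimizer), dht and dataloader are assumed to exist
opt = CollaborativeOptimizer(
    opt=base_opt,
    dht=dht,
    prefix="my_experiment",
    target_batch_size=4096,
    reuse_grad_buffers=True,  # accumulate directly into param.grad
)
scaler = HivemindGradScaler()

for batch in dataloader:
    with torch.cuda.amp.autocast():
        loss = model(batch)
    scaler.scale(loss).backward()
    scaler.step(opt)   # with reuse_grad_buffers=True, a plain torch.cuda.amp.GradScaler would raise here
    scaler.update()

The loop mirrors the standard torch.cuda.amp pattern; the only change is swapping the scaler class.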

@torch.no_grad()
def apply_accumulated_grads_(self, scale_by: Optional[float] = None):
    if self.reuse_grad_buffers:
justheuristic (Member Author) commented on Sep 25, 2021:
This previously caused a bug where reuse=True peers would not be scaled by scale_by. As a result, if there was a mix of reuse=True and reuse=False peers, reuse=True would have larger gradients and dominate the reuse=False peers.
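For illustration, a minimal sketch of the fixed logic, assuming the helper names from the snippets in this thread (_grad_buffers, accumulated_grads); this is a paraphrase, not the exact diff:

@torch.no_grad()
def apply_accumulated_grads_(self, scale_by: Optional[float] = None):
    if not self.reuse_grad_buffers:
        # copy separately accumulated grads into the actual gradient buffers
        for grad_buf, grad_acc in zip(self._grad_buffers(), self.accumulated_grads()):
            grad_buf.copy_(grad_acc.to(grad_buf.device), non_blocking=True)
    if scale_by is not None:
        # scale reused and non-reused buffers alike so all peers agree on gradient magnitude
        for grad_buf in self._grad_buffers():
            grad_buf.mul_(scale_by)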

self._grads = [
    torch.zeros_like(grad, device=self.accumulate_grads_on) for grad in self._grad_buffers()
]
yield from self._grads
justheuristic (Member Author) commented:

This would actually be an error with reuse_grad_buffers=True, but it went unnoticed because no one ever asked for more than len(grad_buffers) elements.
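A hedged sketch of what the surrounding generator presumably looks like after the fix (the method name and structure are inferred from the snippet above, not quoted from the diff):

def accumulated_grads(self) -> Iterator[torch.Tensor]:
    """Yield gradient accumulators: the reused buffers themselves, or dedicated accumulator tensors."""
    if self.reuse_grad_buffers:
        # with reused buffers, the gradient buffers double as accumulators
        yield from self._grad_buffers()
        return
    if self._grads is None:
        with torch.no_grad():
            self._grads = [
                torch.zeros_like(grad, device=self.accumulate_grads_on) for grad in self._grad_buffers()
            ]
    yield from self._grads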

@justheuristic justheuristic changed the title FP16 support, a few patches from sahajbert Extended AMP support, a few patches from sahajbert Sep 25, 2021
codecov bot commented on Sep 25, 2021:

Codecov Report

Merging #389 (773a3bb) into master (4a9bc92) will decrease coverage by 0.74%.
The diff coverage is 23.65%.

@@            Coverage Diff             @@
##           master     #389      +/-   ##
==========================================
- Coverage   84.11%   83.37%   -0.75%     
==========================================
  Files          70       71       +1     
  Lines        6423     6497      +74     
==========================================
+ Hits         5403     5417      +14     
- Misses       1020     1080      +60     
Impacted Files                        Coverage Δ
hivemind/optim/collaborative.py       23.77% <7.89%> (-1.32%) ⬇️
hivemind/optim/grad_scaler.py         33.33% <33.33%> (ø)
hivemind/optim/__init__.py            100.00% <100.00%> (ø)
hivemind/dht/node.py                  91.44% <0.00%> (-1.19%) ⬇️
hivemind/averaging/matchmaking.py     83.75% <0.00%> (-0.32%) ⬇️

@justheuristic justheuristic changed the title Extended AMP support, a few patches from sahajbert Extended AMP support with reuse_grad_buffers, a few patches from sahajbert Sep 25, 2021
@mryab mryab changed the title Extended AMP support with reuse_grad_buffers, a few patches from sahajbert Extend AMP support with reuse_grad_buffers, improve cross-device averaging Sep 26, 2021
@mryab mryab changed the title Extend AMP support with reuse_grad_buffers, improve cross-device averaging Improve AMP support with reuse_grad_buffers and cross-device averaging Sep 26, 2021
@mryab mryab changed the title Improve AMP support with reuse_grad_buffers and cross-device averaging Support different AMP configurations in one experiment Sep 26, 2021
@justheuristic justheuristic changed the title Support different AMP configurations in one experiment Support different AMP & reuse configurations in one experiment, fix minor bugs Sep 27, 2021
@justheuristic justheuristic changed the title Support different AMP & reuse configurations in one experiment, fix minor bugs Support different AMP & buffer configurations in one experiment, fix minor bugs Sep 27, 2021
def _unscale_grads_(
    self, optimizer: Optimizer, inv_scale: torch.Tensor, found_inf: torch.Tensor, allow_fp16: bool
) -> Dict[torch.device, torch.Tensor]:
    return super()._unscale_grads_(optimizer, inv_scale, found_inf, allow_fp16=True)
A project member commented:
Why does it ignore the allow_fp16 value, always setting it to True?

justheuristic (Member Author) replied:

That's the same trick that fairscale uses to allow training without master fp32 weights:
https://github.com/facebookresearch/fairscale/blob/main/fairscale/optim/grad_scaler.py
(I was referred there by @TimDettmers.)

Added a quick comment explaining that.
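For context, a sketch of the pattern being discussed, with the explanatory comment written out (a simplified stand-in with a hypothetical class name, not the exact class from this PR):

from typing import Dict

import torch
from torch.cuda.amp import GradScaler
from torch.optim import Optimizer


class FP16FriendlyGradScaler(GradScaler):  # hypothetical name; the PR's class is HivemindGradScaler
    def _unscale_grads_(
        self, optimizer: Optimizer, inv_scale: torch.Tensor, found_inf: torch.Tensor, allow_fp16: bool
    ) -> Dict[torch.device, torch.Tensor]:
        # same trick as fairscale's grad_scaler.py (linked above): force allow_fp16=True so that
        # gradients of fp16 parameters (training without master fp32 weights) can still be unscaled
        return super()._unscale_grads_(optimizer, inv_scale, found_inf, allow_fp16=True)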

elif self._grads is None:
    with torch.no_grad():
        return
else:
A project member commented:
nit (suggestion): Since the if branch ends with return, we can remove else: and one indent level for the remaining code. This may make it look simpler.

justheuristic (Member Author) replied:

I respectfully disagree here: it does reduce indentation, but:

  • the code in the else clause is quite simple as it is
  • crucially, the last time around the code was wrong precisely because I had accidentally removed a return and didn't think about the ramifications

If you do insist, please say so explicitly and I'll remove the else clause anyway.

The reviewer replied:

I would say that else: after return is obviously redundant and looks like a bug, so I insist on choosing one of the two options:

  1. return without else: (preferred, since it means less indentation and more clarity)
  2. else: without return
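For clarity, the two shapes under discussion (placeholder bodies, not code from the diff):

# Option 1: return without else
if self._grads is None:
    ...  # handle this case
    return
...  # remaining code, one indentation level shallower

# Option 2: else without return
if self._grads is None:
    ...  # handle this case
else:
    ...  # remaining code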

@justheuristic justheuristic merged commit 1d862c9 into master Sep 28, 2021
@justheuristic justheuristic deleted the fp16 branch September 28, 2021 02:24