Implement weight as part of the allreduce protocol, not matchmaking #384

justheuristic · 2021-09-18T22:07:03Z

This PR allows specifying allreduce weights in AllReduceRunner, instead of gathering them during matchmaking.

This lets peers use their actual batch size as the averaging weight in both DPU and advance matchmaking (a.k.a. @yhn112-style matchmaking).
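To illustrate what a per-peer weight means here, below is a minimal sketch of weighted averaging in plain Python. It is not the code from this PR: `weighted_average` and its arguments are hypothetical names, and each weight is assumed to be the sender's local batch size.

```python
from typing import List, Sequence


def weighted_average(parts: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Average same-length tensor parts, scaling each part by its sender's weight.

    Hypothetical helper for illustration only; not part of hivemind.
    """
    if not parts or len(parts) != len(weights):
        raise ValueError("expected one weight per tensor part")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")

    accumulator = [0.0] * len(parts[0])
    for part, weight in zip(parts, weights):
        if len(part) != len(accumulator):
            raise ValueError("all tensor parts must have the same length")
        for index, value in enumerate(part):
            accumulator[index] += weight * value
    return [value / total_weight for value in accumulator]


# Two peers with batch sizes 32 and 96: the larger batch dominates the result.
averaged = weighted_average([[1.0, 2.0], [3.0, 4.0]], weights=[32, 96])
print(averaged)  # [2.5, 3.5]
```

Because each weight travels with the tensor part itself, a peer can decide on it as late as the moment it starts sending data, rather than committing to it during matchmaking.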

[WIP] implement advance matchmaking as a working example

codecov · 2021-09-18T22:10:36Z

Codecov Report

Merging #384 (2c28b82) into master (d809e30) will decrease coverage by 0.02%.
The diff coverage is 94.11%.

@@            Coverage Diff             @@
##           master     #384      +/-   ##
==========================================
- Coverage   84.04%   84.02%   -0.03%     
==========================================
  Files          70       70              
  Lines        6426     6423       -3     
==========================================
- Hits         5401     5397       -4     
- Misses       1025     1026       +1

| Impacted Files | Coverage Δ | |
|---|---|---|
| hivemind/averaging/allreduce.py | `76.77% <90.90%> (-0.65%)` | ⬇️ |
| hivemind/averaging/averager.py | `86.23% <100.00%> (ø)` | |
| hivemind/averaging/partition.py | `98.01% <100.00%> (-0.04%)` | ⬇️ |

justheuristic · 2021-09-18T22:13:15Z

hivemind/averaging/allreduce.py

@@ -37,13 +38,12 @@ class AllReduceRunner(ServicerBase):
    :param prefix: namespace for servicer's RPCs (typically, equal to prefix for group keys)
    :param group_id: unique identifier of this specific all-reduce run
    :param tensors: local tensors that should be averaged with groupmates
-    :param tensors: local tensors that should be averaged with groupmates


yeah, that was in the master code :)

borzunov

I've left one comment, everything else is good :)

borzunov · 2021-09-20T15:49:47Z

hivemind/averaging/allreduce.py

@@ -180,9 +182,10 @@ async def _generate_input_for_peer(self, peer_index: int) -> AsyncIterator[avera
            code=averaging_pb2.PART_FOR_AVERAGING,
            group_id=self.group_id,
            tensor_part=first_part,
+            metadata=self._weight_binary,


Can we create a new `double weight;` field instead, and avoid manually encoding the weight in a binary format?
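For context, here is a rough sketch of the trade-off behind this suggestion. The diff does not show how `_weight_binary` is produced, so the little-endian double packing below is an assumption, and `encode_weight`, `decode_weight`, and `PartMessage` are hypothetical stand-ins (the dataclass mimics a generated protobuf message with a typed `weight` field).

```python
import struct
from dataclasses import dataclass


def encode_weight(weight: float) -> bytes:
    """Manually pack a weight as an 8-byte little-endian double (assumed format)."""
    return struct.pack("<d", weight)


def decode_weight(blob: bytes) -> float:
    """Unpack a weight; sender and receiver must agree on the byte layout."""
    (weight,) = struct.unpack("<d", blob)
    return weight


@dataclass
class PartMessage:
    """Hypothetical stand-in for a message that carries a typed weight field."""

    tensor_part: bytes
    weight: float = 0.0


# Manual route: the weight rides inside an opaque bytes field.
blob = encode_weight(32.0)
assert decode_weight(blob) == 32.0

# Typed route: the weight is a named field, with no hand-written (de)serialization.
message = PartMessage(tensor_part=b"...", weight=32.0)
assert message.weight == 32.0
```

With a typed field, the serialization layer owns the wire format, so the byte order and width no longer have to be kept in sync by hand on both ends.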

hivemind/proto/averaging.proto

Co-authored-by: Alexander Borzunov <[email protected]>

implement parts as part of the allreduce protocol, not matchmaking

d05ae40

justheuristic requested a review from borzunov September 18, 2021 22:07

as long as its black

a7492ff

justheuristic changed the title ~~[DO NOT MERGE YET] implement parts as part of the allreduce protocol, not matchmaking~~ Implement weight as part of the allreduce protocol, not matchmaking Sep 18, 2021

as long as its black

effa9be

justheuristic commented Sep 18, 2021

Merge branch 'master' into allreduce_weights

3a3da4a

borzunov requested changes Sep 20, 2021

borzunov and others added 3 commits September 24, 2021 16:57

review

e9a5c9a

double trouble

c6fa324

Merge branch 'master' into allreduce_weights

12e790c

justheuristic requested a review from borzunov September 24, 2021 14:31

borzunov approved these changes Sep 24, 2021

Update hivemind/proto/averaging.proto

2c28b82

Co-authored-by: Alexander Borzunov <[email protected]>

justheuristic merged commit 4a9bc92 into master Sep 24, 2021

justheuristic deleted the allreduce_weights branch September 24, 2021 23:43
