[this PR should be merged into libp2p branch]

Right now, the single heaviest communication pattern in hivemind is averaging model parameters with peers. This is currently done via AllreduceRunner in this RPC:
https://github.com/learning-at-home/hivemind/blob/master/hivemind/client/averaging/allreduce.py#L204

... called via this request method:
https://github.com/learning-at-home/hivemind/blob/master/hivemind/client/averaging/allreduce.py#L137

The main quest is to overhaul this method to work via P2P:

- create a test where two AllreduceRunners are manually created with two connected hivemind.P2P instances
- call rpc_aggregate_part in an allreduce pattern, similarly to this test
- test that it still works with large tensors (>10k values)

Since gRPC is not optimized for large messages, we slice these tensors into parts using split_for_streaming/combine_from_streaming. That said, P2P transport may not need this partitioning.
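For reference, the aggregation pattern that rpc_aggregate_part implements over the network can be sketched without any networking at all, using in-memory "peers". This is only an illustration of the butterfly allreduce idea (each peer averages one partition and shares it back), not hivemind's actual API; all names and shapes below are made up for the example.

```python
# Sketch of the butterfly all-reduce pattern, with in-memory "peers"
# standing in for networked AllreduceRunner instances.
import numpy as np

def butterfly_allreduce(local_tensors):
    """Average a list of equal-shape 1-d arrays, one per peer.

    Step 1: peer i gathers everyone's i-th partition and averages it.
    Step 2: peer i shares the averaged partition back to all peers.
    """
    n_peers = len(local_tensors)
    # each peer splits its tensor into n_peers contiguous partitions
    parts = [np.array_split(t, n_peers) for t in local_tensors]
    # step 1: peer i aggregates partition i from every peer
    averaged = [np.mean([parts[j][i] for j in range(n_peers)], axis=0)
                for i in range(n_peers)]
    # step 2: every peer reassembles the full averaged tensor
    return [np.concatenate(averaged) for _ in range(n_peers)]

# four peers holding constant tensors 0, 1, 2, 3 -> everyone ends with 1.5
peers = [np.full(10_000, fill_value=float(rank)) for rank in range(4)]
results = butterfly_allreduce(peers)
assert all(np.allclose(r, 1.5) for r in results)
```

A P2P version of the test would replace the in-memory gather/share steps with rpc_aggregate_part calls between the two connected hivemind.P2P instances, but the partition-average-reassemble structure stays the same.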
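The partitioning step can likewise be sketched generically: slice a large flat buffer into bounded chunks for streaming and reassemble it on the other side. The functions below are illustrative stand-ins for split_for_streaming/combine_from_streaming (hivemind's real helpers operate on serialized tensor messages, and the chunk size here is an arbitrary choice for the example).

```python
# Stand-ins for split_for_streaming / combine_from_streaming:
# slice a large tensor into bounded chunks and reassemble it.
import numpy as np

CHUNK_SIZE = 2 ** 16  # elements per chunk; illustrative only

def split_for_streaming_sketch(tensor, chunk_size=CHUNK_SIZE):
    """Yield contiguous chunks of the flattened tensor."""
    flat = tensor.ravel()
    for start in range(0, flat.size, chunk_size):
        yield flat[start:start + chunk_size]

def combine_from_streaming_sketch(chunks, shape):
    """Reassemble chunks into a tensor of the original shape."""
    return np.concatenate(list(chunks)).reshape(shape)

big = np.random.rand(300, 1000)  # 300k values, well over 10k
chunks = list(split_for_streaming_sketch(big))
restored = combine_from_streaming_sketch(chunks, big.shape)
assert restored.shape == big.shape and np.array_equal(restored, big)
```

If the P2P transport handles large messages natively, the split/combine round-trip above may simply become unnecessary, which is exactly what the large-tensor test should reveal.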