
Add CollaborativeOptimizer #215

Merged: 37 commits merged into master from collaborative_averager on Apr 10, 2021
Conversation

leshanbog (Collaborator)

This PR introduces the objects necessary for collaborative training, so that client code for participation stays clean and concise (close to a default training loop).

  • WeightedAverager - a DecentralizedAverager that averages trainable parameters or gradients with peer-wise weights (see the sketch after this list)
  • CollaborativeOptimizer - performs a model update after collaboratively accumulating a target (large) batch size across peers
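
For intuition, here is a minimal sketch of the peer-wise weighting that WeightedAverager performs (illustrative only; the weighted_average helper is not the actual hivemind API, and using local batch sizes as weights is an assumption):

import torch

def weighted_average(tensors_per_peer, weights):
    # sum(weight_i * tensor_i) / sum(weight_i), applied per-tensor across peers
    total_weight = sum(weights)
    return [
        sum(w * t for w, t in zip(weights, group)) / total_weight
        for group in zip(*tensors_per_peer)
    ]

# two peers, one parameter tensor each; weights proportional to their local batch sizes
peer_a, peer_b = [torch.full((2,), 1.0)], [torch.full((2,), 3.0)]
print(weighted_average([peer_a, peer_b], weights=[1.0, 3.0]))  # [tensor([2.5000, 2.5000])]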

@justheuristic justheuristic requested review from nevec and foksly April 8, 2021 09:45
@mryab mryab self-requested a review April 8, 2021 11:56
@justheuristic (Member) commented Apr 8, 2021

Example usage:

import time, socket
import torch, torch.nn as nn
import hivemind

# start a coordinator DHT node on port 1337 unless one is already running on this machine
with socket.socket() as sock:
    coordinator_exists = sock.connect_ex(("127.0.0.1", 1337)) == 0
if not coordinator_exists:
    dht = hivemind.DHT(listen_on='127.0.0.1:1337', start=True)

model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 32))

# each peer connects to the coordinator's DHT and joins the collaboration under the same prefix
opt = hivemind.CollaborativeOptimizer(
    opt=torch.optim.Adam(model.parameters()),
    dht=hivemind.DHT(initial_peers=['127.0.0.1:1337'], start=True),
    prefix='test_exp', target_group_size=2,
    target_batch_size=32, batch_size_per_step=1, verbose=True,
    start=True
)


# toy training loop: CollaborativeOptimizer counts the samples contributed at each step and
# performs an actual model update once the target batch size is accumulated across peers
while True:
    x = torch.randn(10, 32)
    time.sleep(1)
    loss = torch.mean((x - model(x)) ** 2)
    loss.backward()
    opt.step()

@mryab changed the title from "Collaborative averager" to "Add CollaborativeOptimizer" on Apr 9, 2021
@mryab (Member) left a comment:

The rationale for all the logger.debug replacements is that we don't want to pollute the output of training scripts unless the user asks for it. An alternative solution is to control the verbosity via an argument.
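
For illustration, the verbosity-argument alternative could look roughly like this (a hypothetical helper, not code from this PR):

import logging

logger = logging.getLogger(__name__)

def log_progress(message: str, verbose: bool):
    # with verbose=True the message is emitted at INFO level and reaches the user;
    # otherwise it stays at DEBUG and does not pollute the training script's output
    logger.log(logging.INFO if verbose else logging.DEBUG, message)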

min_refresh_period, max_refresh_period, default_refresh_period
self.expected_drift_peers, self.expected_drift_rate = expected_drift_peers, expected_drift_rate
self.averaging_timeout, self.metadata_expiration = averaging_timeout, metadata_expiration
self.averager = TrainingAverager(
Collaborator:

Maybe we should accept an averager class in the constructor, or provide a way to override the averager in a subclass?
I don't see an easy way to change the averaging logic (e.g. to also average optimizer statistics) without copy-pasting the whole class. Correct me if I'm wrong.

Member:

Good point, that might prove useful in subsequent research on secure/robust averagers.
Added an averager_cls parameter that allows the user to override the averager class.

Member:

After a conversation with @nevec, we have instead opted for a _make_averager method for more versatility.
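
A minimal sketch of how such a _make_averager hook could be used in a subclass (the signature, attribute names, and CustomTrainingAverager below are illustrative assumptions, not the actual API):

import hivemind

class MyCollaborativeOptimizer(hivemind.CollaborativeOptimizer):
    def _make_averager(self, **kwargs):
        # override the factory method to plug in a custom averager
        # (e.g. one that also averages optimizer statistics) without
        # copy-pasting the rest of CollaborativeOptimizer
        return CustomTrainingAverager(self.opt, dht=self.dht, **kwargs)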

@justheuristic merged commit 7bb6565 into master on Apr 10, 2021
@justheuristic deleted the collaborative_averager branch on April 10, 2021 19:30