Statistics averaging #229

nevec · 2021-04-16T21:51:06Z

This PR extends TrainingAverager so that it can also average the optimizer's statistics. It should serve as a base for subsequent implementations of decentralized adaptive optimizers.

implement averaging feature
add basic test
move hivemind.client.optim into separate module
fix a typo that led to infinite recursion
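As a rough illustration of what averaging optimizer statistics means, here is a minimal pure-Python sketch. It assumes each peer's optimizer state is a dict mapping parameter names to per-parameter state dicts (mirroring the layout of `torch.optim.Optimizer.state`, where Adam keeps keys such as `exp_avg_sq`). The function name `average_optimizer_statistics` and the use of float lists in place of tensors are hypothetical simplifications, not hivemind's implementation, which averages with remote peers over the network.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: parameter name -> statistic name -> values.
# Plain float lists stand in for torch tensors in this sketch.
OptimizerState = Dict[str, Dict[str, List[float]]]


def average_optimizer_statistics(
    states: Sequence[OptimizerState], statistic_names: Sequence[str]
) -> OptimizerState:
    """Return a new state where each listed statistic is the elementwise mean across peers."""
    num_peers = len(states)
    averaged: OptimizerState = {}
    for param_name in states[0]:
        averaged[param_name] = {}
        for stat in statistic_names:
            # Collect this statistic from every peer, then average position by position.
            per_peer = [state[param_name][stat] for state in states]
            averaged[param_name][stat] = [sum(values) / num_peers for values in zip(*per_peer)]
    return averaged


peer1 = {"x": {"exp_avg_sq": [1.0, 4.0]}}
peer2 = {"x": {"exp_avg_sq": [3.0, 0.0]}}
print(average_optimizer_statistics([peer1, peer2], ["exp_avg_sq"]))
# {'x': {'exp_avg_sq': [2.0, 2.0]}}
```

For two peers this is the same quantity that the test computes as `stats_avg = 0.5 * (...)` for `exp_avg_sq`.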

justheuristic · 2021-04-17T12:16:03Z

hivemind/client/averaging/training.py

@@ -46,7 +49,7 @@ def __init__(self, opt: torch.optim.Optimizer, *, average_parameters: bool, aver
    def step(self, wait: bool = True, **kwargs):
        """ Average optimizer weights and gradients with peers. """
        if not wait:
-            return run_in_background(self.step, wait=False, **kwargs)
+            return run_in_background(self.step, wait=True, **kwargs)


good catch!
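The bug is easy to reproduce in isolation: if the non-blocking branch schedules `self.step(wait=False)`, the background call takes the same branch again and schedules yet another call, so the averaging work is never actually performed. Below is a minimal sketch of the corrected pattern. The `run_in_background` here is a hypothetical `ThreadPoolExecutor`-based stand-in for `hivemind.utils.run_in_background` (assumed only to run the callable in another thread and return a future), and `ToyAverager` is a toy class, not `TrainingAverager`.

```python
from concurrent.futures import Future, ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)


def run_in_background(func, *args, **kwargs) -> Future:
    """Hypothetical stand-in: run func in a worker thread and return a Future."""
    return _executor.submit(func, *args, **kwargs)


class ToyAverager:
    def __init__(self):
        self.num_steps = 0

    def step(self, wait: bool = True, **kwargs):
        if not wait:
            # Re-enter with wait=True so the background call does the work itself.
            # Passing wait=False here would only schedule another background call, forever.
            return run_in_background(self.step, wait=True, **kwargs)
        self.num_steps += 1  # placeholder for one actual averaging round
        return self.num_steps


averager = ToyAverager()
future = averager.step(wait=False)
print(future.result())  # 1
```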

borzunov · 2021-04-18T03:04:15Z

tests/test_averaging.py

+            grad_avg = 0.5 * (x1.grad + x2.grad)
+            stats_avg = 0.5 * (opt1.state[x1]["exp_avg_sq"] + opt2.state[x2]["exp_avg_sq"])
+
+        f1 = averager1.step(wait=False)


Suggested change

f1 = averager1.step(wait=False)

# We set wait=False to test hivemind.utils.run_in_background() usage

f1 = averager1.step(wait=False)

nit: Using wait=False and then waiting for the result looked surprising to me. I'd suggest clarifying this with a comment.

Actually, the main purpose of wait=False is to prevent a deadlock: with wait=True, averager1 would block while waiting for averager2 to join, and averager2.step() would never be called.
Fix: state this explicitly in the comment
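To see why the first call must not block, here is a minimal sketch, assuming an averaging round only completes once both peers have joined. A `threading.Barrier` plays the role of forming the averaging group, and `averaging_round` is a hypothetical helper, not hivemind code. If the first peer ran in the foreground, the main thread would sit inside the barrier and never reach the second peer; running it in the background (as `step(wait=False)` does) lets both peers arrive.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Both peers must arrive before the round can proceed (stand-in for group formation).
group = threading.Barrier(parties=2, timeout=5.0)
executor = ThreadPoolExecutor(max_workers=1)


def averaging_round(local_value: float, shared: dict, peer_id: int) -> float:
    """Hypothetical round: publish a value, wait for the other peer, return the mean."""
    shared[peer_id] = local_value
    group.wait()  # blocks until the other peer has also joined
    return sum(shared.values()) / len(shared)


shared_values: dict = {}
# Peer 1 runs in the background, so the main thread stays free...
f1 = executor.submit(averaging_round, 1.0, shared_values, 1)
# ...to run peer 2 in the foreground; both reach the barrier and the round completes.
result2 = averaging_round(3.0, shared_values, 2)
result1 = f1.result()
print(result1, result2)  # 2.0 2.0
```

Calling `averaging_round` for peer 1 directly in the main thread instead would block until the barrier's timeout expires and then raise `threading.BrokenBarrierError`.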

Oh, sure, missed that! Thanks for the explanation :)

borzunov

I approve the PR, but ask you to consider the minor change I've suggested :)


nevec added 3 commits April 17, 2021 00:23

Add statistics averaging feature

38f6707

Bugfix: prevents infinite recursion when wait=False

91423bf

Refactor: move hivemind.client.optim to hivemind.optim

8aae58d

nevec requested a review from justheuristic April 16, 2021 21:51

Merge branch 'master' into extend_averager

de5e174

justheuristic reviewed Apr 17, 2021


justheuristic approved these changes Apr 17, 2021


justheuristic requested review from mryab and borzunov and removed request for mryab April 17, 2021 12:17

borzunov reviewed Apr 18, 2021


borzunov approved these changes Apr 18, 2021


nevec added 2 commits April 18, 2021 16:43

Refactor

84459cd

Merge branch 'extend_averager' of https://github.com/learning-at-home…

d7bec31

…/hivemind into extend_averager

nevec merged commit 8c3bd93 into master Apr 18, 2021

nevec deleted the extend_averager branch April 18, 2021 13:54
