move PerformanceEMA to utils, TrainingAverager to optim, update utils #405
Conversation
justheuristic commented on Nov 7, 2021 (edited)
- implement and test an async wrapper for ContextManager (used in DecentralizedAverager and ProgressTracker)
- implement .reset_timer in PerformanceEMA (used when progress is reset, e.g. on an fp16 gradient overflow, which should not affect the samples-per-second estimate)
- move PerformanceEMA to hivemind.utils (rationale: it will be used in hivemind.moe in @mryab 's pipelining experiments)
- move TrainingAverager to hivemind.optim (for compliance with hivemind.Optimizer and future deprecation in favour of TrainingStateAverager)
- fix process-wide RSA keys in the validator
async def enter_asynchronously(context: AbstractContextManager):
    """Wrap a non-async context so that it can be entered asynchronously"""
    async with _AsyncContextWrapper(context) as ret_value:
        yield ret_value
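For context, a hedged usage sketch (the call site and names are assumed for illustration, not quoted from this PR): the wrapper lets async code enter a blocking, lock-guarded context manager without stalling the event loop.

    async def read_averaged_tensors(averager):
        # get_tensors() returns a regular (blocking) context manager; entering
        # it through enter_asynchronously runs __enter__ in an executor thread
        async with enter_asynchronously(averager.get_tensors()) as tensors:
            ...  # work with the tensors while the underlying lock is held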
note: we can't simply do

    try:
        yield await loop.run_in_executor(None, context.__enter__)
    finally:
        context.__exit__(None, None, None)

because this option does not correctly propagate exceptions into the inner context manager
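For completeness, a minimal sketch of a wrapper that does propagate exceptions (an assumption about what _AsyncContextWrapper roughly looks like, not the verbatim implementation): __aexit__ forwards the exception triple to the wrapped manager's __exit__, so the inner context manager can observe, suppress, or translate exceptions raised inside the async block.

    import asyncio
    from contextlib import AbstractContextManager

    class _AsyncContextWrapper:
        def __init__(self, context: AbstractContextManager):
            self._context = context

        async def __aenter__(self):
            loop = asyncio.get_event_loop()
            # run the potentially blocking __enter__ in the default executor
            return await loop.run_in_executor(None, self._context.__enter__)

        async def __aexit__(self, exc_type, exc_value, exc_tb):
            # forward exception info so the inner manager sees what happened
            return self._context.__exit__(exc_type, exc_value, exc_tb)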
Codecov Report
@@            Coverage Diff             @@
##           master     #405      +/-   ##
==========================================
- Coverage   83.73%   83.50%   -0.24%
==========================================
  Files          73       73
  Lines        6678     6687       +9
==========================================
- Hits         5592     5584       -8
- Misses       1086     1103      +17
@@ -453,7 +461,7 @@ async def _run_allreduce(self, group_info: GroupInfo, min_vector_size: int, **kw
             None, load_balance_peers, self.total_size, download_bandwidths, min_vector_size
         )

-        async with self.get_tensors_async() as local_tensors:
+        async with enter_asynchronously(self.get_tensors()) as local_tensors:
This no longer seems to acquire lock_averaged_tensors (which should really be named averaged_tensors_lock, BTW). Is this intended?
It does, inside get_tensors.
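To illustrate the point (a hedged sketch of the presumed locking behavior, not verbatim code from this PR): get_tensors is itself a context manager that takes the lock, so wrapping the manager it returns in enter_asynchronously still acquires the lock, just from an executor thread.

    from contextlib import contextmanager

    @contextmanager
    def get_tensors(self):
        # the lock is acquired on __enter__ and released on __exit__, whether
        # the manager is entered directly or via enter_asynchronously
        with self.lock_averaged_tensors:
            yield self._averaged_tensors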
@@ -37,15 +37,19 @@ def update(self, task_size: float, interval: Optional[float] = None) -> float:
         self.samples_per_second = 1 / max(adjusted_seconds_per_sample, self.eps)
         return self.samples_per_second

+    def reset_timer(self):
Right now it appears this method has only one usage, in the same class. Is it going to have more usages outside of the class? If not, maybe you can make it a private method or simply keep it inlined.
Yes, it will :)
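For reference, a hedged sketch of what reset_timer might look like and how a caller could use it (the details and call site are assumed, not quoted from the PR): it restarts the interval measurement so that time spent on a discarded step does not drag down the samples-per-second estimate.

    import time

    def reset_timer(self):
        # forget the elapsed interval; the next update() measures from now
        self.timestamp = time.perf_counter()

    # hypothetical call site: an fp16 gradient overflow voids the current step
    if grad_overflow_detected:
        performance_ema.reset_timer()  # the wasted time should not count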
@@ -33,6 +33,9 @@ def event_loop():
 def cleanup_children():
     yield

+    with RSAPrivateKey._process_wide_key_lock:
+        RSAPrivateKey._process_wide_key = None
I didn't see this change in the description, and right now it seems to have no effect. Do we need this?
This has no effect now, but it will if you run the existing tests in a different order.
TL;DR, here is how it breaks things:
- you create something that instantiates RSAPrivateKey.process_wide()
- all subsequent tests then use the same key
- crucially, if you create two DHT instances with an RSA validator AFTER instantiating the process-wide key, everything breaks because they both inherit your key

Future PRs introduce tests that instantiate validators before the sensitive tests, breaking everything in bizarre ways.
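A hedged sketch of the underlying caching behavior (assuming process_wide memoizes one key per process; the exact hivemind implementation may differ): once the cached key exists, every later caller shares it, which is why the test fixture above resets it.

    import threading

    class RSAPrivateKey:
        _process_wide_key = None
        _process_wide_key_lock = threading.RLock()

        @classmethod
        def process_wide(cls):
            # double-checked locking: generate one key per process, then reuse it
            if cls._process_wide_key is None:
                with cls._process_wide_key_lock:
                    if cls._process_wide_key is None:
                        cls._process_wide_key = cls()
            return cls._process_wide_key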