Refactor MPFuture to use a single pipe/thread per process #298

justheuristic · 2021-06-30T12:24:08Z

TODO list:

MPFuture no longer uses SyncManager because SyncManager is not fault-tolerant: if one process fails (e.g. is terminated) while interacting with SyncManager, it will randomly hang other processes that access the same object.

Note 1:
Forking a process results in the fork (child process) not having the parent's background threads.

Moreover, if one explicitly prints the background thread object inside a forked process, it is displayed as stopped (while in the master process it is started).
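A minimal sketch of Note 1, assuming a Unix system (it relies on `os.fork`) and CPython's `threading` module; the helper name `inspect_threads_after_fork` is hypothetical. The parent starts a background thread and forks; the child reports how many threads it has and whether the inherited `Thread` object is alive, then exits. Since only the thread that called `fork()` is copied, the child should see a single thread and a dead worker, while the worker is still alive in the parent:

```python
import os
import threading
from typing import Tuple


def inspect_threads_after_fork() -> Tuple[int, bool, bool]:
    """Returns (threads alive in the child, worker alive in the child, worker alive in the parent)."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, daemon=True)  # idles until `stop` is set
    worker.start()

    read_fd, write_fd = os.pipe()
    pid = os.fork()  # Python 3.12+ warns here: forking a multi-threaded process is risky
    if pid == 0:
        # Child: only the thread that called fork() exists; `worker` is a stale copy.
        os.close(read_fd)
        os.write(write_fd, f"{threading.active_count()} {worker.is_alive()}".encode())
        os._exit(0)  # leave immediately, without running the parent's cleanup code

    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_threads, child_alive = reader.read().split()
    os.waitpid(pid, 0)

    parent_alive = worker.is_alive()
    stop.set()
    worker.join()
    return int(child_threads), child_alive == "True", parent_alive


if __name__ == "__main__":
    print(inspect_threads_after_fork())  # on Linux: (1, False, True)
```

This is also why any per-process background machinery has to be started again inside a forked child rather than assumed to be inherited.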

___

Note 2:
Sending pipes over pipes is not free.

This is not due to lengthy serialization or a large payload, but due to the need to open new files on deserialization.

In contrast, creating a shared value is much faster.
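A minimal sketch of Note 2, assuming a Unix system and CPython's standard `multiprocessing` reducers (no hivemind code involved; `send_pipe_over_pipe` is a hypothetical helper). One end of an inner pipe is sent through an outer pipe within a single process. Deserializing it does not just reference the existing pipe: CPython fetches a duplicated descriptor through the `multiprocessing.resource_sharer` helper thread, so the received `Connection` owns a new file descriptor:

```python
import multiprocessing as mp
from multiprocessing.connection import Connection
from typing import Tuple


def send_pipe_over_pipe() -> Tuple[int, int, str]:
    """Sends one end of an inner pipe through an outer pipe, then uses the received copy."""
    outer_recv, outer_send = mp.Pipe(duplex=False)
    inner_recv, inner_send = mp.Pipe(duplex=False)

    # Pickling a Connection registers a dup()'ed fd with multiprocessing.resource_sharer;
    # unpickling connects to that helper and receives yet another fd over a Unix socket.
    outer_send.send(inner_send)
    received: Connection = outer_recv.recv()

    received.send("hello via a forwarded pipe")  # writes into the same pipe as inner_send
    message = inner_recv.recv()

    original_fd, received_fd = inner_send.fileno(), received.fileno()
    for conn in (outer_recv, outer_send, inner_recv, inner_send, received):
        conn.close()
    return original_fd, received_fd, message


if __name__ == "__main__":
    original_fd, received_fd, message = send_pipe_over_pipe()
    print(message)  # hello via a forwarded pipe
    print(original_fd != received_fd)  # True: deserialization opened a new descriptor
```

Paying this once per future is what makes a pipe-per-future design run into open-file limits.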

codecov · 2021-06-30T12:25:36Z

Codecov Report

Merging #298 (cc113c9) into master (2e1bb9c) will increase coverage by 0.09%.
The diff coverage is 90.81%.

```diff
@@            Coverage Diff             @@
##           master     #298      +/-   ##
==========================================
+ Coverage   81.62%   81.72%   +0.09%
==========================================
  Files          63       63
  Lines        5813     5866      +53
==========================================
+ Hits         4745     4794      +49
- Misses       1068     1072       +4
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| `hivemind/hivemind_cli/run_server.py` | `0.00% <0.00%> (ø)` | |
| `hivemind/utils/limits.py` | `25.00% <ø> (ø)` | |
| `hivemind/server/task_pool.py` | `43.16% <28.57%> (+0.71%)` | ⬆️ |
| `hivemind/client/averaging/__init__.py` | `72.49% <80.00%> (ø)` | |
| `hivemind/utils/mpfuture.py` | `94.68% <93.78%> (-0.71%)` | ⬇️ |
| `hivemind/client/averaging/training.py` | `62.50% <100.00%> (+0.33%)` | ⬆️ |
| `hivemind/dht/__init__.py` | `77.62% <100.00%> (ø)` | |
| `hivemind/utils/__init__.py` | `100.00% <100.00%> (ø)` | |
| `hivemind/utils/compression.py` | `96.18% <100.00%> (+0.05%)` | ⬆️ |
| `hivemind/client/averaging/key_manager.py` | `95.45% <0.00%> (-2.28%)` | ⬇️ |

... and 2 more

justheuristic · 2021-06-30T22:59:03Z

hivemind/utils/mpfuture.py


```diff
-from hivemind.utils.threading import run_in_background
+import torch
```


Note on torch: it is indeed weird, but so far we're still not sure how else to implement a shared value for py3.7.

Options considered:

- current version (with `torch.empty`)
- `mp.Value` or `mp.Event`: cannot be sent to other processes (cannot be serialized)
- `multiprocessing.shared_memory`: not available in py3.7 (and thus in Colab & Kaggle kernels)
- `_posixshmem`: an extra dependency in requirements.txt
- `mp.Pipe`: back to where we started; it would need an extra pipe per future, i.e. too many open files
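To illustrate the `mp.Value` / `mp.Event` option: CPython allows these synchronized objects to be pickled only while a child process is being spawned, so pushing one through a pipe to an already running process raises `RuntimeError`. A quick check using only the standard library (`try_to_send` is a hypothetical helper):

```python
import multiprocessing as mp


def try_to_send(obj) -> str:
    """Tries to push an object through a pipe; returns "sent" or the RuntimeError text."""
    receiver, sender = mp.Pipe(duplex=False)
    try:
        sender.send(obj)  # Connection.send pickles obj with multiprocessing's ForkingPickler
    except RuntimeError as e:
        return str(e)
    finally:
        receiver.close()
        sender.close()
    return "sent"


if __name__ == "__main__":
    print(try_to_send(mp.Value("i", 0)))
    # Synchronized objects should only be shared between processes through inheritance
    print(try_to_send(mp.Event()))  # fails the same way, via the Event's inner lock/condition
    print(try_to_send(42))  # sent
```

Such objects can only reach processes that are started after they are created, which does not fit futures that are created and passed around at runtime.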

mryab · 2021-07-03T11:07:57Z

hivemind/utils/mpfuture.py


```diff
-from hivemind.utils.threading import run_in_background
+import torch
```


I guess that's alright for now, but please import the two methods/attributes explicitly (probably with a short comment that explains why they are needed, like "needed for python 3.7-compatible shared memory")

yhn112 · 2021-07-03T13:46:59Z

hivemind/utils/mpfuture.py

```diff
+    _global_sender_pipe: Optional[PipeEnd] = None  # a pipe that is used to send results/exceptions to this process
+    _pipe_waiter_thread: Optional[threading.Thread] = None  # process-specific thread that receives results/exceptions
+    _active_futures: Optional[Dict[UID, MPFuture]] = None  # pending or running futures originated from current process
+    _active_pid: Optional[PID] = None  # pid of currently active process; used to handle forks natively
```


Is it really good practice to use None as a default value if it's never used or even checked for?
At least for _active_futures it seems really simple to have {} as a default.

Though mutable defaults are generally frowned upon in function arguments, perhaps in this case it's OK, since we use the global class field anyway.

And for immutable ones, maybe just a type annotation would suffice (unless we need to explicitly check that it was not initialized).

We have a similar pattern/issue in other parts of the repo; let's stick to None for now and discuss the global issue at the next meeting.
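One reading of the None defaults: None (together with `_active_pid`) marks state that has not yet been initialized in this process, which also covers a child that inherited the parent's class attributes through fork. A hypothetical sketch of such lazy, fork-aware initialization; `ProcessLocalRegistry` and its methods are illustrative names rather than hivemind's API, and the sender pipe and waiter thread are left out:

```python
import os
import threading
from typing import Dict, Optional


class ProcessLocalRegistry:
    """Hypothetical sketch: class-level state that is initialized lazily, once per process."""

    _initialization_lock = threading.Lock()  # serializes first use by concurrent threads
    _active_pid: Optional[int] = None  # pid of the process that owns the state below
    _active_futures: Optional[Dict[str, object]] = None  # futures created by that process

    @classmethod
    def _initialize_if_necessary(cls) -> None:
        if cls._active_pid == os.getpid():
            return  # fast path: this process has already initialized its state
        with cls._initialization_lock:
            if cls._active_pid != os.getpid():  # re-check: another thread may have done it
                # First use in a fresh process, or in a forked child: anything copied from
                # the parent belongs to the parent, so start from an empty registry.
                cls._active_futures = {}
                cls._active_pid = os.getpid()

    @classmethod
    def register(cls, uid: str, future: object) -> None:
        cls._initialize_if_necessary()
        cls._active_futures[uid] = future


if __name__ == "__main__":
    ProcessLocalRegistry.register("future-1", "placeholder")
    print(ProcessLocalRegistry._active_futures)  # {'future-1': 'placeholder'}
```

In this sketch the pid check, not the choice between None and {}, is what keeps a forked child from reusing futures it inherited from its parent.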

mryab · 2021-07-03T14:22:41Z

hivemind/utils/mpfuture.py

```diff
+            except (BrokenPipeError, EOFError):
+                logger.debug(f"MPFuture backend was shut down (pid={pid})")
+            except Exception as e:
+                logger.exception(f"MPFuture: could not retrieve update: caught {repr(e)} (pid={pid})")
```


You don't need to specify the class that the log entry refers to, since our logging includes class and method names; in essence, you're saying the same thing twice.
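For context, a hypothetical sketch of the per-process receiving loop that such handlers live in (`pipe_waiter_loop` and `on_update` are illustrative names, not hivemind's API). It assumes messages are `(uid, payload)` tuples arriving on the process's single receiving pipe end, treats a closed pipe as a normal shutdown, and logs any other error without stopping the thread. Following the review comment, its log messages don't repeat the class name:

```python
import logging
import multiprocessing as mp
import os
import threading
from multiprocessing.connection import Connection
from typing import Any, Callable

logger = logging.getLogger(__name__)


def pipe_waiter_loop(receiver: Connection, on_update: Callable[[str, Any], None]) -> None:
    """Handles (uid, payload) messages until every sender end of the pipe is closed."""
    pid = os.getpid()
    while True:
        try:
            message = receiver.recv()
        except (EOFError, OSError):  # all senders closed, or the pipe itself is unusable
            logger.debug(f"backend was shut down (pid={pid})")
            return
        except Exception as e:  # e.g. a payload that cannot be unpickled
            logger.exception(f"could not retrieve update: caught {repr(e)} (pid={pid})")
            continue
        try:
            uid, payload = message
            on_update(uid, payload)
        except Exception as e:  # one bad update must not kill the only receiving thread
            logger.exception(f"could not apply update: caught {repr(e)} (pid={pid})")


if __name__ == "__main__":
    receiver, sender = mp.Pipe(duplex=False)
    updates = []
    waiter = threading.Thread(
        target=pipe_waiter_loop,
        args=(receiver, lambda uid, payload: updates.append((uid, payload))),
        daemon=True,
    )
    waiter.start()
    sender.send(("future-1", 42))
    sender.close()  # closing the last sender end makes recv() raise EOFError
    waiter.join()
    print(updates)  # [('future-1', 42)]
```

Catching `OSError` as a shutdown signal (it includes `BrokenPipeError`) keeps the loop from spinning forever if the receiving end is closed locally.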

justheuristic added 2 commits June 28, 2021 16:09

wip

2c57843

Merge branch 'master' into a-better-future

6918249

borzunov self-requested a review June 30, 2021 13:14

justheuristic added 2 commits July 1, 2021 00:11

working MPFuture prototype

1c04d87

pep8

6e73f92

justheuristic requested review from mryab and yhn112 June 30, 2021 21:11

borzunov reviewed Jun 30, 2021

hivemind/utils/mpfuture.py (outdated, resolved)

review

3a01b30

borzunov reviewed Jun 30, 2021

hivemind/utils/mpfuture.py (outdated, resolved)

justheuristic added 2 commits July 1, 2021 01:27

review

63c9652

go-deeper test, pytorch version for now

38e0815

justheuristic commented Jun 30, 2021

justheuristic added 16 commits July 1, 2021 02:06

partially transfer to new MPFuture

dc83f61

partially transfer to new MPFuture

70193f0

partially transfer to new MPFuture

b4d901f

py37 compatibility

e64fbb6

edge cases

f408a7a

edge cases

1998174

WIP

f2fc224

refactor global variables as class variables, make results immediately available

31e83bc

enum-based message type

0d825bd

sync set event

40cef93

set event threadsafe

d5c2005

clarify docstr, rm global lock

4751c2b

Merge branch 'master' into a-better-future

061e1b8

review

8de6f8f

WIP

5823cf3

test done callback

ee77e0b

justheuristic added 2 commits July 3, 2021 03:32

Merge remote-tracking branch 'origin/a-better-future' into a-better-future

8e7fc0f

shutdown gracefully

72b7444

justheuristic changed the title ~~A brighter MPFuture for hivemind~~ Refactor MPFuture to use a single pipe per process Jul 3, 2021

justheuristic changed the title ~~Refactor MPFuture to use a single pipe per process~~ Refactor MPFuture to use a single pipe/thread per process Jul 3, 2021

review

123bd06

mryab reviewed Jul 3, 2021

justheuristic and others added 10 commits July 3, 2021 15:23

Update hivemind/server/task_pool.py

831b3f7

Co-authored-by: Max Ryabinin

Update hivemind/utils/__init__.py

09464a2

Co-authored-by: Max Ryabinin

Update hivemind/client/averaging/training.py

f492bed

Co-authored-by: Max Ryabinin

review

0808601

Merge remote-tracking branch 'origin/a-better-future' into a-better-future

3f37eea

review

c8add97

review

c9dee07

review

8de6c70

review

6206233

review

c8b518b

yhn112 requested changes Jul 3, 2021

hivemind/utils/mpfuture.py (outdated, resolved)

hivemind/utils/mpfuture.py (outdated, resolved)

yhn112 reviewed Jul 3, 2021

mryab approved these changes Jul 3, 2021

Update hivemind/utils/mpfuture.py

18d8abf

Co-authored-by: Michael Diskin

yhn112 reviewed Jul 3, 2021

hivemind/utils/mpfuture.py (outdated, resolved)

hivemind/utils/mpfuture.py (outdated, resolved)

yhn112 and others added 6 commits July 3, 2021 17:32

review

6e0db1d

Merge remote-tracking branch 'origin/a-better-future' into a-better-future

8401f98

Update hivemind/utils/mpfuture.py

976d689

Co-authored-by: Michael Diskin

Update hivemind/utils/mpfuture.py

1cd7fe4

Co-authored-by: Michael Diskin

switch to RuntimeError

9568979

Merge remote-tracking branch 'origin/a-better-future' into a-better-future

cc113c9

yhn112 approved these changes Jul 3, 2021

justheuristic merged commit 200fbec into master Jul 3, 2021

justheuristic deleted the a-better-future branch July 3, 2021 14:58
