-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor MPFuture to use a single pipe/thread per process #298
Conversation
Codecov Report
@@ Coverage Diff @@
## master #298 +/- ##
==========================================
+ Coverage 81.62% 81.72% +0.09%
==========================================
Files 63 63
Lines 5813 5866 +53
==========================================
+ Hits 4745 4794 +49
- Misses 1068 1072 +4
|
hivemind/utils/mpfuture.py
Outdated
|
||
from hivemind.utils.threading import run_in_background | ||
import torch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note on torch: it is indeed weird, but so far we're still not sure how else to implement shared value for py3.7
Options considered:
- current version (with torch.empty)
- mp.Value or mp.Event - cannot send to other processes (cannot serialize)
- using multiprocessing.shared_memory - incompatible with py3.7 (and thus colab & kaggle kernels)
- using _posixshmem (extra dependency to requirements.txt)
- using mp.Pipe - back to where we started, will need an extra pipe per each future; too many open files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that's alright for now, but please import the two methods/attributes explicitly (probably with a short comment that explains its necessity, like "needed for python 3.7-compatible shared memory")
hivemind/utils/mpfuture.py
Outdated
|
||
from hivemind.utils.threading import run_in_background | ||
import torch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that's alright for now, but please import the two methods/attributes explicitly (probably with a short comment that explains its necessity, like "needed for python 3.7-compatible shared memory")
Co-authored-by: Max Ryabinin <[email protected]>
Co-authored-by: Max Ryabinin <[email protected]>
Co-authored-by: Max Ryabinin <[email protected]>
_global_sender_pipe: Optional[PipeEnd] = None # a pipe that is used to send results/exceptions to this process | ||
_pipe_waiter_thread: Optional[threading.Thread] = None # process-specific thread that receives results/exceptions | ||
_active_futures: Optional[Dict[UID, MPFuture]] = None # pending or running futures originated from current process | ||
_active_pid: Optional[PID] = None # pid of currently active process; used to handle forks natively |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a really good practice, to use None
as a default value, if it's never used, or even checked for?
At least for _active_futures
it seems really simple to have {}
as a default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though mutable defaults are generally frowned upon in function arguments, perhaps in that case it's ok, since we use the global class field anyway
And for non-mutable ones, maybe just a type annotation would suffice (unless we need to explicitly check that is was not initialized)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a similar pattern/issue in other parts of the repo, let's stick to None for now and discuss global issue on the nearest meeting
hivemind/utils/mpfuture.py
Outdated
except (BrokenPipeError, EOFError): | ||
logger.debug(f"MPFuture backend was shut down (pid={pid})") | ||
except Exception as e: | ||
logger.exception(f"MPFuture: could not retrieve update: caught {repr(e)} (pid={pid})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to specify the class that the log entry refers to, since our logging includes class and method names — in essence, you're saying the same thing twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Co-authored-by: Michael Diskin <[email protected]>
Co-authored-by: Michael Diskin <[email protected]>
Co-authored-by: Michael Diskin <[email protected]>
TODO list:
__await__
)MPFuture no longer uses SyncManager because SyncManager is not fault tolerant; If one process fails (e.g. is terminated) during interaction with SyncManager, it will randomly hang other processes that access the same object.
Note 1:
forking a process results in fork not having background threads:
Moreover, if one explicitly prints the background thread inside a forked process, it will be displayed as stopped (while in master it is started)
__
Note 2:

sending pipes over pipes is not free
This is not due to lengthy serialization or large size, but due to the need to open new files on de-serialization.
In contrast, creating a shared value is much faster
