-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow failed import of Pool from multiprocessing #8162
Conversation
"ImportError: This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770."
src/pip/_internal/commands/list.py
Outdated
# This is done for 2x speed up of requests to pypi.org | ||
# so that "real time" of this function | ||
# is almost equal to "user time" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should likely be rewritten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... or just moved inside the "try" block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor point on the comment, butotherwise LGTM
src/pip/_internal/commands/list.py
Outdated
# This is done for 2x speed up of requests to pypi.org | ||
# so that "real time" of this function | ||
# is almost equal to "user time" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... or just moved inside the "try" block.
|
||
pool.close() | ||
pool.join() | ||
except ImportError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not an expert of when Python actually imports class Pool
and raises ImportError
but in my opinion it should be in the actual importing code:
from multiprocessing.dummy import Pool
In CPython they wrap importing of module in try except clause.
(https://github.com/python/cpython/blob/d9a43e20facdf4ad10186f820601c6580e1baa80/Lib/multiprocessing/synchronize.py#L24-L33)
So the except part will never be executed in current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it might be cleaner to move both for loops in a fetch_latest_info_multithreaded
and fetch_latest_info_single_threaded
, and then include the from multiprocessing.dummy import Pool
in the fetch_latest_info_multithreaded
function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think the ImportError block needs to be put on the from multiprocessing.dummy
line instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I was wondering if this reliance on delayed import would be allowed, I'll change it if it's really necessary
I'm not sure what you mean by "the except part will never be executed" though @CrafterKolyan, since that exception is exactly what caused this issue; raising from within an except block still raises.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not exactly sure from the original bug report which statement is triggering the ImportError
. We should probably determine that, and just wrap that in the try...except
(maybe setting a flag that we test later to choose the right code path).
Also, should this code have a test? Otherwise there's a (small) risk that the two implementations get out of line.
I think something like this may be a fix:
and also we should delete
from the beginning of the file |
ok cool yeah, use that instead, it's much more pythonic |
|
Hi there, a bit late for the party but I think we can abstract it a bit further, i.e. we can allow try:
# Maybe adding a filter in this case
return map_threaded(latest_info_func, packages)
except ImportError:
return map(latest_info_func, packages) I hope that the parallelization in |
While we're all discussing colors for the bikeshed (i.e. the various forms to write this) here's my take on how to paint the bikeshed 😉 diff --git a/src/pip/_internal/commands/list.py b/src/pip/_internal/commands/list.py
index cf3be7eb..a0558079 100644
--- a/src/pip/_internal/commands/list.py
+++ b/src/pip/_internal/commands/list.py
@@ -5,7 +5,6 @@ from __future__ import absolute_import
import json
import logging
-from multiprocessing.dummy import Pool
from pip._vendor import six
from pip._vendor.requests.adapters import DEFAULT_POOLSIZE
@@ -24,9 +23,30 @@ from pip._internal.utils.misc import (
)
from pip._internal.utils.packaging import get_installer
+try:
+ from multiprocessing.dummy import Pool
+except ImportError:
+ Pool = None # type: ignore
+
logger = logging.getLogger(__name__)
+def map_threaded(function, sequence):
+ assert Pool is not None
+
+ # DEFAULT_POOLSIZE is the pool size used by requests.
+ with Pool(DEFAULT_POOLSIZE):
+ for retval in pool.imap_unordered(function, sequence):
+ yield retval
+
+
+def map_threaded_if_possible(function, sequence):
+ if Pool is None:
+ return map(function, sequence)
+
+ return map_threaded(function, sequence)
+
+
class ListCommand(IndexGroupCommand):
"""
List installed packages, including editables.
@@ -210,18 +230,10 @@ class ListCommand(IndexGroupCommand):
dist.latest_filetype = typ
return dist
- # This is done for 2x speed up of requests to pypi.org
- # so that "real time" of this function
- # is almost equal to "user time"
- pool = Pool(DEFAULT_POOLSIZE)
-
- for dist in pool.imap_unordered(latest_info, packages):
+ for dist in map_threaded_if_possible(latest_info, packages):
if dist is not None:
yield dist
- pool.close()
- pool.join()
-
def output_package_listing(self, packages, options):
packages = sorted(
packages,
|
Don't forget the Also, can we confirm precisely where the
But we need the traceback to see what precise statement fails. I'm concerned that as we don't have a test platform without multithreading, if we're not careful, we'll check the wrong place and won't catch our mistake. |
termux/termux-packages#5222 (comment) contains a traceback for this issue. |
Sharing the same concern with @pfmoore, since after a glance at CPython, it seems that the error is raised during initialization of multiprocessing objects (synchronize is the module raising the error at import stage). If so, the ingredients to mix the paint can be prepared as follows def map_threaded(function, iterable): # I mean map_threaded_if_possible
try:
pool = Pool(DEFAULT_POOLSIZE) # ugh no context manager for Python 2's pool
except ImportError:
return map(function, iterable)
else:
yield from pool.imap_unordered(function, iterable)
pool.close()
pool.join() Mocking for testing of this in particular is not too difficult IMHO; however testing for thread-safety in general when we expand multithreading of networking code can be not as easy. Edit: I'm typing way too slow again, thanks @deveshks for the pointer. |
... and And for that matter, yielding from a try...except is very dodgy. Would I don't mean to pick on your particular comment, @McSinyx, but can I ask that we try to refrain from posting speculative bits of code here? I'm guilty of doing so myself (and for that matter, of a rather too quick approval of the original PR, although I think that original code would have worked fine, so maybe it wasn't so bad to have approved it!) But if we're going to debate the most correct and robust solution, it's getting very hard to keep track of what's good code and what seemed reasonable but turned out to not work. My biggest concern here is that we don't have a platform to test proposed solutions on (other than getting @landfillbaby to check for us) so we need to be fairly conservative in how we address the issue. |
Guilty as charged and in agreement. TBH, at this point, I feel like we should revert the original PR until we figure out the finer details of what we want parallelized code in pip to look like, what strategy we use to support platforms that don't have threading etc. |
If anyone has an android 7 or newer smartphone testing should be as easy as installing termux and then running apt update && apt install python
pip install --upgrade pip And then apply a patch and test Me, @landfillbaby and other termux-involved people can help testing in any case. |
Sorry @pfmoore (and thanks for pointing out That being said, to fix the linked issue, this PR should be sufficient, giventhis comment in CPython codebase. bpo-3770 might never be fixed, but I think the comment can be used as a canonical reference to the raise of ImportError. The module |
closed because #8167 is arguably better |
@Grimler91 and @landfillbaby, now that pip 20.2 is released with |
it works, yes! good |
Thank you for the quick response, @landfillbaby! happy noises 🎉 |
quick fix for #8161