Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow failed import of Pool from multiprocessing #8162

Closed
wants to merge 2 commits into from
Closed

allow failed import of Pool from multiprocessing #8162

wants to merge 2 commits into from

Conversation

landfillbaby
Copy link

quick fix for #8161

"ImportError: This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770."
Comment on lines 213 to 215
# This is done for 2x speed up of requests to pypi.org
# so that "real time" of this function
# is almost equal to "user time"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should likely be rewritten.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or just moved inside the "try" block.

Copy link
Member

@pfmoore pfmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor point on the comment, butotherwise LGTM

Comment on lines 213 to 215
# This is done for 2x speed up of requests to pypi.org
# so that "real time" of this function
# is almost equal to "user time"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or just moved inside the "try" block.


pool.close()
pool.join()
except ImportError:
Copy link
Contributor

@CrafterKolyan CrafterKolyan Apr 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not an expert of when Python actually imports class Pool and raises ImportError but in my opinion it should be in the actual importing code:

from multiprocessing.dummy import Pool

In CPython they wrap importing of module in try except clause.
(https://github.com/python/cpython/blob/d9a43e20facdf4ad10186f820601c6580e1baa80/Lib/multiprocessing/synchronize.py#L24-L33)

So the except part will never be executed in current implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it might be cleaner to move both for loops in a fetch_latest_info_multithreaded and fetch_latest_info_single_threaded, and then include the from multiprocessing.dummy import Pool in the fetch_latest_info_multithreaded function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the ImportError block needs to be put on the from multiprocessing.dummy line instead.

Copy link
Author

@landfillbaby landfillbaby Apr 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I was wondering if this reliance on delayed import would be allowed, I'll change it if it's really necessary
I'm not sure what you mean by "the except part will never be executed" though @CrafterKolyan, since that exception is exactly what caused this issue; raising from within an except block still raises.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure from the original bug report which statement is triggering the ImportError. We should probably determine that, and just wrap that in the try...except (maybe setting a flag that we test later to choose the right code path).

Also, should this code have a test? Otherwise there's a (small) risk that the two implementations get out of line.

@CrafterKolyan
Copy link
Contributor

CrafterKolyan commented Apr 28, 2020

I think something like this may be a fix:

    def fetch_latest_info_multithreaded(packages, latest_info_func):
        from multiprocessing.dummy import Pool

        # This is done for 2x speed up of requests to pypi.org
        # so that "real time" of this function
        # is almost equal to "user time"
        pool = Pool(DEFAULT_POOLSIZE)

        for dist in pool.imap_unordered(latest_info_func, packages):
            if dist is not None:
                yield dist

        pool.close()
        pool.join()

    def fetch_latest_info_singlethreaded(packages, latest_info_func):
        for dist in packages:
            dist = latest_info_func(dist)
            if dist is not None:
                yield dist

    def iter_packages_latest_infos(self, packages, options):
        with self._build_session(options) as session:
            finder = self._build_package_finder(options, session)

            def latest_info(dist):
                typ = 'unknown'
                all_candidates = finder.find_all_candidates(dist.key)
                if not options.pre:
                    # Remove prereleases
                    all_candidates = [candidate for candidate in all_candidates
                                      if not candidate.version.is_prerelease]

                evaluator = finder.make_candidate_evaluator(
                    project_name=dist.project_name,
                )
                best_candidate = evaluator.sort_best_candidate(all_candidates)
                if best_candidate is None:
                    return None

                remote_version = best_candidate.version
                if best_candidate.link.is_wheel:
                    typ = 'wheel'
                else:
                    typ = 'sdist'
                # This is dirty but makes the rest of the code much cleaner
                dist.latest_version = remote_version
                dist.latest_filetype = typ
                return dist

            try:
                fetch_latest_info_multithreaded(packages, latest_info)
            # See issue #8161
            except ImportError:
                fetch_latest_info_singlethreaded(packages, latest_info)

and also we should delete

from multiprocessing.dummy import Pool

from the beginning of the file

@landfillbaby
Copy link
Author

ok cool yeah, use that instead, it's much more pythonic

@deveshks
Copy link
Contributor

I think something like this may be a fix:

# See issue #8161 should instead include the actual issue link: https://github.com/pypa/pip/issues/8161 in place of #8161

@McSinyx
Copy link
Contributor

McSinyx commented Apr 29, 2020

Hi there, a bit late for the party but I think we can abstract it a bit further, i.e. we can allow

try:
    # Maybe adding a filter in this case
    return map_threaded(latest_info_func, packages)
except ImportError:
    return map(latest_info_func, packages)

I hope that the parallelization in pip list is the beginning of parallel code in pip, and the above might make it easier to implement it in other places.

@pradyunsg
Copy link
Member

pradyunsg commented Apr 29, 2020

While we're all discussing colors for the bikeshed (i.e. the various forms to write this) here's my take on how to paint the bikeshed 😉

diff --git a/src/pip/_internal/commands/list.py b/src/pip/_internal/commands/list.py
index cf3be7eb..a0558079 100644
--- a/src/pip/_internal/commands/list.py
+++ b/src/pip/_internal/commands/list.py
@@ -5,7 +5,6 @@ from __future__ import absolute_import
 
 import json
 import logging
-from multiprocessing.dummy import Pool
 
 from pip._vendor import six
 from pip._vendor.requests.adapters import DEFAULT_POOLSIZE
@@ -24,9 +23,30 @@ from pip._internal.utils.misc import (
 )
 from pip._internal.utils.packaging import get_installer
 
+try:
+    from multiprocessing.dummy import Pool
+except ImportError:
+    Pool = None  # type: ignore
+
 logger = logging.getLogger(__name__)
 
 
+def map_threaded(function, sequence):
+    assert Pool is not None
+
+    # DEFAULT_POOLSIZE is the pool size used by requests.
+    with Pool(DEFAULT_POOLSIZE):
+        for retval in pool.imap_unordered(function, sequence):
+            yield retval
+
+
+def map_threaded_if_possible(function, sequence):
+    if Pool is None:
+        return map(function, sequence)
+
+    return map_threaded(function, sequence)
+
+
 class ListCommand(IndexGroupCommand):
     """
     List installed packages, including editables.
@@ -210,18 +230,10 @@ class ListCommand(IndexGroupCommand):
                 dist.latest_filetype = typ
                 return dist
 
-            # This is done for 2x speed up of requests to pypi.org
-            # so that "real time" of this function
-            # is almost equal to "user time"
-            pool = Pool(DEFAULT_POOLSIZE)
-
-            for dist in pool.imap_unordered(latest_info, packages):
+            for dist in map_threaded_if_possible(latest_info, packages):
                 if dist is not None:
                     yield dist
 
-            pool.close()
-            pool.join()
-
     def output_package_listing(self, packages, options):
         packages = sorted(
             packages,

@pfmoore
Copy link
Member

pfmoore commented Apr 29, 2020

Don't forget the pool.close(); pool.join() in map_threaded.

Also, can we confirm precisely where the ImportError is raised? #8161 said

not bothering with traceback, since it's easily traced to the use of multiprocessing.dummy.Pool in commands/list.py line 216.

But we need the traceback to see what precise statement fails.

I'm concerned that as we don't have a test platform without multithreading, if we're not careful, we'll check the wrong place and won't catch our mistake.

@deveshks
Copy link
Contributor

termux/termux-packages#5222 (comment) contains a traceback for this issue.

@McSinyx
Copy link
Contributor

McSinyx commented Apr 29, 2020

Sharing the same concern with @pfmoore, since after a glance at CPython, it seems that the error is raised during initialization of multiprocessing objects (synchronize is the module raising the error at import stage). If so, the ingredients to mix the paint can be prepared as follows

def map_threaded(function, iterable):  # I mean map_threaded_if_possible
    try:
        pool = Pool(DEFAULT_POOLSIZE)  # ugh no context manager for Python 2's pool
    except ImportError:
        return map(function, iterable)
    else:
        yield from pool.imap_unordered(function, iterable)
        pool.close()
        pool.join()

Mocking for testing of this in particular is not too difficult IMHO; however testing for thread-safety in general when we expand multithreading of networking code can be not as easy.

Edit: I'm typing way too slow again, thanks @deveshks for the pointer.

@pfmoore
Copy link
Member

pfmoore commented Apr 29, 2020

... and yield from isn't Python 2 compatible 🙁

And for that matter, yielding from a try...except is very dodgy. Would map_threaded be a generator or not with this code?

I don't mean to pick on your particular comment, @McSinyx, but can I ask that we try to refrain from posting speculative bits of code here? I'm guilty of doing so myself (and for that matter, of a rather too quick approval of the original PR, although I think that original code would have worked fine, so maybe it wasn't so bad to have approved it!) But if we're going to debate the most correct and robust solution, it's getting very hard to keep track of what's good code and what seemed reasonable but turned out to not work.

My biggest concern here is that we don't have a platform to test proposed solutions on (other than getting @landfillbaby to check for us) so we need to be fairly conservative in how we address the issue.

@pradyunsg
Copy link
Member

can I ask that we try to refrain from posting speculative bits of code here?

Guilty as charged and in agreement.

TBH, at this point, I feel like we should revert the original PR until we figure out the finer details of what we want parallelized code in pip to look like, what strategy we use to support platforms that don't have threading etc.

@Grimler91
Copy link

My biggest concern here is that we don't have a platform to test proposed solutions on

If anyone has an android 7 or newer smartphone testing should be as easy as installing termux and then running

apt update && apt install python
pip install --upgrade pip

And then apply a patch and test pip list -o. Messing around with patches might be annoying on a small phone screen though, setting up sshd and ssh'ing into termux would be more convenient.

Me, @landfillbaby and other termux-involved people can help testing in any case.

@McSinyx
Copy link
Contributor

McSinyx commented Apr 29, 2020

Sorry @pfmoore (and thanks for pointing out yield from in py2), my code sample was totally unnecessary and distracting (I think the only point there was to use the else clause), given we got the traceback now. Agreeing with @pradyunsg, probably we would want to revert the multiprocessing code until having better analysis on supported platform. Not because for termux in particular, but I'm wondering if there's any other platform failing short of it for other reasons, since threading and locking mechanisms are provided by lower level than what we can control.

That being said, to fix the linked issue, this PR should be sufficient, giventhis comment in CPython codebase. bpo-3770 might never be fixed, but I think the comment can be used as a canonical reference to the raise of ImportError. The module synchronize, AFAIK, is late-imported during initialization of objects as shown above.

@landfillbaby
Copy link
Author

closed because #8167 is arguably better

@McSinyx
Copy link
Contributor

McSinyx commented Aug 1, 2020

@Grimler91 and @landfillbaby, now that pip 20.2 is released with multiprocessing.dummy usage selective of platform, could you give pip list -o a try on termux to see if it works correctly?

@landfillbaby
Copy link
Author

it works, yes! good

@McSinyx
Copy link
Contributor

McSinyx commented Aug 1, 2020

Thank you for the quick response, @landfillbaby! happy noises 🎉

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 10, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants