-
-
Notifications
You must be signed in to change notification settings - Fork 22
Ensure import urllib3
can succeed even if trio and twisted aren't installed
#42
Ensure import urllib3
can succeed even if trio and twisted aren't installed
#42
Conversation
b264ac1
to
eedad8e
Compare
To reiterate - this is very much a draft - just looking for feedback on the approach before I go about fixing any broken tests. |
1bb84db
to
579772e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, I think this is going in the good direction! I personally like the string approach. Of course, for an important API question like this, I'll wait for the approval of @njsmith (and tests) before merging. :)
I added some thoughts below about the discovery of new backends and error handling, which is an important part of the design here. Please tell me what you think. Thanks!
urllib3/_backends/__init__.py
Outdated
try: | ||
load_backend_fn() | ||
output.append(backend_name) | ||
except: # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want at least except Exception
in order to avoid catching KeyboardInterrupt exceptions.
Any good reason not to simply use ImportError
? I guess the import can fail with different errors, but as a user, I would like to know if a dependency is half-installed.
Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a good idea - thanks!
urllib3/_backends/__init__.py
Outdated
return BACKENDS[backend_name]()(**_backend_args) | ||
else: | ||
valid_backends = ", ".join(sorted(BACKENDS.keys())) | ||
raise ValueError("Backend must be one of: {}".format(valid_backends)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be to use a static list of backends in the else
clause, and let the loading of a backend fail if the dependency is not installed. Which might be more intuitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow - can you rephrase your suggestion?
This else block is raising an error if the user specifies an unknown backend. At this point, I'm assuming that we will have a finite list of backends in urllib3 and will not allow users to write their own without adding them upstream - maybe we want to reconsider this, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this was not easy to understand. I'll try to be clearer this time.
So I had two suggestions. The first one, above, was simply about replacing except
by except Exception
or except ImportError
.
This suggestion would have more impact. I was wondering about hard coding the valid backends (currently that would be trio, twisted and sync) and always report them as available in the ValueError raised in the else clause of load_backend
. I think this would help with the discovery of new backends. Now, if a user tries to use such a backend but the import fails, I don't know if we should catch the ImportError exception or let it propagate.
I'm really thinking aloud here: I don't know if that would be better or not!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pquentin Apologies - I think I've misled you. The available_backends
function is not used - the error I am raising contains all "known" backends.
I added available_backends
to assist in debugging from a repl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to remove the Looks like we're on the same page now.available_backends
function as it serves no purpose beyond debugging, and probably wouldn't be exposed as a user visible API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how I managed to misread this. Sorry!
I think we can leave available_backends as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the back and forth on this. But it didn’t make sense for me to implement available_backends just yet without agreeing on some kind of user interface for it. I’d like to do that separately.
We also need to be careful with that functionality - as it will import all of the backends, it’s probably something that we don’t want to encourage users to call.
OK, brace yourselves. This change has uncovered a race in the import logic which means that two modules do not consider two different imports of the Steps to reproduce:
This still happens if we switch out our python wtf? |
I've added some code that documents the issue in 1a73eff - @pquentin any ideas? FWIW, I've had a scan of http://python-notes.curiousefficiency.org/en/latest/python_concepts/import_traps.html and don’t immediately spot any bad practices, but could well be missing something. I speculate that our |
Regarding the public API, see: #25 (comment) |
TIL something about Python. This: from .sync_backend import SyncBackend
def load_sync_backend():
return SyncBackend Is subtly different to this: def load_sync_backend():
from .sync_backend import SyncBackend
return SyncBackend I believe that in the second case, I am triggering a variant of the Double Import Trap. While the repro is different, the outcome is that I have two copies of the same module. Two copies of the same module means I have two copies of the Tl;dr: This means that we can’t catch LoopAbort once we re-import it, as the raised exception is of a different type than the type specified in the ‘except’ block. Perhaps reassuringly, this also fails: from .sync_backend import SyncBackend
def load_sync_backend():
from .sync_backend import SyncBackend
return SyncBackend I'll need to think quite deeply about how to solve this, as ideally we would allow ourselves to load backends dynamically, but we will need to acknowledge that they cannot communicate using types (including Exceptions) as their types will be considered different if we load them in a subsequent run of the import system. |
Wow, nice detective work here! 💯 Would adding a rich comparison function to LoopAbort work? |
This looks really weird to me, in particular this bit:
I really don't think this is supposed to be possible! From the double-import trap link: "The reason this is problematic is that every module in that directory is now potentially accessible under two different names", but in the above, the two different @oliland I don't suppose you've cut down a minimal reproducer? To pick some random folks who I know know a lot about the import system: @ncoghlan @brettcannon Do you have any idea how in Python 3.6 you could end up with two different imports of the same module under the same name? |
Here's another oddity: when I add a I thought this must mean that there's another copy of
If we ignore the So where the heck does this other |
My initial suspicion would be that you might be running into one of the subtle race conditions [1] that have affected the per-module import locks at various points in time, and are hence getting two concurrent imports of the "urllib3._backends._common" module. However, that would only be a plausible explanation if the error was coming up on 3.6.2, and everything worked fine on 3.6.3 and later. [1] https://bugs.python.org/issue30891, https://bugs.python.org/issue30876, https://bugs.python.org/issue31070 |
@njsmith good points. As you've pointed out that both imports should resolve to the same key in I've really struggled to repro this outside of the urllib3 repository. The only other thing I've noticed is that only the There's tons of other weird stuff, hard to tell if relevant or not. Adding another line to context['h11_response'] = event
from .._backends._common import LoopAbort
raise LoopAbort Which is cool, apart from the fact that this shouldn't be necessary. For bonus points, add these lines: context['h11_response'] = event
print("about to raise loopabort")
print(id(LoopAbort))
from ._backends._common import LoopAbort
print(id(LoopAbort))
raise LoopAbort What happens here? You guessed it - an I'll continue to try to extract a repo, though at this point I think I will need to delete the rest of urllib3 around this code to get there. |
Having failed to produce a minimal repro, I started to delete the rest of urllib3 around this bug and believe I have found the code responsible for this confusion: https://github.com/python-trio/urllib3/blob/bleach-spike/test/test_no_ssl.py#L67 Our failing test class in question uses this On setup it iterates through Removing all calls to this ModuleStash object solves the problem and all imports resolve to the same Interestingly, it's possible to make the tests pass 50% of the time if you only remove the line There are plenty of caveats in the docs [1][2] around screwing around with sys.modules, so we definitely shouldn't do this if it's purely in the interest of coverage. Python 3 docs come with a particularly suitable warning:
so tl;dr: It looks like two ingredients are required to get this crazy behaviour:
[1] https://docs.python.org/3/library/sys.html#sys.modules |
It looks like this crazy module stashing thing is in upstream urllib3, so we should definitely make a case for removing it there. |
Wow, yeah, that could certainly produce weird effects. Excellent detective work. It makes sense to want to test that urllib3 copes with a missing I agree that it'd be good to try to fix this upstream. While that's happening, we can mark these tests as skipped, so they don't block progress on the branch. |
Btw, I think this is demonstrating a weird quirk of how python works in general:
When python compiles a function body, it statically looks for any variables that are involved in assignment operations, and marks those as local, throughout the whole function body. This is why if you want to assign to a local from inside a function, you need to say so explicitly: X = 1
def foo():
# creates a local, doesn't affect global
X = 2
def bar():
# works fine
print(X)
def baz():
# fails! Python thinks you're trying to print the local created below!
print(X)
X = 2
def quux():
# tells python that in this function, X refers to the global
global X
# now this works
print(X)
# and this mutates the global
X = 2 |
c4ffdd6
to
bbecba2
Compare
Ah, yeah, doing a forced reload can definitely cause weirdness, especially if other modules are doing "from reloaded.module import attribute" (which is the case for Very nice detective work tracking that down :) |
5825189
to
c61736d
Compare
@@ -90,6 +90,5 @@ def import_ssl(): | |||
|
|||
self.assertRaises(ImportError, import_ssl) | |||
|
|||
@pytest.mark.xfail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test now passes. hooray!
import urllib3 | ||
|
||
|
||
class TestHTTPWithoutSSL(HTTPDummyServerTestCase, TestWithoutSSL): | ||
|
||
@pytest.mark.skip(reason=( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test is cursed. boo.
@@ -311,8 +311,7 @@ def __init__(self, host, port, backend=None, | |||
source_address=None, tunnel_host=None, tunnel_port=None, | |||
tunnel_headers=None): | |||
self.is_verified = False | |||
|
|||
self._backend = backend or SyncBackend() | |||
self._backend = backend or load_backend("sync") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: if this function constantly calls from .sync_backend import SyncBackend
- is it constantly re-running the import machinery whenever we create a connection?
If so, we should special case SyncBackend
to skip the import machinery in the interests of making the common case more efficient.
Alternatively, we should ensure that users have always initialized a backend by this point (which is how I plan for async backends to work), and just leave the or
case for unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-importing an already-imported module is pretty cheap, because python immediately notices that it already has the loaded module cached and uses that. So it's like a few dict lookups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, also: what we actually want the logic to be here (I think) is:
-
If this is the sync version of the code, then the backend must be
None
or"sync"
orBackend("sync")
. Anything else is an error. -
If this is the async version of the code, then the backend must be specified, and must be not the sync backend.
And... we kind of want to do this check inside all of the classes that take backend
arguments.
So I suggest adding a normalize_backend
function that takes a user backend
argument, and returns a valid Backend
or else an error.
The trickiest bit is detecting whether we are in sync mode or async mode. Eventually we should add a nice way to do this to the code generation script, but for now one can do:
async def f():
return None
obj = f()
if obj is None:
SYNC_MODE = True
else:
# prevent unawaited coroutine warning
obj.close()
SYNC_MODE = False
c61736d
to
115e6cd
Compare
115e6cd
to
680943f
Compare
@@ -1,9 +0,0 @@ | |||
from urllib3.packages import six |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really see the point in this file any more, so I truncated it. I guess it would make sense to put load_backend
in here, and have that be the only interface to this module, but that feels like making a decision.
Additionally, I'm not sure why a private module (_backends
) would expose a public interface (via __all__
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I did that then probably my logic was, this is a private API from the user point of view (they shouldn't mention _backend
at all), and an opaque module from the point of view of the rest of the internal code (they can use the stuff in _backend.__all__
, but shouldn't go digging deeper).
This is a pretty fine distinction and not particularly important.
|
||
__all__ = ['SyncBackend'] | ||
|
||
if six.PY3: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this doesn't seem correct - we should be targeting 3.5+, not 3+ (although maybe twisted is a special case)
urllib3/_backends/_loader.py
Outdated
@@ -0,0 +1,67 @@ | |||
import sys | |||
|
|||
from ..backends import Backend as UserSpecifiedBackend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to call out here that this is a user interface and should be handled appropriately. It's also not a "Backend" - it's a recipe for building one, but our users shouldn't need to think about that.
urllib3/_backends/_loader.py
Outdated
from ..backends import Backend as UserSpecifiedBackend | ||
|
||
|
||
def check_for_python_3_5(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. This should be taken care of by the higher level backend checking logic, plus the fact that you can't get into the async code at all unless you're on a new enough python.
@@ -15,8 +15,8 @@ | |||
|
|||
|
|||
class TwistedBackend: | |||
def __init__(self, reactor): | |||
self._reactor = reactor | |||
def __init__(self, reactor=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change attempts to ensure that all of our backends can be initialized without any arguments, which would be nice.
@@ -0,0 +1,9 @@ | |||
class Backend: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a user (or downstream repository) interface.
@njsmith Thanks for the explaination around variable hoisting! Very useful to know indeed. Now that we're out of that rats nest, I've addressed @pquentin's comments and made moves towards adding the interface that @njsmith and @kennethreitz iterated on. This is looking a bit more merge-able now IMO. I would appreciate some more feedback, especially around the comments I added. I'm happy to close this and open another PR if the above context is too distracting. |
test/test_backends.py
Outdated
) | ||
def test_async(self): | ||
load_backend("trio") | ||
load_backend("twisted") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests just assert that the interface doesn't go down in flames - should we be asserting anything about the returned socket wrappers?
Ideally their interface would be defined somewhere and we could play with them a little bit, but that may be better suited to another test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it'd be good to have some shared tests for basic backend functionality, but I think making it a separate PR would be better. (It would also help a lot to add bleached tests, so we could test both the sync and async backends with the same test code.)
I'll try to take a closer look later, but quick note: currently our async mode requires 3.6+, because it uses an async generator. See the check in |
Ideally we should consolidate this check somehow. Maybe the tests should check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll also want to update the demo/
directory.
Still some work to do here, but it's coming along nicely!
urllib3/backends.py
Outdated
@@ -0,0 +1,9 @@ | |||
class Backend: | |||
""" | |||
Specifies the desired backend and any argumnets passed to it's constructor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typos: "arguments", "its" (this isn't short for "it is", so no apostrophe)
test/test_backends.py
Outdated
) | ||
def test_async(self): | ||
load_backend("trio") | ||
load_backend("twisted") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it'd be good to have some shared tests for basic backend functionality, but I think making it a separate PR would be better. (It would also help a lot to add bleached tests, so we could test both the sync and async backends with the same test code.)
test/test_backends.py
Outdated
""" | ||
def test_dummy(self): | ||
with pytest.raises(ImportError): | ||
load_backend("dummy") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would drop the load_dummy
stuff from the actual backend code – it's going to create funny coverage issues, and it makes error messages awkward. And I don't think we need to worry too much about testing that if a package is missing then we get an import error. (Or if we do want to test that, we could reuse whatever mechanism we come up with to test the missing ssl
case.)
So I'd replace this test with one that checks what happens if someone tries to use some unknown backend.
test/test_backends.py
Outdated
@pytest.mark.skipif( | ||
sys.version_info < (3, 5), | ||
reason="async backends require Python 3.5 or greater", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to replace this check with not hasattr(urllib3, "AsyncPoolManager")
@@ -311,8 +311,7 @@ def __init__(self, host, port, backend=None, | |||
source_address=None, tunnel_host=None, tunnel_port=None, | |||
tunnel_headers=None): | |||
self.is_verified = False | |||
|
|||
self._backend = backend or SyncBackend() | |||
self._backend = backend or load_backend("sync") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, also: what we actually want the logic to be here (I think) is:
-
If this is the sync version of the code, then the backend must be
None
or"sync"
orBackend("sync")
. Anything else is an error. -
If this is the async version of the code, then the backend must be specified, and must be not the sync backend.
And... we kind of want to do this check inside all of the classes that take backend
arguments.
So I suggest adding a normalize_backend
function that takes a user backend
argument, and returns a valid Backend
or else an error.
The trickiest bit is detecting whether we are in sync mode or async mode. Eventually we should add a nice way to do this to the code generation script, but for now one can do:
async def f():
return None
obj = f()
if obj is None:
SYNC_MODE = True
else:
# prevent unawaited coroutine warning
obj.close()
SYNC_MODE = False
urllib3/_backends/_loader.py
Outdated
from ..backends import Backend as UserSpecifiedBackend | ||
|
||
|
||
def check_for_python_3_5(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. This should be taken care of by the higher level backend checking logic, plus the fact that you can't get into the async code at all unless you're on a new enough python.
b88aa63
to
26477ec
Compare
26477ec
to
ebbbf86
Compare
@njsmith comments addressed. This is getting a little complex - I've tried to make the error handling as simple as possible for now, we can decide later how best to help users to recover from these errors. |
@njsmith ping |
A few things are missing:
|
Hello! I opened #47 which builds on this PR and addresses the comments. |
This is
a draftan attempt at resolving #25This ensures that we never directly import trio / twisted until we need them. We do this by adding some scary logic to
_backends
.Looking for feedback on the import mechanism.