Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Restrict URL preparation to HTTP/HTTPS" #3738

Merged
merged 2 commits into from
Dec 1, 2016

Conversation

Lukasa
Copy link
Member

@Lukasa Lukasa commented Nov 30, 2016

This reverts commit 34af72c.

This commit was added by @tiran to try to avoid the IDNA-encoding logic that we added in v2.12.0. I believe that the other workarounds we merged for that are sufficient, and this change broke docker-py and probably broke others.

@tiran, can you confirm that your code continues to function with this change reverted? I'd like to be able to merge this and ship a v2.12.3 if at all possible.

Resolves #3735.

@nateprewitt
Copy link
Member

@Lukasa, I think it would be beneficial readding the test after this reversion. It still provides a useful check going forward and will pass regardless of the tightened scheme check.

@Lukasa
Copy link
Member Author

Lukasa commented Nov 30, 2016

Heh, I just noticed that that test never actually tested the relevant function. WHAT IDIOT REVIEWED THIS CODE THE LAST TIME? 😉

I'll write some new tests that actually codify our behaviour here so that we avoid regressing.

@gvilarino
Copy link

I lost 6 hours of my day troubleshooting this bug 😨 🔫

@ghost
Copy link

ghost commented Dec 1, 2016

@Lukasa sorry for being a hound but do you have an eta on this 2.12.3 release? We just lost a day of dev / test trying to figure out what broke docker-py until we found this. Thx

  • MS

@marcosnils
Copy link

marcosnils commented Dec 1, 2016

@mshahpalerra just pin your requests dep to an older 2.11.x version and you'll be ok. @Lukasa please release a new version of requests ASAP or this will be another left-pad.

@ghost
Copy link

ghost commented Dec 1, 2016

@marcosnils - already being done but that has some other complications as we are changing this pinning on code that is pre prod so has to be retested (not your problem i know :) ) - thanks for the tip

@Lukasa
Copy link
Member Author

Lukasa commented Dec 1, 2016

@marcosnils Well that's a ludicrously hyperbolic thing to say.

The left-pad incident broke all users of the package. This breaks docker-py, and docker-py only. The vast majority of our users have not noticed a problem with v2.12.2. There are several bug reports both here and on docker-py, there is a good temporary workaround, so if it's all the same to you I'm going to take long enough over this to get it right.

It is much more important to me that we don't keep rushing out v2.12.x releases that require further panicked fixes than it is that we make it possible for you to avoid placing pins in your dependency tree. In this case, the patch that we're reverting was added for a reason, and I'd like to ensure that that reason is still covered rather than re-break someone else.

@Lukasa
Copy link
Member Author

Lukasa commented Dec 1, 2016

@mshahpalerra As to the ETA of v2.12.3, it will be as soon as I can get confirmation from @tiran that the problem he was originally trying to fix is still fixed in this branch. If it isn't, then it will be as soon as I can work with him to build a new patch that avoids this specific issue.

@tiran
Copy link
Contributor

tiran commented Dec 1, 2016

ACK

Custodia's test suite for my http+unix adapter is passing with @Lukasa 's branch.

@Lukasa
Copy link
Member Author

Lukasa commented Dec 1, 2016

Ok, let's ship a 2.12.3 then. Thanks @tiran!

@lslebodn
Copy link

Custodia's test suite for my http+unix adapter is passing with @Lukasa 's branch.

I it really amazing that http+unix t will work with python-requests-3.0. Unfortunately, python-requests-3.0 has not been released yet. But 2.12.3 is released and already in some distributions: fedora-26 and debian unstable and might get to debian testing in few days.

2.12.0 introduced IDNA-encoding logic which broke http+unix:// 2.12.3 with this patch reverted the fix due to #3735 but did not provide any alternative solution. And therefore http+unix:// is broken again.

Is there a reason why cannot be fixed in 2.12.* as well?

@nateprewitt
Copy link
Member

@Islebodn could you provide an example http+unix:// URI/filepath that you're having issues with in 2.12.3? This should be fixed even without #3713 but perhaps we missed something.

@Lukasa
Copy link
Member Author

Lukasa commented Dec 22, 2016

Correct, this should be fixed in 2.12.4.

@lslebodn
Copy link

@Islebodn could you provide an example http+unix:// URI/filepath that you're having issues with in 2.12.3? This should be fixed even without #3713 but perhaps we missed something.
Correct, this should be fixed in 2.12.4.
Thank you very much for fast response. i really appreciate it.

I did some tests with more version of pyhton-requests and I found out that 2.12.3 and 2.14.1 does not work well with upper case letters in path. The url is

>>> from requests.compat import quote
>>> sock_path ='/tmp/UPPER/testing.socket'
>>> url = 'http+unix://' + quote(sock_path, safe='')
>>> print url
http+unix://%2Ftmp%2FUPPER%2Ftesting.socket

>>> import requests
>>> r = requests.Request('GET', url=url)
>>> p = r.prepare()
>>> p.url
'http+unix://%2ftmp%2fupper%2ftesting.socket/'

I've just tested and upper case letters works well in such situation with 2.10.0 and 2.11.1

@nateprewitt
Copy link
Member

Hmm, so you're right. This is an issue in that this path will pass through IDNA without throwing an exception, but IDNA is forcing it to lowercase. This is valid behaviour for a hostname which should be case insensitive but doesn't work for paths (which really shouldn't be IDNA encoded in the first place). This is a bug, but the way unix sockets have worked in the past is kind of incorrect too. It supplies a path in place of the hostname we're expecting.

I don't have an immediate solution for this other than suggesting you store the socket file in a lower-cased path. We'll have to wait for @Lukasa's opinion on this and I'll take a deeper dive if it's deemed needed.

@lslebodn
Copy link

Hmm, so you're right. This is an issue in that this path will pass through IDNA without throwing an exception, but IDNA is forcing it to lowercase.
This is valid behaviour for a hostname which should be case insensitive

Agree; that's the reason why reverted patch tried to use IDNA only with http:// and https://

but doesn't work for paths (which really shouldn't be IDNA encoded in the first place). This is a bug, but the way unix sockets have worked in the past is kind of incorrect too. It supplies a path in place of the hostname we're expecting.

I am not sure whether it's incorrect. it is an URI with different protocol then http. The similar use case is with different protocols: ldap, ldaps which expect hostname/IP and ldapi which expect url encoded path to unix socket e.g. http://www.openldap.org/lists/openldap-devel/200201/msg00231.html
http://www.openldap.org/doc/admin24/runningslapd.html

@nateprewitt
Copy link
Member

So, I did some more investigating and this actually is completely unrelated to IDNA.

urllib3/urllib3#911 introduced a forced lowercasing of all host names in urllib3 which was released in urllib3 1.17. The version of urllib3 used in Requests prior to 2.12.0 was 1.16 which is why we hadn't hit this before. urllib3's behaviour here is completely correct because host names should be able to be lowercased without affecting the URIs usability.

As I said above, passing a path has worked in the past but is technically not in line with how we're parsing the URI. Again, I think we'll have to wait for @Lukasa to weigh in here.

@sigmavirus24
Copy link
Contributor

urllib3's behaviour here is completely correct because host names should be able to be lowercased without affecting the URIs usability.

Right.

As I said above, passing a path has worked in the past but is technically not in line with how we're parsing the URI. Again, I think we'll have to wait for @Lukasa to weigh in here.

Since there is no specification around UNIX socket URIs, I would think we'd treat that as a path, not a hostname. shrug

@nateprewitt
Copy link
Member

nateprewitt commented Dec 22, 2016

Since there is no specification around UNIX socket URIs, I would think we'd treat that as a path, not a hostname. shrug

Agreed, that seems like the sensible thing to do if Requests is going to properly support unix sockets. We'd need to expand both how urllib3 parses URIs, and our flow in prepare_url, rather than relying on our current workarounds. That approach would fix a lot of the IDNA issues at their core too, because our primary conflict has been passing non-hostnames (IPs and paths) into the encoder. Maybe something to consider in 3.0.0.

@Lukasa
Copy link
Member Author

Lukasa commented Dec 22, 2016

Hrm. This is not ideal. However, it's further evidence that Requests is really ill suited to handling these use-cases. It makes an enormous number of assumptions about URL structure that are simply not reasonable.

I have no interest in expanding urllib3's notion of a URL to encompass this use-case. I would, however, be willing to add an extension to allow its URL case normalisation to apply only to specific schemes.

@nateprewitt
Copy link
Member

nateprewitt commented Dec 22, 2016

So I have an issue ready to throw up on urllib3 to catalog this appropriately, but have another question before we start down the scheme specific normalization route.

Storing something in the form /tmp/path/out.socket as the host and not path of the URI seems off. The authority component isn't required in the URI, so wouldn't it be more correct to detect a URI without a host and store the value in the path variable? I realize that we're adhering to the URL subset of URIs, but can't find anything specific on whether the authority requirement differs. Moving to using path would allow us to avoid this dance around values in the host param that aren't actually hosts.

@Lukasa
Copy link
Member Author

Lukasa commented Dec 23, 2016

@nateprewitt I don't think we do store /tmp/path/out.socket as the host. I think we only store it if its urlencoded when we receive it, which seems more reasonable to me. Though admittedly I'm not the URI expert (that'd be @sigmavirus24).

@nateprewitt
Copy link
Member

@Lukasa, you're right, I was conflating behaviours in my explanation. %-encoding the path is a hack around prepare_url because Requests refuses URLs without a host at the model level, rather than adapter. The host is mandated for http/https schemes but becomes a grey area with the use of "http+".

If we want to support "http+{unix,docker,???}", which seems poorly defined but in use, we'd probably need to relax the hard stop at no host until we get to the adapter. This would allow someone to pass /tmp/path/out.socket and we can treat it correctly as a path, without needing a work around. Then we can avoid modifying correctly functioning code to accommodate the hack. If we don't intend to support "http+" in 3.0.0 then this is probably a moot point.

@tiran
Copy link
Contributor

tiran commented Dec 23, 2016 via email

@nateprewitt
Copy link
Member

@tiran thanks, for clarifying! I'll defer to those of you that are much more well versed in this space than I am :)

I'll open the issue in urllib3 with Lukasa's originally suggested fix.

@tiran
Copy link
Contributor

tiran commented Dec 23, 2016 via email

@lslebodn
Copy link

lslebodn commented Jan 2, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants