params no longer work for non-http URLs #1879

Closed
ibuildthecloud opened this issue Jan 24, 2014 · 26 comments

@ibuildthecloud

Commit b149be5 specifically removed all processing of non-HTTP(s) URLs. Previous to this commit if you were to pass in params, it would be appended to the URL. Since this commit it no longer works. Specifically I ran into this issue when running docker-py against a unix socket.

I understand that for non standard URLs you don't want to parse them. It seems that if the params were passed, then you should still append them. I'll gladly put in a pull request for this if it seems like a valid issue. A simple fix would be the below

        # Don't do any URL preparation for oddball schemes,
        # but still honour explicitly passed params.
        if ':' in url and not url.lower().startswith('http'):
            encoded_params = self._encode_params(params)
            if encoded_params:
                sep = '&' if '?' in url else '?'
                url = '%s%s%s' % (url, sep, encoded_params)
            self.url = url
            return
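For illustration, with a fix along those lines the params would survive preparation even for an oddball scheme. A minimal sketch of the intended behaviour (hypothetical until such a patch lands), using the docker-py URL discussed below:

    from requests import Request

    # Hypothetical behaviour with the fix above applied: params are
    # appended rather than silently dropped for the unix:// scheme.
    p = Request('GET', 'unix://var/run/docker.sock/v1.6/containers/json',
                params={'all': 1, 'limit': -1}).prepare()
    # p.url == 'unix://var/run/docker.sock/v1.6/containers/json?all=1&limit=-1'
    # (key order may vary)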
@Lukasa
Member

Lukasa commented Jan 24, 2014

Hmmm.

I'm very much on the fence here. On the one hand, I can see your argument, and it makes plenty of sense. There's definitely utility in being able to pass 'params' to requests and have it do the right thing.

Unfortunately, it violates the distinction we drew when we accepted that change. The nature of that case was to say "if this URL isn't for HTTP, we have no idea what it means and we shouldn't touch it". That's because the URL syntax is not very heavily specified. In particular, there's no well-defined syntax for how the query string should look. It is generally written as a set of kvps, like in HTTP, but RFC 3986 does not go into detail about how these should look, instead leaving them to be scheme specific.

With that in mind, I think I'm -0.5 on this. We just can't be sure that we're doing the right thing with parameters.

@sigmavirus24
Contributor

@ibuildthecloud can you provide an example of the URL you were requesting with docker-py?

@ibuildthecloud
Author

Totally understand the perspective, but since the url and params are both specified by the user, it seems like acceptable behaviour. Basically requests is just saying, "I have no clue what this funky URL is, but you told me to add params, so I'll do the documented behaviour of adding ?k=v". If your funky non-standard URL is not compatible with params, then the user shouldn't pass them in.

The problem I specifically have is that between versions 2.0.1 and 2.1.0 this behaviour changed. So either requests should respect the old behaviour, or consumers of requests need to change. Specifically, I need to get docker-py to change, but that change doesn't seem too pretty IMO. docker-py can run against HTTP or a unix socket, so the code is relatively agnostic to which it is using. Asking them to change this means that every time they use params they must do "if http: params={} else: urlencode(...)".


@ibuildthecloud
Author

The URL looks like 'unix://var/run/docker.sock/v1.6/containers/json' and the code is passing in params so that the url becomes 'unix://var/run/docker.sock/v1.6/containers/json?all=1&limit=-1&trunc_cmd=0'

@Lukasa
Member

Lukasa commented Jan 25, 2014

I absolutely see that problem, but I feel very uncomfortable with the behaviour of assuming a certain logic for params.

What are docker-py doing to route over the unix domain socket? At the very least they'll have to be mounting a Transport adapter.

@sigmavirus24
Contributor

What are docker-py doing to route over the unix domain socket? At the very least they'll have to be mounting a Transport adapter.

I was thinking the same thing, but the Transport adapter doesn't handle params. That can only be handled by a prepared request object. I'm not sure they could pass on the params they need to tack on with the adapter in this case. Unless you had another idea how to go about that, @Lukasa, I think that may be a dead end. I agree, though, that the parameter handling should only be for HTTP(S) URLs, since those are the only ones we are really positioned or likely to support.

docker-py could use URI templates to bypass having to use the params parameter to requests, but that would likely introduce a new dependency which they might not want.

@Lukasa
Member

Lukasa commented Jan 25, 2014

What I was asking, actually, was whether they had a TransportAdapter subclass. If they do, it's not the end of the world to subclass/monkeypatch the PreparedRequest object.

@ibuildthecloud
Author

This is getting a bit past my knowledge of the internals of requests, but they seem to have extended requests.adapters.HTTPAdapter and created a UNIX socket one. When you create a docker Client object, that extends requests.Session and calls mount() to register the UNIXAdapter. You can look at the specific details at https://github.com/dotcloud/docker-py/blob/master/docker/unixconn/unixconn.py

So maybe this is a good thing; I wasn't aware that docker-py was doing specific logic to make unix sockets work. Perhaps there's a different way to go about this, then, that docker-py can do. I'll have to look further into docker-py or pull in one of the maintainers. Let me dig a bit into the internals and see what they are doing, because obviously they are ignoring the appended query string when opening the actual socket, but I think the current issue is that the "GET /url" line also doesn't have the query string. So I'm assuming there needs to be some way to get the params passed to requests into the GET line.

@Lukasa
Member

Lukasa commented Jan 26, 2014

So, the issue is that as @sigmavirus24 mentioned, Requests handles parsing the URL about two layers higher than the Transport Adapter (in the PreparedRequest object). Certainly docker-py can work around this by playing about with that object (either monkeypatching it or using the explicit PreparedRequest flow), but the real meat of this issue is whether they should have to.
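For reference, a minimal sketch of that explicit PreparedRequest flow, assuming a session that already has docker-py's unix socket adapter mounted (the manual query-string append is the workaround, since prepare_url() leaves non-HTTP(S) URLs untouched):

    import requests

    req = requests.Request('GET', 'unix://var/run/docker.sock/v1.6/containers/json')
    prepped = req.prepare()
    # requests won't touch the oddball URL, so tack the query string on by hand
    prepped.url += '?all=1&limit=-1'

    s = requests.Session()  # assumed to have docker-py's UnixAdapter mounted
    resp = s.send(prepped)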

I remain as I was before, at -0.5. It feels like we'd be doing the wrong thing. I am, however, open to being convinced on this issue.

@sigmavirus24
Contributor

I took a look to see how simple that would be for them to implement, but their code seems quite full of misdirection. That said, the way they're currently doing everything, I find it hard to believe they'll be up for using the explicit flow @Lukasa linked to.

@Lukasa
Member

Lukasa commented Jan 26, 2014

So, million-dollar question: is it our job to un-break docker-py, or should we apologise for not spotting that this could happen but assert that we shouldn't change our current behaviour?

My worry is that unix: as the URI scheme actually doesn't seem to specify anything about how to encode the URI: at least, I can't find a doc. This isn't unexpected, since there's no requirement that unix domain sockets use HTTP as the application-layer protocol, they can do whatever the hell they want. This is a rare situation where the application-layer isn't specified in the name. For instance, LDAP appears to define an ldapi: scheme that is for use over unix domain sockets, and has defined semantics. Sadly, HTTP does not.

With that in mind, we can do a number of things.

  1. Assume that whenever the user points Requests to a unix domain socket they want to run HTTP over it, and behave accordingly. That's not totally unreasonable, but it runs the risk of breaking third-party libraries for other application layer protocols that may also want to use bare unix sockets through Requests. This is the reason we stopped parsing the URLs in the first place (see loosen URL handling for non-native URL schemes #1717).
  2. Allow users the possibility of shooting themselves in the foot by saying that we will always prepare params as though they're HTTP-like, even when we don't know what the hell you're doing (i.e. non https? URI).
  3. Don't change our current behaviour: apologise, but say that guessing is bad.

On balance, my order of preference is 3 -> 1 -> 2. I think 2 is dangerous, violates the Zen of Python and undoes the genuinely good change that was in #1717. I think that 1 is OK: it's not unreasonable to do that, but I'm convinced we'll break another third-party library. That might not be the end of the world, but it's not ideal either.

3 continues to be the argument that convinces me most. We made a decision that we should do a bit less guessing about what the user wants from us in situations we don't understand. I think that was a good decision, and I think we should stick with it. I can certainly be convinced towards 1, but right now I'm inclined towards doing 3.

@sigmavirus24
Contributor

I'm in favor of 3. I just looked at docker-py to see if I could give @ibuildthecloud a hand in preparing a PR. That said, 1 is plenty dangerous in my opinion. Just because someone is using requests to talk to something does not mean we can assume it is not translating a PreparedRequest into some other format. I'm not convinced that it is reasonable (or not totally unreasonable) to assume that they're using HTTP/1.1.

@ibuildthecloud
Author

How about a fourth option. I totally agree that requests should treat non-HTTP URLs as opaque strings, and allowing somebody to shoot themselves in the foot by appending ?k=v might not be desirable either. My hang-up, though, is that if you use a non-HTTP URL, the params argument to requests becomes useless and ignored. It's nice that you can plug in a TransportAdapter for a non-standard transport, but the side effect now is that if your TransportAdapter works off of non-HTTP URLs, the params argument can't be used anymore. Instead of effectively dropping the params argument, what if it was saved and passed on to the TransportAdapter to do with as it wants? So the change I'm thinking of is basically to add the below

    def prepare_url(self, url, params):
        """Prepares the given HTTP URL."""

        enc_params = self._encode_params(params)

        # Save params for non-HTTP URL based TransportAdapters
        self.params = params
        self.encoded_params = enc_params

So basically just save both the url and the params in the PreparedRequest. Now, I already tried to modify docker-py to use this approach but then realized the logic to encode the params is in an internal method of PreparedRequest. So in order to not duplicate the logic (and not call an internal method), I thought prepare_url could save the encoded params. Or move PreparedRequest._encode_params to be some non-internal util method.
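For simple dicts, the stdlib urlencode (re-exported as requests.compat.urlencode) does roughly what _encode_params does, so it may be a less fragile stopgap than calling the internal method (output order illustrative):

>>> from requests.compat import urlencode
>>> urlencode({'all': 1, 'limit': -1})
'all=1&limit=-1'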

@sigmavirus24
Contributor

@ibuildthecloud the short answer is no.

The long answer is this: Transport Adapters only handle the transmission of requests and responses. Prepared Requests are the representation of a request ready to be sent. Sessions encapsulate the logic (and only the logic) of a user-agent session (lightly speaking) including handling of cookies, and determining which transport adapter to use to perform an interaction.

The pattern is this: each object only knows what it has to know about. Parameters are related to requests only; transport adapters should not care what parameters are sent. They should concern themselves only with processing a request and generating a response.

Perhaps the best solution for docker-py is to monkeypatch the PreparedRequest object's prepare_url method to not check the scheme of the URL. With that in mind, you will be duplicating some code unfortunately.
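A rough sketch of that monkeypatch, with the same caveat about reaching into the private _encode_params helper (the wrapper name here is made up):

    from requests.models import PreparedRequest

    _orig_prepare_url = PreparedRequest.prepare_url

    def _prepare_url_with_params(self, url, params):
        # Stock logic first: for non-HTTP(S) schemes it stores the URL
        # untouched and encodes nothing.
        _orig_prepare_url(self, url, params)
        if ':' in url and not url.lower().startswith('http') and params:
            enc = self._encode_params(params)  # private helper, may change
            sep = '&' if '?' in self.url else '?'
            self.url = '%s%s%s' % (self.url, sep, enc)

    PreparedRequest.prepare_url = _prepare_url_with_params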

@ibuildthecloud
Author

Hmmm...

Whose responsibility is it to generate and write the "GET /x?k=b" line in the HTTP request? Isn't it the transport that does that?


@Lukasa
Member

Lukasa commented Jan 26, 2014

Yes, but that job is trivial. It's not the transport's job to work out what the URL should be, just to put it in between the word GET and the word HTTP/1.1. It's the job of higher layers to work out what the URL should be.

@ibuildthecloud
Author

But for non-HTTP URLs you are already saying that requests isn't responsible for understanding the URL, so who is then? Can't there be a hook for this? Again, it doesn't seem right that params is just ignored and dropped. That means the caller to requests is not fully abstracted away from the TransportAdapter because in a non-HTTP URL based approach it changes the nature of the params argument.

@ibuildthecloud
Author

Additionally, it's not trivial to generate "GET /x?k=b". In order to do that you must parse the URL, so it does require specific knowledge of the URL format to create the GET line.

@sigmavirus24
Contributor

Right. The transport adapter expects to only be talking to HTTP services so it is fair to make assumptions about the URL being passed to it. One of those assumptions is that the URL can be parsed to find the path. No such assumptions can be made by the HTTP adapter about the unix protocol. You would also note that we are not generating that line in requests or in the adapter but instead that takes place in urllib3.
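For the HTTP case, the piece requests does expose here is PreparedRequest.path_url, which is what ultimately lands between GET and HTTP/1.1 when urllib3 writes the request line:

>>> import requests
>>> requests.Request('GET', 'http://example.com/x', params={'k': 'b'}).prepare().path_url
'/x?k=b'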

@Lukasa
Member

Lukasa commented Jan 26, 2014

You've misunderstood my argument, I think. =)

Requests isn't responsible for understanding URLs on non-HTTP schemes because Requests is an HTTP library. The HTTPAdapter is not the only part of Requests that understands HTTP: the whole Requests stack does. However, people want the freedom to use the Requests API on non-HTTP schemes, which Requests allows (I've written an adapter for FTP, for example).

Allow me to break down Requests' design. There are three layers: Request/PreparedRequest/Response, Session, TransportAdapter. Each of these knows about HTTP in a different way.

The first thinks in terms of individual HTTP messages. It builds up the structure for these messages, but in a way that can be mutated. Headers are a dictionary, URLs are components, etc. The PreparedRequest is an intermediate step that does some parsing of data as HTTP, but not much. These layers understand HTTP messages.

The Session layer thinks in terms of HTTP as state: in essence, it acts a bit like a user agent (though it isn't). It knows how to maintain cookies and when to apply them, and it provides room for persistent attributes. It understands the persistence layer of HTTP.

The TransportAdapter layer thinks about HTTP in terms of connections and connection pools. It knows how HTTP messages are structured on the wire, and how to handle chunked data (which is in effect an unusual way of representing HTTP bodies on the wire). This layer also handles TLS and connection maintenance. This layer understands how to serialize HTTP messages into their three basic components: request line, header block, body.

In this model, building URLs from their component structures is a job for the top layer. It knows how HTTP URLs work and provides convenience attributes for working with them. The Session object doesn't touch them at all, and all the TransportAdapter does is work out whether or not it needs to include the full URL in the request URI or whether it can omit everything before the path. Doing that is trivial: look for the first '/' character and split on that. This is not the same as parsing a URL.

For non HTTP URLs the user of Requests is responsible for understanding how to build the URL, because it's the only party involved that knows how.

@ibuildthecloud
Author

Okay, let me digest this a bit. Let me point out that what docker-py is doing is HTTP. The only difference is that instead of sending the request over a TCP socket, it is sending it over a UNIX socket. So we're not talking about a non-HTTP service. The problem, though, is that the URL is overloaded in HTTP. It is used both as a means of describing the transport (IP host and TCP port, or SSL) and as the HTTP path to be put in the request line. So for docker-py a unix scheme URL is used because we are not looking for an IP host and TCP port, but instead the file system path to the unix socket.

Would it make sense to have docker-py use the url format http+unix://socketpath/path?k=v to indicate that this is an http URL but has a custom transport? (I have no clue if the http+unix scheme will work with urllib and things, but I've seen that format used in other things.)

@sigmavirus24
Contributor

That scheme may work, but the Transport Adapter would likely need to strip off the http+.

Also you're in luck because it seems urlparse will handle it well:

>>> import requests
>>> requests.compat
<module 'requests.compat' from '/usr/lib64/python2.7/site-packages/requests/compat.pyc'>
>>> requests.compat.urlparse
<function urlparse at 0x7f3d60b9ec80>
>>> requests.compat.urlparse('http+unix://host/path')
ParseResult(scheme='http+unix', netloc='host', path='/path', params='', query='', fragment='')
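Put together, the wiring might look something like this sketch (UnixAdapter is docker-py's class; its constructor arguments here are assumptions, not verified):

    import requests
    from docker.unixconn.unixconn import UnixAdapter

    s = requests.Session()
    # Constructor signature assumed for illustration.
    s.mount('http+unix://', UnixAdapter('http+unix://var/run/docker.sock'))

    # 'http+unix' starts with 'http', so prepare_url() runs the normal
    # preparation path and params survive as a query string.
    r = s.get('http+unix://var/run/docker.sock/v1.6/containers/json',
              params={'all': 1})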

@ibuildthecloud
Author

The UnixAdapter in docker-py already messes with the url to determine the correct path_url and socket path. I actually quickly modified docker-py and http+unix:// seemed to work. I'll go down the path of putting in a PR for docker-py to use http+unix. So hopefully that will be accepted and I'll stop bothering you.

I still think it's a gap that for a non-http-scheme URL you just toss params in the trash. There should be a nice hook to provide custom logic to deal with params. But I'm pretty satisfied that http+unix seems to work.

@sigmavirus24
Contributor

There should be a nice hook to provide custom logic to deal with params.

I think it's fairly obvious that @Lukasa and I disagree with this sentiment. I'm not sure we can say it enough times, but requests is an HTTP library. While docker-py is using it to perform HTTP over a unix socket, not every client using it on a Unix socket will necessarily be doing HTTP. Beyond some extraordinary uses, there is no reason for us to enable a hook or any other simpler means when there are already documented ways around this.

Frankly, it is far from impossible to accomplish but just because we make it possible does not mean we should encourage it by making it simple.

I'm glad you've found one way around it and are working on a pull request to docker-py.

Cheers!

@ibuildthecloud
Author

Thanks for your help and quick replies. I should have clarified from the beginning that I was talking about HTTP over a custom transport. It never even occurred to me that requests could be used for non-HTTP.


@sigmavirus24
Contributor

@ibuildthecloud we love to help when possible.

It never even occurred to me that requests could be used for non-HTTP.

As @Lukasa mentioned, he wrote an FTP adapter. Users can implement whatever backend they want with Transport Adapters. They do not have to use urllib3. They can also process a Prepared Request however they like. We cannot facilitate all of their needs, even when they're performing HTTP over a different transport.
