Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

what if my password has a reserved delimiter in it? #11

Closed
glyph opened this issue Jun 19, 2017 · 13 comments
Closed

what if my password has a reserved delimiter in it? #11

glyph opened this issue Jun 19, 2017 · 13 comments

Comments

@glyph
Copy link
Collaborator

glyph commented Jun 19, 2017

Right now if I do this:

>>> from hyperlink import URL
>>> example = URL.fromText("https://example.com/")
>>> example.replace(userinfo="alpha:my#password")

I get this:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
    example.replace(userinfo="alpha:my#password")
  File "/Users/glyph/.virtualenvs/tmp-a0b3197f7a1da77/lib/python3.6/site-package
s/hyperlink/_url.py", line 861, in replace
    userinfo=_optional(userinfo, self.userinfo),
  File "/Users/glyph/.virtualenvs/tmp-a0b3197f7a1da77/lib/python3.6/site-package
s/hyperlink/_url.py", line 615, in __init__
    self._userinfo = _textcheck("userinfo", userinfo, '/?#@')
  File "/Users/glyph/.virtualenvs/tmp-a0b3197f7a1da77/lib/python3.6/site-package
s/hyperlink/_url.py", line 410, in _textcheck
    % (''.join(delims), name, value))
ValueError: one or more reserved delimiters /?#@ present in userinfo: 'alpha:my#
password'

The API for setting a secret is sort of silly already (manually sticking a colon into the userinfo string), so "fixing" this might not involve any behavior change, but rather add a new argument (.replace(secret=...?)) but there should be some way to take a string that a user typed into a password box and embed it into the URL somehow without forcing the caller to do any wacky percent encoding of their own.

@mahmoud
Copy link
Member

mahmoud commented Jun 19, 2017

Agreed, userinfo is a source of awkwardness. Adding an argument is a possibility, but also we could do a helper function, format_userinfo() or something, as using passwords in the userinfo field is quite strongly deprecated.

@glyph
Copy link
Collaborator Author

glyph commented Jun 25, 2017

Self-authenticating URLs are a valuable construct. There are good reasons to deprecate authentication-bearing URLs in a variety of contexts (like: don't let users click on them, maybe), but this deprecation has no teeth, specifically because there's no other way to address this use case.

My personal use-case for userinfo has to do with embedding database passwords into postgresql://-scheme URLs to feed to sqlalchemy, not clicking on in a web browser :).

@mahmoud
Copy link
Member

mahmoud commented Jul 3, 2017

Having just implemented #23, I'm increasingly leaning toward a helper function that takes a couple non-percent-encoded arguments and gives back a percent-encoded userinfo string suitable to passing to URL methods. I do slightly wonder whether the arguments should be bytestrings to enable binary passwords, etc., to which I think the answer is yes.

@glyph
Copy link
Collaborator Author

glyph commented Jul 3, 2017

I do slightly wonder whether the arguments should be bytestrings to enable binary passwords, etc., to which I think the answer is yes.

I have mixed feelings.

On the "text" side: passwords (and usernames) are by their nature necessarily text; a user enters them with a keyboard, needs to read them with their eyes etc.

On the "bytes" side: other systems might have arbitrarily weird or inconsistent rules about normalization, and that might need to be reflected as bytes.

@glyph
Copy link
Collaborator Author

glyph commented Jul 3, 2017

I'm increasingly leaning toward a helper function that takes a couple non-percent-encoded arguments and gives back a percent-encoded userinfo string suitable to passing to URL methods.

I still disagree with this pretty strongly, because you have this problem over and over again - in the userinfo, in the path, in the query, in the fragment - and in every place you need slightly different versions of the helper function.

I also disagree because it's not obvious to users of the URL object that usernames can't contain colons, and making the user do escaping themselves is abdicating responsibility for the whole point of the library.

So I think we need to:

  1. separate "username" and "password" and have restrictions on both of them, particularly that "username" can't contain ":".
  2. add some kind of consistent idiom for addressing "fully decoded" elements of a URL, including username, password, path, query, and fragment; maybe "host" should be in there as well, even though hostnames don't have any escape equivalent (you can't escape a colon or a slash in a hostname, they just can't contain them).

@glyph
Copy link
Collaborator Author

glyph commented Aug 9, 2017

I just ran into this again in the query string.

Currently blocked upgrading Twisted on one project because of Hyperlink's change to keep = encoded as %3D; the mini query-language which said project has to parse uses = as part of its own syntax, and .asIRI() no longer gives back query parameters with = in them.

We definitely need some kind of fully-decoded view of every element of the URL structure.

@mahmoud
Copy link
Member

mahmoud commented Aug 9, 2017

"Fully-decoded" is kind of a loaded term here. = is a reserved character. What scheme is the URL? Can you provide some examples?

@glyph
Copy link
Collaborator Author

glyph commented Aug 9, 2017

"Fully-decoded" is kind of a loaded term here.

Yes, I'll try to be more precise.

= is a reserved character.

Yes… which is why it has to be escaped. But I want the un-escaped version!

What scheme is the URL?

As you know, there is only one scheme for URLs, really: https ;)

Can you provide some examples?

Only one example is necessary, really;

>>> URL.fromText('https://example.com/?argument=3&argument=4&operator=%3D').asIRI().get('operator')
['%3D']

Clearly the value the application wants here is "=", and has correctly percent-encoded it to be %3D.

To be clear: I think that the current behavior is subtle, but correct: putting a "=" at that point in the URL would de-normalize it at best and corrupt it at worst. And, for maximum information-preservation, .query (et. al.) are defined as the literal text in the URL at the current point in its transformation.

However, I do still want this - application puts data into the URL, application gets data out of the URL - to be as convenient and idiomatic as possible. Something like this:

>>> URL.fromText('https://example.com/?argument=3&argument=4&operator=%3D').asIRI().decoded.get('operator')
['=']

the decoded attribute could then have any readable values (.query, .path, .username, .password (addressing the original issue on this ticket), etc.), but would crucially not have methods like click, child, .add, asIRI for creating modified versions of the URL. (Also I think it probably shouldn't imply a call to .asIRI, instead treating any as-yet-not-encoded %-encodings as latin-1, so as to be consistent when invalid UTF-8 sequences appear; if you want to call .asIRI first, call it.)

What do you think?

@wbolster
Copy link
Contributor

wbolster commented Nov 20, 2017

just another idea: perhaps all api taking a userinfo= arg can also accept a (user, password) tuple, e.g. .replace(userinfo=("me", "secret")). similar approaches are used by python-requests.

@glyph
Copy link
Collaborator Author

glyph commented Jan 1, 2018

This will be fixed by #54 I believe.

@mahmoud
Copy link
Member

mahmoud commented Feb 26, 2018

That's right! #54 was merged, and now DecodedURL is in hyperlink==18.0.0, which was just released! Give it a whirl! 🎉

@mahmoud mahmoud closed this as completed Feb 26, 2018
@glyph
Copy link
Collaborator Author

glyph commented Mar 3, 2018

This release is amazing.

There are a few minor issues:

  • Really want to see some docs!
  • It's a teensy bit awkward, when I want to construct a DecodedURL structurally, to have to do parse("").replace(...).
  • Is this right?
>>> parse('?x+y=a+b')
DecodedURL(url=URL.from_text('?x%2By=a%2Bb'))
>>> parse('?x+y=a+b').query
(('x+y', 'a+b'),)

I feel like + in the query string is supposed to be a shorthand for space, and not equivalent to %2B.

@mahmoud
Copy link
Member

mahmoud commented Mar 3, 2018

@glyph docs are on the docket! tbh I kind of wanted a soft rollout because I had a feeling that there would be changes as we get around to using the new APIs. Until documented for a reasonable amount of time, I don't yet consider things truly released.

Agreed that it should be easier to programmatically construct. How about making the url parameter optional, so it becomes DecodedURL().replace(...) instead? Could also add a from_parts() classmethod.

I might be misremembering, but I think + is a holdover of form encoding, and not really part of the 3986 URL, like a MIME encoding thing. %2B and %20 are definitely + and space, unambiguously, and the more compatible bet for encoding overall. Does that answer the question?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants