[Python 3] Basic auth relying on utf-8 encoding by default #4564

princess-entrapta · 2018-03-28T16:01:08Z

I was misled by the default requests behavior for auth headers.

When I used a password containing non-ASCII characters, the basic auth method silently encoded it as latin1, which caused authentication to fail because the server expected UTF-8.

UTF-8 is the default encoding in modern terminals and for Python 3 strings. As a result, requests' auth behavior is inconsistent with curl and surprising when authentication servers decode the header. Requests should either fail on non-bytes input or follow the web convention of encoding text strings as UTF-8.

As a workaround, I encoded the string as UTF-8 before passing it to requests.

Reproduction Steps

import requests
requests.get('https://example.com', auth=('user', 'àéïòù'))

Expected Result

Basic auth header is the base64-encoded version of "user:àéïòù".encode('utf-8')

Actual Result

Basic auth header is the base64-encoded version of "user:àéïòù".encode('latin1')

Workaround

requests.get('https://example.com', auth=('user', 'àéïòù'.encode('utf-8')))
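
To make the mismatch concrete, here is a minimal sketch that builds the Basic credential both ways using only the standard library. It does not call requests; the `basic_header` helper is hypothetical and only mirrors the base64 step described above.

```python
import base64


def basic_header(userpass: str, encoding: str) -> str:
    """Build an Authorization value from 'user:password' text.

    Hypothetical helper for illustration; not part of requests.
    """
    token = base64.b64encode(userpass.encode(encoding)).decode("ascii")
    return "Basic " + token


latin1_header = basic_header("user:àéïòù", "latin1")
utf8_header = basic_header("user:àéïòù", "utf-8")

# The two headers differ, so a server comparing against UTF-8 bytes
# will not accept the latin1 variant.
print(latin1_header != utf8_header)

# A server that decodes the latin1 payload as UTF-8 cannot even read it.
payload = base64.b64decode(latin1_header[len("Basic "):])
try:
    payload.decode("utf-8")
except UnicodeDecodeError:
    print("latin1 payload is not valid UTF-8")
```

Passing bytes, as in the workaround above, sidesteps the question because no implicit encoding step is left to guess.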

sigmavirus24 · 2018-03-28T17:09:41Z

We could extend https://github.com/requests/requests/blob/b66908e7b647689793e299edc111bf9910e93ad3/requests/auth.py#L79 to accept an encoding and that could pass it through to https://github.com/requests/requests/blob/b66908e7b647689793e299edc111bf9910e93ad3/requests/auth.py#L28 but the extra parameter there would need to default to latin-1, as it does now. Alternatively, this could be added to the toolbelt.
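
A rough sketch of what such an encoding parameter could look like. This is a standalone illustration, not the actual code in `requests/auth.py`; the function name `basic_auth_str` and its `encoding` parameter are assumptions, with latin-1 kept as the default as described above.

```python
import base64


def basic_auth_str(username, password, encoding="latin1"):
    """Return a Basic Authorization value (hypothetical sketch).

    str inputs are encoded with `encoding`; bytes pass through untouched.
    """
    if isinstance(username, str):
        username = username.encode(encoding)
    if isinstance(password, str):
        password = password.encode(encoding)
    token = base64.b64encode(b":".join((username, password)))
    return "Basic " + token.decode("ascii")


# The default keeps the existing latin-1 behaviour; callers opt in to UTF-8.
legacy = basic_auth_str("user", "àéïòù")
modern = basic_auth_str("user", "àéïòù", encoding="utf-8")
```

Keeping latin-1 as the default means existing callers see no change, while the opt-in path produces the same header as passing pre-encoded UTF-8 bytes.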

nateprewitt · 2018-03-28T18:50:47Z

If we added an extra param, does that give us anything new? The current solution we have was provided in #3662 and gives the user the option to choose by sending unambiguous bytes. It also prevents parameter creep across the many functions where we could accept an encoding. As for defaults, most of the other recent choices made around iso-8859-1 vs utf-8 suggest there isn’t a “right” choice.

As @sigmavirus24 said, we need to maintain the latin-1 default for backwards compat on Requests 2.X. For Requests 3.0, if we enforce a hard fail on a non-bytes value, does that provide a better user experience? I’m not sure it does. If we change the defaults for the whole library to UTF-8, does that put us more in line with the 2018 web? Probably, but it would be nice to have some solid numbers/specifications on that.

princess-entrapta · 2018-03-29T07:55:04Z

@nateprewitt It seems to me that utf-8 is a broadly accepted default for text encoding in web content, and it is also the default encoding of Python strings. On the other hand, what was the justification for using iso-8859-1 specifically, even in earlier Requests versions?

Either failing on non-bytes or silently encoding in utf-8 is behavior I believe users would find more predictable and more coherent with the general environment. If the current behavior needs to be maintained, I believe a proper documentation warning would also help. Personally, I had to browse the source code to understand what was going on under the hood.

sigmavirus24 · 2018-03-29T12:52:01Z

@arthur-hav Most of the web still implements HTTP/1.1 as defined in RFC 2616 with some exceptional cases following the revisions in 7230, 7231, 7232, 7233, 7234, and 7235. The default encoding for the web is latin-1. Presuming that encoding is the safest backwards-compatible solution. We've been plenty fair in offering other ways to handle this, but it seems like you want to make a backwards incompatible break without understanding the history of the library or the specification it implements.

princess-entrapta · 2018-03-29T14:13:20Z

@sigmavirus24 As far as I understand from Wikipedia and the discussion on the web auth header at https://stackoverflow.com/questions/7242316/what-encoding-should-i-use-for-http-basic-authentication the issue was never really settled for non-ASCII characters, although (a fact I was unaware of) RFC 2616 did state iso-8859-1 to be the default encoding for headers.

I am not really set on "what I want" and certainly won't insist on making changes that are likely to break things. That being said, I would certainly advocate making it easier to realize what is going on when trying to authenticate with non-ASCII passwords. Curl defaults to utf-8, and apparently major web browsers do as well, so a fair share of servers can be expected to assume utf-8 too and disregard the 1999 RFC, making this a likely common pitfall.

CarliJoy · 2020-03-16T17:28:16Z

Hello,

Quite some time has passed since this issue was discussed.
I would also like to argue for using UTF-8 by default. In my case, I can't use my corporate proxy with pip with my actual password because of latin1.
The issue is related to pypa/pip#5801

@chrahunt writes there:

It is probably safe to assume that the encoding of credential fields should be UTF-8 as:

Browsers use it (source)

When servers want to request a specific charset, their only option is UTF-8. (source)

Also, in the meantime Python 2 has reached end of life and Python 3 uses UTF-8 by default.

So there should be no need to enforce latin1 anymore.
If you still want to preserve backward compatibility, I would suggest this:
A (dirty) workaround: catch the UnicodeEncodeError and encode as UTF-8 in that case.
Because latin1 is still tried first, this should work for probably 90% of cases and is backward compatible, while allowing pip users to use non-latin1 passwords.

    # We use latin1 for backwards compatibility by default but allow
    # unicode once we can't encode latin1
    if isinstance(username, str):
        try:
            username = username.encode('latin1')
        except UnicodeEncodeError:
            username = username.encode('utf8')

    if isinstance(password, str):
        try:
            password = password.encode('latin1')
        except UnicodeEncodeError:
            password = password.encode('utf8')

Please comment so it can be decided which way to go - pure UTF8 or dirty catching non latin1.
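
The fallback above can be wrapped in a small helper to see how it behaves. The name `encode_credential` is hypothetical. One caveat worth weighing: with this approach the bytes sent depend on which characters the credential contains, so 'é' still goes out as latin1 while '€' goes out as UTF-8.

```python
def encode_credential(value):
    """Encode str as latin1 when possible, else UTF-8; pass bytes through.

    Hypothetical helper sketching the fallback proposed above.
    """
    if isinstance(value, str):
        try:
            return value.encode("latin1")
        except UnicodeEncodeError:
            return value.encode("utf8")
    return value


print(encode_credential("é"))      # representable in latin1
print(encode_credential("€"))      # not in latin1, falls back to UTF-8
print(encode_credential(b"raw"))   # bytes are left untouched
```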

CarliJoy · 2020-05-06T08:52:34Z

@sigmavirus24 Can I change your mind on this topic?
As requests is the base for pip, it actually breaks things for users.
Since all modern browsers now use UTF-8, it is cumbersome for users to be forced into latin1 (there is no choice when using pip, as you can't put bytes in the config).

So from a user perspective it is quite strange that auth fails with requests but works with all browsers and curl.

If people need backwards compatibility, they could still pass bytes encoded as latin-1. For me these "numbers" (and the linked issues) would actually be enough to justify falling back to UTF-8 at least on Unicode errors, and switching to a UTF-8 default in a new major release.

I know requests wants to keep supporting Python 2, so maybe someone could help adapt the snippet so it works in Python 2 as well (I could not check so far).

CarliJoy · 2020-05-06T09:05:12Z

Also note: shortly after the last comment (before mine), Mozilla made the change to UTF-8 (following Chrome): https://www.fxsitecompat.dev/en-CA/docs/2018/basic-auth-credentials-are-now-encoded-in-utf-8-instead-of-iso-8859-1/

So this is a two-year-old issue, all major browsers and tools have used UTF-8 as the default for years now, and the new standard (RFC 7617) basically states that UTF-8 is the only option.
This might be the time to assume it is safe to switch ;-)
@sigmavirus24 @nateprewitt what do you think?
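
RFC 7617 lets a server advertise `charset="UTF-8"` in its Basic challenge. A client could use that hint to pick the encoding, as in the hedged sketch below. The function name `credential_encoding` and the latin-1 fallback are assumptions for illustration, the parsing is deliberately simplified, and nothing in this thread indicates that requests does this.

```python
import re

# Matches charset="UTF-8" or charset=UTF-8 in a challenge, case-insensitively.
# Simplified: a full parser would tokenize the auth-params properly.
_CHARSET_RE = re.compile(r'charset\s*=\s*"?utf-8"?', re.IGNORECASE)


def credential_encoding(www_authenticate: str, default: str = "latin1") -> str:
    """Pick an encoding from a WWW-Authenticate header value.

    Hypothetical helper: returns 'utf-8' when a Basic challenge carries
    the charset parameter, otherwise the given default.
    """
    is_basic = www_authenticate.strip().lower().startswith("basic")
    if is_basic and _CHARSET_RE.search(www_authenticate):
        return "utf-8"
    return default


print(credential_encoding('Basic realm="foo", charset="UTF-8"'))
print(credential_encoding('Basic realm="foo"'))
```

Servers that predate the charset parameter never send it, so this only helps when the server opts in.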

sigmavirus24 · 2020-05-07T12:33:35Z

@CarliJoy I'm no longer a maintainer and the maintenance team is focusing on the bare minimum work to keep Requests secure at this point.

So this is a two-year-old issue, all major browsers and tools have used UTF-8 as the default for years now, and the new standard (RFC 7617) basically states that UTF-8 is the only option.

This is all great and well until you consider that there are servers out there that people still need to interact with that haven't been updated much past the era of RFC 2616 and suddenly sending UTF8 doesn't work for them.

Further no project attempting to follow SemVer can change this kind of behaviour in anything other than a major version release (e.g., requests 3.0) and that's unlikely to happen any time soon. While I'm not diametrically opposed to the behaviour, I also have no say in the matter. Please check next time before tagging someone in multiple comments to see if they're still relevant to the project.

anderseknert · 2020-08-07T22:02:04Z

Having wasted a good hour of an otherwise fine evening on this before finding this - kind of surprising to see a modern library these days catering to a spec dating two decades(!) back, but anyway - easily worked around by manually setting the auth header:

import base64
import json
import requests

b64bytes = base64.b64encode(f'{username}:{password}'.encode('utf-8'))
userpass_encoded = 'Basic ' + str(b64bytes, 'utf-8')

r = requests.post(url, data=json.dumps(payload), headers={'Authorization': userpass_encoded})

CarliJoy · 2020-08-07T22:27:01Z

@anderseknert arthur-hav already suggested an easier workaround in the issue itself:
requests.get('https://example.com', auth=(username.encode('utf-8'), password.encode('utf-8')))

No need to manipulate the header itself.

anderseknert · 2020-08-07T22:39:07Z

Thanks @CarliJoy - seems I missed that somehow - probably since I had the workaround in place already when coming here :) That's indeed better.

As discussed upstream in psf/requests#4564 , HTTP basic auth usernames and passwords sent to requests as Python text strings are encoded as latin1. This of course makes it impossible to log in with a username or password containing characters not represented in latin1, as the reporter of mwclient#315 found out. To work around this rather old-fashioned default, let's intercept string usernames and passwords and encode them as utf-8 before sending them to requests. Anyone dealing with a really old server that can't handle utf-8, or something like that, can encode the username and password appropriately and provide them as bytestrings. Signed-off-by: Adam Williamson <[email protected]>


sethmlarson · 2024-05-19T19:07:06Z

Changing this requires a backwards incompatible change which IMO isn't worth the squeeze right now. Closing this unless others have stronger opinions.

jarek mentioned this issue Nov 14, 2018

Basic auth should be unicode pallets/werkzeug#945

Closed

CarliJoy mentioned this issue Mar 16, 2020

No support for non-latin1 characters in credentials pypa/pip#5801

Open

FlorianVeaux mentioned this issue Aug 5, 2020

Using unicode username/password for RabbitMQ agent will result in 401 unauthorised access DataDog/integrations-core#7176

Closed

This was referenced May 25, 2022

user/pwd encoding is assumed (hardcoded) to be utf-8 miguelgrinberg/Flask-HTTPAuth#151

Closed

Basic Auth credentials are not encoded rapi-doc/RapiDoc#758

Closed

perrinjerome mentioned this issue Oct 7, 2022

ZPublisher.utils.basic_auth_decode decodes with latin 1 zopefoundation/Zope#1061

Closed

Ousret mentioned this issue Aug 30, 2023

Wishlist for 3.0 milestone jawah/niquests#3

Closed

AdamWill mentioned this issue Jan 27, 2024

httpauth only allows latin-1 characters in username and password mwclient/mwclient#315

Closed

AdamWill mentioned this issue Jan 27, 2024

HTTP basic auth: encode username and password as UTF-8 (#315) mwclient/mwclient#316

Merged

csm10495 mentioned this issue Apr 4, 2024

Error when calling PyHTCC with Euro sign in password csm10495/pyhtcc#16

Closed

csm10495 added a commit to csm10495/pyhtcc that referenced this issue Apr 4, 2024

Fix #16. Pre-encode user/password as bytes to allow all of utf-8 to b…

d330b36

…e passed along Works around psf/requests#4564

sethmlarson closed this as completed May 19, 2024

jokasimr mentioned this issue Jul 9, 2024

Setup Azure Blobs for tests with large files scipp/scippneutron#476

Closed
