Basic http auth + unicode = error #3662

sztomi · 2016-11-02T15:32:03Z

Description

It is not possible to send HTTP Basic authentication with a username or password that contains characters outside Latin-1.

What happens

UnicodeEncodeError is thrown. Traceback:

  File "(my code)", line 163, in _get
    auth=(self.user, self.password))
  File "/usr/local/lib/python3.5/site-packages/requests/api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/requests/sessions.py", line 454, in request
    prep = self.prepare_request(req)
  File "/usr/local/lib/python3.5/site-packages/requests/sessions.py", line 388, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/usr/local/lib/python3.5/site-packages/requests/models.py", line 297, in prepare
    self.prepare_auth(auth, url)
  File "/usr/local/lib/python3.5/site-packages/requests/models.py", line 490, in prepare_auth
    r = auth(self)
  File "/usr/local/lib/python3.5/site-packages/requests/auth.py", line 51, in __call__
    r.headers['Authorization'] = _basic_auth_str(self.username, self.password)
  File "/usr/local/lib/python3.5/site-packages/requests/auth.py", line 31, in _basic_auth_str
    b64encode(('%s:%s' % (username, password)).encode('latin1')).strip()
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0171' in position 7: ordinal not in range(256)

Expected behavior

The authentication is encoded as utf-8 (at least if charset=utf-8 is provided in the header).

How to reproduce

Consider the following request:

        import requests

        user = 'űser'
        password = 'páéáésswőrd'
        response = requests.get(
            url='http://example.com',
            headers={'Content-Type': 'application/xml; charset=utf-8'},
            auth=(user, password))

I think that the culprit is this line (https://github.com/kennethreitz/requests/blob/master/requests/auth.py#L32), which assumes latin-1 encoding regardless of the charset header:

def _basic_auth_str(username, password):
    """Returns a Basic Auth string."""

    authstr = 'Basic ' + to_native_string(
        b64encode(('%s:%s' % (username, password)).encode('latin1')).strip()
    )

    return authstr
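
The failure can also be reproduced without any network traffic, since it comes purely from the encoding step. Below is a minimal sketch using only the standard library; the helper name `encode_credentials` is hypothetical and not part of requests, it only mirrors the encode-then-base64 step shown above:

```python
from base64 import b64encode


def encode_credentials(username, password, encoding):
    """Hypothetical helper: join the credentials, encode them, then base64 them."""
    return b64encode(('%s:%s' % (username, password)).encode(encoding)).strip()


user = 'űser'
password = 'páéáésswőrd'

# latin-1 has no representation for 'ű' (U+0171), so encoding fails.
try:
    encode_credentials(user, password, 'latin1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_token = encode_credentials(user, password, 'utf-8')
```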

Workaround

This seems to work:

    from base64 import b64encode
    from requests.auth import to_native_string

    # ... snip ...

        auth = 'Basic ' + to_native_string(b64encode('{}:{}'.format(self.user, self.password).encode('utf-8')).strip())
        response = requests.get(
            url=self.base_url + uri,
            headers={
                'Content-Type': 'application/xml; charset=utf-8',
                'Authorization': auth
            })

Version info

$ pip show requests

---
Metadata-Version: 2.0
Name: requests
Version: 2.9.1
Summary: Python HTTP for Humans.
Home-page: http://python-requests.org
Author: Kenneth Reitz
Author-email: [email protected]
Installer: pip
License: Apache 2.0
Location: /usr/local/lib/python3.5/site-packages
Requires: 
Classifiers:
  Development Status :: 5 - Production/Stable
  Intended Audience :: Developers
  Natural Language :: English
  License :: OSI Approved :: Apache Software License
  Programming Language :: Python
  Programming Language :: Python :: 2.7
  Programming Language :: Python :: 3
  Programming Language :: Python :: 3.3
  Programming Language :: Python :: 3.4
  Programming Language :: Python :: 3.5


Lukasa · 2016-11-02T15:35:47Z

Yeah, that looks like a bug.

I think in this case the best fix is to allow the user to provide bytestrings for the username and password, and if they do that to simply use the bytestring directly rather than to try to encode.
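
A rough sketch of what that could look like, assuming Python 3 only. This is an illustration of the idea, not the actual patch: the function name `basic_auth_str` and its exact behaviour are assumptions.

```python
from base64 import b64encode


def basic_auth_str(username, password):
    """Sketch: build a Basic auth header value, passing bytes through untouched."""
    # Text keeps the existing latin-1 default; bytes are used as-is,
    # so the caller decides the encoding.
    if isinstance(username, str):
        username = username.encode('latin1')
    if isinstance(password, str):
        password = password.encode('latin1')
    return 'Basic ' + b64encode(b':'.join((username, password))).decode('ascii')


# The caller picks UTF-8 explicitly by encoding before passing the values in.
header = basic_auth_str('űser'.encode('utf-8'), 'páéáésswőrd'.encode('utf-8'))
```

With this shape, plain ASCII or Latin-1 text behaves exactly as before, and only callers who pass bytes opt out of the default.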

Are you interested in providing a test and patch for this?

sztomi · 2016-11-02T15:54:19Z

@Lukasa Gladly, but I'm a bit overburdened at the moment. I'll have spare time in 2-3 weeks, if it's still open, I'll take a peek.

Lukasa · 2016-11-02T15:55:05Z

Ok cool, I'll mark this as contributor friendly and if no-one else picks it up by the time you have time you should take a swing at it.

ghost · 2016-11-11T23:31:43Z

Hello, @Lukasa!

Your idea about byte strings looks very good and fits the gaps left open by the spec.

But.

There are two ways to implement your idea:

1. Store user/pass as bytes. That does not look good, because we would always need to check the type of the variable before using it.
2. Convert user/pass to strings in __init__(). That does not look good either, because we lose the original values.

Finally, I think 95% of people will write code like this:

u = 'Дмитрий' # my name in Russian
p = 'password'
r = requests.get(url, auth=(u.encode('utf-8'), p))

To my mind, that does not look 'for humans'.
Even without this patch, we can already get the same result by writing:

r = requests.get(url, auth=(u.encode('utf-8').decode('latin1'), p))
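
As a side note on why this round trip works: latin-1 maps every byte value 0-255 to the code point with the same number, so decoding UTF-8 bytes as latin-1 never fails and re-encoding gives the same bytes back. A small standalone check (the variable name `disguised` is only for illustration):

```python
u = 'Дмитрий'

# Decoding UTF-8 bytes as latin-1 maps each byte to the code point of the same value.
disguised = u.encode('utf-8').decode('latin1')

# A later .encode('latin1'), as done inside _basic_auth_str, restores the UTF-8 bytes.
round_trip_matches = disguised.encode('latin1') == u.encode('utf-8')
```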

But we can change only one line of code:

- b64encode(('%s:%s' % (username, password)).encode('latin1')).strip()
+ b64encode(('%s:%s' % (username, password)).encode('utf-8')).strip()

After that, the same code would look like this:

r = requests.get(url, auth=(u, p))

It looks for Humans :)

What do you think about all this?

Sorry for my grammar.

Lukasa · 2016-11-12T08:42:57Z

@klimenko It does look better that way, but it's unfortunately just moving the problem. Now anyone whose server is expecting a non-UTF-8 encoded username is going to get tripped up, and so we'll have to re-open this issue when someone says "my server wanted Latin1 and now doesn't get it".

It's better to use bytestrings because that way we avoid making a guess that is wrong. If the users still want the helpful automatic choice, they can pass a unicode string, but if they want to do something more specific we have an escape hatch for them.
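
To make the "moving the problem" point concrete, here is a small sketch with hypothetical credentials that are representable in both encodings. The two encodings produce different bytes, so a server comparing against one form would not match the other:

```python
from base64 import b64encode

credentials = 'josé:secret'  # hypothetical values, valid in both latin-1 and UTF-8

as_latin1 = b64encode(credentials.encode('latin1'))
as_utf8 = b64encode(credentials.encode('utf-8'))

# 'é' is one byte (0xE9) in latin-1 but two bytes (0xC3 0xA9) in UTF-8,
# so the base64 payloads sent in the Authorization header differ.
differs = as_latin1 != as_utf8
```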

rmhasan · 2016-11-16T04:07:17Z

Hi guys, I would like to take a crack at this.

nateprewitt · 2016-11-16T16:06:58Z

@rmhasan thanks for the interest in contributing! It may be important to note that PR #3673 is already open to address this. You may want to keep an eye on the outcome of that before spending time working on a solution.

rmhasan · 2016-11-17T03:05:09Z

@nateprewitt I will keep an eye on it, thanks.

nateprewitt · 2016-11-23T04:07:51Z

Resolved by #3673.

papparotzi · 2018-09-21T13:56:11Z

Thanks, I got past it!

Lukasa added the Contributor Friendly label Nov 2, 2016

Lukasa closed this as completed Nov 23, 2016

nateprewitt mentioned this issue Mar 28, 2018

[Python 3] Basic auth relying on utf-8 encoding by default #4564

Closed

cjerdonek mentioned this issue Sep 20, 2018

No support for non-latin1 characters in credentials pypa/pip#5801

Open

jarek mentioned this issue Nov 14, 2018

Basic auth should be unicode pallets/werkzeug#945

Closed

github-actions bot locked as resolved and limited conversation to collaborators Sep 7, 2021
