Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic http auth + unicode = error #3662

Closed
sztomi opened this issue Nov 2, 2016 · 10 comments
Closed

Basic http auth + unicode = error #3662

sztomi opened this issue Nov 2, 2016 · 10 comments

Comments

@sztomi
Copy link

sztomi commented Nov 2, 2016

Description

It is not possible to send a basic http authentication using a username or password that contains Unicode data.

What happens

UnicodeEncodeError is thrown. Traceback:

  File "(my code)", line 163, in _get
    auth=(self.user, self.password))
  File "/usr/local/lib/python3.5/site-packages/requests/api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/requests/sessions.py", line 454, in request
    prep = self.prepare_request(req)
  File "/usr/local/lib/python3.5/site-packages/requests/sessions.py", line 388, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/usr/local/lib/python3.5/site-packages/requests/models.py", line 297, in prepare
    self.prepare_auth(auth, url)
  File "/usr/local/lib/python3.5/site-packages/requests/models.py", line 490, in prepare_auth
    r = auth(self)
  File "/usr/local/lib/python3.5/site-packages/requests/auth.py", line 51, in __call__
    r.headers['Authorization'] = _basic_auth_str(self.username, self.password)
  File "/usr/local/lib/python3.5/site-packages/requests/auth.py", line 31, in _basic_auth_str
    b64encode(('%s:%s' % (username, password)).encode('latin1')).strip()
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0171' in position 7: ordinal not in range(256)

Expected behavior

The authentication is encoded as utf-8 (at least if charset=utf-8 is provided in the header).

How to reproduce

Consider the following request:

        user = 'űser'
        password = 'páéáésswőrd'
        response = requests.get(
            url='http://example.com'
            headers={'Content-Type': 'application/xml; charset=utf-8'},
            auth=(user, password))

I think that the culprit is this line (https://github.com/kennethreitz/requests/blob/master/requests/auth.py#L32), which assumes latin-1 encoding regardless of the charset header:

def _basic_auth_str(username, password):
    """Returns a Basic Auth string."""

    authstr = 'Basic ' + to_native_string(
        b64encode(('%s:%s' % (username, password)).encode('latin1')).strip()
    )

    return authstr

Workaround

This seems to work:

    from requests.auth import to_native_string

    # ... snip ...

        auth = 'Basic ' + to_native_string(b64encode('{}:{}'.format(self.user, self.password).encode('utf-8')).strip())
        response = requests.get(
            url=self.base_url + uri,
            headers={
                'Content-Type': 'application/xml; charset=utf-8',
                'Authorization': auth
            })

Version info

$ pip show requests

---
Metadata-Version: 2.0
Name: requests
Version: 2.9.1
Summary: Python HTTP for Humans.
Home-page: http://python-requests.org
Author: Kenneth Reitz
Author-email: [email protected]
Installer: pip
License: Apache 2.0
Location: /usr/local/lib/python3.5/site-packages
Requires: 
Classifiers:
  Development Status :: 5 - Production/Stable
  Intended Audience :: Developers
  Natural Language :: English
  License :: OSI Approved :: Apache Software License
  Programming Language :: Python
  Programming Language :: Python :: 2.7
  Programming Language :: Python :: 3
  Programming Language :: Python :: 3.3
  Programming Language :: Python :: 3.4
  Programming Language :: Python :: 3.5


@Lukasa
Copy link
Member

Lukasa commented Nov 2, 2016

Yeah, that looks like a bug.

I think in this case the best fix is to allow the user to provide bytestrings for the username and password, and if they do that to simply use the bytestring directly rather than to try to encode.

Are you interested in providing a test and patch for this?

@sztomi
Copy link
Author

sztomi commented Nov 2, 2016

@Lukasa Gladly, but I'm a bit overburdened at the moment. I'll have spare time in 2-3 weeks, if it's still open, I'll take a peek.

@Lukasa
Copy link
Member

Lukasa commented Nov 2, 2016

Ok cool, I'll mark this as contributor friendly and if no-one else picks it up by the time you have time you should take a swing at it.

@ghost
Copy link

ghost commented Nov 11, 2016

Hello, @Lukasa!

Your idea about byte strings looks very good and fully matches the white spaces in spec.

But.

There are two ways to release your idea:

  1. Save user/pass in bytes. Looks not good, because in fact we always need to check type of variable, before use it.
  2. Convert user/pass to strings in init(). Looks not good, because we lose the original values.

And last, I think 95% peoples will be write code like this:

u = 'Дмитрий' # my name in Russian
p = 'password'
r = request.get(url, auth=(u.encode('utf-8'), p))

To my mind, it looks not 'for humans'.
Without this patch we can write this for same result:

r = request.get(url, auth=(u.encode('utf-8').decode('latin1'), p))

But we can change only one line of code:

- b64encode(('%s:%s' % (username, password)).encode('latin1')).strip()
+ b64encode(('%s:%s' % (username, password)).encode('utf-8')).strip() 

After that the same code will look as:

r = request.get(url, auth=(u, p))

It looks for Humans :)

What do you think about all this?

Sorry for my grammar.

@Lukasa
Copy link
Member

Lukasa commented Nov 12, 2016

@klimenko It does look better that way, but it's unfortunately just moving the problem. Now anyone whose server is expecting a non-UTF-8 encoded username is going to get tripped up, and so we'll have to re-open this issue when someone says "my server wanted Latin1 and now doesn't get it".

It's better to use bytestrings because that way we avoid making a guess that is wrong. If the users still want the helpful automatic choice, they can pass a unicode string, but if they want to do something more specific we have an escape hatch for them.

@rmhasan
Copy link
Contributor

rmhasan commented Nov 16, 2016

Hi guys, I would like to take a crack at this.

@nateprewitt
Copy link
Member

@rmhasan thanks for the interest in contributing! It may be important to note that PR #3673 is already open to address this. You may want to keep an eye on the outcome of that before spending time working on a solution.

@rmhasan
Copy link
Contributor

rmhasan commented Nov 17, 2016

@nateprewitt I will keep an eye on it, thanks.

@nateprewitt
Copy link
Member

Resolved by #3673.

@papparotzi
Copy link

Thanks, I got past it!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants