Header values must be of type str or bytes #90

Closed
jin10086 opened this issue May 9, 2017 · 2 comments

Comments

@jin10086

jin10086 commented May 9, 2017

I'm puzzled: why doesn't headers_raw_to_dict return this?

>>> import w3lib.http
>>> w3lib.http.headers_raw_to_dict(b"Content-type: text/html\n\rAccept: gzip\n\n")   
{'Content-type': 'text/html', 'Accept': 'gzip'}

Instead, it now returns this:

>>> import w3lib.http
>>> w3lib.http.headers_raw_to_dict(b"Content-type: text/html\n\rAccept: gzip\n\n")   
{'Content-type': ['text/html'], 'Accept': ['gzip']}

I use headers_raw_to_dict when I want to copy request headers from Chrome:

In [31]: copy_from_chrome = """Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8^M
    ...: Accept-Encoding:gzip, deflate, sdch^M
    ...: Accept-Language:zh-CN,zh;q=0.8^M
    ...: Cache-Control:max-age=0^M
    ...: Connection:keep-alive^M
    ...: Cookie:username-pes-8888="2|1:0|10:1494207240|17:username-pes-8888|48:ODdmYWI4NmQtNDA0OC00Y2YzLTg3ZjYtOWE3Mzk0YmRiZTA2|3284b8f38c8d142ac8e7121c4dfd6f04d7548ccb6680f56e74a32c5f3f9dc3d4"^M
    ...: Host:pes^M
    ...: Upgrade-Insecure-Requests:1^M
    ...: User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"""

In [32]: from w3lib.http import headers_raw_to_dict

In [33]: headers = headers_raw_to_dict(copy_from_chrome)
In [35]: headers
Out[35]:
{'Accept': ['text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'],
 'Accept-Encoding': ['gzip, deflate, sdch'],
 'Accept-Language': ['zh-CN,zh;q=0.8'],
 'Cache-Control': ['max-age=0'],
 'Connection': ['keep-alive'],
 'Cookie': ['username-pes-8888="2|1:0|10:1494207240|17:username-pes-8888|48:ODdmYWI4NmQtNDA0OC00Y2YzLTg3ZjYtOWE3Mzk0YmRiZTA2|3284b8f38c8d142ac8e7121c4dfd6f04d7548ccb6680f56e74a32c5f3f9dc3d4"'],
 'Host': ['pes'],
 'Upgrade-Insecure-Requests': ['1'],
 'User-Agent': ['Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36']}

Then I pass these headers to requests:

In [36]: import requests

In [37]: z = requests.get(url,headers=headers)

but it fails with "Header values must be of type str or bytes", so I have to do this:

In [39]: {i:headers[i][0] for i in headers}
Out[39]:
{'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
 'Accept-Encoding': 'gzip, deflate, sdch',
 'Accept-Language': 'zh-CN,zh;q=0.8',
 'Cache-Control': 'max-age=0',
 'Connection': 'keep-alive',
 'Cookie': 'username-pes-8888="2|1:0|10:1494207240|17:username-pes-8888|48:ODdmYWI4NmQtNDA0OC00Y2YzLTg3ZjYtOWE3Mzk0YmRiZTA2|3284b8f38c8d142ac8e7121c4dfd6f04d7548ccb6680f56e74a32c5f3f9dc3d4"',
 'Host': 'pes',
 'Upgrade-Insecure-Requests': '1',
 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'}
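
The one-liner above keeps only the first value of each header. A minimal Python 3 sketch of an alternative, assuming the dict-of-lists shape shown above: it joins repeated values with ", ", which HTTP permits for headers defined as comma-separated lists (RFC 7230, section 3.2.2) but not for Set-Cookie. The helper name `flatten_headers` is hypothetical; it is not part of w3lib or requests.

```python
from typing import Dict, List


def flatten_headers(headers: Dict[str, List[str]]) -> Dict[str, str]:
    """Turn {name: [value, ...]} into {name: "value1, value2"}.

    Hypothetical helper: joining with ", " is acceptable for list-valued
    headers, but NOT for Set-Cookie, whose values must stay separate.
    """
    return {name: ", ".join(values) for name, values in headers.items()}


headers = {
    "Accept-Encoding": ["gzip, deflate, sdch"],
    "Connection": ["keep-alive"],
    "Via": ["1.1 proxy-a", "1.1 proxy-b"],  # a header that appeared twice
}
print(flatten_headers(headers))
```

Unlike `headers[i][0]`, this keeps every value, at the cost of merging repeated headers into one line.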
@kmike
Member

kmike commented May 9, 2017

@kimg1234 the reason is that there may be several headers with the same name, so we need to preserve all values. Taking only the first value is not correct, because all the other values for that header are discarded.
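
To make the point concrete, here is a small Python 3 illustration with made-up values: a response that sets two cookies carries two Set-Cookie lines, and the first-value conversion used earlier in this thread silently drops one of them.

```python
# Made-up example: two cookies arrive as two separate Set-Cookie headers.
headers = {
    "Content-Type": ["text/html"],
    "Set-Cookie": ["session=abc", "theme=dark"],
}

# The "first value only" conversion from earlier in the thread.
first_only = {name: values[0] for name, values in headers.items()}

# Count how many header values existed versus how many survived.
total = sum(len(values) for values in headers.values())
lost = total - len(first_only)

print(first_only)  # {'Content-Type': 'text/html', 'Set-Cookie': 'session=abc'}
print("values dropped:", lost)  # values dropped: 1
```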

But it's not good that w3lib's data format doesn't work as-is with requests; +1 to having a function which converts between these two data formats. Another option is to refactor headers_raw_to_dict: create a function which parses headers into a list of (name, value) tuples, and have headers_raw_to_dict use it. If I'm not mistaken, this data format is supported by requests.
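
A minimal Python 3 sketch of the refactoring suggested above, assuming str input such as text copied from Chrome's developer tools. Both function names are hypothetical; this is not w3lib's actual implementation (which works on bytes), and header names keep their original case rather than being normalized.

```python
from typing import Dict, List, Tuple


def headers_raw_to_list(headers_raw: str) -> List[Tuple[str, str]]:
    """Parse raw header text into an ordered list of (name, value) pairs."""
    pairs = []
    # splitlines() treats \r\n, \n and a lone \r all as line breaks.
    for line in headers_raw.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only, so values (e.g. cookies) may contain ':'.
        name, value = line.split(":", 1)
        pairs.append((name.strip(), value.strip()))
    return pairs


def headers_list_to_dict(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group pairs by name, keeping every value (the dict-of-lists shape)."""
    result: Dict[str, List[str]] = {}
    for name, value in pairs:
        result.setdefault(name, []).append(value)
    return result


raw = "Content-type: text/html\r\nAccept: gzip\r\nVia: a\r\nVia: b\r\n"
pairs = headers_raw_to_list(raw)
print(pairs)
print(headers_list_to_dict(pairs))
```

Keeping the ordered list as the primitive means no information is lost at parse time; the dict-of-lists view and any flattened view for an HTTP client can both be derived from it.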

@jin10086
Author

Thanks for your reply.
I will refactor headers_raw_to_dict and create a function which parses headers into a list of (name, value) tuples.
I have tested it and found that this data format isn't supported by requests.

Windows 7 64-bit, Python 2.7.11

In [1]: from w3lib.http import headers_raw_to_dict

In [2]: import requests

In [3]: requests.__version__
Out[3]: '2.14.1'

In [4]: copy_from_chrome = """Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8^M
   ...: Accept-Encoding:gzip, deflate, sdch^M
   ...: Accept-Language:zh-CN,zh;q=0.8^M
   ...: Cache-Control:max-age=0^M
   ...: Connection:keep-alive^M
   ...: Cookie:username-pes-8888="2|1:0|10:1494207240|17:username-pes-8888|48:ODdmYWI4NmQtNDA0OC00Y2YzLTg3ZjYtOWE3Mzk0YmRiZTA2|3284b8f38c8d142ac8e7121c4dfd6f04d7548ccb6680f56e74a32c5f3f9dc3d4"^M
   ...: Host:pes^M
   ...: Upgrade-Insecure-Requests:1^M
   ...: User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"""

In [5]: headers = headers_raw_to_dict(copy_from_chrome)

In [6]: url = 'http://pes/itemfail/'

In [7]: z = requests.get(url,headers=headers)
---------------------------------------------------------------------------
InvalidHeader                             Traceback (most recent call last)
<ipython-input-7-c380b61ec890> in <module>()
----> 1 z = requests.get(url,headers=headers)

d:\python27\lib\site-packages\requests\api.pyc in get(url, params, **kwargs)
     70
     71     kwargs.setdefault('allow_redirects', True)
---> 72     return request('get', url, params=params, **kwargs)
     73
     74

d:\python27\lib\site-packages\requests\api.pyc in request(method, url, **kwargs)
     56     # cases, and look like a memory leak in others.
     57     with sessions.Session() as session:
---> 58         return session.request(method=method, url=url, **kwargs)
     59
     60

d:\python27\lib\site-packages\requests\sessions.pyc in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    502             hooks = hooks,
    503         )
--> 504         prep = self.prepare_request(req)
    505
    506         proxies = proxies or {}

d:\python27\lib\site-packages\requests\sessions.pyc in prepare_request(self, request)
    434             auth=merge_setting(auth, self.auth),
    435             cookies=merged_cookies,
--> 436             hooks=merge_hooks(request.hooks, self.hooks),
    437         )
    438         return p

d:\python27\lib\site-packages\requests\models.pyc in prepare(self, method, url, headers, files, data, params, auth, cookies, hooks, json)
    301         self.prepare_method(method)
    302         self.prepare_url(url, params)
--> 303         self.prepare_headers(headers)
    304         self.prepare_cookies(cookies)
    305         self.prepare_body(data, files, json)

d:\python27\lib\site-packages\requests\models.pyc in prepare_headers(self, headers)
    441             for header in headers.items():
    442                 # Raise exception on invalid header value.
--> 443                 check_header_validity(header)
    444                 name, value = header
    445                 self.headers[to_native_string(name)] = value

d:\python27\lib\site-packages\requests\utils.pyc in check_header_validity(header)
    870     except TypeError:
    871         raise InvalidHeader("Header value %s must be of type str or bytes, "
--> 872                             "not %s" % (value, type(value)))
    873
    874

InvalidHeader: Header value ['keep-alive'] must be of type str or bytes, not <type 'list'>
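
The traceback above shows requests rejecting a header whose value is a list. A small sketch of a pre-flight check that reports such headers before the request is sent; it mirrors only the str/bytes type check visible in the error message (requests may validate more than this), and the function name is hypothetical.

```python
def find_invalid_header_values(headers):
    """Return the names of headers whose value is not str or bytes.

    Only mirrors the type check from the InvalidHeader message above;
    it is not a full reimplementation of requests' header validation.
    """
    return [name for name, value in headers.items()
            if not isinstance(value, (str, bytes))]


headers = {"Host": "pes", "Connection": ["keep-alive"]}
print(find_invalid_header_values(headers))  # ['Connection']
```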
