Header values must be of type str or bytes #90

jin10086 · 2017-05-09T09:39:35Z

I have a question:
why doesn't headers_raw_to_dict return this?

>>> import w3lib.http
>>> w3lib.http.headers_raw_to_dict(b"Content-type: text/html\n\rAccept: gzip\n\n")   
{'Content-type': 'text/html', 'Accept': 'gzip'}

Now it returns this:

>>> import w3lib.http
>>> w3lib.http.headers_raw_to_dict(b"Content-type: text/html\n\rAccept: gzip\n\n")   
{'Content-type': ['text/html'], 'Accept': ['gzip']}

I use headers_raw_to_dict when I want to copy request headers from Chrome:

In [31]: copy_from_chrome = """Accept:text/html,application/xhtml+xml,applicati
    ...: on/xml;q=0.9,image/webp,*/*;q=0.8^M
    ...: Accept-Encoding:gzip, deflate, sdch^M
    ...: Accept-Language:zh-CN,zh;q=0.8^M
    ...: Cache-Control:max-age=0^M
    ...: Connection:keep-alive^M
    ...: Cookie:username-pes-8888="2|1:0|10:1494207240|17:username-pes-8888|48:
    ...: ODdmYWI4NmQtNDA0OC00Y2YzLTg3ZjYtOWE3Mzk0YmRiZTA2|3284b8f38c8d142ac8e71
    ...: 21c4dfd6f04d7548ccb6680f56e74a32c5f3f9dc3d4"^M
    ...: Host:pes^M
    ...: Upgrade-Insecure-Requests:1^M
    ...: User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHT
    ...: ML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"""

In [32]: from w3lib.http import headers_raw_to_dict

In [33]: headers = headers_raw_to_dict(copy_from_chrome)
In [35]: headers
Out[35]:
{'Accept': ['text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/
*;q=0.8'],
 'Accept-Encoding': ['gzip, deflate, sdch'],
 'Accept-Language': ['zh-CN,zh;q=0.8'],
 'Cache-Control': ['max-age=0'],
 'Connection': ['keep-alive'],
 'Cookie': ['username-pes-8888="2|1:0|10:1494207240|17:username-pes-8888|48:ODdm
YWI4NmQtNDA0OC00Y2YzLTg3ZjYtOWE3Mzk0YmRiZTA2|3284b8f38c8d142ac8e7121c4dfd6f04d75
48ccb6680f56e74a32c5f3f9dc3d4"'],
 'Host': ['pes'],
 'Upgrade-Insecure-Requests': ['1'],
 'User-Agent': ['Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML,
like Gecko) Chrome/57.0.2987.133 Safari/537.36']}

Then I use these headers with requests:

In [36]: import requests

In [37]: z = requests.get(url,headers=headers)

but requests fails with "Header values must be of type str or bytes",
so I need to do this:

In [39]: {i:headers[i][0] for i in headers}
Out[39]:
{'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*
;q=0.8',
 'Accept-Encoding': 'gzip, deflate, sdch',
 'Accept-Language': 'zh-CN,zh;q=0.8',
 'Cache-Control': 'max-age=0',
 'Connection': 'keep-alive',
 'Cookie': 'username-pes-8888="2|1:0|10:1494207240|17:username-pes-8888|48:ODdmY
WI4NmQtNDA0OC00Y2YzLTg3ZjYtOWE3Mzk0YmRiZTA2|3284b8f38c8d142ac8e7121c4dfd6f04d754
8ccb6680f56e74a32c5f3f9dc3d4"',
 'Host': 'pes',
 'Upgrade-Insecure-Requests': '1',
 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, l
ike Gecko) Chrome/57.0.2987.133 Safari/537.36'}


kmike · 2017-05-09T10:01:29Z

@kimg1234 the reason is that there may be several headers with the same name, so we need to preserve all values. Taking only the first value is not correct, because all other values for that header are discarded.
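
To illustrate the point about repeated header names, here is a minimal sketch in plain Python. `group_headers` is a hypothetical helper written for this example; it is not w3lib's implementation and only mimics the list-valued result shape shown above.

```python
from collections import OrderedDict


def group_headers(raw):
    """Group raw header lines by name, keeping every value (illustrative only)."""
    grouped = OrderedDict()
    for line in raw.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split(":", 1)
        grouped.setdefault(name.strip(), []).append(value.strip())
    return grouped


# "X-Tag" is a made-up header name that appears twice.
raw = "Accept: gzip\r\nX-Tag: a\r\nX-Tag: b\r\n"
headers = group_headers(raw)

# Keeping only the first value silently drops "b".
first_only = {name: values[0] for name, values in headers.items()}
```

Here `headers["X-Tag"]` keeps both values, while `first_only["X-Tag"]` retains only one of them.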

But it's not good that w3lib's data format doesn't work as-is with requests; +1 to having a function which converts between these two data formats. Another option is to refactor headers_raw_to_dict and create a function which parses headers into a list of (name, value) tuples; headers_raw_to_dict should use this function, and if I'm not mistaken, this data format is supported by requests.
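
A rough sketch of the two ideas in this comment, under stated assumptions: `raw_to_pairs` and `to_flat_dict` are hypothetical names, not part of w3lib or requests. Joining repeated values with ", " is also an assumption; it suits list-style headers such as Accept, but not headers like Set-Cookie.

```python
def raw_to_pairs(raw):
    """Parse raw header text into a list of (name, value) tuples, in order."""
    pairs = []
    for line in raw.splitlines():
        # Split on the first colon only, so values containing ":" stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        pairs.append((name.strip(), value.strip()))
    return pairs


def to_flat_dict(headers):
    """Convert {name: [v1, v2]} into {name: "v1, v2"} with plain string values."""
    return {name: ", ".join(values) for name, values in headers.items()}


pairs = raw_to_pairs("Host:pes\r\nAccept: text/html\r\nAccept: image/webp\r\n")

# Group the pairs the way a list-valued headers dict would hold them.
grouped = {}
for name, value in pairs:
    grouped.setdefault(name, []).append(value)

flat = to_flat_dict(grouped)
```

The resulting `flat` dict has only string values, which is the shape that `requests.get(url, headers=...)` accepts.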

jin10086 · 2017-05-10T01:06:34Z

Thanks for your reply.
I will refactor headers_raw_to_dict and create a function which parses headers into a list of (name, value) tuples.
I have tested it and found that this data format isn't supported by requests.

Windows 7 64-bit, Python 2.7.11

In [1]: from w3lib.http import headers_raw_to_dict

In [2]: import requests

In [3]: requests.__version__
Out[3]: '2.14.1'

In [4]: copy_from_chrome = """Accept:text/html,application/xhtml+xml,applicatio
   ...: n/xml;q=0.9,image/webp,*/*;q=0.8^M
   ...: Accept-Encoding:gzip, deflate, sdch^M
   ...: Accept-Language:zh-CN,zh;q=0.8^M
   ...: Cache-Control:max-age=0^M
   ...: Connection:keep-alive^M
   ...: Cookie:username-pes-8888="2|1:0|10:1494207240|17:username-pes-8888|48:O
   ...: DdmYWI4NmQtNDA0OC00Y2YzLTg3ZjYtOWE3Mzk0YmRiZTA2|3284b8f38c8d142ac8e7121
   ...: c4dfd6f04d7548ccb6680f56e74a32c5f3f9dc3d4"^M
   ...: Host:pes^M
   ...: Upgrade-Insecure-Requests:1^M
   ...: User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTM
   ...: L, like Gecko) Chrome/57.0.2987.133 Safari/537.36"""

In [5]: headers = headers_raw_to_dict(copy_from_chrome)

In [6]: url = 'http://pes/itemfail/'

In [7]: z = requests.get(url,headers=headers)
---------------------------------------------------------------------------
InvalidHeader                             Traceback (most recent call last)
<ipython-input-7-c380b61ec890> in <module>()
----> 1 z = requests.get(url,headers=headers)

d:\python27\lib\site-packages\requests\api.pyc in get(url, params, **kwargs)
     70
     71     kwargs.setdefault('allow_redirects', True)
---> 72     return request('get', url, params=params, **kwargs)
     73
     74

d:\python27\lib\site-packages\requests\api.pyc in request(method, url, **kwargs)

     56     # cases, and look like a memory leak in others.
     57     with sessions.Session() as session:
---> 58         return session.request(method=method, url=url, **kwargs)
     59
     60

d:\python27\lib\site-packages\requests\sessions.pyc in request(self, method, url
, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies
, hooks, stream, verify, cert, json)
    502             hooks = hooks,
    503         )
--> 504         prep = self.prepare_request(req)
    505
    506         proxies = proxies or {}

d:\python27\lib\site-packages\requests\sessions.pyc in prepare_request(self, req
uest)
    434             auth=merge_setting(auth, self.auth),
    435             cookies=merged_cookies,
--> 436             hooks=merge_hooks(request.hooks, self.hooks),
    437         )
    438         return p

d:\python27\lib\site-packages\requests\models.pyc in prepare(self, method, url,
headers, files, data, params, auth, cookies, hooks, json)
    301         self.prepare_method(method)
    302         self.prepare_url(url, params)
--> 303         self.prepare_headers(headers)
    304         self.prepare_cookies(cookies)
    305         self.prepare_body(data, files, json)

d:\python27\lib\site-packages\requests\models.pyc in prepare_headers(self, heade
rs)
    441             for header in headers.items():
    442                 # Raise exception on invalid header value.
--> 443                 check_header_validity(header)
    444                 name, value = header
    445                 self.headers[to_native_string(name)] = value

d:\python27\lib\site-packages\requests\utils.pyc in check_header_validity(header
)
    870     except TypeError:
    871         raise InvalidHeader("Header value %s must be of type str or byte
s, "
--> 872                             "not %s" % (value, type(value)))
    873
    874

InvalidHeader: Header value ['keep-alive'] must be of type str or bytes, not <ty
pe 'list'>

jin10086 closed this as completed May 11, 2017
