-
-
Notifications
You must be signed in to change notification settings - Fork 616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix some encoding warnings in Python 3.10 (PEP 597) #1614
Conversation
piptools/cache.py
Outdated
@@ -41,7 +41,7 @@ def __str__(self) -> str: | |||
|
|||
|
|||
def read_cache_file(cache_file_path: str) -> CacheDict: | |||
with open(cache_file_path) as cache_file: | |||
with open(cache_file_path, encoding="ascii") as cache_file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and why ascii and not utf-8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because by default, json.dump
has ensure_ascii=True
1, which guarantees all non-ASCII characters in the output will be escaped.
Footnotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atugushev WDYT about this? Sounds ok? I do thing that the alternative would be to change the dump and disable ensure_ascii and enforce use of utf8.
I am personally a big supporter of utf-8 everywhere, but I am not sure which approach is better here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ambivalent. The cache is for the legacy resolver which will be deprecated and removed later, so I'm not sure if should we improve the legacy code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> json.dumps({'ok': '👌'}, ensure_ascii=False)
'{"ok": "👌"}'
>>> json.dumps({'ok': '👌'}, ensure_ascii=True)
'{"ok": "\\ud83d\\udc4c"}'
>>> json.dumps({'ok': '👌'})
'{"ok": "\\ud83d\\udc4c"}'
While I don't expect non-ASCII symbols in the dependency cache, however, would be nice to have an explicit encoding='utf-8'
param.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. utf-8
also looks good enough for me.
OTOH, I worry about another problem -- There are some codecs not compatible with ascii:
{'cp037',
'cp1026',
'cp1140',
'cp273',
'cp424',
'cp500',
'cp875',
'hz',
'shift_jis_2004',
'shift_jisx0213',
'utf_16',
'utf_16_be',
'utf_16_le',
'utf_32',
'utf_32_be',
'utf_32_le',
'utf_7',
'utf_8_sig'}
If one of them is the default codec on the user's environment, both encoding='ascii'
and encoding='utf-8'
may raise a UnicodeDecodeError
. They are rare, but they exist.
Is it a real problem that we should worry about? Should we catch UnicodeDecodeError
and raise a CorruptCacheError
for this case, or use encoding='locale'
for the best compatibility?
code
# https://docs.python.org/3/library/codecs.html#standard-encodings
codec_list = ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp273', 'cp424', 'cp437', 'cp500', 'cp720', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp858', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1125', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_11', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab', 'koi8_r', 'koi8_t', 'koi8_u', 'kz1048', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_32', 'utf_32_be', 'utf_32_le', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'utf_8_sig']
non_compatible = set()
for codec in codec_list:
for i in range(128):
c = chr(i)
try:
if c.encode(codec) != c.encode("ascii"):
print(f"{codec=}, {i=}, {c=}, {c.encode(codec) = }")
non_compatible.add(codec)
except UnicodeEncodeError:
pass
print(non_compatible)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the cache is corrupted you'll see the following error:
$ pip-compile
The dependency cache seems to have been corrupted.
Inspect, or delete, the following file:
/Users/albert/Library/Caches/pip-tools/depcache-cp3.10.json
The workaround is to delete the file as suggested above or use --rebuild
flag which clears any caches upfront.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UnicodeDecodeError
is not caught here, it will crash and print a traceback, I think it's not a good UX for users. (Try adding b"\xA2".decode("ascii")
after doc = json.load(cache_file)
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix some EncodingWarnings
Just out of curiosity, how to get those warnings? I've tried on 3.10 but didn't see any.
According to PEP 597 and the documentation, |
Please force use of We already did this everywhere as this ensures that the files are portable, regardless current locale of the system where they are running. I even run pip-compile from different platforms on the same network share/mount. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
See:
[1] https://docs.python.org/3/whatsnew/3.10.html#optional-encodingwarning-and-encoding-locale-option
[2] https://peps.python.org/pep-0597/
Contributor checklist
Maintainer checklist
backwards incompatible
,feature
,enhancement
,deprecation
,bug
,dependency
,docs
orskip-changelog
as they determine changelog listing.