gh-125682: Python implementation of json.loads() accepts non-ascii digits #125687
Note that this is a backwards-incompatible change, so we should consider adding a minor note in "What's New", and I'm not sure that it should be backported.
Eh, I don't think a "What's New" entry is warranted. This only applies to the pure-Python version of JSON, which isn't the default, so I highly doubt anyone is relying on it. I think not backporting is probably reasonable, though.
Yes, upon further consideration I agree: this is inconsistent between the C and Python implementations of the same module, and seems not to have been intentional, since our docs directly reference RFC 7159 and ECMA-404, both of which clearly specify what counts as a "digit" in the context of JSON.
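For readers who want to see the C/Python inconsistency themselves, here is a minimal sketch. It assumes CPython with the `_json` accelerator available, and it forces the pure-Python scanner by assigning `json.scanner.py_make_scanner(decoder)` to `decoder.scan_once`, which is a common trick rather than a documented API. The helper `try_decode` is a hypothetical name used only for this illustration. The pure-Python result depends on whether the interpreter already includes this fix, so no output is asserted here.

```python
import json
import json.scanner


def try_decode(decoder, text):
    """Return the decoded value, or the exception class name if decoding fails."""
    try:
        return decoder.decode(text)
    except json.JSONDecodeError as exc:
        return type(exc).__name__


# Default decoder: uses the C scanner when the _json accelerator is present.
default_decoder = json.JSONDecoder()

# Same decoder class, but with the pure-Python scanner swapped in (assumption:
# relies on CPython internals, not on a documented interface).
pure_python_decoder = json.JSONDecoder()
pure_python_decoder.scan_once = json.scanner.py_make_scanner(pure_python_decoder)

# ASCII "1" followed by ARABIC-INDIC DIGIT TWO and THREE.
sample = "1\u0662\u0663"

default_result = try_decode(default_decoder, sample)
pure_python_result = try_decode(pure_python_decoder, sample)

print("default decoder:    ", default_result)
print("pure-Python decoder:", pure_python_result)
```

On an interpreter without the fix, the pure-Python decoder is expected to return an integer while the default one rejects the input; with the fix, both should reject it.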
Co-authored-by: Tal Einat <[email protected]>
Title changed from "json.loads() accepts unicode digits" to "json.loads() accepts non-ascii digits"
…sj4cU.rst Co-authored-by: Peter Bierma <[email protected]>
OK, this looks good, thank you!
I'll let Tal have the final decision on whether to backport this and #125683. I'm personally -1 on backporting--Python implementations that are relying on this might get a breaking change between patches. This is small enough that it's fine for minor versions, but I'm not comfortable with putting this in a patch release.
LGTM. 👍
Alternatively we could use the re.ASCII flag. These are two equivalent solutions.
I think that this should be backported.
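To illustrate why the two approaches are equivalent for digit matching, here is a small sketch using only the standard `re` module. The pattern names are made up for this example, and it does not reproduce the patch itself: it only shows that `\d` in a `str` pattern matches any Unicode decimal digit by default, while both an explicit `[0-9]` class and the `re.ASCII` flag restrict matching to ASCII digits.

```python
import re

# FULLWIDTH DIGIT ONE..THREE (U+FF11..U+FF13): Unicode category Nd, not ASCII.
FULLWIDTH = "\uff11\uff12\uff13"

unicode_digits = re.compile(r"\d+")         # str patterns are Unicode-aware by default
explicit_range = re.compile(r"[0-9]+")      # option 1: spell out the ASCII range
ascii_flag = re.compile(r"\d+", re.ASCII)   # option 2: make \d mean [0-9]

print(unicode_digits.fullmatch(FULLWIDTH) is not None)  # True
print(explicit_range.fullmatch(FULLWIDTH) is not None)  # False
print(ascii_flag.fullmatch(FULLWIDTH) is not None)      # False
```

One difference worth keeping in mind: `re.ASCII` also changes `\w`, `\s`, and `\b`, so the two options are only interchangeable when the pattern uses none of those.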
Thanks @nineteendo for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
… of JSON decoder (pythonGH-125687) (cherry picked from commit d358425) Co-authored-by: Nice Zombies <[email protected]>
GH-125692 is a backport of this pull request to the 3.13 branch.
… of JSON decoder (pythonGH-125687) (cherry picked from commit d358425) Co-authored-by: Nice Zombies <[email protected]>
GH-125693 is a backport of this pull request to the 3.12 branch.
Thanks @serhiy-storchaka. For future reference, I'd be interested in your reasoning regarding whether or not to backport this.
I have not merged the backports yet. My reasoning is that this is a clear bug. There should be really good reasons not to backport it: for example, if there is known user code that depends on it, or at least a plausible scenario of this; or the fix has a significant performance cost; or it depends on a feature that did not exist in older versions; or the code has changed so much that backporting requires significant effort. This is not the case. There is an old and stable specification of JSON, and the current code does not follow it. It only accepts numbers with non-ASCII digits if they start with an ASCII digit; generating such a representation, even by mistake, is improbable. And there is a difference between the C (used by default) and the Python implementations. The origin of the bug is clear: the code was correct in Python 2, with 8-bit strings, and the bug was introduced when it was changed to parse Unicode strings. Backporting bugfixes is the default procedure. Not backporting requires reasons.
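The "must start with an ASCII digit" behaviour can be reproduced with a sketch like the following. The two patterns are modeled on the shape of a JSON number regex (integer part, optional fraction, optional exponent) and are assumptions for illustration, not a verbatim copy of the decoder's source; `matched_prefix` is a hypothetical helper.

```python
import re

# Modeled on a JSON number pattern that uses \d (Unicode-aware for str input).
OLD_STYLE = re.compile(r"(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?")
# Same shape, with every \d replaced by an explicit ASCII range.
FIXED_STYLE = re.compile(r"(-?(?:0|[1-9][0-9]*))(\.[0-9]+)?([eE][-+]?[0-9]+)?")

mixed = "1\u0662\u0663"      # ASCII "1", then ARABIC-INDIC DIGIT TWO and THREE
pure = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGIT ONE, TWO, THREE only


def matched_prefix(pattern, text):
    """Return the text matched at position 0, or None if nothing matches."""
    m = pattern.match(text)
    return m.group(0) if m else None


print(matched_prefix(OLD_STYLE, mixed) == mixed)  # True: [1-9] takes "1", \d* takes the rest
print(matched_prefix(OLD_STYLE, pure))            # None: the leading [1-9] is already ASCII-only
print(matched_prefix(FIXED_STYLE, mixed))         # 1
print(int(mixed))                                 # 123: int() accepts Unicode decimal digits
```

The last line shows why the loose match mattered: once the regex accepted the mixed string, passing it to `int()` silently produced a valid number instead of a decode error.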
Excellent, thank you for such a clear and thorough explanation!
…n of JSON decoder (GH-125687) (GH-125693) (cherry picked from commit d358425) Co-authored-by: Nice Zombies <[email protected]>
…n of JSON decoder (GH-125687) (GH-125692) (cherry picked from commit d358425) Co-authored-by: Nice Zombies <[email protected]>
json.loads() accepts non-ascii digits #125682