Python implementation of json.loads() accepts non-ascii digits #125682

Closed
nineteendo opened this issue Oct 18, 2024 · 4 comments
Labels
3.12 (bugs and security fixes), 3.13 (bugs and security fixes), 3.14 (new features, bugs and security fixes), stdlib (Python modules in the Lib dir), type-bug (an unexpected behavior, bug, or error)

Comments

@nineteendo
Contributor

nineteendo commented Oct 18, 2024

Bug report

Bug description:

Be careful when matching digits with a Unicode regex: in a str pattern, \d matches any Unicode decimal digit, not only ASCII 0-9.

NUMBER_RE = re.compile(
r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
(re.VERBOSE | re.MULTILINE | re.DOTALL))

>>> import sys
>>> sys.modules["_json"] = None
>>> import json
>>> json.loads("[1\uff10, 0.\uff10, 0e\uff10]")
[10, 0.0, 0.0]

I think it's safer to use [0-9] instead of \d here.
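To make the difference concrete, here is a minimal sketch that compares the pattern quoted above with an ASCII-only variant. The names UNICODE_NUMBER_RE and ASCII_NUMBER_RE are made up for this illustration, the flags from the original are dropped because they do not affect these inputs, and the variant is only one possible way to write the fix, not necessarily what was merged.

```python
import re

# Pattern as quoted in the report: in a str pattern, \d matches any
# Unicode decimal digit, including U+FF10 FULLWIDTH DIGIT ZERO.
UNICODE_NUMBER_RE = re.compile(r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?')

# Hypothetical ASCII-only variant: [0-9] never matches non-ASCII digits.
ASCII_NUMBER_RE = re.compile(
    r'(-?(?:0|[1-9][0-9]*))(\.[0-9]+)?([eE][-+]?[0-9]+)?')

samples = ["1\uff10", "0.\uff10", "0e\uff10"]
for text in samples:
    loose = UNICODE_NUMBER_RE.match(text).group(0)
    strict = ASCII_NUMBER_RE.match(text).group(0)
    # The loose pattern consumes the fullwidth digit; the strict one stops before it.
    print(ascii(text), ascii(loose), ascii(strict))
```

Compiling the original pattern with re.ASCII should have the same effect on \d, but spelling out [0-9] keeps the intent visible in the pattern itself.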

CPython versions tested on:

3.13

Operating systems tested on:

macOS


@ZeroIntensity
Member

Why is this a bug? int and float support this as well:

>>> int("\uff10")
0
>>> float("0e\uff10")
0.0

Arguably, the bug is that the C implementation does not support it.

@taleinat
Contributor

This can indeed be considered a bug, since accepting such digits does not conform to the JSON specification.

See, for example, Section 6: Numbers in IETF RFC 8259, which clearly defines the supported digits.
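As a rough illustration of that grammar, the sketch below transcribes the number production from RFC 8259 Section 6 (number = [ minus ] int [ frac ] [ exp ], with DIGIT limited to %x30-39) into a regex. The helper is_json_number and the constant names are hypothetical and exist only for this example; this is not the json module's code.

```python
import re

# ABNF pieces from RFC 8259 Section 6, transcribed by hand (a sketch).
DIGIT = "[0-9]"                    # DIGIT = %x30-39, ASCII only
INT = f"(?:0|[1-9]{DIGIT}*)"       # int = zero / ( digit1-9 *DIGIT )
FRAC = rf"(?:\.{DIGIT}+)"          # frac = decimal-point 1*DIGIT
EXP = f"(?:[eE][-+]?{DIGIT}+)"     # exp = e [ minus / plus ] 1*DIGIT
NUMBER = re.compile(f"-?{INT}{FRAC}?{EXP}?")


def is_json_number(text: str) -> bool:
    """Return True if the whole string is a number per the grammar above."""
    return NUMBER.fullmatch(text) is not None


for candidate in ["10", "-0.5e+3", "1\uff10", "01", "1.", ".5"]:
    print(ascii(candidate), is_json_number(candidate))
```

Only the first two candidates are valid under this grammar; the fullwidth digit, the leading zero, and the bare decimal points are all rejected.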

@ZeroIntensity ZeroIntensity added the stdlib (Python modules in the Lib dir), 3.12 (bugs and security fixes), 3.13 (bugs and security fixes), and 3.14 (new features, bugs and security fixes) labels Oct 18, 2024
@ZeroIntensity
Member

Ah, I've applied the appropriate labels. Thank you for the clarification :)

@nineteendo nineteendo changed the title from "Python implementation of json.loads() accepts unicode digits" to "Python implementation of json.loads() accepts non-ascii digits" Oct 18, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 18, 2024
… of JSON decoder (pythonGH-125687)

(cherry picked from commit d358425)

Co-authored-by: Nice Zombies <[email protected]>
@hauntsaninja
Contributor

Good spot!

serhiy-storchaka pushed a commit that referenced this issue Oct 21, 2024
…n of JSON decoder (GH-125687) (GH-125693)

(cherry picked from commit d358425)

Co-authored-by: Nice Zombies <[email protected]>
serhiy-storchaka pushed a commit that referenced this issue Oct 21, 2024
…n of JSON decoder (GH-125687) (GH-125692)

(cherry picked from commit d358425)

Co-authored-by: Nice Zombies <[email protected]>
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025