Faster json string parsing #118

Merged
jcrist merged 2 commits into main from faster-json-strings
Jun 10, 2022
jcrist
Owner

@jcrist jcrist commented Jun 10, 2022

This significantly speeds up JSON string decoding. Using some ideas borrowed from yyjson, as well as some general performance tricks, we:

  • Use a new lookup table to identify the char values that require special handling.
  • Use separate processing loops for ASCII and non-ASCII (unicode) input.
  • Split off the (less common) escape handling routine into a separate function.
  • Manually unroll the string processing loops.
  • Apply an inline asm hack to get GCC to generate better code.
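
To make the first four points concrete, here is a minimal standalone sketch of the general technique. It is not the code from this PR: the names `char_class`, `scan_until`, `decode_simple_escape`, and `decode_string` are hypothetical, `\uXXXX` escapes and UTF-8 validation are left out for brevity, and the inline asm hack is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Character classes (hypothetical names, not taken from the PR). */
enum { CH_SPECIAL = 1, CH_NONASCII = 2 };

/* 256-entry lookup table: '"', '\\' and control bytes are "special",
 * bytes >= 0x80 are flagged as non-ASCII, everything else is 0. */
static uint8_t char_class[256];

static void init_char_class(void) {
    for (int c = 0; c < 256; c++) {
        if (c < 0x20 || c == '"' || c == '\\') char_class[c] = CH_SPECIAL;
        else if (c >= 0x80) char_class[c] = CH_NONASCII;
        else char_class[c] = 0;
    }
}

/* Offset of the first byte in buf[0..len) whose class intersects `mask`,
 * or len if there is none. Manually unrolled four bytes per iteration. */
static size_t scan_until(const unsigned char *buf, size_t len, uint8_t mask) {
    size_t i = 0;
    while (i + 4 <= len) {
        if (char_class[buf[i]] & mask) return i;
        if (char_class[buf[i + 1]] & mask) return i + 1;
        if (char_class[buf[i + 2]] & mask) return i + 2;
        if (char_class[buf[i + 3]] & mask) return i + 3;
        i += 4;
    }
    for (; i < len; i++) {
        if (char_class[buf[i]] & mask) return i;
    }
    return len;
}

/* Cold path, kept out of the hot loops. Handles single-character escapes
 * only. Returns the decoded byte, or -1 if the escape is not supported. */
static int decode_simple_escape(unsigned char c) {
    switch (c) {
        case '"': return '"';
        case '\\': return '\\';
        case '/': return '/';
        case 'b': return '\b';
        case 'f': return '\f';
        case 'n': return '\n';
        case 'r': return '\r';
        case 't': return '\t';
        default: return -1;
    }
}

/* Decode a JSON string body that starts just after the opening quote.
 * `out` must hold at least `len` bytes. On success, stores the decoded
 * length in *out_len, sets *is_ascii, and returns the number of input
 * bytes consumed (including the closing quote). Returns -1 on error. */
static long decode_string(const unsigned char *in, size_t len,
                          unsigned char *out, size_t *out_len, int *is_ascii) {
    size_t i = 0, o = 0;
    uint8_t mask = CH_SPECIAL | CH_NONASCII; /* start in the ASCII loop */
    *is_ascii = 1;
    for (;;) {
        size_t n = scan_until(in + i, len - i, mask);
        memcpy(out + o, in + i, n);
        i += n;
        o += n;
        if (i == len) return -1; /* unterminated string */
        unsigned char c = in[i];
        if (char_class[c] & CH_NONASCII) {
            /* Switch to the loop that no longer stops on non-ASCII bytes. */
            *is_ascii = 0;
            mask = CH_SPECIAL;
        } else if (c == '"') {
            *out_len = o;
            return (long)(i + 1);
        } else if (c == '\\') {
            if (i + 1 >= len) return -1;
            int d = decode_simple_escape(in[i + 1]);
            if (d < 0) return -1;
            out[o++] = (unsigned char)d;
            i += 2;
        } else {
            return -1; /* raw control character */
        }
    }
}
```

In this sketch the ASCII loop and the non-ASCII loop share one scanning routine and differ only in the mask, so `is_ascii` stays true unless a byte >= 0x80 is actually seen. A caller would run `init_char_class()` once before decoding.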

Besides a faster string parsing inner loop, this also lets us determine for free whether a string consists only of ASCII characters. In that case, we skip calling PyUnicode_DecodeUTF8 and instead create the new string object manually, further accelerating decoding.

In general, this results in:

  • 2x speedup for decoding ASCII-only strings
  • 2x speedup for decoding ASCII strings with escape characters (e.g. \\n -> \n)
  • ~20% speedup for decoding strings containing non-ASCII unicode characters

jcrist added 2 commits June 9, 2022 22:39
@jcrist jcrist merged commit d698fe2 into main Jun 10, 2022
@jcrist jcrist deleted the faster-json-strings branch June 10, 2022 04:33