Evaluate whether pysimdjson could be used in Rally #1046

dliappis · 2020-08-10T10:27:10Z

There are largely two areas where handling large chunks of JSON impacts performance in Rally:

Parsing the JSON source
Creating Python (dict) Objects from JSON

The simdjson project seems to take advantage of modern SIMD vector instructions to achieve much higher performance than other libraries.

The pysimdjson project brings those benefits to Python via bindings, with prebuilt binary wheels for a lot of platforms. Additionally, it provides JSON pointers via at(), or proxies for objects and lists, to reduce the creation of Python objects. We've been hitting these issues at various points, e.g. in #941 and #935 (especially after switching to an async-io based load generator).
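As a rough illustration of the pointer-based access described above, the sketch below pulls single values out of a response without turning the whole document into dicts first. It assumes pysimdjson 4.x or later, where (as far as I know) the pointer lookup is exposed as `at_pointer()` rather than `at()`. `extract_value` and `resolve_pointer` are hypothetical helper names, and the fallback to the standard `json` module exists only so the snippet still runs when pysimdjson is not installed.

```python
import json

try:
    import simdjson  # pysimdjson (optional dependency)
except ImportError:
    simdjson = None


def resolve_pointer(doc, pointer):
    """Tiny JSON-pointer resolver for dicts/lists (no '~' escape handling)."""
    node = doc
    for token in pointer.strip("/").split("/"):
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def extract_value(raw: bytes, pointer: str):
    """Return the scalar found at `pointer` (hypothetical helper)."""
    if simdjson is not None:
        # Assumption: pysimdjson >= 4, where lookups go through at_pointer().
        return simdjson.Parser().parse(raw).at_pointer(pointer)
    # Fallback: builds the full dict tree, which is the cost we want to avoid.
    return resolve_pointer(json.loads(raw), pointer)


# Made-up response body, for illustration only.
response = b'{"took": 7, "hits": {"total": {"value": 42}, "hits": [{"_id": "a"}]}}'
print(extract_value(response, "/hits/total/value"))
print(extract_value(response, "/hits/hits/0/_id"))
```

The pointer is resolved on the parsed document, so only the requested scalar is materialized as a Python object on the pysimdjson path.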

Given the benchmark results this could be a very useful library to use. Both projects use Apache 2.0 license.

TkTech · 2020-08-10T17:58:22Z

I'm watching this issue - if you find any missing functionality or issues in pysimdjson that would block this, let me know and they'll be resolved.

pquentin · 2022-06-29T12:03:43Z

The three main contenders for parsing JSON are:

the standard json module, which is implemented in Python with an optional C accelerator in CPython
orjson, written in Rust
as mentioned above, pysimdjson, a Cython wrapper around simdjson, a C++ library that can handle JSON at multiple GB/s.

Ease of use

Nothing beats the standard library here, but orjson and pysimdjson both provide wheels, so no compilation is needed in practice. orjson is more popular (3.4k stars vs. 0.5k for pysimdjson) and more actively maintained (which makes sense, as pysimdjson is only a wrapper). orjson also had Python 3.10 wheels before pysimdjson. Neither currently has Python 3.11 wheels. Small note: orjson only serializes to/deserializes from bytes, which makes sense but is more restrictive than the standard library.
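To make the bytes restriction concrete: `orjson.dumps()` returns `bytes`, so code that expects a `str` (as `json.dumps()` returns) needs a decode step. The following is a minimal sketch of such a shim. `dumps_str` is a hypothetical name, and the fallback to the standard library is only there so the snippet runs without orjson installed.

```python
import json

try:
    import orjson  # optional dependency
except ImportError:
    orjson = None


def dumps_str(obj) -> str:
    """Serialize to str regardless of backend (hypothetical helper)."""
    if orjson is not None:
        # orjson.dumps() returns UTF-8 encoded bytes, never str.
        return orjson.dumps(obj).decode("utf-8")
    # Compact separators and ensure_ascii=False mimic orjson's output style.
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)


payload = {"index": {"_id": "1"}, "tags": ["a", "b"]}
text = dumps_str(payload)
print(type(text).__name__, text)
```

Whether the extra decode matters depends on the call site; code that writes to a socket or file can usually use the bytes directly.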

Speed

For small JSON documents with a lot of structure, using an alternative JSON parser won't help much, because the bulk of the time is spent inside CPython creating and allocating the correct structure.
With larger documents and less structure, orjson is actually slightly faster than pysimdjson, but you still spend 95% of your time in CPython, so both are around 2x faster than the standard library.
However, if you only need a few of the keys rather than all of them, pysimdjson has an API for that which can get you 10x speedups over orjson.
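For anyone who wants to check these claims against their own documents, a small timing harness along these lines could be a starting point. It is only a sketch: `available_parsers`, `make_document` and `benchmark` are hypothetical names, the synthetic document is made up, and orjson and pysimdjson are measured only if they happen to be installed.

```python
import json
import timeit


def available_parsers():
    """Map parser name -> loads callable for whatever is installed."""
    parsers = {"json": json.loads}
    try:
        import orjson
        parsers["orjson"] = orjson.loads
    except ImportError:
        pass
    try:
        import simdjson
        parsers["pysimdjson"] = simdjson.loads
    except ImportError:
        pass
    return parsers


def make_document(n_items: int) -> bytes:
    """Build a synthetic JSON document with n_items small objects."""
    items = [{"id": i, "name": f"item-{i}", "tags": ["a", "b"]} for i in range(n_items)]
    return json.dumps({"items": items}).encode("utf-8")


def benchmark(raw: bytes, repeat: int = 5, number: int = 20):
    """Return the best observed seconds per full parse for each parser."""
    results = {}
    for name, loads in available_parsers().items():
        best = min(timeit.repeat(lambda: loads(raw), repeat=repeat, number=number))
        results[name] = best / number
    return results


for name, seconds in benchmark(make_document(1000)).items():
    print(f"{name}: {seconds * 1e6:.1f} us per parse")
```

This measures full deserialization only. The partial-extraction case would need a separate measurement that reads just a few keys.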

pquentin · 2022-06-29T12:04:42Z

A good test bed for pysimdjson support for extracting specific keys is this parse() function that currently uses ijson and is crucial to avoid client-side bottlenecks: https://github.com/elastic/rally/blob/master/esrally/driver/runner.py#L736-L792

TkTech · 2022-06-29T18:12:40Z

> Neither currently has Python 3.11 wheels.

Keep in mind 3.11 is not out yet, and you should never push beta-tagged wheels to PyPI, as the ABI is not yet stable. When 3.11 is released and cibuildwheel is updated, pysimdjson (and orjson) will push 3.11 wheels.

berglh · 2022-08-24T06:23:11Z

While I don't have anything super useful to add here in terms of replacements, I would just like to throw my anecdotal hat into this ring with respect to the elastic/logs track I was trying to run against our new NVMe-backed hot data tier on on-prem hardware within an ECE cluster. Scaling from targeting 1 shard to 2 shards and beyond didn't improve the overall indexing throughput. I specifically increased the corpus size to around 60 days of data to ensure I had plenty of events to index. My goal was to understand the behaviour of the new cluster with respect to hot spotting, shard and replica counts. Unfortunately, Elastic Rally initially gave me the wrong idea.

It wasn't until I ran multiple copies of Elastic Rally concurrently, with identical settings and from the same host, that I was able to start approaching any of the hardware limits in the cluster. In the end, I had to run 12x Elastic Rally instances on the elastic/logs track to bottleneck the CPU on the hot data tier. I executed all 12 instances from a single server (backed by NVMe, 128 GB of RAM, 32c/64t, 10 Gb network). This resulted in the actual indexing rate rising from 60,000-70,000 docs/s to 550,000-600,000 docs/s. The reality was that neither the server sending the logs nor the hot data tier nodes were the limiting factor; it was Elastic Rally not providing documents fast enough to index.

My suspicion was that, similar to the Golang stdlib's encoding/json, JSON handling in Python is not heavily optimised. This issue seems to validate that theory. I just wanted to provide a real-world example of where Elastic Rally's performance produces results that could easily be misconstrued by naive users such as myself.

pquentin · 2022-08-24T06:40:02Z

@berglh Thanks for the report! It's true that you should always check that the client is not the bottleneck. Until we fix #1399, would you mind running https://github.com/benfred/py-spy on one of the Rally processes? It will tell us what exactly is being slow.

berglh · 2022-08-25T05:15:37Z

@pquentin I'm not sure if you were after the flame graph specifically or a different format. I can run it again with the other output if required. I went ahead and cleared out our cluster password from the SVG. I didn't see anything specifically JSON-related in the hotspots, but there's a lot going on, as I captured the parent and subprocesses of the elastic/logs track. [Attached flame graph: esrally_profile]
Edit: ~~Looks like github munged the SVG :?~~

pquentin · 2022-08-25T13:02:59Z

I opened #1566 so that this issue stays focused on pysimdjson.

dliappis added the labels enhancement (Improves the status quo) and :misc (Changes that don't affect users directly: linter fixes, test improvements, etc.) on Aug 10, 2020

TkTech mentioned this issue Aug 10, 2020

Prove we're worth using in real-world cases TkTech/pysimdjson#47

Closed

pquentin mentioned this issue Aug 25, 2022

Having to run 12x Elastic Rally instances on the elastic\logs track to bottleneck the CPU on the hot data tier #1566

Closed
