storage: replace CommandQueue with spanlatch.Manager
This commit replaces the CommandQueue with the spanlatch.Manager, which was introduced in #31997. See that PR for an introduction to how the structure differs from the CommandQueue and how it improves performance on microbenchmarks.

This is mostly a mechanical change. One important detail is that it removes the CommandQueue debug page. We found that the page was buggy (or straight up broken) and it wasn't actively used by members of Core when debugging problems. In its place, the commit revives the "slow requests" metric for latching, which hasn't been hooked up in over a year.

_### Benchmarks

_#### Standard Benchmarks

These benchmarks are standard benchmarks that we commonly run. They were run with varying node sizes, cluster sizes, and pre-split counts.

```
name                              old ops/sec  new ops/sec  delta
kv0/cores=4/nodes=1/splits=0      1.99k ± 2%   2.06k ± 1%   +3.22%   (p=0.008 n=5+5)
kv0/cores=4/nodes=1/splits=100    2.25k ± 1%   2.38k ± 1%   +6.01%   (p=0.008 n=5+5)
kv0/cores=4/nodes=3/splits=0      1.60k ± 0%   1.69k ± 2%   +5.53%   (p=0.008 n=5+5)
kv0/cores=4/nodes=3/splits=100    3.52k ± 6%   3.65k ± 9%   ~        (p=0.421 n=5+5)
kv0/cores=16/nodes=1/splits=0     19.9k ± 1%   21.8k ± 1%   +9.34%   (p=0.008 n=5+5)
kv0/cores=16/nodes=1/splits=100   24.4k ± 1%   26.1k ± 1%   +7.17%   (p=0.008 n=5+5)
kv0/cores=16/nodes=3/splits=0     14.9k ± 1%   16.1k ± 1%   +8.03%   (p=0.008 n=5+5)
kv0/cores=16/nodes=3/splits=100   20.6k ± 1%   22.8k ± 1%   +10.79%  (p=0.008 n=5+5)
kv0/cores=36/nodes=1/splits=0     31.2k ± 2%   35.3k ± 1%   +13.28%  (p=0.008 n=5+5)
kv0/cores=36/nodes=1/splits=100   45.7k ± 1%   51.1k ± 1%   +11.80%  (p=0.008 n=5+5)
kv0/cores=36/nodes=3/splits=0     23.7k ± 2%   27.1k ± 2%   +14.39%  (p=0.008 n=5+5)
kv0/cores=36/nodes=3/splits=100   34.9k ± 2%   45.1k ± 1%   +29.44%  (p=0.008 n=5+5)
kv95/cores=4/nodes=1/splits=0     12.7k ± 2%   12.9k ± 2%   +1.39%   (p=0.151 n=5+5)
kv95/cores=4/nodes=1/splits=100   12.8k ± 2%   13.1k ± 2%   +2.10%   (p=0.032 n=5+5)
kv95/cores=4/nodes=3/splits=0     10.6k ± 1%   10.8k ± 1%   +1.58%   (p=0.056 n=5+5)
kv95/cores=4/nodes=3/splits=100   12.3k ± 7%   12.6k ± 8%   +2.61%   (p=0.095 n=5+5)
kv95/cores=16/nodes=1/splits=0    50.9k ± 1%   52.2k ± 1%   +2.37%   (p=0.008 n=5+5)
kv95/cores=16/nodes=1/splits=100  52.2k ± 1%   53.0k ± 1%   +1.49%   (p=0.008 n=5+5)
kv95/cores=16/nodes=3/splits=0    46.2k ± 1%   46.8k ± 1%   +1.32%   (p=0.032 n=5+5)
kv95/cores=16/nodes=3/splits=100  51.0k ± 1%   53.2k ± 1%   +4.25%   (p=0.008 n=5+5)
kv95/cores=36/nodes=1/splits=0    79.8k ± 2%   101.6k ± 1%  +27.31%  (p=0.008 n=5+5)
kv95/cores=36/nodes=1/splits=100  104k ± 1%    107k ± 1%    +2.60%   (p=0.008 n=5+5)
kv95/cores=36/nodes=3/splits=0    85.8k ± 1%   91.8k ± 1%   +7.08%   (p=0.008 n=5+5)
kv95/cores=36/nodes=3/splits=100  106k ± 1%    112k ± 1%    +5.51%   (p=0.008 n=5+5)

name                              old p50(ms)  new p50(ms)  delta
kv0/cores=4/nodes=1/splits=0      3.52 ± 5%    3.40 ± 0%    -3.41%   (p=0.016 n=5+4)
kv0/cores=4/nodes=1/splits=100    3.30 ± 0%    3.00 ± 0%    -9.09%   (p=0.008 n=5+5)
kv0/cores=4/nodes=3/splits=0      4.70 ± 0%    4.14 ± 9%    -11.91%  (p=0.008 n=5+5)
kv0/cores=4/nodes=3/splits=100    1.50 ± 0%    1.48 ± 8%    ~        (p=0.968 n=4+5)
kv0/cores=16/nodes=1/splits=0     1.40 ± 0%    1.40 ± 0%    ~        (all equal)
kv0/cores=16/nodes=1/splits=100   1.20 ± 0%    1.20 ± 0%    ~        (all equal)
kv0/cores=16/nodes=3/splits=0     2.00 ± 0%    1.90 ± 0%    -5.00%   (p=0.000 n=5+4)
kv0/cores=16/nodes=3/splits=100   1.40 ± 0%    1.40 ± 0%    ~        (all equal)
kv0/cores=36/nodes=1/splits=0     1.76 ± 3%    1.60 ± 0%    -9.09%   (p=0.008 n=5+5)
kv0/cores=36/nodes=1/splits=100   1.40 ± 0%    1.30 ± 0%    -7.14%   (p=0.008 n=5+5)
kv0/cores=36/nodes=3/splits=0     2.56 ± 2%    2.40 ± 0%    -6.25%   (p=0.000 n=5+4)
kv0/cores=36/nodes=3/splits=100   1.70 ± 0%    1.40 ± 0%    -17.65%  (p=0.008 n=5+5)
kv95/cores=4/nodes=1/splits=0     0.50 ± 0%    0.50 ± 0%    ~        (all equal)
kv95/cores=4/nodes=1/splits=100   0.50 ± 0%    0.50 ± 0%    ~        (all equal)
kv95/cores=4/nodes=3/splits=0     0.60 ± 0%    0.60 ± 0%    ~        (all equal)
kv95/cores=4/nodes=3/splits=100   0.60 ± 0%    0.60 ± 0%    ~        (all equal)
kv95/cores=16/nodes=1/splits=0    0.50 ± 0%    0.50 ± 0%    ~        (all equal)
kv95/cores=16/nodes=1/splits=100  0.50 ± 0%    0.50 ± 0%    ~        (all equal)
kv95/cores=16/nodes=3/splits=0    0.70 ± 0%    0.64 ± 9%    -8.57%   (p=0.167 n=5+5)
kv95/cores=16/nodes=3/splits=100  0.60 ± 0%    0.60 ± 0%    ~        (all equal)
kv95/cores=36/nodes=1/splits=0    0.50 ± 0%    0.50 ± 0%    ~        (all equal)
kv95/cores=36/nodes=1/splits=100  0.50 ± 0%    0.50 ± 0%    ~        (all equal)
kv95/cores=36/nodes=3/splits=0    0.66 ± 9%    0.60 ± 0%    -9.09%   (p=0.167 n=5+5)
kv95/cores=36/nodes=3/splits=100  0.60 ± 0%    0.60 ± 0%    ~        (all equal)

name                              old p99(ms)  new p99(ms)  delta
kv0/cores=4/nodes=1/splits=0      11.0 ± 0%    10.5 ± 0%    -4.55%   (p=0.000 n=5+4)
kv0/cores=4/nodes=1/splits=100    7.90 ± 0%    7.60 ± 0%    -3.80%   (p=0.000 n=5+4)
kv0/cores=4/nodes=3/splits=0      15.7 ± 0%    15.2 ± 0%    -3.18%   (p=0.008 n=5+5)
kv0/cores=4/nodes=3/splits=100    8.90 ± 0%    8.12 ± 3%    -8.76%   (p=0.016 n=4+5)
kv0/cores=16/nodes=1/splits=0     3.46 ± 2%    3.00 ± 0%    -13.29%  (p=0.000 n=5+4)
kv0/cores=16/nodes=1/splits=100   4.50 ± 0%    3.36 ± 2%    -25.33%  (p=0.008 n=5+5)
kv0/cores=16/nodes=3/splits=0     4.50 ± 0%    3.90 ± 0%    -13.33%  (p=0.008 n=5+5)
kv0/cores=16/nodes=3/splits=100   5.80 ± 0%    4.10 ± 0%    -29.31%  (p=0.029 n=4+4)
kv0/cores=36/nodes=1/splits=0     6.80 ± 0%    5.20 ± 0%    -23.53%  (p=0.008 n=5+5)
kv0/cores=36/nodes=1/splits=100   5.80 ± 0%    4.32 ± 4%    -25.52%  (p=0.008 n=5+5)
kv0/cores=36/nodes=3/splits=0     7.72 ± 2%    6.30 ± 0%    -18.39%  (p=0.000 n=5+4)
kv0/cores=36/nodes=3/splits=100   7.98 ± 2%    5.20 ± 0%    -34.84%  (p=0.000 n=5+4)
kv95/cores=4/nodes=1/splits=0     5.38 ± 3%    5.20 ± 0%    -3.35%   (p=0.167 n=5+5)
kv95/cores=4/nodes=1/splits=100   5.00 ± 0%    5.00 ± 0%    ~        (all equal)
kv95/cores=4/nodes=3/splits=0     5.68 ± 3%    5.50 ± 0%    -3.17%   (p=0.095 n=5+4)
kv95/cores=4/nodes=3/splits=100   3.60 ±31%    2.93 ± 3%    -18.75%  (p=0.016 n=5+4)
kv95/cores=16/nodes=1/splits=0    4.10 ± 0%    4.10 ± 0%    ~        (all equal)
kv95/cores=16/nodes=1/splits=100  4.50 ± 0%    4.10 ± 0%    -8.89%   (p=0.000 n=5+4)
kv95/cores=16/nodes=3/splits=0    2.60 ± 0%    2.60 ± 0%    ~        (all equal)
kv95/cores=16/nodes=3/splits=100  2.50 ± 0%    1.90 ± 5%    -24.00%  (p=0.008 n=5+5)
kv95/cores=36/nodes=1/splits=0    6.60 ± 0%    6.00 ± 0%    -9.09%   (p=0.029 n=4+4)
kv95/cores=36/nodes=1/splits=100  5.50 ± 0%    5.12 ± 2%    -6.91%   (p=0.008 n=5+5)
kv95/cores=36/nodes=3/splits=0    4.18 ± 2%    4.02 ± 3%    -3.71%   (p=0.000 n=4+5)
kv95/cores=36/nodes=3/splits=100  3.80 ± 0%    2.80 ± 0%    -26.32%  (p=0.008 n=5+5)
```

_#### Large-machine Benchmarks

These benchmarks are standard benchmarks run on a single-node cluster with 72 vCPUs.

```
name                              old ops/sec  new ops/sec  delta
kv0/cores=72/nodes=1/splits=0     31.0k ± 4%   36.4k ± 1%   +17.57%  (p=0.008 n=5+5)
kv0/cores=72/nodes=1/splits=100   44.0k ± 0%   49.0k ± 1%   +11.41%  (p=0.008 n=5+5)
kv95/cores=72/nodes=1/splits=0    52.7k ±18%   72.6k ±26%   +37.70%  (p=0.016 n=5+5)
kv95/cores=72/nodes=1/splits=100  66.8k ±17%   68.5k ± 5%   ~        (p=0.286 n=5+4)

name                              old p50(ms)  new p50(ms)  delta
kv0/cores=72/nodes=1/splits=0     2.30 ±13%    2.52 ± 5%    ~        (p=0.214 n=5+5)
kv0/cores=72/nodes=1/splits=100   3.00 ± 0%    2.90 ± 0%    -3.33%   (p=0.008 n=5+5)
kv95/cores=72/nodes=1/splits=0    0.46 ±13%    0.50 ± 0%    ~        (p=0.444 n=5+5)
kv95/cores=72/nodes=1/splits=100  0.44 ±14%    0.50 ± 0%    +13.64%  (p=0.167 n=5+5)

name                              old p99(ms)  new p99(ms)  delta
kv0/cores=72/nodes=1/splits=0     18.9 ± 6%    13.3 ± 5%    -29.56%  (p=0.008 n=5+5)
kv0/cores=72/nodes=1/splits=100   13.4 ± 2%    11.0 ± 0%    -17.91%  (p=0.008 n=5+5)
kv95/cores=72/nodes=1/splits=0    34.4 ±34%    23.5 ±24%    -31.74%  (p=0.048 n=5+5)
kv95/cores=72/nodes=1/splits=100  21.0 ± 0%    19.1 ± 4%    -8.81%   (p=0.029 n=4+4)
```

_#### Motivating Benchmarks

These are benchmarks that used to generate a lot of contention in the CommandQueue. They have small cycle-lengths, indicated by the `c` specifier. The last one also includes 20% scan operations, which increases contention between non-overlapping point operations.

```
name                                    old ops/sec  new ops/sec  delta
kv95-c5/cores=16/nodes=1/splits=0       45.1k ± 1%   47.2k ± 4%   +4.59%   (p=0.008 n=5+5)
kv95-c5/cores=36/nodes=1/splits=0       44.6k ± 1%   76.3k ± 1%   +71.05%  (p=0.008 n=5+5)
kv50-c128/cores=16/nodes=1/splits=0     27.2k ± 2%   29.4k ± 1%   +8.12%   (p=0.008 n=5+5)
kv50-c128/cores=36/nodes=1/splits=0     42.6k ± 2%   50.0k ± 1%   +17.39%  (p=0.008 n=5+5)
kv70-20-c128/cores=16/nodes=1/splits=0  28.7k ± 1%   29.8k ± 3%   +3.87%   (p=0.008 n=5+5)
kv70-20-c128/cores=36/nodes=1/splits=0  41.9k ± 4%   52.8k ± 2%   +25.97%  (p=0.008 n=5+5)

name                                    old p50(ms)  new p50(ms)  delta
kv95-c5/cores=16/nodes=1/splits=0       0.60 ± 0%    0.60 ± 0%    ~        (all equal)
kv95-c5/cores=36/nodes=1/splits=0       0.90 ± 0%    0.80 ± 0%    -11.11%  (p=0.008 n=5+5)
kv50-c128/cores=16/nodes=1/splits=0     1.10 ± 0%    1.06 ± 6%    ~        (p=0.444 n=5+5)
kv50-c128/cores=36/nodes=1/splits=0     1.26 ± 5%    1.30 ± 0%    ~        (p=0.444 n=5+5)
kv70-20-c128/cores=16/nodes=1/splits=0  0.66 ± 9%    0.60 ± 0%    -9.09%   (p=0.167 n=5+5)
kv70-20-c128/cores=36/nodes=1/splits=0  0.70 ± 0%    0.50 ± 0%    -28.57%  (p=0.008 n=5+5)

name                                    old p99(ms)  new p99(ms)  delta
kv95-c5/cores=16/nodes=1/splits=0       2.40 ± 0%    2.10 ± 0%    -12.50%  (p=0.000 n=5+4)
kv95-c5/cores=36/nodes=1/splits=0       5.80 ± 0%    3.30 ± 0%    -43.10%  (p=0.000 n=5+4)
kv50-c128/cores=16/nodes=1/splits=0     3.50 ± 0%    3.00 ± 0%    -14.29%  (p=0.008 n=5+5)
kv50-c128/cores=36/nodes=1/splits=0     6.80 ± 0%    4.70 ± 0%    -30.88%  (p=0.079 n=4+5)
kv70-20-c128/cores=16/nodes=1/splits=0  5.00 ± 0%    4.70 ± 0%    -6.00%   (p=0.029 n=4+4)
kv70-20-c128/cores=36/nodes=1/splits=0  11.0 ± 0%    6.8 ± 0%     -38.18%  (p=0.008 n=5+5)
```

_#### Batching Benchmarks

One optimization left out of the new spanlatch.Manager was the "covering" optimization, where commands were initially added to the interval tree as a single spanning interval and only expanded later. I ran a series of benchmarks to verify that this optimization was not needed. My hypothesis was that the order-of-magnitude increase in the speed of the interval tree would make the optimization unnecessary.

It turns out that removing the optimization hurt a few benchmarks to a small degree but sped up others tremendously (some benchmarks improved by over 400%). I suspect that the covering optimization could actually hurt in cases where it causes non-overlapping requests to overlap. It is interesting how quickly a few of these benchmarks oscillate from small losses to big wins. It makes me think that there's some non-linear behavior with the old CommandQueue that would cause its performance to quickly degrade once it became a contention bottleneck.

```
name                                    old ops/sec  new ops/sec  delta
kv0-b16/cores=4/nodes=1/splits=0        2.41k ± 0%   2.06k ± 3%   -14.75%  (p=0.008 n=5+5)
kv0-b16/cores=4/nodes=1/splits=100      514 ± 0%     534 ± 1%     +3.88%   (p=0.008 n=5+5)
kv0-b16/cores=16/nodes=1/splits=0       2.95k ± 0%   4.35k ± 0%   +47.74%  (p=0.008 n=5+5)
kv0-b16/cores=16/nodes=1/splits=100     1.80k ± 1%   1.88k ± 1%   +4.46%   (p=0.008 n=5+5)
kv0-b16/cores=36/nodes=1/splits=0       2.74k ± 0%   4.92k ± 1%   +79.55%  (p=0.008 n=5+5)
kv0-b16/cores=36/nodes=1/splits=100     2.39k ± 1%   2.45k ± 1%   +2.41%   (p=0.008 n=5+5)
kv0-b128/cores=4/nodes=1/splits=0       422 ± 0%     518 ± 1%     +22.60%  (p=0.008 n=5+5)
kv0-b128/cores=4/nodes=1/splits=100     98.4 ± 1%    98.8 ± 1%    ~        (p=0.810 n=5+5)
kv0-b128/cores=16/nodes=1/splits=0      532 ± 0%     1059 ± 0%    +99.16%  (p=0.008 n=5+5)
kv0-b128/cores=16/nodes=1/splits=100    291 ± 1%     307 ± 1%     +5.18%   (p=0.008 n=5+5)
kv0-b128/cores=36/nodes=1/splits=0      483 ± 0%     1288 ± 1%    +166.37% (p=0.008 n=5+5)
kv0-b128/cores=36/nodes=1/splits=100    394 ± 1%     408 ± 1%     +3.51%   (p=0.008 n=5+5)
kv0-b1024/cores=4/nodes=1/splits=0      49.7 ± 1%    72.8 ± 1%    +46.52%  (p=0.008 n=5+5)
kv0-b1024/cores=4/nodes=1/splits=100    30.8 ± 0%    23.4 ± 0%    -24.03%  (p=0.008 n=5+5)
kv0-b1024/cores=16/nodes=1/splits=0     48.9 ± 2%    160.6 ± 0%   +228.38% (p=0.008 n=5+5)
kv0-b1024/cores=16/nodes=1/splits=100   101 ± 1%     80 ± 0%      -21.64%  (p=0.008 n=5+5)
kv0-b1024/cores=36/nodes=1/splits=0     37.5 ± 0%    208.1 ± 1%   +454.99% (p=0.016 n=4+5)
kv0-b1024/cores=36/nodes=1/splits=100   162 ± 0%     124 ± 0%     -23.22%  (p=0.008 n=5+5)
kv95-b16/cores=4/nodes=1/splits=0       5.93k ± 0%   6.20k ± 1%   +4.55%   (p=0.008 n=5+5)
kv95-b16/cores=4/nodes=1/splits=100     2.27k ± 1%   2.32k ± 1%   +2.28%   (p=0.008 n=5+5)
kv95-b16/cores=16/nodes=1/splits=0      5.15k ± 1%   18.79k ± 1%  +264.73% (p=0.008 n=5+5)
kv95-b16/cores=16/nodes=1/splits=100    8.31k ± 1%   8.57k ± 1%   +3.16%   (p=0.008 n=5+5)
kv95-b16/cores=36/nodes=1/splits=0      3.96k ± 0%   10.67k ± 1%  +169.81% (p=0.008 n=5+5)
kv95-b16/cores=36/nodes=1/splits=100    15.7k ± 2%   16.2k ± 4%   +2.75%   (p=0.151 n=5+5)
kv95-b128/cores=4/nodes=1/splits=0      1.12k ± 1%   1.27k ± 0%   +13.28%  (p=0.008 n=5+5)
kv95-b128/cores=4/nodes=1/splits=100    290 ± 1%     299 ± 1%     +3.02%   (p=0.008 n=5+5)
kv95-b128/cores=16/nodes=1/splits=0     1.06k ± 0%   3.31k ± 0%   +213.09% (p=0.008 n=5+5)
kv95-b128/cores=16/nodes=1/splits=100   662 ±91%     1095 ± 1%    +65.42%  (p=0.016 n=5+4)
kv95-b128/cores=36/nodes=1/splits=0     715 ± 2%     3586 ± 0%    +401.21% (p=0.008 n=5+5)
kv95-b128/cores=36/nodes=1/splits=100   1.15k ±90%   2.01k ± 2%   +74.79%  (p=0.016 n=5+4)
kv95-b1024/cores=4/nodes=1/splits=0     134 ± 1%     170 ± 1%     +26.59%  (p=0.008 n=5+5)
kv95-b1024/cores=4/nodes=1/splits=100   54.8 ± 3%    53.3 ± 3%    -2.84%   (p=0.056 n=5+5)
kv95-b1024/cores=16/nodes=1/splits=0    104 ± 3%     367 ± 1%     +252.37% (p=0.008 n=5+5)
kv95-b1024/cores=16/nodes=1/splits=100  210 ± 1%     214 ± 1%     +1.86%   (p=0.008 n=5+5)
kv95-b1024/cores=36/nodes=1/splits=0    76.5 ± 2%    383.9 ± 1%   +401.67% (p=0.008 n=5+5)
kv95-b1024/cores=36/nodes=1/splits=100  431 ± 1%     436 ± 1%     +1.17%   (p=0.016 n=5+5)

name                                    old p50(ms)  new p50(ms)  delta
kv0-b16/cores=4/nodes=1/splits=0        3.00 ± 0%    3.40 ± 0%    +13.33%  (p=0.016 n=5+4)
kv0-b16/cores=4/nodes=1/splits=100      15.2 ± 0%    14.7 ± 0%    -3.29%   (p=0.008 n=5+5)
kv0-b16/cores=16/nodes=1/splits=0       10.5 ± 0%    7.7 ± 2%     -26.48%  (p=0.008 n=5+5)
kv0-b16/cores=16/nodes=1/splits=100     17.8 ± 0%    16.8 ± 0%    -5.62%   (p=0.008 n=5+5)
kv0-b16/cores=36/nodes=1/splits=0       26.2 ± 0%    14.2 ± 0%    -45.80%  (p=0.008 n=5+5)
kv0-b16/cores=36/nodes=1/splits=100     29.0 ± 2%    28.3 ± 0%    -2.28%   (p=0.095 n=5+4)
kv0-b128/cores=4/nodes=1/splits=0       17.8 ± 0%    15.2 ± 0%    -14.61%  (p=0.000 n=5+4)
kv0-b128/cores=4/nodes=1/splits=100     79.7 ± 0%    79.7 ± 0%    ~        (all equal)
kv0-b128/cores=16/nodes=1/splits=0      65.0 ± 0%    32.5 ± 0%    -50.00%  (p=0.029 n=4+4)
kv0-b128/cores=16/nodes=1/splits=100    109 ± 0%     105 ± 0%     -3.85%   (p=0.008 n=5+5)
kv0-b128/cores=36/nodes=1/splits=0      168 ± 0%     50 ± 0%      -70.02%  (p=0.008 n=5+5)
kv0-b128/cores=36/nodes=1/splits=100    184 ± 0%     176 ± 0%     -4.50%   (p=0.008 n=5+5)
kv0-b1024/cores=4/nodes=1/splits=0      159 ± 0%     109 ± 0%     -31.56%  (p=0.000 n=5+4)
kv0-b1024/cores=4/nodes=1/splits=100    252 ± 0%     319 ± 0%     +26.66%  (p=0.008 n=5+5)
kv0-b1024/cores=16/nodes=1/splits=0     705 ± 0%     193 ± 0%     -72.62%  (p=0.000 n=5+4)
kv0-b1024/cores=16/nodes=1/splits=100   319 ± 0%     386 ± 0%     +21.05%  (p=0.008 n=5+5)
kv0-b1024/cores=36/nodes=1/splits=0     1.88k ± 0%   0.24k ± 0%   -87.05%  (p=0.008 n=5+5)
kv0-b1024/cores=36/nodes=1/splits=100   436 ± 0%     570 ± 0%     +30.77%  (p=0.008 n=5+5)
kv95-b16/cores=4/nodes=1/splits=0       1.20 ± 0%    1.20 ± 0%    ~        (all equal)
kv95-b16/cores=4/nodes=1/splits=100     2.60 ± 0%    2.60 ± 0%    ~        (all equal)
kv95-b16/cores=16/nodes=1/splits=0      6.30 ± 0%    1.40 ± 0%    -77.78%  (p=0.000 n=5+4)
kv95-b16/cores=16/nodes=1/splits=100    1.74 ± 3%    1.76 ± 3%    ~        (p=1.000 n=5+5)
kv95-b16/cores=36/nodes=1/splits=0      11.5 ± 0%    5.5 ± 0%     -52.17%  (p=0.000 n=5+4)
kv95-b16/cores=36/nodes=1/splits=100    2.42 ±20%    2.42 ±45%    ~        (p=0.579 n=5+5)
kv95-b128/cores=4/nodes=1/splits=0      6.60 ± 0%    6.00 ± 0%    -9.09%   (p=0.008 n=5+5)
kv95-b128/cores=4/nodes=1/splits=100    21.4 ± 3%    21.0 ± 0%    ~        (p=0.444 n=5+5)
kv95-b128/cores=16/nodes=1/splits=0     30.4 ± 0%    9.4 ± 0%     -69.08%  (p=0.008 n=5+5)
kv95-b128/cores=16/nodes=1/splits=100   38.2 ±76%    21.2 ± 4%    -44.31%  (p=0.063 n=5+4)
kv95-b128/cores=36/nodes=1/splits=0     88.1 ± 0%    16.8 ± 0%    -80.93%  (p=0.000 n=5+4)
kv95-b128/cores=36/nodes=1/splits=100   56.6 ±85%    29.6 ±15%    ~        (p=0.873 n=5+4)
kv95-b1024/cores=4/nodes=1/splits=0     52.4 ± 0%    44.0 ± 0%    -16.03%  (p=0.029 n=4+4)
kv95-b1024/cores=4/nodes=1/splits=100   132 ± 2%     143 ± 0%     +8.29%   (p=0.016 n=5+4)
kv95-b1024/cores=16/nodes=1/splits=0    325 ± 3%     80 ± 0%      -75.51%  (p=0.008 n=5+5)
kv95-b1024/cores=16/nodes=1/splits=100  151 ± 0%     151 ± 0%     ~        (all equal)
kv95-b1024/cores=36/nodes=1/splits=0    973 ± 0%     180 ± 3%     -81.55%  (p=0.008 n=5+5)
kv95-b1024/cores=36/nodes=1/splits=100  168 ± 0%     168 ± 0%     ~        (all equal)

name                                    old p99(ms)  new p99(ms)  delta
kv0-b16/cores=4/nodes=1/splits=0        8.40 ± 0%    10.30 ± 3%   +22.62%  (p=0.016 n=4+5)
kv0-b16/cores=4/nodes=1/splits=100      29.4 ± 0%    27.3 ± 0%    -7.14%   (p=0.000 n=5+4)
kv0-b16/cores=16/nodes=1/splits=0       16.3 ± 0%    15.5 ± 2%    -4.91%   (p=0.008 n=5+5)
kv0-b16/cores=16/nodes=1/splits=100     31.5 ± 0%    29.4 ± 0%    -6.67%   (p=0.000 n=5+4)
kv0-b16/cores=36/nodes=1/splits=0       37.7 ± 0%    28.7 ± 2%    -23.77%  (p=0.008 n=5+5)
kv0-b16/cores=36/nodes=1/splits=100     62.1 ± 2%    68.4 ±10%    +10.15%  (p=0.008 n=5+5)
kv0-b128/cores=4/nodes=1/splits=0       37.7 ± 0%    39.4 ± 6%    +4.46%   (p=0.167 n=5+5)
kv0-b128/cores=4/nodes=1/splits=100     143 ± 0%     151 ± 0%     +5.89%   (p=0.016 n=4+5)
kv0-b128/cores=16/nodes=1/splits=0      79.7 ± 0%    55.8 ± 2%    -30.04%  (p=0.008 n=5+5)
kv0-b128/cores=16/nodes=1/splits=100    198 ± 3%     188 ± 3%     -5.09%   (p=0.048 n=5+5)
kv0-b128/cores=36/nodes=1/splits=0      184 ± 0%     126 ± 3%     -31.82%  (p=0.008 n=5+5)
kv0-b128/cores=36/nodes=1/splits=100    319 ± 0%     336 ± 0%     +5.24%   (p=0.008 n=5+5)
kv0-b1024/cores=4/nodes=1/splits=0      322 ± 6%     253 ± 4%     -21.35%  (p=0.008 n=5+5)
kv0-b1024/cores=4/nodes=1/splits=100    470 ± 0%     772 ± 4%     +64.28%  (p=0.016 n=4+5)
kv0-b1024/cores=16/nodes=1/splits=0     1.41k ± 0%   0.56k ±11%   -60.00%  (p=0.000 n=4+5)
kv0-b1024/cores=16/nodes=1/splits=100   530 ± 2%     772 ± 0%     +45.57%  (p=0.008 n=5+5)
kv0-b1024/cores=36/nodes=1/splits=0     4.05k ± 7%   1.17k ± 3%   -71.19%  (p=0.008 n=5+5)
kv0-b1024/cores=36/nodes=1/splits=100   792 ±14%     1020 ± 2%    +28.81%  (p=0.008 n=5+5)
kv95-b16/cores=4/nodes=1/splits=0       3.90 ± 0%    3.22 ± 4%    -17.44%  (p=0.008 n=5+5)
kv95-b16/cores=4/nodes=1/splits=100     21.0 ± 0%    19.9 ± 0%    -5.24%   (p=0.079 n=4+5)
kv95-b16/cores=16/nodes=1/splits=0      15.2 ± 0%    7.1 ± 0%     -53.29%  (p=0.079 n=4+5)
kv95-b16/cores=16/nodes=1/splits=100    38.5 ± 3%    37.7 ± 0%    ~        (p=0.333 n=5+4)
kv95-b16/cores=36/nodes=1/splits=0      128 ± 2%     52 ± 0%      -59.16%  (p=0.000 n=5+4)
kv95-b16/cores=36/nodes=1/splits=100    41.1 ±13%    39.2 ±33%    ~        (p=0.984 n=5+5)
kv95-b128/cores=4/nodes=1/splits=0      17.8 ± 0%    14.7 ± 0%    -17.42%  (p=0.079 n=4+5)
kv95-b128/cores=4/nodes=1/splits=100    107 ± 2%     106 ± 5%     ~        (p=0.683 n=5+5)
kv95-b128/cores=16/nodes=1/splits=0     75.5 ± 0%    23.1 ± 0%    -69.40%  (p=0.008 n=5+5)
kv95-b128/cores=16/nodes=1/splits=100   107 ±34%     120 ± 2%     ~        (p=1.000 n=5+4)
kv95-b128/cores=36/nodes=1/splits=0     253 ± 4%     71 ± 0%      -71.86%  (p=0.016 n=5+4)
kv95-b128/cores=36/nodes=1/splits=100   166 ±19%     164 ±74%     ~        (p=0.310 n=5+5)
kv95-b1024/cores=4/nodes=1/splits=0     146 ± 3%     101 ± 0%     -31.01%  (p=0.000 n=5+4)
kv95-b1024/cores=4/nodes=1/splits=100   348 ± 4%     366 ± 6%     ~        (p=0.317 n=4+5)
kv95-b1024/cores=16/nodes=1/splits=0    624 ± 3%     221 ± 2%     -64.52%  (p=0.008 n=5+5)
kv95-b1024/cores=16/nodes=1/splits=100  325 ± 3%     319 ± 0%     ~        (p=0.444 n=5+5)
kv95-b1024/cores=36/nodes=1/splits=0    1.56k ± 5%   0.41k ± 2%   -73.71%  (p=0.008 n=5+5)
kv95-b1024/cores=36/nodes=1/splits=100  336 ± 0%     336 ± 0%     ~        (all equal)
```

Release note (performance improvement): Replace Replica latching mechanism with new optimized data structure that improves throughput, especially under heavy contention.
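
The suspicion voiced in the Batching Benchmarks discussion, that the covering optimization can make non-overlapping requests overlap, can be sketched in a few lines of Go. This is a minimal illustration and not the CommandQueue or spanlatch.Manager code: the `Span` type and the `Point`, `Overlaps`, `Cover`, and `AnyOverlap` helpers are hypothetical names introduced here, and keys are modeled as plain strings.

```go
package main

import "fmt"

// Span is a half-open key range [Start, End).
// Hypothetical type for illustration only.
type Span struct {
	Start, End string
}

// Point builds the span that contains exactly one key.
func Point(key string) Span {
	return Span{Start: key, End: key + "\x00"}
}

// Overlaps reports whether two half-open spans share at least one key.
func Overlaps(a, b Span) bool {
	return a.Start < b.End && b.Start < a.End
}

// Cover returns the smallest single span that contains every span in a
// non-empty batch, which is how a "covering" entry would be built.
func Cover(spans []Span) Span {
	c := spans[0]
	for _, s := range spans[1:] {
		if s.Start < c.Start {
			c.Start = s.Start
		}
		if s.End > c.End {
			c.End = s.End
		}
	}
	return c
}

// AnyOverlap reports whether any span in a overlaps any span in b.
func AnyOverlap(a, b []Span) bool {
	for _, x := range a {
		for _, y := range b {
			if Overlaps(x, y) {
				return true
			}
		}
	}
	return false
}

func main() {
	// Two batches of point writes whose keys interleave but never collide.
	batchA := []Span{Point("a"), Point("m")}
	batchB := []Span{Point("f"), Point("z")}

	fmt.Println("individual spans overlap:", AnyOverlap(batchA, batchB))
	fmt.Println("covering spans overlap:  ", Overlaps(Cover(batchA), Cover(batchB)))
}
```

In this toy model the two batches touch disjoint keys, yet their covering spans `[a, m\x00)` and `[f, z\x00)` intersect, so a structure that compares only covering spans would serialize requests that could have run concurrently.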