storage: replace CommandQueue with spanlatch.Manager
This commit replaces the CommandQueue with the spanlatch.Manager, which
was introduced in #31997. See that PR for an introduction to how the
structure differs from the CommandQueue and how it improves performance
on microbenchmarks.
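
For readers who have not seen #31997, the core idea of span latching can be sketched roughly as follows. This is a toy model under stated assumptions, not the spanlatch.Manager API: the `span`, `latchSet`, `tryAcquire`, and `point` names are hypothetical, the sketch scans a slice where the real structure uses an interval tree, it ignores timestamps, and it fails fast where a real manager would wait for the conflicting latch to be released.

```go
package main

import "fmt"

// span is a half-open key range [start, end). All names in this sketch are
// hypothetical and chosen for illustration only.
type span struct {
	start, end string
	write      bool
}

// overlaps reports whether the two key ranges share at least one key.
func (a span) overlaps(b span) bool {
	return a.start < b.end && b.start < a.end
}

// conflicts reports whether two spans overlap and at least one is a write.
// Two overlapping reads do not conflict.
func (a span) conflicts(b span) bool {
	return a.overlaps(b) && (a.write || b.write)
}

// latchSet is a toy stand-in for a latch manager.
type latchSet struct{ held []span }

// tryAcquire latches every span of the request if none of them conflicts with
// a latch that is already held. It reports whether the request was admitted.
func (l *latchSet) tryAcquire(req []span) bool {
	for _, s := range req {
		for _, h := range l.held {
			if s.conflicts(h) {
				return false
			}
		}
	}
	l.held = append(l.held, req...)
	return true
}

// point models a single key k as the span [k, k+"\x00").
func point(k string, write bool) span { return span{k, k + "\x00", write} }

func main() {
	var l latchSet
	fmt.Println(l.tryAcquire([]span{point("a", true)}))  // true
	fmt.Println(l.tryAcquire([]span{point("b", true)}))  // true: disjoint keys
	fmt.Println(l.tryAcquire([]span{{"a", "c", false}})) // false: scan overlaps held writes
}
```

The two point writes proceed concurrently because their spans are disjoint, while the read over `[a, c)` has to wait because it overlaps held write latches.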

This is mostly a mechanical change. One important detail is that it removes
the CommandQueue debug page. We found that the page was buggy (or straight
up broken) and it wasn't actively used by members of Core when debugging problems.
In its place, the commit revives the "slow requests" metric for latching, which
hasn't been hooked up in over a year.

_### Benchmarks

_#### Standard Benchmarks

These benchmarks are standard benchmarks that we commonly run. They were run with
varying node sizes, cluster sizes, and pre-split counts.

```
name                              old ops/sec  new ops/sec  delta
kv0/cores=4/nodes=1/splits=0       1.99k ± 2%   2.06k ± 1%   +3.22%  (p=0.008 n=5+5)
kv0/cores=4/nodes=1/splits=100     2.25k ± 1%   2.38k ± 1%   +6.01%  (p=0.008 n=5+5)
kv0/cores=4/nodes=3/splits=0       1.60k ± 0%   1.69k ± 2%   +5.53%  (p=0.008 n=5+5)
kv0/cores=4/nodes=3/splits=100     3.52k ± 6%   3.65k ± 9%     ~     (p=0.421 n=5+5)
kv0/cores=16/nodes=1/splits=0      19.9k ± 1%   21.8k ± 1%   +9.34%  (p=0.008 n=5+5)
kv0/cores=16/nodes=1/splits=100    24.4k ± 1%   26.1k ± 1%   +7.17%  (p=0.008 n=5+5)
kv0/cores=16/nodes=3/splits=0      14.9k ± 1%   16.1k ± 1%   +8.03%  (p=0.008 n=5+5)
kv0/cores=16/nodes=3/splits=100    20.6k ± 1%   22.8k ± 1%  +10.79%  (p=0.008 n=5+5)
kv0/cores=36/nodes=1/splits=0      31.2k ± 2%   35.3k ± 1%  +13.28%  (p=0.008 n=5+5)
kv0/cores=36/nodes=1/splits=100    45.7k ± 1%   51.1k ± 1%  +11.80%  (p=0.008 n=5+5)
kv0/cores=36/nodes=3/splits=0      23.7k ± 2%   27.1k ± 2%  +14.39%  (p=0.008 n=5+5)
kv0/cores=36/nodes=3/splits=100    34.9k ± 2%   45.1k ± 1%  +29.44%  (p=0.008 n=5+5)
kv95/cores=4/nodes=1/splits=0      12.7k ± 2%   12.9k ± 2%   +1.39%  (p=0.151 n=5+5)
kv95/cores=4/nodes=1/splits=100    12.8k ± 2%   13.1k ± 2%   +2.10%  (p=0.032 n=5+5)
kv95/cores=4/nodes=3/splits=0      10.6k ± 1%   10.8k ± 1%   +1.58%  (p=0.056 n=5+5)
kv95/cores=4/nodes=3/splits=100    12.3k ± 7%   12.6k ± 8%   +2.61%  (p=0.095 n=5+5)
kv95/cores=16/nodes=1/splits=0     50.9k ± 1%   52.2k ± 1%   +2.37%  (p=0.008 n=5+5)
kv95/cores=16/nodes=1/splits=100   52.2k ± 1%   53.0k ± 1%   +1.49%  (p=0.008 n=5+5)
kv95/cores=16/nodes=3/splits=0     46.2k ± 1%   46.8k ± 1%   +1.32%  (p=0.032 n=5+5)
kv95/cores=16/nodes=3/splits=100   51.0k ± 1%   53.2k ± 1%   +4.25%  (p=0.008 n=5+5)
kv95/cores=36/nodes=1/splits=0     79.8k ± 2%  101.6k ± 1%  +27.31%  (p=0.008 n=5+5)
kv95/cores=36/nodes=1/splits=100    104k ± 1%    107k ± 1%   +2.60%  (p=0.008 n=5+5)
kv95/cores=36/nodes=3/splits=0     85.8k ± 1%   91.8k ± 1%   +7.08%  (p=0.008 n=5+5)
kv95/cores=36/nodes=3/splits=100    106k ± 1%    112k ± 1%   +5.51%  (p=0.008 n=5+5)

name                              old p50(ms)  new p50(ms)  delta
kv0/cores=4/nodes=1/splits=0        3.52 ± 5%    3.40 ± 0%   -3.41%  (p=0.016 n=5+4)
kv0/cores=4/nodes=1/splits=100      3.30 ± 0%    3.00 ± 0%   -9.09%  (p=0.008 n=5+5)
kv0/cores=4/nodes=3/splits=0        4.70 ± 0%    4.14 ± 9%  -11.91%  (p=0.008 n=5+5)
kv0/cores=4/nodes=3/splits=100      1.50 ± 0%    1.48 ± 8%     ~     (p=0.968 n=4+5)
kv0/cores=16/nodes=1/splits=0       1.40 ± 0%    1.40 ± 0%     ~     (all equal)
kv0/cores=16/nodes=1/splits=100     1.20 ± 0%    1.20 ± 0%     ~     (all equal)
kv0/cores=16/nodes=3/splits=0       2.00 ± 0%    1.90 ± 0%   -5.00%  (p=0.000 n=5+4)
kv0/cores=16/nodes=3/splits=100     1.40 ± 0%    1.40 ± 0%     ~     (all equal)
kv0/cores=36/nodes=1/splits=0       1.76 ± 3%    1.60 ± 0%   -9.09%  (p=0.008 n=5+5)
kv0/cores=36/nodes=1/splits=100     1.40 ± 0%    1.30 ± 0%   -7.14%  (p=0.008 n=5+5)
kv0/cores=36/nodes=3/splits=0       2.56 ± 2%    2.40 ± 0%   -6.25%  (p=0.000 n=5+4)
kv0/cores=36/nodes=3/splits=100     1.70 ± 0%    1.40 ± 0%  -17.65%  (p=0.008 n=5+5)
kv95/cores=4/nodes=1/splits=0       0.50 ± 0%    0.50 ± 0%     ~     (all equal)
kv95/cores=4/nodes=1/splits=100     0.50 ± 0%    0.50 ± 0%     ~     (all equal)
kv95/cores=4/nodes=3/splits=0       0.60 ± 0%    0.60 ± 0%     ~     (all equal)
kv95/cores=4/nodes=3/splits=100     0.60 ± 0%    0.60 ± 0%     ~     (all equal)
kv95/cores=16/nodes=1/splits=0      0.50 ± 0%    0.50 ± 0%     ~     (all equal)
kv95/cores=16/nodes=1/splits=100    0.50 ± 0%    0.50 ± 0%     ~     (all equal)
kv95/cores=16/nodes=3/splits=0      0.70 ± 0%    0.64 ± 9%   -8.57%  (p=0.167 n=5+5)
kv95/cores=16/nodes=3/splits=100    0.60 ± 0%    0.60 ± 0%     ~     (all equal)
kv95/cores=36/nodes=1/splits=0      0.50 ± 0%    0.50 ± 0%     ~     (all equal)
kv95/cores=36/nodes=1/splits=100    0.50 ± 0%    0.50 ± 0%     ~     (all equal)
kv95/cores=36/nodes=3/splits=0      0.66 ± 9%    0.60 ± 0%   -9.09%  (p=0.167 n=5+5)
kv95/cores=36/nodes=3/splits=100    0.60 ± 0%    0.60 ± 0%     ~     (all equal)

name                              old p99(ms)  new p99(ms)  delta
kv0/cores=4/nodes=1/splits=0        11.0 ± 0%    10.5 ± 0%   -4.55%  (p=0.000 n=5+4)
kv0/cores=4/nodes=1/splits=100      7.90 ± 0%    7.60 ± 0%   -3.80%  (p=0.000 n=5+4)
kv0/cores=4/nodes=3/splits=0        15.7 ± 0%    15.2 ± 0%   -3.18%  (p=0.008 n=5+5)
kv0/cores=4/nodes=3/splits=100      8.90 ± 0%    8.12 ± 3%   -8.76%  (p=0.016 n=4+5)
kv0/cores=16/nodes=1/splits=0       3.46 ± 2%    3.00 ± 0%  -13.29%  (p=0.000 n=5+4)
kv0/cores=16/nodes=1/splits=100     4.50 ± 0%    3.36 ± 2%  -25.33%  (p=0.008 n=5+5)
kv0/cores=16/nodes=3/splits=0       4.50 ± 0%    3.90 ± 0%  -13.33%  (p=0.008 n=5+5)
kv0/cores=16/nodes=3/splits=100     5.80 ± 0%    4.10 ± 0%  -29.31%  (p=0.029 n=4+4)
kv0/cores=36/nodes=1/splits=0       6.80 ± 0%    5.20 ± 0%  -23.53%  (p=0.008 n=5+5)
kv0/cores=36/nodes=1/splits=100     5.80 ± 0%    4.32 ± 4%  -25.52%  (p=0.008 n=5+5)
kv0/cores=36/nodes=3/splits=0       7.72 ± 2%    6.30 ± 0%  -18.39%  (p=0.000 n=5+4)
kv0/cores=36/nodes=3/splits=100     7.98 ± 2%    5.20 ± 0%  -34.84%  (p=0.000 n=5+4)
kv95/cores=4/nodes=1/splits=0       5.38 ± 3%    5.20 ± 0%   -3.35%  (p=0.167 n=5+5)
kv95/cores=4/nodes=1/splits=100     5.00 ± 0%    5.00 ± 0%     ~     (all equal)
kv95/cores=4/nodes=3/splits=0       5.68 ± 3%    5.50 ± 0%   -3.17%  (p=0.095 n=5+4)
kv95/cores=4/nodes=3/splits=100     3.60 ±31%    2.93 ± 3%  -18.75%  (p=0.016 n=5+4)
kv95/cores=16/nodes=1/splits=0      4.10 ± 0%    4.10 ± 0%     ~     (all equal)
kv95/cores=16/nodes=1/splits=100    4.50 ± 0%    4.10 ± 0%   -8.89%  (p=0.000 n=5+4)
kv95/cores=16/nodes=3/splits=0      2.60 ± 0%    2.60 ± 0%     ~     (all equal)
kv95/cores=16/nodes=3/splits=100    2.50 ± 0%    1.90 ± 5%  -24.00%  (p=0.008 n=5+5)
kv95/cores=36/nodes=1/splits=0      6.60 ± 0%    6.00 ± 0%   -9.09%  (p=0.029 n=4+4)
kv95/cores=36/nodes=1/splits=100    5.50 ± 0%    5.12 ± 2%   -6.91%  (p=0.008 n=5+5)
kv95/cores=36/nodes=3/splits=0      4.18 ± 2%    4.02 ± 3%   -3.71%  (p=0.000 n=4+5)
kv95/cores=36/nodes=3/splits=100    3.80 ± 0%    2.80 ± 0%  -26.32%  (p=0.008 n=5+5)
```

_#### Large-machine Benchmarks

These benchmarks are standard benchmarks run on a single-node cluster with 72 vCPUs.

```
name                              old ops/sec  new ops/sec  delta
kv0/cores=72/nodes=1/splits=0      31.0k ± 4%   36.4k ± 1%  +17.57%  (p=0.008 n=5+5)
kv0/cores=72/nodes=1/splits=100    44.0k ± 0%   49.0k ± 1%  +11.41%  (p=0.008 n=5+5)
kv95/cores=72/nodes=1/splits=0     52.7k ±18%   72.6k ±26%  +37.70%  (p=0.016 n=5+5)
kv95/cores=72/nodes=1/splits=100   66.8k ±17%   68.5k ± 5%     ~     (p=0.286 n=5+4)

name                              old p50(ms)  new p50(ms)  delta
kv0/cores=72/nodes=1/splits=0       2.30 ±13%    2.52 ± 5%     ~     (p=0.214 n=5+5)
kv0/cores=72/nodes=1/splits=100     3.00 ± 0%    2.90 ± 0%   -3.33%  (p=0.008 n=5+5)
kv95/cores=72/nodes=1/splits=0      0.46 ±13%    0.50 ± 0%     ~     (p=0.444 n=5+5)
kv95/cores=72/nodes=1/splits=100    0.44 ±14%    0.50 ± 0%  +13.64%  (p=0.167 n=5+5)

name                              old p99(ms)  new p99(ms)  delta
kv0/cores=72/nodes=1/splits=0       18.9 ± 6%    13.3 ± 5%  -29.56%  (p=0.008 n=5+5)
kv0/cores=72/nodes=1/splits=100     13.4 ± 2%    11.0 ± 0%  -17.91%  (p=0.008 n=5+5)
kv95/cores=72/nodes=1/splits=0      34.4 ±34%    23.5 ±24%  -31.74%  (p=0.048 n=5+5)
kv95/cores=72/nodes=1/splits=100    21.0 ± 0%    19.1 ± 4%   -8.81%  (p=0.029 n=4+4)
```

_#### Motivating Benchmarks

These are benchmarks that used to generate a lot of contention in the CommandQueue.
They have small cycle-lengths, indicated by the `c` specifier. The last one also includes
20% scan operations, which increases contention between non-overlapping point operations.

```
name                                    old ops/sec  new ops/sec  delta
kv95-c5/cores=16/nodes=1/splits=0        45.1k ± 1%   47.2k ± 4%   +4.59%  (p=0.008 n=5+5)
kv95-c5/cores=36/nodes=1/splits=0        44.6k ± 1%   76.3k ± 1%  +71.05%  (p=0.008 n=5+5)
kv50-c128/cores=16/nodes=1/splits=0      27.2k ± 2%   29.4k ± 1%   +8.12%  (p=0.008 n=5+5)
kv50-c128/cores=36/nodes=1/splits=0      42.6k ± 2%   50.0k ± 1%  +17.39%  (p=0.008 n=5+5)
kv70-20-c128/cores=16/nodes=1/splits=0   28.7k ± 1%   29.8k ± 3%   +3.87%  (p=0.008 n=5+5)
kv70-20-c128/cores=36/nodes=1/splits=0   41.9k ± 4%   52.8k ± 2%  +25.97%  (p=0.008 n=5+5)

name                                    old p50(ms)  new p50(ms)  delta
kv95-c5/cores=16/nodes=1/splits=0         0.60 ± 0%    0.60 ± 0%     ~     (all equal)
kv95-c5/cores=36/nodes=1/splits=0         0.90 ± 0%    0.80 ± 0%  -11.11%  (p=0.008 n=5+5)
kv50-c128/cores=16/nodes=1/splits=0       1.10 ± 0%    1.06 ± 6%     ~     (p=0.444 n=5+5)
kv50-c128/cores=36/nodes=1/splits=0       1.26 ± 5%    1.30 ± 0%     ~     (p=0.444 n=5+5)
kv70-20-c128/cores=16/nodes=1/splits=0    0.66 ± 9%    0.60 ± 0%   -9.09%  (p=0.167 n=5+5)
kv70-20-c128/cores=36/nodes=1/splits=0    0.70 ± 0%    0.50 ± 0%  -28.57%  (p=0.008 n=5+5)

name                                    old p99(ms)  new p99(ms)  delta
kv95-c5/cores=16/nodes=1/splits=0         2.40 ± 0%    2.10 ± 0%  -12.50%  (p=0.000 n=5+4)
kv95-c5/cores=36/nodes=1/splits=0         5.80 ± 0%    3.30 ± 0%  -43.10%  (p=0.000 n=5+4)
kv50-c128/cores=16/nodes=1/splits=0       3.50 ± 0%    3.00 ± 0%  -14.29%  (p=0.008 n=5+5)
kv50-c128/cores=36/nodes=1/splits=0       6.80 ± 0%    4.70 ± 0%  -30.88%  (p=0.079 n=4+5)
kv70-20-c128/cores=16/nodes=1/splits=0    5.00 ± 0%    4.70 ± 0%   -6.00%  (p=0.029 n=4+4)
kv70-20-c128/cores=36/nodes=1/splits=0    11.0 ± 0%     6.8 ± 0%  -38.18%  (p=0.008 n=5+5)
```

_#### Batching Benchmarks

One optimization left out of the new spanlatch.Manager was the "covering" optimization,
where commands were initially added to the interval tree as a single spanning interval
and only expanded later. I ran a series of benchmarks to verify that this optimization
was not needed. My hypothesis was that the order-of-magnitude increase in the speed of the
interval tree would make the optimization unnecessary.

It turns out that removing the optimization hurt a few benchmarks to a small
degree but sped up others tremendously (some benchmarks improved by over 400%).
I suspect that the covering optimization could actually hurt in cases where it
causes non-overlapping requests to overlap. It is interesting how quickly a few
of these benchmarks oscillate from small losses to big wins. It makes me think
that there's some non-linear behavior with the old CommandQueue that would cause
its performance to quickly degrade once it became a contention bottleneck.
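
The suspicion about false overlaps can be illustrated with a small sketch. It is not the CommandQueue's code: `span`, `point`, `cover`, and `anyOverlap` are hypothetical helpers that only model the idea of collapsing a batch into one spanning interval.

```go
package main

import "fmt"

// span is a half-open key range [start, end). Hypothetical type for illustration.
type span struct{ start, end string }

func (a span) overlaps(b span) bool { return a.start < b.end && b.start < a.end }

// point models a single key k as the span [k, k+"\x00").
func point(k string) span { return span{k, k + "\x00"} }

// cover returns the smallest single span that contains every span in a
// non-empty batch. This models the "covering" interval.
func cover(batch []span) span {
	c := batch[0]
	for _, s := range batch[1:] {
		if s.start < c.start {
			c.start = s.start
		}
		if s.end > c.end {
			c.end = s.end
		}
	}
	return c
}

// anyOverlap reports whether any span in a overlaps any span in b.
func anyOverlap(a, b []span) bool {
	for _, x := range a {
		for _, y := range b {
			if x.overlaps(y) {
				return true
			}
		}
	}
	return false
}

func main() {
	// Two batches of point writes that touch entirely different keys.
	a := []span{point("a"), point("m"), point("y")}
	b := []span{point("c"), point("p")}

	fmt.Println("per-span overlap:", anyOverlap(a, b))            // false
	fmt.Println("covering overlap:", cover(a).overlaps(cover(b))) // true
}
```

The batches share no keys, yet their covering spans `[a, y\x00)` and `[c, p\x00)` intersect, so a structure that compares covering spans first would treat them as dependent. Larger batches widen the covering span, which is consistent with the biggest wins showing up in the `b1024` rows with `splits=0`.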

```
name                                    old ops/sec  new ops/sec  delta
kv0-b16/cores=4/nodes=1/splits=0         2.41k ± 0%   2.06k ± 3%   -14.75%  (p=0.008 n=5+5)
kv0-b16/cores=4/nodes=1/splits=100         514 ± 0%     534 ± 1%    +3.88%  (p=0.008 n=5+5)
kv0-b16/cores=16/nodes=1/splits=0        2.95k ± 0%   4.35k ± 0%   +47.74%  (p=0.008 n=5+5)
kv0-b16/cores=16/nodes=1/splits=100      1.80k ± 1%   1.88k ± 1%    +4.46%  (p=0.008 n=5+5)
kv0-b16/cores=36/nodes=1/splits=0        2.74k ± 0%   4.92k ± 1%   +79.55%  (p=0.008 n=5+5)
kv0-b16/cores=36/nodes=1/splits=100      2.39k ± 1%   2.45k ± 1%    +2.41%  (p=0.008 n=5+5)
kv0-b128/cores=4/nodes=1/splits=0          422 ± 0%     518 ± 1%   +22.60%  (p=0.008 n=5+5)
kv0-b128/cores=4/nodes=1/splits=100       98.4 ± 1%    98.8 ± 1%      ~     (p=0.810 n=5+5)
kv0-b128/cores=16/nodes=1/splits=0         532 ± 0%    1059 ± 0%   +99.16%  (p=0.008 n=5+5)
kv0-b128/cores=16/nodes=1/splits=100       291 ± 1%     307 ± 1%    +5.18%  (p=0.008 n=5+5)
kv0-b128/cores=36/nodes=1/splits=0         483 ± 0%    1288 ± 1%  +166.37%  (p=0.008 n=5+5)
kv0-b128/cores=36/nodes=1/splits=100       394 ± 1%     408 ± 1%    +3.51%  (p=0.008 n=5+5)
kv0-b1024/cores=4/nodes=1/splits=0        49.7 ± 1%    72.8 ± 1%   +46.52%  (p=0.008 n=5+5)
kv0-b1024/cores=4/nodes=1/splits=100      30.8 ± 0%    23.4 ± 0%   -24.03%  (p=0.008 n=5+5)
kv0-b1024/cores=16/nodes=1/splits=0       48.9 ± 2%   160.6 ± 0%  +228.38%  (p=0.008 n=5+5)
kv0-b1024/cores=16/nodes=1/splits=100      101 ± 1%      80 ± 0%   -21.64%  (p=0.008 n=5+5)
kv0-b1024/cores=36/nodes=1/splits=0       37.5 ± 0%   208.1 ± 1%  +454.99%  (p=0.016 n=4+5)
kv0-b1024/cores=36/nodes=1/splits=100      162 ± 0%     124 ± 0%   -23.22%  (p=0.008 n=5+5)
kv95-b16/cores=4/nodes=1/splits=0        5.93k ± 0%   6.20k ± 1%    +4.55%  (p=0.008 n=5+5)
kv95-b16/cores=4/nodes=1/splits=100      2.27k ± 1%   2.32k ± 1%    +2.28%  (p=0.008 n=5+5)
kv95-b16/cores=16/nodes=1/splits=0       5.15k ± 1%  18.79k ± 1%  +264.73%  (p=0.008 n=5+5)
kv95-b16/cores=16/nodes=1/splits=100     8.31k ± 1%   8.57k ± 1%    +3.16%  (p=0.008 n=5+5)
kv95-b16/cores=36/nodes=1/splits=0       3.96k ± 0%  10.67k ± 1%  +169.81%  (p=0.008 n=5+5)
kv95-b16/cores=36/nodes=1/splits=100     15.7k ± 2%   16.2k ± 4%    +2.75%  (p=0.151 n=5+5)
kv95-b128/cores=4/nodes=1/splits=0       1.12k ± 1%   1.27k ± 0%   +13.28%  (p=0.008 n=5+5)
kv95-b128/cores=4/nodes=1/splits=100       290 ± 1%     299 ± 1%    +3.02%  (p=0.008 n=5+5)
kv95-b128/cores=16/nodes=1/splits=0      1.06k ± 0%   3.31k ± 0%  +213.09%  (p=0.008 n=5+5)
kv95-b128/cores=16/nodes=1/splits=100      662 ±91%    1095 ± 1%   +65.42%  (p=0.016 n=5+4)
kv95-b128/cores=36/nodes=1/splits=0        715 ± 2%    3586 ± 0%  +401.21%  (p=0.008 n=5+5)
kv95-b128/cores=36/nodes=1/splits=100    1.15k ±90%   2.01k ± 2%   +74.79%  (p=0.016 n=5+4)
kv95-b1024/cores=4/nodes=1/splits=0        134 ± 1%     170 ± 1%   +26.59%  (p=0.008 n=5+5)
kv95-b1024/cores=4/nodes=1/splits=100     54.8 ± 3%    53.3 ± 3%    -2.84%  (p=0.056 n=5+5)
kv95-b1024/cores=16/nodes=1/splits=0       104 ± 3%     367 ± 1%  +252.37%  (p=0.008 n=5+5)
kv95-b1024/cores=16/nodes=1/splits=100     210 ± 1%     214 ± 1%    +1.86%  (p=0.008 n=5+5)
kv95-b1024/cores=36/nodes=1/splits=0      76.5 ± 2%   383.9 ± 1%  +401.67%  (p=0.008 n=5+5)
kv95-b1024/cores=36/nodes=1/splits=100     431 ± 1%     436 ± 1%    +1.17%  (p=0.016 n=5+5)

name                                    old p50(ms)  new p50(ms)  delta
kv0-b16/cores=4/nodes=1/splits=0          3.00 ± 0%    3.40 ± 0%   +13.33%  (p=0.016 n=5+4)
kv0-b16/cores=4/nodes=1/splits=100        15.2 ± 0%    14.7 ± 0%    -3.29%  (p=0.008 n=5+5)
kv0-b16/cores=16/nodes=1/splits=0         10.5 ± 0%     7.7 ± 2%   -26.48%  (p=0.008 n=5+5)
kv0-b16/cores=16/nodes=1/splits=100       17.8 ± 0%    16.8 ± 0%    -5.62%  (p=0.008 n=5+5)
kv0-b16/cores=36/nodes=1/splits=0         26.2 ± 0%    14.2 ± 0%   -45.80%  (p=0.008 n=5+5)
kv0-b16/cores=36/nodes=1/splits=100       29.0 ± 2%    28.3 ± 0%    -2.28%  (p=0.095 n=5+4)
kv0-b128/cores=4/nodes=1/splits=0         17.8 ± 0%    15.2 ± 0%   -14.61%  (p=0.000 n=5+4)
kv0-b128/cores=4/nodes=1/splits=100       79.7 ± 0%    79.7 ± 0%      ~     (all equal)
kv0-b128/cores=16/nodes=1/splits=0        65.0 ± 0%    32.5 ± 0%   -50.00%  (p=0.029 n=4+4)
kv0-b128/cores=16/nodes=1/splits=100       109 ± 0%     105 ± 0%    -3.85%  (p=0.008 n=5+5)
kv0-b128/cores=36/nodes=1/splits=0         168 ± 0%      50 ± 0%   -70.02%  (p=0.008 n=5+5)
kv0-b128/cores=36/nodes=1/splits=100       184 ± 0%     176 ± 0%    -4.50%  (p=0.008 n=5+5)
kv0-b1024/cores=4/nodes=1/splits=0         159 ± 0%     109 ± 0%   -31.56%  (p=0.000 n=5+4)
kv0-b1024/cores=4/nodes=1/splits=100       252 ± 0%     319 ± 0%   +26.66%  (p=0.008 n=5+5)
kv0-b1024/cores=16/nodes=1/splits=0        705 ± 0%     193 ± 0%   -72.62%  (p=0.000 n=5+4)
kv0-b1024/cores=16/nodes=1/splits=100      319 ± 0%     386 ± 0%   +21.05%  (p=0.008 n=5+5)
kv0-b1024/cores=36/nodes=1/splits=0      1.88k ± 0%   0.24k ± 0%   -87.05%  (p=0.008 n=5+5)
kv0-b1024/cores=36/nodes=1/splits=100      436 ± 0%     570 ± 0%   +30.77%  (p=0.008 n=5+5)
kv95-b16/cores=4/nodes=1/splits=0         1.20 ± 0%    1.20 ± 0%      ~     (all equal)
kv95-b16/cores=4/nodes=1/splits=100       2.60 ± 0%    2.60 ± 0%      ~     (all equal)
kv95-b16/cores=16/nodes=1/splits=0        6.30 ± 0%    1.40 ± 0%   -77.78%  (p=0.000 n=5+4)
kv95-b16/cores=16/nodes=1/splits=100      1.74 ± 3%    1.76 ± 3%      ~     (p=1.000 n=5+5)
kv95-b16/cores=36/nodes=1/splits=0        11.5 ± 0%     5.5 ± 0%   -52.17%  (p=0.000 n=5+4)
kv95-b16/cores=36/nodes=1/splits=100      2.42 ±20%    2.42 ±45%      ~     (p=0.579 n=5+5)
kv95-b128/cores=4/nodes=1/splits=0        6.60 ± 0%    6.00 ± 0%    -9.09%  (p=0.008 n=5+5)
kv95-b128/cores=4/nodes=1/splits=100      21.4 ± 3%    21.0 ± 0%      ~     (p=0.444 n=5+5)
kv95-b128/cores=16/nodes=1/splits=0       30.4 ± 0%     9.4 ± 0%   -69.08%  (p=0.008 n=5+5)
kv95-b128/cores=16/nodes=1/splits=100     38.2 ±76%    21.2 ± 4%   -44.31%  (p=0.063 n=5+4)
kv95-b128/cores=36/nodes=1/splits=0       88.1 ± 0%    16.8 ± 0%   -80.93%  (p=0.000 n=5+4)
kv95-b128/cores=36/nodes=1/splits=100     56.6 ±85%    29.6 ±15%      ~     (p=0.873 n=5+4)
kv95-b1024/cores=4/nodes=1/splits=0       52.4 ± 0%    44.0 ± 0%   -16.03%  (p=0.029 n=4+4)
kv95-b1024/cores=4/nodes=1/splits=100      132 ± 2%     143 ± 0%    +8.29%  (p=0.016 n=5+4)
kv95-b1024/cores=16/nodes=1/splits=0       325 ± 3%      80 ± 0%   -75.51%  (p=0.008 n=5+5)
kv95-b1024/cores=16/nodes=1/splits=100     151 ± 0%     151 ± 0%      ~     (all equal)
kv95-b1024/cores=36/nodes=1/splits=0       973 ± 0%     180 ± 3%   -81.55%  (p=0.008 n=5+5)
kv95-b1024/cores=36/nodes=1/splits=100     168 ± 0%     168 ± 0%      ~     (all equal)

name                                    old p99(ms)  new p99(ms)  delta
kv0-b16/cores=4/nodes=1/splits=0          8.40 ± 0%   10.30 ± 3%   +22.62%  (p=0.016 n=4+5)
kv0-b16/cores=4/nodes=1/splits=100        29.4 ± 0%    27.3 ± 0%    -7.14%  (p=0.000 n=5+4)
kv0-b16/cores=16/nodes=1/splits=0         16.3 ± 0%    15.5 ± 2%    -4.91%  (p=0.008 n=5+5)
kv0-b16/cores=16/nodes=1/splits=100       31.5 ± 0%    29.4 ± 0%    -6.67%  (p=0.000 n=5+4)
kv0-b16/cores=36/nodes=1/splits=0         37.7 ± 0%    28.7 ± 2%   -23.77%  (p=0.008 n=5+5)
kv0-b16/cores=36/nodes=1/splits=100       62.1 ± 2%    68.4 ±10%   +10.15%  (p=0.008 n=5+5)
kv0-b128/cores=4/nodes=1/splits=0         37.7 ± 0%    39.4 ± 6%    +4.46%  (p=0.167 n=5+5)
kv0-b128/cores=4/nodes=1/splits=100        143 ± 0%     151 ± 0%    +5.89%  (p=0.016 n=4+5)
kv0-b128/cores=16/nodes=1/splits=0        79.7 ± 0%    55.8 ± 2%   -30.04%  (p=0.008 n=5+5)
kv0-b128/cores=16/nodes=1/splits=100       198 ± 3%     188 ± 3%    -5.09%  (p=0.048 n=5+5)
kv0-b128/cores=36/nodes=1/splits=0         184 ± 0%     126 ± 3%   -31.82%  (p=0.008 n=5+5)
kv0-b128/cores=36/nodes=1/splits=100       319 ± 0%     336 ± 0%    +5.24%  (p=0.008 n=5+5)
kv0-b1024/cores=4/nodes=1/splits=0         322 ± 6%     253 ± 4%   -21.35%  (p=0.008 n=5+5)
kv0-b1024/cores=4/nodes=1/splits=100       470 ± 0%     772 ± 4%   +64.28%  (p=0.016 n=4+5)
kv0-b1024/cores=16/nodes=1/splits=0      1.41k ± 0%   0.56k ±11%   -60.00%  (p=0.000 n=4+5)
kv0-b1024/cores=16/nodes=1/splits=100      530 ± 2%     772 ± 0%   +45.57%  (p=0.008 n=5+5)
kv0-b1024/cores=36/nodes=1/splits=0      4.05k ± 7%   1.17k ± 3%   -71.19%  (p=0.008 n=5+5)
kv0-b1024/cores=36/nodes=1/splits=100      792 ±14%    1020 ± 2%   +28.81%  (p=0.008 n=5+5)
kv95-b16/cores=4/nodes=1/splits=0         3.90 ± 0%    3.22 ± 4%   -17.44%  (p=0.008 n=5+5)
kv95-b16/cores=4/nodes=1/splits=100       21.0 ± 0%    19.9 ± 0%    -5.24%  (p=0.079 n=4+5)
kv95-b16/cores=16/nodes=1/splits=0        15.2 ± 0%     7.1 ± 0%   -53.29%  (p=0.079 n=4+5)
kv95-b16/cores=16/nodes=1/splits=100      38.5 ± 3%    37.7 ± 0%      ~     (p=0.333 n=5+4)
kv95-b16/cores=36/nodes=1/splits=0         128 ± 2%      52 ± 0%   -59.16%  (p=0.000 n=5+4)
kv95-b16/cores=36/nodes=1/splits=100      41.1 ±13%    39.2 ±33%      ~     (p=0.984 n=5+5)
kv95-b128/cores=4/nodes=1/splits=0        17.8 ± 0%    14.7 ± 0%   -17.42%  (p=0.079 n=4+5)
kv95-b128/cores=4/nodes=1/splits=100       107 ± 2%     106 ± 5%      ~     (p=0.683 n=5+5)
kv95-b128/cores=16/nodes=1/splits=0       75.5 ± 0%    23.1 ± 0%   -69.40%  (p=0.008 n=5+5)
kv95-b128/cores=16/nodes=1/splits=100      107 ±34%     120 ± 2%      ~     (p=1.000 n=5+4)
kv95-b128/cores=36/nodes=1/splits=0        253 ± 4%      71 ± 0%   -71.86%  (p=0.016 n=5+4)
kv95-b128/cores=36/nodes=1/splits=100      166 ±19%     164 ±74%      ~     (p=0.310 n=5+5)
kv95-b1024/cores=4/nodes=1/splits=0        146 ± 3%     101 ± 0%   -31.01%  (p=0.000 n=5+4)
kv95-b1024/cores=4/nodes=1/splits=100      348 ± 4%     366 ± 6%      ~     (p=0.317 n=4+5)
kv95-b1024/cores=16/nodes=1/splits=0       624 ± 3%     221 ± 2%   -64.52%  (p=0.008 n=5+5)
kv95-b1024/cores=16/nodes=1/splits=100     325 ± 3%     319 ± 0%      ~     (p=0.444 n=5+5)
kv95-b1024/cores=36/nodes=1/splits=0     1.56k ± 5%   0.41k ± 2%   -73.71%  (p=0.008 n=5+5)
kv95-b1024/cores=36/nodes=1/splits=100     336 ± 0%     336 ± 0%      ~     (all equal)
```

Release note (performance improvement): Replace Replica latching mechanism
with new optimized data structure that improves throughput, especially
under heavy contention.
nvanbenschoten committed Dec 8, 2018
1 parent f725fe4 commit eb38f20
Showing 31 changed files with 729 additions and 5,464 deletions.
3 changes: 0 additions & 3 deletions pkg/server/serverpb/admin.pb.go

1,196 changes: 362 additions & 834 deletions pkg/server/serverpb/status.pb.go

60 changes: 0 additions & 60 deletions pkg/server/serverpb/status.pb.gw.go

23 changes: 2 additions & 21 deletions pkg/server/serverpb/status.proto
@@ -164,13 +164,6 @@ message PrettySpan {
   string end_key = 2;
 }
 
-message CommandQueueMetrics {
-  int64 write_commands = 1;
-  int64 read_commands = 2;
-  int64 max_overlaps_seen = 3;
-  int32 tree_size = 4;
-}
-
 message RangeInfo {
   PrettySpan span = 1 [ (gogoproto.nullable) = false ];
   RaftState raft_state = 2 [ (gogoproto.nullable) = false ];
@@ -189,8 +182,8 @@ message RangeInfo {
   repeated roachpb.Lease lease_history = 8 [ (gogoproto.nullable) = false ];
   RangeProblems problems = 9 [ (gogoproto.nullable) = false ];
   RangeStatistics stats = 10 [ (gogoproto.nullable) = false ];
-  CommandQueueMetrics cmd_q_local = 11 [ (gogoproto.nullable) = false ];
-  CommandQueueMetrics cmd_q_global = 12 [ (gogoproto.nullable) = false ];
+  storage.storagepb.LatchManagerInfo latches_local = 11 [ (gogoproto.nullable) = false ];
+  storage.storagepb.LatchManagerInfo latches_global = 12 [ (gogoproto.nullable) = false ];
   storage.LeaseStatus lease_status = 13 [ (gogoproto.nullable) = false ];
   bool quiescent = 14;
   bool ticking = 15;
@@ -611,13 +604,6 @@ message RangeResponse {
   reserved 4; // Previously used.
 }
 
-message CommandQueueRequest { int64 range_id = 1; }
-
-message CommandQueueResponse {
-  storage.storagepb.CommandQueuesSnapshot snapshot = 1
-      [ (gogoproto.nullable) = false ];
-}
-
 // DiagnosticsRequest requests a diagnostics report.
 message DiagnosticsRequest {
   // node_id is a string so that "local" can be used to specify that no
@@ -805,11 +791,6 @@ service Status {
       get : "/_status/range/{range_id}"
     };
   }
-  rpc CommandQueue(CommandQueueRequest) returns (CommandQueueResponse) {
-    option (google.api.http) = {
-      get : "/_status/range/{range_id}/cmdqueue"
-    };
-  }
   rpc Diagnostics(DiagnosticsRequest)
       returns (cockroach.server.diagnosticspb.DiagnosticReport) {
     option (google.api.http) = {
30 changes: 5 additions & 25 deletions pkg/server/status.go
@@ -1214,11 +1214,11 @@ func (s *statusServer) Ranges(
 				QuiescentEqualsTicking: raftStatus != nil && metrics.Quiescent == metrics.Ticking,
 				RaftLogTooLarge:        metrics.RaftLogTooLarge,
 			},
-			CmdQLocal:   serverpb.CommandQueueMetrics(metrics.CmdQMetricsLocal),
-			CmdQGlobal:  serverpb.CommandQueueMetrics(metrics.CmdQMetricsGlobal),
-			LeaseStatus: metrics.LeaseStatus,
-			Quiescent:   metrics.Quiescent,
-			Ticking:     metrics.Ticking,
+			LatchesLocal:  metrics.LatchInfoLocal,
+			LatchesGlobal: metrics.LatchInfoGlobal,
+			LeaseStatus:   metrics.LeaseStatus,
+			Quiescent:     metrics.Quiescent,
+			Ticking:       metrics.Ticking,
 		}
 	}

@@ -1321,26 +1321,6 @@ func (s *statusServer) Range(
 	return response, nil
 }
 
-// CommandQueue returns a snapshot of the command queue state for the
-// specified range.
-func (s *statusServer) CommandQueue(
-	ctx context.Context, req *serverpb.CommandQueueRequest,
-) (*serverpb.CommandQueueResponse, error) {
-	rangeID := roachpb.RangeID(req.RangeId)
-	replica, err := s.stores.GetReplicaForRangeID(rangeID)
-	if err != nil {
-		return nil, err
-	}
-
-	if replica == nil {
-		return nil, roachpb.NewRangeNotFoundError(rangeID, 0)
-	}
-
-	return &serverpb.CommandQueueResponse{
-		Snapshot: replica.GetCommandQueueSnapshot(),
-	}, nil
-}
-
 // ListLocalSessions returns a list of SQL sessions on this node.
 func (s *statusServer) ListLocalSessions(
 	ctx context.Context, req *serverpb.ListSessionsRequest,
2 changes: 1 addition & 1 deletion pkg/server/status/health_check.go
@@ -50,7 +50,7 @@ var trackedMetrics = map[string]threshold{
 	"ranges.unavailable": gaugeZero,
 	"ranges.underreplicated": gaugeZero,
 	"requests.backpressure.split": gaugeZero,
-	"requests.slow.commandqueue": gaugeZero,
+	"requests.slow.latch": gaugeZero,
 	"requests.slow.lease": gaugeZero,
 	"requests.slow.raft": gaugeZero,
 	"sys.goroutines": {gauge: true, min: 5000},
29 changes: 14 additions & 15 deletions pkg/storage/client_merge_test.go
@@ -1201,18 +1201,17 @@ func TestStoreRangeMergeRHSLeaseExpiration(t *testing.T) {
 	}
 
 	// Install a hook to observe when a get request for a special key,
-	// rhsSentinel, exits the command queue.
+	// rhsSentinel, acquires latches and begins evaluating.
 	const getConcurrency = 10
 	rhsSentinel := roachpb.Key("rhs-sentinel")
-	getExitedCommandQueue := make(chan struct{}, getConcurrency)
-	storeCfg.TestingKnobs.OnCommandQueueAction = func(ba *roachpb.BatchRequest, action storagebase.CommandQueueAction) {
-		if action == storagebase.CommandQueueBeginExecuting {
-			for _, r := range ba.Requests {
-				if get := r.GetGet(); get != nil && get.RequestHeader.Key.Equal(rhsSentinel) {
-					getExitedCommandQueue <- struct{}{}
-				}
+	getAcquiredLatch := make(chan struct{}, getConcurrency)
+	storeCfg.TestingKnobs.TestingLatchFilter = func(ba roachpb.BatchRequest) *roachpb.Error {
+		for _, r := range ba.Requests {
+			if get := r.GetGet(); get != nil && get.RequestHeader.Key.Equal(rhsSentinel) {
+				getAcquiredLatch <- struct{}{}
 			}
 		}
+		return nil
 	}
 
 	mtc := &multiTestContext{storeConfig: &storeCfg}
@@ -1271,7 +1270,7 @@ func TestStoreRangeMergeRHSLeaseExpiration(t *testing.T) {
 	// Note that the first request would never hit this race on its own. Nor would
 	// any request that arrived early enough to see an outdated lease in
 	// Replica.mu.state.Lease. All of these requests joined the in-progress lease
-	// acquisition and blocked until the lease command exited the command queue,
+	// acquisition and blocked until the lease command acquires its latches,
 	// at which point the mergeComplete channel was updated. To hit the race, the
 	// request needed to arrive exactly between the update to
 	// Replica.mu.state.Lease and the update to Replica.mu.mergeComplete.
@@ -1300,12 +1299,12 @@ func TestStoreRangeMergeRHSLeaseExpiration(t *testing.T) {
 		time.Sleep(time.Millisecond)
 	}
 
-	// Wait for the get requests to fall out of the command queue, which is as far
-	// as they can get while the merge is in progress. Then wait a little bit
-	// longer. This tests that the requests really do get stuck waiting for the
-	// merge to complete without depending too heavily on implementation details.
+	// Wait for the get requests to acquire latches, which is as far as they can
+	// get while the merge is in progress. Then wait a little bit longer. This
+	// tests that the requests really do get stuck waiting for the merge to
+	// complete without depending too heavily on implementation details.
 	for i := 0; i < getConcurrency; i++ {
-		<-getExitedCommandQueue
+		<-getAcquiredLatch
 	}
 	time.Sleep(50 * time.Millisecond)

@@ -1361,7 +1360,7 @@ func TestStoreRangeMergeConcurrentRequests(t *testing.T) {
 		//
 		// This scenario previously caused deadlock. The merge was not able to
 		// complete until the Subsume request completed, but the Subsume request
-		// was stuck in the command queue until the Get request finished, which
+		// was unable to acquire latches until the Get request finished, which
 		// was itself waiting for the merge to complete. Whoops!
 		mtc.advanceClock(ctx)
 	}