storage: replace CommandQueue with spanlatch.Manager · cockroachdb/cockroach@eb38f20


storage: replace CommandQueue with spanlatch.Manager

This commit replaces the CommandQueue with the spanlatch.Manager, which
was introduced in #31997. See that PR for an introduction to how the
structure differs from the CommandQueue and how it improves performance
on microbenchmarks.
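
At its core, a latch manager such as spanlatch.Manager has each request declare the key spans it will read or write. A request then waits only for earlier requests whose spans overlap its own, where at least one of the two is a write. The sketch below illustrates only that conflict rule. The `Span`, `Latch`, and `conflicts` names are hypothetical and are not the spanlatch API. The sketch also ignores MVCC timestamps, the interval tree, and any other refinements the real structure may have.

```go
package main

import (
	"bytes"
	"fmt"
)

// Span is a half-open key range [Key, EndKey). An empty EndKey denotes a
// single point key (an assumption modeled loosely on CockroachDB key spans).
type Span struct {
	Key, EndKey []byte
}

// Access says whether a latch is held for reading or for writing.
type Access int

const (
	Read Access = iota
	Write
)

// Latch is a hypothetical latch request: one span plus its access mode.
type Latch struct {
	Span   Span
	Access Access
}

// end returns the exclusive end of the span, treating a point key k as the
// range [k, k+"\x00"), the smallest range that contains only k.
func (s Span) end() []byte {
	if len(s.EndKey) == 0 {
		return append(append([]byte{}, s.Key...), 0)
	}
	return s.EndKey
}

// overlaps reports whether two half-open spans intersect.
func overlaps(a, b Span) bool {
	return bytes.Compare(a.Key, b.end()) < 0 && bytes.Compare(b.Key, a.end()) < 0
}

// conflicts reports whether one request must wait for the other: their spans
// overlap and at least one side writes. Two reads never conflict.
func conflicts(a, b Latch) bool {
	if a.Access == Read && b.Access == Read {
		return false
	}
	return overlaps(a.Span, b.Span)
}

func main() {
	scan := Latch{Span{[]byte("a"), []byte("z")}, Read}
	putB := Latch{Span{Key: []byte("b")}, Write}
	putC := Latch{Span{Key: []byte("c")}, Write}
	getB := Latch{Span{Key: []byte("b")}, Read}

	fmt.Println(conflicts(putB, putC)) // disjoint point writes
	fmt.Println(conflicts(scan, putB)) // write inside the scanned range
	fmt.Println(conflicts(scan, getB)) // read against read
}
```

Running it prints `false`, `true`, `false`. The write to `b` must wait for the scan, while the read of `b` and the write to `c` proceed independently.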

This is mostly a mechanical change. One important detail is that it removes
the CommandQueue debug page. We found that the page was buggy (or straight
up broken) and that it wasn't actively used by members of Core when debugging problems.
In its place, the commit revives the "slow requests" metric for latching, which
hasn't been hooked up in over a year.

### Benchmarks

#### Standard Benchmarks

These are the standard benchmarks that we commonly run. They were run with varying
node sizes (`cores`), cluster sizes (`nodes`), and pre-split counts (`splits`).
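
Each benchmark name encodes its configuration: the workload, then `cores` (node size), `nodes` (cluster size), and `splits` (pre-split count). The hypothetical helper below decodes the `key=value` parameters only to make the naming scheme explicit. It keeps the leading workload token as an opaque string, for example `kv95`, optionally followed by a `-cN` or `-bN` modifier.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// BenchConfig is a hypothetical decoding of a benchmark name such as
// "kv95-c5/cores=36/nodes=1/splits=0".
type BenchConfig struct {
	Workload string         // workload plus modifiers, e.g. "kv95" or "kv95-c5"
	Params   map[string]int // "cores" (node size), "nodes" (cluster size), "splits" (pre-splits)
}

// parseBenchName splits a name on "/" and decodes every key=value element
// after the first one as an integer parameter.
func parseBenchName(name string) (BenchConfig, error) {
	parts := strings.Split(name, "/")
	cfg := BenchConfig{Workload: parts[0], Params: map[string]int{}}
	for _, p := range parts[1:] {
		k, v, ok := strings.Cut(p, "=")
		if !ok {
			return BenchConfig{}, fmt.Errorf("malformed parameter %q in %q", p, name)
		}
		n, err := strconv.Atoi(v)
		if err != nil {
			return BenchConfig{}, fmt.Errorf("parameter %q in %q: %w", k, name, err)
		}
		cfg.Params[k] = n
	}
	return cfg, nil
}

func main() {
	cfg, err := parseBenchName("kv0/cores=16/nodes=3/splits=100")
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Workload, cfg.Params["cores"], cfg.Params["nodes"], cfg.Params["splits"])
}
```

For the name used in `main`, this prints `kv0 16 3 100`: a 3-node cluster of 16-core machines with 100 pre-splits.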

```
name                              old ops/sec  new ops/sec  delta
kv0/cores=4/nodes=1/splits=0       1.99k ± 2%   2.06k ± 1%   +3.22%  (p=0.008 n=5+5)
kv0/cores=4/nodes=1/splits=100     2.25k ± 1%   2.38k ± 1%   +6.01%  (p=0.008 n=5+5)
kv0/cores=4/nodes=3/splits=0       1.60k ± 0%   1.69k ± 2%   +5.53%  (p=0.008 n=5+5)
kv0/cores=4/nodes=3/splits=100     3.52k ± 6%   3.65k ± 9%     ~     (p=0.421 n=5+5)
kv0/cores=16/nodes=1/splits=0      19.9k ± 1%   21.8k ± 1%   +9.34%  (p=0.008 n=5+5)
kv0/cores=16/nodes=1/splits=100    24.4k ± 1%   26.1k ± 1%   +7.17%  (p=0.008 n=5+5)
kv0/cores=16/nodes=3/splits=0      14.9k ± 1%   16.1k ± 1%   +8.03%  (p=0.008 n=5+5)
kv0/cores=16/nodes=3/splits=100    20.6k ± 1%   22.8k ± 1%  +10.79%  (p=0.008 n=5+5)
kv0/cores=36/nodes=1/splits=0      31.2k ± 2%   35.3k ± 1%  +13.28%  (p=0.008 n=5+5)
kv0/cores=36/nodes=1/splits=100    45.7k ± 1%   51.1k ± 1%  +11.80%  (p=0.008 n=5+5)
kv0/cores=36/nodes=3/splits=0      23.7k ± 2%   27.1k ± 2%  +14.39%  (p=0.008 n=5+5)
kv0/cores=36/nodes=3/splits=100    34.9k ± 2%   45.1k ± 1%  +29.44%  (p=0.008 n=5+5)
kv95/cores=4/nodes=1/splits=0      12.7k ± 2%   12.9k ± 2%   +1.39%  (p=0.151 n=5+5)
kv95/cores=4/nodes=1/splits=100    12.8k ± 2%   13.1k ± 2%   +2.10%  (p=0.032 n=5+5)
kv95/cores=4/nodes=3/splits=0      10.6k ± 1%   10.8k ± 1%   +1.58%  (p=0.056 n=5+5)
kv95/cores=4/nodes=3/splits=100    12.3k ± 7%   12.6k ± 8%   +2.61%  (p=0.095 n=5+5)
kv95/cores=16/nodes=1/splits=0     50.9k ± 1%   52.2k ± 1%   +2.37%  (p=0.008 n=5+5)
kv95/cores=16/nodes=1/splits=100   52.2k ± 1%   53.0k ± 1%   +1.49%  (p=0.008 n=5+5)
kv95/cores=16/nodes=3/splits=0     46.2k ± 1%   46.8k ± 1%   +1.32%  (p=0.032 n=5+5)
kv95/cores=16/nodes=3/splits=100   51.0k ± 1%   53.2k ± 1%   +4.25%  (p=0.008 n=5+5)
kv95/cores=36/nodes=1/splits=0     79.8k ± 2%  101.6k ± 1%  +27.31%  (p=0.008 n=5+5)
kv95/cores=36/nodes=1/splits=100    104k ± 1%    107k ± 1%   +2.60%  (p=0.008 n=5+5)
kv95/cores=36/nodes=3/splits=0     85.8k ± 1%   91.8k ± 1%   +7.08%  (p=0.008 n=5+5)
kv95/cores=36/nodes=3/splits=100    106k ± 1%    112k ± 1%   +5.51%  (p=0.008 n=5+5)

name                              old p50(ms)  new p50(ms)  delta
kv0/cores=4/nodes=1/splits=0        3.52 ± 5%    3.40 ± 0%   -3.41%  (p=0.016 n=5+4)
kv0/cores=4/nodes=1/splits=100      3.30 ± 0%    3.00 ± 0%   -9.09%  (p=0.008 n=5+5)
kv0/cores=4/nodes=3/splits=0        4.70 ± 0%    4.14 ± 9%  -11.91%  (p=0.008 n=5+5)
kv0/cores=4/nodes=3/splits=100      1.50 ± 0%    1.48 ± 8%     ~     (p=0.968 n=4+5)
kv0/cores=16/nodes=1/splits=0       1.40 ± 0%    1.40 ± 0%     ~     (all equal)
kv0/cores=16/nodes=1/splits=100     1.20 ± 0%    1.20 ± 0%     ~     (all equal)
kv0/cores=16/nodes=3/splits=0       2.00 ± 0%    1.90 ± 0%   -5.00%  (p=0.000 n=5+4)
kv0/cores=16/nodes=3/splits=100     1.40 ± 0%    1.40 ± 0%     ~     (all equal)
kv0/cores=36/nodes=1/splits=0       1.76 ± 3%    1.60 ± 0%   -9.09%  (p=0.008 n=5+5)
kv0/cores=36/nodes=1/splits=100     1.40 ± 0%    1.30 ± 0%   -7.14%  (p=0.008 n=5+5)
kv0/cores=36/nodes=3/splits=0       2.56 ± 2%    2.40 ± 0%   -6.25%  (p=0.000 n=5+4)
kv0/cores=36/nodes=3/splits=100     1.70 ± 0%    1.40 ± 0%  -17.65%  (p=0.008 n=5+5)
kv95/cores=4/nodes=1/splits=0       0.50 ± 0%    0.50 ± 0%     ~     (all equal)
kv95/cores=4/nodes=1/splits=100     0.50 ± 0%    0.50 ± 0%     ~     (all equal)
kv95/cores=4/nodes=3/splits=0       0.60 ± 0%    0.60 ± 0%     ~     (all equal)
kv95/cores=4/nodes=3/splits=100     0.60 ± 0%    0.60 ± 0%     ~     (all equal)
kv95/cores=16/nodes=1/splits=0      0.50 ± 0%    0.50 ± 0%     ~     (all equal)
kv95/cores=16/nodes=1/splits=100    0.50 ± 0%    0.50 ± 0%     ~     (all equal)
kv95/cores=16/nodes=3/splits=0      0.70 ± 0%    0.64 ± 9%   -8.57%  (p=0.167 n=5+5)
kv95/cores=16/nodes=3/splits=100    0.60 ± 0%    0.60 ± 0%     ~     (all equal)
kv95/cores=36/nodes=1/splits=0      0.50 ± 0%    0.50 ± 0%     ~     (all equal)
kv95/cores=36/nodes=1/splits=100    0.50 ± 0%    0.50 ± 0%     ~     (all equal)
kv95/cores=36/nodes=3/splits=0      0.66 ± 9%    0.60 ± 0%   -9.09%  (p=0.167 n=5+5)
kv95/cores=36/nodes=3/splits=100    0.60 ± 0%    0.60 ± 0%     ~     (all equal)

name                              old p99(ms)  new p99(ms)  delta
kv0/cores=4/nodes=1/splits=0        11.0 ± 0%    10.5 ± 0%   -4.55%  (p=0.000 n=5+4)
kv0/cores=4/nodes=1/splits=100      7.90 ± 0%    7.60 ± 0%   -3.80%  (p=0.000 n=5+4)
kv0/cores=4/nodes=3/splits=0        15.7 ± 0%    15.2 ± 0%   -3.18%  (p=0.008 n=5+5)
kv0/cores=4/nodes=3/splits=100      8.90 ± 0%    8.12 ± 3%   -8.76%  (p=0.016 n=4+5)
kv0/cores=16/nodes=1/splits=0       3.46 ± 2%    3.00 ± 0%  -13.29%  (p=0.000 n=5+4)
kv0/cores=16/nodes=1/splits=100     4.50 ± 0%    3.36 ± 2%  -25.33%  (p=0.008 n=5+5)
kv0/cores=16/nodes=3/splits=0       4.50 ± 0%    3.90 ± 0%  -13.33%  (p=0.008 n=5+5)
kv0/cores=16/nodes=3/splits=100     5.80 ± 0%    4.10 ± 0%  -29.31%  (p=0.029 n=4+4)
kv0/cores=36/nodes=1/splits=0       6.80 ± 0%    5.20 ± 0%  -23.53%  (p=0.008 n=5+5)
kv0/cores=36/nodes=1/splits=100     5.80 ± 0%    4.32 ± 4%  -25.52%  (p=0.008 n=5+5)
kv0/cores=36/nodes=3/splits=0       7.72 ± 2%    6.30 ± 0%  -18.39%  (p=0.000 n=5+4)
kv0/cores=36/nodes=3/splits=100     7.98 ± 2%    5.20 ± 0%  -34.84%  (p=0.000 n=5+4)
kv95/cores=4/nodes=1/splits=0       5.38 ± 3%    5.20 ± 0%   -3.35%  (p=0.167 n=5+5)
kv95/cores=4/nodes=1/splits=100     5.00 ± 0%    5.00 ± 0%     ~     (all equal)
kv95/cores=4/nodes=3/splits=0       5.68 ± 3%    5.50 ± 0%   -3.17%  (p=0.095 n=5+4)
kv95/cores=4/nodes=3/splits=100     3.60 ±31%    2.93 ± 3%  -18.75%  (p=0.016 n=5+4)
kv95/cores=16/nodes=1/splits=0      4.10 ± 0%    4.10 ± 0%     ~     (all equal)
kv95/cores=16/nodes=1/splits=100    4.50 ± 0%    4.10 ± 0%   -8.89%  (p=0.000 n=5+4)
kv95/cores=16/nodes=3/splits=0      2.60 ± 0%    2.60 ± 0%     ~     (all equal)
kv95/cores=16/nodes=3/splits=100    2.50 ± 0%    1.90 ± 5%  -24.00%  (p=0.008 n=5+5)
kv95/cores=36/nodes=1/splits=0      6.60 ± 0%    6.00 ± 0%   -9.09%  (p=0.029 n=4+4)
kv95/cores=36/nodes=1/splits=100    5.50 ± 0%    5.12 ± 2%   -6.91%  (p=0.008 n=5+5)
kv95/cores=36/nodes=3/splits=0      4.18 ± 2%    4.02 ± 3%   -3.71%  (p=0.000 n=4+5)
kv95/cores=36/nodes=3/splits=100    3.80 ± 0%    2.80 ± 0%  -26.32%  (p=0.008 n=5+5)
```

#### Large-machine Benchmarks

These are the same standard benchmarks, run on a single-node cluster with 72 vCPUs.

```
name                              old ops/sec  new ops/sec  delta
kv0/cores=72/nodes=1/splits=0      31.0k ± 4%   36.4k ± 1%  +17.57%  (p=0.008 n=5+5)
kv0/cores=72/nodes=1/splits=100    44.0k ± 0%   49.0k ± 1%  +11.41%  (p=0.008 n=5+5)
kv95/cores=72/nodes=1/splits=0     52.7k ±18%   72.6k ±26%  +37.70%  (p=0.016 n=5+5)
kv95/cores=72/nodes=1/splits=100   66.8k ±17%   68.5k ± 5%     ~     (p=0.286 n=5+4)

name                              old p50(ms)  new p50(ms)  delta
kv0/cores=72/nodes=1/splits=0       2.30 ±13%    2.52 ± 5%     ~     (p=0.214 n=5+5)
kv0/cores=72/nodes=1/splits=100     3.00 ± 0%    2.90 ± 0%   -3.33%  (p=0.008 n=5+5)
kv95/cores=72/nodes=1/splits=0      0.46 ±13%    0.50 ± 0%     ~     (p=0.444 n=5+5)
kv95/cores=72/nodes=1/splits=100    0.44 ±14%    0.50 ± 0%  +13.64%  (p=0.167 n=5+5)

name                              old p99(ms)  new p99(ms)  delta
kv0/cores=72/nodes=1/splits=0       18.9 ± 6%    13.3 ± 5%  -29.56%  (p=0.008 n=5+5)
kv0/cores=72/nodes=1/splits=100     13.4 ± 2%    11.0 ± 0%  -17.91%  (p=0.008 n=5+5)
kv95/cores=72/nodes=1/splits=0      34.4 ±34%    23.5 ±24%  -31.74%  (p=0.048 n=5+5)
kv95/cores=72/nodes=1/splits=100    21.0 ± 0%    19.1 ± 4%   -8.81%  (p=0.029 n=4+4)
```

#### Motivating Benchmarks

These are benchmarks that used to generate a lot of contention in the CommandQueue.
They have small cycle lengths, indicated by the `c` specifier (e.g. `c5`, `c128`). The last
workload, `kv70-20`, also includes 20% scan operations, which increase contention between
otherwise non-overlapping point operations.
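
A rough model shows why a small cycle length matters. Assume each of n concurrent requests picks its key uniformly at random from a cycle of c keys; this is an assumption about the workload generator, not something stated in this commit. The chance that at least two requests pick the same key is then the birthday-problem probability 1 - ∏ (c-i)/c for i from 0 to n-1. The model also ignores the fact that two reads never conflict, so it overstates contention for read-heavy workloads like kv95.

```go
package main

import "fmt"

// collisionProbability returns the probability that at least two of n
// concurrent requests pick the same key, assuming each request picks
// uniformly at random from a cycle of c keys (the birthday problem).
func collisionProbability(c, n int) float64 {
	if n > c {
		return 1 // pigeonhole: some key must repeat
	}
	pNone := 1.0 // probability that all n keys are distinct
	for i := 0; i < n; i++ {
		pNone *= float64(c-i) / float64(c)
	}
	return 1 - pNone
}

func main() {
	const workers = 4 // hypothetical concurrency, not taken from the benchmarks
	for _, c := range []int{5, 128, 10000} {
		fmt.Printf("cycle length %5d: P(collision among %d requests) = %.3f\n",
			c, workers, collisionProbability(c, workers))
	}
}
```

With four concurrent requests, the model gives a collision probability of about 0.81 for c = 5 and about 0.05 for c = 128. The `c5` workloads are therefore dominated by requests that touch the same keys.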

```
name                                    old ops/sec  new ops/sec  delta
kv95-c5/cores=16/nodes=1/splits=0        45.1k ± 1%   47.2k ± 4%   +4.59%  (p=0.008 n=5+5)
kv95-c5/cores=36/nodes=1/splits=0        44.6k ± 1%   76.3k ± 1%  +71.05%  (p=0.008 n=5+5)
kv50-c128/cores=16/nodes=1/splits=0      27.2k ± 2%   29.4k ± 1%   +8.12%  (p=0.008 n=5+5)
kv50-c128/cores=36/nodes=1/splits=0      42.6k ± 2%   50.0k ± 1%  +17.39%  (p=0.008 n=5+5)
kv70-20-c128/cores=16/nodes=1/splits=0   28.7k ± 1%   29.8k ± 3%   +3.87%  (p=0.008 n=5+5)
kv70-20-c128/cores=36/nodes=1/splits=0   41.9k ± 4%   52.8k ± 2%  +25.97%  (p=0.008 n=5+5)

name                                    old p50(ms)  new p50(ms)  delta
kv95-c5/cores=16/nodes=1/splits=0         0.60 ± 0%    0.60 ± 0%     ~     (all equal)
kv95-c5/cores=36/nodes=1/splits=0         0.90 ± 0%    0.80 ± 0%  -11.11%  (p=0.008 n=5+5)
kv50-c128/cores=16/nodes=1/splits=0       1.10 ± 0%    1.06 ± 6%     ~     (p=0.444 n=5+5)
kv50-c128/cores=36/nodes=1/splits=0       1.26 ± 5%    1.30 ± 0%     ~     (p=0.444 n=5+5)
kv70-20-c128/cores=16/nodes=1/splits=0    0.66 ± 9%    0.60 ± 0%   -9.09%  (p=0.167 n=5+5)
kv70-20-c128/cores=36/nodes=1/splits=0    0.70 ± 0%    0.50 ± 0%  -28.57%  (p=0.008 n=5+5)

name                                    old p99(ms)  new p99(ms)  delta
kv95-c5/cores=16/nodes=1/splits=0         2.40 ± 0%    2.10 ± 0%  -12.50%  (p=0.000 n=5+4)
kv95-c5/cores=36/nodes=1/splits=0         5.80 ± 0%    3.30 ± 0%  -43.10%  (p=0.000 n=5+4)
kv50-c128/cores=16/nodes=1/splits=0       3.50 ± 0%    3.00 ± 0%  -14.29%  (p=0.008 n=5+5)
kv50-c128/cores=36/nodes=1/splits=0       6.80 ± 0%    4.70 ± 0%  -30.88%  (p=0.079 n=4+5)
kv70-20-c128/cores=16/nodes=1/splits=0    5.00 ± 0%    4.70 ± 0%   -6.00%  (p=0.029 n=4+4)
kv70-20-c128/cores=36/nodes=1/splits=0    11.0 ± 0%     6.8 ± 0%  -38.18%  (p=0.008 n=5+5)
```

#### Batching Benchmarks

One optimization left out of the new spanlatch.Manager was the "covering" optimization,
where commands were initially added to the interval tree as a single spanning interval
and only expanded later. I ran a series of benchmarks to verify that this optimization
was not needed. My hypothesis was that the order-of-magnitude increase in the speed of the
interval tree would make the optimization unnecessary.

It turns out that removing the optimization hurt a few benchmarks to a small
degree but sped up others tremendously (some benchmarks improved by over 400%).
I suspect that the covering optimization could actually hurt in cases where it
causes non-overlapping requests to overlap. It is interesting how quickly a few
of these benchmarks oscillate from small losses to big wins. It makes me think
that there's some non-linear behavior with the old CommandQueue that would cause
its performance to quickly degrade once it became a contention bottleneck.
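
The hazard of covering can be shown in a few lines. This sketch assumes integer keys and half-open spans, a simplification of real key spans; `span`, `cover`, and `anyOverlap` are hypothetical helpers, not CommandQueue code. A batch that touches keys 10 and 90 does not overlap a request on key 50. The single covering interval [10, 91), which the optimization would insert first, does overlap it.

```go
package main

import "fmt"

// span is a half-open interval [start, end) over integer keys, a deliberate
// simplification of real byte-string key spans.
type span struct{ start, end int }

// overlaps reports whether two half-open spans intersect.
func (a span) overlaps(b span) bool { return a.start < b.end && b.start < a.end }

// cover returns the smallest single span containing every span in a
// non-empty batch: the one interval a covering optimization inserts first.
func cover(batch []span) span {
	c := batch[0]
	for _, s := range batch[1:] {
		if s.start < c.start {
			c.start = s.start
		}
		if s.end > c.end {
			c.end = s.end
		}
	}
	return c
}

// anyOverlap reports whether any span in a overlaps any span in b, which is
// the precise check made when every span is tracked individually.
func anyOverlap(a, b []span) bool {
	for _, x := range a {
		for _, y := range b {
			if x.overlaps(y) {
				return true
			}
		}
	}
	return false
}

func main() {
	batch := []span{{10, 11}, {90, 91}} // a batch touching keys 10 and 90
	point := []span{{50, 51}}           // an unrelated request on key 50

	fmt.Println("precise spans overlap: ", anyOverlap(batch, point))
	fmt.Println("covering span overlaps:", cover(batch).overlaps(point[0]))
}
```

The program prints `false` for the precise check and `true` for the covering span. The trade-off is that covering costs a single tree insertion per batch rather than one insertion per span, at the price of false conflicts like this one.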

```
name                                    old ops/sec  new ops/sec  delta
kv0-b16/cores=4/nodes=1/splits=0         2.41k ± 0%   2.06k ± 3%   -14.75%  (p=0.008 n=5+5)
kv0-b16/cores=4/nodes=1/splits=100         514 ± 0%     534 ± 1%    +3.88%  (p=0.008 n=5+5)
kv0-b16/cores=16/nodes=1/splits=0        2.95k ± 0%   4.35k ± 0%   +47.74%  (p=0.008 n=5+5)
kv0-b16/cores=16/nodes=1/splits=100      1.80k ± 1%   1.88k ± 1%    +4.46%  (p=0.008 n=5+5)
kv0-b16/cores=36/nodes=1/splits=0        2.74k ± 0%   4.92k ± 1%   +79.55%  (p=0.008 n=5+5)
kv0-b16/cores=36/nodes=1/splits=100      2.39k ± 1%   2.45k ± 1%    +2.41%  (p=0.008 n=5+5)
kv0-b128/cores=4/nodes=1/splits=0          422 ± 0%     518 ± 1%   +22.60%  (p=0.008 n=5+5)
kv0-b128/cores=4/nodes=1/splits=100       98.4 ± 1%    98.8 ± 1%      ~     (p=0.810 n=5+5)
kv0-b128/cores=16/nodes=1/splits=0         532 ± 0%    1059 ± 0%   +99.16%  (p=0.008 n=5+5)
kv0-b128/cores=16/nodes=1/splits=100       291 ± 1%     307 ± 1%    +5.18%  (p=0.008 n=5+5)
kv0-b128/cores=36/nodes=1/splits=0         483 ± 0%    1288 ± 1%  +166.37%  (p=0.008 n=5+5)
kv0-b128/cores=36/nodes=1/splits=100       394 ± 1%     408 ± 1%    +3.51%  (p=0.008 n=5+5)
kv0-b1024/cores=4/nodes=1/splits=0        49.7 ± 1%    72.8 ± 1%   +46.52%  (p=0.008 n=5+5)
kv0-b1024/cores=4/nodes=1/splits=100      30.8 ± 0%    23.4 ± 0%   -24.03%  (p=0.008 n=5+5)
kv0-b1024/cores=16/nodes=1/splits=0       48.9 ± 2%   160.6 ± 0%  +228.38%  (p=0.008 n=5+5)
kv0-b1024/cores=16/nodes=1/splits=100      101 ± 1%      80 ± 0%   -21.64%  (p=0.008 n=5+5)
kv0-b1024/cores=36/nodes=1/splits=0       37.5 ± 0%   208.1 ± 1%  +454.99%  (p=0.016 n=4+5)
kv0-b1024/cores=36/nodes=1/splits=100      162 ± 0%     124 ± 0%   -23.22%  (p=0.008 n=5+5)
kv95-b16/cores=4/nodes=1/splits=0        5.93k ± 0%   6.20k ± 1%    +4.55%  (p=0.008 n=5+5)
kv95-b16/cores=4/nodes=1/splits=100      2.27k ± 1%   2.32k ± 1%    +2.28%  (p=0.008 n=5+5)
kv95-b16/cores=16/nodes=1/splits=0       5.15k ± 1%  18.79k ± 1%  +264.73%  (p=0.008 n=5+5)
kv95-b16/cores=16/nodes=1/splits=100     8.31k ± 1%   8.57k ± 1%    +3.16%  (p=0.008 n=5+5)
kv95-b16/cores=36/nodes=1/splits=0       3.96k ± 0%  10.67k ± 1%  +169.81%  (p=0.008 n=5+5)
kv95-b16/cores=36/nodes=1/splits=100     15.7k ± 2%   16.2k ± 4%    +2.75%  (p=0.151 n=5+5)
kv95-b128/cores=4/nodes=1/splits=0       1.12k ± 1%   1.27k ± 0%   +13.28%  (p=0.008 n=5+5)
kv95-b128/cores=4/nodes=1/splits=100       290 ± 1%     299 ± 1%    +3.02%  (p=0.008 n=5+5)
kv95-b128/cores=16/nodes=1/splits=0      1.06k ± 0%   3.31k ± 0%  +213.09%  (p=0.008 n=5+5)
kv95-b128/cores=16/nodes=1/splits=100      662 ±91%    1095 ± 1%   +65.42%  (p=0.016 n=5+4)
kv95-b128/cores=36/nodes=1/splits=0        715 ± 2%    3586 ± 0%  +401.21%  (p=0.008 n=5+5)
kv95-b128/cores=36/nodes=1/splits=100    1.15k ±90%   2.01k ± 2%   +74.79%  (p=0.016 n=5+4)
kv95-b1024/cores=4/nodes=1/splits=0        134 ± 1%     170 ± 1%   +26.59%  (p=0.008 n=5+5)
kv95-b1024/cores=4/nodes=1/splits=100     54.8 ± 3%    53.3 ± 3%    -2.84%  (p=0.056 n=5+5)
kv95-b1024/cores=16/nodes=1/splits=0       104 ± 3%     367 ± 1%  +252.37%  (p=0.008 n=5+5)
kv95-b1024/cores=16/nodes=1/splits=100     210 ± 1%     214 ± 1%    +1.86%  (p=0.008 n=5+5)
kv95-b1024/cores=36/nodes=1/splits=0      76.5 ± 2%   383.9 ± 1%  +401.67%  (p=0.008 n=5+5)
kv95-b1024/cores=36/nodes=1/splits=100     431 ± 1%     436 ± 1%    +1.17%  (p=0.016 n=5+5)

name                                    old p50(ms)  new p50(ms)  delta
kv0-b16/cores=4/nodes=1/splits=0          3.00 ± 0%    3.40 ± 0%   +13.33%  (p=0.016 n=5+4)
kv0-b16/cores=4/nodes=1/splits=100        15.2 ± 0%    14.7 ± 0%    -3.29%  (p=0.008 n=5+5)
kv0-b16/cores=16/nodes=1/splits=0         10.5 ± 0%     7.7 ± 2%   -26.48%  (p=0.008 n=5+5)
kv0-b16/cores=16/nodes=1/splits=100       17.8 ± 0%    16.8 ± 0%    -5.62%  (p=0.008 n=5+5)
kv0-b16/cores=36/nodes=1/splits=0         26.2 ± 0%    14.2 ± 0%   -45.80%  (p=0.008 n=5+5)
kv0-b16/cores=36/nodes=1/splits=100       29.0 ± 2%    28.3 ± 0%    -2.28%  (p=0.095 n=5+4)
kv0-b128/cores=4/nodes=1/splits=0         17.8 ± 0%    15.2 ± 0%   -14.61%  (p=0.000 n=5+4)
kv0-b128/cores=4/nodes=1/splits=100       79.7 ± 0%    79.7 ± 0%      ~     (all equal)
kv0-b128/cores=16/nodes=1/splits=0        65.0 ± 0%    32.5 ± 0%   -50.00%  (p=0.029 n=4+4)
kv0-b128/cores=16/nodes=1/splits=100       109 ± 0%     105 ± 0%    -3.85%  (p=0.008 n=5+5)
kv0-b128/cores=36/nodes=1/splits=0         168 ± 0%      50 ± 0%   -70.02%  (p=0.008 n=5+5)
kv0-b128/cores=36/nodes=1/splits=100       184 ± 0%     176 ± 0%    -4.50%  (p=0.008 n=5+5)
kv0-b1024/cores=4/nodes=1/splits=0         159 ± 0%     109 ± 0%   -31.56%  (p=0.000 n=5+4)
kv0-b1024/cores=4/nodes=1/splits=100       252 ± 0%     319 ± 0%   +26.66%  (p=0.008 n=5+5)
kv0-b1024/cores=16/nodes=1/splits=0        705 ± 0%     193 ± 0%   -72.62%  (p=0.000 n=5+4)
kv0-b1024/cores=16/nodes=1/splits=100      319 ± 0%     386 ± 0%   +21.05%  (p=0.008 n=5+5)
kv0-b1024/cores=36/nodes=1/splits=0      1.88k ± 0%   0.24k ± 0%   -87.05%  (p=0.008 n=5+5)
kv0-b1024/cores=36/nodes=1/splits=100      436 ± 0%     570 ± 0%   +30.77%  (p=0.008 n=5+5)
kv95-b16/cores=4/nodes=1/splits=0         1.20 ± 0%    1.20 ± 0%      ~     (all equal)
kv95-b16/cores=4/nodes=1/splits=100       2.60 ± 0%    2.60 ± 0%      ~     (all equal)
kv95-b16/cores=16/nodes=1/splits=0        6.30 ± 0%    1.40 ± 0%   -77.78%  (p=0.000 n=5+4)
kv95-b16/cores=16/nodes=1/splits=100      1.74 ± 3%    1.76 ± 3%      ~     (p=1.000 n=5+5)
kv95-b16/cores=36/nodes=1/splits=0        11.5 ± 0%     5.5 ± 0%   -52.17%  (p=0.000 n=5+4)
kv95-b16/cores=36/nodes=1/splits=100      2.42 ±20%    2.42 ±45%      ~     (p=0.579 n=5+5)
kv95-b128/cores=4/nodes=1/splits=0        6.60 ± 0%    6.00 ± 0%    -9.09%  (p=0.008 n=5+5)
kv95-b128/cores=4/nodes=1/splits=100      21.4 ± 3%    21.0 ± 0%      ~     (p=0.444 n=5+5)
kv95-b128/cores=16/nodes=1/splits=0       30.4 ± 0%     9.4 ± 0%   -69.08%  (p=0.008 n=5+5)
kv95-b128/cores=16/nodes=1/splits=100     38.2 ±76%    21.2 ± 4%   -44.31%  (p=0.063 n=5+4)
kv95-b128/cores=36/nodes=1/splits=0       88.1 ± 0%    16.8 ± 0%   -80.93%  (p=0.000 n=5+4)
kv95-b128/cores=36/nodes=1/splits=100     56.6 ±85%    29.6 ±15%      ~     (p=0.873 n=5+4)
kv95-b1024/cores=4/nodes=1/splits=0       52.4 ± 0%    44.0 ± 0%   -16.03%  (p=0.029 n=4+4)
kv95-b1024/cores=4/nodes=1/splits=100      132 ± 2%     143 ± 0%    +8.29%  (p=0.016 n=5+4)
kv95-b1024/cores=16/nodes=1/splits=0       325 ± 3%      80 ± 0%   -75.51%  (p=0.008 n=5+5)
kv95-b1024/cores=16/nodes=1/splits=100     151 ± 0%     151 ± 0%      ~     (all equal)
kv95-b1024/cores=36/nodes=1/splits=0       973 ± 0%     180 ± 3%   -81.55%  (p=0.008 n=5+5)
kv95-b1024/cores=36/nodes=1/splits=100     168 ± 0%     168 ± 0%      ~     (all equal)

name                                    old p99(ms)  new p99(ms)  delta
kv0-b16/cores=4/nodes=1/splits=0          8.40 ± 0%   10.30 ± 3%   +22.62%  (p=0.016 n=4+5)
kv0-b16/cores=4/nodes=1/splits=100        29.4 ± 0%    27.3 ± 0%    -7.14%  (p=0.000 n=5+4)
kv0-b16/cores=16/nodes=1/splits=0         16.3 ± 0%    15.5 ± 2%    -4.91%  (p=0.008 n=5+5)
kv0-b16/cores=16/nodes=1/splits=100       31.5 ± 0%    29.4 ± 0%    -6.67%  (p=0.000 n=5+4)
kv0-b16/cores=36/nodes=1/splits=0         37.7 ± 0%    28.7 ± 2%   -23.77%  (p=0.008 n=5+5)
kv0-b16/cores=36/nodes=1/splits=100       62.1 ± 2%    68.4 ±10%   +10.15%  (p=0.008 n=5+5)
kv0-b128/cores=4/nodes=1/splits=0         37.7 ± 0%    39.4 ± 6%    +4.46%  (p=0.167 n=5+5)
kv0-b128/cores=4/nodes=1/splits=100        143 ± 0%     151 ± 0%    +5.89%  (p=0.016 n=4+5)
kv0-b128/cores=16/nodes=1/splits=0        79.7 ± 0%    55.8 ± 2%   -30.04%  (p=0.008 n=5+5)
kv0-b128/cores=16/nodes=1/splits=100       198 ± 3%     188 ± 3%    -5.09%  (p=0.048 n=5+5)
kv0-b128/cores=36/nodes=1/splits=0         184 ± 0%     126 ± 3%   -31.82%  (p=0.008 n=5+5)
kv0-b128/cores=36/nodes=1/splits=100       319 ± 0%     336 ± 0%    +5.24%  (p=0.008 n=5+5)
kv0-b1024/cores=4/nodes=1/splits=0         322 ± 6%     253 ± 4%   -21.35%  (p=0.008 n=5+5)
kv0-b1024/cores=4/nodes=1/splits=100       470 ± 0%     772 ± 4%   +64.28%  (p=0.016 n=4+5)
kv0-b1024/cores=16/nodes=1/splits=0      1.41k ± 0%   0.56k ±11%   -60.00%  (p=0.000 n=4+5)
kv0-b1024/cores=16/nodes=1/splits=100      530 ± 2%     772 ± 0%   +45.57%  (p=0.008 n=5+5)
kv0-b1024/cores=36/nodes=1/splits=0      4.05k ± 7%   1.17k ± 3%   -71.19%  (p=0.008 n=5+5)
kv0-b1024/cores=36/nodes=1/splits=100      792 ±14%    1020 ± 2%   +28.81%  (p=0.008 n=5+5)
kv95-b16/cores=4/nodes=1/splits=0         3.90 ± 0%    3.22 ± 4%   -17.44%  (p=0.008 n=5+5)
kv95-b16/cores=4/nodes=1/splits=100       21.0 ± 0%    19.9 ± 0%    -5.24%  (p=0.079 n=4+5)
kv95-b16/cores=16/nodes=1/splits=0        15.2 ± 0%     7.1 ± 0%   -53.29%  (p=0.079 n=4+5)
kv95-b16/cores=16/nodes=1/splits=100      38.5 ± 3%    37.7 ± 0%      ~     (p=0.333 n=5+4)
kv95-b16/cores=36/nodes=1/splits=0         128 ± 2%      52 ± 0%   -59.16%  (p=0.000 n=5+4)
kv95-b16/cores=36/nodes=1/splits=100      41.1 ±13%    39.2 ±33%      ~     (p=0.984 n=5+5)
kv95-b128/cores=4/nodes=1/splits=0        17.8 ± 0%    14.7 ± 0%   -17.42%  (p=0.079 n=4+5)
kv95-b128/cores=4/nodes=1/splits=100       107 ± 2%     106 ± 5%      ~     (p=0.683 n=5+5)
kv95-b128/cores=16/nodes=1/splits=0       75.5 ± 0%    23.1 ± 0%   -69.40%  (p=0.008 n=5+5)
kv95-b128/cores=16/nodes=1/splits=100      107 ±34%     120 ± 2%      ~     (p=1.000 n=5+4)
kv95-b128/cores=36/nodes=1/splits=0        253 ± 4%      71 ± 0%   -71.86%  (p=0.016 n=5+4)
kv95-b128/cores=36/nodes=1/splits=100      166 ±19%     164 ±74%      ~     (p=0.310 n=5+5)
kv95-b1024/cores=4/nodes=1/splits=0        146 ± 3%     101 ± 0%   -31.01%  (p=0.000 n=5+4)
kv95-b1024/cores=4/nodes=1/splits=100      348 ± 4%     366 ± 6%      ~     (p=0.317 n=4+5)
kv95-b1024/cores=16/nodes=1/splits=0       624 ± 3%     221 ± 2%   -64.52%  (p=0.008 n=5+5)
kv95-b1024/cores=16/nodes=1/splits=100     325 ± 3%     319 ± 0%      ~     (p=0.444 n=5+5)
kv95-b1024/cores=36/nodes=1/splits=0     1.56k ± 5%   0.41k ± 2%   -73.71%  (p=0.008 n=5+5)
kv95-b1024/cores=36/nodes=1/splits=100     336 ± 0%     336 ± 0%      ~     (all equal)
```

Release note (performance improvement): Replace Replica latching mechanism
with new optimized data structure that improves throughput, especially
under heavy contention.


nvanbenschoten committed Dec 8, 2018

1 parent f725fe4 commit eb38f20

pkg/server/serverpb/admin.pb.go (generated file; diff not rendered)

pkg/server/serverpb/status.pb.go (large diff not rendered)

pkg/server/serverpb/status.pb.gw.go (generated file; diff not rendered)

pkg/server/serverpb/status.proto

            
```diff
@@ -164,13 +164,6 @@ message PrettySpan {
   string end_key = 2;
 }
 
-message CommandQueueMetrics {
-  int64 write_commands = 1;
-  int64 read_commands = 2;
-  int64 max_overlaps_seen = 3;
-  int32 tree_size = 4;
-}
-
 message RangeInfo {
   PrettySpan span = 1 [ (gogoproto.nullable) = false ];
   RaftState raft_state = 2 [ (gogoproto.nullable) = false ];
@@ -189,8 +182,8 @@ message RangeInfo {
   repeated roachpb.Lease lease_history = 8 [ (gogoproto.nullable) = false ];
   RangeProblems problems = 9 [ (gogoproto.nullable) = false ];
   RangeStatistics stats = 10 [ (gogoproto.nullable) = false ];
-  CommandQueueMetrics cmd_q_local = 11 [ (gogoproto.nullable) = false ];
-  CommandQueueMetrics cmd_q_global = 12 [ (gogoproto.nullable) = false ];
+  storage.storagepb.LatchManagerInfo latches_local = 11 [ (gogoproto.nullable) = false ];
+  storage.storagepb.LatchManagerInfo latches_global = 12 [ (gogoproto.nullable) = false ];
   storage.LeaseStatus lease_status = 13 [ (gogoproto.nullable) = false ];
   bool quiescent = 14;
   bool ticking = 15;
@@ -611,13 +604,6 @@ message RangeResponse {
   reserved 4; // Previously used.
 }
 
-message CommandQueueRequest { int64 range_id = 1; }
-
-message CommandQueueResponse {
-  storage.storagepb.CommandQueuesSnapshot snapshot = 1
-      [ (gogoproto.nullable) = false ];
-}
-
 // DiagnosticsRequest requests a diagnostics report.
 message DiagnosticsRequest {
   // node_id is a string so that "local" can be used to specify that no
@@ -805,11 +791,6 @@ service Status {
       get : "/_status/range/{range_id}"
     };
   }
-  rpc CommandQueue(CommandQueueRequest) returns (CommandQueueResponse) {
-    option (google.api.http) = {
-      get : "/_status/range/{range_id}/cmdqueue"
-    };
-  }
   rpc Diagnostics(DiagnosticsRequest)
       returns (cockroach.server.diagnosticspb.DiagnosticReport) {
     option (google.api.http) = {
```

pkg/server/status.go

            
```diff
@@ -1214,11 +1214,11 @@ func (s *statusServer) Ranges(
 				QuiescentEqualsTicking: raftStatus != nil && metrics.Quiescent == metrics.Ticking,
 				RaftLogTooLarge:        metrics.RaftLogTooLarge,
 			},
-			CmdQLocal:   serverpb.CommandQueueMetrics(metrics.CmdQMetricsLocal),
-			CmdQGlobal:  serverpb.CommandQueueMetrics(metrics.CmdQMetricsGlobal),
-			LeaseStatus: metrics.LeaseStatus,
-			Quiescent:   metrics.Quiescent,
-			Ticking:     metrics.Ticking,
+			LatchesLocal:  metrics.LatchInfoLocal,
+			LatchesGlobal: metrics.LatchInfoGlobal,
+			LeaseStatus:   metrics.LeaseStatus,
+			Quiescent:     metrics.Quiescent,
+			Ticking:       metrics.Ticking,
 		}
 	}
 
@@ -1321,26 +1321,6 @@ func (s *statusServer) Range(
 	return response, nil
 }
 
-// CommandQueue returns a snapshot of the command queue state for the
-// specified range.
-func (s *statusServer) CommandQueue(
-	ctx context.Context, req *serverpb.CommandQueueRequest,
-) (*serverpb.CommandQueueResponse, error) {
-	rangeID := roachpb.RangeID(req.RangeId)
-	replica, err := s.stores.GetReplicaForRangeID(rangeID)
-	if err != nil {
-		return nil, err
-	}
-	if replica == nil {
-		return nil, roachpb.NewRangeNotFoundError(rangeID, 0)
-	}
-	return &serverpb.CommandQueueResponse{
-		Snapshot: replica.GetCommandQueueSnapshot(),
-	}, nil
-}
-
 // ListLocalSessions returns a list of SQL sessions on this node.
 func (s *statusServer) ListLocalSessions(
 	ctx context.Context, req *serverpb.ListSessionsRequest,
```

pkg/server/status/health_check.go

            
```diff
@@ -50,7 +50,7 @@ var trackedMetrics = map[string]threshold{
 	"ranges.unavailable":          gaugeZero,
 	"ranges.underreplicated":      gaugeZero,
 	"requests.backpressure.split": gaugeZero,
-	"requests.slow.commandqueue":  gaugeZero,
+	"requests.slow.latch":         gaugeZero,
 	"requests.slow.lease":         gaugeZero,
 	"requests.slow.raft":          gaugeZero,
 	"sys.goroutines":              {gauge: true, min: 5000},
```

pkg/storage/client_merge_test.go

            
```diff
@@ -1201,18 +1201,17 @@ func TestStoreRangeMergeRHSLeaseExpiration(t *testing.T) {
 	}
 
 	// Install a hook to observe when a get request for a special key,
-	// rhsSentinel, exits the command queue.
+	// rhsSentinel, acquires latches and begins evaluating.
 	const getConcurrency = 10
 	rhsSentinel := roachpb.Key("rhs-sentinel")
-	getExitedCommandQueue := make(chan struct{}, getConcurrency)
-	storeCfg.TestingKnobs.OnCommandQueueAction = func(ba *roachpb.BatchRequest, action storagebase.CommandQueueAction) {
-		if action == storagebase.CommandQueueBeginExecuting {
-			for _, r := range ba.Requests {
-				if get := r.GetGet(); get != nil && get.RequestHeader.Key.Equal(rhsSentinel) {
-					getExitedCommandQueue <- struct{}{}
-				}
+	getAcquiredLatch := make(chan struct{}, getConcurrency)
+	storeCfg.TestingKnobs.TestingLatchFilter = func(ba roachpb.BatchRequest) *roachpb.Error {
+		for _, r := range ba.Requests {
+			if get := r.GetGet(); get != nil && get.RequestHeader.Key.Equal(rhsSentinel) {
+				getAcquiredLatch <- struct{}{}
 			}
 		}
+		return nil
 	}
 
 	mtc := &multiTestContext{storeConfig: &storeCfg}
@@ -1271,7 +1270,7 @@ func TestStoreRangeMergeRHSLeaseExpiration(t *testing.T) {
 	// Note that the first request would never hit this race on its own. Nor would
 	// any request that arrived early enough to see an outdated lease in
 	// Replica.mu.state.Lease. All of these requests joined the in-progress lease
-	// acquisition and blocked until the lease command exited the command queue,
+	// acquisition and blocked until the lease command acquires its latches,
 	// at which point the mergeComplete channel was updated. To hit the race, the
 	// request needed to arrive exactly between the update to
 	// Replica.mu.state.Lease and the update to Replica.mu.mergeComplete.
@@ -1300,12 +1299,12 @@ func TestStoreRangeMergeRHSLeaseExpiration(t *testing.T) {
 		time.Sleep(time.Millisecond)
 	}
 
-	// Wait for the get requests to fall out of the command queue, which is as far
-	// as they can get while the merge is in progress. Then wait a little bit
-	// longer. This tests that the requests really do get stuck waiting for the
-	// merge to complete without depending too heavily on implementation details.
+	// Wait for the get requests to acquire latches, which is as far as they can
+	// get while the merge is in progress. Then wait a little bit longer. This
+	// tests that the requests really do get stuck waiting for the merge to
+	// complete without depending too heavily on implementation details.
 	for i := 0; i < getConcurrency; i++ {
-		<-getExitedCommandQueue
+		<-getAcquiredLatch
 	}
 	time.Sleep(50 * time.Millisecond)
@@ -1361,7 +1360,7 @@ func TestStoreRangeMergeConcurrentRequests(t *testing.T) {
 			//
 			// This scenario previously caused deadlock. The merge was not able to
 			// complete until the Subsume request completed, but the Subsume request
-			// was stuck in the command queue until the Get request finished, which
+			// was unable to acquire latches until the Get request finished, which
 			// was itself waiting for the merge to complete. Whoops!
 			mtc.advanceClock(ctx)
 		}
```
