Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: abort span access is expensive (2% cpu on oltp_read_write) #122719

Open
nvanbenschoten opened this issue Apr 19, 2024 · 5 comments
Open

kv: abort span access is expensive (2% cpu on oltp_read_write) #122719

nvanbenschoten opened this issue Apr 19, 2024 · 5 comments
Assignees
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-performance Perf of queries or internals. Solution not expected to change functional behavior. o-perf-efficiency Related to performance efficiency T-kv KV Team

Comments

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Apr 19, 2024

The abort span (pkg/kv/kvserver/abortspan) is a mechanism that sets markers for aborted transactions to provide protection against an aborted but active transaction not reading values it wrote due to those intents having been removed.

The "span" is a slice of the range-id-local keyspace which is read on each BatchRequest that is part of a read-write transaction. The logic for this is here:

// Check whether this transaction has been aborted, if applicable. This
// applies to reads and writes once a transaction that has begun to
// acquire locks (see #2231 for more about why we check for aborted
// transactions on reads). Note that 1PC transactions have had their
// transaction field cleared by this point so we do not execute this
// check in that case.
if baHeader.Txn.IsLocking() {
// We don't check the abort span for a couple of special requests:
// - if the request is asking to abort the transaction, then don't check the
// AbortSpan; we don't want the request to be rejected if the transaction
// has already been aborted.
// - heartbeats don't check the abort span. If the txn is aborted, they'll
// return an aborted proto in their otherwise successful response.
// TODO(nvanbenschoten): Let's remove heartbeats from this allowlist when
// we rationalize the TODO in txnHeartbeater.heartbeat.
if !ba.IsSingleAbortTxnRequest() && !ba.IsSingleHeartbeatTxnRequest() {
if pErr := checkIfTxnAborted(ctx, rec, readWriter, *baHeader.Txn); pErr != nil {
return nil, result.Result{}, pErr
}
}

This is an additional LSM read per BatchRequest, which can be seen prominently in CPU profiles under checkIfTxnAborted, accounting for 3.59% of CPU time on write-heavy workloads:

Screenshot 2024-04-19 at 5 15 05 PM

profile_abort_span.pb.gz

Some basic experimentation with the sysbench workload (sysbench/oltp_write_only/nodes=7/cpu=16/conc=128) demonstrates about a 2% increase in throughput by disabling this abort span read (i.e. not calling checkIfTxnAborted). This testing reveals the cost of the mechanism. Optimizations (up and including disabling it) could provide up to this much benefit to throughput.

Given how significant this cost is and how much of an edge case the scenarios that the abort span is protecting against are, we should reevaluate whether there's something better that we can do here. Are there simple optimizations that could make this mechanism perform better? Could we make it a little weaker to avoid most of the cost? These questions are worthwhile to explore.

At a minimum, we should expose an option to disable these abort span checks.

Jira issue: CRDB-38032

Epic CRDB-42584

@nvanbenschoten nvanbenschoten added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-kv-transactions Relating to MVCC and the transactional model. T-kv KV Team labels Apr 19, 2024
@github-project-automation github-project-automation bot moved this to Incoming in KV Aug 28, 2024
@tbg tbg added the o-perf-efficiency Related to performance efficiency label Oct 29, 2024
@tbg tbg changed the title kv: abort span access is expensive kv: abort span access is expensive (2% cpu) Dec 2, 2024
@tbg tbg changed the title kv: abort span access is expensive (2% cpu) kv: abort span access is expensive (2% cpu on oltp_read_write) Dec 2, 2024
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Dec 2, 2024
Informs cockroachdb#122719.

This commit updates DeclareKeysForBatch to only declare the abort span
key when the transaction has acquired locks and will need to check the
abort span. We were previously declaring the abort span key for all
batches, even if we did not intend to check the abort span.

This is a short-term patch until we get around to reworking abort span
access more thoroughly (see cockroachdb#122719).

```
name                                           old time/op    new time/op    delta
Sysbench/KV/1node_local/oltp_read_only-10         334µs ± 4%     325µs ± 4%  -2.63%  (p=0.035 n=10+9)
Sysbench/KV/1node_local/oltp_read_write-10        863µs ± 9%     860µs ±15%    ~     (p=0.661 n=10+9)
Sysbench/KV/1node_local/oltp_point_select-10     15.6µs ± 4%    15.9µs ±12%    ~     (p=0.529 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10       1.88ms ±26%    1.80ms ± 5%    ~     (p=1.000 n=10+9)
Sysbench/SQL/1node_local/oltp_read_write-10      4.22ms ± 5%    4.18ms ±11%    ~     (p=0.400 n=9+10)
Sysbench/SQL/1node_local/oltp_point_select-10     114µs ± 5%     120µs ±21%    ~     (p=0.796 n=10+10)

name                                           old alloc/op   new alloc/op   delta
Sysbench/KV/1node_local/oltp_read_write-10        487kB ± 0%     484kB ± 1%  -0.55%  (p=0.011 n=8+9)
Sysbench/KV/1node_local/oltp_read_only-10         260kB ± 0%     259kB ± 1%  -0.50%  (p=0.011 n=8+10)
Sysbench/SQL/1node_local/oltp_point_select-10    27.8kB ± 0%    27.6kB ± 0%  -0.47%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10        878kB ± 0%     876kB ± 0%  -0.25%  (p=0.003 n=10+10)
Sysbench/SQL/1node_local/oltp_read_write-10      1.25MB ± 1%    1.25MB ± 1%    ~     (p=0.146 n=10+8)
Sysbench/KV/1node_local/oltp_point_select-10     4.68kB ± 1%    4.68kB ± 2%    ~     (p=0.474 n=9+9)

name                                           old allocs/op  new allocs/op  delta
Sysbench/KV/1node_local/oltp_read_only-10           522 ± 0%       507 ± 0%  -2.72%  (p=0.000 n=10+10)
Sysbench/KV/1node_local/oltp_read_write-10        1.51k ± 0%     1.50k ± 0%  -0.92%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_point_select-10       238 ± 0%       237 ± 0%  -0.42%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10        4.76k ± 0%     4.74k ± 0%  -0.39%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_write-10       7.55k ± 0%     7.53k ± 0%  -0.24%  (p=0.003 n=10+10)
Sysbench/KV/1node_local/oltp_point_select-10       29.0 ± 0%      29.0 ± 0%    ~     (all equal)
```

Release note: None
craig bot pushed a commit that referenced this issue Dec 3, 2024
136523: kv: only declare abort span key when txn has locks r=nvanbenschoten a=nvanbenschoten

Informs #122719.

This commit updates `DeclareKeysForBatch` to only declare the abort span key when the transaction has acquired locks and will need to check the abort span. We were previously declaring the abort span key for all batches, even if we did not intend to check the abort span.

This is a short-term patch until we get around to reworking abort span access more thoroughly (see #122719).

```
name                                           old time/op    new time/op    delta
Sysbench/KV/1node_local/oltp_read_only-10         334µs ± 4%     325µs ± 4%  -2.63%  (p=0.035 n=10+9)
Sysbench/KV/1node_local/oltp_read_write-10        863µs ± 9%     860µs ±15%    ~     (p=0.661 n=10+9)
Sysbench/KV/1node_local/oltp_point_select-10     15.6µs ± 4%    15.9µs ±12%    ~     (p=0.529 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10       1.88ms ±26%    1.80ms ± 5%    ~     (p=1.000 n=10+9)
Sysbench/SQL/1node_local/oltp_read_write-10      4.22ms ± 5%    4.18ms ±11%    ~     (p=0.400 n=9+10)
Sysbench/SQL/1node_local/oltp_point_select-10     114µs ± 5%     120µs ±21%    ~     (p=0.796 n=10+10)

name                                           old alloc/op   new alloc/op   delta
Sysbench/KV/1node_local/oltp_read_write-10        487kB ± 0%     484kB ± 1%  -0.55%  (p=0.011 n=8+9)
Sysbench/KV/1node_local/oltp_read_only-10         260kB ± 0%     259kB ± 1%  -0.50%  (p=0.011 n=8+10)
Sysbench/SQL/1node_local/oltp_point_select-10    27.8kB ± 0%    27.6kB ± 0%  -0.47%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10        878kB ± 0%     876kB ± 0%  -0.25%  (p=0.003 n=10+10)
Sysbench/SQL/1node_local/oltp_read_write-10      1.25MB ± 1%    1.25MB ± 1%    ~     (p=0.146 n=10+8)
Sysbench/KV/1node_local/oltp_point_select-10     4.68kB ± 1%    4.68kB ± 2%    ~     (p=0.474 n=9+9)

name                                           old allocs/op  new allocs/op  delta
Sysbench/KV/1node_local/oltp_read_only-10           522 ± 0%       507 ± 0%  -2.72%  (p=0.000 n=10+10)
Sysbench/KV/1node_local/oltp_read_write-10        1.51k ± 0%     1.50k ± 0%  -0.92%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_point_select-10       238 ± 0%       237 ± 0%  -0.42%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10        4.76k ± 0%     4.74k ± 0%  -0.39%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_write-10       7.55k ± 0%     7.53k ± 0%  -0.24%  (p=0.003 n=10+10)
Sysbench/KV/1node_local/oltp_point_select-10       29.0 ± 0%      29.0 ± 0%    ~     (all equal)
```

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
craig bot pushed a commit that referenced this issue Dec 3, 2024
136523: kv: only declare abort span key when txn has locks r=nvanbenschoten a=nvanbenschoten

Informs #122719.

This commit updates `DeclareKeysForBatch` to only declare the abort span key when the transaction has acquired locks and will need to check the abort span. We were previously declaring the abort span key for all batches, even if we did not intend to check the abort span.

This is a short-term patch until we get around to reworking abort span access more thoroughly (see #122719).

```
name                                           old time/op    new time/op    delta
Sysbench/KV/1node_local/oltp_read_only-10         334µs ± 4%     325µs ± 4%  -2.63%  (p=0.035 n=10+9)
Sysbench/KV/1node_local/oltp_read_write-10        863µs ± 9%     860µs ±15%    ~     (p=0.661 n=10+9)
Sysbench/KV/1node_local/oltp_point_select-10     15.6µs ± 4%    15.9µs ±12%    ~     (p=0.529 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10       1.88ms ±26%    1.80ms ± 5%    ~     (p=1.000 n=10+9)
Sysbench/SQL/1node_local/oltp_read_write-10      4.22ms ± 5%    4.18ms ±11%    ~     (p=0.400 n=9+10)
Sysbench/SQL/1node_local/oltp_point_select-10     114µs ± 5%     120µs ±21%    ~     (p=0.796 n=10+10)

name                                           old alloc/op   new alloc/op   delta
Sysbench/KV/1node_local/oltp_read_write-10        487kB ± 0%     484kB ± 1%  -0.55%  (p=0.011 n=8+9)
Sysbench/KV/1node_local/oltp_read_only-10         260kB ± 0%     259kB ± 1%  -0.50%  (p=0.011 n=8+10)
Sysbench/SQL/1node_local/oltp_point_select-10    27.8kB ± 0%    27.6kB ± 0%  -0.47%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10        878kB ± 0%     876kB ± 0%  -0.25%  (p=0.003 n=10+10)
Sysbench/SQL/1node_local/oltp_read_write-10      1.25MB ± 1%    1.25MB ± 1%    ~     (p=0.146 n=10+8)
Sysbench/KV/1node_local/oltp_point_select-10     4.68kB ± 1%    4.68kB ± 2%    ~     (p=0.474 n=9+9)

name                                           old allocs/op  new allocs/op  delta
Sysbench/KV/1node_local/oltp_read_only-10           522 ± 0%       507 ± 0%  -2.72%  (p=0.000 n=10+10)
Sysbench/KV/1node_local/oltp_read_write-10        1.51k ± 0%     1.50k ± 0%  -0.92%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_point_select-10       238 ± 0%       237 ± 0%  -0.42%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10        4.76k ± 0%     4.74k ± 0%  -0.39%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_write-10       7.55k ± 0%     7.53k ± 0%  -0.24%  (p=0.003 n=10+10)
Sysbench/KV/1node_local/oltp_point_select-10       29.0 ± 0%      29.0 ± 0%    ~     (all equal)
```

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
@tbg
Copy link
Member

tbg commented Dec 19, 2024

@nvanbenschoten and I saw this while looking at an (admittedly, overloaded) 3x8vcpu tpcc-nowait wh=1000 cluster:

image

Notably, this was running with #136523 as well as #122862, both of which in principle reduce abort span access.

tbg added a commit to tbg/cockroach that referenced this issue Jan 24, 2025
By default it's on, meaning the current behavior is kept.
When off, CRDB is expected to be a little faster (cockroachdb#122719)
but also transactions may miss their own writes in case
they got aborted without noticing it.

Epic: CRDB-42584
Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Jan 24, 2025
By default it's on, meaning the current behavior is kept.
When off, CRDB is expected to be a little faster (cockroachdb#122719)
but also transactions may miss their own writes in case
they got aborted without noticing it.

Epic: CRDB-42584
Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Jan 24, 2025
By default it's on, meaning the current behavior is kept.
When off, CRDB is expected to be a little faster (cockroachdb#122719)
but also transactions may miss their own writes in case
they got aborted without noticing it.

Epic: CRDB-42584
Release note: None
@tbg
Copy link
Member

tbg commented Jan 24, 2025

Confirming the ~2% number on oltp_read_write on reference workload cluster:

Image

I imagine the amount of work here would go up as the LSM fills up, as more levels would have to be checked.

In the microbenchmarks, where the LSM is in-memory and smaller, the abort span doesn't register on oltp_read_write, but it does on oltp_write_only to the tune of 2% in latency (always good to remember that this is single-threaded, i.e. not saturating the machine). Because we don't have to check the abort span unless the txn has already written, most of the statements (all but the oltp_write_ones) already don't perform the abortspan check. This explains why oltp_write_only does show a difference: in that benchmark, there are no free statements (except perhaps the first). I still find it mildly surprising that we're getting 2% out of this, given that writes are also more expensive.

old:  04b808b kvserver: add cluster setting to disable abort spa
new:  0cdbdae [dnm] kvserver: disable abort span by default
args: benchdiff "./pkg/sql/tests" "-b" "-r" "Sysbench/SQL/3node/oltp_(read_write|write_only)" "-d" "1000x" "-c" "40"

running benchmarks:
 name                                   old time/op    new time/op    delta
Sysbench/SQL/3node/oltp_read_write-24    12.5ms ± 2%    12.4ms ± 2%    ~     (p=0.113 n=9+9)
Sysbench/SQL/3node/oltp_write_only-24    4.39ms ± 1%    4.30ms ± 2%  -2.11%  (p=0.000 n=8+9)

name                                   old alloc/op   new alloc/op   delta
Sysbench/SQL/3node/oltp_read_write-24    2.18MB ± 7%    2.17MB ± 3%    ~     (p=1.000 n=9+9)
Sysbench/SQL/3node/oltp_write_only-24     935kB ± 3%     939kB ± 3%    ~     (p=1.000 n=9+8)

name                                   old allocs/op  new allocs/op  delta
Sysbench/SQL/3node/oltp_read_write-24     10.6k ± 2%     10.5k ± 2%    ~     (p=0.340 n=9+9)
Sysbench/SQL/3node/oltp_write_only-24     5.91k ± 3%     5.96k ± 4%    ~     (p=1.000 n=9+9)

I also ran an extremely unscientific benchmark using a reference cluster running oltp_read_write on top of #139574, which has a cluster setting to disable the abort span check. I ran sysbench with a ten minute checkpoint interval and flipped the cluster setting right on the checkpoint boundary. Unfortunately, even though CPU profiles confirmed that 2% of CPU were now freed up, I was unable to see a similarly sized change in qps over the affected 10 minute intervals. This doesn't mean the effect isn't there - I fully expect it to be - but means I don't get away with this unscientific experiment :-)

@tbg
Copy link
Member

tbg commented Jan 28, 2025

Rigorous experiment shows ~2.7% throughput improvement on oltp_read_write.

# on commits from https://github.com/cockroachdb/cockroach/pull/139574 that enable and disable abort span check, respectively
./pkg/cmd/roachtest/roachstress.sh -c 20 sysbench-settings/oltp_read_write/nodes=3/cpu=8/conc=64 2>&1 | tee sysbench-settings-$(git rev-parse HEAD).log
 ./collect.sh artifacts/a8c950e191f > abortspanoff.txt
./collect.sh artifacts/04b808b3024 > abortspanon.txt
$ benchstat abortspanon.txt abortspanoff.txt
           │ abortspanon.txt │          abortspanoff.txt          │
           │   queries/sec   │ queries/sec  vs base               │
Sysbench/X       22.65k ± 3%   23.26k ± 2%  +2.70% (p=0.030 n=20)

           │ abortspanon.txt │          abortspanoff.txt          │
           │    txns/sec     │  txns/sec    vs base               │
Sysbench/X       1.132k ± 3%   1.163k ± 2%  +2.70% (p=0.030 n=20)

where collect.sh:

#!/usr/bin/env bash
set -euo pipefail

# Usage: $0 <directory-with-sysbench-outputs>
for f in $(rg -l 'queries:' "${1}")
do
        queries=$(grep 'queries:' "${f}" | awk '{print $3}' | sed 's/(//g')
        txns=$(grep 'transactions:' "${f}" | awk '{print $3}' | sed 's/(//g')
        echo -e "BenchmarkSysbench/X\t1\t$queries queries/sec\t$txns txns/sec"
done

@tbg
Copy link
Member

tbg commented Jan 30, 2025

@arulajmani pointed out that when read-write transactions use buffered writes (#72614), much of the overhead of the abort span on reads should disappear. This is because the envisioned design includes overlaying the keys in the transaction write set onto the results returned from KV.

For example, in the following transaction

BEGIN;
INSERT INTO a VALUES(100, 'hi');
SELECT * FROM a WHERE k BETWEEN 50 AND 150;
COMMIT;

by the time of the SELECT statement, TxnCoordSender would have the key for 100 in its write buffer, and would make it visible even if the transaction had in the meantime been aborted. Correspondingly, buffered transactions could opt out of abort span checks entirely as long as they can prove that they have a complete buffer (i.e. haven't flushed it down to KV yet due to size constraints).

@tbg
Copy link
Member

tbg commented Feb 6, 2025

I filed #140593, which elides the abort span check in typical OLTP workloads.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-performance Perf of queries or internals. Solution not expected to change functional behavior. o-perf-efficiency Related to performance efficiency T-kv KV Team
Projects
No open projects
Status: Incoming
Development

No branches or pull requests

2 participants