kv: abort span access is expensive (2% cpu on oltp_read_write) #122719
Comments
Informs cockroachdb#122719.

This commit updates DeclareKeysForBatch to only declare the abort span key when the transaction has acquired locks and will need to check the abort span. We were previously declaring the abort span key for all batches, even if we did not intend to check the abort span.

This is a short-term patch until we get around to reworking abort span access more thoroughly (see cockroachdb#122719).

```
name                                           old time/op    new time/op    delta
Sysbench/KV/1node_local/oltp_read_only-10         334µs ± 4%     325µs ± 4%  -2.63%  (p=0.035 n=10+9)
Sysbench/KV/1node_local/oltp_read_write-10        863µs ± 9%     860µs ±15%    ~     (p=0.661 n=10+9)
Sysbench/KV/1node_local/oltp_point_select-10     15.6µs ± 4%    15.9µs ±12%    ~     (p=0.529 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10       1.88ms ±26%    1.80ms ± 5%    ~     (p=1.000 n=10+9)
Sysbench/SQL/1node_local/oltp_read_write-10      4.22ms ± 5%    4.18ms ±11%    ~     (p=0.400 n=9+10)
Sysbench/SQL/1node_local/oltp_point_select-10     114µs ± 5%     120µs ±21%    ~     (p=0.796 n=10+10)

name                                           old alloc/op   new alloc/op   delta
Sysbench/KV/1node_local/oltp_read_write-10        487kB ± 0%     484kB ± 1%  -0.55%  (p=0.011 n=8+9)
Sysbench/KV/1node_local/oltp_read_only-10         260kB ± 0%     259kB ± 1%  -0.50%  (p=0.011 n=8+10)
Sysbench/SQL/1node_local/oltp_point_select-10    27.8kB ± 0%    27.6kB ± 0%  -0.47%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10        878kB ± 0%     876kB ± 0%  -0.25%  (p=0.003 n=10+10)
Sysbench/SQL/1node_local/oltp_read_write-10      1.25MB ± 1%    1.25MB ± 1%    ~     (p=0.146 n=10+8)
Sysbench/KV/1node_local/oltp_point_select-10     4.68kB ± 1%    4.68kB ± 2%    ~     (p=0.474 n=9+9)

name                                           old allocs/op  new allocs/op  delta
Sysbench/KV/1node_local/oltp_read_only-10           522 ± 0%       507 ± 0%  -2.72%  (p=0.000 n=10+10)
Sysbench/KV/1node_local/oltp_read_write-10        1.51k ± 0%     1.50k ± 0%  -0.92%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_point_select-10       238 ± 0%       237 ± 0%  -0.42%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10        4.76k ± 0%     4.74k ± 0%  -0.39%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_write-10       7.55k ± 0%     7.53k ± 0%  -0.24%  (p=0.003 n=10+10)
Sysbench/KV/1node_local/oltp_point_select-10       29.0 ± 0%      29.0 ± 0%    ~     (all equal)
```

Release note: None
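The idea in this commit message (declare the abort span key only for transactions that hold locks) can be illustrated with a minimal sketch. This is not the CockroachDB implementation: the `txn`, `batch`, `abortSpanKey`, and `declareKeysForBatch` names below, the `hasLocks` field, and the key string format are all hypothetical stand-ins chosen for illustration.

```go
package main

import "fmt"

// txn is a hypothetical, simplified transaction descriptor.
type txn struct {
	id       string
	hasLocks bool // assumed flag: true once the transaction has acquired locks
}

// batch is a hypothetical stand-in for a batch of requests.
type batch struct {
	txn  *txn     // nil for non-transactional batches
	keys []string // keys touched by the requests themselves
}

// abortSpanKey builds an illustrative (made-up) key for a txn's abort span entry.
func abortSpanKey(rangeID int, txnID string) string {
	return fmt.Sprintf("range/%d/abortspan/%s", rangeID, txnID)
}

// declareKeysForBatch returns the keys the batch declares up front.
// The abort span key is added only when the transaction holds locks,
// since only then would the abort span need to be checked.
func declareKeysForBatch(rangeID int, b batch) []string {
	declared := append([]string(nil), b.keys...)
	if b.txn != nil && b.txn.hasLocks {
		declared = append(declared, abortSpanKey(rangeID, b.txn.id))
	}
	return declared
}

func main() {
	readOnly := batch{txn: &txn{id: "t1"}, keys: []string{"a"}}
	readWrite := batch{txn: &txn{id: "t2", hasLocks: true}, keys: []string{"a"}}
	fmt.Println(declareKeysForBatch(1, readOnly))
	fmt.Println(declareKeysForBatch(1, readWrite))
}
```

In this sketch, a batch from a transaction without locks declares one key fewer, which is consistent in spirit with the reduced allocation counts on `oltp_read_only` reported above.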
136523: kv: only declare abort span key when txn has locks r=nvanbenschoten a=nvanbenschoten

Informs #122719.

This commit updates `DeclareKeysForBatch` to only declare the abort span key when the transaction has acquired locks and will need to check the abort span. We were previously declaring the abort span key for all batches, even if we did not intend to check the abort span.

This is a short-term patch until we get around to reworking abort span access more thoroughly (see #122719).

```
name                                           old time/op    new time/op    delta
Sysbench/KV/1node_local/oltp_read_only-10         334µs ± 4%     325µs ± 4%  -2.63%  (p=0.035 n=10+9)
Sysbench/KV/1node_local/oltp_read_write-10        863µs ± 9%     860µs ±15%    ~     (p=0.661 n=10+9)
Sysbench/KV/1node_local/oltp_point_select-10     15.6µs ± 4%    15.9µs ±12%    ~     (p=0.529 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10       1.88ms ±26%    1.80ms ± 5%    ~     (p=1.000 n=10+9)
Sysbench/SQL/1node_local/oltp_read_write-10      4.22ms ± 5%    4.18ms ±11%    ~     (p=0.400 n=9+10)
Sysbench/SQL/1node_local/oltp_point_select-10     114µs ± 5%     120µs ±21%    ~     (p=0.796 n=10+10)

name                                           old alloc/op   new alloc/op   delta
Sysbench/KV/1node_local/oltp_read_write-10        487kB ± 0%     484kB ± 1%  -0.55%  (p=0.011 n=8+9)
Sysbench/KV/1node_local/oltp_read_only-10         260kB ± 0%     259kB ± 1%  -0.50%  (p=0.011 n=8+10)
Sysbench/SQL/1node_local/oltp_point_select-10    27.8kB ± 0%    27.6kB ± 0%  -0.47%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10        878kB ± 0%     876kB ± 0%  -0.25%  (p=0.003 n=10+10)
Sysbench/SQL/1node_local/oltp_read_write-10      1.25MB ± 1%    1.25MB ± 1%    ~     (p=0.146 n=10+8)
Sysbench/KV/1node_local/oltp_point_select-10     4.68kB ± 1%    4.68kB ± 2%    ~     (p=0.474 n=9+9)

name                                           old allocs/op  new allocs/op  delta
Sysbench/KV/1node_local/oltp_read_only-10           522 ± 0%       507 ± 0%  -2.72%  (p=0.000 n=10+10)
Sysbench/KV/1node_local/oltp_read_write-10        1.51k ± 0%     1.50k ± 0%  -0.92%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_point_select-10       238 ± 0%       237 ± 0%  -0.42%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10        4.76k ± 0%     4.74k ± 0%  -0.39%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_write-10       7.55k ± 0%     7.53k ± 0%  -0.24%  (p=0.003 n=10+10)
Sysbench/KV/1node_local/oltp_point_select-10       29.0 ± 0%      29.0 ± 0%    ~     (all equal)
```

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
@nvanbenschoten and I saw this while looking at an (admittedly overloaded) 3x8vcpu tpcc-nowait wh=1000 cluster.

Notably, this was running with #136523 as well as #122862, both of which in principle reduce abort span access.
By default it's on, meaning the current behavior is kept. When off, CRDB is expected to be a little faster (cockroachdb#122719), but transactions may also miss their own writes if they were aborted without noticing it.

Epic: CRDB-42584

Release note: None
Confirming the ~2% number on oltp_read_write on the reference workload cluster.

I imagine the amount of work here would go up as the LSM fills up, as more levels would have to be checked. In the microbenchmarks, where the LSM is in-memory and smaller, the abort span doesn't register on
I also ran an extremely unscientific benchmark using a reference cluster running
Rigorous experiment shows ~2.7% throughput improvement on oltp_read_write.
where
@arulajmani pointed out that when read-write transactions use buffered writes (#72614), much of the overhead of the abort span on reads should disappear. This is because the envisioned design includes overlaying the keys in the transaction write set onto the results returned from KV. For example, in the following transaction
by the time of the
I filed #140593, which elides the abort span check in typical OLTP workloads.
The abort span (`pkg/kv/kvserver/abortspan`) is a mechanism that sets markers for aborted transactions. It protects against an aborted but still active transaction failing to read values it wrote because those intents have been removed. The "span" is a slice of the range-id-local keyspace which is read on each `BatchRequest` that is part of a read-write transaction. The logic for this is here:

cockroach/pkg/kv/kvserver/replica_evaluate.go
Lines 208 to 227 in 55991cb

This is an additional LSM read per `BatchRequest`, which can be seen prominently in CPU profiles under `checkIfTxnAborted`, accounting for 3.59% of CPU time on write-heavy workloads: profile_abort_span.pb.gz

Some basic experimentation with the sysbench workload (`sysbench/oltp_write_only/nodes=7/cpu=16/conc=128`) demonstrates about a 2% increase in throughput by disabling this abort span read (i.e. not calling `checkIfTxnAborted`). This testing reveals the cost of the mechanism. Optimizations (up to and including disabling it) could provide up to this much benefit to throughput.

Given how significant this cost is and how much of an edge case the scenarios that the abort span is protecting against are, we should reevaluate whether there's something better that we can do here. Are there simple optimizations that could make this mechanism perform better? Could we make it a little weaker to avoid most of the cost? These questions are worthwhile to explore.
At a minimum, we should expose an option to disable these abort span checks.
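To make the trade-off concrete, here is a toy model of the check and of an option to disable it. It is a sketch under stated assumptions, not the code in `replica_evaluate.go`: the `replica` type, its in-memory maps, and the `checkAbortSpan` toggle are hypothetical, and the real abort span lives in the range-id-local keyspace rather than in a Go map.

```go
package main

import (
	"errors"
	"fmt"
)

var errTxnAborted = errors.New("transaction aborted")

// replica is a toy, in-memory stand-in for a range replica.
type replica struct {
	committed      map[string]string            // committed key/value pairs
	intents        map[string]map[string]string // txn ID -> key -> provisional value
	abortSpan      map[string]bool              // txn ID -> aborted marker
	checkAbortSpan bool                         // hypothetical toggle for the check
}

func newReplica(check bool) *replica {
	return &replica{
		committed:      map[string]string{"k": "old"},
		intents:        map[string]map[string]string{},
		abortSpan:      map[string]bool{},
		checkAbortSpan: check,
	}
}

// put writes a provisional value (an intent) on behalf of txnID.
func (r *replica) put(txnID, key, value string) {
	if r.intents[txnID] == nil {
		r.intents[txnID] = map[string]string{}
	}
	r.intents[txnID][key] = value
}

// abortTxn models another actor aborting txnID: its intents are removed
// and a marker is left behind in the abort span.
func (r *replica) abortTxn(txnID string) {
	delete(r.intents, txnID)
	r.abortSpan[txnID] = true
}

// get reads key on behalf of txnID. With the check enabled, every read pays
// for an extra lookup, but an aborted txn learns about its fate instead of
// silently missing its own write.
func (r *replica) get(txnID, key string) (string, error) {
	if r.checkAbortSpan && r.abortSpan[txnID] {
		return "", errTxnAborted
	}
	if v, ok := r.intents[txnID][key]; ok {
		return v, nil
	}
	return r.committed[key], nil
}

func main() {
	for _, check := range []bool{true, false} {
		r := newReplica(check)
		r.put("txn1", "k", "new")
		r.abortTxn("txn1")
		v, err := r.get("txn1", "k")
		fmt.Printf("check=%t value=%q err=%v\n", check, v, err)
	}
}
```

With the toggle on, the aborted transaction's read fails with an abort error. With it off, the read succeeds but returns the previously committed value rather than the transaction's own write, which is the hazard any option to disable the check would have to accept.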
Jira issue: CRDB-38032
Epic CRDB-42584