
admission, storage: prototype AC limiter for concurrent compactions #136615

Closed
itsbilal opened this issue Dec 3, 2024 · 2 comments
Assignees
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-storage Storage Team

Comments

itsbilal (Contributor) commented Dec 3, 2024

Prototype a slot-based admission control granter/requester, as well as an interface and integration point with Pebble, to be able to limit concurrent compactions (and, in the future, to pace compactions).

This granter will take into account the CPU and IO resources used by a compaction, and Pebble will call into the interface to update the granter with the amount of IO consumed by each compaction. A linear model can be used to calculate the IO tokens to deduct from compaction bytes read/written. Flushes, as well as the first compaction in a store, could be allowed to always run without permission.
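
A minimal sketch of what such a slot-based granter and its Pebble-facing calls could look like, in Go. All names here (`compactionGranter`, `linearModel`, `tryGet`, `returnGrant`) are illustrative assumptions, not the actual CockroachDB admission package API:

```go
package admissionsketch

import "sync"

// linearModel converts compaction bytes into IO tokens,
// i.e. tokens = multiplier*bytes + constant.
type linearModel struct {
	multiplier float64
	constant   int64
}

func (m linearModel) tokens(bytes int64) int64 {
	return int64(m.multiplier*float64(bytes)) + m.constant
}

// compactionGranter hands out slots for concurrent compactions, bounded by
// the CPU and IO resources it believes are available.
type compactionGranter struct {
	mu        sync.Mutex
	usedSlots int
	// totalSlots would be recomputed periodically from observed CPU and
	// disk bandwidth headroom (not shown here).
	totalSlots int
	writeModel linearModel
}

// tryGet would be called by Pebble (via the integration interface) before
// starting a compaction. Flushes and the first compaction in a store would
// bypass this call and always run without permission.
func (g *compactionGranter) tryGet() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.usedSlots >= g.totalSlots {
		return false
	}
	g.usedSlots++
	return true
}

// returnGrant would be called when a compaction finishes, along with the
// bytes it read and wrote, so the granter can apply its linear model and
// deduct the corresponding IO tokens from its budget.
func (g *compactionGranter) returnGrant(bytesRead, bytesWritten int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.usedSlots--
	_ = g.writeModel.tokens(bytesRead + bytesWritten) // deduction from an IO token bucket elided
}
```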

Informs cockroachdb/pebble#1329 as well as #74697.

Jira issue: CRDB-45164

@itsbilal itsbilal added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Dec 3, 2024
@itsbilal itsbilal self-assigned this Dec 3, 2024
@itsbilal itsbilal closed this as completed Jan 7, 2025
@itsbilal itsbilal reopened this Jan 7, 2025
@exalate-issue-sync exalate-issue-sync bot assigned sumeerbhola and unassigned itsbilal Feb 4, 2025
sumeerbhola (Collaborator) commented:
Some results from a prototype, running in a setting where disk bandwidth was the bottleneck (400MiB/s provisioned bandwidth and 8 vCPUs), with the workload `./workload run kv --init --histograms=perf/stats.json --concurrency=512 --splits=1000 --duration=120m0s --read-percent=0 --min-block-bytes=8192 --max-block-bytes=8192` on a single-node cluster.

The cluster setting allowed one compaction to run without permission, and the granter can run more if resources permit. The read+write bandwidth utilization goal was initially set to 0.6, then lowered to 0.4, and then increased to 0.8. Since this was a local SSD, there is a hack to limit compaction bandwidth to 40MiB/s (otherwise a single compaction + flush + WAL writes can consume 240MiB/s, which doesn't allow us to demonstrate any control by this granter).

The granter forecasts and estimates every 100ms. The first graph shows that we are running ~4.32 compactions until 14:20:30 (with the aforementioned setting of 0.6), then fewer (setting 0.4), and then more (setting 0.8) at 14:26. The estimated and observed CPU due to compactions are low and consistent with each other.

[Image: estimated vs. observed compaction concurrency and compaction CPU]

The following graph shows that the forecast admit_long.disk_write.forecast and the actual disk.write.bytes are in agreement, and close to the configured limit (0.6 * 400 = 240MiB/s). The observed writes from compactions also agree with the estimated writes.
[Image: forecast vs. actual disk write bandwidth, and estimated vs. observed compaction writes]
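
A hedged sketch of the periodic (every 100ms) estimation step described above: given a utilization goal and an estimate of per-compaction write bandwidth, compute how many compactions may run concurrently. The function and parameter names are illustrative, not taken from the prototype:

```go
// allowedCompactions estimates how many compactions can run concurrently
// while keeping disk writes within utilizationGoal of provisioned bandwidth.
func allowedCompactions(
	provisionedBps float64, // e.g. 400 << 20 for 400MiB/s
	utilizationGoal float64, // e.g. 0.6
	otherWriteBps float64, // flushes, WAL, foreground writes observed last interval
	perCompactionWriteBps float64, // estimated write bandwidth of one compaction
) int {
	budget := provisionedBps*utilizationGoal - otherWriteBps
	if budget <= 0 || perCompactionWriteBps <= 0 {
		return 1 // always allow at least one compaction to run without permission
	}
	n := int(budget / perCompactionWriteBps)
	if n < 1 {
		n = 1
	}
	return n
}
```

With the numbers above (0.6 * 400MiB/s goal and the ~40MiB/s per-compaction cap), this kind of calculation is roughly consistent with the ~4.32 concurrent compactions seen in the graph once flush, WAL, and foreground writes are subtracted from the budget.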

sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Feb 14, 2025
The granter is initially for Pebble compactions and snapshot ingests
and is aware of node CPU and per-store disk bandwidth consumption. It
monitors the CPU and write bandwidth of this work (read bandwidth is
not explicitly modeled, but the observation of aggregate read bandwidth
can correct for this oversight).

The granter is integrated with MultiQueue (for snapshot ingest) such
that the semaphore is generalized. It is also integrated with Pebble.

cockroachdb#136615 (comment)
has some experimental results.

The granter can also run in a non-resource-driven mode, in which case
it would simply apply fixed limits on the number of compactions (or
snapshots) per store and per node.

There is currently no rate limiting of long-lived work, once admitted.
Such rate limiting may be necessary for production deployments.

Informs cockroachdb#136615

Epic: none

Release note: None
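
A minimal sketch of such a fixed-limit mode, assuming simple counting semaphores per node and per store; the names (`fixedLimiter`, `tryAcquire`, `release`) are hypothetical and not the prototype's actual types:

```go
// fixedLimiter enforces fixed per-node and per-store limits on concurrent
// compactions (or snapshot ingests) using counting semaphores.
type fixedLimiter struct {
	perNode  chan struct{}         // node-wide slots
	perStore map[int]chan struct{} // storeID -> per-store slots
}

func newFixedLimiter(nodeLimit, storeLimit int, storeIDs []int) *fixedLimiter {
	l := &fixedLimiter{
		perNode:  make(chan struct{}, nodeLimit),
		perStore: make(map[int]chan struct{}, len(storeIDs)),
	}
	for _, id := range storeIDs {
		l.perStore[id] = make(chan struct{}, storeLimit)
	}
	return l
}

// tryAcquire returns true if work may start on the given store; callers must
// pair a successful tryAcquire with release when the work finishes.
func (l *fixedLimiter) tryAcquire(storeID int) bool {
	select {
	case l.perNode <- struct{}{}:
	default:
		return false
	}
	select {
	case l.perStore[storeID] <- struct{}{}:
		return true
	default:
		<-l.perNode // give back the node slot
		return false
	}
}

func (l *fixedLimiter) release(storeID int) {
	<-l.perStore[storeID]
	<-l.perNode
}
```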
sumeerbhola (Collaborator) commented:
Prototype is done
