
admission, storage: prototype AC limiter for concurrent compactions #136615

Closed
itsbilal opened this issue Dec 3, 2024 · 2 comments
Assignees
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-storage Storage Team

Comments

itsbilal (Contributor) commented Dec 3, 2024

Prototype a slot-based admission control granter/requester, as well as an interface and integration point with Pebble, to be able to limit concurrent compactions (and, in the future, to pace compactions).

This granter will take into account the CPU and IO resources used by a compaction, and Pebble will call into the interface to update the granter with the amount of IO consumed by each compaction. A linear model can be used to calculate the IO tokens to deduct from compaction bytes read/written. Flushes, as well as the first compaction in a store, could be allowed to always run without permission.
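
A minimal sketch of what such a slot-based granter and its Pebble-facing calls could look like, in Go. All names here (`compactionGranter`, `linearModel`, `tryGet`, `returnGrant`) are illustrative assumptions, not the actual CockroachDB admission package API:

```go
package admissionsketch

import "sync"

// linearModel converts compaction bytes into IO tokens,
// i.e. tokens = multiplier*bytes + constant.
type linearModel struct {
	multiplier float64
	constant   int64
}

func (m linearModel) tokens(bytes int64) int64 {
	return int64(m.multiplier*float64(bytes)) + m.constant
}

// compactionGranter hands out slots for concurrent compactions, bounded by
// the CPU and IO resources it believes are available.
type compactionGranter struct {
	mu        sync.Mutex
	usedSlots int
	// totalSlots would be recomputed periodically from observed CPU and
	// disk bandwidth headroom (not shown here).
	totalSlots int
	writeModel linearModel
}

// tryGet would be called by Pebble (via the integration interface) before
// starting a compaction. Flushes and the first compaction in a store would
// bypass this call and always run without permission.
func (g *compactionGranter) tryGet() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.usedSlots >= g.totalSlots {
		return false
	}
	g.usedSlots++
	return true
}

// returnGrant would be called when a compaction finishes, along with the
// bytes it read and wrote, so the granter can apply its linear model and
// deduct the corresponding IO tokens from its budget.
func (g *compactionGranter) returnGrant(bytesRead, bytesWritten int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.usedSlots--
	_ = g.writeModel.tokens(bytesRead + bytesWritten) // deduction from an IO token bucket elided
}
```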

Informs cockroachdb/pebble#1329 as well as #74697.

Jira issue: CRDB-45164

@itsbilal itsbilal added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Dec 3, 2024
@itsbilal itsbilal self-assigned this Dec 3, 2024
@itsbilal itsbilal closed this as completed Jan 7, 2025
@itsbilal itsbilal reopened this Jan 7, 2025
@exalate-issue-sync exalate-issue-sync bot assigned sumeerbhola and unassigned itsbilal Feb 4, 2025
sumeerbhola (Collaborator) commented:
Some results from a prototype, running in a setting where disk bandwidth was the bottleneck (400MiB/s provisioned bandwidth and 8 vCPUs), with the workload `./workload run kv --init --histograms=perf/stats.json --concurrency=512 --splits=1000 --duration=120m0s --read-percent=0 --min-block-bytes=8192 --max-block-bytes=8192` on a single-node cluster.

The cluster setting allowed one compaction to run without permission, and the granter can run more if resources permit. The read+write bandwidth utilization goal was initially set to 0.6, then lowered to 0.4, and then increased to 0.8. Since this was a local SSD, there is a hack to limit compaction bandwidth to 40MiB/s (otherwise a single compaction + flush + WAL writes can consume 240MiB/s, which doesn't allow us to demonstrate any control by this granter).

The granter forecasts and estimates every 100ms. The first graph shows that we are running ~4.32 compactions until 14:20:30 (with the aforementioned setting of 0.6), then fewer (setting 0.4), and then more (setting 0.8) at 14:26. The estimated and observed CPU due to compactions are low and consistent with each other.

[Image: estimated vs. observed compaction concurrency and compaction CPU]

The following graph shows that the forecast admit_long.disk_write.forecast and the actual disk.write.bytes are in agreement, and close to the configured limit (0.6 * 400 = 240MiB/s). The observed writes from compactions also agree with the estimated writes.
[Image: forecast vs. actual disk write bandwidth, and estimated vs. observed compaction writes]
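
A hedged sketch of the periodic (every 100ms) estimation step described above: given a utilization goal and an estimate of per-compaction write bandwidth, compute how many compactions may run concurrently. The function and parameter names are illustrative, not taken from the prototype:

```go
// allowedCompactions estimates how many compactions can run concurrently
// while keeping disk writes within utilizationGoal of provisioned bandwidth.
func allowedCompactions(
	provisionedBps float64, // e.g. 400 << 20 for 400MiB/s
	utilizationGoal float64, // e.g. 0.6
	otherWriteBps float64, // flushes, WAL, foreground writes observed last interval
	perCompactionWriteBps float64, // estimated write bandwidth of one compaction
) int {
	budget := provisionedBps*utilizationGoal - otherWriteBps
	if budget <= 0 || perCompactionWriteBps <= 0 {
		return 1 // always allow at least one compaction to run without permission
	}
	n := int(budget / perCompactionWriteBps)
	if n < 1 {
		n = 1
	}
	return n
}
```

With the numbers above (0.6 * 400MiB/s goal and the ~40MiB/s per-compaction cap), this kind of calculation is roughly consistent with the ~4.32 concurrent compactions seen in the graph once flush, WAL, and foreground writes are subtracted from the budget.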

sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Feb 14, 2025
The granter is initially for Pebble compactions and snapshot ingests
and is aware of node CPU and per-store disk bandwidth consumption. It
monitors the CPU and write bandwidth of this work (read bandwidth is
not explicitly modeled, but the observation of aggregate read bandwidth
can correct for this oversight).

The granter is integrated with MultiQueue (for snapshot ingest) such
that the semaphore is generalized. It is also integrated with Pebble.

cockroachdb#136615 (comment)
has some experimental results.

The granter can also run in a non-resource-driven mode, in which case
it would simply apply fixed limits on the number of compactions (or
snapshots) per store and per node.

There is currently no rate limiting of long-lived work, once admitted.
Such rate limiting may be necessary for production deployments.

Informs cockroachdb#136615

Epic: none

Release note: None
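
A minimal sketch of such a fixed-limit mode, assuming simple counting semaphores per node and per store; the names (`fixedLimiter`, `tryAcquire`, `release`) are hypothetical and not the prototype's actual types:

```go
// fixedLimiter enforces fixed per-node and per-store limits on concurrent
// compactions (or snapshot ingests) using counting semaphores.
type fixedLimiter struct {
	perNode  chan struct{}         // node-wide slots
	perStore map[int]chan struct{} // storeID -> per-store slots
}

func newFixedLimiter(nodeLimit, storeLimit int, storeIDs []int) *fixedLimiter {
	l := &fixedLimiter{
		perNode:  make(chan struct{}, nodeLimit),
		perStore: make(map[int]chan struct{}, len(storeIDs)),
	}
	for _, id := range storeIDs {
		l.perStore[id] = make(chan struct{}, storeLimit)
	}
	return l
}

// tryAcquire returns true if work may start on the given store; callers must
// pair a successful tryAcquire with release when the work finishes.
func (l *fixedLimiter) tryAcquire(storeID int) bool {
	select {
	case l.perNode <- struct{}{}:
	default:
		return false
	}
	select {
	case l.perStore[storeID] <- struct{}{}:
		return true
	default:
		<-l.perNode // give back the node slot
		return false
	}
}

func (l *fixedLimiter) release(storeID int) {
	<-l.perStore[storeID]
	<-l.perNode
}
```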
sumeerbhola (Collaborator) commented:
Prototype is done
