-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: Clean up snapshot rate limits #63728
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-support
Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
T-kv
KV Team
Comments
This was referenced Jun 6, 2022
My plan for this is to deprecate the |
andrewbaptist
added a commit
to andrewbaptist/cockroach
that referenced
this issue
Apr 28, 2023
Fixes: cockroachdb#63728 There were two settings that controlled snapshot transfer rates, however this has caused numerous problems when they are set differently as the code makes timeout assumptions based on this rate. Epic: none Release note: This change removes a user configurable setting `kv.snapshot_recovery.max_rate`. Guidance to customers has been to always set this equal to `kv.snapshot_rebalance.max_rate` and this change will start using that setting for all snapshots. If the customer previously set `kv.snapshot_recovery.max_rate` it will be cleared after this is released and future attempts to set it will fail with: "ERROR: unknown cluster setting 'kv.snapshot_recovery.max_rate'
craig bot
pushed a commit
that referenced
this issue
May 1, 2023
102596: kv: remove kv.snapshot_recovery.max_rate setting r=tbg a=andrewbaptist Fixes: #63728 There were two settings that controlled snapshot transfer rates, however, this has caused numerous problems when they are set differently as the code makes timeout assumptions based on this rate. Epic: none Release note: This change removes a user configurable setting `kv.snapshot_recovery.max_rate`. Guidance to customers has been to always set this equal to `kv.snapshot_rebalance.max_rate` and this change will start using that setting for all snapshots. If the customer previously set `kv.snapshot_recovery.max_rate` it will be cleared after this is released and future attempts to set it will fail with: "ERROR: unknown cluster setting 'kv.snapshot_recovery.max_rate' Co-authored-by: Andrew Baptist <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-support
Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
T-kv
KV Team
Raft snapshots are currently governed by two rate limit settings,
kv.snapshot_rebalance.max_rate
andkv.snapshot_recovery.max_rate
. They currently default to the same value and we recommend that if one is changed, the other be changed to match. In that case there is no good reason to have two settings; we should consolidate on one and get rid of the other (this has been a longstanding TODO in the code but there doesn't seem to have been an issue for it until now)On the other hand, things have changed enough that we might want to improve the prioritization scheme and use two different values for the rate limits. The idea behind the two settings is that the higher priority "recovery" limit is used when the snapshot is necessary to bring a range in compliance with its survivability goals, with the lower "rebalance" limit used at other times. We deprecated the use of two separate rate limits for two reasons:
concurrentSnapshotApplyLimit
(which is currently 1), we had priority inversion problems in which a slow rebalance snapshot would tie up the one allowed snapshot slot, preventing a higher-priority recovery snapshot from proceeding.Now that we have streaming snapshots, we could consider raising the
concurrentSnapshotApplyLimit
(and reserve some slots for recovery snapshots). With atomic replica changes and explicit survival goals for multi-region clusters, we should now be able to accurately classify snapshots based on whether or not they are necessary for survival goals. With those two changes, the rebalance/recovery rate limit settings make sense again and we could leave them as two separate options.Jira issue: CRDB-6748
The text was updated successfully, but these errors were encountered: