
kvserver: StoreRebalancer is not constraints-aware when comparing stores’ QPS #61883

Closed
aayushshah15 opened this issue Mar 11, 2021 · 7 comments · Fixed by #65379
Assignees
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker T-multiregion

Comments

aayushshah15 commented Mar 11, 2021

Load-based range rebalancing in CRDB happens at the store level (the StoreRebalancer, which rebalances based on QPS) as well as at the per-range level (the replicateQueue, which rebalances based on range count). In broad strokes, at the per-range level, we do the following:

  1. For each of the current stores that have a replica for the range, we compute a list of “comparable” stores — these are stores that meet the replication constraints or are identical to the existing store in terms of their locality.
  2. We compute capacity statistics for each of these comparable subsets of stores.
  3. If the existing store is “overfull” relative to its subset of comparable stores, we try to transfer the existing replica away. Likewise, if any other store is “underfull” relative to the subset, we also try to relinquish our replica and move it to such an underfull store.

In contrast, the StoreRebalancer simply fetches the list of all stores in the cluster and computes QPS-based statistics across this entire set.

To see the hazard with this, consider a scenario where we have 5 regions with 3 stores each, but all tables are constrained to, say, a single region A. This will very likely result in all 3 stores in region A fielding higher QPS than the cluster average.

In such a scenario, the coarseness of the StoreRebalancer essentially deactivates it, because it computes the QPS average across the entire set of stores in the cluster. Rather than trying to achieve balance across the 3 stores that satisfy the constraints, it tries to achieve balance across the entire cluster: it attempts to rebalance replicas away from these 3 stores but fails, because all of the other, underfull stores violate the constraints.

This means that the hottest ranges in the system will not be actively rebalanced away based on QPS.
cc @nvanbenschoten

gz#7724

Epic CRDB-6437

@aayushshah15 aayushshah15 added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-distribution Relating to rebalancing and leasing. labels Mar 11, 2021

aayushshah15 commented Mar 12, 2021

This came up in a customer call where they had heterogeneous load profiles across the different regions in their cluster -- some regions experiencing a lot more traffic than others. Due to the hazard described above, they were seeing vastly different QPS on stores even within the same region.

I think we can fix this fairly non-invasively by filtering the list of stores from which the StoreRebalancer computes its QPS average.

@aayushshah15 aayushshah15 changed the title kvserver: StoreRebalancer is too coarse when comparing stores’ QPS kvserver: StoreRebalancer is not constraints-aware when comparing stores’ QPS Mar 12, 2021
awoods187 commented:

@nvanbenschoten can PTAL when you return?

nvanbenschoten commented:

@aayushshah15 thanks for writing this up and discussing it today.

I think we can fix this fairly non-invasively by filtering the list of stores the StoreRebalancer computes its QPS average off of.

Where are you suggesting that we perform the filtering? On a per-range basis in chooseLeaseToTransfer and chooseReplicaToRebalance? It seems to me that we would need to invert some of the control flow in these functions if we start considering different sets of candidates for different ranges. For instance, would qpsMinThreshold and qpsMaxThreshold, which are both functions of the unfiltered storeList, need to be computed per range?

You mentioned that a form of per-range qps rebalancing was recently removed. Do you have a reference to that?

Also, is this issue saying the same thing as #31135?

aayushshah15 commented:

Where are you suggesting that we perform the filtering?

What you said is pretty much what I had in mind. Do you see anything wrong with it that I may have missed?

You mentioned that a form of per-range qps rebalancing was recently removed

I must have misspoken; I meant "not very recently". It looks like 8e3dad8 removed it. We seem to have decided to always rely on the StoreRebalancer for QPS-based rebalancing.

Also, is this issue saying the same thing as #31135?

That issue seems to be talking about a slightly different problem: cases where some overfull store has multiple very hot ranges that the StoreRebalancer cannot move to any other store, because they would cause every receiving store to also become overfull. Consider something like:

The values in the curly braces indicate ranges' QPS on each of the stores:

store 1: {100, 80, 20} 
store 2: {60}
store 3: {40} 
----

avg QPS = 100
overfull threshold = 105

The ideal placement of ranges here would be something like the following, but the StoreRebalancer will not try to do this:

store 1: {80, 20}
store 2: {60}
store 3: {100, 40}

@aayushshah15 aayushshah15 self-assigned this May 17, 2021
craig bot pushed a commit that referenced this issue May 18, 2021
65312: kvserver: disallow replica rebalancing to dead nodes r=nvanbenschoten,lunevalex a=aayushshah15

The previously existing logic to prevent replica rebalances to dead nodes was
totally broken, and it was never caught because the accompanying unit test was
also broken.

The previous logic had an early return clause during the computation of
diversity stats for _existing replicas_ rather than during the computation of
stats for _potential candidates_. Additionally, the scenario that was
previously created by this unit test was never going to trigger a replica
rebalance, and the unit test would never actually execute any of its checks in
the face of an empty result set.

This commit fixes the check to actually prevent rebalancing to dead nodes and
also re-structures the test.

Noticed while working on #61883.

/cc @cockroachdb/kv

Release note (bug fix): Replica rebalancing could previously sometimes
rebalance to stores on dead nodes. This bug is now fixed.


65377: sql: fix error message for out of range interval r=otan a=rafiss

fixes #62369

Release note (bug fix): CockroachDB now shows a correct error message if
it tries to parse an interval that is out of range.

Co-authored-by: Aayush Shah <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue May 25, 2021
…localities

This commit teaches the `StoreRebalancer` to make load-based rebalancing
decisions that are meaningful within the context of the replication constraints
placed on the ranges being relocated and the set of stores that can legally
receive replicas for such ranges.

Previously, the `StoreRebalancer` would compute the QPS underfull and overfull
thresholds based on the overall average QPS being served by all stores in the
cluster. Notably, this included stores that were in replication zones that
would not satisfy required constraints for the range being considered for
rebalancing. This meant that the store rebalancer would effectively never be
able to rebalance ranges within the stores inside heavily loaded replication
zones (since all the _valid_ stores would be above the overfull thresholds).

This patch is a move away from the bespoke relocation logic in the
`StoreRebalancer`. Instead, we have the `StoreRebalancer` rely on the
rebalancing logic used by the `replicateQueue` that already has the machinery
to compute load-based signals for candidates _relative to other comparable
stores_. The main difference here is that the `StoreRebalancer` uses this
machinery to promote convergence of QPS across stores, whereas the
`replicateQueue` uses it to promote convergence of range counts. A series of
preceding commits in this patchset generalize the existing replica rebalancing
logic, and this commit teaches the `StoreRebalancer` to use it.

This generalization also addresses another key limitation (see cockroachdb#62922) of the
`StoreRebalancer` regarding its inability to make partial improvements to a
range. Previously, if the `StoreRebalancer` couldn't move a range _entirely_
off of overfull stores, it would give up and not even move the subset of
replicas it could. This is no longer the case.

Resolves cockroachdb#61883
Resolves cockroachdb#62992

/cc @cockroachdb/kv

Release note (performance improvement): QPS-based replica rebalancing is now
aware of different constraints placed on different replication zones. This
means that heterogeneously loaded replication zones (for instance, regions)
will achieve a more even distribution of QPS within the stores inside each
such zone.
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Aug 31, 2021
…localities

aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Sep 4, 2021
…localities
@nvanbenschoten nvanbenschoten added the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Sep 7, 2021
blathers-crl bot commented Sep 7, 2021

Hi @nvanbenschoten, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@nvanbenschoten nvanbenschoten added GA-blocker and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 7, 2021

aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Sep 8, 2021
…localities
craig bot pushed a commit that referenced this issue Sep 8, 2021
65379: kvserver: actuate load-based replica rebalancing under heterogeneous localities r=aayushshah15 a=aayushshah15

This commit teaches the `StoreRebalancer` to make load-based rebalancing
decisions that are meaningful within the context of the replication constraints
placed on the ranges being relocated and the set of stores that can legally
receive replicas for such ranges.

Previously, the `StoreRebalancer` would compute the QPS underfull and overfull
thresholds based on the overall average QPS being served by all stores in the
cluster. Notably, this included stores that were in replication zones that
would not satisfy required constraints for the range being considered for
rebalancing. This meant that the store rebalancer would effectively never be
able to rebalance ranges within the stores inside heavily loaded replication
zones (since all the _valid_ stores would be above the overfull thresholds).

This patch is a move away from the bespoke relocation logic in the
`StoreRebalancer`. Instead, we have the `StoreRebalancer` rely on the
rebalancing logic used by the `replicateQueue` that already has the machinery
to compute load-based signals for candidates _relative to other comparable
stores_. The main difference here is that the `StoreRebalancer` uses this
machinery to promote convergence of QPS across stores, whereas the
`replicateQueue` uses it to promote convergence of range counts. A series of
preceding commits in this patchset generalize the existing replica rebalancing
logic, and this commit teaches the `StoreRebalancer` to use it.

This generalization also addresses another key limitation (see #62992) of the
`StoreRebalancer` regarding its inability to make partial improvements to a
range. Previously, if the `StoreRebalancer` couldn't move a range _entirely_
off of overfull stores, it would give up and not even move the subset of
replicas it could. This is no longer the case.

Resolves #61883
Resolves #62992
Resolves #31135

/cc @cockroachdb/kv

Release justification: fixes a set of major limitations behind numerous support escalations

Release note (performance improvement): QPS-based rebalancing is now
aware of different constraints placed on different replication zones. This
means that heterogeneously loaded replication zones (for instance, regions)
will achieve a more even distribution of QPS within the stores inside each
such zone.


Co-authored-by: Aayush Shah <[email protected]>
@craig craig bot closed this as completed in d611828 Sep 8, 2021
@exalate-issue-sync exalate-issue-sync bot changed the title kvserver: StoreRebalancer is not constraints-aware when comparing stores' QPS Feb 1, 2022