
kvserver: StoreRebalancer is not constraints-aware when comparing stores’ QPS #61883

Closed
aayushshah15 opened this issue Mar 11, 2021 · 7 comments · Fixed by #65379
Assignees
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker T-multiregion

Comments

aayushshah15 commented Mar 11, 2021

Load-based range rebalancing in CRDB happens at the store level (the StoreRebalancer, which rebalances based on QPS) as well as at the per-range level (the replicateQueue, which rebalances based on range count). In broad strokes, at the per-range level, we do the following:

  1. For each of the current stores that have a replica for the range, we compute a list of “comparable” stores — these are stores that meet the replication constraints or are identical to the existing store in terms of their locality.
  2. We compute capacity statistics for each of these comparable subsets of stores.
  3. If the existing store is “overfull” relative to its subset of comparable stores, we try to transfer the existing replica away. Likewise, if any other store is “underfull” relative to the subset, we also try to relinquish our replica and move it to such an underfull store.

In contrast, the StoreRebalancer simply fetches the list of all stores in the cluster and computes QPS-based statistics across this entire set.

To see the hazard with this, consider a scenario where we have 5 regions with 3 stores each, but all tables are constrained to, say, a single region A. This will very likely result in all 3 stores in region A fielding higher QPS than the cluster average.

In such a scenario, the coarseness of the StoreRebalancer essentially deactivates it, because it computes the QPS average across the entire set of stores in the cluster. Rather than trying to achieve balance across the 3 stores that satisfy the constraints, it tries to achieve balance across the entire cluster: it attempts to rebalance replicas away from these 3 stores but fails, because all of the other, underfull stores violate the constraints.

This means that the hottest ranges in the system will not be actively rebalanced away based on QPS.
cc @nvanbenschoten

gz#7724

Epic CRDB-6437

@aayushshah15 aayushshah15 added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-distribution Relating to rebalancing and leasing. labels Mar 11, 2021

aayushshah15 commented Mar 12, 2021

This came up in a customer call where they had heterogeneous load profiles across the different regions in their cluster -- some regions experiencing a lot more traffic than others. Due to the hazard described above, they were seeing vastly different QPS on stores even within the same region.

I think we can fix this fairly non-invasively by filtering the list of stores from which the StoreRebalancer computes its QPS average.

@aayushshah15 aayushshah15 changed the title kvserver: StoreRebalancer is too coarse when comparing stores’ QPS kvserver: StoreRebalancer is not constraints-aware when comparing stores’ QPS Mar 12, 2021
awoods187 commented:

@nvanbenschoten can PTAL when you return?

nvanbenschoten commented:

@aayushshah15 thanks for writing this up and discussing it today.

I think we can fix this fairly non-invasively by filtering the list of stores the StoreRebalancer computes its QPS average off of.

Where are you suggesting that we perform the filtering? On a per-range basis in chooseLeaseToTransfer and chooseReplicaToRebalance? It seems to me that we would need to invert some of the control flow in these functions if we start considering different sets of candidates for different ranges. For instance, would qpsMinThreshold and qpsMaxThreshold, which are both functions of the unfiltered storeList, need to be computed per range?

You mentioned that a form of per-range qps rebalancing was recently removed. Do you have a reference to that?

Also, is this issue saying the same thing as #31135?

aayushshah15 commented:

Where are you suggesting that we perform the filtering?

What you said is pretty much what I had in mind. Do you see anything wrong with it that I may have missed?

You mentioned that a form of per-range qps rebalancing was recently removed

I must have misspoken; I meant "not very recently". It looks like 8e3dad8 removed it. We seem to have decided to always rely on the StoreRebalancer for QPS-based rebalancing.

Also, is this issue saying the same thing as #31135?

That issue seems to be talking about a slightly different problem: cases where some overfull store has multiple very hot ranges that the StoreRebalancer cannot move to any other store, because they would cause every receiving store to also become overfull. Consider something like:

The values in the curly braces indicate ranges' QPS on each of the stores:

store 1: {100, 80, 20} 
store 2: {60}
store 3: {40} 
----

avg QPS = 100
overfull threshold = 105

The ideal placement of ranges here would be something like the following, but the StoreRebalancer will not try to do this:

store 1: {80, 20}
store 2: {60}
store 3: {100, 40}

@aayushshah15 aayushshah15 self-assigned this May 17, 2021
craig bot pushed a commit that referenced this issue May 18, 2021
65312: kvserver: disallow replica rebalancing to dead nodes r=nvanbenschoten,lunevalex a=aayushshah15

The previously existing logic to prevent replica rebalances to dead nodes was
totally broken, and it was never caught because the accompanying unit test was
also broken.

The previous logic had an early return clause during the computation of
diversity stats for _existing replicas_ rather than during the computation of
stats for _potential candidates_. Additionally, the scenario that was
previously created by this unit test was never going to trigger a replica
rebalance, and the unit test would never actually execute any of its checks in
the face of an empty result set.

This commit fixes the check to actually prevent rebalancing to dead nodes and
also re-structures the test.

Noticed while working on #61883.

/cc @cockroachdb/kv

Release note (bug fix): Replica rebalancing could previously sometimes
rebalance to stores on dead nodes. This bug is now fixed.


65377: sql: fix error message for out of range interval r=otan a=rafiss

fixes #62369

Release note (bug fix): CockroachDB now shows a correct error message if
it tries to parse an interval that is out of range.

Co-authored-by: Aayush Shah <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue May 25, 2021
…localities

This commit teaches the `StoreRebalancer` to make load-based rebalancing
decisions that are meaningful within the context of the replication constraints
placed on the ranges being relocated and the set of stores that can legally
receive replicas for such ranges.

Previously, the `StoreRebalancer` would compute the QPS underfull and overfull
thresholds based on the overall average QPS being served by all stores in the
cluster. Notably, this included stores that were in replication zones that
would not satisfy required constraints for the range being considered for
rebalancing. This meant that the store rebalancer would effectively never be
able to rebalance ranges within the stores inside heavily loaded replication
zones (since all the _valid_ stores would be above the overfull thresholds).

This patch is a move away from the bespoke relocation logic in the
`StoreRebalancer`. Instead, we have the `StoreRebalancer` rely on the
rebalancing logic used by the `replicateQueue` that already has the machinery
to compute load-based signals for candidates _relative to other comparable
stores_. The main difference here is that the `StoreRebalancer` uses this
machinery to promote convergence of QPS across stores, whereas the
`replicateQueue` uses it to promote convergence of range counts. A series of
preceding commits in this patchset generalize the existing replica rebalancing
logic, and this commit teaches the `StoreRebalancer` to use it.

This generalization also addresses another key limitation (see cockroachdb#62922) of the
`StoreRebalancer` regarding its inability to make partial improvements to a
range. Previously, if the `StoreRebalancer` couldn't move a range _entirely_
off of overfull stores, it would give up and not even move the subset of
replicas it could. This is no longer the case.

Resolves cockroachdb#61883
Resolves cockroachdb#62992

/cc @cockroachdb/kv

Release note (performance improvement): QPS-based replica rebalancing is now
aware of different constraints placed on different replication zones. This
means that heterogeneously loaded replication zones (for instance, regions)
will achieve a more even distribution of QPS within the stores inside each
such zone.
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Aug 31, 2021
…localities

aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Sep 4, 2021
…localities
@nvanbenschoten nvanbenschoten added the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Sep 7, 2021
blathers-crl bot commented Sep 7, 2021

Hi @nvanbenschoten, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@nvanbenschoten nvanbenschoten added GA-blocker and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 7, 2021

aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Sep 8, 2021
…localities
craig bot pushed a commit that referenced this issue Sep 8, 2021
65379: kvserver: actuate load-based replica rebalancing under heterogeneous localities r=aayushshah15 a=aayushshah15

This commit teaches the `StoreRebalancer` to make load-based rebalancing
decisions that are meaningful within the context of the replication constraints
placed on the ranges being relocated and the set of stores that can legally
receive replicas for such ranges.

Previously, the `StoreRebalancer` would compute the QPS underfull and overfull
thresholds based on the overall average QPS being served by all stores in the
cluster. Notably, this included stores that were in replication zones that
would not satisfy required constraints for the range being considered for
rebalancing. This meant that the store rebalancer would effectively never be
able to rebalance ranges within the stores inside heavily loaded replication
zones (since all the _valid_ stores would be above the overfull thresholds).

This patch is a move away from the bespoke relocation logic in the
`StoreRebalancer`. Instead, we have the `StoreRebalancer` rely on the
rebalancing logic used by the `replicateQueue` that already has the machinery
to compute load-based signals for candidates _relative to other comparable
stores_. The main difference here is that the `StoreRebalancer` uses this
machinery to promote convergence of QPS across stores, whereas the
`replicateQueue` uses it to promote convergence of range counts. A series of
preceding commits in this patchset generalize the existing replica rebalancing
logic, and this commit teaches the `StoreRebalancer` to use it.

This generalization also addresses another key limitation (see #62992) of the
`StoreRebalancer` regarding its inability to make partial improvements to a
range. Previously, if the `StoreRebalancer` couldn't move a range _entirely_
off of overfull stores, it would give up and not even move the subset of
replicas it could. This is no longer the case.

Resolves #61883
Resolves #62992
Resolves #31135

/cc @cockroachdb/kv

Release justification: fixes a set of major limitations behind numerous support escalations

Release note (performance improvement): QPS-based rebalancing is now
aware of different constraints placed on different replication zones. This
means that heterogeneously loaded replication zones (for instance, regions)
will achieve a more even distribution of QPS within the stores inside each
such zone.


Co-authored-by: Aayush Shah <[email protected]>
@craig craig bot closed this as completed in d611828 Sep 8, 2021
@exalate-issue-sync exalate-issue-sync bot changed the title kvserver: StoreRebalancer is not constraints-aware when comparing stores' QPS Feb 1, 2022