storage: Fix handling of convergeScore in selection of best candidates #23036
Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed. pkg/storage/allocator_scorer.go, line 312 at r1 (raw file):
Could/should we be using Also, I think the previous code was trying to handle the case where
Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed. pkg/storage/allocator_scorer.go, line 312 at r1 (raw file):
No. This function requires that its input is already sorted by The goal of this function is to pick out all the candidates that are the "best", where "best" includes some wiggle room: not just the absolute single best store, but also some others that are similarly good but may have a few more replicas on them. If every rebalance action deterministically chose the absolute single best store, then cluster-wide rebalancing would get bottlenecked on that store (and, before snapshot rate limiting was added, would overwhelm it). Instead, we randomly choose from among the best in
It isn't possible for
Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed. pkg/storage/allocator_scorer.go, line 312 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
Ok, I guess the inequality comparisons are confusing to read. Would it be equivalent to structure this loop body as:
This would make it clearer to me that we're trying to find the prefix that is equal on
50e4f74 to 06ef142
TFTR, I've also deflaked the 2 test cases that were flaky by slightly changing the test structure. Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion. pkg/storage/allocator_scorer.go, line 312 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
Sure, done. Also cleaned up
Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.
Also add testing of rebalancing in clusters with differently sized
localities as a follow-up to #22412.
Release note (bug fix): Fix the occasional selection of sub-optimal
rebalance targets.
This should be cherry-picked to release-2.0 due to the bug I found in (candidateList).best() while adding the test. I'm pretty shocked that the bug has been there for so long without being hit in any test case, or noticed, given how unusual it looks.
The new test fails miserably on release-1.1 (see below), but reliably succeeds now due to #22412.
For fun, the output of running this against release-1.1: