cherrypick-1.1: storage: avoid replica thrashing when localities are different sizes #20934
Conversation
(force-pushed from 72e35a4 to 1cbb256)
Review status: 0 of 6 files reviewed at latest revision, all discussions resolved, some commit checks failed.
Updating a target store's write stats immediately after rebalancing was recently addressed in cockroachdb#18425. With that change, if `updateLocalStoreAfterRebalance` is called before the `StorePool` has seen the `StoreDescriptor` in gossip, it will trigger an NPE. This change fixes the problem by making the update a no-op if the descriptor has not yet been seen in gossip.
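A minimal sketch in Go of the guard described above; the types, fields, and map layout here are simplified stand-ins for illustration, not the actual CockroachDB `StorePool` code:

```go
package storepool

// Hypothetical, simplified types; the real StorePool tracks much more state.
type StoreID int

type StoreCapacity struct {
	RangeCount      int32
	WritesPerSecond float64
}

type StoreDescriptor struct {
	Capacity StoreCapacity
}

type storeDetail struct {
	desc *StoreDescriptor // remains nil until the store is seen in gossip
}

type StorePool struct {
	details map[StoreID]*storeDetail
}

// updateLocalStoreAfterRebalance applies the eager stats update, but only if
// the store's descriptor has already arrived via gossip; otherwise it is a
// no-op instead of dereferencing a nil descriptor.
func (sp *StorePool) updateLocalStoreAfterRebalance(
	storeID StoreID, rangeDelta int32, writesDelta float64,
) {
	detail, ok := sp.details[storeID]
	if !ok || detail.desc == nil {
		// Descriptor not yet seen in gossip; skip the update and let the
		// next gossip refresh bring the local stats up to date.
		return
	}
	detail.desc.Capacity.RangeCount += rangeDelta
	detail.desc.Capacity.WritesPerSecond += writesDelta
}
```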
… to improve balance
If the first target attempted was rejected due to the simulation claiming that it would be immediately removed, we would reuse the modified `rangeInfo.Desc.Replicas` that had the target added to it, messing with future iterations of the loop. Also, we weren't properly modifying the `candidates` slice, meaning that we could end up trying the same replica multiple times. Release note (bug fix): Improve data rebalancing to make thrashing back and forth between nodes much less likely.
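A small, self-contained sketch of the two fixes described above, using hypothetical names rather than the actual allocator code: each iteration copies the existing replicas instead of appending to the shared slice, and the candidates slice is shrunk so a rejected store is never retried.

```go
package main

import "fmt"

type ReplicaDescriptor struct{ StoreID int }

// simulateWouldRemove stands in for the allocator's removal simulation; here
// it simply rejects even store IDs so the loop exercises several candidates.
func simulateWouldRemove(replicas []ReplicaDescriptor, target int) bool {
	return target%2 == 0
}

// pickTarget sketches the fixed loop: copy-on-write for the replica set, and
// a properly shrinking candidates slice.
func pickTarget(existing []ReplicaDescriptor, candidates []int) (int, bool) {
	for len(candidates) > 0 {
		target := candidates[0]
		candidates = candidates[1:] // drop it so the same store isn't tried twice

		// Copy before appending so a rejected candidate doesn't leak into the
		// replica set seen by the next iteration's simulation.
		replicas := make([]ReplicaDescriptor, len(existing), len(existing)+1)
		copy(replicas, existing)
		replicas = append(replicas, ReplicaDescriptor{StoreID: target})

		if !simulateWouldRemove(replicas, target) {
			return target, true
		}
	}
	return 0, false
}

func main() {
	existing := []ReplicaDescriptor{{1}, {3}, {5}}
	if target, ok := pickTarget(existing, []int{2, 4, 7}); ok {
		fmt.Println("rebalance target:", target)
	}
}
```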
Skipping the simulation when raftStatus.Progress is nil can make for undesirable thrashing of replicas, as seen when testing cockroachdb#20241. It's better to run the simulation without properly filtering replicas than to not run it at all. Release note: None
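A rough sketch of the fallback described here, with hypothetical type and function names: filter replicas by raft progress only when that information is actually available, and otherwise hand the simulation the full, unfiltered set.

```go
package allocator

// Hypothetical, simplified stand-ins for the raft status and replica types.
type RaftProgress struct{ Match uint64 }

type RaftStatus struct {
	Commit   uint64
	Progress map[uint64]RaftProgress // nil when this node isn't the raft leader
}

type ReplicaDescriptor struct{ ReplicaID uint64 }

// replicasForSimulation falls back to the unfiltered replica set when raft
// progress is unknown, rather than skipping the removal simulation entirely.
func replicasForSimulation(status *RaftStatus, replicas []ReplicaDescriptor) []ReplicaDescriptor {
	if status == nil || status.Progress == nil {
		// No progress info available; simulating with every replica is
		// better than not simulating at all.
		return replicas
	}
	// Keep only replicas that have caught up to the commit index.
	filtered := replicas[:0:0]
	for _, r := range replicas {
		if pr, ok := status.Progress[r.ReplicaID]; ok && pr.Match >= status.Commit {
			filtered = append(filtered, r)
		}
	}
	return filtered
}
```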
Fixes cockroachdb#20241. Release note (bug fix): Avoid rebalance thrashing when localities have very different numbers of nodes.
(force-pushed from 1cbb256 to f01eb24)
I've tested this with all the multi-node allocsim configs as well as some other arbitrary nodes-per-locality configurations I made up. It didn't always end up super balanced within the small localities (due to #20751), but did its job of maintaining a rough balance and avoided thrashing in all of them. I'll merge this later tonight unless someone speaks up before then.
This is the minimal set of commits, unmodified other than to resolve merge conflicts, needed to fix #20241 on the release-1.1 branch.
This is larger than our typical cherrypick, and while the code is pretty well tested, it hasn't been fully tested without a few other allocator commits that are on the master branch. If @bdarnell and others think it's worth cherrypicking something like this, I can do some additional manual testing of this branch with allocsim and/or spinning indigo back up.