storage: make load-based replica rebalancing decisions at the store level #28852
f6900c0 to 0f13108
I take back what I said about this not working on tpc-c. As mentioned on Slack, the problem was that running with …
Results of a one-hour tpc-c 10k run on 30 nodes: …
The pMax is not ideal. I'm not sure what's the cause of the outlier(s). But otherwise this is a good sign that more tests/polish are justified.
but mostly just 👨‍🔬 🐶 on the StoreRebalancer changes. @BramGruneir do you mind taking a look at that?
Reviewed 6 of 6 files at r4, 1 of 1 files at r5, 1 of 1 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/allocator.go, line 383 at r4 (raw file):
existingReplicas: len(existing), aliveStores: aliveStoreCount, throttledStores: throttledStoreCount,
nit: always 0, right?
Thanks @nvanbenschoten. Waiting on @BramGruneir isn't ideal since he's out this week, but I'll at least wait until I have metrics/debugging hooked up before merging, and will definitely wait for his review before cherry-picking.
Reviewable status: complete! 1 of 0 LGTMs obtained
pkg/storage/allocator.go, line 383 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: always 0, right?
True, although I'd prefer to leave as is rather than assume it's always 0 in case the condition above ever gets removed.
eacd889 to c2a2041
Plans for improving …
Gave this another full read through.
LGTM
Reviewed 3 of 3 files at r1, 9 of 9 files at r2, 3 of 3 files at r3, 1 of 6 files at r4, 13 of 13 files at r7.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
Follow-up to cockroachdb#28340, which did this for just leases.
Fixes cockroachdb#17979
Release note (performance improvement): Range replicas will be automatically rebalanced throughout the cluster to even out the amount of QPS being handled by each node.
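The idea in the release note above, evening out QPS by moving replicas between stores, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `store`, `rangeLoad`, and `move` types, the `pickRebalances` and `coolest` helpers, and the threshold parameter are all hypothetical and are not taken from the PR's `StoreRebalancer` code.

```go
package main

import "sort"

// rangeLoad, store, and move are hypothetical types for this sketch only.
type rangeLoad struct {
	ID  int
	QPS float64
}

type store struct {
	ID     int
	Ranges []rangeLoad
}

// qps sums the queries per second of every range on the store.
func (s *store) qps() float64 {
	total := 0.0
	for _, r := range s.Ranges {
		total += r.QPS
	}
	return total
}

type move struct {
	RangeID, From, To int
}

// coolest returns the store currently serving the least QPS.
func coolest(stores []*store) *store {
	best := stores[0]
	for _, s := range stores[1:] {
		if s.qps() < best.qps() {
			best = s
		}
	}
	return best
}

// pickRebalances greedily moves ranges off any store whose QPS exceeds
// mean*(1+threshold), hottest range first, onto the least-loaded store.
// A move is skipped if it would push the target above the same ceiling.
func pickRebalances(stores []*store, threshold float64) []move {
	if len(stores) == 0 {
		return nil
	}
	total := 0.0
	for _, s := range stores {
		total += s.qps()
	}
	ceiling := total / float64(len(stores)) * (1 + threshold)

	var moves []move
	for _, src := range stores {
		// Hottest ranges first; ties broken by ID so the result is deterministic.
		sort.Slice(src.Ranges, func(a, b int) bool {
			ra, rb := src.Ranges[a], src.Ranges[b]
			if ra.QPS != rb.QPS {
				return ra.QPS > rb.QPS
			}
			return ra.ID < rb.ID
		})
		for i := 0; i < len(src.Ranges) && src.qps() > ceiling; {
			r := src.Ranges[i]
			dst := coolest(stores)
			if dst == src || dst.qps()+r.QPS > ceiling {
				i++ // moving this range would only shift the hotspot; try a cooler one
				continue
			}
			// Remove the range from src (index i now holds the next candidate).
			src.Ranges = append(src.Ranges[:i], src.Ranges[i+1:]...)
			dst.Ranges = append(dst.Ranges, r)
			moves = append(moves, move{RangeID: r.ID, From: src.ID, To: dst.ID})
		}
	}
	return moves
}
```

Note that in this sketch a single range hotter than the ceiling stays where it is, since moving it would just relocate the imbalance.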
This leaves properly cleaning up the code for later, but ensures that the existing cluster setting will enable store-level rebalancing rather than the old experimental write/disk-based rebalancing.
Release note: None
It's identical to the test for load-based lease rebalancing, just with more than 3 nodes such that replicas must be rebalanced in addition to leases in order for load to be properly spread across all nodes.
Release note: None
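Why more than 3 nodes forces replica movement can be illustrated with a small sketch, assuming 3-way replication: a lease can only be transferred to a store that already holds a replica of the range, so any store without one cannot take on that range's load until a replica is moved to it. The function names below are hypothetical and do not come from the test in this PR.

```go
package main

// storesWithoutReplica returns the stores that hold no replica of a range.
// Such stores cannot receive the range's lease, so they can only absorb its
// load after a replica is rebalanced onto them. Hypothetical helper.
func storesWithoutReplica(allStores, replicaStores []int) []int {
	hasReplica := make(map[int]bool, len(replicaStores))
	for _, id := range replicaStores {
		hasReplica[id] = true
	}
	var missing []int
	for _, id := range allStores {
		if !hasReplica[id] {
			missing = append(missing, id)
		}
	}
	return missing
}

// leaseTransfersSuffice reports whether every store could be reached by
// lease transfers alone for the given range. Hypothetical helper.
func leaseTransfersSuffice(allStores, replicaStores []int) bool {
	return len(storesWithoutReplica(allStores, replicaStores)) == 0
}
```

With three stores and three replicas every store is a valid lease target; with five stores, two of them are reachable only by moving replicas.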
This cleans up all the old code, settings, and tests without massively overhauling the structure of things. More could be done to simplify things, but this is the least intrusive set of changes that seem appropriate so late in the development cycle.
Release note (backwards-incompatible change): The experimental, non-recommended stat-based rebalancing setting controlled by the kv.allocator.stat_based_rebalancing.enabled and kv.allocator.stat_rebalance_threshold cluster settings has been removed and replaced by a new, better supported approach to load-based rebalancing that can be controlled via the new kv.allocator.load_based_rebalancing cluster setting. By default, leases will be rebalanced within a cluster to achieve better QPS balance.
63ace74 to 2e5c856
Release note: None
d8666a4 to 5bbc29e
TFTR!
bors r+
28852: storage: make load-based replica rebalancing decisions at the store level r=a-robinson a=a-robinson
Built on top of #28340, which is where the first 3 commits are from. This is still somewhat incomplete, in that it's missing unit tests and I'm only just now running tpc-c 10k on it. Sending out now to start the discussion of whether to include it in 2.1, since we're obviously very late in the intended development cycle.
Co-authored-by: Alex Robinson <[email protected]>
Build succeeded
I missed this when reworking the settings in cockroachdb#28852.
Fixes cockroachdb#29804
Fixes cockroachdb#29805
Release note: None
Built on top of #28340, which is where the first 3 commits are from. This is still somewhat incomplete, in that it's missing unit tests and I'm only just now running tpc-c 10k on it. Sending out now to start the discussion of whether to include it in 2.1, since we're obviously very late in the intended development cycle.