
storage: simplify scatter implementation #17644

Merged
merged 1 commit into cockroachdb:master from pmattis/scatter on Aug 15, 2017

Conversation

petermattis
Collaborator

Rather than using custom logic in Replica.adminScatter, call into
replicateQueue.processOneChange until no further action needs to be
taken. This fixes a major caveat of the prior implementation: replicas
were not removed synchronously, leading to conditions where an
arbitrary number of replicas could be added to a range via scattering.

Fixes #17612
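To make the new control flow concrete, here is a minimal, self-contained Go sketch of the approach described above. The function names, the retry limit, and the allowLeaseTransfer hand-off are simplified stand-ins for the real Replica.adminScatter / replicateQueue.processOneChange interaction, not the actual CockroachDB code:

```go
// Sketch of the new adminScatter loop: repeatedly hand the range to the
// replicate-queue logic until it reports that no further change is needed.
// Names and signatures are simplified stand-ins, not the real API.
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// processFn stands in for replicateQueue.processOneChange. requeue is true
// while the queue still has work to do for this range (replicas to add or
// remove).
type processFn func(ctx context.Context, canTransferLease func() bool) (requeue bool, err error)

// adminScatter loops until the replicate queue has nothing left to do for
// the range. Lease transfers are only allowed once the replica set has
// stabilized, mirroring the allowLeaseTransfer flag in the change.
func adminScatter(ctx context.Context, process processFn) error {
	allowLeaseTransfer := false
	canTransferLease := func() bool { return allowLeaseTransfer }

	const maxAttempts = 10 // hypothetical bound; the real code uses retry options
	for i := 0; i < maxAttempts; i++ {
		requeue, err := process(ctx, canTransferLease)
		if err != nil {
			return err
		}
		if !requeue {
			if allowLeaseTransfer {
				return nil // replica changes done and lease possibly shed
			}
			// Replica changes are done; allow one more pass that may
			// transfer the lease away.
			allowLeaseTransfer = true
		}
		time.Sleep(10 * time.Millisecond) // crude stand-in for retry backoff
	}
	return errors.New("scatter did not converge")
}

func main() {
	// Toy process function: pretend three changes are needed, then nothing.
	steps := 3
	err := adminScatter(context.Background(), func(ctx context.Context, canTransferLease func() bool) (bool, error) {
		if steps == 0 {
			return false, nil
		}
		steps--
		fmt.Printf("processed one change (lease transfer allowed: %v)\n", canTransferLease())
		return true, nil
	})
	fmt.Println("scatter finished, err =", err)
}
```

The key point is that replica additions and removals are driven synchronously by the same loop, so scattering can no longer pile extra replicas onto a range.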

@petermattis petermattis requested a review from a team August 15, 2017 00:58

@petermattis
Collaborator Author

All of the tests still pass, which makes me suspect that scatter isn't well tested.

@benesch I'm curious if you can run this on restore workloads to see if it has any negative impact. My suspicion is that it will be fine, but I'd prefer to be cautious at this time in the cycle.

@petermattis
Collaborator Author

I should mention that I've been testing this on sky, and it provides near-ideal scattering. The implementation also seems easier to understand conceptually: in effect, it temporarily provides a dedicated goroutine to the replicate queue for the specified replica.

@benesch
Contributor

benesch commented Aug 15, 2017 via email

@a-robinson
Contributor

LGTM. I like the simplicity of this much more than what was there before, but I'd wait on @benesch to test how it works for restore in practice (and perhaps double check what was causing problems in the original scatter implementation). This still suffers from the problems in #17341, but that's a feature more than a bug at this point in the release cycle.


Reviewed 2 of 2 files at r1.
Review status: :shipit: all files reviewed at latest revision, all discussions resolved, all commit checks successful.



@petermattis
Collaborator Author

This might actually speed up RESTORE, as now we'll only ever be restoring into 3 replicas, while previously we could be restoring into 6. 🤞

@benesch
Contributor

benesch commented Aug 15, 2017

Running restore tests now!

@benesch
Contributor

benesch commented Aug 15, 2017

Well, this is odd:

[screenshot 2017-08-15 11:02:17]

@petermattis
Collaborator Author

@benesch Can you try running again with `SET CLUSTER SETTING kv.allocator.stat_based_rebalancing.enabled = false`?

@benesch
Contributor

benesch commented Aug 15, 2017

For posterity, the whole thing took 28m. It usually takes about 15m, I think. Here's the final state:

[screenshot 2017-08-15 11:23:06]

Rerunning without stats rebalancing now.

@benesch
Contributor

benesch commented Aug 15, 2017

Bingo. Though now I don't know whether to be happy or worried.

[screenshot 2017-08-15 11:54:55]

If you don't mind, I want to run two controls before stamping this.

@petermattis
Collaborator Author

Yep, take your time.

@a-robinson This is evidence towards #17645 or that stats-based rebalancing still needs tuning.

@bdarnell
Contributor

:lgtm:


Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/storage/replica_command.go, line 3995 at r1 (raw file):

	var allowLeaseTransfer bool
	for re := retry.StartWithCtx(ctx, retryOpts); re.Next(); {
		canTransferLease := func() bool { return false }

You can do `canTransferLease := func() bool { return allowLeaseTransfer }` outside the loop (but I guess that forces a heap allocation? If that's significant here, add a comment so someone doesn't try to simplify it in the future).
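For illustration, a tiny standalone Go snippet (not the PR's code) showing why the closure can be hoisted: it captures `allowLeaseTransfer` by reference, so flipping the flag later is still visible to calls made inside the loop.

```go
package main

import "fmt"

func main() {
	// The closure is created once, outside the loop, but it reads the
	// current value of allowLeaseTransfer on every call. Capturing the
	// variable may force it to escape to the heap, which is the
	// allocation Ben mentions.
	var allowLeaseTransfer bool
	canTransferLease := func() bool { return allowLeaseTransfer }

	for i := 0; i < 3; i++ {
		if i == 2 {
			allowLeaseTransfer = true // flag change is visible to the closure
		}
		fmt.Println(i, canTransferLease()) // prints: 0 false, 1 false, 2 true
	}
}
```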



@petermattis
Collaborator Author

Review status: 1 of 2 files reviewed at latest revision, 1 unresolved discussion.


pkg/storage/replica_command.go, line 3995 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

You can do `canTransferLease := func() bool { return allowLeaseTransfer }` outside the loop (but I guess that forces a heap allocation? If that's significant here, add a comment so someone doesn't try to simplify it in the future).

Clever. I can't imagine the heap allocation is problematic here. Changed.



@benesch
Contributor

benesch commented Aug 15, 2017

OK, this seems to match TPCH10 on a recent master (301e5fc) without stats-based rebalancing:

[screenshot 2017-08-15 12:50:10]

Things seem really broken with stats-based rebalancing:

[screenshot 2017-08-15 13:20:53]

In both cases the restore took about 22m. Turns out there's a default cluster setting limiting the RESTORE write rate to 30MB/s, which is why these aren't taking the 15m I was expecting.

So the new scatter seems as good as the old one. LGTM! 🎉

@benesch
Contributor

benesch commented Aug 15, 2017

@petermattis, we should shrink the test you've been running on sky and turn it into an acceptance test for scatter.

@petermattis
Collaborator Author

@benesch Yes, scatter needs better testing. The test would essentially be `kv --concurrency 512 --splits 1000000 --max-ops 1`. The question is how to determine whether scatter worked well. I suppose we could look at the number of replicas vs. ranges and the balance across the cluster.
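Purely as an illustration of what such a check could look like (not something in this PR), here is a small Go sketch that computes two hypothetical metrics from per-store replica counts: over-replication relative to the replication factor, and imbalance across stores. The counts and constants are made up for the example.

```go
// Illustrative only: one way to quantify "scatter worked well", assuming we
// can obtain per-store replica counts and the total range count from the
// cluster. Not part of this PR.
package main

import "fmt"

func main() {
	const replicationFactor = 3
	totalRanges := 1000                                // hypothetical: ranges created by --splits
	replicasPerStore := []int{620, 590, 610, 580, 600} // hypothetical per-store counts

	totalReplicas, max := 0, 0
	for _, n := range replicasPerStore {
		totalReplicas += n
		if n > max {
			max = n
		}
	}

	// Over-replication: with synchronous removal this should stay close to 1.0.
	overReplication := float64(totalReplicas) / float64(replicationFactor*totalRanges)

	// Imbalance: largest store count relative to the mean; 1.0 is a perfect spread.
	mean := float64(totalReplicas) / float64(len(replicasPerStore))
	imbalance := float64(max) / mean

	fmt.Printf("over-replication=%.2f imbalance=%.2f\n", overReplication, imbalance)
}
```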

@petermattis petermattis merged commit 779d80e into cockroachdb:master Aug 15, 2017
@petermattis petermattis deleted the pmattis/scatter branch August 15, 2017 17:56
@benesch
Contributor

benesch commented Aug 15, 2017

Yeah, I think the criteria for success are actually straightforward to measure. I'll file an issue.

@benesch
Contributor

benesch commented Aug 15, 2017

Never mind. I just saw #17660.

Successfully merging this pull request may close these issues.

storage: 17 replicas on a range configured for 3