
storage: Fix handling of convergesScore in selection of best candidates #23036

Merged · 1 commit · Feb 26, 2018

Conversation

a-robinson
Contributor

Also add testing of rebalancing in clusters with differently sized
localities as a follow-up to #22412.

Release note (bug fix): Fix the occasional selection of sub-optimal
rebalance targets.


This should be cherry-picked to release-2.0 due to the bug I found in (candidateList).best() while adding the test. I'm pretty shocked that the bug has been there for so long without being hit by any test case, or noticed given how unusual it looks.

The new test fails miserably on release-1.1 (see below), but reliably succeeds now due to #22412.

For fun, the output of running this against release-1.1:

--- FAIL: TestAllocatorRebalanceDifferentLocalitySizes (0.01s)
	allocator_test.go:2685: 1: RebalanceTarget([(n2,s2):? (n3,s3):? (n5,s5):?]) expected s1; got <nil>:
	allocator_test.go:2685: 2: RebalanceTarget([(n1,s1):? (n4,s4):? (n5,s5):?]) expected s3; got <nil>:
	allocator_test.go:2685: 4: RebalanceTarget([(n1,s1):? (n5,s5):? (n6,s6):?]) expected s3; got <nil>:
	allocator_test.go:2685: 5: RebalanceTarget([(n2,s2):? (n5,s5):? (n6,s6):?]) expected s3; got <nil>:
	allocator_test.go:2685: 6: RebalanceTarget([(n3,s3):? (n5,s5):? (n6,s6):?]) expected s1; got <nil>:
	allocator_test.go:2685: 7: RebalanceTarget([(n4,s4):? (n5,s5):? (n6,s6):?]) expected s1; got <nil>:
	allocator_test.go:2752: 3: RebalanceTarget([(n1,s1):? (n3,s3):? (n6,s6):?]) expected s9; got store_id:5 attrs:<> node:<node_id:5 address:<network_field:"" address_field:"" > attrs:<> locality:<tiers:<key:"locale" value:"3" > > ServerVersion:<major_val:0 minor_val:0 patch:0 unstable:0 > > capacity:<capacity:100 available:90 used:0 logical_bytes:10 range_count:10 lease_count:0 writes_per_second:0 bytes_per_replica:<p10:0 p25:0 p50:0 p75:0 p90:0 > writes_per_replica:<p10:0 p25:0 p50:0 p75:0 p90:0 > > : {"Target":"s5, valid:true, constraint:1.00, converges:1, balance:1.00(ranges=1, bytes=0.00, writes=0.00), rangeCount:10, logicalBytes:10 B, writesPerSecond:0.00, details:(diversity=1.00, preferred=0)","Existing":"[\ns6, valid:true, constraint:1.00, converges:1, balance:1.00(ranges=1, bytes=0.00, writes=0.00), rangeCount:20, logicalBytes:20 B, writesPerSecond:0.00, details:(diversity=1.00, preferred=0)\ns1, valid:true, constraint:1.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:50, logicalBytes:50 B, writesPerSecond:0.00, details:(diversity=1.00, preferred=0)\ns3, valid:true, constraint:1.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:50, logicalBytes:50 B, writesPerSecond:0.00, details:(diversity=1.00, preferred=0)]","RangeBytes":0,"RangeWritesPerSecond":0}
	allocator_test.go:2752: 8: RebalanceTarget([(n5,s5):? (n6,s6):? (n7,s7):?]) expected s9; got <nil>:
	allocator_test.go:2752: 13: RebalanceTarget([(n1,s1):? (n5,s5):? (n10,s10):?]) expected s0; got store_id:9 attrs:<> node:<node_id:9 address:<network_field:"" address_field:"" > attrs:<> locality:<tiers:<key:"locale" value:"4" > > ServerVersion:<major_val:0 minor_val:0 patch:0 unstable:0 > > capacity:<capacity:100 available:70 used:0 logical_bytes:30 range_count:30 lease_count:0 writes_per_second:0 bytes_per_replica:<p10:0 p25:0 p50:0 p75:0 p90:0 > writes_per_replica:<p10:0 p25:0 p50:0 p75:0 p90:0 > > : {"Target":"s9, valid:true, constraint:1.00, converges:1, balance:1.00(ranges=1, bytes=0.00, writes=0.00), rangeCount:30, logicalBytes:30 B, writesPerSecond:0.00, details:(diversity=1.00, preferred=0)","Existing":"[\ns5, valid:true, constraint:1.00, converges:1, balance:1.00(ranges=1, bytes=0.00, writes=0.00), rangeCount:10, logicalBytes:10 B, writesPerSecond:0.00, details:(diversity=1.00, preferred=0)\ns10, valid:true, constraint:1.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:40, logicalBytes:40 B, writesPerSecond:0.00, details:(diversity=1.00, preferred=0)\ns1, valid:true, constraint:1.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:50, logicalBytes:50 B, writesPerSecond:0.00, details:(diversity=1.00, preferred=0)]","RangeBytes":0,"RangeWritesPerSecond":0}
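
For reference, the fixed prefix check in (candidateList).best() ends up with roughly the following shape. This is a minimal, runnable sketch with stand-in types, not the committed code; the real candidate struct and candidate.less live in pkg/storage/allocator_scorer.go, and the field types here are assumptions for illustration:

```go
package main

import "fmt"

// Stand-ins for the allocator types; field names follow the review
// discussion below, field types are assumptions for illustration.
type candidate struct {
	necessary      bool
	diversityScore float64
	convergesScore int
	rangeCount     int
}

type candidateList []candidate

// best returns the prefix of candidates that tie the top candidate on
// necessary, diversityScore, and convergesScore. It requires the input
// to already be sorted best-first (by candidate.less in the real code).
func (cl candidateList) best() candidateList {
	if len(cl) <= 1 {
		return cl
	}
	for i := 1; i < len(cl); i++ {
		if cl[i].necessary == cl[0].necessary &&
			cl[i].diversityScore == cl[0].diversityScore &&
			cl[i].convergesScore == cl[0].convergesScore {
			continue
		}
		return cl[:i]
	}
	return cl
}

func main() {
	cl := candidateList{
		{necessary: true, diversityScore: 1, convergesScore: 1, rangeCount: 10},
		{necessary: true, diversityScore: 1, convergesScore: 1, rangeCount: 20},
		{necessary: true, diversityScore: 1, convergesScore: 0, rangeCount: 50},
	}
	// The first two candidates tie on all three fields despite different
	// range counts, so best() keeps both; the third drops out on
	// convergesScore.
	fmt.Println(len(cl.best())) // 2
}
```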

@a-robinson requested review from BramGruneir, petermattis and a team on February 24, 2018 at 04:18

@petermattis
Collaborator

Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


pkg/storage/allocator_scorer.go, line 312 at r1 (raw file):

		if cl[i].necessary != cl[0].necessary ||
			cl[i].diversityScore < cl[0].diversityScore ||
			cl[i].convergesScore < cl[0].convergesScore {

Could/should we be using candidate.less() here?

Also, I think the previous code was trying to handle the case where cl[i].diversityScore > cl[0].diversityScore, but this code doesn't. Is that a problem?



@a-robinson
Contributor Author

Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


pkg/storage/allocator_scorer.go, line 312 at r1 (raw file):

Could/should we be using candidate.less() here?

No. This function requires that its input is already sorted by candidate.less, so there isn't much to be learned by calling candidate.less() here. We could call candidate.compare() with some threshold that its return value must exceed, but that threshold would just be a magic number and would make the code less readable.

The goal of this function is to pick out all the candidates that are the "best", where "best" includes some wiggle room: not just the absolute single best store, but also others that are similarly good yet may have a few more replicas on them. If every rebalance action deterministically chose the absolute single best store, then cluster-wide rebalancing would get bottlenecked on that store (and would have overwhelmed it before snapshot rate limiting was added). Instead, we randomly choose from among the best in candidateList.selectGood() to spread out where snapshots are going.

Also, I think the previous code was trying to handle the case where cl[i].diversityScore > cl[0].diversityScore, but this code doesn't. Is that a problem?

It isn't possible for cl[i].diversityScore > cl[0].diversityScore unless cl[i].necessary == false and cl[0].necessary == true, in which case we would have already returned earlier due to the necessary condition. And before the necessary code was added, it would have been completely impossible for cl[i].diversityScore > cl[0].diversityScore due to the required sorted order of the input.
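
As a rough illustration of that flow, reusing the stand-in types from the sketch above (the real selectGood takes an allocatorRand and its selection logic is more involved; this is a simplified sketch, not the actual implementation):

```go
import "math/rand"

// selectGood narrows the sorted list to the "best" prefix and then
// picks one of those candidates at random, so repeated rebalance
// decisions spread their snapshots across several similarly good
// stores instead of piling onto a single one.
func (cl candidateList) selectGood(r *rand.Rand) *candidate {
	if len(cl) == 0 {
		return nil
	}
	best := cl.best()
	return &best[r.Intn(len(best))]
}
```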



@petermattis
Collaborator

:lgtm:


Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


pkg/storage/allocator_scorer.go, line 312 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Could/should we be using candidate.less() here?

No. This function requires that its input is already sorted by candidate.less, so there isn't much to be learned by calling candidate.less() here. We could call candidate.compare() with some threshold that its return value must exceed, but that threshold would just be a magic number and would make the code less readable.

The goal of this function is to pick out all the candidates that are the "best", where "best" includes some wiggle room: not just the absolute single best store, but also others that are similarly good yet may have a few more replicas on them. If every rebalance action deterministically chose the absolute single best store, then cluster-wide rebalancing would get bottlenecked on that store (and would have overwhelmed it before snapshot rate limiting was added). Instead, we randomly choose from among the best in candidateList.selectGood() to spread out where snapshots are going.

Also, I think the previous code was trying to handle the case where cl[i].diversityScore > cl[0].diversityScore, but this code doesn't. Is that a problem?

It isn't possible for cl[i].diversityScore > cl[0].diversityScore unless cl[i].necessary == false and cl[0].necessary == true, in which case we would have already returned earlier due to the necessary condition. And before the necessary code was added, it would have been completely impossible for cl[i].diversityScore > cl[0].diversityScore due to the required sorted order of the input.

Ok, I guess the inequality comparisons are confusing to read. Would it be equivalent to structure this loop body as:

if cl[i].necessary == cl[0].necessary &&
  cl[i].diversityScore == cl[0].diversityScore &&
  cl[i].convergesScore == cl[0].convergesScore {
  continue
}
return cl[:i]

This would make it clearer to me that we're trying to find the prefix that is equal on necessary, diversityScore and convergesScore.



@a-robinson
Contributor Author

TFTR! I've also deflaked the two flaky test cases by slightly changing the test structure.


Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion.


pkg/storage/allocator_scorer.go, line 312 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Ok, I guess the inequality comparisons are confusing to read. Would it be equivalent to structure this loop body as:

if cl[i].necessary == cl[0].necessary &&
  cl[i].diversityScore == cl[0].diversityScore &&
  cl[i].convergesScore == cl[0].convergesScore {
  continue
}
return cl[:i]

This would make it clearer to me that we're trying to find the prefix that is equal on necessary, diversityScore and convergesScore.

Sure, done. Also cleaned up candidateList.worst() similarly.
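
For reference, the analogous worst() restructuring would look roughly like this, again with the stand-in types from the earlier sketch (the real function also special-cases invalid candidates first; this is a sketch, not the committed code):

```go
// worst mirrors best() from the back of the list: walk backward and
// keep the suffix that ties the last candidate on the same three
// fields, i.e. the candidates that are all equally bad.
func (cl candidateList) worst() candidateList {
	if len(cl) <= 1 {
		return cl
	}
	last := len(cl) - 1
	for i := last - 1; i >= 0; i-- {
		if cl[i].necessary == cl[last].necessary &&
			cl[i].diversityScore == cl[last].diversityScore &&
			cl[i].convergesScore == cl[last].convergesScore {
			continue
		}
		return cl[i+1:]
	}
	return cl
}
```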



@petermattis
Collaborator

:lgtm:


Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


