Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-23.1: server: fix flaky drain test under race #99703

Merged
merged 1 commit into from
Mar 31, 2023

Conversation

blathers-crl[bot]
Copy link

@blathers-crl blathers-crl bot commented Mar 27, 2023

Backport 1/1 commits from #99543 on behalf of @AlexTalks.

/cc @cockroachdb/release


While previously TestDrain would issue a drain request twice, and expect that after the second drain request there would be no remaining leases, we have seen in some race builds that a lease extension can occur before that second drain, leaving one lease remaining after the second drain request. This can be seen in the following log example:

I230325 00:39:18.151604 14728 1@server/drain.go:145 ⋮ [T1,n1] 383  drain request received with doDrain = true, shutdown = false
...
I230325 00:39:18.155547 986 kv/kvserver/replica_proposal.go:272 ⋮ [T1,n1,s1,r51/1:‹/Table/5{0-1}›,raft] 385  new range lease repl=(n1,s1):1 seq=1 start=0,0 exp=1679704764.152223164,0 pro=1679704758.152223164,0 following repl=(n1,s1):1 seq=1 start=0,0 exp=1679704746.135729956,0 pro=1679704740.135729956,0
I230325 00:39:18.172450 14728 1@server/drain.go:399 ⋮ [T1,n1] 386  (DEBUG) initiating kvserver node drain
I230325 00:39:18.172613 14728 1@kv/kvserver/store.go:1559 ⋮ [T1,drain,n1,s1] 387  (DEBUG) store marked as draining
I230325 00:39:18.182123 14728 1@server/drain.go:293 ⋮ [T1,n1] 388  drain remaining: 1
I230325 00:39:18.182249 14728 1@server/drain.go:295 ⋮ [T1,n1] 389  drain details: range lease iterations: 1
I230325 00:39:18.182404 14728 1@server/drain.go:175 ⋮ [T1,n1] 390  drain request completed without server shutdown

This change modifies the test to repeatedly issue drain requests until there is no remaining work, allowing the drain to complete upon subsequent requests.

Fixes: #86974

Release note: None


Release justification: Fix and unskip flaky test.

@blathers-crl blathers-crl bot requested review from a team as code owners March 27, 2023 17:56
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-23.1-99543 branch from ff498ad to d71e8f0 Compare March 27, 2023 17:56
@blathers-crl blathers-crl bot requested a review from knz March 27, 2023 17:56
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-23.1-99543 branch from 64abfd0 to 585706d Compare March 27, 2023 17:56
@blathers-crl
Copy link
Author

blathers-crl bot commented Mar 27, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@blathers-crl blathers-crl bot added blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. labels Mar 27, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@AlexTalks AlexTalks force-pushed the blathers/backport-release-23.1-99543 branch from 585706d to dd2184e Compare March 28, 2023 23:59
@AlexTalks AlexTalks force-pushed the blathers/backport-release-23.1-99543 branch from dd2184e to b09a5b2 Compare March 29, 2023 15:26
While previously `TestDrain` would issue a drain request twice, and
expect that after the second drain request there would be no remaining
leases, we have seen in some race builds that a lease extension can
occur before that second drain, leaving one lease remaining after the
second drain request. This can be seen in the following log example:
```
I230325 00:39:18.151604 14728 1@server/drain.go:145 ⋮ [T1,n1] 383  drain request received with doDrain = true, shutdown = false
...
I230325 00:39:18.155547 986 kv/kvserver/replica_proposal.go:272 ⋮ [T1,n1,s1,r51/1:‹/Table/5{0-1}›,raft] 385  new range lease repl=(n1,s1):1 seq=1 start=0,0 exp=1679704764.152223164,0 pro=1679704758.152223164,0 following repl=(n1,s1):1 seq=1 start=0,0 exp=1679704746.135729956,0 pro=1679704740.135729956,0
I230325 00:39:18.172450 14728 1@server/drain.go:399 ⋮ [T1,n1] 386  (DEBUG) initiating kvserver node drain
I230325 00:39:18.172613 14728 1@kv/kvserver/store.go:1559 ⋮ [T1,drain,n1,s1] 387  (DEBUG) store marked as draining
I230325 00:39:18.182123 14728 1@server/drain.go:293 ⋮ [T1,n1] 388  drain remaining: 1
I230325 00:39:18.182249 14728 1@server/drain.go:295 ⋮ [T1,n1] 389  drain details: range lease iterations: 1
I230325 00:39:18.182404 14728 1@server/drain.go:175 ⋮ [T1,n1] 390  drain request completed without server shutdown
```
This change modifies the test to repeatedly issue drain requests until
there is no remaining work, allowing the drain to complete upon
subsequent requests.

Fixes: #86974

Release note: None
@AlexTalks AlexTalks force-pushed the blathers/backport-release-23.1-99543 branch from b09a5b2 to 9d8ea26 Compare March 29, 2023 23:08
@rafiss
Copy link
Collaborator

rafiss commented Mar 31, 2023

I'm going to go ahead and merge this since it's a test-only change and fixes a flake.

@rafiss rafiss merged commit d6d7367 into release-23.1 Mar 31, 2023
@rafiss rafiss deleted the blathers/backport-release-23.1-99543 branch March 31, 2023 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants