release-21.1: kv: deflake multi-node kvnemesis #63403

Merged


nvanbenschoten

Backport 3/3 commits from #62580.

/cc @cockroachdb/release


Fixes #61322.
Prep to address #59062.

kv: don't consider Subsume error as assertion failures

These assertions were added in b15e3dd, but that was incorrect. It is valid for a merge that races with another merge or split to hit these cases, as the Subsume request is non-transactional.

Also, allow `RHS range bounds do not match` errors in kvnemesis. This is a valid error for the reasons mentioned above.

kv: mark error from Replica.GetSnapshot with errMarkSnapshotError

Without it, we see the following errors in kvnemesis:

Wraps: (2) error applying x.AdminChangeReplicas(ctx, /Table/50/"244956da", [{ADD_VOTER n2,s2}]) // [n1,s1,r36/1:/Table/{40-50/"ba86b…}]: failed to generate LEARNER_INITIAL snapshot: couldn't find range descriptor
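To illustrate what marking an error buys here, the following is a minimal sketch using only the Go standard library. It is not the code from this PR: `errSnapshot`, `markedError`, `mark`, `getSnapshot`, and `isSnapshotError` are hypothetical names, and the real change uses the `errMarkSnapshotError` marker rather than this hand-rolled type. The idea is that a caller can recognize the failure by its mark instead of by matching the message text.

```go
package main

import (
	"errors"
	"fmt"
)

// errSnapshot is a hypothetical sentinel standing in for errMarkSnapshotError.
var errSnapshot = errors.New("snapshot failed")

// markedError attaches a sentinel "mark" to an underlying error without
// changing the error message.
type markedError struct {
	cause error
	mark  error
}

func (e *markedError) Error() string        { return e.cause.Error() }
func (e *markedError) Unwrap() error        { return e.cause }
func (e *markedError) Is(target error) bool { return target == e.mark }

// mark wraps err so that errors.Is(err, ref) reports true. A nil err stays nil.
func mark(err, ref error) error {
	if err == nil {
		return nil
	}
	return &markedError{cause: err, mark: ref}
}

// getSnapshot is a hypothetical stand-in for a snapshot call that fails
// because the range descriptor is gone (for example, after a merge).
func getSnapshot() error {
	return mark(errors.New("couldn't find range descriptor"), errSnapshot)
}

// isSnapshotError reports whether err, or anything it wraps, carries the mark.
func isSnapshotError(err error) bool {
	return errors.Is(err, errSnapshot)
}

func main() {
	// Further wrapping by callers does not hide the mark.
	err := fmt.Errorf("failed to generate LEARNER_INITIAL snapshot: %w", getSnapshot())
	fmt.Println(err)
	fmt.Println("snapshot error:", isSnapshotError(err))
	fmt.Println("snapshot error:", isSnapshotError(errors.New("unrelated failure")))
}
```

With the mark in place, a caller such as a test harness can classify the failure as an expected snapshot error even after other layers have wrapped it.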

kv: allow two more errors with TransferLeaseOperation

This deflakes TestKVNemesisMultiNode. The test had recently gotten flakier, which I bisected back to 3fe1992. It appears that the changes in that commit made it more likely to hit the `replica not found in RangeDescriptor` error, which is returned after a TransferLease request has acquired latches. This is in addition to the existing `unable to find store \d+ in range` error, which is returned before a TransferLease request has acquired latches.
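As a rough sketch of the allowlisting idea, the two error patterns quoted above can be checked with regular expressions. This is not the actual kvnemesis validator: `allowedTransferLeaseErrs` and `isAllowedTransferLeaseErr` are hypothetical names, and the sample messages in `main` are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// allowedTransferLeaseErrs holds the two patterns quoted in the text above.
// The variable name is hypothetical.
var allowedTransferLeaseErrs = []*regexp.Regexp{
	// Returned after the TransferLease request has acquired latches.
	regexp.MustCompile(`replica not found in RangeDescriptor`),
	// Returned before the TransferLease request has acquired latches.
	regexp.MustCompile(`unable to find store \d+ in range`),
}

// isAllowedTransferLeaseErr reports whether msg matches any allowed pattern.
func isAllowedTransferLeaseErr(msg string) bool {
	for _, re := range allowedTransferLeaseErrs {
		if re.MatchString(msg) {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative messages only; they are not copied from real logs.
	msgs := []string{
		"unable to find store 2 in range",
		"replica not found in RangeDescriptor",
		"context deadline exceeded",
	}
	for _, msg := range msgs {
		fmt.Printf("%q allowed=%t\n", msg, isAllowedTransferLeaseErr(msg))
	}
}
```

Matching on a pattern rather than an exact string lets the store ID vary between runs while still rejecting unrelated failures.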

@cockroach-teamcity

This change is Reviewable

@nvanbenschoten nvanbenschoten merged commit b8cb256 into cockroachdb:release-21.1 Apr 9, 2021
@nvanbenschoten nvanbenschoten deleted the backport21.1-62580 branch April 19, 2021 23:36