kvserver: test flakes with cannot remove learner while snapshot is in flight #98672
Comments
I'm looking through the code for this, and the method
There is a risk that if there was a concurrent descriptor change, we shouldn’t proceed with our removal of learners. There is a function That said, this code is a little bit of a mess, with many repeated calls to the same checks. An @kvoli we should probably do a walkthrough of this code at some point for 23.2 and figure out how to best clean this up.
#99099 is now in the bors queue. I will leave this issue alone, but we could entertain rolling back some of the ad-hoc mitigations that have occurred so far before closing.
Going to close this out, we've implemented the changes we wanted.
There are multiple tests flaking due to this error:
The error is created when an admin command runs into learners, attempts to remove them and discovers there is a snapshot in flight to the learner.
This error commonly occurs when there are concurrent range modification commands being issued, such as by the replicate queue, store rebalancer and possibly the test itself.
This issue tracks these test flakes and aims to determine a solution that makes tests more robust to this problem, by updating either the code itself or the tests.
Flakes:
Jira issue: CRDB-25477