kvcoord: flake in TestMultiRangeScanReverseScanInconsistent #91856
Comments
91840: server: remove unused apiv2 server field r=knz a=dhartunian Epic: None Resolves: #91829 Release note: None 91857: skip flaky tests r=andrewbaptist a=knz Informs #91856 Informs #91858 Co-authored-by: David Hartunian <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]>
Hi @nvanbenschoten, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
I can still hit this on master (
I'm going to need some help on this from the storage team. I added a ton of debugging statements through the stack to try and figure out why my writes are getting "lost" and created a more straightforward test to reproduce this https://github.com/andrewbaptist/cockroach/tree/23.03.15-pebble_invalid From what I can tell we are putting the key correctly and see this log:
However, it fails with
Note that the scan time is greater than the put time, however, it never even gets the item back from pebble. This feels like a pebble bug, but I haven't created a non-KV test to reproduce this. Fortunately, the test reproduces the issue quickly. I haven't bisected to figure out when this problem started. |
@andrewbaptist nice job getting to a minimal reproduction. It's interesting that we see this flake when issuing an |
I verified that changing the knob does fix the issue. I don't fully understand what that knob does or why we would need it in tests. This seems like a real linearizability violation, in the sense that a write is missed by a later scan (issued at a later time, with a later timestamp). Maybe we can chat more on this tomorrow. Also, note that the Put is called before the later scan, and we never go directly through the engine, these are all |
A read request at the READ_UNCOMMITTED or INCONSISTENT level is not guaranteed to return the latest data. This is especially true after async raft, where a put operation on the leaseholder returns once the data is appended to raft but before it is applied to the state machine. This change adds retries to the scan to ensure that the data does get there eventually and the scan still returns the correct results. Epic: none Fixes: cockroachdb#91856
Noticed the storage tag—Is there something for storage to look into here, or do we know what's going on? |
No - sorry - I will remove it - I thought it was a storage bug first when I didn't realize the difference between raft append and log application. |
97845: sql: add RevertSpans and RevertSpansFanout r=msbutler a=stevendanna This PR adds RevertSpans and RevertSpansFanout. RevertSpans de-duplicates some code between import rollback and streaming cutover. Along the way, it fixes a small bug that existed in RevertTables: only a single RevertRangeRequest is permitted in a batch, since RevertRangeRequest has the isAlone flag set. As a result, a future caller of RevertTables would have encountered a fatal error from KV. RevertSpansFanout uses DistSQL's PartitionSpans to manually fan out multiple RevertRange requests. Since all users of the revert range request currently set a limit on the number of keys touched, dist sender doesn't fan out such requests. Epic: none Release note: None
98775: kvserver: add test retries for INCONSISTENT scan r=andrewbaptist a=andrewbaptist A read request at the READ_UNCOMMITTED or INCONSISTENT level is not guaranteed to return the latest data. This is especially true after async raft, where a put operation on the leaseholder returns once the data is appended to raft but before it is applied to the state machine. This change adds retries to the scan to ensure that the data does get there eventually and the scan still returns the correct results. Epic: none Fixes: #91856
Co-authored-by: Steven Danna <[email protected]> Co-authored-by: Andrew Baptist <[email protected]>
Also found here: https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_UnitTests_BazelUnitTests/7504394?buildTab=overview&showRootCauses=false&expandBuildProblemsSection=true&expandBuildTestsSection=true&expandBuildChangesSection=true&expandBuildDeploymentsSection=true#%2Ftmp
cc @nvanbenschoten for triage
Jira issue: CRDB-21459