storage: don't crash on new seqno error from rocks ingest #36688
See my comment in #36679. If that looks reasonable to you, I'd want to include that scenario in a comment here.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajkr and @petermattis)
@ajkr and @bdarnell How do you feel about this approach vs a sentinel file that indicates an external SST has already been ingested? The bulk IO folks are asking for someone on Core to take over this change as they want to focus on testing. This is high priority as we need to get some band-aid into the 19.1 release.
I think I like this approach better than the sentinel file, since the sentinel file introduces additional cleanup concerns. The main thing that worried me about the error matching is that it might result in hiding true errors, but on further thought that's not really a problem because we fall back to copying, and if the file is really the problem we'll fail the second time and won't hide the error.
Just add a comment about the scenario in which this matters and the fact that it's safe to err on the side of swallowing the error because the second ingestion attempt will catch it if it's a real error.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajkr)
I would've leaned slightly towards the sentinel file to avoid the string-matching dependency between here and RocksDB. Given that fixing the root cause appears to require major changes in RocksDB (evicting files deleted by compaction and tracking a unique ID in the manifest), I don't think it'll be fixed anytime soon. So this band-aid may last a while, and RocksDB can change their error messages as they wish.
Good point. Let me be more specific then: I prefer the error matching fix for 19.1, with a more robust solution to come in 19.2.
Or you could attempt the copy approach more broadly, like on any
If rocks has already compacted our file away, the link count might not be >1 but it could still reject repeated ingestion. We can just fall back to the copy, and any real error will be surfaced when we try to ingest it.
Release note: none.
I updated the comment and commit message -- ready for another look.
Understood.
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained
LGTM
bors r+
Build failed (retrying...)
Build failed
bors r+
36688: storage: don't crash on new seqno error from rocks ingest r=dt a=dt
storage: don't crash on new seqno error from rocks ingest
If rocks has already compacted our file away, the link count might not be >1 but it could still reject repeated ingestion. We can just fall back to the copy, and any real error will be surfaced when we try to ingest it.
Release note: none.
Co-authored-by: David Taylor <[email protected]>
Build succeeded
storage: don't crash on new seqno error from rocks ingest
If rocks has already compacted our file away, the link count might not
be >1 but it could still reject repeated ingestion. We can just fall
back to the copy, and any real error will be surfaced when we try to
ingest it.
Release note: none.