kvserver: reduce `SysBytes` MVCC stats race during merges #99017
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
This will be racy too, because we can still apply a lease request concurrently with the subsume request, which will affect the in-memory stats. Will think about something better.
During a range merge, we subsume the RHS and ship its MVCC stats via the merge trigger to add them to the LHS stats. Since the RHS range ID-local keys aren't present in the merged range, the merge trigger computed these and subtracted them from the given stats. However, this could race with a lease request, which ignores latches and writes to the range ID-local keyspace, resulting in incorrect `SysBytes` MVCC stats.

This patch instead computes the range ID-local MVCC stats during subsume and sends them via a new `RangeIDLocalMVCCStats` field. This still doesn't guarantee that they're consistent with the RHS's in-memory stats, since the latch-ignoring lease request can update these independently of the subsume request's engine snapshot. However, it substantially reduces the likelihood of this race.

While it would be possible to prevent this race entirely by introducing additional synchronization between lease requests and merge application, this would likely come with significant additional complexity, which doesn't seem worth it just to avoid `SysBytes` being a few bytes wrong. The main fallout is a log message when the consistency checker detects the stats mismatch, and potential test flakes. This PR therefore settles for best-effort prevention.

Epic: none
Release note: None
Decided to live with the race for now, and settle for significantly reducing the odds of it happening.
Did 50,000 runs over 12 hours overnight that only exercised splits and merges, with no failures. Previously, it would flake within 10-20 minutes. So this seems like a substantial improvement.
LGTM. I get that this reduces the likelihood of the race, but I'm not sure why. Could you explain a bit why computing the stats in subsume is better than in the merge trigger? Does this reduce the window of time within which the race can happen?
bors r+
Yes, exactly. During subsume evaluation, the race window is between when we acquire the engine read snapshot and when we fetch the in-memory MVCC stats. During the merge trigger, we additionally have to wait for the subsume command to be replicated to and applied on all replicas, and then for the final merge commit request to be evaluated.
Build succeeded.
Should this be backported to 23.1?
No, the risk/reward isn't justifiable. We did #99244 instead.
Resolves #93896.
Resolves #94876.
Resolves #99010.