
kvserver: reduce SysBytes MVCC stats race during merges #99017

Merged 1 commit into cockroachdb:master on Mar 27, 2023

Conversation

erikgrinaker
Contributor

@erikgrinaker erikgrinaker commented Mar 20, 2023

During a range merge, we subsume the RHS and ship its MVCC stats via the merge trigger to add them to the LHS stats. Since the RHS range ID-local keys aren't present in the merged range, the merge trigger would compute these and subtract them from the given stats. However, this could race with a lease request, which ignores latches and writes to the range ID-local keyspace, resulting in incorrect SysBytes MVCC stats.

This patch instead computes the range ID-local MVCC stats during subsume and sends them via a new RangeIDLocalMVCCStats field. This still doesn't guarantee that they're consistent with the RHS's in-memory stats, since the latch-ignoring lease request can update these independently of the subsume request's engine snapshot. However, it substantially reduces the likelihood of this race.
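A minimal sketch of the change, using hypothetical pared-down types (not the actual CockroachDB code): subsume now computes the range ID-local stats from the same engine snapshot as the total stats and ships both, so the merge trigger subtracts two consistent numbers instead of recomputing one of them later.

```go
package main

import "fmt"

// MVCCStats is a toy stand-in for the real enginepb.MVCCStats.
type MVCCStats struct {
	SysBytes int64
}

func (s *MVCCStats) Subtract(o MVCCStats) { s.SysBytes -= o.SysBytes }
func (s *MVCCStats) Add(o MVCCStats)      { s.SysBytes += o.SysBytes }

// SubsumeResponse mimics the shape of the new field added by this patch:
// the range ID-local stats are computed during subsume, from the same
// engine snapshot as TotalStats.
type SubsumeResponse struct {
	TotalStats            MVCCStats
	RangeIDLocalMVCCStats MVCCStats
}

// applyMergeTrigger folds the RHS stats into the LHS, minus the range
// ID-local portion that won't exist in the merged range. Because both
// numbers came from the same snapshot, the subtraction no longer races
// with a concurrent latch-ignoring lease write.
func applyMergeTrigger(lhs *MVCCStats, resp SubsumeResponse) {
	rhs := resp.TotalStats
	rhs.Subtract(resp.RangeIDLocalMVCCStats)
	lhs.Add(rhs)
}

func main() {
	lhs := MVCCStats{SysBytes: 100}
	resp := SubsumeResponse{
		TotalStats:            MVCCStats{SysBytes: 50},
		RangeIDLocalMVCCStats: MVCCStats{SysBytes: 20},
	}
	applyMergeTrigger(&lhs, resp)
	fmt.Println(lhs.SysBytes) // 100 + (50 - 20) = 130
}
```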

While it would be possible to prevent this race entirely by introducing additional synchronization between lease requests and merge application, this would likely come with significant additional complexity, which doesn't seem worth it just to avoid SysBytes being a few bytes wrong. The main fallout is a log message when the consistency checker detects the stats mismatch, and potential test flake. This PR therefore settles for best-effort prevention.

Resolves #93896.
Resolves #94876.
Resolves #99010.

Epic: none
Release note: None

@erikgrinaker erikgrinaker requested a review from a team March 20, 2023 12:03
@erikgrinaker erikgrinaker self-assigned this Mar 20, 2023
@blathers-crl

blathers-crl bot commented Mar 20, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@erikgrinaker
Contributor Author

This will be racy too, because we can still apply a lease request concurrently with the subsume request, which will affect the in-memory stats. Will think about something better.

@erikgrinaker erikgrinaker changed the title kvserver: fix SysBytes MVCC stats race during merges kvserver: reduce SysBytes MVCC stats race during merges Mar 20, 2023
@erikgrinaker
Contributor Author

Decided to live with the race for now, and settle for significantly reducing the odds of it happening.

@erikgrinaker erikgrinaker marked this pull request as ready for review March 20, 2023 20:18
@erikgrinaker erikgrinaker requested a review from a team as a code owner March 20, 2023 20:18
@erikgrinaker
Contributor Author

Ran 50,000 iterations over 12 hours overnight, exercising only splits and merges, with no failures. Previously, it would flake within 10-20 minutes. So this seems like a substantial improvement.

@erikgrinaker erikgrinaker requested a review from pav-kv March 22, 2023 09:13
Collaborator

@pav-kv pav-kv left a comment


LGTM. I get that this reduces the likelihood of the race, but I'm not sure why. Could you explain a bit why doing this in subsume is better than in the merge trigger? Does this reduce the window of time within which the race can happen?

@erikgrinaker
Contributor Author

bors r+

Could you explain a bit why doing this in subsume is better than in the merge trigger? Does this reduce the window of time within which the race can happen?

Yes, exactly. During subsume evaluation, the race window is between when we acquire the engine read snapshot and when we fetch the in-memory MVCC stats. During the merge trigger, we additionally have to wait for the subsume command to be replicated to and applied on all replicas, and then for the final merge commit request to be evaluated.
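A toy model of the two race windows (hypothetical names, not the actual CockroachDB code): the shipped stats are only consistent if no latch-ignoring lease write lands between taking the engine snapshot and reading the replica's in-memory stats. Reading the stats inside subsume shrinks that gap to a couple of instructions; reading them in the merge trigger stretches it across replication and application of subsume plus evaluation of the merge commit.

```go
package main

import "fmt"

// replica tracks range ID-local bytes both durably (engine) and in memory.
type replica struct {
	engineSysBytes   int64 // bytes visible to an engine snapshot
	inMemorySysBytes int64 // the replica's tracked MVCC stats
}

// applyLeaseWrite models a lease request, which ignores latches and can
// land at any time, bumping both views of the stats.
func (r *replica) applyLeaseWrite(delta int64) {
	r.engineSysBytes += delta
	r.inMemorySysBytes += delta
}

// subsume takes the snapshot and reads the in-memory stats back to back;
// the race window is only the gap between these two lines.
func (r *replica) subsume() (snapshot, inMemory int64) {
	snapshot = r.engineSysBytes
	// <-- a lease write landing exactly here still causes a small mismatch
	inMemory = r.inMemorySysBytes
	return snapshot, inMemory
}

func main() {
	r := &replica{engineSysBytes: 40, inMemorySysBytes: 40}

	// New scheme: tiny window, so the two views usually agree.
	snap, mem := r.subsume()
	fmt.Println(snap == mem) // true: no lease write in the narrow window

	// Old scheme: the equivalent read happened in the merge trigger, long
	// after the snapshot, leaving plenty of time for a lease write.
	snap = r.engineSysBytes // snapshot taken at subsume time
	r.applyLeaseWrite(8)    // lease request lands before the trigger runs
	mem = r.inMemorySysBytes
	fmt.Println(snap == mem) // false: SysBytes diverges by a few bytes
}
```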

@craig
Contributor

craig bot commented Mar 27, 2023

Build succeeded:

@craig craig bot merged commit 3c3d2a5 into cockroachdb:master Mar 27, 2023
@knz
Contributor

knz commented Apr 2, 2023

should this be backported to 23.1?

@erikgrinaker
Contributor Author

No, the risk/reward isn't justifiable. We did #99244 instead.
