Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-2.1: storage: preserve consistency when applying widening preemptive snapshots #29694

Merged
merged 1 commit into from
Sep 7, 2018

Conversation

benesch
Copy link
Contributor

@benesch benesch commented Sep 7, 2018

Backport 1/1 commits from #29677.

/cc @cockroachdb/release


Merges can cause preemptive snapshots that widen existing replicas. For
example, consider the following sequence of events:

  1. A replica of range A is removed from store S, but is not garbage
    collected.
  2. Range A subsumes its right neighbor B.
  3. Range A is re-added to store S.

In step 3, S will receive a preemptive snapshot for A that requires
widening its existing replica, thanks to the intervening merge.

Problematically, the code to check whether this widening was possible,
in Store.canApplySnapshotLocked, was incorrectly mutating the range
descriptor in the snapshot header! Applying the snapshot would then fail
to clear all of the data from the old incarnation of the replica, since
the bounds on the range deletion tombstone were wrong. This often
resulted in replica inconsistency. Plus, the in-memory copy of the range
descriptor would be incorrect until the next descriptor update--though
this usually happened quickly, as the replica would apply the change
replicas command, which updates the descriptor, soon after applying the
preemptive snapshot.

To fix the problem, teach Store.canApplySnapshotLocked to make a copy of
the range descriptor before it mutates it.

To prevent regressions, add an assertion that a range's start key is
never changed to the descriptor update path. With this assertion in
place, but without the fix itself,
TestStoreRangeMergeReadoptedLHSFollower reliably fails.

Fixes #29252.

Release note: None

…hots

Merges can cause preemptive snapshots that widen existing replicas. For
example, consider the following sequence of events:

    1. A replica of range A is removed from store S, but is not garbage
       collected.
    2. Range A subsumes its right neighbor B.
    3. Range A is re-added to store S.

In step 3, S will receive a preemptive snapshot for A that requires
widening its existing replica, thanks to the intervening merge.

Problematically, the code to check whether this widening was possible,
in Store.canApplySnapshotLocked, was incorrectly mutating the range
descriptor in the snapshot header! Applying the snapshot would then fail
to clear all of the data from the old incarnation of the replica, since
the bounds on the range deletion tombstone were wrong. This often
resulted in replica inconsistency. Plus, the in-memory copy of the range
descriptor would be incorrect until the next descriptor update--though
this usually happened quickly, as the replica would apply the change
replicas command, which updates the descriptor, soon after applying the
preemptive snapshot.

To fix the problem, teach Store.canApplySnapshotLocked to make a copy of
the range descriptor before it mutates it.

To prevent regressions, add an assertion that a range's start key is
never changed to the descriptor update path. With this assertion in
place, but without the fix itself,
TestStoreRangeMergeReadoptedLHSFollower reliably fails.

Fixes cockroachdb#29252.

Release note: None
@benesch benesch requested review from nvanbenschoten and a team September 7, 2018 04:05
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg
Copy link
Member

tbg commented Sep 7, 2018

LGTM

@benesch benesch merged commit c4c8b5e into cockroachdb:release-2.1 Sep 7, 2018
@benesch benesch deleted the backport2.1-29677 branch September 16, 2018 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants