Commit c8afea2

benesch authored and andy-kimball committed
storage: preserve consistency when applying widening preemptive snapshots
Merges can cause preemptive snapshots that widen existing replicas. For example, consider the following sequence of events:

1. A replica of range A is removed from store S, but is not garbage collected.
2. Range A subsumes its right neighbor B.
3. Range A is re-added to store S.

In step 3, S will receive a preemptive snapshot for A that requires widening its existing replica, thanks to the intervening merge.

Problematically, the code that checked whether this widening was possible, in Store.canApplySnapshotLocked, was incorrectly mutating the range descriptor in the snapshot header! Applying the snapshot would then fail to clear all of the data from the old incarnation of the replica, since the bounds on the range deletion tombstone were wrong. This often resulted in replica inconsistency. Plus, the in-memory copy of the range descriptor would be incorrect until the next descriptor update, though this usually happened quickly, as the replica would apply the change replicas command, which updates the descriptor, soon after applying the preemptive snapshot.

To fix the problem, teach Store.canApplySnapshotLocked to make a copy of the range descriptor before it mutates it.

To prevent regressions, add an assertion to the descriptor update path that a range's start key is never changed. With this assertion in place, but without the fix itself, TestStoreRangeMergeReadoptedLHSFollower reliably fails.

Fixes cockroachdb#29252.

Release note: None
1 parent 96718b6 commit c8afea2
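
The root cause is ordinary Go pointer aliasing. Below is a minimal, self-contained sketch of the pitfall and of the fix; the types and the exact field being narrowed are illustrative stand-ins, not CockroachDB's actual roachpb definitions.

package main

import "fmt"

// Toy stand-ins for roachpb.RangeDescriptor and the snapshot header.
type RangeDescriptor struct {
    StartKey, EndKey string
}

type SnapshotHeader struct {
    Desc *RangeDescriptor
}

// checkWideningBuggy mimics the old behavior: to examine only the keyspace
// the snapshot would add, it narrows the descriptor in place, writing
// through the pointer shared with the snapshot header.
func checkWideningBuggy(h *SnapshotHeader, existingEndKey string) {
    h.Desc.StartKey = existingEndKey // oops: mutates the caller's header
}

// checkWideningFixed mimics the fix: copy the descriptor value first and
// mutate only the local copy.
func checkWideningFixed(h *SnapshotHeader, existingEndKey string) {
    desc := *h.Desc
    desc.StartKey = existingEndKey // the header's descriptor is untouched
    _ = desc                       // overlap checks would use desc here
}

func main() {
    h := &SnapshotHeader{Desc: &RangeDescriptor{StartKey: "a", EndKey: "z"}}

    checkWideningFixed(h, "m")
    fmt.Println(h.Desc.StartKey) // "a": the snapshot's true bounds survive

    checkWideningBuggy(h, "m")
    fmt.Println(h.Desc.StartKey) // "m": the header now misstates the bounds
}

In the buggy variant, anything that later derives the range deletion tombstone from the header's descriptor misses the keyspace below the mutated start key, leaving stale data from the old incarnation of the replica in place.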

File tree

3 files changed (+11 -3 lines)


pkg/storage/replica.go

+5
@@ -1717,6 +1717,11 @@ func (r *Replica) setDescWithoutProcessUpdate(desc *roachpb.RangeDescriptor) {
         log.Fatalf(ctx, "cannot replace initialized descriptor with uninitialized one: %+v -> %+v",
             r.mu.state.Desc, desc)
     }
+    if r.mu.state.Desc != nil && !r.mu.state.Desc.StartKey.Equal(desc.StartKey) {
+        ctx := r.AnnotateCtx(context.TODO())
+        log.Fatalf(ctx, "attempted to change replica's start key from %s to %s",
+            r.mu.state.Desc.StartKey, desc.StartKey)
+    }
 
     newMaxID := maxReplicaID(desc)
     if newMaxID > r.mu.lastReplicaAdded {

pkg/storage/replica_raftstorage.go

+3
@@ -945,6 +945,9 @@ func (r *Replica) applySnapshot(
     // Update the range and store stats.
     r.store.metrics.subtractMVCCStats(*r.mu.state.Stats)
     r.store.metrics.addMVCCStats(*s.Stats)
+    // TODO(benesch): the next line updates r.mu.state.Desc, but that's supposed
+    // to be handled by the call to setDescWithoutProcessUpdate below. This is not
+    // a correctness issue right now, but it's liable to become one.
     r.mu.state = s
     r.assertStateLocked(ctx, r.store.Engine())
     r.mu.Unlock()
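
A minimal sketch of the hazard this TODO flags, with illustrative names rather than CockroachDB's actual types: assigning a whole state struct replaces a field that is supposed to change only through its setter, so any invariant checks in the setter (such as the start-key assertion added in replica.go above) are silently bypassed.

package main

import "fmt"

type state struct{ desc string }

type replica struct{ mu struct{ state state } }

// setDesc is the sanctioned way to change desc; it enforces an invariant
// (here: the first byte of the descriptor never changes once set).
func (r *replica) setDesc(desc string) {
    if r.mu.state.desc != "" && r.mu.state.desc[0] != desc[0] {
        panic("descriptor invariant violated")
    }
    r.mu.state.desc = desc
}

// applySnapshot replaces the state wholesale, clobbering desc without
// running setDesc's checks: the ordering hazard the TODO describes.
func (r *replica) applySnapshot(s state) {
    r.mu.state = s
}

func main() {
    var r replica
    r.setDesc("a-z")
    r.applySnapshot(state{desc: "m-z"}) // no panic: the check was skipped
    fmt.Println(r.mu.state.desc)        // m-z
}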

pkg/storage/store_snapshot.go

+3 -3
@@ -394,7 +394,7 @@ func (s *Store) canApplySnapshot(
 func (s *Store) canApplySnapshotLocked(
     ctx context.Context, snapHeader *SnapshotRequest_Header,
 ) (*ReplicaPlaceholder, error) {
-    desc := snapHeader.State.Desc
+    desc := *snapHeader.State.Desc
     if v, ok := s.mu.replicas.Load(int64(desc.RangeID)); ok && (*Replica)(v).IsInitialized() {
         // We have an initialized replica. Preemptive snapshots can be applied with
         // no further checks if they do not widen the existing replica. Raft
@@ -420,7 +420,7 @@ func (s *Store) canApplySnapshotLocked(
 
     // TODO(benesch): consider discovering and GC'ing *all* overlapping ranges,
     // not just the first one that getOverlappingKeyRangeLocked happens to return.
-    if exRange := s.getOverlappingKeyRangeLocked(desc); exRange != nil {
+    if exRange := s.getOverlappingKeyRangeLocked(&desc); exRange != nil {
         // We have a conflicting range, so we must block the snapshot.
         // When such a conflict exists, it will be resolved by one range
         // either being split or garbage collected.
@@ -454,7 +454,7 @@ func (s *Store) canApplySnapshotLocked(
     }
 
     placeholder := &ReplicaPlaceholder{
-        rangeDesc: *desc,
+        rangeDesc: desc,
     }
     return placeholder, nil
 }
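
One subtlety of the fix is worth spelling out: desc := *snapHeader.State.Desc is a shallow copy. That is sufficient here because the subsequent mutation reassigns a field on the copy; it would not protect against writes through pointer or slice memory that the copy still shares with the header. A small sketch, using a toy Key type in place of roachpb.RKey (which is itself a byte slice):

package main

import "fmt"

// Key is a toy stand-in for roachpb.RKey, which is a []byte.
type Key []byte

type Desc struct{ StartKey Key }

func main() {
    orig := &Desc{StartKey: Key("a")}

    d1 := *orig            // shallow copy of the struct
    d1.StartKey = Key("m") // reassigning the field changes only the copy
    fmt.Printf("%s %s\n", orig.StartKey, d1.StartKey) // a m

    d2 := *orig
    d2.StartKey[0] = 'z' // writing through the shared backing array changes both
    fmt.Printf("%s %s\n", orig.StartKey, d2.StartKey) // z z
}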
