Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: log truncation constraints protecting snapshots subverted by lease transfer #81978

Open
nvanbenschoten opened this issue May 27, 2022 · 2 comments
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@nvanbenschoten
Copy link
Member

nvanbenschoten commented May 27, 2022

In a customer issue, we saw a log truncation race with an INITIAL snapshot, causing the log to be truncated above the snapshot index and creating the need for a second (recovery) snapshot. This later combined with #81561 to introduce unavailability.

We believe this had to do with the log truncation constraints that protect snapshots being subverted by lease movement. Concretely, the sender of the INITIAL snapshot (the "old" leaseholder) was different from the differentiator of the log truncations (the "new" leaseholder). The snapshot protection is only local on the snapshot sender, so it was ineffective once the lease moved.

From @nvanbenschoten:

One theory is that this logic in the raftLogQueue is incorrect, and that it should be if progress.State != tracker.StateReplicate { so that snapshots to active followers also disable log truncation. Or maybe we don't want snapshots to hold up log truncation, but then is this kind of race expected? Concretely, is the following race expected:

  1. r1 begins adding r3 to a range by adding it as a LEARNER and sending it an INITIAL snapshot
  2. the snapshot is delayed (in this case, by 18 minutes)
  3. r1 transfers the lease and raft leadership to r2 sometime before the INITIAL snapshot completes
  4. r2's raftLogQueue detects r3 as a learner in StateSnapshot, without a snapshotLogTruncationConstraints
  5. r2 truncates above r3's INITIAL snapshot index
  6. r3's snapshot is useless, immediately needs another snapshot, hopefully we don't give it the lease ...

From @tbg:

The race is basically expected, since we protect snapshots using the in-memory map here, which only knows about snapshots from the local store:

// A map of raft log index of pending snapshots to deadlines.
// Used to prohibit raft log truncations that would leave a gap between
// the snapshot and the new first index. The map entry has a zero
// deadline while the snapshot is being sent and turns nonzero when the
// snapshot has completed, preventing truncation for a grace period
// (since there is a race between the snapshot completing and its being
// reflected in the raft status used to make truncation decisions).
//
// NB: If we kept only one value, we could end up in situations in which
// we're either giving some snapshots no grace period, or keep an
// already finished snapshot "pending" for extended periods of time
// (preventing log truncation).
snapshotLogTruncationConstraints map[uuid.UUID]snapTruncationInfo

But this seems unnecessarily complicated; with the change you're proposing we would robustly protect all snapshots simply by not truncating until the snapshots have been satisfied.

Jira issue: CRDB-16155

Epic CRDB-19227

@nvanbenschoten nvanbenschoten added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv Anything in KV that doesn't belong in a more specific category. O-postmortem Originated from a Postmortem action item. N-followup Needs followup. labels May 27, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label May 27, 2022
@nvanbenschoten
Copy link
Member Author

But this seems unnecessarily complicated; with the change you're proposing we would robustly protect all snapshots simply by not truncating until the snapshots have been satisfied.

I'm trying to understand why we didn't do this in the first place, as it seems like the path of least resistance. The current protection against log truncation invalidating a snapshot was introduced in b64bdc9. This was during the era of preemptive snapshots. These snapshots occurred before the target replica was added to the Raft group, so they would not be tracked in the Raft progress map. @tbg is this why we opted for something more general?

@tbg
Copy link
Member

tbg commented Jun 2, 2022

@tbg is this why we opted for something more general?

That's the only explanation I have as well.

@exalate-issue-sync exalate-issue-sync bot removed O-postmortem Originated from a Postmortem action item. N-followup Needs followup. labels Mar 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
None yet
Development

No branches or pull requests

2 participants