kv: log truncation constraints protecting snapshots subverted by lease transfer #81978
Labels
A-kv
C-enhancement
T-kv
In a customer issue, we saw a log truncation race with an INITIAL snapshot, causing the log to be truncated above the snapshot index and creating the need for a second (recovery) snapshot. This later combined with #81561 to introduce unavailability.
We believe this happened because the log truncation constraints that protect snapshots were subverted by lease movement. Concretely, the sender of the INITIAL snapshot (the "old" leaseholder) was different from the proposer of the log truncations (the "new" leaseholder). The snapshot protection is only local to the snapshot sender, so it became ineffective once the lease moved.
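To make the failure mode concrete, here is a minimal toy model in Go. It is not CockroachDB code: the `replica` type, the `inFlightSnapshotIndex` field, and the `truncationIndex` method are all hypothetical names invented for illustration. The sketch assumes only what is described above, namely that the constraint protecting an in-flight snapshot is tracked locally on the replica sending it.

```go
package main

import "fmt"

// replica is a toy stand-in for a range replica. All names are hypothetical
// and do not correspond to real CockroachDB identifiers.
type replica struct {
	name string
	// inFlightSnapshotIndex is the log index of a snapshot this replica is
	// currently sending, or 0 if there is none. It is tracked only locally
	// and is not shared with other replicas.
	inFlightSnapshotIndex uint64
}

// truncationIndex returns the index up to which this replica would truncate
// the log if it were the leaseholder, given the index it would otherwise pick.
// The local snapshot constraint caps the result at the snapshot index.
func (r *replica) truncationIndex(desired uint64) uint64 {
	if r.inFlightSnapshotIndex != 0 && desired > r.inFlightSnapshotIndex {
		return r.inFlightSnapshotIndex
	}
	return desired
}

func main() {
	// The old leaseholder is sending an INITIAL snapshot at index 100.
	oldLeaseholder := &replica{name: "old", inFlightSnapshotIndex: 100}
	// The new leaseholder has no knowledge of that in-flight snapshot.
	newLeaseholder := &replica{name: "new"}

	// While the lease stays put, the truncation respects the snapshot.
	fmt.Println(oldLeaseholder.truncationIndex(150)) // prints 100

	// After the lease moves, the new leaseholder decides on truncation and
	// the snapshot's index is no longer protected.
	fmt.Println(newLeaseholder.truncationIndex(150)) // prints 150
}
```

In this model, the second truncation discards entries the snapshot recipient would still need in order to catch up from index 100, which is the situation that forces a second (recovery) snapshot.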
From @nvanbenschoten:
From @tbg:
Jira issue: CRDB-16155
Epic CRDB-19227