-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: sync lease transfers and range merges with closed timestamp side-transport #61221
kv: sync lease transfers and range merges with closed timestamp side-transport #61221
Conversation
3fa7470
to
31f9993
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was made possible by #59086 and was suggested by @tbg during the code review.
Heh, I don't think I quite suggested this, but that said I like this change a lot.
Reviewed 6 of 6 files at r1, 6 of 6 files at r2, 14 of 14 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
pkg/kv/kvserver/replica.go, line 1358 at r1 (raw file):
// // It is very important that this check occur after we have acquired latches // from the spanlatch manager. Only after we release these latches are we
is "release" right here? I'd think that once you release your latches again you might be racing.
pkg/kv/kvserver/replica_range_lease.go, line 824 at r3 (raw file):
// timestamps during evaluation and communicates this back up using the // LocalResult. r.mu.minLeaseProposedTS = now
❤️
pkg/kv/kvserver/batcheval/cmd_lease_transfer.go, line 95 at r3 (raw file):
// revocation takes place regardless of whether the corresponding Raft // proposal succeeds, fails, or is ambiguous - in which case there's no // guarantee that the transfer will not still apply.
Could you say a few words (or drop a reference to a comment) about what happens in this case in which this proposal will fail? We'll have relinquished the current lease but not managed to give the lease to someone else. What happens next, to recover from this situation?
pkg/kv/kvserver/batcheval/eval_context.go, line 110 at r2 (raw file):
// closed timestamp is already incorporated into the read summary, but some // callers also need is separated out. It is expected that a caller will // have performed some action (either calling ... or ...) to freeze further
... ...?
pkg/kv/kvserver/batcheval/eval_context.go, line 111 at r2 (raw file):
// callers also need is separated out. It is expected that a caller will // have performed some action (either calling ... or ...) to freeze further // progression of the closed timestamp before callign this method.
calling
…umed ranges In b192bba, we began allowing historical reads up to the freeze start time on subsumed ranges. This turned out not to be quite the right idea, because we can now ship the timestamp cache to the LHS and be more optimal about the resulting timestamp cache on the joint range. However, since we started allowing reads up to the freeze time, we were effectively closing this time for all future writes on the joint range, so we couldn't take advantage of the new ability to ship the timestamp cache around. But the change was very well intentioned and revealed that we should have no problem allowing reads below the closed timestamp on subsumed ranges. We will add support for this in the near future, likely once the new closed timestamp system is phased in.
These are doing roughly the same thing, so merge the methods.
Needed for cockroachdb#61137. This commit updates the manner through which lease transfers (through `LeaseTransferRequest`) and range merges (through `SubsumeRequest`) handle the "transfer of power" from their outgoing leaseholder to their incoming leaseholder. Specifically, it updates the handling of these requests in order to rationalize the interaction between their evaluation and the closing of timestamps through the closed timestamp side-transport. It is now clearer when and how these requests instruct the outgoing leaseholder to relinquish its ability to advance the closed timestamp and, as a result, now possible for the requests to query and operate on the maximum closed timestamp published by the outgoing leaseholder. For lease transfers, this commit begins by addressing an existing TODO to push the revocation of the outgoing lease out of `AdminTransferLease` and into the evaluation of `LeaseTransferRequest` through a new `RevokeLease` method on the `EvalContext`. Once a lease is revoked, the side-transport will no longer be able to advance the closed timestamp under it. This was made possible by cockroachdb#59086 and was suggested by @tbg during the code review. We generally like to keep replica state changes out of "admin" requests themselves, which are intended to coordinate changes through individual non-admin requests. Admin requests generally don't even need to evaluate on a leaseholder (though they try to), so having them perform any state changes is fragile. For range merges, this commit removes the `MaybeWatchForMerge` flag from the `LocalResult` returned by `SubsumeRequest` and replaces it with a `WatchForMerge` method on the `EvalContext`. This allows the `SubsumeRequest` to launch the range merge watcher goroutine during it evaluation, which the side-transport checks for to see whether a leaseholder can advance its closed timestamp. In doing so, the `SubsumeRequest` is able to pause closed timestamps when it wants and is also able to observe and return the maximum closed timestamp _after_ the closed timestamp has stopped advancing. This is a stark improvement over the approach with the original closed timestamp system, which required a herculean effort in cockroachdb#50265 to make correct. With these refactors complete, the closed timestamp side-transport should have a much easier and safer time checking whether a given leaseholder is able to advance its closed timestamp. Release justification: Necessary for the safety of new functionality.
31f9993
to
9c3ed73
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
pkg/kv/kvserver/replica.go, line 1358 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
is "release" right here? I'd think that once you release your latches again you might be racing.
Yeah, I think it was meant to say "acquire". It was "exit the command queue" before eb38f20#diff-271aa5bd8272fa39bd683ecbf27f1906bafca61dc350ff229c6627fbde697fbaL2522, and I misinterpreted the translation to mean "release latches" instead of "exit the latch manager after finishing acquiring latches". Done.
pkg/kv/kvserver/batcheval/cmd_lease_transfer.go, line 95 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Could you say a few words (or drop a reference to a comment) about what happens in this case in which this proposal will fail? We'll have relinquished the current lease but not managed to give the lease to someone else. What happens next, to recover from this situation?
Done.
pkg/kv/kvserver/batcheval/eval_context.go, line 110 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
... ...?
That wasn't clear enough for you
pkg/kv/kvserver/batcheval/eval_context.go, line 111 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
calling
Done.
bors r+ |
Build succeeded: |
Needed for the safety of #61137.
This commit updates the manner through which lease transfers (through
LeaseTransferRequest
) and range merges (throughSubsumeRequest
) handle the "transfer of power" from their outgoing leaseholder to their incoming leaseholder. Specifically, it updates the handling of these requests in order to rationalize the interaction between their evaluation and the closing of timestamps through the closed timestamp side-transport. It is now clearer when and how these requests instruct the outgoing leaseholder to relinquish its ability to advance the closed timestamp and, as a result, now possible for the requests to query and operate on the maximum closed timestamp published by the outgoing leaseholder.For lease transfers, this commit begins by addressing an existing TODO to push the revocation of the outgoing lease out of
AdminTransferLease
and into the evaluation ofLeaseTransferRequest
through a newRevokeLease
method on theEvalContext
. Once a lease is revoked, the side-transport will no longer be able to advance the closed timestamp under it. This was made possible by #59086 and was suggested by @tbg during the code review. We generally like to keep replica state changes out of "admin" requests themselves, which are intended to coordinate changes through individual non-admin requests. Admin requests generally don't even need to evaluate on a leaseholder (though they try to), so having them perform any state changes is fragile.For range merges, this commit removes the
MaybeWatchForMerge
flag from theLocalResult
returned bySubsumeRequest
and replaces it with aWatchForMerge
method on theEvalContext
. This allows theSubsumeRequest
to launch the range merge watcher goroutine during it evaluation, which the side-transport checks for to see whether a leaseholder can advance its closed timestamp. In doing so, theSubsumeRequest
is able to pause closed timestamps when it wants and is also able to observe and return the maximum closed timestamp after the closed timestamp has stopped advancing. This is a stark improvement over the approach with the original closed timestamp system, which required a herculean effort in #50265 to make correct.With these refactors complete, the closed timestamp side-transport should have a much easier and safer time checking whether a given leaseholder is able to advance its closed timestamp.
Release justification: Necessary for the safety of new functionality.