Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: sync lease transfers and range merges with closed timestamp side-transport #61221

Merged

Conversation

nvanbenschoten
Copy link
Member

Needed for the safety of #61137.

This commit updates the manner through which lease transfers (through LeaseTransferRequest) and range merges (through SubsumeRequest) handle the "transfer of power" from their outgoing leaseholder to their incoming leaseholder. Specifically, it updates the handling of these requests in order to rationalize the interaction between their evaluation and the closing of timestamps through the closed timestamp side-transport. It is now clearer when and how these requests instruct the outgoing leaseholder to relinquish its ability to advance the closed timestamp and, as a result, now possible for the requests to query and operate on the maximum closed timestamp published by the outgoing leaseholder.

For lease transfers, this commit begins by addressing an existing TODO to push the revocation of the outgoing lease out of AdminTransferLease and into the evaluation of LeaseTransferRequest through a new RevokeLease method on the EvalContext. Once a lease is revoked, the side-transport will no longer be able to advance the closed timestamp under it. This was made possible by #59086 and was suggested by @tbg during the code review. We generally like to keep replica state changes out of "admin" requests themselves, which are intended to coordinate changes through individual non-admin requests. Admin requests generally don't even need to evaluate on a leaseholder (though they try to), so having them perform any state changes is fragile.

For range merges, this commit removes the MaybeWatchForMerge flag from the LocalResult returned by SubsumeRequest and replaces it with a WatchForMerge method on the EvalContext. This allows the SubsumeRequest to launch the range merge watcher goroutine during it evaluation, which the side-transport checks for to see whether a leaseholder can advance its closed timestamp. In doing so, the SubsumeRequest is able to pause closed timestamps when it wants and is also able to observe and return the maximum closed timestamp after the closed timestamp has stopped advancing. This is a stark improvement over the approach with the original closed timestamp system, which required a herculean effort in #50265 to make correct.

With these refactors complete, the closed timestamp side-transport should have a much easier and safer time checking whether a given leaseholder is able to advance its closed timestamp.

Release justification: Necessary for the safety of new functionality.

@nvanbenschoten nvanbenschoten requested review from andreimatei, tbg and a team February 27, 2021 04:51
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/freezeSubsume branch 2 times, most recently from 3fa7470 to 31f9993 Compare February 27, 2021 18:26
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was made possible by #59086 and was suggested by @tbg during the code review.

Heh, I don't think I quite suggested this, but that said I like this change a lot.

Reviewed 6 of 6 files at r1, 6 of 6 files at r2, 14 of 14 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)


pkg/kv/kvserver/replica.go, line 1358 at r1 (raw file):

	//
	// It is very important that this check occur after we have acquired latches
	// from the spanlatch manager. Only after we release these latches are we

is "release" right here? I'd think that once you release your latches again you might be racing.


pkg/kv/kvserver/replica_range_lease.go, line 824 at r3 (raw file):

		// timestamps during evaluation and communicates this back up using the
		// LocalResult.
		r.mu.minLeaseProposedTS = now

❤️


pkg/kv/kvserver/batcheval/cmd_lease_transfer.go, line 95 at r3 (raw file):

	// revocation takes place regardless of whether the corresponding Raft
	// proposal succeeds, fails, or is ambiguous - in which case there's no
	// guarantee that the transfer will not still apply.

Could you say a few words (or drop a reference to a comment) about what happens in this case in which this proposal will fail? We'll have relinquished the current lease but not managed to give the lease to someone else. What happens next, to recover from this situation?


pkg/kv/kvserver/batcheval/eval_context.go, line 110 at r2 (raw file):

	// closed timestamp is already incorporated into the read summary, but some
	// callers also need is separated out. It is expected that a caller will
	// have performed some action (either calling ... or ...) to freeze further

... ...?


pkg/kv/kvserver/batcheval/eval_context.go, line 111 at r2 (raw file):

	// callers also need is separated out. It is expected that a caller will
	// have performed some action (either calling ... or ...) to freeze further
	// progression of the closed timestamp before callign this method.

calling

…umed ranges

In b192bba, we began allowing historical reads up to the freeze start time on
subsumed ranges. This turned out not to be quite the right idea, because we can
now ship the timestamp cache to the LHS and be more optimal about the resulting
timestamp cache on the joint range. However, since we started allowing reads up
to the freeze time, we were effectively closing this time for all future writes
on the joint range, so we couldn't take advantage of the new ability to ship the
timestamp cache around. But the change was very well intentioned and revealed
that we should have no problem allowing reads below the closed timestamp on
subsumed ranges. We will add support for this in the near future, likely once
the new closed timestamp system is phased in.
These are doing roughly the same thing, so merge the methods.
Needed for cockroachdb#61137.

This commit updates the manner through which lease transfers (through
`LeaseTransferRequest`) and range merges (through `SubsumeRequest`)
handle the "transfer of power" from their outgoing leaseholder to their
incoming leaseholder. Specifically, it updates the handling of these
requests in order to rationalize the interaction between their
evaluation and the closing of timestamps through the closed timestamp
side-transport. It is now clearer when and how these requests instruct
the outgoing leaseholder to relinquish its ability to advance the closed
timestamp and, as a result, now possible for the requests to query and
operate on the maximum closed timestamp published by the outgoing leaseholder.

For lease transfers, this commit begins by addressing an existing TODO
to push the revocation of the outgoing lease out of `AdminTransferLease`
and into the evaluation of `LeaseTransferRequest` through a new
`RevokeLease` method on the `EvalContext`. Once a lease is revoked, the
side-transport will no longer be able to advance the closed timestamp
under it. This was made possible by cockroachdb#59086 and was suggested by @tbg
during the code review. We generally like to keep replica state changes
out of "admin" requests themselves, which are intended to coordinate
changes through individual non-admin requests. Admin requests generally
don't even need to evaluate on a leaseholder (though they try to), so
having them perform any state changes is fragile.

For range merges, this commit removes the `MaybeWatchForMerge` flag from
the `LocalResult` returned by `SubsumeRequest` and replaces it with a
`WatchForMerge` method on the `EvalContext`. This allows the
`SubsumeRequest` to launch the range merge watcher goroutine during it
evaluation, which the side-transport checks for to see whether a
leaseholder can advance its closed timestamp. In doing so, the
`SubsumeRequest` is able to pause closed timestamps when it wants and is
also able to observe and return the maximum closed timestamp _after_ the
closed timestamp has stopped advancing. This is a stark improvement over
the approach with the original closed timestamp system, which required a
herculean effort in cockroachdb#50265 to make correct.

With these refactors complete, the closed timestamp side-transport
should have a much easier and safer time checking whether a given
leaseholder is able to advance its closed timestamp.

Release justification: Necessary for the safety of new functionality.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/freezeSubsume branch from 31f9993 to 9c3ed73 Compare March 1, 2021 16:43
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)


pkg/kv/kvserver/replica.go, line 1358 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

is "release" right here? I'd think that once you release your latches again you might be racing.

Yeah, I think it was meant to say "acquire". It was "exit the command queue" before eb38f20#diff-271aa5bd8272fa39bd683ecbf27f1906bafca61dc350ff229c6627fbde697fbaL2522, and I misinterpreted the translation to mean "release latches" instead of "exit the latch manager after finishing acquiring latches". Done.


pkg/kv/kvserver/batcheval/cmd_lease_transfer.go, line 95 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Could you say a few words (or drop a reference to a comment) about what happens in this case in which this proposal will fail? We'll have relinquished the current lease but not managed to give the lease to someone else. What happens next, to recover from this situation?

Done.


pkg/kv/kvserver/batcheval/eval_context.go, line 110 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

... ...?

That wasn't clear enough for you :trollface:


pkg/kv/kvserver/batcheval/eval_context.go, line 111 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

calling

Done.

@nvanbenschoten
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 1, 2021

Build succeeded:

@craig craig bot merged commit 3fe1992 into cockroachdb:master Mar 1, 2021
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/freezeSubsume branch March 1, 2021 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants