Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: permit merge transactions to refresh after subsumption #61324

Merged

Conversation

nvanbenschoten
Copy link
Member

Fixes #59308.

This commit adds support for range merge transactions to refresh. This
is necessary to allow the merge transaction to have its write timestamp
be bumped and still commit without retrying. In such cases, the
transaction must refresh its reads, including its original read on the
RHS's local range descriptor. If we were to block this refresh on the
frozen RHS range, the merge would deadlock.

On the surface, it seems unsafe to permit Refresh requests on an already
subsumed RHS range, because the refresh's effect on the timestamp cache
will never make it to the LHS leaseholder. This risks the future joint
range serving a write that invalidates the Refresh. However, in this
specific situation, we can be sure that such a serializability violation
will not occur because the Range merge also writes to (deletes) this
key. This means that if the Range merge transaction commits, its intent
on the key will be resolved to the timestamp of the refresh and no
future write will ever be able to violate the refresh. Conversely, if the
Range merge transaction does not commit, then the merge will fail and
the update to the RHS's timestamp cache will not be lost (not that this
particularly matters in cases of aborted transactions).

The same line of reasoning as the one above has motivated us to explore
removing keys from a transaction's refresh spans when they are written
to by the transaction, as the intents written by the transaction act as
a form of pessimistic lock that obviate the need for the optimistic
refresh. Such an improvement would eliminate the need for this special
case, but until we generalize the mechanism to prune refresh spans based
on intent spans, we're forced to live with this.

See #59308 (comment)
for why the original fix, which attempted to manually refresh the range
merge transaction before it entered its critical phase, was not sufficient.

Release justification: needed for new functionality.

@nvanbenschoten nvanbenschoten requested review from ajwerner and a team March 2, 2021 07:38
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)


pkg/kv/kvserver/client_merge_test.go, line 1102 at r1 (raw file):

// transaction's timestamp is bumped, it is able to refresh even after it has
// entered the critical phase of the merge and subsumed the RHS.
func TestStoreRangeMergeTxnRefresh(t *testing.T) {

don't you want to do a lease transfer in this test? Otherwise it works on master too, doesn't it?


pkg/kv/kvserver/replica.go, line 1419 at r1 (raw file):

	// entered its critical phase is a Refresh request, but only one issued by
	// the active range merge transaction itself, targetting the RHS's local
	// range descriptor. This is necessary to allow the merge transaction to

s/targetting/targeting lint says


pkg/kv/kvserver/replica_command.go, line 570 at r1 (raw file):

		// range merge transaction refreshing its reads while the RHS range is
		// subsumed, observe the commit timestamp to force a client-side retry.
		// TODO(nvanbenschoten): remove this in v21.2.

nit: no need for TODO, it's implied by the version check


pkg/kv/kvserver/replica_command.go, line 571 at r1 (raw file):

		// subsumed, observe the commit timestamp to force a client-side retry.
		// TODO(nvanbenschoten): remove this in v21.2.
		if !r.ClusterSettings().Version.IsActive(ctx, clusterversion.PriorReadSummaries) {

shouldn't we use a new cluster version? is this one related enough?

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/mergeRefresh2 branch from 15438f0 to c561e1e Compare March 3, 2021 03:28
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @andreimatei)


pkg/kv/kvserver/client_merge_test.go, line 1102 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

don't you want to do a lease transfer in this test? Otherwise it works on master too, doesn't it?

I considered it, but now that we're letting the txn refresh whenever it wants, it's always hitting the post-subsumed state because it refreshes immediately before its EndTxn, so it doesn't really matter how the range enters its critical phase. So I don't feel a strong need to add the lease transfer case, since it wouldn't actually change anything from the perspective of the txn.

Also, kvnemesis will easily create that situation, so there is test coverage.


pkg/kv/kvserver/replica.go, line 1419 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/targetting/targeting lint says

Done.


pkg/kv/kvserver/replica_command.go, line 570 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: no need for TODO, it's implied by the version check

Done.


pkg/kv/kvserver/replica_command.go, line 571 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

shouldn't we use a new cluster version? is this one related enough?

Discussed offline - neither of us care much. This version was added last week, so it's new enough.

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Fixes cockroachdb#59308.

This commit adds support for range merge transactions to refresh. This
is necessary to allow the merge transaction to have its write timestamp
be bumped and still commit without retrying. In such cases, the
transaction must refresh its reads, including its original read on the
RHS's local range descriptor. If we were to block this refresh on the
frozen RHS range, the merge would deadlock.

On the surface, it seems unsafe to permit Refresh requests on an already
subsumed RHS range, because the refresh's effect on the timestamp cache
will never make it to the LHS leaseholder. This risks the future joint
range serving a write that invalidates the Refresh. However, in this
specific situation, we can be sure that such a serializability violation
will not occur because the Range merge also writes to (deletes) this
key. This means that if the Range merge transaction commits, its intent
on the key will be resolved to the timestamp of the refresh and no
future write will ever be able to violate the refresh. Conversely, if the
Range merge transaction does not commit, then the merge will fail and
the update to the RHS's timestamp cache will not be lost (not that this
particularly matters in cases of aborted transactions).

The same line of reasoning as the one above has motivated us to explore
removing keys from a transaction's refresh spans when they are written
to by the transaction, as the intents written by the transaction act as
a form of pessimistic lock that obviate the need for the optimistic
refresh. Such an improvement would eliminate the need for this special
case, but until we generalize the mechanism to prune refresh spans based
on intent spans, we're forced to live with this.

See cockroachdb#59308 (comment)
for why the original fix, which attempted to manually refresh the range
merge transaction before it entered its critical phase, was not sufficient.

Release justification: needed for new functionality.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/mergeRefresh2 branch from c561e1e to df1c620 Compare March 3, 2021 04:13
@nvanbenschoten
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 3, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 3, 2021

Build succeeded:

@craig craig bot merged commit 3e3521c into cockroachdb:master Mar 3, 2021
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/mergeRefresh2 branch March 3, 2021 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kv: allow range merge txn to be pushed, even after lease transfer
4 participants