-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: permit merge transactions to refresh after subsumption #61324
kv: permit merge transactions to refresh after subsumption #61324
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)
pkg/kv/kvserver/client_merge_test.go, line 1102 at r1 (raw file):
// transaction's timestamp is bumped, it is able to refresh even after it has // entered the critical phase of the merge and subsumed the RHS. func TestStoreRangeMergeTxnRefresh(t *testing.T) {
don't you want to do a lease transfer in this test? Otherwise it works on master too, doesn't it?
pkg/kv/kvserver/replica.go, line 1419 at r1 (raw file):
// entered its critical phase is a Refresh request, but only one issued by // the active range merge transaction itself, targetting the RHS's local // range descriptor. This is necessary to allow the merge transaction to
s/targetting/targeting
lint says
pkg/kv/kvserver/replica_command.go, line 570 at r1 (raw file):
// range merge transaction refreshing its reads while the RHS range is // subsumed, observe the commit timestamp to force a client-side retry. // TODO(nvanbenschoten): remove this in v21.2.
nit: no need for TODO, it's implied by the version check
pkg/kv/kvserver/replica_command.go, line 571 at r1 (raw file):
// subsumed, observe the commit timestamp to force a client-side retry. // TODO(nvanbenschoten): remove this in v21.2. if !r.ClusterSettings().Version.IsActive(ctx, clusterversion.PriorReadSummaries) {
shouldn't we use a new cluster version? is this one related enough?
15438f0
to
c561e1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @andreimatei)
pkg/kv/kvserver/client_merge_test.go, line 1102 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
don't you want to do a lease transfer in this test? Otherwise it works on master too, doesn't it?
I considered it, but now that we're letting the txn refresh whenever it wants, it's always hitting the post-subsumed state because it refreshes immediately before its EndTxn, so it doesn't really matter how the range enters its critical phase. So I don't feel a strong need to add the lease transfer case, since it wouldn't actually change anything from the perspective of the txn.
Also, kvnemesis will easily create that situation, so there is test coverage.
pkg/kv/kvserver/replica.go, line 1419 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
s/targetting/targeting
lint says
Done.
pkg/kv/kvserver/replica_command.go, line 570 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: no need for TODO, it's implied by the version check
Done.
pkg/kv/kvserver/replica_command.go, line 571 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
shouldn't we use a new cluster version? is this one related enough?
Discussed offline - neither of us care much. This version was added last week, so it's new enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Fixes cockroachdb#59308. This commit adds support for range merge transactions to refresh. This is necessary to allow the merge transaction to have its write timestamp be bumped and still commit without retrying. In such cases, the transaction must refresh its reads, including its original read on the RHS's local range descriptor. If we were to block this refresh on the frozen RHS range, the merge would deadlock. On the surface, it seems unsafe to permit Refresh requests on an already subsumed RHS range, because the refresh's effect on the timestamp cache will never make it to the LHS leaseholder. This risks the future joint range serving a write that invalidates the Refresh. However, in this specific situation, we can be sure that such a serializability violation will not occur because the Range merge also writes to (deletes) this key. This means that if the Range merge transaction commits, its intent on the key will be resolved to the timestamp of the refresh and no future write will ever be able to violate the refresh. Conversely, if the Range merge transaction does not commit, then the merge will fail and the update to the RHS's timestamp cache will not be lost (not that this particularly matters in cases of aborted transactions). The same line of reasoning as the one above has motivated us to explore removing keys from a transaction's refresh spans when they are written to by the transaction, as the intents written by the transaction act as a form of pessimistic lock that obviate the need for the optimistic refresh. Such an improvement would eliminate the need for this special case, but until we generalize the mechanism to prune refresh spans based on intent spans, we're forced to live with this. See cockroachdb#59308 (comment) for why the original fix, which attempted to manually refresh the range merge transaction before it entered its critical phase, was not sufficient. Release justification: needed for new functionality.
c561e1e
to
df1c620
Compare
bors r+ |
Build failed (retrying...): |
Build succeeded: |
Fixes #59308.
This commit adds support for range merge transactions to refresh. This
is necessary to allow the merge transaction to have its write timestamp
be bumped and still commit without retrying. In such cases, the
transaction must refresh its reads, including its original read on the
RHS's local range descriptor. If we were to block this refresh on the
frozen RHS range, the merge would deadlock.
On the surface, it seems unsafe to permit Refresh requests on an already
subsumed RHS range, because the refresh's effect on the timestamp cache
will never make it to the LHS leaseholder. This risks the future joint
range serving a write that invalidates the Refresh. However, in this
specific situation, we can be sure that such a serializability violation
will not occur because the Range merge also writes to (deletes) this
key. This means that if the Range merge transaction commits, its intent
on the key will be resolved to the timestamp of the refresh and no
future write will ever be able to violate the refresh. Conversely, if the
Range merge transaction does not commit, then the merge will fail and
the update to the RHS's timestamp cache will not be lost (not that this
particularly matters in cases of aborted transactions).
The same line of reasoning as the one above has motivated us to explore
removing keys from a transaction's refresh spans when they are written
to by the transaction, as the intents written by the transaction act as
a form of pessimistic lock that obviate the need for the optimistic
refresh. Such an improvement would eliminate the need for this special
case, but until we generalize the mechanism to prune refresh spans based
on intent spans, we're forced to live with this.
See #59308 (comment)
for why the original fix, which attempted to manually refresh the range
merge transaction before it entered its critical phase, was not sufficient.
Release justification: needed for new functionality.