-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serializable txn improvements #21140
Serializable txn improvements #21140
Conversation
[DNM] this is a prototype of improvements to serializable txns so that we can allow long-lived transactions and reduce restarts, even with the max safe timestamp changes proposed in #21056. |
I haven't looked at the code, but are you aware that SQL very rarely has pure write-only transactions? |
That’s interesting to know. The point of this isn’t improving write-only txns, though that seemed an easy addition. But your comment makes me realize there’s further work to do with conditional puts here. Need to treat those as reads if the conditional put fails. |
@petermattis actually I think this is correct for conditional puts as is, and won't count those as read spans. So the zero-read-span optimization ought to be able to work with "write-only" sql workloads where we're doing simple primary key constraint checking. Like big numbers of upserts. |
4ae6396
to
1ebd7d8
Compare
This is great; I wish we had thought to do this a long time ago. Not every request that updates the timestamp cache does it in the same way (code), but UpdateRead will update the tscache in the same way no matter what. We need to at least remember when a read span was a part of a DeleteRange request so that it can be updated properly. I'm not sure whether the ResumeSpan work in batch.go is enough for Scan and ReverseScan or if we need to look at MaxSpanRequestKeys too. Reviewed 4 of 29 files at r1, 12 of 12 files at r2. pkg/kv/txn_coord_sender.go, line 1106 at r2 (raw file):
It's probably worthwhile to track a running size in a field of txnMeta. pkg/kv/txn_coord_sender.go, line 1205 at r2 (raw file):
This should be a log.VEventf so it goes in the request trace. Level 4 also seems too high to me (i'd have used 2), but I don't have strong opinions about this. pkg/roachpb/api.go, line 896 at r2 (raw file):
ConditionalPut is the only method for which updatesTSCache and isTxnRead differ. Does it need to update the ts cache? I don't think so, by the same argument as for isTxnRead. In that case we don't need a new flag and can just reuse updatesTSCache. pkg/roachpb/api.go, line 996 at r2 (raw file):
If you comment out pkg/roachpb/api.proto, line 1115 at r2 (raw file):
s/transactin/transaction/ Document what this returns (just success or failure? what kind of error?) pkg/storage/batcheval/cmd_update_read.go, line 1 at r2 (raw file):
s/2014/2017/ pkg/storage/batcheval/cmd_update_read.go, line 81 at r2 (raw file):
If you get an intent older than OrigTimestamp, something weird has happened. The original read should not have succeeded. We may want to return an error in this case. Comments from Reviewable |
1ebd7d8
to
a4d46b5
Compare
I've changed Could you elaborate on why you don't believe the Review status: 4 of 36 files reviewed at latest revision, 7 unresolved discussions. pkg/kv/txn_coord_sender.go, line 1106 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Yes good idea. I already had to make the same suggested change to the write intents because it was showing up in profiling building huge transactions. pkg/kv/txn_coord_sender.go, line 1205 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/roachpb/api.go, line 896 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I had this same thought and explored it. The problem is that these two ideas (needing to update the timestamp cache, vs. needing to rescan the span for newer updates) are different. Primarily, it occurs to me that In any case, because these really are subtly different concepts, I think it's wise to have a new bool. pkg/roachpb/api.go, line 996 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I hadn't even gotten through tests because I wanted to get this idea prototyped first to see if this direction seemed like an adequate solution to the various problems people have brought up with max safe timestamp, large transactions, and serializability. pkg/roachpb/api.proto, line 1115 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
It's just a generic error. Do you think we need to make a specific error for this for some reason? pkg/storage/batcheval/cmd_update_read.go, line 1 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/batcheval/cmd_update_read.go, line 81 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. Comments from Reviewable |
I'm not sure about this, it just caught my attention that we're doing it differently in the two places, and I think they should be the same (i.e. change the existing code to use ResumeSpan). Reviewed 2 of 14 files at r3, 12 of 12 files at r5. pkg/roachpb/api.go, line 896 at r2 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
I see. Yes, operations that update the timestamp cache should do so even if they return an error (as long as that error is based on data read. ConditionFailedError may be the only one of these for now, but it's probably better to be conservative and assume that any error from a read command should still update the timestamp cache). Since it's impossible to continue a transaction after one of these errors, it doesn't really matter whether we add them to the UpdateRead list or not. However, I think it's more consistent to add them to the list (which will subsequently get ignored since we'll never get a retry error from a pushed timestamp) than to maintain that these are different concepts (or at least I'm not seeing the subtle difference that justifies a separate flag) pkg/roachpb/api.proto, line 1115 at r2 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
Mainly for logging/monitoring purposes. A failure to update the read timestamps because there was a conflicting write is different from an UpdateRead that fails due to an RPC error. pkg/roachpb/batch.go, line 233 at r5 (raw file):
If it updates the write tscache, is "ReadSpanIterate" the appropriate name? Maybe TSCacheSpanIterate? pkg/roachpb/batch.go, line 247 at r5 (raw file):
We should avoid duplicating this knowledge here. Maybe a new api.go flag for updatesWriteTSCache? pkg/roachpb/batch_test.go, line 247 at r5 (raw file):
s/delRng/write/g. We may have more like DelRange in the future. pkg/storage/batcheval/cmd_update_read.go, line 20 at r5 (raw file):
Wrong log package. Comments from Reviewable |
See #21304 for fixes to the resume spans. Review status: 18 of 36 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed. pkg/roachpb/api.go, line 896 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I think it would be very unconservative indeed to drastically change the behavior by setting the timestamp cache on all read errors... Better to just correct the I think we still need this flag. The difference is subtle, but it's important for our ability to avoid restarts on common insert / upsert workloads which are only doing uniqueness constraint checking on the primary key via conditional puts. If we eliminate this distinction and require pkg/roachpb/api.proto, line 1115 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I thought RPC errors ended up as pkg/roachpb/batch.go, line 233 at r5 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I renamed everything given the presence of read and write spans. The RPC is now pkg/roachpb/batch.go, line 247 at r5 (raw file): Previously, bdarnell (Ben Darnell) wrote…
See above. pkg/roachpb/batch_test.go, line 247 at r5 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/batcheval/cmd_update_read.go, line 20 at r5 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. Comments from Reviewable |
526d941
to
d1b1d11
Compare
Reviewed 7 of 29 files at r1, 2 of 14 files at r6, 17 of 17 files at r9. pkg/kv/txn_coord_sender.go, line 71 at r9 (raw file):
pkg/roachpb/api.go, line 896 at r2 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
Now that we set the timestamp cache on ConditionFailedError (#21297), is the distinction between pkg/roachpb/batch.go, line 233 at r5 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
While UpdateSpan (kind of) works for the new rpc method, I think it's confusing to have all these other variables with UpdateSpan in their names. It sounds like it's referring to spans that have been updated/written to, instead of spans whose status needs to be updated when our timestamp is pushed. pkg/storage/engine/mvcc.go, line 1912 at r9 (raw file):
Document the new return value. Comments from Reviewable |
d1b1d11
to
11e7909
Compare
11e7909
to
2bd2411
Compare
@bdarnell, this has undergone some big changes due to @tschottdorf's suggestion that we handle things other than just Review status: 0 of 42 files reviewed at latest revision, 5 unresolved discussions, some commit checks pending. pkg/kv/txn_coord_sender.go, line 71 at r9 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I did some pretty big changes here. No longer called pkg/roachpb/api.go, line 896 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Now that we're handling pkg/roachpb/batch.go, line 233 at r5 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Lots of refactoring...let me know if this is still a problem. pkg/storage/engine/mvcc.go, line 1912 at r9 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. Comments from Reviewable |
Reviewed 10 of 10 files at r13. pkg/kv/txn_coord_sender.go, line 584 at r11 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
MixedSuccessError sounds good to me. Comments from Reviewable |
76eb1fe
to
0cd1bd5
Compare
Had to add a new version as it turns out that if you send an old server a |
0cd1bd5
to
976dae7
Compare
Did some more thinking this morning about Also improved comments in |
Had to add a new version as it turns out that if you send an old server a
Request type it doesn't understand in a batch it panics. We should return
an error instead. I filed #22290
<#22290> to track that.
I thought I did that somewhere recently.
On Thu, Feb 1, 2018 at 7:08 AM Spencer Kimball ***@***.***> wrote:
Did some more thinking this morning about InitPut and realized we
shouldn't need to update the timestamp cache or track refresh spans for it.
Updated the comments and added unittests.
Also improved comments in mvccResolveIntent.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#21140 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AE135HiRpXtgrAAIRhhRJyWZruENMRoZks5tQakmgaJpZM4RP4bK>
.
--
…-- Tobias
|
976dae7
to
240b329
Compare
@tschottdorf thanks for pointing that out. I closed my issue as a duplicate. |
The concept is straightforward: store spans encountered during the course of executing a transaction, up to a maximum limit of bytes. Upon encountering a transaction retry error with `reason=RETRY_SERIALIZABLE`, attempt to "refresh" all of the spans in order to avoid a full txn restart. For the case of write-only SERIALIZABLE transactions, we're now able to set the original timestamp equal to the commit timestamp before committing, entirely circumventing the possibility of a serializable restart. Otherwise, if all spans can be updated, the original timestamp is moved to the commit timestamp (without a transaction restart), and the `EndTransaction` batch is re-submitted. `Refresh(Range)` is a new KV RPC which re-reads a span, updating the timestamp cache and checking whether any more recent writes occurred to the span. It uses a time-bounded iterator for efficiency. Any writes or intents which have appeared between the transaction's original timestamp and its commit timestamp result in an error. Note that `Refresh(Range)` has a bool indicating whether it should update the read or write timestamp cache. Only `DeleteRange` creates update spans which need to update the write timestamp cache. This change should allow long-lived SERIALIZABLE transactions in the same style as SNAPSHOT, and should decrease the occurence of transaction retry errors which currently must be handled by client code. See cockroachdb#21078 Release note (sql): significantly reduce the likelihood of serializable restarts seen by clients due to concurrent workloads.
240b329
to
f7f6472
Compare
Reviewed 8 of 8 files at r14. Comments from Reviewable |
... if it's still possible for the SQL connExecutor to retry them (i.e. we haven't sent results to the client for the respective txn yet). This patch re-introduces code we used to have, but was deleted in cockroachdb#21140. The motivation for deleting the code was that retriable errors were supposed to be rarer because of the "read refreshing" mechanism, but that mechanism doesn't apply to transactions that "observe their commit timestamp". A txn can observe its timestamp by calling cluster_logical_timestamp(), or in other ways. In particular, schema change txns always observe their ts. Fixes cockroachdb#22933 Release note: Retriable errors on schema change operations are less likely to be returned to clients; more operations are retried internally.
... if it's still possible for the SQL connExecutor to retry them (i.e. we haven't sent results to the client for the respective txn yet). This patch re-introduces code we used to have, but was deleted in cockroachdb#21140. The motivation for deleting the code was that retriable errors were supposed to be rarer because of the "read refreshing" mechanism, but that mechanism doesn't apply to transactions that "observe their commit timestamp". A txn can observe its timestamp by calling cluster_logical_timestamp(), or in other ways. In particular, schema change txns always observe their ts. Fixes cockroachdb#22933 Release note: Retriable errors on schema change operations are less likely to be returned to clients; more operations are retried internally.
... if it's still possible for the SQL connExecutor to retry them (i.e. we haven't sent results to the client for the respective txn yet). This patch re-introduces code we used to have, but was deleted in cockroachdb#21140. The motivation for deleting the code was that retriable errors were supposed to be rarer because of the "read refreshing" mechanism, but that mechanism doesn't apply to transactions that "observe their commit timestamp". A txn can observe its timestamp by calling cluster_logical_timestamp(), or in other ways. In particular, schema change txns always observe their ts. Fixes cockroachdb#22933 Release note: Retriable errors on schema change operations are less likely to be returned to clients; more operations are retried internally.
Proposals that return errors but still want to go through Raft to lay down intents were refactored away in cockroachdb#21140. This allows us to simplify the error handling in `evaluateProposalInner`. Release note: None
There used to be a discrepancy between deadline exceeded errors detected on the server side and those detected on the client side for elided EndTransaction requests. No longer since cockroachdb#21140. Release note: None
28117: kv: get rid of test FUD r=andreimatei a=andreimatei Some TCS tests were employing a teardownHeartbeat() method. That's not needed - all the test indirectly stop a stopper that also tears down dangling heartbeat loops. Release note: None 28124: client: delete stale comment r=andreimatei a=andreimatei There used to be a discrepancy between deadline exceeded errors detected on the server side and those detected on the client side for elided EndTransaction requests. No longer since #21140. Release note: None Co-authored-by: Andrei Matei <[email protected]>
Proposals that return errors but still want to go through Raft to lay down intents were refactored away in cockroachdb#21140. This allows us to simplify the error handling in `evaluateProposalInner`. Release note: None
The concept is straightforward: store read spans encountered during
the course of executing a transaction, up to a maximum limit of bytes.
Upon encountering a transaction retry error with
reason=RETRY_SERIALIZABLE
,attempt to "update" all of the read spans in order to avoid a full txn
restart. For the case of write-only SERIALIZABLE transactions, we're now
able to set the original timestamp equal to the commit timestamp before
committing, entirely circumventing the possibility of a serializable
restart. Otherwise, if all read spans can be updated, the original timestamp
is moved to the commit timestamp (without a transaction restart), and the
EndTransaction
batch is re-submitted.UpdateRead
is a new KV RPC which re-reads a span, updating the timestampcache and checking whether any more recent writes occurred to the span.
It uses a time-bounded iterator for efficiency. Any writes or intents
which have appeared between the transaction's original timestamp and its
commit timestamp result in an error.
This change should allow long-lived SERIALIZABLE transactions in the same
style as SNAPSHOT, and should decrease the occurence of transaction retry
errors which currently must be handled by client code.
See #21078