sql: automatically retry the first batch after a BEGIN #16719
Looks good! Also check the CLI shell and make sure that the FirstBatch status is reported as OPEN in the prompt. Reviewed 11 of 11 files at r1. pkg/sql/session.go, line 744 at r1 (raw file):
Extend your comment here to explain that this is actually not as trivial as implementing the conditional, because we first need to research and determine whether the mere ACK of an operation (and/or, e.g., the timing thereof) can cause clients to use different code paths in practice.
LGTM. Reviewed 11 of 11 files at r1. pkg/internal/client/txn.go, line 768 at r1 (raw file):
Spurious change to this file. pkg/sql/executor.go, line 2042 at r1 (raw file):
If you go down the road of allowing SET, I think that any SET statement is fair game.
OK, what about merging this now, with a TODO suggesting that the various forms of SET may also be fair game? (Actually, not only SET: PREPARE, DISCARD, RESET, SHOW (not TRACE), and EXPLAIN are probably all fine.)
5090a5f to 9043105
Added SET and a TODO for all the others. Review status: 6 of 10 files reviewed at latest revision, 3 unresolved discussions. pkg/sql/executor.go, line 2042 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Added SET. pkg/sql/session.go, line 744 at r1 (raw file): Previously, knz (kena) wrote…
Mmm, I don't know... I wouldn't add that. For one, I doubt that we can actually do such research (or at least I don't know how). pkg/internal/client/txn.go, line 768 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done.
ebe1edd to bf84edf
I've removed […] I've added a separate commit adding […] I've added a first commit that makes the txn state an atomic. This is required because the transition from FirstBatch->Open can happen concurrently with the execution of statements in the […] Review status: 0 of 10 files reviewed at latest revision, 3 unresolved discussions.
bf84edf to 62ade32
I've made some changes to the 2nd commit to support transitioning from RestartWait -> FirstBatch. The code was already meant to make this transition, but it wasn't working because, after RestartWait -> FirstBatch, we'd immediately transition to Open. This was because […] Review status: 0 of 10 files reviewed at latest revision, 3 unresolved discussions.
62ade32 to 5dd7451
Review status: 0 of 10 files reviewed at latest revision, 6 unresolved discussions, all commit checks successful. pkg/sql/executor.go, line 2019 at r4 (raw file):
Let's leave this out until we're able to test it. pkg/sql/logictest/testdata/logic_test/txn, line 590 at r4 (raw file):
We need to test that after the retry the transaction has the priority that we set at the start of the transaction, and that the retry process didn't forget about the first batch of the transaction. pkg/sql/logictest/testdata/logic_test/txn, line 633 at r4 (raw file):
What exactly gets retried here? Just this batch of SELECTs, or the entire list of statements since BEGIN? We need to retry the INSERT RETURNING NOTHING too, and the test needs to verify this (commit the transaction and make sure that the change was applied).
OK for the first two commits, but please remove the last commit before this can merge. Reviewed 11 of 11 files at r2, 10 of 10 files at r3.
Also please rebase so I can have a last look with the rebased changes merged in. Review status: 8 of 10 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.
5dd7451 to 8c72366
Rebased. @bdarnell, you were right that the last commit, about the […] Review status: 0 of 10 files reviewed at latest revision, 5 unresolved discussions. pkg/sql/executor.go, line 2019 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/sql/logictest/testdata/logic_test/txn, line 590 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Added a test that the priority and isolation level stay the same. pkg/sql/logictest/testdata/logic_test/txn, line 633 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Ack.
Reviewed 2 of 10 files at r5, 10 of 10 files at r6.
This is in anticipation of the next commit, in which we add a state transition that can happen concurrently with statement execution in the parallelizeQueue (statement execution reads the txn state). That state transition is inconsequential for statement execution, so this atomic field serves only to appease the race detector.
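A minimal Go sketch of the idea, assuming the state is a small integer enum: the field is only ever touched through `sync/atomic`, so a writer performing the FirstBatch->Open transition and readers running in parallel statements do not race. The `txnState` type, its `State`/`SetState` methods and the enum values are hypothetical stand-ins, not the actual CockroachDB definitions.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// TxnStateEnum is a hypothetical stand-in for the session's txn state enum.
// The names mirror states mentioned in this PR, not CockroachDB's real code.
type TxnStateEnum int32

const (
	NoTxn TxnStateEnum = iota
	FirstBatch
	Open
	RestartWait
)

func (s TxnStateEnum) String() string {
	switch s {
	case NoTxn:
		return "NoTxn"
	case FirstBatch:
		return "FirstBatch"
	case Open:
		return "Open"
	case RestartWait:
		return "RestartWait"
	}
	return fmt.Sprintf("TxnStateEnum(%d)", int32(s))
}

// txnState (hypothetical) stores the state as an int32 that is read and
// written only via sync/atomic, so concurrent access is not a data race.
type txnState struct {
	state int32
}

func (ts *txnState) State() TxnStateEnum {
	return TxnStateEnum(atomic.LoadInt32(&ts.state))
}

func (ts *txnState) SetState(s TxnStateEnum) {
	atomic.StoreInt32(&ts.state, int32(s))
}

func main() {
	var ts txnState
	ts.SetState(FirstBatch)

	var wg sync.WaitGroup
	// Simulated parallel statements that only read the state.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = ts.State()
		}()
	}
	// Concurrent transition; with a plain int field instead of atomics,
	// `go run -race` could report this as a data race.
	ts.SetState(Open)
	wg.Wait()
	fmt.Println(ts.State()) // prints "Open"
}
```

Using atomics rather than a mutex matches the commit's point that the transition does not affect statement execution: readers only need a consistent value, not coordination with the writer.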
Reviewed 10 of 10 files at r6.
8c72366 to 3fc8e0b
Before this patch, in case of retryable errors, the server (i.e. the
Executor) would automatically retry a batch of statements only if the
batch was the prefix of a transaction (or contained the whole txn). For
example, if the SELECT below got a retryable error, it would be retried
if all of the following statements arrived at the server in one batch:
BEGIN; ...; SELECT foo; [... COMMIT]. The rationale for retrying these
prefixes, but not other batches, was that in the case of a prefix batch
we know that the client had no conditional logic based on reads
performed in the current txn.
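A minimal sketch of this pre-existing prefix rule, assuming statements are plain strings (the real Executor works on parsed statements). `isBegin` and `retryStart` are hypothetical helpers, not Executor functions; the sketch only decides whether a failed batch qualifies and where a replay would start.

```go
package main

import (
	"fmt"
	"strings"
)

// isBegin reports whether stmt is a BEGIN statement (string matching is used
// here purely for illustration).
func isBegin(stmt string) bool {
	fields := strings.Fields(strings.ToUpper(stmt))
	return len(fields) > 0 && strings.TrimSuffix(fields[0], ";") == "BEGIN"
}

// retryStart: when the statement at failedIdx hits a retryable error, the
// batch can be replayed automatically only if the transaction was opened by a
// BEGIN earlier in the same batch. In that case the client has seen no results
// from this transaction and cannot have branched on them. It returns the index
// of that BEGIN and true, or -1 and false if an earlier batch opened the txn.
func retryStart(batch []string, failedIdx int) (int, bool) {
	for i := failedIdx - 1; i >= 0; i-- {
		if isBegin(batch[i]) {
			return i, true
		}
	}
	return -1, false
}

func main() {
	// One batch: BEGIN; INSERT ...; SELECT foo. The SELECT fails.
	prefix := []string{"BEGIN", "INSERT INTO t VALUES (1)", "SELECT foo"}
	start, ok := retryStart(prefix, 2)
	fmt.Println(start, ok) // prints "0 true"

	// BEGIN arrived in an earlier batch; this batch is just SELECT foo.
	_, ok = retryStart([]string{"SELECT foo"}, 0)
	fmt.Println(ok) // prints "false"
}
```

The second case is exactly what this patch targets: when BEGIN is sent alone, the following batch fails this test even though the client still cannot have acted on any read from the transaction.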
This patch extends this reasoning to statements executed in the first
batch arriving after the batch with the BEGIN, when the BEGIN was the
trailing statement of a previous batch (more realistically, when the
BEGIN is sent alone as a batch). As a further optimization, the
SAVEPOINT statement doesn't change the retryable character of the next
batch. So, if you do something like (different lines are different
batches):
    BEGIN
    SELECT foo
or
    BEGIN; SAVEPOINT cockroach_restart
    SELECT foo
or
    BEGIN
    SAVEPOINT cockroach_restart
    [...;] SELECT foo
the SELECTs will be retried automatically.
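The new rule can be sketched as a small state function. This is an illustration under simplifying assumptions, not the Executor's code: `TxnState`, `classify` and `afterBatch` are hypothetical, statements are plain strings, only SAVEPOINT is treated as transparent (the review thread discusses extending this to SET and other statements), and a restart via ROLLBACK TO SAVEPOINT is assumed to re-enter FirstBatch, in line with the RestartWait -> FirstBatch transition discussed in the review. A batch that starts in FirstBatch is the one eligible for auto-retry.

```go
package main

import (
	"fmt"
	"strings"
)

// TxnState is a hypothetical, reduced version of the session state.
type TxnState int

const (
	NoTxn TxnState = iota
	FirstBatch
	Open
)

func (s TxnState) String() string {
	return [...]string{"NoTxn", "FirstBatch", "Open"}[s]
}

// classify returns the leading keyword of a statement (string-based sketch).
func classify(stmt string) string {
	f := strings.Fields(strings.ToUpper(stmt))
	if len(f) == 0 {
		return ""
	}
	first := strings.TrimSuffix(f[0], ";")
	if first == "ROLLBACK" && len(f) > 1 && strings.TrimSuffix(f[1], ";") == "TO" {
		return "ROLLBACK TO"
	}
	return first
}

// afterBatch returns the state after a batch executes without error.
// State validation (e.g. BEGIN inside an open txn) is omitted.
func afterBatch(cur TxnState, batch []string) TxnState {
	s := cur
	ranOther := false // did a non-transparent statement run in FirstBatch?
	for _, stmt := range batch {
		switch classify(stmt) {
		case "BEGIN", "ROLLBACK TO":
			s, ranOther = FirstBatch, false
		case "SAVEPOINT":
			// Transparent: does not consume the FirstBatch state.
		case "COMMIT", "ROLLBACK":
			s = NoTxn
		default:
			if s == FirstBatch {
				ranOther = true
			}
		}
	}
	if s == FirstBatch && ranOther {
		return Open
	}
	return s
}

func main() {
	// The three batch sequences from the commit message.
	examples := [][][]string{
		{{"BEGIN"}, {"SELECT foo"}},
		{{"BEGIN", "SAVEPOINT cockroach_restart"}, {"SELECT foo"}},
		{{"BEGIN"}, {"SAVEPOINT cockroach_restart"}, {"SELECT foo"}},
	}
	for _, batches := range examples {
		s := NoTxn
		for _, b := range batches[:len(batches)-1] {
			s = afterBatch(s, b)
		}
		// State in which the final (SELECT) batch starts.
		fmt.Println(s, s == FirstBatch) // prints "FirstBatch true"
	}
}
```

Each sequence leaves the session in FirstBatch when the SELECT batch arrives, which is what makes that batch eligible for automatic retry; once it runs, the session moves to Open.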
Besides the general benefit of hiding more retryable errors from
clients, this change was motivated by ORMs getting retryable errors from
a BEGIN; CREATE TABLE ...; COMMIT; sequence (with the BEGIN sent as a
separate batch). This ORM code is not under our control, and we can't
teach it about user-directed retries.
This is implemented by adding a new txnState.State, FirstBatch.
Auto-retry is enabled for batches executed in this state.
Fixes #16450
Fixes #16200
See also forum discussion about it:
https://forum.cockroachlabs.com/t/automatically-retrying-the-first-batch-of-statements-after-a-begin/759
cc @tristan-ohlson