
sql: CREATE TABLE returns retryable error; should retry internally #16450

Closed
bdarnell opened this issue Jun 11, 2017 · 16 comments · Fixed by #16719
Assignees

Comments

@bdarnell
Contributor

When a statement is run outside of a transaction (i.e. in an implicit auto-commit transaction), the server is supposed to be responsible for handling any retryable errors (since it has the entire transaction available for the retry). This doesn't appear to be working for CREATE TABLE statements, as seen in #15733. I can't be sure that this is limited to CREATE TABLE. That's the only place where I've been seeing it, but in these tests errors in other statements aren't logged as visibly.

org.postgresql.util.PSQLException: ERROR: restart transaction: HandledRetryableTxnError: TransactionRetryError: retry txn "sql txn" id=f9637e48 key=/Table/0/0 rw=true pri=0.01474274 iso=SERIALIZABLE stat=PENDING epo=0 ts=1494037334.449734196,1 orig=1494037332.150969729,0 max=1494037332.151492593,0 wto=false rop=false seq=6

@andreimatei is it expected that a "handled" retryable error would make it out to the client like this?

@jordanlewis
Member

I believe this is also the cause of the flakiness in the examples orms tests in #16200 and others.

@andreimatei
Contributor

If the statement is an implicit transaction, then indeed I don't think this error should escape to the client. Let me see if I can repro with the jepsen scripts. Or do you think it's easy to repro with one of the other tests?

@andreimatei
Contributor

I'm able to repro about half the time with the Jepsen sets test, but not very reliably. I'm figuring out how to have the test run an instrumented binary so I can take a closer look.

@bdarnell
Contributor Author

If you push a branch to the main cockroachdb/cockroach repo and then trigger the Jepsen tests on that branch with TeamCity, it will run with a build from that branch. (It'll take two hours because it runs all the tests, but a full run is almost certain to hit this error a couple of times.)

@andreimatei
Contributor

andreimatei commented Jun 20, 2017

I've made the sqlalchemy test work under stress, and I'm able to reproduce that failure. In the case of that test, the create table statement is done in a client transaction, so at a first approximation, it's not illegal for the retryable error to make it to the client. @bdarnell, do you know if that transaction is avoidable with sqlalchemy?

The error is due to the fact that the create table transaction has been pushed, likely by the split of an early range that takes up much of the initial key space ({System/tse-Max}). Perhaps the split is caused by the recent creation of a previous table.
I don't know if this push is surprising or not. We don't generally push transactions any more, but perhaps a split is still expected to do some pushes? I'll continue investigating exactly what the txn conflict is.

@bdarnell
Contributor Author

do you know if that transaction is avoidable with sqlalchemy?

It seems like it should be, but I can't find where that transaction is coming from.

If it's in a (multi-step) client transaction, then it's legitimate to let this error escape. That's really unfortunate, though, since it means injecting retry loops into a lot more places for things like this (unless we can disable the transaction completely).

In the clojure example, I believe we were seeing this happen without a client-side transaction, although I didn't verify this with wireshark.

Was the transaction actually pushed or did it run into the timestamp cache? The former would be a surprise to me, but the latter seems more likely.

@andreimatei
Contributor

Was the transaction actually pushed or did it run into the timestamp cache? The former would be a surprise to me, but the latter seems more likely.

Well, I can tell you that someone did send a PushTxn request at some point, but it could be unrelated to our txn. I can also tell you that the "retry reason" of the TxnRetryError is RETRY_SERIALIZABLE, and that it's generated on COMMIT; if the txn had run into the timestamp cache, would that have resulted in an error immediately, or at commit time?

@bdarnell
Contributor Author

Yes, the timestamp cache would have resulted in an error immediately. So something's sending a PUSH_TIMESTAMP PushTxn, which means a read encountering a WriteIntentError.

How exactly did you run this under stress? Adding --vmodule=replica_command=1 will show a little more detail about pushes. We're not supposed to push any more unless there's a deadlock. Could a split deadlock with a table creation? I hope not, but I'd have to trace through both transactions to be sure.

@andreimatei
Contributor

As we discussed, the timestamp cache will not, in fact, give you an error immediately; the error will be deferred to the commit. And it appears that it is indeed the timestamp cache that sometimes bumps the timestamp of the CREATE TABLE txn.

I'm able to reproduce this easily in a test doing concurrent table creations in explicit transactions. I have not been able to reproduce it in the same test with non-transactional table creations, so my hope is still that the Jepsen test, and everybody else experiencing this error, is using explicit transactions.
In any case, I'm investigating these explicit transactions for now.

I'm able to produce two types of retryable errors:

  1. When creating tables with different names, the increment on the shared counter used to get new descriptor ids can fail with a WriteTooOldError (which is swallowed and resurfaces at commit time).
  2. When creating tables with the same name, the timestamp cache can detect a conflict on the entry mapping the table name to a descriptor id.

The split scenario speculated about earlier has not been observed causing a create to fail yet, although it probably can happen too.

For 1), I think we should do that counter increment non-transactionally. If we burn some ids that will never actually be used for a descriptor, so be it.
For 2), and in general for various other retryable errors that a CREATE TABLE might encounter, Ben had an interesting suggestion: we could automatically retry the first statement sent after a BEGIN, even if it is not sent in the same batch of statements as the BEGIN (currently we only automatically retry the batch containing the BEGIN). The reasoning is that the client logic issuing the first query cannot be conditional on anything read in the transaction.

I'll work on these things.

@vivekmenezes
Contributor

vivekmenezes commented Jun 22, 2017 via email

@jordanlewis
Member

I can confirm that the SQLAlchemy ORM test does in fact generate all of its CREATE TABLE DDLs within a transaction. Here's the log with --vmodule=executor=2:

I170621 23:47:07.645977 444 sql/executor.go:1204  [client=[::1]:52128,user=root,n1] BEGIN TRANSACTION
I170621 23:47:07.646002 444 sql/executor.go:1204  [client=[::1]:52128,user=root,n1] BEGIN done
I170621 23:47:07.646122 444 sql/executor.go:634  [client=[::1]:52128,user=root,n1] execRequest: SHOW TABLES
I170621 23:47:07.646155 444 sql/executor.go:1021  [client=[::1]:52128,user=root,n1] executing 1/1: SHOW TABLES
I170621 23:47:07.646173 444 sql/executor.go:1204  [client=[::1]:52128,user=root,n1] SHOW TABLES
I170621 23:47:07.647142 444 sql/executor.go:1643  [client=[::1]:52128,user=root,n1] query not supported for distSQL: unsupported node *sql.delayedNode
I170621 23:47:07.647167 444 sql/executor.go:1204  [client=[::1]:52128,user=root,n1] SHOW TABLES (0 results) done
I170621 23:47:07.647482 444 sql/executor.go:634  [client=[::1]:52128,user=root,n1] execRequest: SHOW TABLES
I170621 23:47:07.647507 444 sql/executor.go:1021  [client=[::1]:52128,user=root,n1] executing 1/1: SHOW TABLES
I170621 23:47:07.647518 444 sql/executor.go:1204  [client=[::1]:52128,user=root,n1] SHOW TABLES
I170621 23:47:07.648226 444 sql/executor.go:1643  [client=[::1]:52128,user=root,n1] query not supported for distSQL: unsupported node *sql.delayedNode
I170621 23:47:07.648249 444 sql/executor.go:1204  [client=[::1]:52128,user=root,n1] SHOW TABLES (0 results) done
I170621 23:47:07.648459 444 sql/executor.go:634  [client=[::1]:52128,user=root,n1] execRequest: SHOW TABLES
I170621 23:47:07.648486 444 sql/executor.go:1021  [client=[::1]:52128,user=root,n1] executing 1/1: SHOW TABLES
I170621 23:47:07.648497 444 sql/executor.go:1204  [client=[::1]:52128,user=root,n1] SHOW TABLES
I170621 23:47:07.649187 444 sql/executor.go:1643  [client=[::1]:52128,user=root,n1] query not supported for distSQL: unsupported node *sql.delayedNode
I170621 23:47:07.649210 444 sql/executor.go:1204  [client=[::1]:52128,user=root,n1] SHOW TABLES (0 results) done
I170621 23:47:07.649454 444 sql/executor.go:634  [client=[::1]:52128,user=root,n1] execRequest: SHOW TABLES
I170621 23:47:07.649484 444 sql/executor.go:1021  [client=[::1]:52128,user=root,n1] executing 1/1: SHOW TABLES
I170621 23:47:07.649504 444 sql/executor.go:1204  [client=[::1]:52128,user=root,n1] SHOW TABLES
I170621 23:47:07.650257 444 sql/executor.go:1643  [client=[::1]:52128,user=root,n1] query not supported for distSQL: unsupported node *sql.delayedNode
I170621 23:47:07.650280 444 sql/executor.go:1204  [client=[::1]:52128,user=root,n1] SHOW TABLES (0 results) done
I170621 23:47:07.651091 444 sql/executor.go:634  [client=[::1]:52128,user=root,n1] execRequest:
CREATE TABLE customers (
	id INTEGER DEFAULT unique_rowid() NOT NULL,
	name VARCHAR,
	PRIMARY KEY (id)
)

I170621 23:47:07.651197 444 sql/executor.go:1021  [client=[::1]:52128,user=root,n1] executing 1/1: CREATE TABLE customers (id INTEGER NOT NULL DEFAULT unique_rowid(), "name" VARCHAR, PRIMARY KEY (id))
I170621 23:47:07.651221 444 sql/executor.go:1204  [client=[::1]:52128,user=root,n1] CREATE TABLE customers (id INTEGER NOT NULL DEFAULT unique_rowid(), "name" VARCHAR, PRIMARY KEY (id))
I170621 23:47:07.651517 444 sql/executor.go:1643  [client=[::1]:52128,user=root,n1] query not supported for distSQL: unsupported node *sql.createTableNode
I170621 23:47:07.654768 444 sql/executor.go:1204  [client=[::1]:52128,user=root,n1] CREATE TABLE done
I170621 23:47:07.654988 444 sql/executor.go:634  [client=[::1]:52128,user=root,n1] execRequest: COMMIT
I170621 23:47:07.655023 444 sql/executor.go:1021  [client=[::1]:52128,user=root,n1] executing 1/1: COMMIT TRANSACTION
I170621 23:47:07.655035 444 sql/executor.go:1204  [client=[::1]:52128,user=root,n1] COMMIT TRANSACTION
I170621 23:47:07.655889 444 sql/event_log.go:101  [client=[::1]:52128,user=root,n1] Event: "create_table", target: 51, info: {TableName:customers Statement:CREATE TABLE customers (id INTEGER NOT NULL DEFAULT unique_rowid(), "name" VARCHAR, PRIMARY KEY (id)) User:root}

@bdarnell
Contributor Author

For 1), I think we should do that counter increment non-transactionally. And if we burn some ids that will not actually be used for a descriptor, so be it.

SGTM

2. When creating tables with the same name, the timestamp cache can detect a conflict on the entry mapping the table name to a descriptor id.

In this case, the table creation is going to fail no matter what, right? It's just a question of which error is returned? Although I guess it could be a problem for CREATE IF NOT EXISTS.

Also, I took a closer look at what clojure is doing and it does indeed wrap everything in a transaction by default. There doesn't appear to be any concurrency, but the split race could probably explain the retry errors.

@andreimatei
Contributor

In this case, the table creation is going to fail no matter what, right? It's just a question of which error is returned? Although I guess it could be a problem for CREATE IF NOT EXISTS.

Right.

Also, I took a closer look at what clojure is doing and it does indeed wrap everything in a transaction by default. There doesn't appear to be any concurrency, but the split race could probably explain the retry errors.

By Clojure you mean the Jepsen tests, right?

andreimatei added a commit to andreimatei/cockroach that referenced this issue Jun 23, 2017
Before this patch, ids for new table or database descriptors were
created in the same transaction as the SQL statement performing the
creation. This meant that if the transaction encountered a retryable
error, the statement failed. This is a problem for CREATEs performed by
ORMs, which like to perform such statements in explicit transactions (so
we can't automatically retry the statement).
This patch makes the id creation non-transactional, and retryable on
errors.

Fixes cockroachdb#13180.
Touches cockroachdb#16450
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jun 26, 2017
Before this patch, in case of retryable errors, the server (i.e. the
Executor) would automatically retry a batch of statements if the batch
was the prefix of a transaction (or if the batch contained the whole
txn). For example, if the following SELECT would get an error, it'd be
retried if all of the following statements arrived to the server in one
batch: BEGIN; ...; SELECT foo; [... COMMIT]. The rationale in retrying
these prefixes, but not otherwise, was that, in the case of a prefix
batch, we know that the client had no conditional logic based on reads
performed in the current txn.

This patch extends this reasoning to statements executed in the first
batch arriving after the batch with the BEGIN if the BEGIN had been
trailing a previous batch (more realistically, if the BEGIN is sent
alone as a batch). As a further optimization, the SAVEPOINT statement
doesn't change the retryable character of the next batch. So, if you do
something like (different lines are different batches):
BEGIN
SELECT foo;

or

BEGIN; SAVEPOINT cockroach_restart;
SELECT foo

or

BEGIN
SAVEPOINT cockroach_restart
[...;] SELECT FOO

the SELECTs will be retried automatically.

Besides being generally a good idea to hide retryable errors more, this
change was motivated by ORMs getting retryable errors from a BEGIN;
CREATE TABLE ...; COMMIT; sequence (with the BEGIN being a separate
batch). This ORM code is not under our control and we can't teach it
about user-directed retries.

This is implemented by creating a new txnState.State - FirstBatch.
Auto-retry is enabled for batches executed in this state.

Fixes cockroachdb#16450
Fixes cockroachdb#16200
See also forum discussion about it:
https://forum.cockroachlabs.com/t/automatically-retrying-the-first-batch-of-statements-after-a-begin/759
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jun 27, 2017
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jun 29, 2017
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jul 3, 2017
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jul 5, 2017
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jul 5, 2017
@jordanlewis
Member

Does "first batch" in the closing PR correspond to "first SQL statement" only? If so, neither this nor #16200 should have been closed.

The offending CREATE TABLE statement actually occurs after several SHOW TABLES statements in the transaction generated by SQLAlchemy.

@andreimatei
Contributor

The "first batch" refers to the first batch of statements after a trailing BEGIN.
I missed all the SHOW statements done by sqlalchemy. It's not doing them in the CREATE TABLE batch, so indeed this PR won't help. And I guess neither can anything else on the crdb side; if the client is doing reads, it needs to drive the retries. Do you think we can customize it to do so?

Anywho, let's hope that the other recent improvements making CREATE TABLE less likely to hit a retryable error will stop our tests from flaking. If they still flake, we can reopen #16200.
I think this current issue can stay closed; it has outlived its usefulness.

@jordanlewis
Member

Sounds good!

andreimatei added a commit to andreimatei/cockroach that referenced this issue Jul 6, 2017
…LE txns

Before this patch, a write-read conflict (a txn attempting to write
"under" a previous read) would be handled by pushing the commit
timestamp of the writer, but otherwise letting it continue. Even though
it will not be allowed to commit, the writer is allowed to continue
laying down intents in the hope that they'll keep other conflicting txns
away.
Since this push is only detected at COMMIT time, that's too late to do
automatic retries. Therefore, the desire to let the txn go forward and
lay down intents is at odds with the desire to sometimes retry
automatically.
This commit puts a finger on this tradeoff scale and makes it so that we
detect pushes after statements executed in the FirstBatch state (so,
while we can still retry automatically). When a push is detected, the
txn is (auto-)retried. Note that the write will still have laid down at
least one intent on the key that caused the push.

This change hopes to make it possible for some transactions to
all-but-never see retryable errors (e.g. Jepsen's BEGIN; CREATE TABLE;
COMMIT).

Touches cockroachdb#16450
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jul 11, 2017
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jul 24, 2017
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jul 24, 2017