sql: CREATE TABLE returns retryable error; should retry internally #16450
I believe this is also the cause of the flakiness in the examples orms tests in #16200 and others. |
If the statement is run as an implicit transaction, then indeed I don't think this error should escape to the client. Let me see if I can repro with the Jepsen scripts. Or do you think it's easy to repro with one of the other tests? |
I'm able to repro about half the time with the Jepsen … |
If you push a branch to the main cockroachdb/cockroach repo and then trigger the Jepsen tests on that branch with TeamCity, it will run with a build from that branch. (It'll take two hours because it runs all the tests, but a full run is almost certain to hit this error a couple of times.) |
I've made the SQLAlchemy test work under stress, and I'm able to reproduce that failure. In the case of that test, the … The error is due to the fact that the … |
It seems like it should be, but I can't find where that transaction is coming from. If it's in a (multi-step) client transaction, then it's legitimate to let this error escape. That's really unfortunate, though, since it means injecting retry loops into a lot more places for things like this (unless we can disable the transaction completely). In the clojure example, I believe we were seeing this happen without a client-side transaction, although I didn't verify this with wireshark. Was the transaction actually pushed or did it run into the timestamp cache? The former would be a surprise to me, but the latter seems more likely. |
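The alternative discussed here (injecting retry loops on the client side) is what a client has to do today when a retryable error escapes from an explicit transaction. As a rough illustration, and not CockroachDB code, here is a minimal retry wrapper; the names `errRetryable` and `runWithRetry` are hypothetical, and the error is simulated rather than coming from a real server (CockroachDB signals these errors to clients with SQLSTATE 40001, "restart transaction"):

```go
package main

import (
	"errors"
	"fmt"
)

// errRetryable stands in for a retryable txn error from the server.
// Clients recognize the real thing by SQLSTATE 40001.
var errRetryable = errors.New("restart transaction (SQLSTATE 40001)")

// runWithRetry re-runs fn until it succeeds, a non-retryable error
// occurs, or maxAttempts is exhausted. A real client loop would also
// issue ROLLBACK before re-running an explicit transaction.
func runWithRetry(maxAttempts int, fn func() error) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		err = fn()
		if err == nil || !errors.Is(err, errRetryable) {
			return err
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	attempts := 0
	err := runWithRetry(5, func() error {
		attempts++
		if attempts < 3 {
			return errRetryable // simulate two pushed/conflicting attempts
		}
		return nil // e.g. BEGIN; CREATE TABLE ...; COMMIT succeeded
	})
	fmt.Println(attempts, err) // → 3 <nil>
}
```

The thread's point is that requiring every ORM and driver to carry a loop like this is exactly what server-side internal retries are supposed to avoid.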
Well I can tell you that someone did send a … |
Yes, the timestamp cache would have resulted in an error immediately. So something's sending a PUSH_TIMESTAMP PushTxn, which means a read encountered a WriteIntentError. How exactly did you run this under stress? Adding … |
As we discussed, the timestamp cache will not, in fact, give you an error immediately; the error will be deferred to the commit. And it appears that it is indeed the timestamp cache that sometimes bumps the timestamp of the CREATE TABLE txn.
I'm able to reproduce this easily in a test doing concurrent table creations in explicit transactions. I have not been able to reproduce in the same test by doing non-transactional table creations - so my hope is still that the Jepsen test, and everybody else experiencing this error, is using explicit transactions. In any case, for now I'm investigating these explicit transactions.
I'm able to produce two types of retryable errors:
1. When creating tables with different names, the increment on the shared counter used to get new descriptor ids can fail with a WriteTooOldError (that's swallowed and resurfaced at commit time).
2. When creating tables with the same name, the timestamp cache can detect a conflict on the entry mapping the table name to a descriptor id.
The previous speculations about a split causing a create to fail have not been seen yet, although they probably can happen too.
For 1), I think we should do that counter increment non-transactionally. And if we burn some ids that will not actually be used for a descriptor, so be it.
For 2), and in general for various other possible retryable errors that a CREATE TABLE might encounter, Ben had an interesting suggestion - we could automatically retry the first statement sent after a BEGIN, even if it is not sent in the same batch of statements with the BEGIN (currently we only automatically retry the batch with the BEGIN in it). The reasoning here is that the client logic that issues the first query is not conditional on anything read in the transaction.
I'll work on these things. |
1. is related to #13180 |
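The fix proposed for 1) is to bump the descriptor-id counter outside the creating transaction, accepting that a failed creation burns an id. A toy Go model of that idea (the `idAllocator` type and its fields are illustrative, not CockroachDB code):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// idAllocator models a descriptor-id counter bumped non-transactionally:
// the id is reserved immediately, and nothing ever un-reserves it.
type idAllocator struct{ next int64 }

// alloc reserves and returns the next id.
func (a *idAllocator) alloc() int64 { return atomic.AddInt64(&a.next, 1) }

func main() {
	a := &idAllocator{next: 50} // pretend 50 is the last used descriptor id
	id1 := a.alloc()            // txn A reserves 51 ...
	// ... txn A hits a retryable error and rolls back: id 51 is burned.
	id2 := a.alloc() // txn A's retry (or txn B) gets 52
	fmt.Println(id1, id2) // → 51 52
}
```

Because the increment is no longer part of the SQL transaction, it can be retried internally on its own, and it can no longer cause the enclosing CREATE TABLE to fail with a WriteTooOldError; the only cost is gaps in the descriptor-id sequence.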
I can confirm that the SQLAlchemy ORM test does in fact generate all of its … |
SGTM
In this case, the table creation is going to fail no matter what, right? It's just a question of which error is returned? Although I guess it could be a problem for CREATE TABLE IF NOT EXISTS. Also, I took a closer look at what the Clojure test is doing and it does indeed wrap everything in a transaction by default. There doesn't appear to be any concurrency, but the split race could probably explain the retry errors. |
Right.
By Clojure you mean the Jepsen tests, right? |
Before this patch, ids for new table or database descriptors were created in the same transaction as the SQL statement performing the creation. This meant that if they encountered a retryable error, the statement failed. This is a problem for CREATEs performed by ORMs, which like to perform such statements in explicit transactions (so we can't automatically retry the statement). This patch makes the id creation non-transactional, and retryable on errors. Fixes cockroachdb#13180. Touches cockroachdb#16450
Before this patch, in case of retryable errors, the server (i.e. the Executor) would automatically retry a batch of statements if the batch was a prefix of a transaction (or if the batch contained the whole txn). For example, if the following SELECT got an error, it'd be retried if all of the following statements arrived at the server in one batch: BEGIN; ...; SELECT foo; [... COMMIT]. The rationale for retrying these prefixes, but not otherwise, was that, in the case of a prefix batch, we know that the client had no conditional logic based on reads performed in the current txn. This patch extends this reasoning to statements executed in the first batch arriving after the batch with the BEGIN, if the BEGIN had been trailing a previous batch (more realistically, if the BEGIN is sent alone as a batch). As a further optimization, the SAVEPOINT statement doesn't change the retryable character of the next batch. So, if you do something like (different lines are different batches):

BEGIN
SELECT foo;

or

BEGIN; SAVEPOINT cockroach_restart;
SELECT foo

or

BEGIN
SAVEPOINT cockroach_restart
[...;] SELECT foo

the SELECTs will be retried automatically. Besides it being generally a good idea to hide retryable errors more, this change was motivated by ORMs getting retryable errors from a BEGIN; CREATE TABLE ...; COMMIT; sequence (with the BEGIN being a separate batch). This ORM code is not under our control and we can't teach it about user-directed retries. This is implemented by creating a new txnState.State - FirstBatch. Auto-retry is enabled for batches executed in this state. Fixes cockroachdb#16450 Fixes cockroachdb#16200 See also forum discussion about it: https://forum.cockroachlabs.com/t/automatically-retrying-the-first-batch-of-statements-after-a-begin/759
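The FirstBatch mechanism from the commit message can be sketched as a tiny state machine: BEGIN sent alone enters a state where the next batch is still auto-retryable, and a batch containing only prologue statements (SAVEPOINT cockroach_restart) keeps that state. This is a simplified toy model under my own naming, not the Executor's actual code:

```go
package main

import "fmt"

// state is a toy version of txnState.State from the commit message.
type state int

const (
	noTxn      state = iota
	firstBatch // txn open, no real batch executed yet: auto-retry enabled
	open       // later batches: client may have conditional logic, no auto-retry
)

// advance returns the state after executing one batch. A batch containing
// only BEGIN and/or SAVEPOINT cockroach_restart doesn't consume the
// auto-retryable "first batch".
func advance(s state, batch []string) state {
	onlyPrologue := true
	for _, stmt := range batch {
		if stmt != "BEGIN" && stmt != "SAVEPOINT cockroach_restart" {
			onlyPrologue = false
		}
	}
	switch {
	case s == noTxn && batch[0] == "BEGIN" && onlyPrologue:
		return firstBatch
	case s == noTxn:
		return open // BEGIN; SELECT ... in one batch: the old prefix-batch case
	case s == firstBatch && onlyPrologue:
		return firstBatch // SAVEPOINT alone keeps auto-retry enabled
	default:
		return open
	}
}

func main() {
	s := advance(noTxn, []string{"BEGIN"})
	fmt.Println(s == firstBatch) // next batch will be auto-retried
	s = advance(s, []string{"SAVEPOINT cockroach_restart"})
	fmt.Println(s == firstBatch) // still auto-retryable
	fmt.Println(advance(s, []string{"SELECT foo"}) == open)
}
```

The key property is that the first "real" batch after BEGIN executes while auto-retry is still enabled, which is exactly what lets the server absorb a retryable error from an ORM's separately-batched CREATE TABLE.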
Does "first batch" in the closing PR correspond to "first SQL statement" only? If so, neither this nor #16200 should have been closed. The offending |
The "first batch" refers to the first batch of statements after a trailing Anywho, let's hope that the other improvements made recently about making |
Sounds good! |
…LE txns Before this patch, a write-read conflict (a txn attempting to write "under" a previous read) would be handled by pushing the commit timestamp of the writer, but otherwise letting it continue. Even though it will not be allowed to commit, the writer is allowed to continue laying down intents in the hope that they'll keep other conflicting txns away. Since this push is only detected at COMMIT time, that's too late to do automatic retries. Therefore, the desire to let the txn go forward and lay down intents is at odds with the desire to sometimes retry automatically. This commit puts a finger on the scale of this tradeoff and makes it so that we detect pushes after statements executed in the FirstBatch state (so, while we can still retry automatically). When a push is detected, the txn is (auto-)retried. Note that the write will still have laid down at least one intent on the key that caused the push. This change hopes to make it possible for some transactions to all-but-never see retryable errors (e.g. Jepsen's BEGIN; CREATE TABLE; COMMIT). Touches cockroachdb#16450
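The write-read conflict described above can be modeled in a few lines: a write "under" a prior read doesn't fail, it silently pushes the writer's commit timestamp, and the question is merely when anyone checks for the push. This is an illustrative toy under my own names (tsCache, txn), not CockroachDB's MVCC machinery:

```go
package main

import "fmt"

// tsCache records, per key, the highest timestamp at which the key was read.
type tsCache map[string]int64

type txn struct {
	origTs, commitTs int64
}

// write lays down an intent; if the key was read at or above our commit
// timestamp, our commit timestamp gets pushed above the read. The push is
// silent: the txn keeps executing.
func (t *txn) write(c tsCache, key string) {
	if readTs, ok := c[key]; ok && readTs >= t.commitTs {
		t.commitTs = readTs + 1
	}
}

// pushed reports whether the txn can no longer commit at its original
// timestamp. Checking this right after a FirstBatch statement allows a
// server-side auto-retry; checking only at COMMIT forces the retryable
// error out to the client.
func (t *txn) pushed() bool { return t.commitTs > t.origTs }

func main() {
	cache := tsCache{"/Table/0/0": 10} // someone read this key at ts=10
	w := &txn{origTs: 5, commitTs: 5}
	w.write(cache, "/Table/0/0")
	fmt.Println(w.pushed()) // → true: detect now and auto-retry, or fail at COMMIT
}
```

The commit message's tradeoff is visible here: detecting the push eagerly (while still in FirstBatch) enables an auto-retry, at the cost of giving up the intents-as-locks benefit of letting the pushed writer run to COMMIT.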
When a statement is run outside of a transaction (i.e. in an implicit auto-commit transaction), the server is supposed to be responsible for handling any retryable errors (since it has the entire transaction available for the retry). This doesn't appear to be working for CREATE TABLE statements, as seen in #15733. I can't be sure that this is limited to CREATE TABLE. That's the only place where I've been seeing it, but in these tests errors in other statements aren't logged as visibly.

org.postgresql.util.PSQLException: ERROR: restart transaction: HandledRetryableTxnError: TransactionRetryError: retry txn "sql txn" id=f9637e48 key=/Table/0/0 rw=true pri=0.01474274 iso=SERIALIZABLE stat=PENDING epo=0 ts=1494037334.449734196,1 orig=1494037332.150969729,0 max=1494037332.151492593,0 wto=false rop=false seq=6
@andreimatei is it expected that a "handled" retryable error would make it out to the client like this?