sql: CREATE TABLE returns retryable error; should retry internally #16450
I believe this is also the cause of the flakiness in the examples orms tests in #16200 and others. |
If the statement is run as an implicit transaction, then indeed I don't think this error should escape to the client. Let me see if I can repro with the Jepsen scripts. Or do you think it's easy to repro with one of the other tests? |
I'm able to repro about half the time with the Jepsen … |
If you push a branch to the main cockroachdb/cockroach repo and then trigger the Jepsen tests on that branch with TeamCity, it will run with a build from that branch. (It'll take two hours because it runs all the tests, but a full run is almost certain to hit this error a couple of times.) |
I've made the SQLAlchemy test work under stress, and I'm able to reproduce that failure. In the case of that test, the … The error is due to the fact that the … |
It seems like it should be, but I can't find where that transaction is coming from. If it's in a (multi-step) client transaction, then it's legitimate to let this error escape. That's really unfortunate, though, since it means injecting retry loops into a lot more places for things like this (unless we can disable the transaction completely). In the clojure example, I believe we were seeing this happen without a client-side transaction, although I didn't verify this with wireshark. Was the transaction actually pushed or did it run into the timestamp cache? The former would be a surprise to me, but the latter seems more likely. |
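The alternative discussed here (injecting retry loops on the client side) is what a client has to do today when a retryable error escapes from an explicit transaction. As a rough illustration, and not CockroachDB code, here is a minimal retry wrapper; the names `errRetryable` and `runWithRetry` are hypothetical, and the error is simulated rather than coming from a real server (CockroachDB signals these errors to clients with SQLSTATE 40001, "restart transaction"):

```go
package main

import (
	"errors"
	"fmt"
)

// errRetryable stands in for a retryable txn error from the server.
// Clients recognize the real thing by SQLSTATE 40001.
var errRetryable = errors.New("restart transaction (SQLSTATE 40001)")

// runWithRetry re-runs fn until it succeeds, a non-retryable error
// occurs, or maxAttempts is exhausted. A real client loop would also
// issue ROLLBACK before re-running an explicit transaction.
func runWithRetry(maxAttempts int, fn func() error) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		err = fn()
		if err == nil || !errors.Is(err, errRetryable) {
			return err
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	attempts := 0
	err := runWithRetry(5, func() error {
		attempts++
		if attempts < 3 {
			return errRetryable // simulate two pushed/conflicting attempts
		}
		return nil // e.g. BEGIN; CREATE TABLE ...; COMMIT succeeded
	})
	fmt.Println(attempts, err) // → 3 <nil>
}
```

The thread's point is that requiring every ORM and driver to carry a loop like this is exactly what server-side internal retries are supposed to avoid.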
Well I can tell you that someone did send a … |
Yes, the timestamp cache would have resulted in an error immediately. So something's sending a PUSH_TIMESTAMP PushTxn, which means a read encountered a WriteIntentError. How exactly did you run this under stress? Adding … |
As we discussed, the timestamp cache will not, in fact, give you an error immediately; the error will be deferred to the commit. And it appears that it is indeed the timestamp cache that sometimes bumps the timestamp of the CREATE TABLE txn.
I'm able to reproduce this easily in a test doing concurrent table creations in explicit transactions. I have not been able to reproduce in the same test by doing non-transactional table creations - so my hope is still that the Jepsen test, and everybody else experiencing this error, is using explicit transactions. In any case, for now I'm investigating these explicit transactions.
I'm able to produce two types of retryable errors:
1. When creating tables with different names, the increment on the shared counter used to get new descriptor ids can fail with a WriteTooOldError (that's swallowed and resurfaced at commit time).
2. When creating tables with the same name, the timestamp cache can detect a conflict on the entry mapping the table name to a descriptor id.
The previous speculations about a split causing a create to fail have not been seen yet, although they probably can happen too.
For 1), I think we should do that counter increment non-transactionally. And if we burn some ids that will not actually be used for a descriptor, so be it.
For 2), and in general for various other possible retryable errors that a CREATE TABLE might encounter, Ben had an interesting suggestion - we could automatically retry the first statement sent after a BEGIN, even if it is not sent in the same batch of statements with the BEGIN (currently we only automatically retry the batch with the BEGIN in it). The reasoning here is that the client logic that issues the first query is not conditional on anything read in the transaction.
I'll work on these things. |
1. is related to #13180 |
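The fix proposed for 1) is to bump the descriptor-id counter outside the creating transaction, accepting that a failed creation burns an id. A toy Go model of that idea (the `idAllocator` type and its fields are illustrative, not CockroachDB code):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// idAllocator models a descriptor-id counter bumped non-transactionally:
// the id is reserved immediately, and nothing ever un-reserves it.
type idAllocator struct{ next int64 }

// alloc reserves and returns the next id.
func (a *idAllocator) alloc() int64 { return atomic.AddInt64(&a.next, 1) }

func main() {
	a := &idAllocator{next: 50} // pretend 50 is the last used descriptor id
	id1 := a.alloc()            // txn A reserves 51 ...
	// ... txn A hits a retryable error and rolls back: id 51 is burned.
	id2 := a.alloc() // txn A's retry (or txn B) gets 52
	fmt.Println(id1, id2) // → 51 52
}
```

Because the increment is no longer part of the SQL transaction, it can be retried internally on its own, and it can no longer cause the enclosing CREATE TABLE to fail with a WriteTooOldError; the only cost is gaps in the descriptor-id sequence.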
I can confirm that the SQLAlchemy ORM test does in fact generate all of its … |
SGTM
In this case, the table creation is going to fail no matter what, right? It's just a question of which error is returned? Although I guess it could be a problem for CREATE TABLE IF NOT EXISTS. Also, I took a closer look at what the Clojure test is doing and it does indeed wrap everything in a transaction by default. There doesn't appear to be any concurrency, but the split race could probably explain the retry errors. |
Right.
By Clojure you mean the Jepsen tests, right? |
Before this patch, ids for new table or database descriptors were created in the same transaction as the SQL statement performing the creation. This meant that if they encountered a retryable error, the statement failed. This is a problem for CREATEs performed by ORMs, which like to perform such statements in explicit transactions (so we can't automatically retry the statement). This patch makes the id creation non-transactional, and retryable on errors. Fixes cockroachdb#13180. Touches cockroachdb#16450
Before this patch, in case of retryable errors, the server (i.e. the Executor) would automatically retry a batch of statements if the batch was a prefix of a transaction (or if the batch contained the whole txn). For example, if the following SELECT got an error, it'd be retried if all of the following statements arrived at the server in one batch: BEGIN; ...; SELECT foo; [... COMMIT]. The rationale for retrying these prefixes, but not otherwise, was that, in the case of a prefix batch, we know that the client had no conditional logic based on reads performed in the current txn. This patch extends this reasoning to statements executed in the first batch arriving after the batch with the BEGIN, if the BEGIN had been trailing a previous batch (more realistically, if the BEGIN is sent alone as a batch). As a further optimization, the SAVEPOINT statement doesn't change the retryable character of the next batch. So, if you do something like (different lines are different batches):

BEGIN
SELECT foo;

or

BEGIN; SAVEPOINT cockroach_restart;
SELECT foo

or

BEGIN
SAVEPOINT cockroach_restart
[...;] SELECT foo

the SELECTs will be retried automatically. Besides it being generally a good idea to hide retryable errors more, this change was motivated by ORMs getting retryable errors from a BEGIN; CREATE TABLE ...; COMMIT; sequence (with the BEGIN being a separate batch). This ORM code is not under our control and we can't teach it about user-directed retries. This is implemented by creating a new txnState.State - FirstBatch. Auto-retry is enabled for batches executed in this state. Fixes cockroachdb#16450 Fixes cockroachdb#16200 See also forum discussion about it: https://forum.cockroachlabs.com/t/automatically-retrying-the-first-batch-of-statements-after-a-begin/759
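The FirstBatch mechanism from the commit message can be sketched as a tiny state machine: BEGIN sent alone enters a state where the next batch is still auto-retryable, and a batch containing only prologue statements (SAVEPOINT cockroach_restart) keeps that state. This is a simplified toy model under my own naming, not the Executor's actual code:

```go
package main

import "fmt"

// state is a toy version of txnState.State from the commit message.
type state int

const (
	noTxn      state = iota
	firstBatch // txn open, no real batch executed yet: auto-retry enabled
	open       // later batches: client may have conditional logic, no auto-retry
)

// advance returns the state after executing one batch. A batch containing
// only BEGIN and/or SAVEPOINT cockroach_restart doesn't consume the
// auto-retryable "first batch".
func advance(s state, batch []string) state {
	onlyPrologue := true
	for _, stmt := range batch {
		if stmt != "BEGIN" && stmt != "SAVEPOINT cockroach_restart" {
			onlyPrologue = false
		}
	}
	switch {
	case s == noTxn && batch[0] == "BEGIN" && onlyPrologue:
		return firstBatch
	case s == noTxn:
		return open // BEGIN; SELECT ... in one batch: the old prefix-batch case
	case s == firstBatch && onlyPrologue:
		return firstBatch // SAVEPOINT alone keeps auto-retry enabled
	default:
		return open
	}
}

func main() {
	s := advance(noTxn, []string{"BEGIN"})
	fmt.Println(s == firstBatch) // next batch will be auto-retried
	s = advance(s, []string{"SAVEPOINT cockroach_restart"})
	fmt.Println(s == firstBatch) // still auto-retryable
	fmt.Println(advance(s, []string{"SELECT foo"}) == open)
}
```

The key property is that the first "real" batch after BEGIN executes while auto-retry is still enabled, which is exactly what lets the server absorb a retryable error from an ORM's separately-batched CREATE TABLE.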
Does "first batch" in the closing PR correspond to "first SQL statement" only? If so, neither this nor #16200 should have been closed. The offending |
The "first batch" refers to the first batch of statements after a trailing Anywho, let's hope that the other improvements made recently about making |
Sounds good! |
…LE txns Before this patch, a write-read conflict (a txn attempting to write "under" a previous read) would be handled by pushing the commit timestamp of the writer, but otherwise letting it continue. Even though it will not be allowed to commit, the writer is allowed to continue laying down intents in the hope that they'll keep other conflicting txns away. Since this push is only detected at COMMIT time, that's too late to do automatic retries. Therefore, the desire to let the txn go forward and lay down intents is at odds with the desire to sometimes retry automatically. This commit puts a finger on the scale of this tradeoff and makes it so that we detect pushes after statements executed in the FirstBatch state (so, while we can still retry automatically). When a push is detected, the txn is (auto-)retried. Note that the write will still have laid down at least one intent on the key that caused the push. This change hopes to make it possible for some transactions to all-but-never see retryable errors (e.g. Jepsen's BEGIN; CREATE TABLE; COMMIT). Touches cockroachdb#16450
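The write-read conflict described above can be modeled in a few lines: a write "under" a prior read doesn't fail, it silently pushes the writer's commit timestamp, and the question is merely when anyone checks for the push. This is an illustrative toy under my own names (tsCache, txn), not CockroachDB's MVCC machinery:

```go
package main

import "fmt"

// tsCache records, per key, the highest timestamp at which the key was read.
type tsCache map[string]int64

type txn struct {
	origTs, commitTs int64
}

// write lays down an intent; if the key was read at or above our commit
// timestamp, our commit timestamp gets pushed above the read. The push is
// silent: the txn keeps executing.
func (t *txn) write(c tsCache, key string) {
	if readTs, ok := c[key]; ok && readTs >= t.commitTs {
		t.commitTs = readTs + 1
	}
}

// pushed reports whether the txn can no longer commit at its original
// timestamp. Checking this right after a FirstBatch statement allows a
// server-side auto-retry; checking only at COMMIT forces the retryable
// error out to the client.
func (t *txn) pushed() bool { return t.commitTs > t.origTs }

func main() {
	cache := tsCache{"/Table/0/0": 10} // someone read this key at ts=10
	w := &txn{origTs: 5, commitTs: 5}
	w.write(cache, "/Table/0/0")
	fmt.Println(w.pushed()) // → true: detect now and auto-retry, or fail at COMMIT
}
```

The commit message's tradeoff is visible here: detecting the push eagerly (while still in FirstBatch) enables an auto-retry, at the cost of giving up the intents-as-locks benefit of letting the pushed writer run to COMMIT.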
When a statement is run outside of a transaction (i.e. in an implicit auto-commit transaction), the server is supposed to be responsible for handling any retryable errors (since it has the entire transaction available for the retry). This doesn't appear to be working for CREATE TABLE statements, as seen in #15733. I can't be sure that this is limited to CREATE TABLE. That's the only place where I've been seeing it, but in these tests errors in other statements aren't logged as visibly.

org.postgresql.util.PSQLException: ERROR: restart transaction: HandledRetryableTxnError: TransactionRetryError: retry txn "sql txn" id=f9637e48 key=/Table/0/0 rw=true pri=0.01474274 iso=SERIALIZABLE stat=PENDING epo=0 ts=1494037334.449734196,1 orig=1494037332.150969729,0 max=1494037332.151492593,0 wto=false rop=false seq=6
@andreimatei is it expected that a "handled" retryable error would make it out to the client like this?