sql: bulk UPDATE statement in implicit txn will retry indefinitely due to txn commit deadline errors #69089
Comments
If Shaun watches the intent count climb to 10m, then writing 10m intents sequentially isn't terribly slow. The transaction should be able to refresh, and commit after a refresh, and the refresh shouldn't take longer than it took to write the 10m intents in the first place. Intent resolution only happens after commit, and the commit does not wait for intent resolution. Since the intent count isn't going down, I don't think the slow intent resolution path is to blame. More likely it's the slow refresh path (RefreshRange instead of point refreshes), but again, shouldn't refreshing ~everything be faster than writing ~everything in the first place? Besides, I think Shaun mentioned that the restart count stays at one, so we're not fast-failing the refresh somewhere and retrying the txn. Or maybe we are.
Made a typo in my original comment. I'm updating 100 million, not 10.
Hi @lunevalex, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
I was able to reproduce this and get an understanding of what is going wrong. It turns out that the issue is not related to ranged intent resolution or txn refreshing. We see the intent count climb for 10m until it hits the 100M intents, then we see it quickly refresh. After the successful refresh, we see the transaction hit a transaction retry error and restart at a higher epoch. This repeats indefinitely. So why the txn retry after a successful refresh? Because the transaction is hitting its commit deadline:
This wasn't particularly easy to figure out because the observability was lacking here - see #69671. Regardless, we can explain the rest of the behavior here now that we know that. The […]. There's plenty of history here: #18684 -> #36871 -> #51042, which eventually led to #63725. My (possibly misguided) understanding of that patch is that it removed a prior limitation that txns would run into these kinds of issues after about 5 minutes. However, we are still vulnerable to individual statements that run for more than 5 minutes, especially (only?) if they are run in implicit transactions. If there's more than a 5 minute gap between the last time that […] I'm going to remove the […]
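To make the failure loop concrete, here is a small self-contained Go sketch of the dynamic being described. It is a toy model, not CockroachDB code: the ~5 minute deadline window is taken from the discussion above, while the statement runtime and every name in the sketch (`runOnce`, `deadlineWindow`, `statementRuntime`) are illustrative assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// Toy parameters: the ~5 minute deadline window comes from the discussion
// above; the statement runtime is an arbitrary stand-in for "long enough to
// write ~100M intents".
const (
	deadlineWindow   = 5 * time.Minute
	statementRuntime = 40 * time.Minute
	retriesShown     = 3
)

// runOnce models one epoch of the implicit statement: all the work happens
// first, and only then is the auto-commit checked against a deadline that was
// fixed near the start of the statement.
func runOnce(start time.Time) error {
	deadline := start.Add(deadlineWindow)
	commitTime := start.Add(statementRuntime)
	if commitTime.After(deadline) {
		return fmt.Errorf("retry error: commit time %s is past deadline %s",
			commitTime.Format("15:04"), deadline.Format("15:04"))
	}
	return nil
}

func main() {
	start := time.Now()
	for epoch := 1; epoch <= retriesShown; epoch++ {
		err := runOnce(start)
		if err == nil {
			fmt.Println("committed")
			return
		}
		fmt.Printf("epoch %d: %v\n", epoch, err)
		// The retry restarts the statement from scratch, so the next attempt
		// takes just as long and faces an equally tight deadline.
		start = start.Add(statementRuntime)
	}
	fmt.Println("... and so on, indefinitely")
}
```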
Oof. When we have an implicit transaction that runs for this long, is there a way to disable the commit-in-batch behavior and have it bubble back up to the connExecutor and go through the usual commit phase? When I first wrote #51042 I was thinking that the sql layer would inject a callback into the txn which the kv layer could use to set its deadline. We avoided doing that, if I remember correctly, because of questions of how often such a callback should be called. Maybe there were other reasons. I think a callback API for the deadline on the Txn could work, but it'd be an invasive change at this point. Are there other execution points where it would make sense to add calls to update the deadline?
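For a rough sense of what such a callback-style deadline could look like, here is a hypothetical sketch. It is not the actual kv.Txn API: the names (`SetDeadlineProvider`, `Commit`) and the use of `time.Time` in place of HLC timestamps are simplifications made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Txn is a stand-in for a kv-level transaction. Instead of holding a static
// deadline that the SQL layer must keep pushing forward, it holds a provider
// callback and asks for a fresh deadline at the moment the deadline matters.
type Txn struct {
	deadlineProvider func() time.Time
}

// SetDeadlineProvider installs the late-bound deadline source. The open
// question raised above is how often, and at which execution points, such a
// callback should be consulted.
func (t *Txn) SetDeadlineProvider(f func() time.Time) { t.deadlineProvider = f }

// Commit evaluates the deadline lazily, right before committing, so even a
// statement that ran for a long time is judged against a fresh deadline.
func (t *Txn) Commit(commitTS time.Time) error {
	if t.deadlineProvider != nil {
		if deadline := t.deadlineProvider(); commitTS.After(deadline) {
			return errors.New("retry error: commit deadline exceeded")
		}
	}
	return nil
}

func main() {
	txn := &Txn{}
	// The SQL layer would plug in something lease-backed here; the sketch just
	// grants five minutes from "now" every time it is asked.
	txn.SetDeadlineProvider(func() time.Time { return time.Now().Add(5 * time.Minute) })
	fmt.Println(txn.Commit(time.Now())) // <nil>: the deadline was fetched at commit time
}
```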
Let's contemplate this change:

```diff
diff --git a/pkg/kv/txn.go b/pkg/kv/txn.go
index 7060d6c74d..a682b22929 100644
--- a/pkg/kv/txn.go
+++ b/pkg/kv/txn.go
@@ -750,6 +750,18 @@ func (txn *Txn) UpdateDeadline(ctx context.Context, deadline hlc.Timestamp) erro
return nil
}
+// DeadlineMightBeExpired returns true if there currently is a deadline and
+// that deadline is earlier than either the ProvisionalCommitTimestamp or
+// the current timestamp. This can be used as a hint that we do not want to
+// auto-commit the transaction in a batch with writes.
+func (txn *Txn) DeadlineMightBeExpired() bool {
+ txn.mu.Lock()
+ defer txn.mu.Unlock()
+ return !txn.mu.deadline.IsEmpty() &&
+ (txn.mu.deadline.Less(txn.ProvisionalCommitTimestamp()) ||
+ txn.mu.deadline.Less(txn.DB().Clock().Now()))
+}
+
// resetDeadlineLocked resets the deadline.
func (txn *Txn) resetDeadlineLocked() {
txn.mu.deadline = nil
diff --git a/pkg/sql/tablewriter.go b/pkg/sql/tablewriter.go
index 1afdb39c7c..3c9ad32599 100644
--- a/pkg/sql/tablewriter.go
+++ b/pkg/sql/tablewriter.go
@@ -210,11 +210,18 @@ func (tb *tableWriterBase) finalize(ctx context.Context) (err error) {
// NB: unlike flushAndStartNewBatch, we don't bother with admission control
// for response processing when finalizing.
tb.rowsWritten += int64(tb.currentBatchSize)
- if tb.autoCommit == autoCommitEnabled && (tb.rowsWrittenLimit == 0 || tb.rowsWritten < tb.rowsWrittenLimit) {
+
+ if tb.autoCommit == autoCommitEnabled &&
// We can only auto commit if the rows written guardrail is disabled or
// we haven't reached the specified limit (the optimizer is responsible
// for making sure that there is exactly one mutation before enabling
// the auto commit).
+ (tb.rowsWrittenLimit == 0 || tb.rowsWritten < tb.rowsWrittenLimit) &&
+ // Also, we don't want to try to commit here if the deadline is expired.
+ // If we bubble back up to SQL then maybe we can get a fresh deadline
+ // before committing.
+ !tb.txn.DeadlineMightBeExpired() {
+
log.Event(ctx, "autocommit enabled")
// An auto-txn can commit the transaction with the batch. This is an
// optimization to avoid an extra round-trip to the transaction
```
Do you have a reference to the discussion of "how often such a callback should be called"? If we replaced the static […]? The change would be invasive at first, but it would lead to the kv client controlling the late-bound behavior, as opposed to the caller. That seems like a nice property.
This change also seems fine to me, though it is more limited in scope.
I'm inclined to do this more limited thing right now. Given that the client has control over auto-commit, and it's auto-commit that is hurting us here, I'm pleased with this sort of approach. Honestly, this sort of change feels backportable to 21.1. What do you think?
This sounds good to me. And I heard that @fqazi was going to confirm that this resolves the original issue here. Thanks!
I don't. What came to mind was that the invariant checking got weird. It's nice to know when you set the deadline that you aren't setting it to something absurd.
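As an aside, here is a minimal sketch of the kind of invariant check being alluded to. It is not the real deadline-setting code; `updateDeadline` and the `time.Time` arguments are stand-ins chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// updateDeadline is a toy stand-in for an eager deadline setter. Because the
// deadline arrives as a concrete value, the setter can sanity-check it on the
// spot, e.g. reject a deadline that is already behind the transaction's read
// timestamp. With a late-bound callback there is no single obvious place to
// run this kind of check.
func updateDeadline(readTS, deadline time.Time) error {
	if deadline.Before(readTS) {
		return fmt.Errorf("refusing deadline %s below read timestamp %s",
			deadline.Format(time.RFC3339), readTS.Format(time.RFC3339))
	}
	return nil
}

func main() {
	now := time.Now()
	fmt.Println(updateDeadline(now, now.Add(time.Minute)))  // <nil>
	fmt.Println(updateDeadline(now, now.Add(-time.Minute))) // refused: deadline in the past
}
```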
This change is a special case of what we fixed in #63725, which is new in 21.2. In 21.1 and prior, we would bind a transaction to a deadline as soon as it started, and that deadline never moved further into the future. That meant that any transaction which was pushed more than 4-5 minutes was doomed to fail; in practice, that meant that any writing transaction which was pushed that much was doomed to fail. In #63725 we changed the deadline behavior such that we refresh the deadline on each statement (including commit), so that explicit transactions which last arbitrarily long are fine. The issue here is that implicit, single-statement transactions do not have an opportunity to get a new deadline before committing. So a single-statement UPDATE or DELETE which lasts too long will fail. Those are particularly painful for users because single-statement ops like that are things we can (and do) retry transparently. This commit fixes that by noticing that we're probably doomed if we were to try to auto-commit in the 1PC optimization path […]
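Putting the pieces of that description together, here is a simplified model of the control flow the fix aims for: skip the commit-with-batch fast path when the deadline may already have expired, and let the outer commit path, which refreshes the deadline per statement, do the commit. The names and types below are made up for illustration; only the overall shape mirrors the diff discussed earlier.

```go
package main

import (
	"fmt"
	"time"
)

// txnState is a toy model of the state the fix consults: a deadline and the
// current time (standing in for the provisional commit timestamp).
type txnState struct {
	deadline time.Time
	now      time.Time
}

func (t *txnState) deadlineMightBeExpired() bool { return t.deadline.Before(t.now) }

// finalize models the decision in the table writer: only piggyback the commit
// on the final write batch (the 1PC fast path) when the deadline still looks
// viable; otherwise return to the SQL layer, which refreshes the deadline on
// the commit statement (per #63725) and then commits.
func finalize(t *txnState, autoCommit bool) string {
	if autoCommit && !t.deadlineMightBeExpired() {
		return "commit in batch (1PC fast path)"
	}
	t.deadline = t.now.Add(5 * time.Minute) // stand-in for a refreshed, lease-backed deadline
	return "bubble up to connExecutor, refresh deadline, then commit"
}

func main() {
	stale := &txnState{deadline: time.Now().Add(-time.Minute), now: time.Now()}
	fmt.Println(finalize(stale, true)) // skips the doomed 1PC attempt instead of retrying forever
}
```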
Describe the problem
Attempting to run an UPDATE which touches all 100 million rows of a table continuously fails ('retries' number on the DBConsole slowly increases). There are no other queries running on the cluster.
To Reproduce
New cluster with the following SQL used to populate 100,000,000 rows:
Once I give the current intents from the INSERTS some time to clear and settle down, I then run this:
I then watch the intents climb with this query:
The intents climb to 100,000,000 as expected; at that point I would expect the write to 'complete' and the intents to start dropping. However, what happens instead is that it reaches 100,000,000 and then hangs. In the DBConsole, I see the query, and the 'retries' value goes to 1. There are no other queries running (unless there are internal ones?), so I wouldn't expect this to fail. Of course, I realise this is bad practise for bulk updates, and I would expect it to take a long time, but I would also expect it to complete.
Environment: