
sql: bulk UPDATE statement in implicit txn will retry indefinitely due to txn commit deadline errors #69089

Closed
smcvey opened this issue Aug 18, 2021 · 14 comments · Fixed by #69936
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@smcvey
Contributor

smcvey commented Aug 18, 2021

Describe the problem

Attempting to run an UPDATE which touches all 100 million rows of a table continuously fails (the 'retries' count in the DBConsole slowly increases). There are no other queries running on the cluster.

To Reproduce

New cluster with the following SQL used to populate 100,000,000 rows:

create database shaun;
use shaun;

create table a (
        a int primary key default unique_rowid(),
        b varchar,
        c uuid default gen_random_uuid()
);

insert into a select a from generate_series(1, 1000000) g(a);
insert into a select a from generate_series(1000001, 2000000) g(a);
insert into a select a from generate_series(2000001, 3000000) g(a);
insert into a select a from generate_series(3000001, 4000000) g(a);
insert into a select a from generate_series(4000001, 5000000) g(a);
insert into a select a from generate_series(5000001, 6000000) g(a);
insert into a select a from generate_series(6000001, 7000000) g(a);
insert into a select a from generate_series(7000001, 8000000) g(a);
insert into a select a from generate_series(8000001, 9000000) g(a);
insert into a select a from generate_series(9000001, 10000000) g(a);

insert into a select a from generate_series(10000001, 11000000) g(a);
insert into a select a from generate_series(11000001, 12000000) g(a);
insert into a select a from generate_series(12000001, 13000000) g(a);
insert into a select a from generate_series(13000001, 14000000) g(a);
insert into a select a from generate_series(14000001, 15000000) g(a);
insert into a select a from generate_series(15000001, 16000000) g(a);
insert into a select a from generate_series(16000001, 17000000) g(a);
insert into a select a from generate_series(17000001, 18000000) g(a);
insert into a select a from generate_series(18000001, 19000000) g(a);
insert into a select a from generate_series(19000001, 20000000) g(a);

insert into a select a from generate_series(20000001, 21000000) g(a);
insert into a select a from generate_series(21000001, 22000000) g(a);
insert into a select a from generate_series(22000001, 23000000) g(a);
insert into a select a from generate_series(23000001, 24000000) g(a);
insert into a select a from generate_series(24000001, 25000000) g(a);
insert into a select a from generate_series(25000001, 26000000) g(a);
insert into a select a from generate_series(26000001, 27000000) g(a);
insert into a select a from generate_series(27000001, 28000000) g(a);
insert into a select a from generate_series(28000001, 29000000) g(a);
insert into a select a from generate_series(29000001, 30000000) g(a);

insert into a select a from generate_series(30000001, 31000000) g(a);
insert into a select a from generate_series(31000001, 32000000) g(a);
insert into a select a from generate_series(32000001, 33000000) g(a);
insert into a select a from generate_series(33000001, 34000000) g(a);
insert into a select a from generate_series(34000001, 35000000) g(a);
insert into a select a from generate_series(35000001, 36000000) g(a);
insert into a select a from generate_series(36000001, 37000000) g(a);
insert into a select a from generate_series(37000001, 38000000) g(a);
insert into a select a from generate_series(38000001, 39000000) g(a);
insert into a select a from generate_series(39000001, 40000000) g(a);

insert into a select a from generate_series(40000001, 41000000) g(a);
insert into a select a from generate_series(41000001, 42000000) g(a);
insert into a select a from generate_series(42000001, 43000000) g(a);
insert into a select a from generate_series(43000001, 44000000) g(a);
insert into a select a from generate_series(44000001, 45000000) g(a);
insert into a select a from generate_series(45000001, 46000000) g(a);
insert into a select a from generate_series(46000001, 47000000) g(a);
insert into a select a from generate_series(47000001, 48000000) g(a);
insert into a select a from generate_series(48000001, 49000000) g(a);
insert into a select a from generate_series(49000001, 50000000) g(a);

insert into a select a from generate_series(50000001, 51000000) g(a);
insert into a select a from generate_series(51000001, 52000000) g(a);
insert into a select a from generate_series(52000001, 53000000) g(a);
insert into a select a from generate_series(53000001, 54000000) g(a);
insert into a select a from generate_series(54000001, 55000000) g(a);
insert into a select a from generate_series(55000001, 56000000) g(a);
insert into a select a from generate_series(56000001, 57000000) g(a);
insert into a select a from generate_series(57000001, 58000000) g(a);
insert into a select a from generate_series(58000001, 59000000) g(a);
insert into a select a from generate_series(59000001, 60000000) g(a);

insert into a select a from generate_series(60000001, 61000000) g(a);
insert into a select a from generate_series(61000001, 62000000) g(a);
insert into a select a from generate_series(62000001, 63000000) g(a);
insert into a select a from generate_series(63000001, 64000000) g(a);
insert into a select a from generate_series(64000001, 65000000) g(a);
insert into a select a from generate_series(65000001, 66000000) g(a);
insert into a select a from generate_series(66000001, 67000000) g(a);
insert into a select a from generate_series(67000001, 68000000) g(a);
insert into a select a from generate_series(68000001, 69000000) g(a);
insert into a select a from generate_series(69000001, 70000000) g(a);

insert into a select a from generate_series(70000001, 71000000) g(a);
insert into a select a from generate_series(71000001, 72000000) g(a);
insert into a select a from generate_series(72000001, 73000000) g(a);
insert into a select a from generate_series(73000001, 74000000) g(a);
insert into a select a from generate_series(74000001, 75000000) g(a);
insert into a select a from generate_series(75000001, 76000000) g(a);
insert into a select a from generate_series(76000001, 77000000) g(a);
insert into a select a from generate_series(77000001, 78000000) g(a);
insert into a select a from generate_series(78000001, 79000000) g(a);
insert into a select a from generate_series(79000001, 80000000) g(a);

insert into a select a from generate_series(80000001, 81000000) g(a);
insert into a select a from generate_series(81000001, 82000000) g(a);
insert into a select a from generate_series(82000001, 83000000) g(a);
insert into a select a from generate_series(83000001, 84000000) g(a);
insert into a select a from generate_series(84000001, 85000000) g(a);
insert into a select a from generate_series(85000001, 86000000) g(a);
insert into a select a from generate_series(86000001, 87000000) g(a);
insert into a select a from generate_series(87000001, 88000000) g(a);
insert into a select a from generate_series(88000001, 89000000) g(a);
insert into a select a from generate_series(89000001, 90000000) g(a);

insert into a select a from generate_series(90000001, 91000000) g(a);
insert into a select a from generate_series(91000001, 92000000) g(a);
insert into a select a from generate_series(92000001, 93000000) g(a);
insert into a select a from generate_series(93000001, 94000000) g(a);
insert into a select a from generate_series(94000001, 95000000) g(a);
insert into a select a from generate_series(95000001, 96000000) g(a);
insert into a select a from generate_series(96000001, 97000000) g(a);
insert into a select a from generate_series(97000001, 98000000) g(a);
insert into a select a from generate_series(98000001, 99000000) g(a);
insert into a select a from generate_series(99000001, 100000000) g(a);

Once I give the intents from the INSERTs some time to clear and settle down, I run this:

UPDATE a SET b = 'Hello world' WHERE b IS NULL;

I then watch the intents climb with this query:

select sum((crdb_internal.range_stats(start_key)->'intent_count')::int) as intent_count
from crdb_internal.ranges_no_leases as r
join crdb_internal.tables as t on r.table_id = t.table_id;

The intents climb to 100,000,000 as expected, at which point I would expect the write to 'complete' and the intents to start dropping. Instead, it reaches 100,000,000 and then hangs. In the DBConsole, I see the query and its 'retries' value goes to 1. There are no other queries running (unless there are internal ones?), so I wouldn't expect this to fail. Of course, I realise this is bad practice for bulk operations, and I would expect it to take a long time, but I would also expect it to complete.

Environment:

  • CockroachDB version 21.1.7
  • Server OS: Linux
  • Client app: cockroach sql
@smcvey smcvey added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Aug 18, 2021
@jordanlewis
Member

I think this is very slow because of the slow intent resolution path that gets triggered when a write exceeds the kv.transaction.max_intent_bytes limit. #54029

I don't think this is a bug per se, more of a known limitation or design constraint.

@tbg what would you say?

@blathers-crl blathers-crl bot added the T-kv KV Team label Aug 18, 2021
@tbg
Member

tbg commented Aug 19, 2021

If Shaun watches the intent count climb to 10m, then writing 10m intents sequentially isn't terribly slow. The transaction should be able to refresh and commit after a refresh, and the refresh shouldn't take longer than it took to write the 10m intents in the first place. Intent resolution only happens after commit, and the commit does not wait for intent resolution. Since the intent count isn't going down, I don't think the slow intent resolution path is to blame. It's probably the slow refresh path (RefreshRange instead of point refreshes), but again, shouldn't refreshing ~everything be faster than writing ~everything in the first place? Besides, I think Shaun mentioned that the restart count stays at one, so we're not fast-failing the refresh somewhere and retrying the txn. Or maybe we are.

@smcvey
Contributor Author

smcvey commented Aug 19, 2021

Made a typo in my original comment. I'm updating 100 million, not 10.

@blathers-crl

blathers-crl bot commented Aug 25, 2021

Hi @lunevalex, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is otan.

@nvanbenschoten
Member

I was able to reproduce this and get an understanding of what is going wrong. It turns out that the issue is not related to ranged intent resolution or txn refreshing. We see the intent count climb for 10m until it hits 100M intents, then we see it quickly refresh. After the successful refresh, the transaction hits a transaction retry error and restarts at a higher epoch. This repeats indefinitely.

So why the txn retry after a successful refresh? Because the transaction is hitting its commit deadline:

TransactionRetryError: retry txn (RETRY_COMMIT_DEADLINE_EXCEEDED - txn timestamp pushed too much;
deadline exceeded by 16m55.677514067s (1630435901.626618029,2 > 1630434885.949103962,0)):
"sql txn" meta={id=5862a7e4 key=/Table/53/1/1/0 pri=0.00210661 epo=0 ts=1630435901.626618029,2
min=1630434588.799261732,0 seq=100000001} lock=true stat=PENDING rts=1630435901.626618029,2
wto=false gul=1630434589.299261732,0

This wasn't particularly easy to figure out because the observability was lacking here - see #69671.

Regardless, we can explain the rest of the behavior now that we know that. The UPDATE a SET b = 'Hello world' WHERE b IS NULL statement runs in a single implicit transaction that updates all 100M rows. This single statement takes minutes to run (at least 17 minutes, if the error message is to be believed) and then attempts to commit. Sometime during this 17+ minute writing phase, its timestamp gets bumped by the closed timestamp. That in itself is fine, as it refreshes without issue. However, on commit, it notices that the deadline it was given before starting has been exceeded.

There's plenty of history here. #18684 -> #36871 -> #51042, which eventually led to #63725. My (possibly misguided) understanding of that patch is that it removed a prior limitation that txns would run into these kinds of issues after about 5 minutes. However, we are still vulnerable to individual statements that run for more than 5 minutes, especially (only?) if they are run in implicit transactions. If there's more than a 5 minute gap between the last time that descCollection.MaybeUpdateDeadline is called and the time that the txn attempts to commit, it will hit this.

I'm going to remove the GA-blocker label from this issue because it is not new in v21.2. I'm also going to re-assign to SQL Schema, as the solution to this will be in that area.

@nvanbenschoten nvanbenschoten removed their assignment Aug 31, 2021
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Aug 31, 2021
@nvanbenschoten nvanbenschoten added T-sql-schema-deprecated Use T-sql-foundations instead and removed GA-blocker T-kv KV Team T-sql-queries SQL Queries Team labels Aug 31, 2021
@nvanbenschoten nvanbenschoten changed the title Lone UPDATE query on table requires retries sql: bulk UPDATE statement in implicit txn will retry indefinitely due to txn commit deadline errors Aug 31, 2021
@nvanbenschoten
Member

cc. @fqazi @ajwerner

@fqazi fqazi self-assigned this Aug 31, 2021
@ajwerner
Contributor

ajwerner commented Sep 1, 2021

Oof. When we have an implicit transaction that runs for this long, is there a way to disable the commit-in-batch behavior and have it bubble back up to the connExecutor and go through the usual commit phase?

When I first wrote #51042 I was thinking that the sql layer would inject a callback into the txn which the kv layer could use to set its deadline. We avoided doing that, if I remember correctly, because of questions of how often such a callback should be called. Maybe there were other reasons. I think a callback API for the deadline on the Txn could work but it'd be an invasive change at this point.

Are there other execution points where it would make sense to add calls to update the deadline?

@ajwerner
Contributor

ajwerner commented Sep 1, 2021

Let's contemplate this change:

diff --git a/pkg/kv/txn.go b/pkg/kv/txn.go
index 7060d6c74d..a682b22929 100644
--- a/pkg/kv/txn.go
+++ b/pkg/kv/txn.go
@@ -750,6 +750,18 @@ func (txn *Txn) UpdateDeadline(ctx context.Context, deadline hlc.Timestamp) erro
 	return nil
 }
 
+// DeadlineMightBeExpired returns true if there currently is a deadline and
+// that deadline is earlier than either the ProvisionalCommitTimestamp or
+// the current timestamp. This can be used as a hint that we do not want to
+// auto-commit the transaction in a batch with writes.
+func (txn *Txn) DeadlineMightBeExpired() bool {
+	txn.mu.Lock()
+	defer txn.mu.Unlock()
+	return !txn.mu.deadline.IsEmpty() &&
+		(txn.mu.deadline.Less(txn.ProvisionalCommitTimestamp()) ||
+			txn.mu.deadline.Less(txn.DB().Clock().Now()))
+}
+
 // resetDeadlineLocked resets the deadline.
 func (txn *Txn) resetDeadlineLocked() {
 	txn.mu.deadline = nil
diff --git a/pkg/sql/tablewriter.go b/pkg/sql/tablewriter.go
index 1afdb39c7c..3c9ad32599 100644
--- a/pkg/sql/tablewriter.go
+++ b/pkg/sql/tablewriter.go
@@ -210,11 +210,18 @@ func (tb *tableWriterBase) finalize(ctx context.Context) (err error) {
 	// NB: unlike flushAndStartNewBatch, we don't bother with admission control
 	// for response processing when finalizing.
 	tb.rowsWritten += int64(tb.currentBatchSize)
-	if tb.autoCommit == autoCommitEnabled && (tb.rowsWrittenLimit == 0 || tb.rowsWritten < tb.rowsWrittenLimit) {
+
+	if tb.autoCommit == autoCommitEnabled &&
 		// We can only auto commit if the rows written guardrail is disabled or
 		// we haven't reached the specified limit (the optimizer is responsible
 		// for making sure that there is exactly one mutation before enabling
 		// the auto commit).
+		(tb.rowsWrittenLimit == 0 || tb.rowsWritten < tb.rowsWrittenLimit) &&
+		// Also, we don't want to try to commit here if the deadline is expired.
+		// If we bubble back up to SQL then maybe we can get a fresh deadline
+		// before committing.
+		!tb.txn.DeadlineMightBeExpired() {
+
 		log.Event(ctx, "autocommit enabled")
 		// An auto-txn can commit the transaction with the batch. This is an
 		// optimization to avoid an extra round-trip to the transaction

@nvanbenschoten
Member

When I first wrote #51042 I was thinking that the sql layer would inject a callback into the txn which the kv layer could use to set its deadline. We avoided doing that, if I remember correctly, because of questions of how often such a callback should be called. Maybe there were other reasons. I think a callback API for the deadline on the Txn could work but it'd be an invasive change at this point.

Do you have a reference to the discussion of "how often such a callback should be called"? If we replaced the static txn.mu.deadline with a callback, there only appear to be a limited number of places where the callback would be called. So this seems like a reasonably clean solution to me.

The change would be invasive at first, but it would lead to the kv client controlling the late-bound behavior, as opposed to the caller. That seems like a nice property.

Let's contemplate this change:

This change also seems fine to me, though it is more limited in scope.

@ajwerner
Contributor

ajwerner commented Sep 2, 2021

This change also seems fine to me, though it is more limited in scope.

I'm inclined to do the more limited thing right now. Given that the client controls auto-commit, and it's auto-commit that is hurting us here, I'm happy with this sort of approach. Honestly, this sort of change feels backportable to 21.1. What do you think?

@nvanbenschoten
Member

Honestly, this sort of change feels backportable to 21.1. What do you think?

This sounds good to me.

And I heard that @fqazi was going to confirm that this resolves the original issue here. Thanks!

@ajwerner
Contributor

Do you have a reference to the discussion of "how often such a callback should be called"? If we replaced the static txn.mu.deadline with a callback, there only appear to be a limited number of places where the callback would be called. So this seems like a reasonably clean solution to me.

I don't. What came to mind was that the invariant checking got weird. It's nice to know when you set the deadline that you aren't setting it to something absurd.

@vy-ton
Contributor

vy-ton commented Sep 14, 2021

And I heard that @fqazi was going to confirm that this resolves the original issue here.

@fqazi @ajwerner Have we confirmed this? Do we know if this will be backported to 21.1?

@ajwerner
Contributor

And I heard that @fqazi was going to confirm that this resolves the original issue here.

@fqazi @ajwerner Have we confirmed this? Do we know if this will be backported to 21.1?

This change is a special case of what we fixed in #63725, which is new in 21.2. In 21.1 and prior, we would bind a transaction to a deadline as soon as it started, and that deadline never moved further into the future. That meant any transaction which was pushed more than 4-5 minutes was doomed to fail; in practice, any writing transaction pushed more than that much was doomed.

In #63725 we changed the deadline behavior so that we refresh the deadline on each statement (including commit), which means explicit transactions that last arbitrarily long are fine. The issue here is that implicit, single-statement transactions would not get an opportunity to pick up a new deadline before committing. So a single-statement UPDATE or DELETE which lasted too long would fail. Those are particularly painful for users because single-statement ops like that are things we can (and do) retry transparently.

This commit fixes that by noticing that we're probably doomed if we were to try to auto-commit in the 1PC optimization path (CommitInBatch, as it is known in the code) and bubbling up a layer back to SQL. Then, it also makes sure that when we do bubble up and hit the auto-commit path, we refresh the deadline.

@craig craig bot closed this as completed in 879ef81 Sep 28, 2021
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023