sql: properly synchronize the internal executor iterator #62581

Merged
merged 3 commits on Apr 1, 2021

Conversation

yuzefovich
Member

@yuzefovich yuzefovich commented Mar 25, 2021

sql: fix an edge case in the internal executor

The SetColumns contract allows for the argument to be nil, yet the
iterator of the streaming internal executor expects that the column
schema, if set, is non-nil. I don't think this could happen in practice,
but previously, in theory, we could encounter an assertion error because
none of the four fields of the ieIteratorResult object was non-nil; this
is now fixed.

Release note: None

sql: add context to IncrementRowsAffected interface

Release note: None

sql: properly synchronize the internal executor iterator

We recently merged an update to the internal executor to make it
streaming. Currently it is implemented by having two goroutines (the
iterator, "the consumer", and the connExecutor, "the producer"). The
communication between them is done over a buffered channel. As a result,
both goroutines can run concurrently.

However, a crucial point was overlooked - our kv.Txns cannot be used
concurrently. Imagine a plan that reads from two tables, each of which is
populated via the internal executor: if we read from the first and then
from the second concurrently, we get concurrent usage of the txn for that
plan.

This commit fixes the problem by carving out a new abstraction to
optionally synchronize the execution of the internal executor with its
corresponding iterator. The abstraction comes in sync and async flavors.
In the sync form, the ieResultChannel ensures that the reader and writer
never operate concurrently and additionally provides a mechanism whereby
the reader may ensure that the writer observes an error when its attempt
to publish the previous row returns. This last point is critical to
ensure that transactions are not erroneously used after doing so has
become unsafe.

The async flavor is still used by the internal executor when it doesn't
return an iterator directly and executes the query to completion itself.

Fixes: #62415.

Release note: None (no stable release with this bug).

@yuzefovich yuzefovich added the do-not-merge label (bors won't merge a PR with this label) Mar 25, 2021
@yuzefovich yuzefovich requested a review from ajwerner March 25, 2021 04:08
@cockroach-teamcity
Member

This change is Reviewable

@yuzefovich
Member Author

yuzefovich commented Mar 25, 2021

Andrew, this solution almost works, but it doesn't play well with the vectorized engine, where we use context cancellation to shut everything down. For one of the queries mentioned in the issue we have the following plan:

root@localhost:26257/defaultdb> explain SHOW JOBS WHEN COMPLETE SELECT job_id FROM [SHOW JOBS] WHERE job_type = 'SCHEMA CHANGE GC' AND  description like '%idx1%' ;
                                                                              info
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • render
  │
  └── • hash join (semi)
      │ equality: (job_id) = (job_id)
      │
      ├── • filter
      │   │ filter: CASE finished IS NULL WHEN true THEN CASE pg_sleep(1.0) WHEN true THEN crdb_internal.force_retry('24:00:00') ELSE 0 END ELSE 0 END = 0
      │   │
      │   └── • virtual table
      │         table: jobs@primary
      │
      └── • render
          │
          └── • filter
              │ filter: (((finished IS NULL) OR (finished > '2021-03-24 16:17:27.28017')) AND (job_type = 'SCHEMA CHANGE GC')) AND (description LIKE '%idx1%')
              │
              └── • virtual table
                    table: jobs@primary

When performing the hash join on an empty cluster, the right side will be empty, and as a result we will short-circuit the execution of the whole query without reading anything from the left side of the join. The whole query then shuts down, which cancels the context. However, the left side has a connExecutor goroutine running the virtual table, and it will still proceed to fully execute the read, but that read will encounter a context cancellation error at the distSender level. That puts the whole txn into a non-retriable state, and the whole query errors out.

I'm not quite sure what to do here.

  • removing the optimization in the hash joiner might not be sufficient (we could well have similar edge cases elsewhere)
  • refactoring the vectorized engine to use some other method of shutting everything down would be too invasive at this point
  • ideally, we wouldn't have to fully drain the data channel in rowsIterator.Close, and we'd have a way to signal the connExecutor goroutine to stop producing more rows (see the sketch below). I actually have a TODO for this and was looking into addressing it a few weeks ago, but I got distracted. I guess I gotta get back to it.
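
A hypothetical sketch of that last idea - giving the producer a done
channel so that rowsIterator.Close can preempt it instead of draining
(names are illustrative, not actual CockroachDB code; ieIteratorResult is
the placeholder struct from the sketch above):

	func produceRows(
		doneCh <-chan struct{},
		dataCh chan<- ieIteratorResult,
		rows []ieIteratorResult,
	) {
		defer close(dataCh)
		for _, r := range rows {
			select {
			case dataCh <- r:
			case <-doneCh:
				// The iterator was closed; stop producing instead of
				// executing the query to completion.
				return
			}
		}
	}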

I'm curious whether you have some insights here.

@ajwerner
Contributor

Huh, if the right side is empty then the query should just return no rows, right? I don't think I've fully digested what you're saying. When does the context plumbed into the virtual table get canceled?

@yuzefovich
Member Author

Yes, the query will return no rows, but the way cleanup of the virtual table on the left currently works, the left scan will still proceed to the very end because we don't have a way to stop the connExecutor from executing the query to completion. I originally removed some parts of the plan, which probably made it confusing, so I put them back in. The table itself is not empty; everything is filtered out.

I'm currently trying to plumb the shutdown of query execution of the left side when the iterator for it is closed. We'll see whether the hack at least works.

@yuzefovich
Member Author

Sorry, didn't answer your question - the context is canceled right before returning from the main query. The main query is ready to return no rows, but during the shutdown of the vectorized infrastructure we cancel the context of the vectorized flow; then, after the vectorized infra is cleaned up, we call cleanup on virtualTableNode, where we call rowsIterator.Close, which fully drains the data channel while the corresponding connExecutor goroutine executes the left side to completion. This is where the txn gets into the non-retriable error state.

@yuzefovich
Member Author

Just added another commit as a prototype, but it didn't work :/ Possibly I screwed something up though.

@yuzefovich
Member Author

Hm, possibly my understanding of the situation is wrong too because the error (aborted during DistSender.Send: context canceled) occurs when Starting the table reader (presumably from the left side). I would expect it to happen before the corresponding rowsIterator is returned.

Oh, while typing it out I realized that I might see the problem - SetColumns is called before the query execution starts (I think).

@yuzefovich
Member Author

Yeah, that was it, apologies for panicking too early. At least the couple of queries in the issue now work. I'll polish up the PR tomorrow for review.

However, I'm still concerned about the scenario I typed up above and the way the vectorized flows are shut down. Possibly we'll never actually run into such a case, but I don't know all the details of how context cancellation might affect things. One example that comes to mind is the changefeeds (some details here).

Contributor

@ajwerner ajwerner left a comment

I'm not concerned about changefeeds and the internal executor. We could add some defensive checks to not execute sinkless changefeeds on an internal executor but I doubt it will happen.

This wasn't too bad

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/conn_executor_internal_test.go, line 305 at r2 (raw file):

		},
		dataCh: dataCh,
		waitCh: waitCh,

just pass a closed waitCh?
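
(For context: receives from a closed channel return immediately with the
zero value, so handing the producer a pre-closed waitCh turns every "block
on waitCh" into a no-op - handy in a test that doesn't care about
synchronization. A tiny illustration:)

	func example() {
		waitCh := make(chan struct{})
		close(waitCh)
		<-waitCh // returns immediately; never blocks
	}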


pkg/sql/internal.go, line 294 at r2 (raw file):

	}()

	// TODO(yuzefovich): de-duplicate this with the code below.

I believe in your ability to make a closure


pkg/sql/internal.go, line 811 at r2 (raw file):

	// the first object is sent on the data channel.
	first, ok := <-dataCh
	if ok && first.cols != nil {

can you explain this part to me?

Member Author

@yuzefovich yuzefovich left a comment

I mentioned changefeeds as an example of how the way we shut down the vectorized flows (by context cancellation) can interfere with other features. (I didn't mean that the work we're doing here will affect the changefeeds.) In general, I'm still concerned about the correctness.

Also, the CI seems to be unhappy; in particular, TestShowJobWhenComplete/show_jobs currently fails easily (often, but not every time). I wonder whether you have some ideas about what we're missing, because I don't see what we're doing wrong :/

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/sql/conn_executor_internal_test.go, line 305 at r2 (raw file):

Previously, ajwerner wrote…

just pass a closed waitCh?

Good point, done.


pkg/sql/internal.go, line 294 at r2 (raw file):

Previously, ajwerner wrote…

I believe in your ability to make a closure

Thanks! I wanted to see whether CI was happy :) it wasn't :(


pkg/sql/internal.go, line 811 at r2 (raw file):

Previously, ajwerner wrote…

can you explain this part to me?

This was the last piece that I added to make a couple of queries from the issue happy.

I just added a more explicit comment about what's going on here. Let me know if more info is worth it.

@yuzefovich yuzefovich force-pushed the ie branch 2 times, most recently from 8128ee2 to 66d9f8d on March 26, 2021 01:16
Member Author

@yuzefovich yuzefovich left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/sql/internal.go, line 733 at r4 (raw file):

				// skipped.
				dataCh <- ieIteratorResult{err: res.Err()}
				<-waitCh

Note that some of these waits are not necessary, but I think it might be a good idea to follow the same pattern (of "send on dataCh, block on waitCh") everywhere.
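
Concretely, the uniform pattern looks roughly like this (placeholder types
as in the earlier sketch; illustrative, not the actual code):

	func produce(
		dataCh chan<- ieIteratorResult,
		waitCh <-chan struct{},
		rows [][]interface{},
		finalErr error,
	) {
		for _, row := range rows {
			dataCh <- ieIteratorResult{row: row}
			<-waitCh // park until the consumer releases us
		}
		if finalErr != nil {
			// Uniform pattern: block on waitCh even here, although no
			// further txn use follows this send.
			dataCh <- ieIteratorResult{err: finalErr}
			<-waitCh
		}
	}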

Contributor

@ajwerner ajwerner left a comment

Also, the CI seems to be unhappy, and in particular TestShowJobWhenComplete/show_jobs

This seems better now, no? Is this RFAL?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

Member Author

@yuzefovich yuzefovich left a comment

I think the failures are non-deterministic, but last time I looked TestShowJobWhenComplete/show_jobs (among other things) was failing pretty reliably. I don't see where the problem might be, so any suggestions are welcome. I'll look into debugging this more later today.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

@yuzefovich
Member Author

yuzefovich commented Mar 29, 2021

Hm, the CI I kicked off this morning succeeded, and now I can no longer manually repro the TestShowJobWhenComplete/show_jobs failures I saw last week. I don't believe I changed anything in the code; maybe I was just using the wrong branch or something 🤷. I'll kick off several CI runs to make sure the failures have disappeared for good.

Anyway, I think it is RFAL then.

@yuzefovich yuzefovich added the backport-21.1.x label and removed the do-not-merge label Mar 29, 2021
Contributor

@ajwerner ajwerner left a comment

Generally I'm concerned about the need for selects and detecting context cancellation on the two channels. Can you convince me it's not a problem?

Reviewed 1 of 4 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @yuzefovich)


pkg/sql/internal.go, line 811 at r4 (raw file):

		// the next piece from the data channel.
		//
		// Note that only only statements of ROWS type should send the cols, but

only only


pkg/sql/internal.go, line 816 at r4 (raw file):

			resultColumns = first.cols
		}
		waitCh <- struct{}{}

does this need to be done in a select? If the context underneath is canceled, then it seems possible that nobody will receive this.

@yuzefovich yuzefovich force-pushed the ie branch 2 times, most recently from 84f388f to f69b85b on March 29, 2021 23:54
Member Author

@yuzefovich yuzefovich left a comment

I believe your concern is totally on point, and I addressed it. I'm feeling optimistic that now we have the correct fix in place.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/sql/internal.go, line 816 at r4 (raw file):

Previously, ajwerner wrote…

does this need to be done in a select? If the context underneath is canceled, then it seems possible that nobody will receive this.

That's a very good point. I think you are totally right, and this is the thing that I missed.

I updated the code so that the reader always listens for context cancellation when operating with the channels. We don't need to do that on the writer (connExecutor) side because the reader will properly drain/close the channels in rowsIterator.Close (Close must be called whenever an iterator is returned by the internal executor).
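
A sketch of the reader-side guard, reusing the placeholder types from the
earlier sketch (release of the producer via waitCh omitted for brevity):

	// nextResult wraps every reader-side channel operation in a select on
	// the context, so a canceled query cannot strand the iterator goroutine.
	func (c *syncIEResultChannel) nextResult(ctx context.Context) (ieIteratorResult, bool, error) {
		select {
		case r, ok := <-c.dataCh:
			if !ok {
				return ieIteratorResult{}, true /* done */, nil
			}
			return r, false /* done */, nil
		case <-ctx.Done():
			return ieIteratorResult{}, true /* done */, ctx.Err()
		}
	}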

@yuzefovich yuzefovich force-pushed the ie branch 2 times, most recently from c497c1a to 60ef3ee on March 30, 2021 00:05
@yuzefovich
Member Author

Ok, fixed some comments and a minor issue. PTAL.

@yuzefovich yuzefovich requested review from pbardea and removed request for a team March 30, 2021 04:11
Contributor

@ajwerner ajwerner left a comment

I added two more commits which I speculate fix the problems. See what you think.

For what it's worth, I'm not super pleased with them. I'm thinking we should refactor some of this synchronization into an abstraction. Also the context management around here is not particularly principled.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @pbardea)

Contributor

@ajwerner ajwerner left a comment

I cleaned it up. I'm happier with it now.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @pbardea)

@yuzefovich yuzefovich removed the request for review from pbardea March 30, 2021 16:56
@yuzefovich yuzefovich changed the title from "sql: remove concurrency from the internal executor" to "sql: properly synchronize the internal executor iterator" Mar 30, 2021
@yuzefovich
Member Author

I was missing the crucial observation you added to the commit message. I think now the requirement of listening on the ctx cancellation on the writer side makes sense.

I squashed the two larger commits into one with some cosmetic modifications. The notable changes are moving the logic of draining dataCh from rowsIterator.Close into asyncIEResultChannel.close (to match the comment on ieResultReader.close) and closing waitCh in syncIEResultChannel.finish.
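
The closed waitCh then acts like the test trick above - a rough sketch of
what finish might look like (assumed shape, not the actual code):

	// finish is called by the consumer once it will never send on waitCh
	// again. Closing waitCh permanently releases the producer: every
	// pending and future "block on waitCh" receive returns immediately.
	func (c *syncIEResultChannel) finish() {
		close(c.waitCh)
	}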

Thanks a lot for baby-sitting this PR. PTAL.

Contributor

@ajwerner ajwerner left a comment

:lgtm:

Reviewed 5 of 9 files at r5, 6 of 6 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)

Contributor

@ajwerner ajwerner left a comment

Actually, I feel like we can make this a tad better.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/internal_result_channel.go, line 39 at r6 (raw file):

	// is non-nil. Err will be non-nil if either close has been called or
	// the passed context is finished.
	nextResult(ctx context.Context) (_ ieIteratorResult, done bool, err error)

There's something that still smells a little bit about this interface. The semantics of done and err are a bit off. The semantics of close feel off too. I'm alarmed by the way that close on the async channel consumes everything and propagates errors from the sender while close on the sync channel effectively just no-ops. It's probably fine, but I'm having a hard time reasoning about what we want here. Are we trying to wait for the writer or preempt it? It feels like the latter, as on the happy path we should always see done from nextResult. I'm inclined to make the close behavior much more uniform between the two impls here.
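
For reference, the reader half of the interface being discussed,
reconstructed from the quoted snippet (the close signature is an
assumption, not necessarily the real one):

	type ieResultReader interface {
		// nextResult returns the next result. done is true once the writer
		// has finished or the reader has been closed. Err will be non-nil
		// if either close has been called or the passed context is finished.
		nextResult(ctx context.Context) (_ ieIteratorResult, done bool, err error)

		// close closes the reader. In the async flavor this drains
		// everything the writer sent and propagates its errors; in the sync
		// flavor it preempts the writer instead - the asymmetry questioned
		// above.
		close() error
	}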

yuzefovich and others added 2 commits March 31, 2021 14:18
Member Author

@yuzefovich yuzefovich left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner)


pkg/sql/internal_result_channel.go, line 39 at r6 (raw file):

Previously, ajwerner wrote…

There's something that still smells a little bit about this interface. The semantics of done and err are a bit off. The semantics of close feel off too. I'm alarmed by the way that close on the async channel consumes everything and propagates errors from the sender while close on the sync channel effectively just no-ops. It's probably fine, but I'm having a hard time reasoning about what we want here. Are we trying to wait for the writer or preempt it? It feels like the latter, as on the happy path we should always see done from nextResult. I'm inclined to make the close behavior much more uniform between the two impls here.

Updated the usage of context.Canceled error per our discussion.

Contributor

@ajwerner ajwerner left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)

@yuzefovich
Member Author

Thanks for the help with this PR!

bors r+

@craig
Contributor

craig bot commented Apr 1, 2021

Build succeeded:


Successfully merging this pull request may close these issues.

sql: crdb_internal.jobs cannot be safely joined with other tables