Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v20.2 vectorized updates #8009

Merged
merged 1 commit into from
Aug 31, 2020
Merged

v20.2 vectorized updates #8009

merged 1 commit into from
Aug 31, 2020

Conversation

ericharmeling
Copy link
Contributor

@ericharmeling ericharmeling commented Aug 17, 2020

Fixes #7845.
Fixes #7843.
Fixes #7844.
Fixes #7629.
Fixes #7618.
Fixes #7635.
Fixes #7842.
Fixes #7743.
Fixes #7628.
Fixes #8125.
Fixes #8141.
Fixes #8165.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 11 of 11 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling)


v20.2/vectorized-execution.md, line 19 at r1 (raw file):

Option    | Description
----------|------------
`on`   | Turns on vectorized execution for all queries on rows under the [`vectorize_row_count_threshold`](#setting-the-row-threshold-for-vectorized-execution) (1000 rows, by default).<br><br>**Default:** `vectorize=on`

s/rows under/rows over/.


v20.2/vectorized-execution.md, line 62 at r1 (raw file):

- Global [sorts](query-order.html)
- [Window functions](window-functions.html)

I'd keep "window functions" here because they do require memory buffering (although we implement only several window functions in the vectorized engine, for most cases we will use old row-by-row processor that is embedded into the vectorized plan, but it'll still require buffering in memory). Also, I'd probably make the mention of "window functions" as the very last (I think it has lower priority than aggregations and joins).


v20.2/vectorized-execution.md, line 68 at r1 (raw file):

The vectorized engine does not support queries containing:

- [Window functions](window-functions.html). See [tracking issue](https://github.com/cockroachdb/cockroach/issues/37040).

I'd call the support of window functions "limited" or "unoptimized" because if a query has a window function that we don't support but also other operations that we do support in the vectorized engine, we will still plan a vectorized flow with the caveat that the window function will be handled in the unoptimized row-by-row fashion, but other operations will be handled efficiently.

@ericharmeling ericharmeling changed the title [WIP] v20.2 vectorized updates v20.2 vectorized updates Aug 25, 2020
Copy link
Contributor Author

@ericharmeling ericharmeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR @yuzefovich !

Adding @asubiotto for another pass.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


v20.2/vectorized-execution.md, line 19 at r1 (raw file):

Previously, yuzefovich wrote…

s/rows under/rows over/.

Done.


v20.2/vectorized-execution.md, line 62 at r1 (raw file):

Previously, yuzefovich wrote…

I'd keep "window functions" here because they do require memory buffering (although we implement only several window functions in the vectorized engine, for most cases we will use old row-by-row processor that is embedded into the vectorized plan, but it'll still require buffering in memory). Also, I'd probably make the mention of "window functions" as the very last (I think it has lower priority than aggregations and joins).

Done.


v20.2/vectorized-execution.md, line 68 at r1 (raw file):

Previously, yuzefovich wrote…

I'd call the support of window functions "limited" or "unoptimized" because if a query has a window function that we don't support but also other operations that we do support in the vectorized engine, we will still plan a vectorized flow with the caveat that the window function will be handled in the unoptimized row-by-row fashion, but other operations will be handled efficiently.

Done.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: but I agree, let's get Alfonso's input as well.

Reviewed 1 of 1 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ericharmeling)


v20.2/vectorized-execution.md, line 20 at r2 (raw file):

----------|------------
`on`   | Turns on vectorized execution for all queries on rows over the [`vectorize_row_count_threshold`](#setting-the-row-threshold-for-vectorized-execution) (1000 rows, by default).<br><br>**Default:** `vectorize=on`
`201auto` | Follows the [vectorized execution behavior of CockroachDB v20.1](../v20.1/vectorized-execution.html), instructing CockroachDB to use the vectorized execution engine on most queries that execute in memory, on [data types supported by the vectorized engine in CockroachDB v20.1](../v20.1/data-types.html), without the need to [spill intermediate results to disk](../v20.1/vectorized-execution.html#disk-spilling-operations).

This description is a bit unclear to me. I would change it to say something along the lines of "queries that use a constant amount of memory" and maybe remove the disk spilling mention


v20.2/vectorized-execution.md, line 70 at r2 (raw file):

- A join filtered with an [`ON` expression](joins.html#supported-join-conditions). See [tracking issue](https://github.com/cockroachdb/cockroach/issues/38018).
- Any query containing constant `NULL` arguments, with the exception of the `IS` projection operators `IS NULL` and `IS NOT NULL`. For example, `SELECT x IS NOT NULL FROM t` is supported, but `SELECT x + NULL FROM t` returns an `unable to vectorize execution plan` error. See [tracking issue](https://github.com/cockroachdb/cockroach/issues/41001).

I think this NULL business is not true anymore because we now wrap the post-processing stages (also tried it out to confirm). cc @yuzefovich

Copy link
Contributor Author

@ericharmeling ericharmeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR @asubiotto.

I'm going to pass off to writer review now and then merge. We can always update docs ahead of/after the release if anything needs changing.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)


v20.2/vectorized-execution.md, line 20 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

This description is a bit unclear to me. I would change it to say something along the lines of "queries that use a constant amount of memory" and maybe remove the disk spilling mention

Done.


v20.2/vectorized-execution.md, line 70 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I think this NULL business is not true anymore because we now wrap the post-processing stages (also tried it out to confirm). cc @yuzefovich

Removed this bullet.

@ericharmeling ericharmeling requested a review from lnhsingh August 31, 2020 13:52
Copy link
Contributor

@lnhsingh lnhsingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @lnhsingh, and @yuzefovich)

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @yuzefovich)


v20.2/vectorized-execution.md, line 70 at r2 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

Removed this bullet.

The boundary what is "unsupported" is blurred because of wrapping - yes, queries with constant NULL arguments will have a wrapped row-by-row processor into the vectorized flow, so I agree with Alfonso, it's a good idea to remove this bullet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment