Commit

v20.2 vectorized updates
Eric Harmeling committed Aug 19, 2020
1 parent 5fbd0d8 commit aa42a11
Showing 11 changed files with 37 additions and 80 deletions.
3 changes: 0 additions & 3 deletions _includes/v20.2/sql/vectorized-support.md

This file was deleted.

2 changes: 0 additions & 2 deletions v20.2/array.md
Original file line number Diff line number Diff line change
@@ -14,8 +14,6 @@ The `ARRAY` data type is useful for ensuring compatibility with ORMs and other t
CockroachDB does not support nested arrays.
{{site.data.alerts.end}}

{% include {{page.version.version}}/sql/vectorized-support.md %}

## Syntax

A value of data type `ARRAY` can be expressed in the following ways:
2 changes: 0 additions & 2 deletions v20.2/bit.md
@@ -7,8 +7,6 @@ toc: true
The `BIT` and `VARBIT` [data types](data-types.html) stores bit arrays.
With `BIT`, the length is fixed; with `VARBIT`, the length can be variable.

{% include {{page.version.version}}/sql/vectorized-support.md %}

## Aliases

The name `BIT VARYING` is an alias for `VARBIT`.
2 changes: 0 additions & 2 deletions v20.2/collate.md
@@ -9,8 +9,6 @@ The `COLLATE` feature lets you sort [`STRING`](string.html) values according to

Collated strings are important because different languages have [different rules for alphabetic order](https://en.wikipedia.org/wiki/Alphabetical_order#Language-specific_conventions), especially with respect to accented letters. For example, in German accented letters are sorted with their unaccented counterparts, while in Swedish they are placed at the end of the alphabet. A collation is a set of rules used for ordering and usually corresponds to a language, though some languages have multiple collations with different rules for sorting; for example Portuguese has separate collations for Brazilian and European dialects (`pt-BR` and `pt-PT` respectively).

{% include {{page.version.version}}/sql/vectorized-support.md %}

## Details

- Operations on collated strings cannot involve strings with a different collation or strings with no collation. However, it is possible to <a href="#ad-hoc-collation-casting">add or overwrite a collation on the fly</a>.
38 changes: 19 additions & 19 deletions v20.2/data-types.md
@@ -8,25 +8,25 @@ toc: true

CockroachDB supports the following data types. Click a type for more details.

Type | Description | Example | [Vectorized Execution](vectorized-execution.html)
-----|-------------|---------|----------
[`ARRAY`](array.html) | A 1-dimensional, 1-indexed, homogeneous array of any non-array data type. | `{"sky","road","car"}` | Not supported
[`BIT`](bit.html) | A string of binary digits (bits). | `B'10010101'` | Not supported
[`BOOL`](bool.html) | A Boolean value. | `true` | Supported
[`BYTES`](bytes.html) | A string of binary characters. | `b'\141\061\142\062\143\063'` | Supported
[`COLLATE`](collate.html) | The `COLLATE` feature lets you sort [`STRING`](string.html) values according to language- and country-specific rules, known as collations. | `'a1b2c3' COLLATE en` | Not supported
[`DATE`](date.html) | A date. | `DATE '2016-01-25'` | Supported
[`DECIMAL`](decimal.html) | An exact, fixed-point number. | `1.2345` | Supported
[`FLOAT`](float.html) | A 64-bit, inexact, floating-point number. | `1.2345` | Supported
[`INET`](inet.html) | An IPv4 or IPv6 address. | `192.168.0.1` | Not supported
[`INT`](int.html) | A signed integer, up to 64 bits. | `12345` | Supported
[`INTERVAL`](interval.html) | A span of time. | `INTERVAL '2h30m30s'` | Supported
[`JSONB`](jsonb.html) | JSON (JavaScript Object Notation) data. | `'{"first_name": "Lola", "last_name": "Dog", "location": "NYC", "online" : true, "friends" : 547}'` | Not supported
[`SERIAL`](serial.html) | A pseudo-type that combines an [integer type](int.html) with a [`DEFAULT` expression](default-value.html). | `148591304110702593` | Not supported
[`STRING`](string.html) | A string of Unicode characters. | `'a1b2c3'` | Supported
[`TIME`<br>`TIMETZ`](time.html) | `TIME` stores a time of day in UTC.<br> `TIMETZ` converts `TIME` values with a specified time zone offset from UTC. | `TIME '01:23:45.123456'`<br> `TIMETZ '01:23:45.123456-5:00'` | Not supported
[`TIMESTAMP`<br>`TIMESTAMPTZ`](timestamp.html) | `TIMESTAMP` stores a date and time pairing in UTC.<br>`TIMESTAMPTZ` converts `TIMESTAMP` values with a specified time zone offset from UTC. | `TIMESTAMP '2016-01-25 10:10:10'`<br>`TIMESTAMPTZ '2016-01-25 10:10:10-05:00'` | Supported
[`UUID`](uuid.html) | A 128-bit hexadecimal value. | `7f9c24e8-3b12-4fef-91e0-56a2d5a246ec` | Supported
Type | Description | Example
-----|-------------|---------
[`ARRAY`](array.html) | A 1-dimensional, 1-indexed, homogeneous array of any non-array data type. | `{"sky","road","car"}`
[`BIT`](bit.html) | A string of binary digits (bits). | `B'10010101'`
[`BOOL`](bool.html) | A Boolean value. | `true`
[`BYTES`](bytes.html) | A string of binary characters. | `b'\141\061\142\062\143\063'`
[`COLLATE`](collate.html) | The `COLLATE` feature lets you sort [`STRING`](string.html) values according to language- and country-specific rules, known as collations. | `'a1b2c3' COLLATE en`
[`DATE`](date.html) | A date. | `DATE '2016-01-25'`
[`DECIMAL`](decimal.html) | An exact, fixed-point number. | `1.2345`
[`FLOAT`](float.html) | A 64-bit, inexact, floating-point number. | `1.2345`
[`INET`](inet.html) | An IPv4 or IPv6 address. | `192.168.0.1`
[`INT`](int.html) | A signed integer, up to 64 bits. | `12345`
[`INTERVAL`](interval.html) | A span of time. | `INTERVAL '2h30m30s'`
[`JSONB`](jsonb.html) | JSON (JavaScript Object Notation) data. | `'{"first_name": "Lola", "last_name": "Dog", "location": "NYC", "online" : true, "friends" : 547}'`
[`SERIAL`](serial.html) | A pseudo-type that combines an [integer type](int.html) with a [`DEFAULT` expression](default-value.html). | `148591304110702593`
[`STRING`](string.html) | A string of Unicode characters. | `'a1b2c3'`
[`TIME`<br>`TIMETZ`](time.html) | `TIME` stores a time of day in UTC.<br> `TIMETZ` converts `TIME` values with a specified time zone offset from UTC. | `TIME '01:23:45.123456'`<br> `TIMETZ '01:23:45.123456-5:00'`
[`TIMESTAMP`<br>`TIMESTAMPTZ`](timestamp.html) | `TIMESTAMP` stores a date and time pairing in UTC.<br>`TIMESTAMPTZ` converts `TIMESTAMP` values with a specified time zone offset from UTC. | `TIMESTAMP '2016-01-25 10:10:10'`<br>`TIMESTAMPTZ '2016-01-25 10:10:10-05:00'`
[`UUID`](uuid.html) | A 128-bit hexadecimal value. | `7f9c24e8-3b12-4fef-91e0-56a2d5a246ec`

## Data type conversions and casts

4 changes: 2 additions & 2 deletions v20.2/explain.md
@@ -18,7 +18,7 @@ Using `EXPLAIN`'s output, you can optimize your queries by taking the following

- Avoid scanning an entire table, which is the slowest way to access data. You can avoid this by [creating indexes](indexes.html) that contain at least one of the columns that the query is filtering in its `WHERE` clause.

- By default, the [vectorized execution](vectorized-execution.html) engine is enabled for all [supported operations](vectorized-execution.html#disk-spilling-operations) and [data types](vectorized-execution.html#supported-data-types). If you are querying a table with a small number of rows, it might be more efficient to use row-oriented execution. The `vectorize_row_count_threshold` [cluster setting](cluster-settings.html) specifies the minimum number of rows required to use the vectorized engine to execute a query plan.
- By default, the [vectorized execution](vectorized-execution.html) engine is enabled for all [supported operations](vectorized-execution.html#disk-spilling-operations). If you are querying a table with a small number of rows, it might be more efficient to use row-oriented execution. The `vectorize_row_count_threshold` [cluster setting](cluster-settings.html) specifies the minimum number of rows required to use the vectorized engine to execute a query plan.

You can find out if your queries are performing entire table scans by using `EXPLAIN` to see which:

@@ -43,7 +43,7 @@ The user requires the appropriate [privileges](authorization.html#assign-privile
`VERBOSE` | Show as much information as possible about the query plan.
`TYPES` | Include the intermediate [data types](data-types.html) CockroachDB chooses to evaluate intermediate SQL expressions.
`OPT` | Display the query plan tree generated by the [cost-based optimizer](cost-based-optimizer.html).<br/><br/>To include cost details used by the optimizer in planning the query, use `OPT, VERBOSE`. To include cost and type details, use `OPT, TYPES`. To include all details used by the optimizer, including statistics, use `OPT, ENV`.
`VEC` | Show detailed information about the [vectorized execution](vectorized-execution.html) plan for a query. If the table queried includes [unsupported data types](vectorized-execution.html#supported-data-types), an unhandled data type error is returned.
`VEC` | Show detailed information about the [vectorized execution](vectorized-execution.html) plan for a query.
`preparable_stmt` | The [statement](sql-grammar.html#preparable_stmt) you want details about. All preparable statements are explainable.
`DISTSQL` | Generate a URL to a [distributed SQL physical query plan tree](explain-analyze.html#distsql-plan-viewer).<br><br>{% include {{ page.version.version }}/sql/physical-plan-url.md %}

2 changes: 0 additions & 2 deletions v20.2/inet.md
@@ -5,8 +5,6 @@ toc: true
---
The `INET` [data type](data-types.html) stores an IPv4 or IPv6 address.

{% include {{page.version.version}}/sql/vectorized-support.md %}

## Syntax

A constant value of type `INET` can be expressed using an
2 changes: 0 additions & 2 deletions v20.2/jsonb.md
@@ -8,8 +8,6 @@ The `JSONB` [data type](data-types.html) stores JSON (JavaScript Object Notation

{{site.data.alerts.callout_success}}For a hands-on demonstration of storing and querying JSON data from a third-party API, see the <a href="demo-json-support.html">JSON tutorial</a>.{{site.data.alerts.end}}

{% include {{page.version.version}}/sql/vectorized-support.md %}

## Alias

In CockroachDB, `JSON` is an alias for `JSONB`.
2 changes: 0 additions & 2 deletions v20.2/serial.md
@@ -20,8 +20,6 @@ In most cases, we recommend using the [`UUID`](uuid.html) data type with the `ge
See [this FAQ entry](sql-faqs.html#how-do-i-auto-generate-unique-row-ids-in-cockroachdb) for more details.
{{site.data.alerts.end}}

{% include {{page.version.version}}/sql/vectorized-support.md %}

## Modes of operation

The keyword `SERIAL` is recognized in `CREATE TABLE` and is
2 changes: 0 additions & 2 deletions v20.2/time.md
@@ -8,8 +8,6 @@ The `TIME` [data type](data-types.html) stores the time of day in UTC.

The `TIMETZ` data type stores a time of day with a time zone offset from UTC.

{% include {{page.version.version}}/sql/vectorized-support.md %}

## Variants

`TIME` has two variants:
58 changes: 16 additions & 42 deletions v20.2/vectorized-execution.md
@@ -4,32 +4,24 @@ summary: The CockroachDB vectorized SQL query execution engine processes query p
toc: true
---

CockroachDB supports [column-oriented](https://en.wikipedia.org/wiki/Column-oriented_DBMS#Column-oriented_systems) ("vectorized") query execution.
CockroachDB supports [column-oriented](https://en.wikipedia.org/wiki/Column-oriented_DBMS#Column-oriented_systems) ("vectorized") query execution on all [CockroachDB data types](data-types.html).

Many SQL databases execute [query plans](https://en.wikipedia.org/wiki/Query_plan) one row of table data at a time. Row-oriented execution models can offer good performance for [online transaction processing (OLTP)](https://en.wikipedia.org/wiki/Online_transaction_processing) queries, but suboptimal performance for [online analytical processing (OLAP)](https://en.wikipedia.org/wiki/Online_analytical_processing) queries. The CockroachDB vectorized execution engine dramatically improves performance over [row-oriented execution](https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems) by processing each component of a query plan on type-specific batches of column data.

{{site.data.alerts.callout_info}}
CockroachDB does not support vectorized execution for all data types. For details, see [supported data types](#supported-data-types).
{{site.data.alerts.end}}

## Configuring vectorized execution

By default, vectorized execution is enabled in CockroachDB for [all queries that are guaranteed to execute in memory](#disk-spilling-operations), on tables with [supported data types](#supported-data-types).
<span class="version-tag">New in v20.2:</span> By default, vectorized execution is enabled in CockroachDB.

You can turn vectorized execution on or off for all queries in the current session with the `vectorize` [session variable](set-vars.html). The following options are supported:
You can configure vectorized execution with the `vectorize` [session variable](set-vars.html). The following options are supported:

Option | Description
----------|------------
`auto` | Instructs CockroachDB to use the vectorized execution engine on most queries that execute in memory, without the need to [spill intermediate results to disk](#disk-spilling-operations).<br><br>**Default:** `vectorize=auto`
`on` | Turns on vectorized execution for all supported queries.
`on` | Turns on vectorized execution for all queries on rows under the [`vectorize_row_count_threshold`](#setting-the-row-threshold-for-vectorized-execution) (1000 rows, by default).<br><br>**Default:** `vectorize=on`
`201auto` | Follows the [vectorized execution behavior of CockroachDB v20.1](../v20.1/vectorized-execution.html), instructing CockroachDB to use the vectorized execution engine on most queries that execute in memory, on [data types supported by the vectorized engine in CockroachDB v20.1](../v20.1/data-types.html), without the need to [spill intermediate results to disk](../v20.1/vectorized-execution.html#disk-spilling-operations).
`off` | Turns off vectorized execution for all queries.

For information about setting session variables, see [`SET` &lt;session variable&gt;](set-vars.html).
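
As a rough illustration of the options in the table above, the `vectorize` session variable might be inspected and changed like this. This is a sketch added for readers of the diff, not part of the committed file, and it assumes an open SQL session on a v20.2 cluster.

```sql
-- Show the value currently in effect for this session.
SHOW vectorize;

-- Turn vectorized execution off for the current session only;
-- other sessions are unaffected.
SET vectorize = off;

-- Turn vectorized execution back on for this session.
SET vectorize = on;
```

Because `vectorize` is a session variable, a new connection starts again from the cluster's default value.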

{{site.data.alerts.callout_success}}
CockroachDB supports vectorized execution on columns with [supported data types](#supported-data-types) only. Setting the `vectorize` session variable to `on` does not turn vectorized execution on for queries on columns with unsupported data types.
{{site.data.alerts.end}}

{{site.data.alerts.callout_success}}
To see if CockroachDB will use the vectorized execution engine for a query, run a simple [`EXPLAIN`](explain.html) statement on the query. If `vectorize` is `true`, the query will be executed with the vectorized engine. If it is `false`, the row-oriented execution engine is used instead.
{{site.data.alerts.end}}
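
The check described in the callout above could look roughly like the following. The table `t` and its columns are hypothetical, introduced only for illustration; the exact layout of the `EXPLAIN` output is not reproduced here.

```sql
-- Hypothetical table used only for this sketch.
CREATE TABLE t (x INT PRIMARY KEY, y STRING);

-- A plain EXPLAIN reports a `vectorize` field for the plan:
-- true means the vectorized engine is used, false means row-oriented.
EXPLAIN SELECT y, count(*) FROM t GROUP BY y;

-- EXPLAIN (VEC) shows the vectorized execution plan itself.
EXPLAIN (VEC) SELECT y, count(*) FROM t GROUP BY y;
```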
@@ -40,7 +32,7 @@ The efficiency of vectorized execution increases with the number of rows process

By default, vectorized execution is enabled for queries on tables of 1000 rows or more. If the number of rows in a table falls below 1000, CockroachDB uses the row-oriented execution engine instead.

For performance tuning, you can change the minimum number of rows required to use the vectorized engine to execute a query plan in the current session with the `vectorize_row_count_threshold` [session variable](set-vars.html). This variable is ignored if `vectorize=on`.
For performance tuning, you can change the minimum number of rows required to use the vectorized engine to execute a query plan in the current session with the `vectorize_row_count_threshold` [session variable](set-vars.html).
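
A minimal sketch of adjusting the row threshold for the current session follows. The value `5000` is an arbitrary example, not a recommendation from the docs.

```sql
-- Show the threshold currently in effect (1000 by default, per the text above).
SHOW vectorize_row_count_threshold;

-- Require at least 5000 rows before the vectorized engine is used
-- for a query plan in this session. 5000 is an arbitrary example value.
SET vectorize_row_count_threshold = 5000;
```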

## How vectorized execution works

@@ -56,48 +48,30 @@ For detailed examples of vectorized query execution for hash and merge joins, se

## Disk-spilling operations

By default, vectorized execution is disabled for the following memory-intensive operations:
The following operations require [memory buffering](https://en.wikipedia.org/wiki/Data_buffer) during execution:

- Global [sorts](query-order.html)
- [Window functions](window-functions.html)
- [Unordered aggregations](query-order.html#processing-order-during-aggregations)
- [Hash joins](joins.html#hash-joins)
- [Merge joins](joins.html#merge-joins) on non-unique columns. Merge joins on columns that are guaranteed to have one row per value, also known as "key columns", can execute entirely in-memory.

To turn vectorized execution on for these operations, set the `vectorize` [session variable](set-vars.html) to `on`.

These operations require [memory buffering](https://en.wikipedia.org/wiki/Data_buffer) during execution. If there is not enough memory allocated for an operation, CockroachDB will spill the intermediate execution results to disk. By default, the memory limit allocated per operator is 64MiB. You can change this limit with the `sql.distsql.temp_storage.workmem` [cluster setting](cluster-settings.html).
If there is not enough memory allocated for an operation, CockroachDB will spill the intermediate execution results to disk. By default, the memory limit allocated per operator is 64MiB. You can change this limit with the `sql.distsql.temp_storage.workmem` [cluster setting](cluster-settings.html).

You can also configure a node's total budget for in-memory query processing at node startup with the [`--max-sql-memory` flag](cockroach-start.html#general). If the queries running on the node exceed the memory budget, the node spills intermediate execution results to disk. The [`--max-disk-temp-storage` flag](cockroach-start.html#general) sets the maximum on-disk storage capacity. If the maximum on-disk storage capacity is reached, the query will return an error during execution.
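
The per-operator memory limit mentioned above might be inspected and raised as sketched below. The `128MiB` value is an arbitrary example, and changing a cluster setting assumes a user with sufficient privileges.

```sql
-- Show the current per-operator memory limit (64MiB by default, per the text above).
SHOW CLUSTER SETTING sql.distsql.temp_storage.workmem;

-- Raise the limit so that operators buffer more in memory before spilling to disk.
-- 128MiB is an arbitrary example value; this applies cluster-wide, not per session.
SET CLUSTER SETTING sql.distsql.temp_storage.workmem = '128MiB';
```

Unlike the `vectorize` session variable, this setting affects every node and session in the cluster.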

## Supported data types

Vectorized execution is supported for the following [data types](data-types.html) and their aliases:

- [`BOOL`](bool.html)
- [`BYTES`](bytes.html)
- [`DATE`](date.html)
- [`DECIMAL`](decimal.html)
- [`FLOAT`](float.html)
- [`INT`](int.html)
- [`INTERVAL`](interval.html)
- [`STRING`](string.html)
- [`TIMESTAMP`/`TIMESTAMPTZ`](timestamp.html)
- [`UUID`](uuid.html)

{{site.data.alerts.callout_info}}
CockroachDB uses the vectorized engine to execute queries on columns with supported data types, even if a column's parent table includes unused columns with unsupported data types.
{{site.data.alerts.end}}

## Known limitations

### Queries with constant `NULL` arguments
### Unsupported queries

The vectorized engine does not support queries containing:

The vectorized execution engine does not support queries that contain constant `NULL` arguments, with the exception of the `IS` projection operators `IS NULL` and `IS NOT NULL`.
- [Window functions](window-functions.html). See [tracking issue](https://github.com/cockroachdb/cockroach/issues/37040).
- A join filtered with an [`ON` expression](joins.html#supported-join-conditions). See [tracking issue](https://github.com/cockroachdb/cockroach/issues/38018).
- Any query containing constant `NULL` arguments, with the exception of the `IS` projection operators `IS NULL` and `IS NOT NULL`. For example, `SELECT x IS NOT NULL FROM t` is supported, but `SELECT x + NULL FROM t` returns an `unable to vectorize execution plan` error. See [tracking issue](https://github.com/cockroachdb/cockroach/issues/41001).

For example, `SELECT x IS NOT NULL FROM t` is supported, but `SELECT x + NULL FROM t` returns an `unable to vectorize execution plan` error.
### Spatial features

For more information, see the [tracking issue](https://github.com/cockroachdb/cockroach/issues/41001).
The vectorized engine does not support [working with spatial data](spatial-data.html). Queries with [geospatial functions](functions-and-operators.html#geospatial-functions) or [spatial data](spatial-data.html) will revert to the row-oriented engine.

## See also
