
colexec: optimize all types #42043

Open · 4 of 13 tasks
yuzefovich opened this issue Oct 30, 2019 · 5 comments

Labels
A-sql-vec SQL vectorized engine · C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) · meta-issue Contains a list of several other issues. · T-sql-queries SQL Queries Team

Comments

@yuzefovich (Member)

yuzefovich commented Oct 30, 2019

With the addition of datumVec, we now support all types via either a tree.Datum-backed representation or an optimized "native" representation. The latter is more performant, and this issue tracks adding a native representation for the remaining types.

  • INTERVAL
  • TIMESTAMPTZ
  • JSONB
  • Enum
  • ARRAY
  • TIME
  • INET
  • COLLATEDSTRING
  • TIMETZ
  • Geometry
  • Geography
  • Tuple
  • Bit

ARRAY seems to be the most frequently used of the currently unimplemented types, followed by INET and TIME.

Jira issue: CRDB-5397
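The performance gap between the two representations mentioned above comes down to memory layout. Not from the CockroachDB codebase — a minimal standalone sketch with hypothetical names (`Datum`, `sumDatumVec`, `sumNativeVec`), contrasting an interface-boxed (datum-style) vector with a flat native vector:

```go
package main

import "fmt"

// Datum mimics the role of tree.Datum: an interface type, so every
// element is individually boxed and accessed via dynamic dispatch.
type Datum interface{ Int() int64 }

type dInt int64

func (d dInt) Int() int64 { return int64(d) }

// sumDatumVec sums a datum-backed vector: each access goes through an
// interface, costing a pointer chase and a method call per element.
func sumDatumVec(v []Datum) int64 {
	var s int64
	for _, d := range v {
		s += d.Int()
	}
	return s
}

// sumNativeVec sums a native vector: a flat []int64 the CPU can
// stream through cache line by cache line, with no indirection.
func sumNativeVec(v []int64) int64 {
	var s int64
	for _, x := range v {
		s += x
	}
	return s
}

func main() {
	datums := []Datum{dInt(1), dInt(2), dInt(3)}
	native := []int64{1, 2, 3}
	fmt.Println(sumDatumVec(datums), sumNativeVec(native)) // both 6
}
```

Both loops compute the same sum; the native version avoids per-element boxing, which is the gist of why a native representation is faster in tight columnar loops.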

@yuzefovich yuzefovich added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Oct 30, 2019
@asubiotto asubiotto added A-sql-vec SQL vectorized engine meta-issue Contains a list of several other issues. labels Oct 30, 2019
@glennfawcett

I have a customer with a table that has ARRAY and JSONB columns. These columns are NOT part of the aggregate query and are not retrieved, but the query still doesn't benefit from vectorized execution. It would be nice to allow vectorization in this case.

@awoods187 (Contributor)

@glennfawcett could you provide an example query? I wonder if we decode the query into an unsupported type. This seems similar to what @jseldess was reporting. You can also try casting the column to a supported type.

@bladefist

We're unable to use vectorization across the board due to the missing TIMESTAMPTZ support. Can this be prioritized in any way? It should unlock a lot of performance gains for us. I don't think we're missing any of the other types.

craig bot pushed a commit that referenced this issue Dec 31, 2019
43514: colexec: support TIMESTAMPTZ type r=yuzefovich a=yuzefovich

**colexec: support TIMESTAMPTZ type**

This commit adds support for the TIMESTAMPTZ data type, which is
represented the same way as TIMESTAMP (as `time.Time`). We already
had everything in place, so only the type conversion was needed.

Addresses: #42043.

Release note (sql change): vectorized engine now supports TIMESTAMPTZ
data type.

**sqlsmith: add several types to vecSeedTable**

This commit adds the previously supported INT2 and INT4 types to
vecSeedTable, as well as the newly supported TIMESTAMPTZ.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
@awoods187 (Contributor)

We have merged TIMESTAMPTZ support into 20.1, but it is a bit risky to backport it to 19.2. Does every one of your tables use it? For all queries?

@bladefist

@awoods187 Yes, essentially our primary user-profile table uses it, so all queries join to it. We can wait a little if 20.1 is coming soon. Thank you!

craig bot pushed a commit that referenced this issue Jan 30, 2020
43517: colexec, coldata: add support for INTERVAL type r=yuzefovich a=yuzefovich

**pgerror: clean up build deps**

The pgerror (and pgcode) packages are (perhaps inadvisably) used in
low-level utility packages. They had some pretty heavyweight build deps,
but this wasn't fundamentally necessary. Clean it up a bit and make
these packages more lightweight.

Release note: None

**colexec, coldata: add support for INTERVAL type**

This commit adds support for the INTERVAL type, which is represented by
`duration.Duration`. Only comparison projections are currently supported,
and serialization is still missing.

Addresses: #42043.

Release note: None

Co-authored-by: Daniel Harrison <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
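CockroachDB's `duration.Duration` keeps months, days, and nanoseconds as separate fields (they don't convert into each other exactly). Not the real implementation — a simplified sketch with a hypothetical `Interval` type of the kind of comparison a comparison projection needs, assuming the usual 30-day-month / 24-hour-day normalization for ordering:

```go
package main

import "fmt"

// Interval is a simplified stand-in for duration.Duration: months,
// days, and nanoseconds are stored separately because calendar units
// have no exact common denominator.
type Interval struct {
	Months, Days, Nanos int64
}

// Compare normalizes both sides to an approximate nanosecond count
// (30-day months, 24-hour days) and orders by that, sketching what a
// vectorized comparison projection does element by element.
func (a Interval) Compare(b Interval) int {
	const dayNanos = 24 * 3600 * 1_000_000_000
	an := (a.Months*30+a.Days)*dayNanos + a.Nanos
	bn := (b.Months*30+b.Days)*dayNanos + b.Nanos
	switch {
	case an < bn:
		return -1
	case an > bn:
		return 1
	}
	return 0
}

func main() {
	// One 30-day month sorts before 31 days under this normalization.
	fmt.Println(Interval{Months: 1}.Compare(Interval{Days: 31})) // -1
}
```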
@yuzefovich yuzefovich changed the title colexec: add unsupported types colexec: optimize all types Jun 15, 2020
craig bot pushed a commit that referenced this issue Apr 22, 2021
63770: colexec: add builtin json datatype r=jordanlewis a=jordanlewis

This commit adds a built-in JSON datatype to the colexec package. It is
implemented using the Bytes data structure and lazily deserializes JSON
objects for processing.

There's an inefficiency here, which is that forming a JSON object costs
an allocation. A future commit can make a cheaper "lazy JSON" object
that doesn't cache or require up-front allocations.

Addresses: #42043.
Fixes: #49470.
Fixes: #49472.

Release note (performance improvement): improves the speed of JSON
processing in the vectorized execution engine.

Co-authored-by: Jordan Lewis <[email protected]>
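The "lazy deserialization from Bytes" idea in the commit above can be sketched in a few lines. Not the colexec implementation — a hypothetical `jsonVec` type using the standard library's `encoding/json`, showing why forming the object costs an allocation per access:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsonVec stores JSON values as raw serialized bytes (playing the
// role of the colexec Bytes vector) and deserializes a value only
// when an operation actually needs the object.
type jsonVec struct {
	data [][]byte // one serialized JSON value per row
}

// Get lazily parses row i. The Unmarshal allocates a fresh object
// each call, which is the inefficiency the commit message notes; a
// cheaper "lazy JSON" would avoid that up-front allocation.
func (v *jsonVec) Get(i int) (any, error) {
	var out any
	err := json.Unmarshal(v.data[i], &out)
	return out, err
}

func main() {
	v := &jsonVec{data: [][]byte{
		[]byte(`{"a": 1}`),
		[]byte(`[1, 2]`),
	}}
	obj, err := v.Get(0)
	fmt.Println(obj, err)
}
```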
@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
craig bot pushed a commit that referenced this issue Dec 14, 2022
93400: coldata: add native support of enums r=yuzefovich a=yuzefovich

This commit adds native support for enum types to the vectorized
engine. We store them via their physical representation, so we can
easily reuse the `Bytes` vector for almost all operations and thus
just mark the enum family as having the bytes family as its canonical
representation. There are only a handful of places where we need to go
from the physical representation to either the logical one or to a
`DEnum`:
- when constructing the pgwire message to the client (in both the text
and binary formats the logical representation is used)
- when converting from columnar to row-by-row format (a fully-fledged
`DEnum` is constructed)
- casts.

In all of these places we already have access to the precise typing
information (similar to what we have for UUIDs which are supported via
the bytes canonical type family already).

I can really see only one downside to such an implementation: in some
places the resolution based on the canonical (rather than actual) type
family might be too coarse. For example, we have the `<bytes> || <bytes>`
binary operator (`concat`). As it currently stands, the execution will
proceed to perform the concatenation between two UUIDs or between a
BYTES value and a UUID, and now we'll be adding enums into the mix.
However, type checking is performed earlier on the query execution
path, so I think this is acceptable since the execution should never
reach such a setup.

An additional benefit of this work is that we'll be able to support
KV projection pushdown in the presence of enums: on the KV server side
we'll just operate on the physical representations and won't need
access to the hydrated type, whereas on the client side we'll have
the hydrated type, so we'll be able to perform all operations.

Addresses: #42043.
Informs: #92954.

Epic: CRDB-14837

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
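The physical-vs-logical split described above can be illustrated compactly. Not the CockroachDB implementation — a sketch with hypothetical names (`enumVec`, `Label`) and made-up physical byte values, showing the engine operating on opaque bytes while the hydrated type metadata resolves labels only at the edges:

```go
package main

import "fmt"

// enumVec stores enum values by their physical (bytes) representation,
// which is all the vectorized engine needs for comparisons, grouping,
// etc. The hydrated type metadata maps physical bytes to logical
// labels only at the edges: pgwire output, row conversion, and casts.
type enumVec struct {
	physical [][]byte          // what vectorized operators work on
	logical  map[string]string // hydrated type info: physical -> label
}

// Label resolves row i to its logical representation, as would happen
// when building a pgwire message or constructing a DEnum-style datum.
func (v *enumVec) Label(i int) string {
	return v.logical[string(v.physical[i])]
}

func main() {
	v := &enumVec{
		// Byte values here are illustrative, not CRDB's actual encoding.
		physical: [][]byte{{0x40}, {0x80}},
		logical:  map[string]string{"\x40": "open", "\x80": "closed"},
	}
	fmt.Println(v.Label(0), v.Label(1)) // open closed
}
```

Because the physical bytes order consistently with the enum's declared order in a scheme like this, the engine can sort and compare without ever consulting the logical labels.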
@yuzefovich yuzefovich moved this to Backlog in SQL Queries May 2, 2024