colexec: optimize all types #42043
Comments
I have a customer that has a table with ARRAY and JSONB columns. These columns are NOT used as part of the aggregate query or retrieved, but regardless this query doesn't benefit from vectorized execution. It would be nice to allow vectorization for this case.
@glennfawcett could you provide an example query? I wonder if we decode the query into an unsupported type. This seems similar to what @jseldess was reporting. You can also try casting the type to a supported type.
We're unable to use vectorization across the board due to missing TIMESTAMPTZ support. Can this be prioritized in any way? It should unlock a lot of performance gains for us. I don't think we're missing any of the others.
43514: colexec: support TIMESTAMPTZ type r=yuzefovich a=yuzefovich

**colexec: support TIMESTAMPTZ type**

This commit adds the support for TimestampTZ data type which is represented in the same way as Timestamp (as 'time.Time'). We already had everything in place, so only the type-conversion was needed.

Addresses: #42043.

Release note (sql change): vectorized engine now supports TIMESTAMPTZ data type.

**sqlsmith: add several types to vecSeedTable**

This commit adds previously supported INT2 and INT4 types to vecSeedTable as well as newly supported TIMESTAMPTZ.

Release note: None

Co-authored-by: Yahor Yuzefovich
We have merged TIMESTAMPTZ support into 20.1, but it is a bit risky to backport it to 19.2. Does every one of your tables use it? For all queries?
@awoods187 Yes, essentially our primary table for a user profile uses it, so all queries join to that. We can wait a little bit if 20.1 is coming soon? Thank you!
43517: colexec, coldata: add support for INTERVAL type r=yuzefovich a=yuzefovich

**pgerror: clean up build deps**

The pgerror (and pgcode) packages are (perhaps inadvisably) used in low-level utility packages. They had some pretty heavyweight build deps, but this wasn't fundamentally necessary. Clean it up a bit and make these packages more lightweight.

Release note: None

**colexec, coldata: add support for INTERVAL type**

This commit adds the support for INTERVAL type that is represented by duration.Duration. Only comparison projections are currently supported. The serialization is also missing.

Addresses: #42043.

Release note: None

Co-authored-by: Daniel Harrison
Co-authored-by: Yahor Yuzefovich
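As a rough illustration of what a "comparison projection" over an interval column means, here is a minimal sketch. The names `intervalVec` and `projLTInterval` are hypothetical, and `time.Duration` stands in for `duration.Duration` purely to keep the example self-contained; none of this is the actual colexec code.

```go
package main

import (
	"fmt"
	"time"
)

// intervalVec is a hypothetical column of interval values. time.Duration is
// an assumption made for illustration; the commit uses duration.Duration.
type intervalVec []time.Duration

// projLTInterval is a comparison projection: it reads two input columns of
// equal length and writes one boolean per row into a new output column.
func projLTInterval(left, right intervalVec) []bool {
	out := make([]bool, len(left))
	for i := range left {
		out[i] = left[i] < right[i]
	}
	return out
}

func main() {
	a := intervalVec{time.Hour, 2 * time.Minute, time.Second}
	b := intervalVec{time.Minute, 2 * time.Minute, time.Hour}
	fmt.Println(projLTInterval(a, b)) // [false false true]
}
```

The loop touches only flat slices, which is the general shape a column-at-a-time operator takes.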
63770: colexec: add builtin json datatype r=jordanlewis a=jordanlewis

This commit adds a builtin json datatype to the colexec package. It's implemented using the Bytes data structure, and lazily deserializes JSON objects for processing.

There's an inefficiency here, which is that forming a JSON object costs an allocation. A future commit can make a cheaper "lazy JSON" object that doesn't cache or require up-front allocations.

Addresses: #42043.
Fixes: #49470.
Fixes: #49472.

Release note (performance improvement): improve the speed of JSON in the vectorized execution engine

Co-authored-by: Jordan Lewis
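The idea of keeping JSON as raw bytes and deserializing only on access can be sketched roughly as follows. `jsonVec` and its `Get` method are hypothetical names, and the standard library's `encoding/json` stands in for CockroachDB's own JSON encoding; this is an illustration under those assumptions, not the implementation from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsonVec is a hypothetical column that keeps each JSON value as raw bytes.
type jsonVec struct {
	raw [][]byte
}

// Get decodes row i on demand, so rows that are never read are never parsed.
// Every call allocates a fresh decoded value, which mirrors the allocation
// cost mentioned in the commit message.
func (v *jsonVec) Get(i int) (interface{}, error) {
	var out interface{}
	if err := json.Unmarshal(v.raw[i], &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	// The second row is malformed, but it is never touched below.
	v := &jsonVec{raw: [][]byte{[]byte(`{"a": 1}`), []byte(`not json`)}}
	val, err := v.Get(0)
	fmt.Println(val, err) // map[a:1] <nil>
}
```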
93400: coldata: add native support of enums r=yuzefovich a=yuzefovich

This commit adds the native support of enum types to the vectorized engine. We store them via their physical representation, so we can easily reuse `Bytes` vector for almost all operations, and, thus, we just mark the enum family as having the bytes family as its canonical representation.

There are only a handful of places where we need to go from the physical representation to either the logical one or to the `DEnum`:
- when constructing the pgwire message to the client (in both text and binary format the logical representation is used)
- when converting from columnar to row-by-row format (fully-fledged `DEnum` is constructed)
- casts.

In all of these places we already have access to the precise typing information (similar to what we have for UUIDs which are supported via the bytes canonical type family already).

I can really see only one downside to such implementation - in some places the resolution based on the canonical (rather than actual) type family might be too coarse. For example, we have `<bytes> || <bytes>` binary operator (`concat`). As it currently stands the execution will proceed to perform the concatenation between two UUIDs or between a BYTES value and a UUID, and now we'll be adding enums into the mix. However, the type checking is performed earlier on the query execution path, so I think it is acceptable since the execution should never reach such a setup.

An additional benefit of this work is that we'll be able to support the KV projection pushdown in presence of enums - on the KV server side we'll just operate with the physical representations and won't need to have access to the hydrated type whereas on the client side we'll have the hydrated type, so we'll be able to do all operations.

Addresses: #42043.
Informs: #92954.
Epic: CRDB-14837

Release note: None

Co-authored-by: Yahor Yuzefovich
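To make the "canonical type family" idea concrete, here is a small sketch in which enums and UUIDs both reuse the bytes layout, and the logical enum value is recovered only at the boundary. All names (`family`, `canonicalFamily`, `enumType`, `toLogical`) and the example physical byte values are hypothetical and chosen for illustration; they are not CockroachDB identifiers.

```go
package main

import "fmt"

// family is a hypothetical stand-in for a SQL type family.
type family int

const (
	bytesFamily family = iota
	uuidFamily
	enumFamily
	intFamily
)

// canonicalFamily maps a logical type family to the family whose vector
// layout it reuses. Enums and UUIDs both share the bytes layout here.
func canonicalFamily(f family) family {
	switch f {
	case uuidFamily, enumFamily:
		return bytesFamily
	default:
		return f
	}
}

// enumType carries the precise typing information: a lookup from the
// physical representation to the logical one.
type enumType struct {
	logical map[string]string
}

// toLogical is needed only at the boundaries (wire format, conversion to
// rows, casts); operators in between work on the raw bytes.
func (t enumType) toLogical(physical []byte) (string, bool) {
	s, ok := t.logical[string(physical)]
	return s, ok
}

func main() {
	mood := enumType{logical: map[string]string{"\x40": "sad", "\x80": "happy"}}
	col := [][]byte{{0x80}, {0x40}} // bytes vector holding physical representations

	fmt.Println(canonicalFamily(enumFamily) == bytesFamily) // true
	for _, physical := range col {
		name, _ := mood.toLogical(physical)
		fmt.Println(name) // happy, then sad
	}
}
```

Because `canonicalFamily` erases the difference between enums, UUIDs, and bytes, any dispatch keyed on it is as coarse as the commit message warns; the sketch relies on earlier type checking in the same way.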
With the addition of `datumVec` we now support all types with either a `tree.Datum`-backed representation or optimized "native" representation. The latter is more performant, and this issue tracks the addition of native representation to the remaining types.

- INTERVAL
- TIMESTAMPTZ
- JSONB
- Enum
- ARRAY
- TIME
- INET
- COLLATEDSTRING
- TIMETZ
- Geometry
- Geography
- Tuple
- Bit

`ARRAY` seems to be the most frequently used currently unimplemented type. Then `INET` and `TIME`.

Jira issue: CRDB-5397
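The difference between a datum-backed column and a "native" column described in this issue can be sketched as follows. The `datum` interface and both functions are hypothetical stand-ins, not `tree.Datum` or `datumVec`; the sketch only shows the shape of the two loops and does not measure anything.

```go
package main

import "fmt"

// datum is a hypothetical stand-in for a boxed, interface-typed SQL value.
type datum interface{}

// sumDatums walks a datum-backed column: every element has to be unboxed
// with a type assertion before it can be used.
func sumDatums(col []datum) int64 {
	var sum int64
	for _, d := range col {
		if v, ok := d.(int64); ok {
			sum += v
		}
	}
	return sum
}

// sumNative walks a natively typed column: a tight loop over a flat slice
// with no per-element unboxing.
func sumNative(col []int64) int64 {
	var sum int64
	for _, v := range col {
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(sumDatums([]datum{int64(1), int64(2), int64(3)})) // 6
	fmt.Println(sumNative([]int64{1, 2, 3}))                      // 6
}
```

Both paths produce the same answer; the native one simply avoids the per-value indirection, which is the motivation the issue gives for adding native representations type by type.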