-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
colexecbase: extend support of casts #67733
Conversation
b3a28d7
to
ec33dc1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/colexec/colexecbase/cast_tmpl.go, line 94 at r1 (raw file):
return true } if fromType.Family() == types.UuidFamily && toType.Family() == types.BytesFamily {
nit/question: Why is this not the case for UUID
-> 'String? Can
String` have a width?
pkg/sql/colexec/colexecbase/cast_tmpl.go, line 351 at r1 (raw file):
c.da.AllocSize = n } if cap(c.scratch) < maxIdx {
I think c.scratch
needs to be resliced when the capacity is sufficient, right?
pkg/sql/colexec/colexecbase/cast_tmpl.go, line 400 at r1 (raw file):
} converted := scratch[idx] if hasNulls {
[nit] could fold these conditions
pkg/sql/colexec/execgen/cmd/execgen/cast_gen.go, line 36 at r1 (raw file):
setValues := makeFunctionRegex("_CAST_TUPLES", 2) // We have to be a bit tricky with this replacement because we want to
[nit] We might be able to avoid this sort of thing by using execgen. Though, feel free to disregard this if you don't think it's worth it; the code that generates the code isn't too important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/colexec/colexecbase/cast_tmpl.go, line 94 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
nit/question: Why is this not the case for
UUID
-> 'String? Can
String` have a width?
types.String
implicitly also represents weird string-like special types (like char
and "char"
, etc) which do require some truncation. At the moment our "resolution" is based on type families (and widths for IntFamily), so all special string-like types are handled by the same operator. In the future we might lift up the resolution to be based on Oids (which will speed up the casts), but it doesn't seem worth it at the moment.
Also, even if there weren't such complications, UUID
s are represented differently when they have been cast to strings. UUIDs are essentially [16]byte
, but when we cast them to string, we get xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
which is longer and also encoded. Thus, we do need to first convert [16]byte
to a UUID
and then stringify it.
pkg/sql/colexec/colexecbase/cast_tmpl.go, line 351 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
I think
c.scratch
needs to be resliced when the capacity is sufficient, right?
Good catch, fixed.
pkg/sql/colexec/colexecbase/cast_tmpl.go, line 400 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] could fold these conditions
Done.
pkg/sql/colexec/execgen/cmd/execgen/cast_gen.go, line 36 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] We might be able to avoid this sort of thing by using execgen. Though, feel free to disregard this if you don't think it's worth it; the code that generates the code isn't too important.
Hm, I don't see how the execgen would help us here - _CAST_TUPLES
does rely heavily on the text/template stuff. In particular, $fromInfo
is a template variable that we want to propagate into the template invocation. It also propagates the overload via .Global
. I think that the execgen isn't currently capable of these things (at least without an involved refactor).
7a6b9cb
to
c8aa0ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @yuzefovich)
pkg/sql/colexec/colexecbase/cast_tmpl.go, line 94 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
types.String
implicitly also represents weird string-like special types (likechar
and"char"
, etc) which do require some truncation. At the moment our "resolution" is based on type families (and widths for IntFamily), so all special string-like types are handled by the same operator. In the future we might lift up the resolution to be based on Oids (which will speed up the casts), but it doesn't seem worth it at the moment.Also, even if there weren't such complications,
UUID
s are represented differently when they have been cast to strings. UUIDs are essentially[16]byte
, but when we cast them to string, we getxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
which is longer and also encoded. Thus, we do need to first convert[16]byte
to aUUID
and then stringify it.
Got it, thanks.
pkg/sql/colexec/colexecbase/cast_tmpl.go, line 325 at r2 (raw file):
scratch []tree.Datum da rowenc.DatumAlloc
Should we be accounting for these somewhere? (maybe we are, and I'm just not seeing it)
pkg/sql/colexec/colexecbase/cast_tmpl.go, line 358 at r2 (raw file):
scratch := c.scratch colconv.ColVecToDatum(scratch, inputVec, n, sel, &c.da) if sel != nil {
Shouldn't we always ignore the selection?
pkg/sql/colexec/execgen/cmd/execgen/cast_gen.go, line 36 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Hm, I don't see how the execgen would help us here -
_CAST_TUPLES
does rely heavily on the text/template stuff. In particular,$fromInfo
is a template variable that we want to propagate into the template invocation. It also propagates the overload via.Global
. I think that the execgen isn't currently capable of these things (at least without an involved refactor).
I thin you should be able to leave the $fromInfo
templating and whatever else - execgen inlining should preserve those comments. But, again, it doesn't really matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/colexec/colexecbase/cast_tmpl.go, line 325 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Should we be accounting for these somewhere? (maybe we are, and I'm just not seeing it)
No, we typically don't account for objects within DatumAlloc
s. We could (and probably should) be accounting for the temporary converted datums stored in c.scratch
, but so far we haven't done so. I filed #67889 to not forget about this shortcoming, but I don't think it's too important to address it in this PR.
pkg/sql/colexec/colexecbase/cast_tmpl.go, line 358 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Shouldn't we always ignore the selection?
No, we're consciously using sel
here. colconv.ColVecToDatum
performs the conversion without deselection (we have another version for that), all slots in scratch
that are not in sel
are left unchanged, and the same goes for all slots in outputCol
. I left it this way because all other cast operators work in this manner (I don't remember but I might have run into test issues before my PTO if the deselection is performed).
Hm, now that I've typed it out I think we can perform the conversion with deselection but then put the datums from the scratch into the corresponding slots, without performing deselection on the output vector (because that would require copying over all other vectors too).
pkg/sql/colexec/execgen/cmd/execgen/cast_gen.go, line 36 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
I thin you should be able to leave the
$fromInfo
templating and whatever else - execgen inlining should preserve those comments. But, again, it doesn't really matter.
Indeed, good point, added another commit to refactor the template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)
This is accomplished by falling back to the conversion between datums (i.e. we first cast to a datum and then convert it to the physical representation). This might seem useless, but it allows us to remove the pair of columnarizer/materializer if we had to fallback to wrapping a row-execution processor, so I think this approach is worth it. Over time we will add more inlined casts between native types and extend the support of native types too. Release note: None
This is done by introducing a single operator which first converts the whole input vector to a datum representation (stored in a scratch space) and then performs a cast one datum at a time to the desired type. Release note: None
Release note: None
This commit introduces several casts between natively supported types, mostly bytes-like. This commit also cleans up things around BCEs because bytes-like vectors don't support slicing. Additionally, this commit refactors the cast template in order to use the execgen rather than the text/template stuff as well as makes the error for unsupported casts more descriptive. Release note: None
I rebased on top of master and squashed the last three commits into one. TFTR! bors r+ |
67733: colexecbase: extend support of casts r=yuzefovich a=yuzefovich Addresses: #48135 See individual commits for details. After this PR we only need to add more casts between natively supported types. Co-authored-by: Yahor Yuzefovich <[email protected]>
Build failed: |
bors r+ |
Build failed (retrying...): |
This PR was included in a batch that was canceled, it will be automatically retried |
Build failed (retrying...): |
Build succeeded: |
Addresses: #48135
See individual commits for details. After this PR we only need to add
more casts between natively supported types.