Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colexecbase: extend support of casts #67733

Merged
merged 4 commits into from
Jul 22, 2021
Merged

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Jul 17, 2021

Addresses: #48135

See individual commits for details. After this PR we only need to add
more casts between natively supported types.

@yuzefovich yuzefovich added the do-not-merge bors won't merge a PR with this label. label Jul 17, 2021
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the vec-casts branch 10 times, most recently from b3a28d7 to ec33dc1 Compare July 21, 2021 17:26
@yuzefovich yuzefovich marked this pull request as ready for review July 21, 2021 17:27
@yuzefovich yuzefovich requested a review from a team as a code owner July 21, 2021 17:27
@yuzefovich yuzefovich requested a review from DrewKimball July 21, 2021 17:27
Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colexec/colexecbase/cast_tmpl.go, line 94 at r1 (raw file):

		return true
	}
	if fromType.Family() == types.UuidFamily && toType.Family() == types.BytesFamily {

nit/question: Why is this not the case for UUID -> 'String? Can String` have a width?


pkg/sql/colexec/colexecbase/cast_tmpl.go, line 351 at r1 (raw file):

			c.da.AllocSize = n
		}
		if cap(c.scratch) < maxIdx {

I think c.scratch needs to be resliced when the capacity is sufficient, right?


pkg/sql/colexec/colexecbase/cast_tmpl.go, line 400 at r1 (raw file):

	}
	converted := scratch[idx]
	if hasNulls {

[nit] could fold these conditions


pkg/sql/colexec/execgen/cmd/execgen/cast_gen.go, line 36 at r1 (raw file):

	setValues := makeFunctionRegex("_CAST_TUPLES", 2)
	// We have to be a bit tricky with this replacement because we want to

[nit] We might be able to avoid this sort of thing by using execgen. Though, feel free to disregard this if you don't think it's worth it; the code that generates the code isn't too important.

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball)


pkg/sql/colexec/colexecbase/cast_tmpl.go, line 94 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

nit/question: Why is this not the case for UUID -> 'String? Can String` have a width?

types.String implicitly also represents weird string-like special types (like char and "char", etc) which do require some truncation. At the moment our "resolution" is based on type families (and widths for IntFamily), so all special string-like types are handled by the same operator. In the future we might lift up the resolution to be based on Oids (which will speed up the casts), but it doesn't seem worth it at the moment.

Also, even if there weren't such complications, UUIDs are represented differently when they have been cast to strings. UUIDs are essentially [16]byte, but when we cast them to string, we get xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx which is longer and also encoded. Thus, we do need to first convert [16]byte to a UUID and then stringify it.


pkg/sql/colexec/colexecbase/cast_tmpl.go, line 351 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I think c.scratch needs to be resliced when the capacity is sufficient, right?

Good catch, fixed.


pkg/sql/colexec/colexecbase/cast_tmpl.go, line 400 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] could fold these conditions

Done.


pkg/sql/colexec/execgen/cmd/execgen/cast_gen.go, line 36 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] We might be able to avoid this sort of thing by using execgen. Though, feel free to disregard this if you don't think it's worth it; the code that generates the code isn't too important.

Hm, I don't see how the execgen would help us here - _CAST_TUPLES does rely heavily on the text/template stuff. In particular, $fromInfo is a template variable that we want to propagate into the template invocation. It also propagates the overload via .Global. I think that the execgen isn't currently capable of these things (at least without an involved refactor).

@yuzefovich yuzefovich force-pushed the vec-casts branch 2 times, most recently from 7a6b9cb to c8aa0ab Compare July 21, 2021 23:37
Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @yuzefovich)


pkg/sql/colexec/colexecbase/cast_tmpl.go, line 94 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

types.String implicitly also represents weird string-like special types (like char and "char", etc) which do require some truncation. At the moment our "resolution" is based on type families (and widths for IntFamily), so all special string-like types are handled by the same operator. In the future we might lift up the resolution to be based on Oids (which will speed up the casts), but it doesn't seem worth it at the moment.

Also, even if there weren't such complications, UUIDs are represented differently when they have been cast to strings. UUIDs are essentially [16]byte, but when we cast them to string, we get xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx which is longer and also encoded. Thus, we do need to first convert [16]byte to a UUID and then stringify it.

Got it, thanks.


pkg/sql/colexec/colexecbase/cast_tmpl.go, line 325 at r2 (raw file):

	scratch []tree.Datum
	da      rowenc.DatumAlloc

Should we be accounting for these somewhere? (maybe we are, and I'm just not seeing it)


pkg/sql/colexec/colexecbase/cast_tmpl.go, line 358 at r2 (raw file):

		scratch := c.scratch
		colconv.ColVecToDatum(scratch, inputVec, n, sel, &c.da)
		if sel != nil {

Shouldn't we always ignore the selection?


pkg/sql/colexec/execgen/cmd/execgen/cast_gen.go, line 36 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Hm, I don't see how the execgen would help us here - _CAST_TUPLES does rely heavily on the text/template stuff. In particular, $fromInfo is a template variable that we want to propagate into the template invocation. It also propagates the overload via .Global. I think that the execgen isn't currently capable of these things (at least without an involved refactor).

I thin you should be able to leave the $fromInfo templating and whatever else - execgen inlining should preserve those comments. But, again, it doesn't really matter.

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball)


pkg/sql/colexec/colexecbase/cast_tmpl.go, line 325 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Should we be accounting for these somewhere? (maybe we are, and I'm just not seeing it)

No, we typically don't account for objects within DatumAllocs. We could (and probably should) be accounting for the temporary converted datums stored in c.scratch, but so far we haven't done so. I filed #67889 to not forget about this shortcoming, but I don't think it's too important to address it in this PR.


pkg/sql/colexec/colexecbase/cast_tmpl.go, line 358 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Shouldn't we always ignore the selection?

No, we're consciously using sel here. colconv.ColVecToDatum performs the conversion without deselection (we have another version for that), all slots in scratch that are not in sel are left unchanged, and the same goes for all slots in outputCol. I left it this way because all other cast operators work in this manner (I don't remember but I might have run into test issues before my PTO if the deselection is performed).

Hm, now that I've typed it out I think we can perform the conversion with deselection but then put the datums from the scratch into the corresponding slots, without performing deselection on the output vector (because that would require copying over all other vectors too).


pkg/sql/colexec/execgen/cmd/execgen/cast_gen.go, line 36 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I thin you should be able to leave the $fromInfo templating and whatever else - execgen inlining should preserve those comments. But, again, it doesn't really matter.

Indeed, good point, added another commit to refactor the template.

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)

This is accomplished by falling back to the conversion between datums
(i.e. we first cast to a datum and then convert it to the physical
representation). This might seem useless, but it allows us to remove the
pair of columnarizer/materializer if we had to fallback to wrapping
a row-execution processor, so I think this approach is worth it. Over
time we will add more inlined casts between native types and extend the
support of native types too.

Release note: None
This is done by introducing a single operator which first converts the
whole input vector to a datum representation (stored in a scratch space)
and then performs a cast one datum at a time to the desired type.

Release note: None
This commit introduces several casts between natively supported types,
mostly bytes-like.

This commit also cleans up things around BCEs because bytes-like vectors
don't support slicing.

Additionally, this commit refactors the cast template in order to use
the execgen rather than the text/template stuff as well as makes the
error for unsupported casts more descriptive.

Release note: None
@yuzefovich yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Jul 22, 2021
@yuzefovich
Copy link
Member Author

I rebased on top of master and squashed the last three commits into one.

TFTR!

bors r+

craig bot pushed a commit that referenced this pull request Jul 22, 2021
67733: colexecbase: extend support of casts r=yuzefovich a=yuzefovich

Addresses: #48135

See individual commits for details. After this PR we only need to add
more casts between natively supported types.

Co-authored-by: Yahor Yuzefovich <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jul 22, 2021

Build failed:

@yuzefovich
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 22, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jul 22, 2021

This PR was included in a batch that was canceled, it will be automatically retried

@craig
Copy link
Contributor

craig bot commented Jul 22, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jul 22, 2021

Build succeeded:

@craig craig bot merged commit 0119b2b into cockroachdb:master Jul 22, 2021
@yuzefovich yuzefovich deleted the vec-casts branch July 22, 2021 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants