Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colexec: add support for ILIKE and NOT ILIKE #85695

Merged
merged 2 commits into from
Aug 9, 2022

Conversation

yuzefovich
Copy link
Member

colexec: clean up NOT LIKE operator generation

This commit cleans up the way we generate operators for NOT LIKE.
Previously, they would get their own copy which was exactly the same as
for LIKE with a difference in a single line, and now the same underlying
operator will handle both LIKE and NOT LIKE - the result of comparison
just needs to be negated. The performance hit of this extra boolean
comparison is negligible yet we can remove some of the duplicated
generated code.

name                                     old time/op    new time/op    delta
LikeOps/selPrefixBytesBytesConstOp-24      17.8µs ± 1%    16.9µs ± 0%   -4.93%  (p=0.000 n=10+10)
LikeOps/selSuffixBytesBytesConstOp-24      18.5µs ± 0%    18.7µs ± 0%   +1.37%  (p=0.000 n=10+10)
LikeOps/selContainsBytesBytesConstOp-24    27.8µs ± 0%    28.0µs ± 0%   +1.02%  (p=0.000 n=9+10)
LikeOps/selRegexpBytesBytesConstOp-24       479µs ± 1%     484µs ± 0%   +1.10%  (p=0.000 n=10+10)
LikeOps/selSkeletonBytesBytesConstOp-24    39.9µs ± 0%    40.3µs ± 0%   +0.85%  (p=0.000 n=10+10)
LikeOps/selRegexpSkeleton-24                871µs ± 2%     871µs ± 0%     ~     (p=1.000 n=10+10)

Release note: None

colexec: add support for ILIKE and NOT ILIKE

This commit adds the native vectorized support for ILIKE and NOT ILIKE
comparisons. The idea is simple - convert both the argument and the
pattern to capital letters. This required minor changes to the templates
to add a "prelude" step of that conversion as well as conversion of the
pattern to the upper case during planning.

Initially, I generated separate operators for case-insensitive cases,
but the benchmarks shown that the performance impact of a single
conditional inside of the for loop is barely noticeable given that the
branch prediction will always be right, so I refactored the existing
operators to support case insensitivity.

name                                     old time/op    new time/op    delta
LikeOps/selPrefixBytesBytesConstOp-24      16.8µs ± 0%    17.7µs ± 0%  +5.30%  (p=0.000 n=10+10)
LikeOps/selSuffixBytesBytesConstOp-24      18.7µs ± 0%    19.2µs ± 0%  +2.99%  (p=0.000 n=10+10)
LikeOps/selContainsBytesBytesConstOp-24    28.0µs ± 0%    27.8µs ± 0%  -0.73%  (p=0.000 n=10+10)
LikeOps/selRegexpBytesBytesConstOp-24       479µs ± 0%     480µs ± 0%  +0.33%  (p=0.008 n=9+10)
LikeOps/selSkeletonBytesBytesConstOp-24    40.2µs ± 0%    41.4µs ± 0%  +3.20%  (p=0.000 n=9+10)
LikeOps/selRegexpSkeleton-24                860µs ± 0%     857µs ± 0%  -0.36%  (p=0.023 n=10+10)

Addresses: #49781.

Release note (performance improvement): ILIKE and NOT ILIKE filters can
now be evaluated more efficiently in some cases.

@yuzefovich yuzefovich requested review from mgartner, cucaroach and a team August 6, 2022 00:53
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool looks good to me! The need to toupper surprised me a bit, is there no optimized case insensitive comparison we can use and avoid the extra allocation?

Reviewed 1 of 10 files at r1, 11 of 11 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)


pkg/sql/colexec/execgen/cmd/execgen/overloads_base.go line 316 at r2 (raw file):

	Right           *lastArgWidthOverload
	Negatable       bool
	CaseInsensitive bool

At this point, should there be a separate struct for the LIKE operators with twoArgsResolvedOverload embedded?

This commit cleans up the way we generate operators for NOT LIKE.
Previously, they would get their own copy which was exactly the same as
for LIKE with a difference in a single line, and now the same underlying
operator will handle both LIKE and NOT LIKE - the result of comparison
just needs to be negated. The performance hit of this extra boolean
comparison is negligible yet we can remove some of the duplicated
generated code.

```
name                                     old time/op    new time/op    delta
LikeOps/selPrefixBytesBytesConstOp-24      17.8µs ± 1%    16.9µs ± 0%   -4.93%  (p=0.000 n=10+10)
LikeOps/selSuffixBytesBytesConstOp-24      18.5µs ± 0%    18.7µs ± 0%   +1.37%  (p=0.000 n=10+10)
LikeOps/selContainsBytesBytesConstOp-24    27.8µs ± 0%    28.0µs ± 0%   +1.02%  (p=0.000 n=9+10)
LikeOps/selRegexpBytesBytesConstOp-24       479µs ± 1%     484µs ± 0%   +1.10%  (p=0.000 n=10+10)
LikeOps/selSkeletonBytesBytesConstOp-24    39.9µs ± 0%    40.3µs ± 0%   +0.85%  (p=0.000 n=10+10)
LikeOps/selRegexpSkeleton-24                871µs ± 2%     871µs ± 0%     ~     (p=1.000 n=10+10)
```

Release note: None
This commit adds the native vectorized support for ILIKE and NOT ILIKE
comparisons. The idea is simple - convert both the argument and the
pattern to capital letters. This required minor changes to the templates
to add a "prelude" step of that conversion as well as conversion of the
pattern to the upper case during planning.

Initially, I generated separate operators for case-insensitive cases,
but the benchmarks shown that the performance impact of a single
conditional inside of the `for` loop is barely noticeable given that the
branch prediction will always be right, so I refactored the existing
operators to support case insensitivity.

```
name                                     old time/op    new time/op    delta
LikeOps/selPrefixBytesBytesConstOp-24      16.8µs ± 0%    17.7µs ± 0%  +5.30%  (p=0.000 n=10+10)
LikeOps/selSuffixBytesBytesConstOp-24      18.7µs ± 0%    19.2µs ± 0%  +2.99%  (p=0.000 n=10+10)
LikeOps/selContainsBytesBytesConstOp-24    28.0µs ± 0%    27.8µs ± 0%  -0.73%  (p=0.000 n=10+10)
LikeOps/selRegexpBytesBytesConstOp-24       479µs ± 0%     480µs ± 0%  +0.33%  (p=0.008 n=9+10)
LikeOps/selSkeletonBytesBytesConstOp-24    40.2µs ± 0%    41.4µs ± 0%  +3.20%  (p=0.000 n=9+10)
LikeOps/selRegexpSkeleton-24                860µs ± 0%     857µs ± 0%  -0.36%  (p=0.023 n=10+10)
```

Release note (performance improvement): ILIKE and NOT ILIKE filters can
now be evaluated more efficiently in some cases.
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there no optimized case insensitive comparison we can use and avoid the extra allocation

Looks like there is one - bytes.EqualFold, but it's only for equality comparison whereas we want "prefix", "suffix", and "contains" matching behavior. (Also, the current approach was copy-pasted from eval.optimizedLikeFunc (the row-by-row engine), so for consistency it'd better to keep them in sync.)

TFTRs!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @DrewKimball, and @mgartner)


pkg/sql/colexec/execgen/cmd/execgen/overloads_base.go line 316 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

At this point, should there be a separate struct for the LIKE operators with twoArgsResolvedOverload embedded?

Unfortunately, it's not that simple - we are using the same templates and template generators as the "main" selection / projection operators. We either need to define separate templates or we need to put these variables (or define methods with the corresponding names) into twoArgsResolvedOverload so that templates for non-like operators compile. The latter seems nicer to me.

@craig
Copy link
Contributor

craig bot commented Aug 9, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 9, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 9, 2022

Build failed:

@yuzefovich
Copy link
Member Author

Unrelated flakes.

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 9, 2022

Build succeeded:

@craig craig bot merged commit 7c18668 into cockroachdb:master Aug 9, 2022
@yuzefovich yuzefovich deleted the ilike branch August 9, 2022 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants