-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
colexec: add support for ILIKE and NOT ILIKE #85695
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool looks good to me! The need to toupper surprised me a bit, is there no optimized case insensitive comparison we can use and avoid the extra allocation?
Reviewed 1 of 10 files at r1, 11 of 11 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @mgartner)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)
pkg/sql/colexec/execgen/cmd/execgen/overloads_base.go
line 316 at r2 (raw file):
Right *lastArgWidthOverload Negatable bool CaseInsensitive bool
At this point, should there be a separate struct for the LIKE
operators with twoArgsResolvedOverload
embedded?
This commit cleans up the way we generate operators for NOT LIKE. Previously, they would get their own copy which was exactly the same as for LIKE with a difference in a single line, and now the same underlying operator will handle both LIKE and NOT LIKE - the result of comparison just needs to be negated. The performance hit of this extra boolean comparison is negligible yet we can remove some of the duplicated generated code. ``` name old time/op new time/op delta LikeOps/selPrefixBytesBytesConstOp-24 17.8µs ± 1% 16.9µs ± 0% -4.93% (p=0.000 n=10+10) LikeOps/selSuffixBytesBytesConstOp-24 18.5µs ± 0% 18.7µs ± 0% +1.37% (p=0.000 n=10+10) LikeOps/selContainsBytesBytesConstOp-24 27.8µs ± 0% 28.0µs ± 0% +1.02% (p=0.000 n=9+10) LikeOps/selRegexpBytesBytesConstOp-24 479µs ± 1% 484µs ± 0% +1.10% (p=0.000 n=10+10) LikeOps/selSkeletonBytesBytesConstOp-24 39.9µs ± 0% 40.3µs ± 0% +0.85% (p=0.000 n=10+10) LikeOps/selRegexpSkeleton-24 871µs ± 2% 871µs ± 0% ~ (p=1.000 n=10+10) ``` Release note: None
This commit adds the native vectorized support for ILIKE and NOT ILIKE comparisons. The idea is simple - convert both the argument and the pattern to capital letters. This required minor changes to the templates to add a "prelude" step of that conversion as well as conversion of the pattern to the upper case during planning. Initially, I generated separate operators for case-insensitive cases, but the benchmarks shown that the performance impact of a single conditional inside of the `for` loop is barely noticeable given that the branch prediction will always be right, so I refactored the existing operators to support case insensitivity. ``` name old time/op new time/op delta LikeOps/selPrefixBytesBytesConstOp-24 16.8µs ± 0% 17.7µs ± 0% +5.30% (p=0.000 n=10+10) LikeOps/selSuffixBytesBytesConstOp-24 18.7µs ± 0% 19.2µs ± 0% +2.99% (p=0.000 n=10+10) LikeOps/selContainsBytesBytesConstOp-24 28.0µs ± 0% 27.8µs ± 0% -0.73% (p=0.000 n=10+10) LikeOps/selRegexpBytesBytesConstOp-24 479µs ± 0% 480µs ± 0% +0.33% (p=0.008 n=9+10) LikeOps/selSkeletonBytesBytesConstOp-24 40.2µs ± 0% 41.4µs ± 0% +3.20% (p=0.000 n=9+10) LikeOps/selRegexpSkeleton-24 860µs ± 0% 857µs ± 0% -0.36% (p=0.023 n=10+10) ``` Release note (performance improvement): ILIKE and NOT ILIKE filters can now be evaluated more efficiently in some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there no optimized case insensitive comparison we can use and avoid the extra allocation
Looks like there is one - bytes.EqualFold
, but it's only for equality comparison whereas we want "prefix", "suffix", and "contains" matching behavior. (Also, the current approach was copy-pasted from eval.optimizedLikeFunc
(the row-by-row engine), so for consistency it'd better to keep them in sync.)
TFTRs!
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @DrewKimball, and @mgartner)
pkg/sql/colexec/execgen/cmd/execgen/overloads_base.go
line 316 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
At this point, should there be a separate struct for the
LIKE
operators withtwoArgsResolvedOverload
embedded?
Unfortunately, it's not that simple - we are using the same templates and template generators as the "main" selection / projection operators. We either need to define separate templates or we need to put these variables (or define methods with the corresponding names) into twoArgsResolvedOverload
so that templates for non-like operators compile. The latter seems nicer to me.
Build failed (retrying...): |
Build failed (retrying...): |
Build failed: |
Unrelated flakes. bors r+ |
Build succeeded: |
colexec: clean up NOT LIKE operator generation
This commit cleans up the way we generate operators for NOT LIKE.
Previously, they would get their own copy which was exactly the same as
for LIKE with a difference in a single line, and now the same underlying
operator will handle both LIKE and NOT LIKE - the result of comparison
just needs to be negated. The performance hit of this extra boolean
comparison is negligible yet we can remove some of the duplicated
generated code.
Release note: None
colexec: add support for ILIKE and NOT ILIKE
This commit adds the native vectorized support for ILIKE and NOT ILIKE
comparisons. The idea is simple - convert both the argument and the
pattern to capital letters. This required minor changes to the templates
to add a "prelude" step of that conversion as well as conversion of the
pattern to the upper case during planning.
Initially, I generated separate operators for case-insensitive cases,
but the benchmarks shown that the performance impact of a single
conditional inside of the
for
loop is barely noticeable given that thebranch prediction will always be right, so I refactored the existing
operators to support case insensitivity.
Addresses: #49781.
Release note (performance improvement): ILIKE and NOT ILIKE filters can
now be evaluated more efficiently in some cases.