Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colexec: add default comparison operator #52313

Merged
merged 2 commits into from
Aug 13, 2020

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Aug 4, 2020

Depends on #52315.

colexec: add default comparison operators

This commit introduces default comparison projection and selection
operators that handle all tree.ComparisonExprs for which we don't
have optimized implementations. The main operators are very similar
to defaultBuiltinFuncOperator, but we also introduce optimized
adapter implementations from tree.ComparisonExpr to a vectorized
friendly model. Quick benchmarks show about 3.5x improvement in
speed of IS DISTINCT FROM projection operator against the wrapped
post-processing spec.

Addresses: #49781.

Release note (sql change): Vectorized execution engine now fully
supports comparison operators (things like ILIKE, IS NOT DISTINCT FROM,
SIMILAR TO, and several others).

colexec: add support for Tuple expressions and clean up tests

This commit adds support for tree.Tuple expressions which are
supported either by pre-evaluation during planning (and handling the
result as other constant values) if the tuple is constant or by planning
projection operators for each expression in the tuple and then using
newly-introduced tupleProjOp if the tuple expression is not constant.

The support required refactoring of IsNull* operators to template out special
"is tuple null" variants (because tuple have very peculiar null-handling
semantics). It also required that we fallback to the default comparison
operators on things that we have optimized support for (for example,
EQ), also because of the null-handling semantics.

It also improves the tests to handle datum-backed types better. Note
that a single unit test in aggregators_test.go has been switched to use
types.TimeTZ instead of types.Jsonb because of the differences in
order of strings and json.JSON.String() (the former is the
"expected" value whereas the latter is the actual; it was simpler to
switch the test to use a different datum-backed type).

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the default-proj-op branch 5 times, most recently from 72c4cab to f3f2363 Compare August 5, 2020 02:39
@yuzefovich
Copy link
Member Author

yuzefovich commented Aug 5, 2020

This shows 3.5x improvement over the wrapped rowexec noop processor.

  • before
BenchmarkDefaultComparisonOp/IS_DISTINCT_FROM/useSel=false/hasNulls=false-16         	    2587	    449706 ns/op	  36.43 MB/s
BenchmarkDefaultComparisonOp/IS_DISTINCT_FROM/useSel=false/hasNulls=true-16          	    2665	    431585 ns/op	  37.96 MB/s
BenchmarkDefaultComparisonOp/IS_DISTINCT_FROM/useSel=true/hasNulls=false-16          	    2799	    435340 ns/op	  37.63 MB/s
BenchmarkDefaultComparisonOp/IS_DISTINCT_FROM/useSel=true/hasNulls=true-16           	    2900	    429555 ns/op	  38.14 MB/s
  • after
BenchmarkDefaultComparisonOp/IS_DISTINCT_FROM/useSel=false/hasNulls=false-16         	    8319	    136810 ns/op	 119.76 MB/s
BenchmarkDefaultComparisonOp/IS_DISTINCT_FROM/useSel=false/hasNulls=true-16          	    8220	    139109 ns/op	 117.78 MB/s
BenchmarkDefaultComparisonOp/IS_DISTINCT_FROM/useSel=true/hasNulls=false-16          	    8780	    141413 ns/op	 115.86 MB/s
BenchmarkDefaultComparisonOp/IS_DISTINCT_FROM/useSel=true/hasNulls=true-16           	    8679	    139692 ns/op	 117.29 MB/s

@yuzefovich yuzefovich changed the title colexec: add default projection operator colexec: add default comparison operator Aug 5, 2020
@yuzefovich yuzefovich requested review from asubiotto and a team August 5, 2020 03:45
@yuzefovich yuzefovich force-pushed the default-proj-op branch 5 times, most recently from fbb515a to ee5dcbc Compare August 6, 2020 22:06
Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 24 of 24 files at r3, 7 of 7 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)


pkg/sql/colexec/default_cmp_expr_tmpl.go, line 80 at r3 (raw file):

}

// {{range .}}

Would this be nicer to template with #50154?


pkg/sql/colexec/mergejoiner_test.go, line 1827 at r4 (raw file):

	}
	ctx := context.Background()
	evalCtx := tree.NewTestingEvalContext(cluster.MakeTestingClusterSettings())

defer evalCtx.Stop


pkg/sql/colexec/proj_const_ops_tmpl.go, line 240 at r3 (raw file):

			// Note that we performed a "dense" conversion, so there is no need
			// to check whether sel is non-nil.
			// {{if _IS_CONST_LEFT}}

I wonder whether it makes sense for this to be a template nonConstColumn and constArg are both the same type, so couldn't you just have two fields left and right that are set at instantiation? Also doesn't the comparison adapter already support flipped args?


pkg/sql/colexec/proj_non_const_ops_tmpl.go, line 248 at r3 (raw file):

			// Note that we performed a "dense" conversion, so there is no need
			// to check whether sel is non-nil.
			res, err := d.adapter.eval(leftColumn[i], rightColumn[i])

Can this code be extracted and shared between both default operator implementations?


pkg/sql/colexec/utils_test.go, line 265 at r4 (raw file):

	{
		ctx := context.Background()
		evalCtx := tree.NewTestingEvalContext(cluster.MakeTestingClusterSettings())

ditto

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/colexec/default_cmp_expr_tmpl.go, line 80 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Would this be nicer to template with #50154?

Probably. TBH I'm very comfortable with the current template engine, so I probably can't see the difficulties it creates for others.


pkg/sql/colexec/mergejoiner_test.go, line 1827 at r4 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

defer evalCtx.Stop

Done.


pkg/sql/colexec/proj_const_ops_tmpl.go, line 240 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I wonder whether it makes sense for this to be a template nonConstColumn and constArg are both the same type, so couldn't you just have two fields left and right that are set at instantiation? Also doesn't the comparison adapter already support flipped args?

Note that nonConstColumn is a vector, so the corresponding argument is updated on every loop iteration meaning we still need to have the templated if.

Flipped arguments are a different concept - for example, we "normalize" arg1 >= arg2 into arg2 <= arg1 and remember that we have flipped arguments. That "flippage" is handled in eval method, but we need to pass in non-flipped arguments into eval.


pkg/sql/colexec/proj_non_const_ops_tmpl.go, line 248 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Can this code be extracted and shared between both default operator implementations?

I think it's not worth the effort at this point. We already have a precedent for non-default projection operators that have separate code for non-const-args and one-const-arg cases.


pkg/sql/colexec/utils_test.go, line 265 at r4 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

ditto

Done.

However, we now instantiate an evalCtx in different constructor methods in utils_test.go, and those won't get stopped, and I'm not sure of how to fix that without actually plumbing an eval context into runTests test harness. I think it should be acceptable to leave it as is and address later.

@dpulls
Copy link

dpulls bot commented Aug 11, 2020

🎉 All dependencies have been resolved !

@yuzefovich yuzefovich added the do-not-merge bors won't merge a PR with this label. label Aug 12, 2020
@yuzefovich
Copy link
Member Author

I'm polishing some things up, and I'll ping once it is ready for another look.

@yuzefovich yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Aug 12, 2020
@yuzefovich
Copy link
Member Author

RFAL.

This commit introduces default comparison projection and selection
operators that handle all `tree.ComparisonExpr`s for which we don't
have optimized implementations. The main operators are very similar
to `defaultBuiltinFuncOperator`, but we also introduce optimized
adapter implementations from `tree.ComparisonExpr` to a vectorized
friendly model. Quick benchmarks show about 3.5x improvement in
speed of `IS DISTINCT FROM` projection operator against the wrapped
post-processing spec.

Release note (sql change): Vectorized execution engine now fully
supports comparison operators (things like `ILIKE`, `IS NOT DISTINCT
FROM`, `SIMILAR TO`, and several others).
This commit adds support for `tree.Tuple` expressions which are
supported either by pre-evaluation during planning (and handling the
result as other constant values) if the tuple is constant or by planning
projection operators for each expression in the tuple and then using
newly-introduced `tupleProjOp` if the tuple expression is not constant.

The support required refactoring of IsNull* operators to template out special
"is tuple null" variants (because tuple have very peculiar null-handling
semantics). It also required that we fallback to the default comparison
operators on things that we have optimized support for (for example,
EQ), also because of the null-handling semantics.

It also improves the tests to handle datum-backed types better. Note
that a single unit test in `aggregators_test.go` has been switched to use
`types.TimeTZ` instead of `types.Jsonb` because of the differences in
order of `string`s and `json.JSON.String()` (the former is the
"expected" value whereas the latter is the actual; it was simpler to
switch the test to use a different datum-backed type).

Release note: None
Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 41 of 41 files at r5, 20 of 20 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colexec/proj_non_const_ops_tmpl.go, line 248 at r3 (raw file):

Previously, yuzefovich wrote…

I think it's not worth the effort at this point. We already have a precedent for non-default projection operators that have separate code for non-const-args and one-const-arg cases.

😢 this makes me very sad. Just because there's precedent doesn't mean that it's good or that we should follow it. If we did, there'd be no progress. Up to you.

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/colexec/proj_non_const_ops_tmpl.go, line 248 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

😢 this makes me very sad. Just because there's precedent doesn't mean that it's good or that we should follow it. If we did, there'd be no progress. Up to you.

Actually, I have refactored this code and it is now templated.

I was thinking that it is easier to do such cleanups all at once, when the correct logic is already in place (i.e. "feature" is complete), so it should be ok to leave such technical debt if there is a precedent. If there isn't, then we should strive for not introducing the tech debt in the first place (or at least fix it very soon, in follow-up commits). In this particular case, I went back to introduce default selection comparison operator, so it was a good opportunity to clean up the code I was about to merge in this PR.

@craig
Copy link
Contributor

craig bot commented Aug 13, 2020

Build succeeded:

@craig craig bot merged commit b1359ad into cockroachdb:master Aug 13, 2020
@yuzefovich yuzefovich deleted the default-proj-op branch August 13, 2020 17:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants