-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tree: fix vectorized COALESCE and NULLIF type checking with VOID #83868
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @msirek, and @otan)
pkg/sql/logictest/testdata/logic_test/void
line 76 at r1 (raw file):
statement ok SET vectorize=off
nit: you should do something like this:
SET vectorize=on
<query>
SET vectorize=off
< query>
RESET vectorize
when you want to control the vectorize mode precisely, compare the results for two engines, and so that the setup is left unchanged in the end.
As it is written currently, the first execution can happen with any vectorize
mode (depending on the config), the second one is guaranteed to be vectorize=off
, and vectorize=off
remains for all configs for the rest of the file (should new queries are added to the file).
pkg/sql/sem/tree/eval.go
Outdated
@@ -1651,6 +1651,8 @@ var CmpOps = cmpOpFixups(map[treecmp.ComparisonOperatorSymbol]cmpOpOverload{ | |||
makeIsFn(types.TimestampTZ, types.Timestamp, volatility.Stable), | |||
makeIsFn(types.Time, types.TimeTZ, volatility.Stable), | |||
makeIsFn(types.TimeTZ, types.Time, volatility.Stable), | |||
makeIsFn(types.Unknown, types.Void, volatility.LeakProof), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we fix type resolution to instead evaluate NULL as a VOID? i wonder if this also gets fixed by adding void here (it might not)
cockroach/pkg/sql/sem/tree/constant.go
Line 480 in c48196c
StrValAvailAllParsable = []*types.T{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @msirek, and @otan)
pkg/sql/sem/tree/eval.go
line 1654 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
can we fix type resolution to instead evaluate NULL as a VOID? i wonder if this also gets fixed by adding void here (it might not)
cockroach/pkg/sql/sem/tree/constant.go
Line 480 in c48196c
StrValAvailAllParsable = []*types.T{
If I understand correctly, you're suggesting adding types.Void,
at the end of this slice. This does not fix the error, and the code in eval.go
is still required.
And, adding this would mean that type annotation on any string, such as 'foo':::VOID
would now succeed. Is this what we want?
Support for type annotation of the empty string to VOID is already added through #82965.
Previously, msirek (Mark Sirek) wrote…
i guess why do we special case Unknown for void but not the others? Ideally this comparison is fixed during type checking |
Previously, otan (Oliver Tan) wrote…
Sorry by "comparison is fixed" i mean "comparison is made and evaluated to be a void comparison" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @msirek, and @otan)
pkg/sql/sem/tree/eval.go
line 1654 at r1 (raw file):
i guess why do we special case Unknown for void but not the others?
Because all other types are equivalent to each other, causing this line to be true
for each of the inputs:
cockroach/pkg/sql/sem/tree/overload.go
Line 298 in 94fe056
return i < len(a) && (typ.Family() == types.UnknownFamily || a[i].Typ.Equivalent(typ)) |
a[i].Typ.Equivalent(typ)
is false
when a[i].Typ
is void and type
is void. But this is by design:
select ''::VOID = ''::VOID;
ERROR: unsupported comparison operator: <void> = <void>
SQLSTATE: 22023
So, VOID is just a special type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @otan, and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/void
line 76 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: you should do something like this:
SET vectorize=on <query> SET vectorize=off < query> RESET vectorize
when you want to control the vectorize mode precisely, compare the results for two engines, and so that the setup is left unchanged in the end.
As it is written currently, the first execution can happen with any
vectorize
mode (depending on the config), the second one is guaranteed to bevectorize=off
, andvectorize=off
remains for all configs for the rest of the file (should new queries are added to the file).
Fixed it, thanks!
Previously, msirek (Mark Sirek) wrote…
interesting... |
Previously, otan (Oliver Tan) wrote…
i think this may stem from the fact that In
which is different to
so maybe adding this It may be worth adding a comment about why this is special cased. Sorry for making a seemingly big deal about two lines of code, I really just wanted to dig into the special case! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @otan, and @yuzefovich)
pkg/sql/sem/tree/eval.go
line 1654 at r1 (raw file):
select ''::VOID = ''::VOID; is equating the values of the types though, not the types itself.
Yes, I know, but in the process of testing if the =
operator can be applied we check if the types are equivalent. This applies to any of the operators, such as IS DISTINCT FROM, which is the operator we care about here.
which is different to Is:
otan=# select ''::void IS null;
This example is comparing VoidType with UnknownType, not VoidType with VoidType.
i'd argue types.VoidFamily.Equivalent(types.VoidFamily) should be true...
If that were the case, then we'd have different behavior from postgres, which treats them not as equivalent:
postgres=# select ''::void is distinct from ''::void;
ERROR: operator does not exist: void = void
LINE 1: select ''::void is distinct from ''::void;
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
postgres=#
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a comment as per comment below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added:
// Void is unique in that it is not equivalent with itself, so implicit
// equivalence with Unknown in function ArgTypes.MatchAt due to the check
// `(typ.Family() == types.UnknownFamily || a[i].Typ.Equivalent(typ))` does
// not occur. Therefore, to allow the comparison
// `''::VOID IS DISTINCT FROM NULL`, an explicit equivalence with Unknown is
// added:
makeIsFn(types.Unknown, types.Void, volatility.LeakProof),
makeIsFn(types.Void, types.Unknown, volatility.LeakProof),
Do you have an opinion on whether we should support type annotation to VOID on any string to match CAST?
Currently only '':::VOID
is legal, but since 'foo'::VOID
is legal, should 'foo':::VOID
also be legal? In #82965 we decided to only allow '':::VOID
since that's what SQLSmith needed, and '' is a valid string representation of a VOID literal, but perhaps type annotation should have the same behavior as CAST.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @otan, and @yuzefovich)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @otan, and @yuzefovich)
Build failed: |
bors r+ |
Build failed (retrying...): |
bors r- |
Canceled. |
Previously, msirek (Mark Sirek) wrote…
I'm not convinced that PG supports the
When |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mgartner @yuzefovich Please take a look at the new fix.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @otan, and @yuzefovich)
pkg/sql/sem/tree/eval.go
line 1654 at r1 (raw file):
I'm not convinced that PG supports the [VOID] IS DISTINCT FROM [UNKNOWN] operator:
You're right! I found they're normalizing this to an IS NOT NULL operator when one side is NULL. The code only looks for constant NULL, not a type-cast, so is using IS DISTINCT FROM when the cast is present.
The root problem for this case is COALESCE uses IS DISTINCT FROM when vectorized, but in the row engine evaluates each expression in the COALESCE directly into a Datum
, then compares that Datum
directly in Go code with DNull
:
cockroach/pkg/sql/sem/eval/expr.go
Lines 187 to 194 in 5a9ec7c
func (e *evaluator) EvalCoalesceExpr(expr *tree.CoalesceExpr) (tree.Datum, error) { | |
for _, ex := range expr.Exprs { | |
d, err := ex.(tree.TypedExpr).Eval(e) | |
if err != nil { | |
return nil, err | |
} | |
if d != tree.DNull { | |
return d, nil |
It's now fixed by using IS NOT NULL in place of IS DISTINCT FROM NULL.
PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it's worth to get another approval too.
Reviewed 6 of 6 files at r4, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @otan)
1054063
to
4fb8726
Compare
I hate to throw a wrench in this, but
So we can't make this transformation in all cases. A good test case to add would be the example in PG below. Notice how
An alternative would be to use Or, since we know that any value of type Maybe there is a better idea if we think those alternatives are too hacky. |
Previously, msirek (Mark Sirek) wrote…
Thanks for investigating deeper! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @otan, and @yuzefovich)
pkg/sql/colexec/colbuilder/execplan.go
line 2128 at r5 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I hate to throw a wrench in this, but
x IS NOT NULL
andx IS DISTINCT FROM NULL
are not equivalent whenx
is a tuple (learned this lesson in #48299). In Postgres:marcus=# SELECT (NULL, NULL) IS NOT NULL; ?column? ---------- f (1 row) marcus=# SELECT (NULL, NULL) IS DISTINCT FROM NULL; ?column? ---------- t (1 row)
So we can't make this transformation in all cases. A good test case to add would be the example in PG below. Notice how
(NULL, NULL)
is returned.marcus=# SELECT COALESCE((NULL, NULL), (1, 2)); coalesce ---------- (,) (1 row)
An alternative would be to use
IS NOT NULL
only ift.Exprs[i]
is typeVOID
, and useIS DISTINCT FROM
otherwise.Or, since we know that any value of type
VOID
(or the singular value? or the empty value?) is always distinct fromNULL
, can we simply fold the coalesce intot.Exprs[i]
when it is of type void? The optimizer could also make this transformation in a normalization rule.Maybe there is a better idea if we think those alternatives are too hacky.
Thanks for catching this! I'm not sure what the 100% best option is, but I've updated the fix to use IS DISTINCT FROM
if it can (by checking if the type is compatible with comparison to NULL), and if not, IS NOT NULL
. VOID
should be the only type this applies to, but if another type is added which is similar to VOID
, this should automatically handle it too.
Can we get this merged? @mgartner do you still need to sign off? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 5 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @msirek, @otan, and @yuzefovich)
-- commits
line 27 at r6:
nit: mention that IS NOT NULL
is only used when IS DISTINCT FROM NULL
is not a valid comparison.
Fixes cockroachdb#83754 Previously, the COALESCE and NULLIF operators could error out in vectorized execution when comparing a VOID datum with NULL. This occurred due to the unique property of a VOID, that it can't be compared with another VOID using any of the normal operators such as `=`, `<`, `>`..., or even IS [NOT] [DISTINCT FROM], for example: ``` SELECT ''::VOID IS DISTINCT FROM ''::VOID; ERROR: unsupported comparison operator: <void> IS DISTINCT FROM <void> SQLSTATE: 22023 ``` Processing of COALESCE in the columnar execution engine for an expression like `COALESCE(void_col, NULL)` builds an IS DISTINCT FROM operation between the VOID column and NULL here: https://github.com/cockroachdb/cockroach/blob/ea559dfe0ba57259ca71d3c8ca1de6388954ea73/pkg/sql/colexec/colbuilder/execplan.go#L2122-L2129 This comparison is assumed to be OK, but fails with an internal error because comparison with `UnknownType` is only implicitly supported if the type can be compared with itself. To address this, this patch modifies vectorized COALESCE to use the IS NOT NULL operator internally instead of IS DISTINCT FROM whenever the latter is not a valid comparison. A similar problem with NULLIF, which internally uses `=`, is fixed. Type checking of NULLIF is enhanced in the parser so incompatible comparisons can be caught prior to execution time. Release note (bug fix): This patch fixes vectorized evaluation of COALESCE involving expressions of type VOID and enhances type checking of NULLIF expressions with VOID, so incompatible comparisons can be caught during query compilation instead of during query execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @otan and @yuzefovich)
Previously, mgartner (Marcus Gartner) wrote…
nit: mention that
IS NOT NULL
is only used whenIS DISTINCT FROM NULL
is not a valid comparison.
Done
Build failed (retrying...): |
Build succeeded: |
Fixes #83754
Previously, the COALESCE and NULLIF operators could error out in
vectorized execution when comparing a VOID datum with NULL.
This occurred due to the unique property of a VOID, that it
can't be compared with another VOID using any of the normal operators
such as
=
,<
,>
..., or even IS [NOT] [DISTINCT FROM], for example:Processing of COALESCE in the columnar execution engine for an
expression like
COALESCE(void_col, NULL)
builds an IS DISTINCT FROMoperation between the VOID column and NULL here:
cockroach/pkg/sql/colexec/colbuilder/execplan.go
Lines 2122 to 2129 in ea559df
This comparison is assumed to be OK, but fails with an internal error
because comparison with
UnknownType
is only implicitly supported ifthe type can be compared with itself.
To address this, this patch modifies vectorized COALESCE to use the
IS NOT NULL operator internally instead of IS DISTINCT FROM whenever
the latter is not a valid comparison.
A similar problem with NULLIF, which internally uses
=
, is fixed.Type checking of NULLIF is enhanced in the parser so incompatible
comparisons can be caught prior to execution time.
Release note (bug fix): This patch fixes vectorized evaluation of
COALESCE involving expressions of type VOID and enhances type checking
of NULLIF expressions with VOID, so incompatible comparisons can
be caught during query compilation instead of during query execution.