infer right side nullability for LEFT join #5748
@comphead I am not sure about this change; it doesn't seem sound. The analyzer is assigning incorrect nullability flags to the right side of the join. I wonder if we shouldn't fix this instead.
I think it's not that easy for the analyzer to identify correct nullable flags without scanning the entire data. For a LEFT join it assumes the right side may not be matched and sets nullable, and this sounds correct to me. On the other hand, we also have a similarly weakened check in the optimizer: https://github.com/apache/arrow-datafusion/blob/26b8377b0690916deacf401097d688699026b8fb/datafusion/optimizer/src/optimizer.rs#L423
Consider the above query. After
The two Does it make sense? |
Yes, good catch: col4 has to have nullability set to true. That might happen because the optimizer sees literals and assumes they cannot be null, which is incorrect in the case of a left join for sure. Checking this.
This is before the optimizer, I believe. It's just the initial stage of building the logical plan from the query.
This is great @comphead! Check out the comments above.
.cloned()
.collect()
}
JoinType::Left => {
I'll fix other types of joins in a separate PR.
Cool, @milevin @alamb @jackwener please review whenever you have time.
I'll approve, but can you address the question about |
JoinType::Inner | JoinType::Left | JoinType::Full | JoinType::Right => {
let right_fields = right.fields().iter();
let left_fields = left.fields().iter();
JoinType::Inner | JoinType::Full | JoinType::Right => {
Right should be treated symmetrically opposite of Left.
Full should be nullified on both sides.
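The rule described in this comment can be sketched as follows. This is a minimal illustration, not the code from this PR: `Field`, `JoinType`, `field`, `nullify`, and `join_fields` are simplified, hypothetical stand-ins rather than DataFusion's actual types or functions, and only the four join types named in the discussion are modeled.

```rust
/// Hypothetical, simplified stand-in for a schema field (assumption:
/// only the name and the nullability flag matter for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

/// Hypothetical stand-in covering only the join types discussed here.
#[derive(Debug, Clone, Copy)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
}

/// Convenience constructor used by the sketch.
fn field(name: &str, nullable: bool) -> Field {
    Field { name: name.to_string(), nullable }
}

/// Return a copy of `fields` with every field marked nullable.
fn nullify(fields: &[Field]) -> Vec<Field> {
    fields
        .iter()
        .map(|f| Field { nullable: true, ..f.clone() })
        .collect()
}

/// Build the output fields of a join: left fields followed by right fields,
/// forcing nullability on whichever side may be padded with NULLs.
fn join_fields(left: &[Field], right: &[Field], join_type: JoinType) -> Vec<Field> {
    let (l, r) = match join_type {
        // Every output row has a match on both sides.
        JoinType::Inner => (left.to_vec(), right.to_vec()),
        // Unmatched left rows are padded with NULLs on the right.
        JoinType::Left => (left.to_vec(), nullify(right)),
        // Mirror image of Left.
        JoinType::Right => (nullify(left), right.to_vec()),
        // Either side may be padded.
        JoinType::Full => (nullify(left), nullify(right)),
    };
    l.into_iter().chain(r).collect()
}
```

Under these assumptions, joining a non-nullable `a` with a non-nullable `b` keeps both flags unchanged for Inner, marks only `b` nullable for Left, only `a` for Right, and both for Full.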
@@ -530,6 +530,13 @@ FROM t1
----
11 a 55

# test create table from query with LEFT join
It would be great to add tests like this for Right, Full, and all the other flavors of joins.
datafusion/common/src/utils.rs
@@ -234,6 +235,18 @@ pub(crate) fn parse_identifiers_normalized(s: &str) -> Vec<String> {
.collect::<Vec<_>>()
}

/// Check two schemas for being equal for field names/types
pub fn equivalent_names_and_types(schema: &SchemaRef, other: SchemaRef) -> bool {
Do you still need this change for equivalent_names_and_types? I don't think it's used in your solution to the problem; right?
@comphead is this change relevant here?
Thanks, done. I've got a better idea of how to refactor this method.
@alamb @jackwener please check this PR so I can follow up later on other join types.
This change makes sense to me -- thank you @comphead
It is somewhat amazing that this wasn't caught by existing tests.
@alamb This comment of mine was on an original version of this PR which tried to address the issue by weakening the compatibility check; @comphead followed the suggestion in the comment to rewrite the fix by changing the analyzer instead -- so all is good! Note: for the case of LEFT join, the right side must be nullified (that's what the comment talks about). A follow on PR will handle RIGHT and FULL OUTER joins where the left side will have to be adjusted as well. -- Michael |
Right, for RIGHT/FULL OUTER joins I will create a follow-up PR. This one is too overloaded :)
Thanks all!
Which issue does this PR close?
Closes partially #5747 (LEFT Join Mismatch between schema and batches on a CREATE TABLE with a windowing query #5695 (comment)).
Rationale for this change
Weakens the schema comparison between the planner schema and the batches, leaving only field name and data type, to avoid false positives when comparing NOT NULL and nullable columns.
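For illustration, a comparison that looks only at names and data types might resemble the sketch below. It is written under stated assumptions and is not the implementation from this PR: `DataType`, `SchemaField`, and `same_names_and_types` are hypothetical, simplified stand-ins, and plain slices are used in place of `SchemaRef`. As the conversation notes, this weakened check was the original approach and was later replaced by a fix in the analyzer.

```rust
/// Hypothetical, reduced set of data types for the sketch.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
}

/// Hypothetical stand-in for a schema field.
#[derive(Debug, Clone, PartialEq)]
struct SchemaField {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Compare two field lists by name and data type only, deliberately
/// ignoring the `nullable` flag (assumption: field order matters).
fn same_names_and_types(schema: &[SchemaField], other: &[SchemaField]) -> bool {
    schema.len() == other.len()
        && schema
            .iter()
            .zip(other.iter())
            .all(|(a, b)| a.name == b.name && a.data_type == b.data_type)
}
```

With this check, a planner schema that declares `col4` as NOT NULL and a batch schema that reports `col4` as nullable compare as equivalent, which is exactly the mismatch a strict equality check would reject.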
What changes are included in this PR?
Are these changes tested?
yes
Are there any user-facing changes?