-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: incorrect nullability of InList
expr
#6799
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really nicely written and tested @jonahgao 👏 Thank you very much
Improve code readability Co-authored-by: Andrew Lamb <[email protected]>
|
||
Expr::InList(InList { expr, list, .. }) => { | ||
// Avoid inspecting too many expressions. | ||
const MAX_INSPECT_LIMIT: usize = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to know why use this number? Is it the practice from other systems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spark handle it simplified.
override def nullable: Boolean = children. Exists(_.nullable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to know why use this number? Is it the practice from other systems?
@jackwener No. The nullable
function may be called multiple times during the optimization phase,
So I think adding a limitation would be preferable in order to prevent it from being excessively slow.
But I'm not quite sure what would be an appropriate number.
spark handle it simplified.
override def nullable: Boolean = children. Exists(_.nullable)
This seems to be a cache style.
We can implement this by precomputing nullable in the InList::new()
function.
But the disadvantages are:
- We need a new field for the
InList
struct - The precomputed nullable may not be used.
- Calculating nullable requires
input_schema
@jackwener Which solution do you prefer?
Update: It seems challenging to accomplish. I need to take a closer look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @jonahgao hints at , trying to add some sort of cache / memoization is going to be challenging given Rust and how DataFusion is structured
I think the current PR solution is good:
- It is well tested
- It is conservative (it might say an InList is nullable that isn't, but I think that will not generate wrong results, only potentially less optimal plans)
Thus I think we should merge this PR as is and then as a follow on PR we can remove the limit or increase it, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* fix: incorrect nullability of InList expr * Update datafusion/expr/src/expr_schema.rs Improve code readability Co-authored-by: Andrew Lamb <[email protected]> --------- Co-authored-by: Andrew Lamb <[email protected]>
* fix: incorrect nullability of InList expr * Update datafusion/expr/src/expr_schema.rs Improve code readability Co-authored-by: Andrew Lamb <[email protected]> --------- Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
None
Rationale for this change
Similar to #6786 .
We can safely assume that the
InList
expr is non-nullalbe only if all of its subexpressions are non-nullable.An example of a nullable expression is:
What changes are included in this PR?
InList
exprAre these changes tested?
Yes
Are there any user-facing changes?
No