fix subquery where exists distinct #3732
Thanks @b41sh -- I can't argue with the results of this PR and the coverage, but I do wonder if there is a more fundamental problem we are missing. 🤔
@@ -137,8 +137,14 @@ fn optimize_exists(
    let subqry_inputs = query_info.query.subquery.inputs();
    let subqry_input = only_or_err(subqry_inputs.as_slice())
        .map_err(|e| context!("single expression projection required", e))?;
    let subqry_filter = Filter::try_from_plan(subqry_input)
        .map_err(|e| context!("cannot optimize non-correlated subquery", e))?;
    let subqry_filter = match subqry_input {
I don't understand why this fixes the error -- what was in the projection in the subquery that caused the problem?
I have been looking at this as well. The existing code works for a very simple projection but does not work if the projection is wrapped in any other operator, such as Distinct, Filter, Limit, Sort, and so on.
Although in this case it is now looking for a projection wrapping a filter and isn't looking for distinct so I am also confused.
I understand what is happening now and have some suggestions for improving this rule.
This line of code looks at the inputs of the subquery and does not care what type of operator the subquery is. Previously this was assumed to be a Projection, but now it could be a Projection or a Distinct, or something else ... I think we should add some pattern matching here.
let subqry_inputs = query_info.query.subquery.inputs();
We then match on this input. Previously it was expected to be a Filter, but now it could be a Projection containing a Filter, because the root Distinct shifts everything down by one level.
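To make the "shifted down by one" point concrete, here is a minimal, self-contained sketch. It uses a toy `Plan` enum as a stand-in for DataFusion's `LogicalPlan`; the enum, its variants' fields, and the `subquery_filter` helper are hypothetical names for illustration only, not the real API. It assumes only two shapes are supported: Projection over Filter, optionally wrapped in a root Distinct.

```rust
// Toy stand-in for a logical plan tree. This is NOT DataFusion's
// LogicalPlan; the variants and fields are simplified assumptions.
#[derive(Debug)]
enum Plan {
    Distinct(Box<Plan>),
    Projection(Box<Plan>),
    Filter { predicate: String, input: Box<Plan> },
    TableScan(String),
}

/// Hypothetical helper: returns the predicate of the subquery's Filter
/// for the supported shapes, and an error for everything else.
fn subquery_filter(plan: &Plan) -> Result<&str, String> {
    // A root Distinct pushes the Projection one level down, so peel it
    // off first if it is present.
    let below_distinct = match plan {
        Plan::Distinct(input) => input.as_ref(),
        other => other,
    };

    // After the optional Distinct we require exactly Projection -> Filter.
    match below_distinct {
        Plan::Projection(input) => match input.as_ref() {
            Plan::Filter { predicate, .. } => Ok(predicate.as_str()),
            other => Err(format!("expected Filter below Projection, found {other:?}")),
        },
        other => Err(format!("unsupported subquery root: {other:?}")),
    }
}
```

With this shape, a subquery rooted at Projection and one rooted at Distinct both resolve to the same Filter, while any other root (for example a bare table scan) is rejected explicitly instead of being matched by accident.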
I would like to see something with explicit pattern matching to make sure we are only supporting intended cases. Here is my attempt:
fn optimize_exists(
    query_info: &SubqueryInfo,
    outer_input: &LogicalPlan,
    outer_other_exprs: &[Expr],
) -> datafusion_common::Result<LogicalPlan> {
    let subqry_filter = match query_info.query.subquery.as_ref() {
        LogicalPlan::Distinct(subqry_distinct) => match subqry_distinct.input.as_ref() {
            LogicalPlan::Projection(subqry_proj) => Filter::try_from_plan(&*subqry_proj.input),
            _ => Err(DataFusionError::NotImplemented("todo: error message".to_string())),
        },
        LogicalPlan::Projection(subqry_proj) => Filter::try_from_plan(&*subqry_proj.input),
        _ => Err(DataFusionError::NotImplemented("todo: error message".to_string())),
    }.map_err(|e| context!("cannot optimize non-correlated subquery", e))?;
Done, thanks for your advice, @andygrove.
The labeler CI failure is unrelated: #3743
LGTM. Thanks for making those changes @b41sh
Benchmark runs are scheduled for baseline = de9c7c5 and contender = 1e1de82. 1e1de82 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Closes #3724
Rationale for this change
What changes are included in this PR?
If the plan is Distinct, get the Filter from Projection
Are there any user-facing changes?