-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add one more projection to recover output schema #4733
Conversation
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Ok(()) | ||
} | ||
|
||
#[test] | ||
fn redundant_project_fields() { | ||
let table_scan = test_table_scan().unwrap(); | ||
let affected_id: BTreeSet<Identifier> = | ||
["c+a".to_string(), "d+a".to_string()].into_iter().collect(); | ||
let expr_set = [ | ||
["c+a".to_string(), "b+a".to_string()].into_iter().collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some notes for this and the next test case's changes:
The mock table scan doesn't have a column named "d
", so I changed it to "b
". And "a", "b" are ambiguous when join, thus I add qualifiers to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @waynexia
cc @liukun4515 and @jackwener
@@ -143,10 +143,15 @@ impl OptimizerRule for CommonSubexprEliminate { | |||
)?; | |||
|
|||
if let Some(predicate) = pop_expr(&mut new_expr)?.pop() { | |||
Ok(Some(LogicalPlan::Filter(Filter::try_new( | |||
let filter = LogicalPlan::Filter(Filter::try_new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 for using try_new
if sort.schema() == &input_schema { | ||
Ok(Some(sort)) | ||
} else { | ||
Ok(Some(build_recover_project_plan(&input_schema, sort))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be possible to reduce the replication as well as be slightly more "defensive" coding style if instead of build_recover_project_plan
on each path, it could be called once to ensure the schema was always the same
Something like
impl OptimizerRule for CommonSubexprEliminate {
fn try_optimize(
&self,
plan: &LogicalPlan,
config: &dyn OptimizerConfig,
) -> Result<Option<LogicalPlan>> {
let mut expr_set = ExprSet::new();
let input_schema = Arc::clone(&plan.schema());
let new_plan = match plan {
...
}?;
if new_plan.schema() == &input_schema {
Ok(Some(new_plan))
} else {
Ok(Some(build_recover_project_plan(&input_schema, new_plan)))
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! It's really a good practice 👍 I change this in 162838c
Signed-off-by: Ruihang Xia <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
}; | ||
|
||
// add an additional projection if the output schema changed. | ||
if optimized_plan.schema() != &original_schema { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Benchmark runs are scheduled for baseline = cd4fd80 and contender = 326117c. 326117c is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #4575 .
Rationale for this change
common_subexpression_eliminate
optimizer doesn't consider the situation that some plans (likeSort
orFilter
) rely on the input's schema. When introducing an intermediate projection schema that evaluates the common expressions, it also changes the "input schema" for those optimized plans. This patch will check if the output schema changed, and add another projection to recover the output schema if so.What changes are included in this PR?
Except the extra projection described above, this patch also fixes the problem that the optimizer used to build
Projection
plan manually, rather than throughProjection::try_new()
constructor. This will bypass the schema check process and may produce a wrong projection plan, whose schema doesn't match its exprs.Are these changes tested?
Yes, add a new case
filter_schema_changed
to coverAre there any user-facing changes?
no