Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

draft: avoid every rule must recursive children in optimizer #3972

Closed
wants to merge 1 commit into from

Conversation

jackwener
Copy link
Member

@jackwener jackwener commented Oct 26, 2022

Which issue does this PR close?

This PR relate #2620.

Currently, we must do optimize recursively in each rules, which is troubled.

It make it's a little hard for putting some rule into optimizer.

If use new optimizer, we just need to consider the subtree which we need to consider.

However, it's really challenging is how to deal with the rules that currently exist

Rationale for this change

What changes are included in this PR?

This PR is draft, which present how to use optimizer instead of rule to traverse plantree.

And PR include an example rule, which present the new rule.

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules labels Oct 26, 2022
@mingmwang
Copy link
Contributor

I also implemented similar things to PhysicalExprs. I think we can have a unified Trait.

https://github.com/apache/arrow-datafusion/blob/b705574cbb334fd07aab355437a56aa118e578f9/datafusion/physical-expr/src/physical_expr.rs

@mingmwang
Copy link
Contributor

@alamb

@jackwener
Copy link
Member Author

jackwener commented Oct 26, 2022

I also implemented similar things to PhysicalExprs. I think we can have a unified Trait.

https://github.com/apache/arrow-datafusion/blob/b705574cbb334fd07aab355437a56aa118e578f9/datafusion/physical-expr/src/physical_expr.rs

Yes, Essentially it's the same thing.

The main difference is that one is a plan tree and the other is an expression tree

@mingmwang
Copy link
Contributor

I also implemented similar things to PhysicalExprs. I think we can have a unified Trait.
https://github.com/apache/arrow-datafusion/blob/b705574cbb334fd07aab355437a56aa118e578f9/datafusion/physical-expr/src/physical_expr.rs

Yes, Essentially it's the same thing.

The main difference is that one is a plan tree and the other is an expression book tree

The Trait TreeNodeRewritable is quite general, It does not have any assumption on the Tree Node type.
Implementations just need to implement one method map_children(). And It also support rewriting using Visitors which can get ride of some clone() and can do more things in one tree walk(For example collect necessary context information and do rewriting, it's hybird of PostOrder and PreOder Tree traversal)

@jackwener
Copy link
Member Author

The Trait TreeNodeRewritable is quite general, It does not have any assumption on the Tree Node type. Implementations just need to implement one method map_children(). And It also support rewriting using Visitors which can get ride of some clone() and can do more things in one tree walk(For example collect necessary context information and do rewriting, it's hybird of PostOrder and PreOder Tree traversal)

This job is great. It will improve optimizer well and is overall compatible.

The trouble may lie in how to deal with the rules that currently exist. And I think we can create a new optimizer struct and gradually migrate the rules.

I find these code don't exist in master, Looking forward to your PR, @mingmwang thanks!

@jackwener
Copy link
Member Author

@alamb @andygrove @Dandandan how do you think?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like where this is headed @jackwener 👍

@@ -342,6 +342,53 @@ impl LogicalPlan {
self.accept(&mut visitor)?;
Ok(visitor.using_columns)
}

pub fn clone_with_inputs(&self, inputs: Vec<LogicalPlan>) -> Result<LogicalPlan, DataFusionError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has a non trivial amount in common with from_plan:
https://github.com/apache/arrow-datafusion/blob/4cb8ac094ee88dc0023b4cde1b39840a0a362ee0/datafusion/expr/src/utils.rs#L363

I wonder if we could make from_plan easier to find / use / more general 🤔

Comment on lines +277 to +286
fn optimize_inputs(&self, rule: &Arc<dyn OptimizerRule + Send + Sync>, plan: &LogicalPlan, optimizer_config: &mut OptimizerConfig) -> Result<LogicalPlan> {
let result: Result<Vec<LogicalPlan>> = plan
.inputs()
.into_iter()
.map(|sub_plan| self.optimize_recursively(rule, sub_plan, optimizer_config))
.collect();
let inputs = result?;
plan.clone_with_inputs(inputs)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like where this is headed @jackwener 👍

@jackwener jackwener changed the title draft: refactor optimizer to avoid every rule must recursive children. draft: avoid every rule must recursive children in optimizer and add bottom-up optimization Nov 12, 2022
@jackwener jackwener changed the title draft: avoid every rule must recursive children in optimizer and add bottom-up optimization draft: avoid every rule must recursive children in optimizer Nov 15, 2022
@jackwener jackwener closed this Dec 14, 2022
@jackwener
Copy link
Member Author

in new PR #4618

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants