Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizer: avoid every rule must recursive children in optimizer #4618

Merged
merged 2 commits into from
Dec 16, 2022

Conversation

jackwener
Copy link
Member

@jackwener jackwener commented Dec 14, 2022

Which issue does this PR close?

Closes #4613 .

Rationale for this change

Currently, we must do optimize recursively in each rules, which is troubled.

It make it's a little hard for putting some rule into optimizer.

What changes are included in this PR?

avoid every rule must recursive children in optimizer

And I modify the eliminate_limit as an example.

When apply_order is None, it's completely compatible with original behavior.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label Dec 14, 2022
Comment on lines +73 to +68
/// If a rule use default None, its should traverse recursively plan inside itself
fn apply_order(&self) -> Option<ApplyOrder> {
None
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It' compatible with origin

@jackwener
Copy link
Member Author

PTAL @alamb @Dandandan @liukun4515

@jackwener
Copy link
Member Author

In the future, I also want to utilize the trait of TreeNodeRewritable to unify to code.

@alamb
Copy link
Contributor

alamb commented Dec 14, 2022

I plan to look at this PR carefully tomorrow

@tustvold
Copy link
Contributor

We definitely shouldn't hold this up on a theoretical future change, but just FYI - #4627

@github-actions github-actions bot added the core Core DataFusion crate label Dec 14, 2022
@jackwener
Copy link
Member Author

has resolved conflict, now there are clear

@github-actions github-actions bot removed the core Core DataFusion crate label Dec 14, 2022
@mingmwang
Copy link
Contributor

@jackwener I think you can leverage the TreeNodeRewritable trait to unify the code.
And there are two downsides of the approach:

  1. There will be additional clone() call to the plan structs.
  2. When implement rules, if we want to implement the rule as a top down process, need to be very careful with transform_down() closures, it might cause stack over flow(because of no duplication invocation check).

@mingmwang
Copy link
Contributor

BTW, some complex rules might be a mixed up of bottom-up and top-down processes.

@jackwener
Copy link
Member Author

Good advice for me, thanks @mingmwang

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really great to me @jackwener -- thank you!

I have some suggestions related to adding some more docstrings but I think this PR is ready to go.

Note it will likely conflict with #4645

@@ -59,6 +59,11 @@ pub trait OptimizerRule {

/// A human readable name for this optimizer rule
fn name(&self) -> &str;

/// If a rule use default None, its should traverse recursively plan inside itself
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// If a rule use default None, its should traverse recursively plan inside itself
/// How should the rule be applied by the optimizer? See comments on [`ApplyOrder`] for details.
///
/// If a rule returns `None`, the default, its should traverse the plan recursively inside itself

@@ -274,6 +284,77 @@ impl Optimizer {
debug!("Optimizer took {} ms", start_time.elapsed().as_millis());
Ok(new_plan)
}

fn optimize_node(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What value does this function add? In other words, why not call rule.try_optimize directly at the callsite 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to add future TODO:
we can do batch optimize

for rule in rules:
    rule.try_optimize(rule)

Ok(Some(plan.with_new_inputs(new_inputs.as_slice())?))
}

pub fn optimize_recursively(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add docstrings about what this API does?

Also, I wonder if it needs to be pub or if it could be private 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometime we need use it in UT😂

Some(order) => match order {
ApplyOrder::TopDown => {
let optimize_self_opt =
self.optimize_node(rule, plan, optimizer_config)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a shot at rewriting this stuff to use Result::and_then, etc and I think how you have written it here is clearer 👍

@jackwener jackwener force-pushed the optimizer branch 3 times, most recently from 628cb5f to e0101ce Compare December 16, 2022 04:08
@jackwener
Copy link
Member Author

Thanks @alamb review!❤️ Advices is used for me.

@jackwener
Copy link
Member Author

CI fail due to clippy, fix in #4652

@jackwener
Copy link
Member Author

jackwener commented Dec 16, 2022

CI fail due to clippy, fix in #4652

has fixed it, and rebased.

///
/// Notice: **sometime** result after optimize still can be optimized, we need apply again.
///
/// Usage Example: Merge Limit (subtree pattern is: Limit-Limit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

@alamb
Copy link
Contributor

alamb commented Dec 16, 2022

This PR had conflicts, so I took the liberty of resolving them and pushing a fix as @jackwener had already done so 👍

Once CI passes I plan to merge this in again. Thanks again @jackwener

@alamb alamb merged commit 414487c into apache:master Dec 16, 2022
@ursabot
Copy link

ursabot commented Dec 16, 2022

Benchmark runs are scheduled for baseline = 920f11a and contender = 414487c. 414487c is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rules don't need to recursion inside themselves
5 participants