Refactor extract_join_keys and move the ExtractEquijoinPredicate rule #4760

Merged
merged 9 commits into from
Jan 1, 2023

Contributor

@ygf11 ygf11 commented Dec 28, 2022

Which issue does this PR close?

Closes #4759.

Rationale for this change

The other rule may depend on ExtractEquijoinPredicate.

What changes are included in this PR?

  1. Refactor extract_join_keys with split_conjunction.
  2. Move the ExtractEquijoinPredicate rule behind SubqueryFilterToJoin.

Are these changes tested?

Yes.

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label Dec 28, 2022
@github-actions github-actions bot added the core Core DataFusion crate label Dec 28, 2022
@ygf11 ygf11 marked this pull request as ready for review December 29, 2022 10:23
col("t2.a") + lit(2i32).cast_to(&DataType::UInt32, &t2_schema)?,
)
.alias("t1.a + 1 = t2.a + 2");
let plan = LogicalPlanBuilder::from(t1)
Contributor Author

The split_conjunction will unalias the expr.
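
A minimal sketch of the unaliasing behaviour mentioned in this comment. It uses a toy `Expr` enum, not DataFusion's real `Expr`, and the helpers `collect` and `col` are invented for illustration; only the idea (splitting on AND while looking through aliases) comes from the comment.

```rust
// Toy stand-in for a logical expression; NOT DataFusion's `Expr` type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Alias(Box<Expr>, String),
}

/// Split `expr` into its AND-ed parts, looking through aliases on the way.
fn split_conjunction(expr: &Expr) -> Vec<&Expr> {
    let mut out = Vec::new();
    collect(expr, &mut out);
    out
}

// Hypothetical recursive helper for this sketch.
fn collect<'a>(expr: &'a Expr, out: &mut Vec<&'a Expr>) {
    match expr {
        // A conjunction contributes the parts of both sides.
        Expr::And(left, right) => {
            collect(left, out);
            collect(right, out);
        }
        // An alias is transparent: only the wrapped expression is kept.
        Expr::Alias(inner, _) => collect(inner, out),
        // Anything else is a single conjunct.
        other => out.push(other),
    }
}

// Hypothetical convenience constructor for a column reference.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}
```

In this model, splitting an aliased `t1.a = t2.a` yields the bare equality, which is why an expected plan that used to show the alias would change.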

" CoalesceBatchesExec: target_batch_size=4096",
" RepartitionExec: partitioning=Hash([Column { name: \"t1.t1_id + Int64(12)\", index: 2 }], 2)",
" ProjectionExec: expr=[t1_id@0 as t1_id, t1_name@1 as t1_name, t1_id@0 + CAST(12 AS UInt32) as t1.t1_id + Int64(12)]",
" RepartitionExec: partitioning=Hash([Column { name: \"t1.t1_id + UInt32(12)\", index: 2 }], 2)",
Contributor Author

After we move the ExtractEquijoinPredicate rule, the optimizer will:

  1. Simplify the expression.
  2. Extract join keys from the join filter.

Since ExtractEquijoinPredicate can unalias the filter, the result is now the same as #4755.
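
A small sketch of why the order of the two steps can matter. Everything here is a toy model: the `Join` struct, the `Rule` type alias, and the `NOT (a != b)` rewrite are assumptions made for illustration, not DataFusion's actual rules or plan types.

```rust
// Toy expression and join plan; NOT DataFusion's types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Eq(Box<Expr>, Box<Expr>),
    NotEq(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
struct Join {
    on: Vec<(Expr, Expr)>, // equijoin keys
    filter: Option<Expr>,  // remaining join filter
}

// Hypothetical rule signature for this sketch.
type Rule = fn(Join) -> Join;

/// Toy simplification: rewrite `NOT (a != b)` to `a = b`.
fn simplify(mut join: Join) -> Join {
    join.filter = join.filter.map(|f| match f {
        Expr::Not(inner) => match *inner {
            Expr::NotEq(l, r) => Expr::Eq(l, r),
            other => Expr::Not(Box::new(other)),
        },
        other => other,
    });
    join
}

/// Toy extraction: move a top-level equality from the filter into `on`.
fn extract_equijoin(mut join: Join) -> Join {
    match join.filter.take() {
        Some(Expr::Eq(l, r)) => join.on.push((*l, *r)),
        other => join.filter = other,
    }
    join
}

/// Apply the rules once, in the order given.
fn optimize(join: Join, rules: &[Rule]) -> Join {
    rules.iter().fold(join, |plan, rule| rule(plan))
}

// Hypothetical convenience constructor for a column reference.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}
```

With the filter `NOT (t1.a != t2.a)`, running `simplify` before `extract_equijoin` moves the equality into `on`; in the opposite order the single pass leaves `on` empty and the equality stays in the filter.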

@github-actions github-actions bot removed the core Core DataFusion crate label Dec 30, 2022
Arc::new(SimplifyExpressions::new()),
Arc::new(UnwrapCastInComparison::new()),
Arc::new(DecorrelateWhereExists::new()),
Arc::new(DecorrelateWhereIn::new()),
Arc::new(ScalarSubqueryToJoin::new()),
Arc::new(ExtractEquijoinPredicate::new()),
Contributor

👍 related comments from #4711 (comment)

Contributor

@alamb alamb left a comment

LGTM -- thank you @ygf11

use crate::{OptimizerConfig, OptimizerRule};
use datafusion_common::DFSchema;
use datafusion_common::Result;
use datafusion_expr::utils::{can_hash, find_valid_equijoin_key_pair};
use datafusion_expr::{BinaryExpr, Expr, ExprSchemable, Join, LogicalPlan, Operator};
use std::sync::Arc;

// equijoin predicate
type EquijoinPredicate = (Expr, Expr);
Contributor

❤️

match &expr {
Expr::BinaryExpr(BinaryExpr { left, op, right }) => match op {
Operator::Eq => {
) -> Result<(Vec<EquijoinPredicate>, Option<Expr>)> {
Contributor

this is a much nicer interface 👍 for being self documenting
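
To illustrate the shape of that return type, here is a hedged sketch of a function that splits a join filter into equijoin pairs plus an optional residual filter. The `Expr` enum, the function name `split_join_filter`, and the helpers `tables`, `only_from`, and `col` are all hypothetical; the real code works on DataFusion's `Expr` and schemas and returns a `Result`.

```rust
// Toy expression type; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { table: String, name: String },
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

// (left-side key, right-side key), mirroring the alias in the diff.
type EquijoinPredicate = (Expr, Expr);

// Collect the table name of every column referenced in `expr`.
fn tables<'a>(expr: &'a Expr, out: &mut Vec<&'a str>) {
    match expr {
        Expr::Column { table, .. } => out.push(table.as_str()),
        Expr::Literal(_) => {}
        Expr::Eq(l, r) | Expr::Gt(l, r) | Expr::And(l, r) => {
            tables(l, out);
            tables(r, out);
        }
    }
}

// True if `expr` references at least one column and all are from `table`.
fn only_from(expr: &Expr, table: &str) -> bool {
    let mut seen = Vec::new();
    tables(expr, &mut seen);
    !seen.is_empty() && seen.iter().all(|t| *t == table)
}

// Flatten nested ANDs into a list of conjuncts.
fn split_conjunction(expr: Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::And(l, r) => {
            split_conjunction(*l, out);
            split_conjunction(*r, out);
        }
        other => out.push(other),
    }
}

/// Hypothetical helper: equalities between the two sides become join keys
/// (normalised to left/right order); everything else is AND-ed back
/// together as the residual filter.
fn split_join_filter(
    filter: Expr,
    left: &str,
    right: &str,
) -> (Vec<EquijoinPredicate>, Option<Expr>) {
    let mut conjuncts = Vec::new();
    split_conjunction(filter, &mut conjuncts);

    let mut keys = Vec::new();
    let mut residual: Option<Expr> = None;
    for conjunct in conjuncts {
        match conjunct {
            Expr::Eq(l, r) if only_from(&l, left) && only_from(&r, right) => {
                keys.push((*l, *r))
            }
            Expr::Eq(l, r) if only_from(&l, right) && only_from(&r, left) => {
                keys.push((*r, *l))
            }
            other => {
                residual = Some(match residual.take() {
                    Some(prev) => Expr::And(Box::new(prev), Box::new(other)),
                    None => other,
                });
            }
        }
    }
    (keys, residual)
}

// Hypothetical convenience constructor for a qualified column.
fn col(table: &str, name: &str) -> Expr {
    Expr::Column { table: table.to_string(), name: name.to_string() }
}
```

Returning the keys and the leftover filter as separate values means a caller does not have to re-inspect the predicate to find out which part became a join key.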

@liukun4515
Contributor

I will take a careful look at this PR tomorrow.

Member

@jackwener jackwener left a comment

This looks like a nice job to me.

Comment on lines +361 to +362
#[test]
fn join_with_alias_filter() -> Result<()> {
Member

👍

Member

BTW, I recommend adding an integration test that shows the plan after all rules have optimized it.

Contributor Author

@ygf11 ygf11 Jan 1, 2023

Since we can't create a join whose condition is an alias in SQL, I added an integration test using the DataFrame API.

@github-actions github-actions bot added the core Core DataFusion crate label Jan 1, 2023
@ygf11 ygf11 changed the title from "Refactor extract_ join_keys and move the ExtractEquijoinPredicate rule" to "Refactor extract_join_keys and move the ExtractEquijoinPredicate rule" Jan 1, 2023
@alamb
Contributor

alamb commented Jan 1, 2023

Thanks @ygf11 @jackwener and @liukun4515

@alamb alamb merged commit 93052cd into apache:master Jan 1, 2023
@ursabot

ursabot commented Jan 1, 2023

Benchmark runs are scheduled for baseline = 0d6d371 and contender = 93052cd. 93052cd is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@ygf11 ygf11 deleted the refactor-extract-join-keys branch January 2, 2023 01:36
Labels
core Core DataFusion crate optimizer Optimizer rules
Development

Successfully merging this pull request may close these issues.

Move the ExtractEquijoinPredicate behind the SubqueryFilterToJoin
5 participants