-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor extract_join_keys and move the ExtractEquijoinPredicate rule #4760
Conversation
col("t2.a") + lit(2i32).cast_to(&DataType::UInt32, &t2_schema)?, | ||
) | ||
.alias("t1.a + 1 = t2.a + 2"); | ||
let plan = LogicalPlanBuilder::from(t1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The split_conjunction
will unalias the expr.
datafusion/core/tests/sql/joins.rs
Outdated
" CoalesceBatchesExec: target_batch_size=4096", | ||
" RepartitionExec: partitioning=Hash([Column { name: \"t1.t1_id + Int64(12)\", index: 2 }], 2)", | ||
" ProjectionExec: expr=[t1_id@0 as t1_id, t1_name@1 as t1_name, t1_id@0 + CAST(12 AS UInt32) as t1.t1_id + Int64(12)]", | ||
" RepartitionExec: partitioning=Hash([Column { name: \"t1.t1_id + UInt32(12)\", index: 2 }], 2)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After we move the ExtractEquijoinPredicate
rule, the optimizer will
Simplify
the expression.- Extract join keys from join filter.
Since the ExtractEquijoinPredicate
can unalias the filter, the result is the same as #4755 now.
Arc::new(SimplifyExpressions::new()), | ||
Arc::new(UnwrapCastInComparison::new()), | ||
Arc::new(DecorrelateWhereExists::new()), | ||
Arc::new(DecorrelateWhereIn::new()), | ||
Arc::new(ScalarSubqueryToJoin::new()), | ||
Arc::new(ExtractEquijoinPredicate::new()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 related comments from #4711 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- thank you @ygf11
use crate::{OptimizerConfig, OptimizerRule}; | ||
use datafusion_common::DFSchema; | ||
use datafusion_common::Result; | ||
use datafusion_expr::utils::{can_hash, find_valid_equijoin_key_pair}; | ||
use datafusion_expr::{BinaryExpr, Expr, ExprSchemable, Join, LogicalPlan, Operator}; | ||
use std::sync::Arc; | ||
|
||
// equijoin predicate | ||
type EquijoinPredicate = (Expr, Expr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
match &expr { | ||
Expr::BinaryExpr(BinaryExpr { left, op, right }) => match op { | ||
Operator::Eq => { | ||
) -> Result<(Vec<EquijoinPredicate>, Option<Expr>)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a much nicer interface 👍 for being self documenting
I will take a look this pr carefully tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nice job to me.
#[test] | ||
fn join_with_alias_filter() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I recommend add a integration-test to show the plan after all rule optimize it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we can't create a join whose condition is an alias a sql, I add a integration-test with dataframe api.
Thanks @ygf11 @jackwener and @liukun4515 |
Benchmark runs are scheduled for baseline = 0d6d371 and contender = 93052cd. 93052cd is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #4759.
Rationale for this change
The other rule may depend on
ExtractEquijoinPredicate
.What changes are included in this PR?
split_conjunction
.ExtractEquijoinPredicate
rule behindSubqueryFilterToJoin
.Are these changes tested?
Yes.
Are there any user-facing changes?