-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor: Change equijoin keys from column to expression in logical join #4602
Conversation
let expected = if repartition_joins { | ||
vec![ | ||
"ProjectionExec: expr=[t1_id@0 as t1_id, t2_id@2 as t2_id, t1_name@1 as t1_name]", | ||
" ProjectionExec: expr=[t1_id@0 as t1_id, t1_name@1 as t1_name, t2_id@3 as t2_id]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems merging projection does not consider the ordering also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I think that could be improved as a follow on PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will work on it.
" CoalesceBatchesExec: target_batch_size=4096", | ||
" RepartitionExec: partitioning=Hash([Column { name: \"t1.t1_id + Int64(11)\", index: 3 }], 2)", | ||
" ProjectionExec: expr=[t1_id@0 as t1_id, t1_name@1 as t1_name, t1_int@2 as t1_int, CAST(t1_id@0 AS Int64) + 11 as t1.t1_id + Int64(11)]", | ||
" RepartitionExec: partitioning=RoundRobinBatch(2)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name in join and projection matches now, both are t1.t1_id + Int64(11)
.
cc @liukun4515
" TableScan: t2 projection=[t2_id, t2_name, t2_int] [t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N]", | ||
" Inner Join: CAST(t1.t1_id AS Int64) + Int64(12) = CAST(t2.t2_id AS Int64) + Int64(1) [t1_id:UInt32;N, t1_name:Utf8;N, t1_int:UInt32;N, t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N]", | ||
" TableScan: t1 projection=[t1_id, t1_name, t1_int] [t1_id:UInt32;N, t1_name:Utf8;N, t1_int:UInt32;N]", | ||
" TableScan: t2 projection=[t2_id, t2_name, t2_int] [t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logical plan does not need any projections now.
@alamb @mingmwang @jackwener @liukun4515, please take a look, thanks. |
I plan to review this carefully tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really cool PR. Thank you @ygf11
I went through the code and all the test changes carefully and I think this PR is 👨🍳 👌 very nice
I think it is ready to go, though plan to leave it open for another day or two in case @jackwener or anyone else would like a chance to review it.
&self, | ||
right: &LogicalPlan, | ||
join_type: JoinType, | ||
join_keys: (Vec<impl Into<Expr>>, Vec<impl Into<Expr>>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe calling this parameter equi_exprs
would better reflect what it is (exprs, not column keys) 🤔
@@ -1646,8 +1650,8 @@ pub struct Join { | |||
pub left: Arc<LogicalPlan>, | |||
/// Right input | |||
pub right: Arc<LogicalPlan>, | |||
/// Equijoin clause expressed as pairs of (left, right) join columns | |||
pub on: Vec<(Column, Column)>, | |||
/// Equijoin clause expressed as pairs of (left, right) join expressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
" Projection: t2.a, t2.b, t2.c, t2.a * UInt32(2) [a:UInt32, b:UInt32, c:UInt32, t2.a * UInt32(2):UInt32]", | ||
" TableScan: t2 [a:UInt32, b:UInt32, c:UInt32]", | ||
]; | ||
"Filter: t2.c < UInt32(20) [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is quite cool to see the expressions directly in the Join without needing a projection to compute them
left_keys, | ||
)?, | ||
JoinConstraint::Using => { | ||
// The equijoin keys in using-join must be column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
\n TableScan: person\ | ||
\n Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2)\ | ||
\n TableScan: orders"; | ||
\n Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this plan is certainly much nicer -- I don't understand where all the other columns used to come from but this is 👍
let expected = if repartition_joins { | ||
vec![ | ||
"ProjectionExec: expr=[t1_id@0 as t1_id, t2_id@2 as t2_id, t1_name@1 as t1_name]", | ||
" ProjectionExec: expr=[t1_id@0 as t1_id, t1_name@1 as t1_name, t2_id@3 as t2_id]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I think that could be improved as a follow on PR
Because this PR has been outstanding for some time, and it had a conflict I took the liberty to merge from master and resolve the conflicts I am about out of time today but I plan to merge this PR first thing tomorrow. Thanks again @ygf11 -- really great stuff. |
Co-authored-by: Andrew Lamb <[email protected]>
Thanks @alamb. I resolve the remaining conflict, and ci success now. |
Thanks again @ygf11 -- this is great work |
Benchmark runs are scheduled for baseline = 42b3a6c and contender = 8d36529. 8d36529 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #4389.
Rationale for this change
It can make the display representation of logical join more clean.
What changes are included in this PR?
parse_join
.EliminateCrossJoin
,FilterNullJoinKeys
,PushDownFilter
,SubqueryFilterToJoin
optimization rules.JoinNode
inproto
module.Are these changes tested?
Yes
Are there any user-facing changes?
Yes, the
JoinNode
inproto
module is also changed.