-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Linearize binary expressions to reduce proto tree complexity #4115
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Thanks @isidentical. I'd like to see us implement this in the logical plan as well, eventually, as mentioned in #1434.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @isidentical -- this looks like a great step forward!
a1d6b02
to
4213d8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great -- thank you @isidentical
let or_chain = (0..n) | ||
.fold(basic_expr.clone(), |expr, _| expr.or(basic_expr.clone())); | ||
// (a < 5) OR (a < 5) AND (a < 5) OR (a < 5) AND (a < 5) AND (a < 5) OR ... | ||
let expr = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
let expr_ordered = col("A").and(col("B")).and(col("C")).and(col("D")); | ||
assert_eq!(expr_ordered, roundtrip_expr(&expr_ordered)); | ||
|
||
// Ensure that no other variation becomes equal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
Benchmark runs are scheduled for baseline = 3892a1f and contender = 6b71294. 6b71294 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Seems it brings a regression issue which causes stack overflow issue for "sum case when" |
Thanks @yahoNanJing for the report! Would you mind sharing the logs (or linking them) (I don't see a new issue relevant to this and the CI on main is passing so not sure when it fails 🤔) |
Hi @isidentical, the issue may not relate to this PR. I made a mistake in our testing environment. Really sorry for bringing the confusing info. |
Ah, no problem at all. Let me know if it resurfaces @yahoNanJing! |
Which issue does this PR close?
Closes #4066.
Rationale for this change
This PR tries to represent chained binary expressions (like
a AND b AND C
ora + b + c
) in a linearized manner (so instead of((a, AND, b), AND, c)
, they are represented as([a, b, c], AND)
) which reduces the complexity of protobuf trees and help serialize some of the complex expressions that weren't possible to serialize before.What changes are included in this PR?
New representation of the binary expressions in serialized logical plans.
Are there any user-facing changes?
This PR changes the structure in the logical plan, so not sure if this qualifies as an API change. If it might be better to actually do it without removing the existing fields from the protobuf declaration of
BinaryExpr
, we can also add a new field to the current form and represent all the extra operands there (but I think this one is much more straightforward).