Orthogonalize distribution and sort enforcement rules into `EnforceDistribution` and `EnforceSorting` #4839
…-ai/arrow-datafusion into feature/unify_sort_rules # Conflicts: # datafusion/core/src/execution/context.rs
cc @mingmwang
Thank you @mustafasrepo -- I plan to review this PR tomorrow
Thank you @mustafasrepo -- this is a very nice change 👌
I agree with @jackwener that it would be good to get @mingmwang's opinion on this change if they have time. However, given that all the existing tests pass, I view this as a (nice) refactoring exercise that is likely to be uncontroversial.
cc @liukun4515
let optimized = optimizer.optimize($PLAN, &config)?;
let optimizer = EnforceSorting {};
It is slightly confusing that the tests in `EnforceDistribution` also rely on `EnforceSorting`, though I understand the reason for this given they both started in the same pass.
Maybe we can add a comment like:
// These tests also ensure `EnforceSorting` because they were written prior to the
// separation of `EnforceSorting` and `EnforceDistribution`
Or something like that to give future readers a clue about the rationale for this seemingly strange choice
I agree -- just sent a commit with a `NOTE` and a `TODO`.
@@ -287,6 +287,10 @@ impl ExecutionPlan for LocalLimitExec {
        self.input.output_ordering()
    }

    fn maintains_input_order(&self) -> bool {
👍
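For readers unfamiliar with the hunk above, here is a minimal sketch of why a limit operator can report that it maintains its input's order. It is a toy model, not DataFusion's actual `ExecutionPlan` trait: `ToyExec`, `ToySortedScan`, `ToyLocalLimit`, and `apply` are hypothetical names introduced only for illustration.

```rust
// Toy model only: these types are hypothetical and are NOT DataFusion APIs.

/// Simplified stand-in for a physical operator.
trait ToyExec {
    /// Columns the output is sorted by, if any.
    fn output_ordering(&self) -> Option<&[String]>;

    /// Whether rows leave this operator in the same relative order
    /// in which they arrived. Conservative default: no.
    fn maintains_input_order(&self) -> bool {
        false
    }
}

/// A leaf that produces rows already sorted by `sort_columns`.
struct ToySortedScan {
    sort_columns: Vec<String>,
}

impl ToyExec for ToySortedScan {
    fn output_ordering(&self) -> Option<&[String]> {
        Some(&self.sort_columns)
    }
}

/// A limit that keeps only the first `fetch` rows of its input.
struct ToyLocalLimit {
    input: Box<dyn ToyExec>,
    fetch: usize,
}

impl ToyLocalLimit {
    /// Truncation never reorders rows; it only drops the tail.
    fn apply<T>(&self, rows: Vec<T>) -> Vec<T> {
        rows.into_iter().take(self.fetch).collect()
    }
}

impl ToyExec for ToyLocalLimit {
    /// The limit's ordering is whatever its input guarantees.
    fn output_ordering(&self) -> Option<&[String]> {
        self.input.output_ordering()
    }

    /// Dropping trailing rows preserves relative order, so a sort
    /// enforcer may safely look through this operator.
    fn maintains_input_order(&self) -> bool {
        true
    }
}
```

In this sketch, a rule that walks the plan can consult `maintains_input_order` to decide whether an ordering established below the limit still holds above it.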
@mustafasrepo @alamb For the original
@@ -832,7 +828,7 @@ fn new_join_conditions(
/// Within this function, it checks whether we need to add additional plan operators
/// of data exchanging and data ordering to satisfy the required distribution and ordering.
/// And we should avoid to manually add plan operators of data exchanging and data ordering in other places
Please modify the comment here and remove the "data ordering".
The change LGTM except for a small comment change.
Thanks for the review @mingmwang, comment is updated.
Thanks everyone
Benchmark runs are scheduled for baseline = c5e2594 and contender = ceff6cb. ceff6cb is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
N/A
Rationale for this change
During the review of #4691, one of the key findings was a small separation-of-concerns issue: both `BasicEnforcement` and `OptimizeSorts` were dealing with local sorting, and there was some overlap in functionality.
This PR pays this technical debt: enforcers of distribution and sorting requirements will henceforth be two completely orthogonal rules (`EnforceDistribution` and `EnforceSorting`). Note that one can get the same result as the old `BasicEnforcement` rule by applying these two rules in succession.
The new `EnforceSorting` doesn't just enforce sorts by naively adding `SortExec`s; it will smartly add OR remove them as it enforces the ordering requirements. This will hopefully help with rule reuse and ease reasoning (we will not lose optimality by liberally using `EnforceSorting`).
What changes are included in this PR?
Local sort enforcement AND optimization are handled with a single rule. The current rule can be called multiple times without a downside in terms of the final physical plan.
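To make the "add OR remove" behavior and the "can be called multiple times" property concrete, here is a minimal sketch on a toy plan tree. It is an illustration under simplifying assumptions, not the rule implemented in this PR: `Plan`, its variants, and `enforce_sorting` are hypothetical names, and orderings are reduced to a single column name.

```rust
// Toy model only: none of these names are DataFusion APIs.

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf that emits rows, possibly already ordered by one column.
    Scan { ordered_by: Option<String> },
    /// Sorts its input by one column.
    Sort { by: String, input: Box<Plan> },
    /// Order-preserving operator that requires input ordered by `by`.
    RequiresOrder { by: String, input: Box<Plan> },
}

impl Plan {
    /// The ordering this node guarantees on its output, if any.
    fn output_ordering(&self) -> Option<&str> {
        match self {
            Plan::Scan { ordered_by } => ordered_by.as_deref(),
            Plan::Sort { by, .. } => Some(by.as_str()),
            Plan::RequiresOrder { input, .. } => input.output_ordering(),
        }
    }
}

/// Bottom-up pass that removes sorts whose input is already ordered
/// and adds sorts where a required ordering is not yet satisfied.
fn enforce_sorting(plan: Plan) -> Plan {
    match plan {
        scan @ Plan::Scan { .. } => scan,
        Plan::Sort { by, input } => {
            let input = enforce_sorting(*input);
            if input.output_ordering() == Some(by.as_str()) {
                // Redundant sort: the input already has this ordering.
                input
            } else {
                Plan::Sort { by, input: Box::new(input) }
            }
        }
        Plan::RequiresOrder { by, input } => {
            let input = enforce_sorting(*input);
            let input = if input.output_ordering() == Some(by.as_str()) {
                input
            } else {
                // Requirement not met: add exactly one sort.
                Plan::Sort { by: by.clone(), input: Box::new(input) }
            };
            Plan::RequiresOrder { by, input: Box::new(input) }
        }
    }
}
```

Because the pass only adds a sort when the requirement is unmet and only removes one when it is redundant, running it a second time on its own output leaves the toy plan unchanged, which mirrors the idempotence claimed above.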
Are these changes tested?
Existing tests check for plan correctness.
Are there any user-facing changes?
No.
Future Work
Some of the `EnforceDistribution` tests actually test the full `EnforceDistribution` + `EnforceSorting` cascade. It would be a good idea to orthogonalize those tests too.
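As a rough sketch of what orthogonalized tests could look like, the toy model below contrasts checking one rule in isolation with checking the two-rule cascade. Everything here is hypothetical (`ToyPlan`, `ToyRule`, `ToyEnforceDistribution`, `ToyEnforceSorting`, `run_cascade`); the real rules operate on physical plan trees, not on a flat list of operator names.

```rust
// Toy model only: a plan is a flat, bottom-up list of operator names.
// None of these names are DataFusion APIs.

#[derive(Debug, Clone, PartialEq)]
struct ToyPlan {
    ops: Vec<&'static str>,
    wants_hash_partitioning: bool,
    wants_sorted_output: bool,
}

trait ToyRule {
    fn optimize(&self, plan: ToyPlan) -> ToyPlan;
}

struct ToyEnforceDistribution;
struct ToyEnforceSorting;

impl ToyRule for ToyEnforceDistribution {
    /// Only touches data exchange; knows nothing about sorting.
    fn optimize(&self, mut plan: ToyPlan) -> ToyPlan {
        if plan.wants_hash_partitioning && !plan.ops.contains(&"RepartitionExec") {
            plan.ops.push("RepartitionExec");
        }
        plan
    }
}

impl ToyRule for ToyEnforceSorting {
    /// Only touches ordering; knows nothing about distribution.
    fn optimize(&self, mut plan: ToyPlan) -> ToyPlan {
        if plan.wants_sorted_output && !plan.ops.contains(&"SortExec") {
            plan.ops.push("SortExec");
        }
        plan
    }
}

/// Applies the rules in succession, feeding each output to the next rule.
fn run_cascade(rules: &[&dyn ToyRule], plan: ToyPlan) -> ToyPlan {
    rules.iter().fold(plan, |p, rule| rule.optimize(p))
}
```

An isolated test would call `ToyEnforceDistribution.optimize(...)` alone and assert that no sort appears, while a cascade test would go through `run_cascade` with both rules and assert on the combined result.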