Orthogonalize distribution and sort enforcement rules into `EnforceDistribution` and `EnforceSorting` #4839

mustafasrepo · 2023-01-07T06:17:01Z

Which issue does this PR close?

N/A

Rationale for this change

During the review of #4691, one of the key findings was that there was a small separation of concern issue: Both BasicEnforcement and OptimizeSorts were dealing with local sorting and there was some overlap in functionality.

This PR pays this technical debt: Enforcers of distribution and sorting requirements will henceforth be two completely orthogonal rules (EnforceDistribution and EnforceSorting). Note that one can get the same result with the old BasicEnforcement rule by applying these two rules in succession.

The new EnforceSorting doesn't just enforce sorts by naively adding SortExecs, it will smartly add OR remove them as it enforces the ordering requirements. This will hopefully help with rule reuse and ease reasoning (we will not lose optimality by liberally using EnforceSorting).

What changes are included in this PR?

Local sort enforcement AND optimization is handled with a single rule. Current rule can be called multiple times without a downside in terms of the final physical plan.

Are these changes tested?

Existing tests check for plan correctness.

Are there any user-facing changes?

No.

Future Work

Some of the EnforceDistribution tests actually test the full EnforceDistribution + EnforceSorting cascade. It would be a good idea to orthogonalize those tests too.

…-ai/arrow-datafusion into feature/unify_sort_rules # Conflicts: # datafusion/core/src/execution/context.rs

jackwener · 2023-01-07T08:02:35Z

cc @mingmwang

alamb · 2023-01-07T14:59:43Z

Thank you @mustafasrepo -- I plan to review this PR tomorrow

alamb

Thank you @mustafasrepo -- this is a very nice change 👌

I agree with @jackwener it would be good to get @mingmwang 's opinion on this change if they have time. However, given all the existing tests pass I view this as a (nice) refactoring exercise that is likely to be uncontroversial

cc @liukun4515

alamb · 2023-01-08T11:11:44Z

datafusion/core/src/physical_optimizer/dist_enforcement.rs

            let optimized = optimizer.optimize($PLAN, &config)?;
+            let optimizer = EnforceSorting {};


it is slightly confusing that the tests in EnforceDistribution also rely on EnforceSorting though I understand the reason for this given they both started in the same pass

Maybe we can add a comment like:

// These tests also ensure `EnforceSorting` because they were written prior to the // separation of `EnforceSorting` and `EnfoceDistribution`

Or something like that to give future readers a clue about the rationale for this seemingly strange choice

I agree -- just sent a commit with a NOTE and a TODO.

alamb · 2023-01-08T11:12:38Z

datafusion/core/src/physical_plan/limit.rs

@@ -287,6 +287,10 @@ impl ExecutionPlan for LocalLimitExec {
        self.input.output_ordering()
    }

+    fn maintains_input_order(&self) -> bool {


…ointly

mingmwang · 2023-01-09T02:50:43Z

@mustafasrepo @alamb
Generally, I am OK with this change. I will take a closer look at the PR today.

For the original OptimizeSorts rule, I just have a feeling that the implementation was too complex and I think maybe there are ways to simply it. And I also have a feeling that the rule should be implemented in a top-down approach. I will try to do some test also.
Adding SortExec and RepartitionExec is applied in a bottom-up approach(this can also be implemented in a top-down way, it is much more complex but more powerful, I plan to implement this in future); Removing SortExec can be applied in a top-down approach because we want to remove the unnecessary Sorts which are usually in the high level of the plan tree.

mingmwang · 2023-01-09T03:13:23Z

datafusion/core/src/physical_optimizer/dist_enforcement.rs

@@ -832,7 +828,7 @@ fn new_join_conditions(
 /// Within this function, it checks whether we need to add additional plan operators
 /// of data exchanging and data ordering to satisfy the required distribution and ordering.
 /// And we should avoid to manually add plan operators of data exchanging and data ordering in other places


Please modify the comment here and remove the "data ordering".

mingmwang · 2023-01-09T03:19:19Z

The change LGTM except for a small comment change.

ozankabak · 2023-01-09T03:21:00Z

Thanks for the review @mingmwang, comment is updated.

alamb · 2023-01-09T22:10:49Z

Thanks everyone

ursabot · 2023-01-09T22:14:18Z

Benchmark runs are scheduled for baseline = c5e2594 and contender = ceff6cb. ceff6cb is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

mustafasrepo and others added 3 commits January 3, 2023 10:01

Separate sort rule

c134f53

Merge branch 'feature/unify_sort_rules' of https://github.com/synnada…

fd21a47

…-ai/arrow-datafusion into feature/unify_sort_rules # Conflicts: # datafusion/core/src/execution/context.rs

Migrate to clearer file name, tidy up comments

2260133

github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions labels Jan 7, 2023

alamb changed the title ~~Orthogonalize distribution and sort enforcement rules~~ Orthogonalize distribution and sort enforcement rules into EnforceDistribution and EnforceSorting Jan 8, 2023

alamb approved these changes Jan 8, 2023

View reviewed changes

Add a note about tests verifying EnforceDistribution/EnforceSorting j…

4259e54

…ointly

mingmwang reviewed Jan 9, 2023

View reviewed changes

Address review, fix the stale comment

849f7c6

alamb merged commit ceff6cb into apache:master Jan 9, 2023

mustafasrepo deleted the feature/unify_sort_rules branch January 10, 2023 11:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Orthogonalize distribution and sort enforcement rules into `EnforceDistribution` and `EnforceSorting` #4839

Orthogonalize distribution and sort enforcement rules into `EnforceDistribution` and `EnforceSorting` #4839

mustafasrepo commented Jan 7, 2023

jackwener commented Jan 7, 2023

alamb commented Jan 7, 2023

alamb left a comment

alamb Jan 8, 2023

ozankabak Jan 8, 2023

alamb Jan 8, 2023

mingmwang commented Jan 9, 2023 •

edited

Loading

mingmwang Jan 9, 2023

mingmwang commented Jan 9, 2023

ozankabak commented Jan 9, 2023

alamb commented Jan 9, 2023

ursabot commented Jan 9, 2023

		let optimized = optimizer.optimize($PLAN, &config)?;
		let optimizer = EnforceSorting {};

Orthogonalize distribution and sort enforcement rules into EnforceDistribution and EnforceSorting #4839

Orthogonalize distribution and sort enforcement rules into EnforceDistribution and EnforceSorting #4839

Conversation

mustafasrepo commented Jan 7, 2023

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Future Work

jackwener commented Jan 7, 2023

alamb commented Jan 7, 2023

alamb left a comment

Choose a reason for hiding this comment

alamb Jan 8, 2023

Choose a reason for hiding this comment

ozankabak Jan 8, 2023

Choose a reason for hiding this comment

alamb Jan 8, 2023

Choose a reason for hiding this comment

mingmwang commented Jan 9, 2023 • edited Loading

mingmwang Jan 9, 2023

Choose a reason for hiding this comment

mingmwang commented Jan 9, 2023

ozankabak commented Jan 9, 2023

alamb commented Jan 9, 2023

ursabot commented Jan 9, 2023

Orthogonalize distribution and sort enforcement rules into `EnforceDistribution` and `EnforceSorting` #4839

Orthogonalize distribution and sort enforcement rules into `EnforceDistribution` and `EnforceSorting` #4839

mingmwang commented Jan 9, 2023 •

edited

Loading