Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend Ordering Equivalence Support #6956

Merged
merged 26 commits into from
Jul 17, 2023
Merged

Extend Ordering Equivalence Support #6956

merged 26 commits into from
Jul 17, 2023

Conversation

mustafasrepo
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Currently ordering equivalence properties cannot be propagated through PhysicalPlan. However, for executors that maintain its input order, we should be able to propagate ordering equivalence properties. This PR implements ordering equivalence properties for executors that maintains ordering. For most of the executors, this is trivial (such as FilterExec).
However, for SortMergeJoinExec and HashJoinExec updating ordering equivalence involves non-trivial implementation. Reviewers can focus on these executors.

Also during tests for propagating ordering equivalence properties through executors. I found a bug in how Projection handles ordering equivalence and equivalence. This PR fixes it also.

What changes are included in this PR?

Are these changes tested?

Yes new tests are added to window.slt and joins.slt file

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jul 13, 2023
mustafasrepo and others added 4 commits July 13, 2023 19:03
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
Copy link
Contributor

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this carefully and it looks good to me.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really neat PR -- thank you @mustafasrepo and @ozankabak

I had one comment about a test that I think is worth considering otherwise I think it looks great to me

@@ -107,6 +107,10 @@ impl ExecutionPlan for CoalescePartitionsExec {
self.input.equivalence_properties()
}

fn ordering_equivalence_properties(&self) -> OrderingEquivalenceProperties {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though CoalescePartitionsExec doesn't preserve the input sort order I agree it preserves the input equivalences 👍

Copy link
Contributor Author

@mustafasrepo mustafasrepo Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, when existing ordering is invalidated (maintains_input_order is false). OrderingEquivalenceProperties shouldn't be propagated (unlike Equivalence). The reason is that OrderingEquivalence contains different alternative versions to describe ordering of the table. Once ordering is lost, all of the alternatives descriptions are invalid also. Hence this implementation should be removed. I removed this implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For posterity: There could be two meanings we can assign to ordering equivalences:

  1. Entries in the ordering equivalence set describe different, alternative descriptions of the current table ordering. This is @mustafasrepo's interpretation.
  2. Entries in the ordering equivalence set describe different, alternative sorts one can apply to this table and end up with the same ordering. This is @alamb's interpretation.

(2) is more general than (1), and it opens up possibilities for more optimizations. However, it is also more difficult to implement in all cases. One can construct examples where the invariant described by (2) is lost when one has order-destroying operators interspersed in between order-imposing operators in deep-ish plans.

We just had a nice discussion with @mustafasrepo on this, and we added generalizing from (1) to (2) to our roadmap of future improvements. Let's get (1) done in a robust way first, and in the future we will revisit this and figure out how to get to (2), and apply even more optimizations!

@@ -381,6 +376,34 @@ impl ExecutionPlan for HashJoinExec {
)
}

fn ordering_equivalence_properties(&self) -> OrderingEquivalenceProperties {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this calculation should be the same for other ExecutionPlans like CrossJoin / NestedLoopsJoin as well as the ordering equivalence is a function of the join predicates and type (INNER/OUTER, etc), not the algorithms?

I wonder if we should move this to a common location rather than just for HashJoinExec

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above reasoning, only HashJoinExec maintains_input_order under certain conditions. Hence, the only join that needs ordering_equivalence implementation is HashJoinExec. However, other joins such as CrossJoin, NestedLoopsJoin may maintain ordering under certain conditions if so ordering equivalence can be implemented for them also. I will analyze their implementation, if there are case that ordering can be maintained, I will implement their maintains_input_order and ordering_equivalence_properties methods in follow-up PRs. During that PR we can move, this implementation to a common location if there is a pattern between ordering_equivalence_properties of different joins.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense -- thank you

@mustafasrepo mustafasrepo merged commit b0d2491 into apache:main Jul 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants