-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend Ordering Equivalence Support #6956
Extend Ordering Equivalence Support #6956
Conversation
# Please enter a commit message to explain why this merge is necessary, # especially if it merges an updated upstream into a topic branch. # # Lines starting with '#' will be ignored, and an empty message aborts # the commit.
…da-ai/arrow-datafusion into feature/oeq_extend_support # Conflicts: # datafusion/core/src/physical_plan/joins/sort_merge_join.rs # datafusion/core/src/physical_plan/projection.rs
# Please enter a commit message to explain why this merge is necessary, # especially if it merges an updated upstream into a topic branch. # # Lines starting with '#' will be ignored, and an empty message aborts # the commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed this carefully and it looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really neat PR -- thank you @mustafasrepo and @ozankabak
I had one comment about a test that I think is worth considering otherwise I think it looks great to me
@@ -107,6 +107,10 @@ impl ExecutionPlan for CoalescePartitionsExec { | |||
self.input.equivalence_properties() | |||
} | |||
|
|||
fn ordering_equivalence_properties(&self) -> OrderingEquivalenceProperties { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though CoalescePartitionsExec
doesn't preserve the input sort order I agree it preserves the input equivalences 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, when existing ordering is invalidated (maintains_input_order
is false). OrderingEquivalenceProperties
shouldn't be propagated (unlike Equivalence
). The reason is that OrderingEquivalence
contains different alternative versions to describe ordering of the table. Once ordering is lost, all of the alternatives descriptions are invalid also. Hence this implementation should be removed. I removed this implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For posterity: There could be two meanings we can assign to ordering equivalences:
- Entries in the ordering equivalence set describe different, alternative descriptions of the current table ordering. This is @mustafasrepo's interpretation.
- Entries in the ordering equivalence set describe different, alternative sorts one can apply to this table and end up with the same ordering. This is @alamb's interpretation.
(2) is more general than (1), and it opens up possibilities for more optimizations. However, it is also more difficult to implement in all cases. One can construct examples where the invariant described by (2) is lost when one has order-destroying operators interspersed in between order-imposing operators in deep-ish plans.
We just had a nice discussion with @mustafasrepo on this, and we added generalizing from (1) to (2) to our roadmap of future improvements. Let's get (1) done in a robust way first, and in the future we will revisit this and figure out how to get to (2), and apply even more optimizations!
@@ -381,6 +376,34 @@ impl ExecutionPlan for HashJoinExec { | |||
) | |||
} | |||
|
|||
fn ordering_equivalence_properties(&self) -> OrderingEquivalenceProperties { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this calculation should be the same for other ExecutionPlans
like CrossJoin / NestedLoopsJoin as well as the ordering equivalence is a function of the join predicates and type (INNER/OUTER, etc), not the algorithms?
I wonder if we should move this to a common location rather than just for HashJoinExec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above reasoning, only HashJoinExec
maintains_input_order under certain conditions. Hence, the only join that needs ordering_equivalence implementation is HashJoinExec
. However, other joins such as CrossJoin
, NestedLoopsJoin
may maintain ordering under certain conditions if so ordering equivalence can be implemented for them also. I will analyze their implementation, if there are case that ordering can be maintained, I will implement their maintains_input_order
and ordering_equivalence_properties
methods in follow-up PRs. During that PR we can move, this implementation to a common location if there is a pattern between ordering_equivalence_properties
of different joins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense -- thank you
Which issue does this PR close?
Closes #.
Rationale for this change
Currently ordering equivalence properties cannot be propagated through PhysicalPlan. However, for executors that maintain its input order, we should be able to propagate ordering equivalence properties. This PR implements ordering equivalence properties for executors that maintains ordering. For most of the executors, this is trivial (such as
FilterExec
).However, for
SortMergeJoinExec
andHashJoinExec
updating ordering equivalence involves non-trivial implementation. Reviewers can focus on these executors.Also during tests for propagating ordering equivalence properties through executors. I found a bug in how Projection handles ordering equivalence and equivalence. This PR fixes it also.
What changes are included in this PR?
Are these changes tested?
Yes new tests are added to
window.slt
andjoins.slt
fileAre there any user-facing changes?