Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Window Functions Order Conservation -- Follow-up On Set Monotonicity #14813

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

berkaysynnada
Copy link
Contributor

@berkaysynnada berkaysynnada commented Feb 21, 2025

Which issue does this PR close?

Rationale for this change

#14271 had introduced set-monotonicity term for window functions. After that PR we have some false negatives for order conservation, and we don't have a fully complete test coverage for window functions in terms of set-monotonicity, partitioning type, frame type, and order of the function inputs.

What changes are included in this PR?

This PR enriches the ordering properties of WindowAggExec and BoundedWindowAggExec, and I've written a test function which covers all cases for a window function, both positive and negative cases.

Are these changes tested?

Yes

Are there any user-facing changes?

Order conservation in more cases

@berkaysynnada berkaysynnada marked this pull request as draft February 21, 2025 14:24
@github-actions github-actions bot added physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate logical-expr Logical plan and expressions labels Feb 21, 2025
@@ -222,208 +227,6 @@ async fn test_remove_unnecessary_sort5() -> Result<()> {
Ok(())
}

#[tokio::test]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coverage of these tests are preserved in test_window_partial_constant_and_set_monotonicity()

@github-actions github-actions bot removed the logical-expr Logical plan and expressions label Feb 21, 2025
" BoundedWindowAggExec: wdw=[count: Ok(Field { name: \"count\", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Range, start_bound: Preceding(NULL), end_bound: CurrentRow, is_causal: false }], mode=[Sorted]",
" SortExec: expr=[nullable_col@0 ASC, non_nullable_col@1 ASC], preserve_partitioning=[false]",
" DataSourceExec: partitions=1, partition_sizes=[0]"];
" SortExec: expr=[nullable_col@0 ASC, non_nullable_col@1 ASC], preserve_partitioning=[false]",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the reviewers, please double check here. While 2nd BoundedWindowAggExec requires [nullable_col, non_nullable_col] ordering, it is not guaranteed in the previous version. If I'm not missing something, now it is, with a newly added partial sort.

@@ -2280,3 +2086,1265 @@ async fn test_not_replaced_with_partial_sort_for_unbounded_input() -> Result<()>
assert_optimized!(expected_input, expected_no_change, physical_plan, true);
Ok(())
}

#[tokio::test]
async fn test_window_partial_constant_and_set_monotonicity() -> Result<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing this with rs-test seems hard to debug, and the relation between the arguments and initial-final plans is not very clear. So I prefer this

@@ -498,6 +697,15 @@ pub fn get_window_mode(
None
}

fn all_possible_sort_options(expr: Arc<dyn PhysicalExpr>) -> Vec<PhysicalSortExpr> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be init once maybe

@berkaysynnada berkaysynnada marked this pull request as ready for review February 23, 2025 18:11
@berkaysynnada
Copy link
Contributor Author

PTAL @ozankabak, @alamb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant