Reduce redundancy in sort_enforcement tests #4928

alamb · 2023-01-17T01:03:51Z

Which issue does this PR close?

Rationale for this change

I am working on a bug related to sort enforcement and wanted to add a new test, but the current structure made that very hard to do without a lot of copy / paste.

What changes are included in this PR?

Refactor the repetition in the sort_enforcement.rs tests into a macro.
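As a rough illustration of the idea, here is a minimal, self-contained sketch of a test macro that checks both the original and the optimized plan in one call. It is not the code from this PR: `ToyPlan`, `toy_optimize`, and this `assert_optimized!` definition are hypothetical stand-ins so the example compiles without DataFusion, whereas the real tests work with `Arc<dyn ExecutionPlan>` and the actual optimizer rule.

```rust
/// Hypothetical stand-in for a physical plan: operator names, outermost first.
#[derive(Debug, Clone, PartialEq)]
struct ToyPlan {
    operators: Vec<String>,
}

impl ToyPlan {
    fn new(operators: &[&str]) -> Self {
        Self {
            operators: operators.iter().map(|s| s.to_string()).collect(),
        }
    }

    /// Render one operator per line, indenting each level by two spaces.
    fn display_lines(&self) -> Vec<String> {
        self.operators
            .iter()
            .enumerate()
            .map(|(depth, op)| format!("{}{}", "  ".repeat(depth), op))
            .collect()
    }
}

/// Hypothetical optimizer rule: drop a `SortExec` that directly follows
/// another `SortExec`.
fn toy_optimize(plan: &ToyPlan) -> ToyPlan {
    let mut operators: Vec<String> = Vec::new();
    for op in &plan.operators {
        if op == "SortExec" && operators.last().map(|s| s.as_str()) == Some("SortExec") {
            continue;
        }
        operators.push(op.clone());
    }
    ToyPlan { operators }
}

/// Check the plan as built, then check it again after optimization, so each
/// test only has to state its plan and the two expected renderings.
macro_rules! assert_optimized {
    ($expected_plan:expr, $expected_optimized:expr, $plan:expr) => {{
        let plan: &ToyPlan = &$plan;
        let expected: Vec<String> = $expected_plan.iter().map(|s| s.to_string()).collect();
        let actual = plan.display_lines();
        assert_eq!(
            expected, actual,
            "\n\nexpected:\n\n{:#?}\nactual:\n\n{:#?}\n\n",
            expected, actual
        );

        let expected: Vec<String> = $expected_optimized.iter().map(|s| s.to_string()).collect();
        let actual = toy_optimize(plan).display_lines();
        assert_eq!(
            expected, actual,
            "\n\nexpected optimized:\n\n{:#?}\nactual:\n\n{:#?}\n\n",
            expected, actual
        );
    }};
}

#[test]
fn removes_redundant_sort() {
    let plan = ToyPlan::new(&["SortExec", "SortExec", "MemoryExec"]);
    assert_optimized!(
        vec!["SortExec", "  SortExec", "    MemoryExec"],
        vec!["SortExec", "  MemoryExec"],
        plan
    );
}
```

With a macro like this, adding a new test case means writing the plan and its two expected renderings rather than copying the comparison and error-message boilerplate into every test.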

Are these changes tested?

Only tests

Are there any user-facing changes?

No -- there is no intended change in behavior

alamb · 2023-01-17T01:05:00Z

datafusion/core/src/physical_optimizer/sort_enforcement.rs

-            expected, actual_trim_last,
-            "\n\nexpected:\n\n{expected:#?}\nactual:\n\n{actual:#?}\n\n"
-        );
+        let source = memory_exec(&schema);


I think the new structure is much clearer about what the tests are doing -- if you look at the whitespace-blind diff (https://github.com/apache/arrow-datafusion/pull/4928/files?w=1), you can see that all the plans (original and optimized) are the same.

alamb · 2023-01-17T01:05:42Z

cc @mustafasrepo and @mingmwang who I think have worked on this code

mustafasrepo · 2023-01-17T06:56:05Z

This is much cleaner and more readable. Thanks @alamb.

mingmwang · 2023-01-17T07:13:49Z

Nice !!

mustafasrepo · 2023-01-17T07:14:55Z

datafusion/core/src/physical_optimizer/sort_enforcement.rs

+        let sort_exprs = sort_exprs.into_iter().collect();
+        Arc::new(SortPreservingMergeExec::new(sort_exprs, input))
+    }
+


Maybe we can add one more function here to encapsulate window exec creation:

    fn window_exec(
        input: Arc<dyn ExecutionPlan>,
        schema: SchemaRef,
        sort_exprs: &[PhysicalSortExpr],
        arg_column_name: &str,
    ) -> Result<Arc<dyn ExecutionPlan>> {
        Ok(Arc::new(WindowAggExec::try_new(
            vec![create_window_expr(
                &WindowFunction::AggregateFunction(AggregateFunction::Count),
                "count".to_owned(),
                &[col(arg_column_name, &schema)?],
                &[],
                sort_exprs,
                Arc::new(WindowFrame::new(true)),
                schema.as_ref(),
            )?],
            input,
            schema,
            vec![],
            Some(sort_exprs.to_vec()),
        )?) as Arc<dyn ExecutionPlan>)
    }

By the way, this is just a suggestion. I think the PR is ready to merge with or without this change.

mustafasrepo · 2023-01-17T07:18:27Z

datafusion/core/src/physical_optimizer/sort_enforcement.rs

        // let filter_exec = sort_exec;
-        let window_agg_exec = Arc::new(WindowAggExec::try_new(
+        let physical_plan = Arc::new(WindowAggExec::try_new(


Given that the util function window_exec is available, we can construct physical_plan with:

    let physical_plan = window_exec(
        filter.clone(),
        filter.schema(),
        &sort_exprs,
        "non_nullable_col",
    )?;

mustafasrepo · 2023-01-17T07:20:22Z

datafusion/core/src/physical_optimizer/sort_enforcement.rs

-            as Arc<dyn ExecutionPlan>;
+        )];
+        let sort = sort_exec(sort_exprs.clone(), source);
+
        let window_agg_exec = Arc::new(WindowAggExec::try_new(


Given that the util function window_exec is available, we can use the snippet below to create window_agg_exec:

    let window_agg_exec = window_exec(
        sort.clone(),
        sort.schema(),
        &sort_exprs,
        "non_nullable_col",
    )?;

Thank you -- I was being lazy -- I will do so

alamb · 2023-01-17T12:03:44Z

Update: #4943 is the bug I am working on with sort enforcement.

alamb · 2023-01-17T12:13:53Z

I will implement the suggestions in a follow-on PR. Thank you for the review @mustafasrepo

ursabot · 2023-01-17T12:22:49Z

Benchmark runs are scheduled for baseline = 4e08117 and contender = e7c2ef0. e7c2ef0 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Reduce redundancy in sort_enforcement tests

56d0a3d

github-actions bot added the core Core DataFusion crate label Jan 17, 2023

alamb commented Jan 17, 2023

mustafasrepo reviewed Jan 17, 2023

alamb merged commit e7c2ef0 into apache:master Jan 17, 2023

alamb deleted the alamb/refactor_enforce_sort_tests branch January 17, 2023 12:14

alamb mentioned this pull request Jan 17, 2023

Minor: Reduce even more redundancy creating window_agg in sort_enforcement tests #4945

Merged

alamb mentioned this pull request Jan 17, 2023

WIP (test integration branch for IOx Upgrade) #4952

Closed

3 tasks

alamb added a commit to alamb/datafusion that referenced this pull request Jan 17, 2023

Reduce redundancy in sort_enforcement tests (apache#4928)

91ac6b2

alamb added a commit to alamb/datafusion that referenced this pull request Jan 17, 2023

Reduce redundancy in sort_enforcement tests (apache#4928)

906ac1b
