Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Option from window frame #4516

Merged
merged 8 commits into from
Dec 5, 2022
Merged

Remove Option from window frame #4516

merged 8 commits into from
Dec 5, 2022

Conversation

mustafasrepo
Copy link
Contributor

@mustafasrepo mustafasrepo commented Dec 5, 2022

Which issue does this PR close?

Closes #.

Rationale for this change

After parsing the window expression. If window frame is None, we can construct its equivalent (in terms of behavior).
For instance when someone enters the query OVER(ORDER BY a). It is equivalent to OVER(ORDER BY a) RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Similarly, for the query OVER(PARTITION BY a) its equivalent is OVER(PARTITION BY a ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).

After parsing the necessary information, we can construct the corresponding window frame. Hence in the downstream pipeline we don't need to use Option for window frame.

What changes are included in this PR?

This PR constructs an equivalent window frame when window frame is None after sql parsing. And at the rest of the pipeline uses equivalent window frame.
Also after equivalent window frame construction default name of the window result column changes. Hence this PR includes changes in the column names of the expected results (I have added aliases to make tests future-proof also). Large bulk of the changes comes from these changes.

Are these changes tested?

Existing tests should work

Are there any user-facing changes?

api change

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sql SQL Planner labels Dec 5, 2022
@ozankabak
Copy link
Contributor

This is mostly a refactor PR that simplifies downstream window-processing code by handling empty (or default) window frames only once at the beginning.

I already reviewed it and it LGTM. Let us know if there are any concerns; we will be happy to address them and/or improve the code further.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice change -- thank you @mustafasrepo and @metesynnada

I found https://github.com/apache/arrow-datafusion/pull/4516/files?w=1 easier to review

cc @Ted-Jiang as I believe you are working on window function related things in Ballista

@@ -53,25 +53,25 @@ async fn csv_query_window_with_partition_by() -> Result<()> {
register_aggregate_csv(&ctx).await?;
let sql = "select \
c9, \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these sure look nicer -- BTW if you are looking for easier to maintain tests, you may want to check out the new sqllogictests runner we are working on

#4505

start_bound: WindowFrameBound::Preceding(ScalarValue::Utf8(None)),
end_bound: WindowFrameBound::CurrentRow,
impl WindowFrame {
/// Creates a new, default window frame (with the meaning of default depending on whether the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the clear comments 👍

@alamb alamb merged commit b6146b9 into apache:master Dec 5, 2022
@ursabot
Copy link

ursabot commented Dec 5, 2022

Benchmark runs are scheduled for baseline = 5bb9a18 and contender = b6146b9. b6146b9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

impl WindowFrame {
/// Creates a new, default window frame (with the meaning of default depending on whether the
/// frame contains an `ORDER BY` clause.
pub fn new(has_order_by: bool) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix, i also want to change this default behavior.

oneof window_frame {
WindowFrame frame = 8;
}
WindowFrame window_frame = 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants