Support window functions with empty OVER clause #298

Closed
Dandandan opened this issue May 9, 2021 · 7 comments · Fixed by #403
Labels
enhancement New feature or request

Comments

@Dandandan
Contributor

Dandandan commented May 9, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Window functions are a very valuable feature for DataFusion to have, as they enable analytical queries and tasks such as deduplication.

Describe the solution you'd like
Initial support for window functions with OVER () clause. This allows us to gradually add more features, like support for PARTITION BY and ORDER BY.
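For readers unfamiliar with the empty window clause, here is a minimal sketch of what a query with `OVER ()` returns. It uses Python's built-in `sqlite3` module (assuming the bundled SQLite is 3.25 or newer, which added window functions) purely as a stand-in SQL engine; it is not DataFusion, and the table `t` and its columns are made up for illustration.

```python
import sqlite3

# Illustration only: SQLite stands in for a SQL engine here; this is not DataFusion.
# The table name and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("a", 10), ("b", 20), ("c", 30)])

# An empty OVER () treats the whole result set as a single window,
# so every row carries the same total next to its own values.
rows = conn.execute(
    "SELECT name, amount, SUM(amount) OVER () AS total FROM t ORDER BY name"
).fetchall()

for row in rows:
    print(row)
```

With no PARTITION BY or ORDER BY inside the parentheses, the aggregate is computed once over all rows and attached to each of them, which is why this case is a natural first step.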

Describe alternatives you've considered
A more complete implementation in one go. However, window functions are a big feature, and we will probably need several iterations to get them into shape.

Additional context
Some material:
http://www.vldb.org/pvldb/vol8/p1058-leis.pdf

@Dandandan Dandandan added the enhancement New feature or request label May 9, 2021
@jimexist
Member

@Dandandan I wonder how window functions should fit into the logical planner. Should they be folded into aggregate functions? The actual output schema might be different, though.

@Dandandan
Contributor Author

@jimexist I think a new operator might be the best thing to do here. They have very different semantics and needs.

There is probably still a lot of opportunity for reuse between the operators, and in the planner we can use different operators (repartition, merge, sort, etc.).

What do you think?

@jorgecarleitao @alamb

@alamb
Contributor

alamb commented May 13, 2021

> I think a new operator might be the best to do here. They have very different semantics and needs.

I agree that a new operator (along with a new kind of aggregate) is probably best here. Window aggregate functions are applied differently from "normal" aggregates: among other things, a window function typically produces one row of output for each row of input, whereas a normal aggregate produces one row of output for each distinct value of the grouping keys.
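The difference in output cardinality can be sketched in a few lines of plain Python. This only illustrates the semantics described above; the function names `group_by_sum` and `sum_over_empty_window` are hypothetical and do not correspond to anything in DataFusion.

```python
from collections import defaultdict


def group_by_sum(rows):
    """Ordinary aggregate: one output row per distinct grouping key."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return sorted(totals.items())


def sum_over_empty_window(rows):
    """Window aggregate with an empty OVER (): one output row per input row."""
    total = sum(value for _, value in rows)
    # Each input row is kept and extended with the aggregate over all rows.
    return [(key, value, total) for key, value in rows]


rows = [("a", 1), ("b", 2), ("a", 3)]
grouped = group_by_sum(rows)            # collapses to one row per key
windowed = sum_over_empty_window(rows)  # keeps all three input rows
print(grouped)
print(windowed)
```

The grouped result has as many rows as there are keys, while the windowed result has as many rows as the input and a wider schema, which is one reason a separate operator is a reasonable fit.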

@jimexist
Member

See #334 for a general idea of how much code change is needed if we add window functions side by side with aggregates.

@jimexist
Member

jimexist commented May 19, 2021

@jimexist
Member

jimexist commented May 26, 2021

3 participants