User defined window functions #5781
Comments
FYI @mustafasrepo and @ozankabak: I think @stuartcarnie is contemplating what implementing user defined window functions might entail. I am not sure if you have any thoughts on this matter you would like to share.
Yes, we will be using UDWF too. Is there a design doc we can read and comment on?
How do we distinguish a UDWF from a UDAF, given that window functions already support AggregateUDF? I mean, a UDWF is a superset of a UDAF in window functions.
There is no document that I know of -- it is probably time to start one.
@doki23 I am not sure. The current code has this:
I never really understood why DataFusion makes a distinction between built-in aggregate functions and user defined aggregate functions (it would be really nice if all aggregate functions had the same interface).
Unless someone beats me to it, I plan to start working on a proposed design for this feature in the next few days.
Here is a proposed design.
LogicalPlan / Expr
The current (
    #[derive(Clone, PartialEq, Eq, Hash)]
    pub enum Expr {
        ...
        /// Represents the call of a window function with arguments.
        WindowFunction(WindowFunction),
    }
Which is defined like:
    /// Window function
    #[derive(Clone, PartialEq, Eq, Hash)]
    pub struct WindowFunction {
        /// Name of the function
        pub fun: window_function::WindowFunction,
        /// List of expressions to feed to the functions as arguments
        pub args: Vec<Expr>,
        /// List of partition by expressions
        pub partition_by: Vec<Expr>,
        /// List of order by expressions
        pub order_by: Vec<Expr>,
        /// Window frame
        pub window_frame: window_frame::WindowFrame,
    }
Note that
    /// WindowFunction (in `window_function`):
    #[derive(Debug, Clone, PartialEq, Eq, Hash)]
    pub enum WindowFunction {
        /// A built in aggregate function that leverages an aggregate function
        AggregateFunction(AggregateFunction),
        /// A built-in window function
        BuiltInWindowFunction(BuiltInWindowFunction),
        /// A user defined aggregate function
        AggregateUDF(Arc<AggregateUDF>),
        /// A user defined window function <---- This is NEW
        WindowUDF(Arc<WindowUDF>),
    }
WindowUDF
And similarly to the way that an
    pub trait PartitionEvaluator: Debug + Send {
I am a little unclear on certain parts of the
Looking at how this code is used in the
Here are the next steps I plan:
So steps:
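To make the shape of the proposal concrete, here is a minimal sketch of how a `WindowUDF` could pair a name with a factory for per-partition evaluators. Everything beyond the names `WindowUDF` and `PartitionEvaluator` is an assumption for illustration: the `evaluate` method, the `partition_evaluator_factory` field, the `RunningMax` evaluator, and the use of plain `f64` slices in place of Arrow arrays are all hypothetical and are not the DataFusion API.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Hypothetical, simplified evaluator: one instance handles one partition.
/// Plain `f64` slices stand in for Arrow arrays so that the sketch needs
/// only the standard library.
pub trait PartitionEvaluator: Debug + Send {
    /// Produce exactly one output value per input row of the partition.
    fn evaluate(&self, values: &[f64]) -> Vec<f64>;
}

/// Type of the (assumed) factory that creates a fresh evaluator per partition.
pub type EvaluatorFactory = Arc<dyn Fn() -> Box<dyn PartitionEvaluator> + Send + Sync>;

/// Hypothetical stand-in for the proposed `WindowUDF`.
pub struct WindowUDF {
    pub name: String,
    pub partition_evaluator_factory: EvaluatorFactory,
}

/// Example evaluator: the maximum value seen so far within the partition.
#[derive(Debug)]
struct RunningMax;

impl PartitionEvaluator for RunningMax {
    fn evaluate(&self, values: &[f64]) -> Vec<f64> {
        let mut current = f64::NEG_INFINITY;
        values
            .iter()
            .map(|v| {
                current = current.max(*v);
                current
            })
            .collect()
    }
}

/// Build the example window function.
pub fn running_max_udwf() -> WindowUDF {
    WindowUDF {
        name: "running_max".to_string(),
        partition_evaluator_factory: Arc::new(|| {
            Box::new(RunningMax) as Box<dyn PartitionEvaluator>
        }),
    }
}

/// Evaluate the function independently over each partition, the way a
/// window operator would after splitting its input by PARTITION BY.
pub fn evaluate_partitions(udwf: &WindowUDF, partitions: &[Vec<f64>]) -> Vec<Vec<f64>> {
    partitions
        .iter()
        .map(|p| (udwf.partition_evaluator_factory)().evaluate(p))
        .collect()
}
```

Calling `evaluate_partitions(&running_max_udwf(), &[vec![1.0, 3.0, 2.0], vec![5.0, 4.0]])` evaluates each partition with its own evaluator and returns one output row per input row.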
Great stuff, @alamb! I did have similar ideas in my own test branch, but I wasn't sure how to replicate
Looks reasonable to me, thanks @alamb
While working on #6592 I think I can now articulate a key design question about
Use existing API / Traits
What this would mean:
This would mean exposing (at least)
This would mean making those traits
Pros
Cons
Also,
Make new APIs and traits
What this would mean:
In this case, we would make new traits that are subsets of what is in
Pros
Cons
A new API would mean introducing another API layer that needs to be tested and kept up to date. The new API also might not expose all the functionality of the built-in window functions if such functionality was added at a later date.
Discussion
I am going to try and bash out a technical proof of concept of the first approach (exposing the existing APIs) and we can see how it would look. On balance that is my preferred approach due to the power of the API and the similarities with
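As a rough illustration of the second option (a new, narrower API), the sketch below adapts a small user-facing trait onto a wider internal one. All names here (`InternalEvaluator`, `UserWindowFunction`, `Adapter`, `supports_bounded_execution`, `RowNumber`) are hypothetical and only stand in for whatever the real internal and public traits would be. It also shows the cost mentioned above: anything the internal trait offers that the adapter does not forward stays out of reach for user code.

```rust
/// Hypothetical internal trait with a wider surface than users need.
pub trait InternalEvaluator {
    /// Produce one output value per input row.
    fn evaluate(&self, values: &[i64]) -> Vec<i64>;

    /// Example of an internal-only capability flag with a default.
    fn supports_bounded_execution(&self) -> bool {
        false
    }
}

/// Hypothetical narrow trait that would be the public, user-facing API.
pub trait UserWindowFunction {
    fn evaluate_partition(&self, values: &[i64]) -> Vec<i64>;
}

/// Adapter layer: wraps any user function so the engine can treat it as
/// an internal evaluator. Capabilities not forwarded here keep their
/// defaults, so users cannot opt in to them.
pub struct Adapter<T: UserWindowFunction>(pub T);

impl<T: UserWindowFunction> InternalEvaluator for Adapter<T> {
    fn evaluate(&self, values: &[i64]) -> Vec<i64> {
        self.0.evaluate_partition(values)
    }
}

/// Example user function: numbers the rows of a partition starting at 1.
pub struct RowNumber;

impl UserWindowFunction for RowNumber {
    fn evaluate_partition(&self, values: &[i64]) -> Vec<i64> {
        (1..=values.len() as i64).collect()
    }
}
```

With the first option there would be no `Adapter` at all: users would implement the internal trait directly, gaining its full power at the cost of making it a public, stable interface.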
Is your feature request related to a problem or challenge?
When implementing our InfluxQL frontend for DataFusion, we found certain functions we would like to express as window functions (like a specialized interpolation function, for example).
We can write user defined aggregates:
But those do not produce the same number of rows that go in (they reduce cardinality)
We would like to use our own window functions like
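The cardinality difference can be sketched in plain Rust. This is an illustration only, not DataFusion code: `mean_aggregate` and `interpolate_linear` are hypothetical names, and `Option<f64>` slices stand in for nullable Arrow arrays. The aggregate collapses a partition to a single value, while the interpolation function returns one output row per input row, which is the behavior a window function needs.

```rust
/// Aggregate-style: many input rows collapse to a single value
/// (the mean of the non-null inputs, or None if there are none).
pub fn mean_aggregate(values: &[Option<f64>]) -> Option<f64> {
    let known: Vec<f64> = values.iter().flatten().copied().collect();
    if known.is_empty() {
        None
    } else {
        Some(known.iter().sum::<f64>() / known.len() as f64)
    }
}

/// Window-style: one output row per input row. Nulls that sit between two
/// known values are filled by linear interpolation over the row index;
/// leading and trailing nulls are left as they are.
pub fn interpolate_linear(values: &[Option<f64>]) -> Vec<Option<f64>> {
    let mut out = values.to_vec();
    // Index and value of the most recent non-null row seen so far.
    let mut prev: Option<(usize, f64)> = None;
    for (i, value) in values.iter().enumerate() {
        if let Some(right) = *value {
            if let Some((p, left)) = prev {
                let span = (i - p) as f64;
                for j in (p + 1)..i {
                    let t = (j - p) as f64 / span;
                    out[j] = Some(left + (right - left) * t);
                }
            }
            prev = Some((i, right));
        }
    }
    out
}
```

For the input `[Some(1.0), None, None, None, Some(5.0), None]`, the aggregate yields a single value while the interpolation yields six rows, with the three interior nulls filled in.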
Describe the solution you'd like
I would like the ability to define and register user defined window functions, like we have for user defined aggregate functions:
https://github.com/apache/arrow-datafusion/blob/8139ed40d06d77498217438ff52fe73a4ea16f61/datafusion-examples/examples/simple_udaf.rs#L18-L19
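A minimal sketch of what "define and register" could look like, assuming a name-keyed registry. The `WindowRegistry` type, its `register_udwf` and `udwf` methods, the `fun` field, and the `row_delta` example are all hypothetical names chosen for illustration; they are not taken from DataFusion.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical window function handle: a name plus a function that maps
/// one partition to an output of the same length.
pub struct WindowUDF {
    pub name: String,
    pub fun: Arc<dyn Fn(&[f64]) -> Vec<f64> + Send + Sync>,
}

/// Hypothetical registry that a session or context object might own.
#[derive(Default)]
pub struct WindowRegistry {
    udwfs: HashMap<String, Arc<WindowUDF>>,
}

impl WindowRegistry {
    /// Register a function under its name, returning any function it replaced.
    pub fn register_udwf(&mut self, udwf: WindowUDF) -> Option<Arc<WindowUDF>> {
        self.udwfs.insert(udwf.name.clone(), Arc::new(udwf))
    }

    /// Look up a function by name, as a planner would when resolving a call.
    pub fn udwf(&self, name: &str) -> Option<Arc<WindowUDF>> {
        self.udwfs.get(name).cloned()
    }
}

/// Example function: the difference between each row and the previous row
/// (0.0 for the first row of the partition).
pub fn row_delta() -> WindowUDF {
    WindowUDF {
        name: "row_delta".to_string(),
        fun: Arc::new(|values: &[f64]| {
            values
                .iter()
                .enumerate()
                .map(|(i, v)| if i == 0 { 0.0 } else { v - values[i - 1] })
                .collect::<Vec<f64>>()
        }),
    }
}
```

A caller would create a `WindowRegistry::default()`, call `register_udwf(row_delta())`, and later resolve the function by name with `udwf("row_delta")`.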
This would likely involve
WindowUDF (like AggregateUDF)
WindowExprFunctionImplementation like https://github.com/apache/arrow-datafusion/blob/8139ed40d06d77498217438ff52fe73a4ea16f61/datafusion/expr/src/function.rs#L48
WindowExpr trait: https://github.com/apache/arrow-datafusion/blob/8139ed40d06d77498217438ff52fe73a4ea16f61/datafusion/physical-expr/src/window/window_expr.rs#L38-L143
It isn't clear to me if we would want to expose AggregateWindowExpr, for example
Describe alternatives you've considered
No response
Additional context
No response