Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs: try and clarify what PartitionEvaluator functions are called #6869

Merged
merged 1 commit into from
Jul 7, 2023

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jul 6, 2023

Which issue does this PR close?

N/A

Rationale for this change

@mhilton noticed that the docs were backwards when he started looking at the uses_window_frame docs

I spent some time researching / verifying what actually happens, and it turns out that evaluate_all will be used when the function doesn’t use the window frame and I wanted to encode this learning into documentation

Here is the actual code that dispatches to the different evaluation functions
https://github.com/apache/arrow-datafusion/blob/a1281aa79ea104efdf3d07dd1948f2ae0b374634/datafusion/physical-expr/src/window/built_in.rs#L96-L128

What changes are included in this PR?

Update the documentation for PartitionEvaluator to make what is called when (hopefully) clearer.

Screenshot 2023-07-06 at 4 31 35 PM

I still don't understand enough of what makes a PartitionEvaluator "stateful" to document what functions need to be implemented under what circumstances. Maybe @mustafasrepo could help clarify this

Are these changes tested?

N/A

Are there any user-facing changes?

only doc changes

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jul 6, 2023
@alamb alamb marked this pull request as ready for review July 6, 2023 20:31
Comment on lines +94 to +99
/// |[`uses_window_frame`]|[`supports_bounded_execution`]|[`include_rank`]|function_to_implement|
/// |---|---|----|----|
/// |false (default) |false (default) |false (default) | [`evaluate_all`] |
/// |false |true |false | [`evaluate`] |
/// |false |true/false |true | [`evaluate_all_with_rank`] |
/// |true |true/false |true/false | [`evaluate`] |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Contributor

@mustafasrepo mustafasrepo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb for this PR.

@alamb alamb merged commit 7a41907 into apache:main Jul 7, 2023
@mustafasrepo
Copy link
Contributor

mustafasrepo commented Jul 7, 2023

Regarding

I still don't understand enough of what makes a PartitionEvaluator "stateful" to document what functions need to be implemented under what circumstances. Maybe @mustafasrepo could help clarify this

If we ignore uses_window_frame flag. Implementation table is as follows

supports_bounded_execution include_rank function_to_implement
false false evaluate_all
false true evaluate_all_with_rank
true false evaluate
true true evaluate

In this table, if supports_bounded_execution flag is true, evaluate method should be implemented. For some of the
window functions such as ROW_NUMBER, evaluate can be implemented trivially(It can keep track of internal count, each time it is call it can increment counter for each row.). However, if we were to implement evaluate method for RANK function. Information in the evaluate argument (values received and ) is not enough to calculate result. We need to know separation boundaries of order by columns to be able to produce correct results. For this reason, we have update_state that enables us to encode useful information to the state. By implementing update_state of the RANK evaluator, we can store necessary information that enables us to produce correct result during evaluate method call.

However, it is a bit confusing.I will try to simplify it.

@alamb
Copy link
Contributor Author

alamb commented Jul 14, 2023

For anyone following along, the simplifying PR is 👨‍🍳 👌 #6966

@alamb alamb deleted the alamb/fix_partition_docs branch July 14, 2023 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants