Median aggregation using DataFrame panics: "AggregateState is not a scalar aggregate" #3105
Labels: bug
Comments
Thanks @jonmmease, I will take a look.
I can also reproduce this with:
❯ create table cpu (host string, usage float) as select * from (values ('host0', 90.1), ('host1', 90.2));
0 rows in set. Query took 0.011 seconds.
❯ select host, median(usage) from cpu group by host;
thread 'tokio-runtime-worker' panicked at 'unexpected accumulator state in hash aggregate: Internal("AggregateState is not a scalar aggregate")', /Users/cwolff/workspace/github.com/apache/arrow-datafusion/datafusion/core/src/physical_plan/aggregates/hash.rs:517:34
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
ArrowError(ExternalError(Execution("Join Error: task 17 panicked")))
We are hitting this with IOx, so I plan to fix it.
Strangely, if I run the function directly on
I think this panic will happen any time median is used in a query that has more than one partition.
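To illustrate why multiple partitions matter for an exact median, here is a minimal standalone sketch. It is not DataFusion's accumulator code: `MedianState`, `update`, `merge`, `evaluate`, and `state_from` are hypothetical names used only for illustration. The assumption is that each partition has to hand over all of its values (a list-shaped state rather than a single scalar), because the exact median cannot be rebuilt from per-partition medians.

```rust
// Hypothetical sketch; none of these names come from DataFusion.

/// Partial state for one partition: the raw values, not a single scalar.
#[derive(Debug, Default, Clone)]
struct MedianState {
    values: Vec<f64>,
}

impl MedianState {
    /// Record one input value seen by this partition.
    fn update(&mut self, v: f64) {
        self.values.push(v);
    }

    /// Merge another partition's partial state into this one.
    fn merge(&mut self, other: &MedianState) {
        self.values.extend_from_slice(&other.values);
    }

    /// Compute the exact median over everything collected so far.
    /// Returns None when no values have been seen.
    fn evaluate(&self) -> Option<f64> {
        if self.values.is_empty() {
            return None;
        }
        let mut sorted = self.values.clone();
        sorted.sort_by(|a, b| a.total_cmp(b));
        let mid = sorted.len() / 2;
        if sorted.len() % 2 == 1 {
            Some(sorted[mid])
        } else {
            // Even count: average the two middle values.
            Some((sorted[mid - 1] + sorted[mid]) / 2.0)
        }
    }
}

/// Build a partition state from a slice of values.
fn state_from(values: &[f64]) -> MedianState {
    let mut state = MedianState::default();
    for &v in values {
        state.update(v);
    }
    state
}
```

With partitions [1.0, 2.0, 9.0] and [3.0, 4.0], merging the two states and then evaluating gives 3.0, whereas averaging the per-partition medians (2.0 and 3.5) would give 2.75. That is why a scalar per-partition state is not enough here.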
Proposed fix in #4488
Describe the bug
I am trying to use the new exact median aggregation function introduced by @andygrove in #3009, but when I try it using the DataFrame API the operation panics. Apologies that I didn't get around to testing this while the PR was open!
To Reproduce
Here is a test case that can be run inside the src/core/tests/dataframe.rs file:
Expected behavior
I expect the test above to pass