REF: Reductions #53261

Open
jbrockmendel opened this issue May 16, 2023 · 1 comment
Labels: Reduction Operations (sum, mean, min, max, etc.), Refactor (Internal refactoring of code)

Comments

jbrockmendel (Member) commented May 16, 2023

We have reductions implemented in nanops, _libs.groupby, and _libs.window.aggregations. We should refactor these with the following goals in mind:

  1. Have one (or at least fewer) distinct implementations
  2. Avoid copies, particularly in the nanops versions, where we do something like values[notna(values)], which allocates a filtered copy of the data
  3. Chunk-friendliness, so that we can rewrite ArrowExtensionArray._groupby_op to operate chunk-by-chunk, avoiding a copy in multi-chunk cases. (This could also be useful for hypothetical distributed EAs.)
  4. Avoid casting/inference in nanops
  5. (Update) Do axis=1 reductions without transposing/copying, inspired by PERF: axis=1 reductions with EA dtypes #54341
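A minimal sketch of goals 2 and 5, assuming plain float64 NumPy arrays rather than extension arrays, with np.isnan standing in for notna. The hypothetical masked_sum_mean reduces over the non-NaN positions through NumPy's where= argument instead of materializing values[notna(values)]; only a boolean mask is allocated, not a filtered copy of the data. The hypothetical rowwise_sum computes an axis=1 sum by accumulating one column at a time into a single output buffer, so the columns are never stacked into a 2-D block or transposed.

```python
import numpy as np


def masked_sum_mean(values: np.ndarray, skipna: bool = True):
    """Return (sum, count, mean) of a 1-D float array.

    With skipna=True, NaNs are excluded via a boolean mask passed to
    np.sum(where=...), so no filtered copy of `values` is made.
    """
    if skipna:
        mask = ~np.isnan(values)              # bool mask, 1 byte per element
        total = np.sum(values, where=mask)    # masked-out slots are skipped
        count = int(np.count_nonzero(mask))
    else:
        total = np.sum(values)
        count = values.size
    mean = total / count if count else np.nan
    return total, count, mean


def rowwise_sum(columns, skipna: bool = True) -> np.ndarray:
    """axis=1 sum over a list of equal-length 1-D float columns.

    Each column is added into one output buffer in place, so the columns
    are never stacked into a 2-D array or transposed.
    """
    total = np.zeros(len(columns[0]))
    for col in columns:
        if skipna:
            # only add where this column is not NaN; other slots keep total
            np.add(total, col, out=total, where=~np.isnan(col))
        else:
            total += col
    return total
```

These illustrate the copy-free pattern under the stated assumptions; they are not drop-in replacements for the nanops functions.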

The implementation of group_skew is derived from https://www.johndcook.com/blog/skewness_kurtosis/, which includes a method for "adding" multiple RunningStats instances. Something like that could be adapted for goal 3 (chunk-friendliness).
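A minimal Python sketch of the mergeable accumulator described at that link, tracking the count, the mean, and the second and third central-moment sums (the fourth moment used for kurtosis is omitted for brevity). The class name RunningStats follows the blog post; this is not pandas' group_skew code. The `__add__` method is what makes it chunk-friendly: each chunk (for example, each chunk of a pyarrow ChunkedArray) could be accumulated independently and the partial results combined afterwards.

```python
import math


class RunningStats:
    """Mergeable accumulator for count, mean, and central-moment sums."""

    def __init__(self) -> None:
        self.n = 0
        self.m1 = 0.0  # running mean
        self.m2 = 0.0  # sum of squared deviations from the mean
        self.m3 = 0.0  # sum of cubed deviations from the mean

    def push(self, x: float) -> None:
        n1 = self.n
        self.n += 1
        delta = x - self.m1
        delta_n = delta / self.n
        term1 = delta * delta_n * n1
        self.m1 += delta_n
        # m3 must be updated before m2 because it uses the old m2
        self.m3 += term1 * delta_n * (self.n - 2) - 3.0 * delta_n * self.m2
        self.m2 += term1

    def __add__(self, other: "RunningStats") -> "RunningStats":
        # Combine two partial results, e.g. from two separate chunks
        out = RunningStats()
        na, nb = self.n, other.n
        out.n = n = na + nb
        if n == 0:
            return out
        delta = other.m1 - self.m1
        out.m1 = (na * self.m1 + nb * other.m1) / n
        out.m2 = self.m2 + other.m2 + delta**2 * na * nb / n
        out.m3 = (
            self.m3
            + other.m3
            + delta**3 * na * nb * (na - nb) / (n * n)
            + 3.0 * delta * (na * other.m2 - nb * self.m2) / n
        )
        return out

    def skewness(self) -> float:
        # Population skewness (no small-sample correction), as in the post
        if self.n < 2 or self.m2 == 0:
            return math.nan
        return math.sqrt(self.n) * self.m3 / self.m2**1.5
```

The merge is exact in exact arithmetic, so the order in which chunks are combined only affects floating-point rounding.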

jbrockmendel added the Bug and Needs Triage labels May 16, 2023
mroeschke (Member) commented

Just noting that _libs.window.aggregations should ideally keep its own implementation, since the sliding window aggregations are performance-sensitive. I think the other reductions could be implemented in terms of the sliding window aggregations, i.e. as non-overlapping windows.
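A sketch of that idea, assuming sorted integer group labels and float64 values. The hypothetical window_sum takes per-window start/end bounds (in the spirit of the start/end arrays that pandas' window indexers produce) and keeps a running total across overlapping windows; a groupby sum then becomes a call with contiguous, non-overlapping windows, one per group. Both functions are pure-Python stand-ins, not the Cython kernels in _libs.window.aggregations, and their signatures are assumptions.

```python
import numpy as np


def window_sum(values, start, end, min_periods=1):
    """NaN-skipping sum of values[start[i]:end[i]] for every window i.

    Assumes start and end are non-decreasing. Overlapping windows update a
    running total (add entering values, subtract leaving ones); a window
    that does not overlap the previous one restarts the total.
    """
    out = np.full(len(start), np.nan)
    total, nobs = 0.0, 0
    prev_s = prev_e = 0
    for i in range(len(start)):
        s, e = int(start[i]), int(end[i])
        if i == 0 or s >= prev_e:
            total, nobs = 0.0, 0
            entering, leaving = range(s, e), range(0)
        else:
            entering, leaving = range(prev_e, e), range(prev_s, s)
        for j in leaving:
            if not np.isnan(values[j]):
                total -= values[j]
                nobs -= 1
        for j in entering:
            if not np.isnan(values[j]):
                total += values[j]
                nobs += 1
        prev_s, prev_e = s, e
        if nobs >= min_periods:
            out[i] = total
    return out


def groupby_sum_sorted(values, labels):
    """Per-group sum for sorted labels via non-overlapping windows."""
    boundaries = np.flatnonzero(np.diff(labels)) + 1
    start = np.concatenate(([0], boundaries))
    end = np.concatenate((boundaries, [len(labels)]))
    # min_periods=0 so an all-NaN group sums to 0.0 rather than NaN
    return window_sum(values, start, end, min_periods=0)
```

For non-overlapping windows every window restarts the running total, so each group is a single pass over its own values; the add/subtract update only pays off for overlapping rolling windows.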

mroeschke added the Refactor and Reduction Operations labels and removed the Bug and Needs Triage labels May 16, 2023