-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEPR: make Series.agg aggregate when possible #53325
Conversation
I stumbled upon this in #53208. I think changing this behavior is good but should be deprecated IMO to ensure |
I've made a new version with deprecation. |
In the new deprecation version I BTW found out that while |
* REF/CLN: func in core.apply * Remove type-hint
) | ||
tm.assert_frame_equal(result, expected) | ||
|
||
# TODO: the result below is wrong, should be fixed (GH53325) | ||
with tm.assert_produces_warning(FutureWarning, match=msg): | ||
result = df.agg({"x": foo1}, 0, 3, c=4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually gave a wrong result in main and in v1.5, which was quite surprising.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@mroeschke - good here? |
# operation is actually defined on the Series, e.g. str | ||
# GH53325: The setup below is just to keep current behavior while emitting a | ||
# deprecation message. In the future this will all be replaced with a simple | ||
# `result = f(self.obj, *self.args, **self.kwargs)`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question about enforcing this deprecation; will we also check that result
is a scalar-like? In theory f
could non-reducing lambda *args, **kwargs: obj
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't. Rather, we treat whatever the UDF returns as if it were a scalar. An example where this can be useful:
df = DataFrame({"a": [1, 1, 2, 2, 2], "b": range(5)})
gb = df.groupby("a")
result = gb.agg(list)
print(result)
# b
# a
# 1 [0, 1]
# 2 [2, 3, 4]
While I think this is only particularly useful in groupby, for consistency I think we should apply this approach to all agg
methods. It has the added benefit of not having to infer what is a scalar and what is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, This just deprecates a path that will never aggregate, but there are still other paths that won't aggregate, like you mention. I agree with @rhshadrach that a sensible path may be to just be very permissive, but OTOH if we want to restrict what we accept for the results fro aggregations, that can always be done later.
Thanks @topper-123 |
* BUG: make Series.agg aggregate when possible * fix doc build * deprecate instead of treating as a bug * CLN: Apply.agg_list_like * some cleanups * REF/CLN: func in core.apply (pandas-dev#53437) * REF/CLN: func in core.apply * Remove type-hint * REF: Decouple Series.apply from Series.agg (pandas-dev#53400) * update test * fix issues * fix issues * fix issues --------- Co-authored-by: Richard Shadrach <[email protected]>
The doc string for
Series.agg
says:"A passed user-defined-function will be passed a Series for evaluation."
which isn't true currently. In addition,Series.agg
currently will not always aggregate, when given a aggregation function. This PR fixes those issues.It can be noted that both the behavior and that doc string are correct for
DataFrame.agg
, so this PR aligns the behavior ofSeries.agg
andDataFrame.agg
.I'm not sure if this is too drastic as a bug fix, or we want it as part of a deprecation process, but just putting this up there, will see where it goes.
EDIT: In the new version the old behavior is deprecated rather than just removed.