ENH: groupby(...).agg should only accept reducers #35725
Comments
I would be in favor of this - there's a lack of clarity in the API around when agg and apply should or shouldn't be used.
Yes, absolutely: agg should error out if the returned shape is incorrect, i.e. it should be strict (though we can likely only deprecate for now). Similarly, transform should be strict (I think it actually is).
take
@jreback @WillAyd - Rather than raising, what do you think about instead making it so that agg always aggs regardless of the result type? Some cases:
This has the benefit of simplifying and clarifying the behavior of agg, and I think it makes agg more intuitive, while not forbidding a user from getting e.g. a Series full of DataFrames if they so desire.
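To make the "agg always aggs" idea concrete, here is a minimal sketch. It is not pandas' implementation: `agg_always_reduce` is a hypothetical helper, introduced only for illustration, that calls the UDF once per group and stores whatever comes back as a single cell of an object-dtype Series.

```python
import numpy as np
import pandas as pd


def agg_always_reduce(grouped, func):
    """Hypothetical helper: one result cell per group, whatever func returns."""
    keys, results = [], []
    for key, group in grouped:
        keys.append(key)
        results.append(func(group))
    # Fill an object array element by element so that list-like or pandas
    # results are stored as single cells and are never expanded.
    cells = np.empty(len(results), dtype=object)
    for i, value in enumerate(results):
        cells[i] = value
    return pd.Series(cells, index=keys)


df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})
grouped = df.groupby("key")["val"]

# A true reducer: each cell holds a scalar.
totals = agg_always_reduce(grouped, lambda s: s.sum())

# A non-reducer: each cell holds the whole Series the UDF returned.
nested = agg_always_reduce(grouped, lambda s: s.cumsum())
```

In this sketch the result always has one entry per group, so its index is predictable even when the UDF does not return a scalar. The cost is that the result is object-dtype in both cases.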
I think we should do this, but it would require a deprecation cycle. Having stronger guarantees on agg & transform that they always reduce is great.
I observed that the following:
    import numpy as np
    import pandas as pd
    from pandas import Series
    df = pd.DataFrame({
        "A": Series((1000, 2000), dtype=int),
        "B": Series((1000, 2000), dtype=np.int64),
        "C": Series(["a", "b"]),
    })
    df.agg(["mean", "sum"])
               A       B    C
    mean  1500.0  1500.0  NaN
    sum   3000.0  3000.0   ab
now produces
But in this circumstance I don't want to drop column
I tried to define my own method:
    def mean2(s: Series):
        # Fall back to pd.NA when the mean cannot be computed for this dtype.
        try:
            ret = s.mean()
        except Exception:
            ret = pd.NA
        return ret
    df.agg([mean2, "sum"])
But this resulted in
The function works fine with
    df.apply(mean2, axis=0)
    A    1500.0
    B    1500.0
    C      <NA>
    dtype: object
Is this a bug, or what is the recommended approach for solving this?
This looks to me to be a separate issue. You have a reducer that fails on certain dtypes, whereas this issue is about supplying a non-aggregating function to
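For the separate problem of a reducer that fails on certain dtypes, one possible workaround is sketched below. It rests on an assumption: `agg_with_fallback` is a hypothetical helper, not a pandas API, which applies each reducer column by column and substitutes `pd.NA` when the call raises `TypeError` or `ValueError`.

```python
import pandas as pd


def agg_with_fallback(df, funcs, fill=pd.NA):
    """Hypothetical helper: apply named reducers per column, filling failures."""
    rows = {}
    for name, func in funcs.items():
        row = {}
        for col in df.columns:
            try:
                row[col] = func(df[col])
            except (TypeError, ValueError):
                # The reducer does not support this column's dtype.
                row[col] = fill
        rows[name] = row
    # One row per reducer, one column per original column.
    return pd.DataFrame.from_dict(rows, orient="index")


df = pd.DataFrame({"A": [1000, 2000], "B": [1000, 2000], "C": ["a", "b"]})
result = agg_with_fallback(
    df,
    {"mean": lambda s: s.mean(), "sum": lambda s: s.sum()},
)
```

Because every reducer is called directly on each column, a failure in one column does not affect the others, and no column is dropped from the result.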
Issues where agg is used with non-reducing functions:
I think agg should always reduce regardless of the return value. That is, treat the result of the UDF as if it is a scalar (even when it's not).
Originally proposed: agg should raise if the function(s) provided are not reducers. This can be tested by checking whether the resulting index is equal to self.grouper.result_index.
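The index-equality test mentioned in the issue description can be approximated from user code. The sketch below is illustrative only: `looks_like_reducer` is a hypothetical helper, and it uses `grouped.size().index` as a public stand-in for the internal `self.grouper.result_index`, which is an assumption rather than how pandas performs the check.

```python
import pandas as pd


def looks_like_reducer(grouped, func):
    """Hypothetical check: does func yield exactly one entry per group?"""
    # grouped.size() is indexed by the group keys, so its index is used here
    # in place of the internal result index.
    expected_index = grouped.size().index
    result = grouped.apply(func)
    return result.index.equals(expected_index)


df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1, 2, 3, 4]})
grouped = df.groupby("key")["val"]

# sum returns one scalar per group, so the result is indexed by group key.
sum_is_reducer = looks_like_reducer(grouped, lambda s: s.sum())

# cumsum returns one value per row, so the result index differs.
cumsum_is_reducer = looks_like_reducer(grouped, lambda s: s.cumsum())
```

A check like this only inspects the shape of the output for the data at hand; it says nothing about how the function behaves on other inputs.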