Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: make Series.agg aggregate when possible #53325

Merged
merged 11 commits into from
Jun 7, 2023
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v2.1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -235,6 +235,8 @@ Deprecations
- Deprecated ``axis=1`` in :meth:`DataFrame.groupby` and in :class:`Grouper` constructor, do ``frame.T.groupby(...)`` instead (:issue:`51203`)
- Deprecated accepting slices in :meth:`DataFrame.take`, call ``obj[slicer]`` or pass a sequence of integers instead (:issue:`51539`)
- Deprecated explicit support for subclassing :class:`Index` (:issue:`45289`)
- Deprecated making functions given to :meth:`Series.agg` attempt to operate on each element in the :class:`Series` and only operate on the whole :class:`Series` if the elementwise operations failed. In the future, functions given to :meth:`Series.agg` will always operate on the whole :class:`Series` only. To keep the current behavior, use :meth:`Series.transform` instead. (:issue:`53325`)
- Deprecated making the functions in a list of functions given to :meth:`DataFrame.agg` attempt to operate on each element in the :class:`DataFrame` and only operate on the columns of the :class:`DataFrame` if the elementwise operations failed. To keep the current behavior, use :meth:`DataFrame.transform` instead. (:issue:`53325`)
- Deprecated passing a :class:`DataFrame` to :meth:`DataFrame.from_records`, use :meth:`DataFrame.set_index` or :meth:`DataFrame.drop` instead (:issue:`51353`)
- Deprecated silently dropping unrecognized timezones when parsing strings to datetimes (:issue:`18702`)
- Deprecated the ``axis`` keyword in :meth:`DataFrame.ewm`, :meth:`Series.ewm`, :meth:`DataFrame.rolling`, :meth:`Series.rolling`, :meth:`DataFrame.expanding`, :meth:`Series.expanding` (:issue:`51778`)
Expand Down
24 changes: 13 additions & 11 deletions pandas/core/apply.py
Original file line number Diff line number Diff line change
Expand Up @@ -1121,23 +1121,25 @@ def apply(self) -> DataFrame | Series:
def agg(self):
result = super().agg()
if result is None:
obj = self.obj
func = self.func

# string, list-like, and dict-like are entirely handled in super
assert callable(func)

# try a regular apply, this evaluates lambdas
# row-by-row; however if the lambda is expected a Series
# expression, e.g.: lambda x: x-x.quantile(0.25)
# this will fail, so we can try a vectorized evaluation

# we cannot FIRST try the vectorized evaluation, because
# then .agg and .apply would have different semantics if the
# operation is actually defined on the Series, e.g. str
# GH53325: The setup below is just to keep current behavior while emitting a
# deprecation message. In the future this will all be replaced with a simple
# `result = f(self.obj, *self.args, **self.kwargs)`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question about enforcing this deprecation; will we also check that result is a scalar-like? In theory f could non-reducing lambda *args, **kwargs: obj?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't. Rather, we treat whatever the UDF returns as if it were a scalar. An example where this can be useful:

df = DataFrame({"a": [1, 1, 2, 2, 2], "b": range(5)})
gb = df.groupby("a")
result = gb.agg(list)
print(result)
#            b
# a           
# 1     [0, 1]
# 2  [2, 3, 4]

While I think this is only particularly useful in groupby, for consistency I think we should apply this approach to all agg methods. It has the added benefit of not having to infer what is a scalar and what is not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, This just deprecates a path that will never aggregate, but there are still other paths that won't aggregate, like you mention. I agree with @rhshadrach that a sensible path may be to just be very permissive, but OTOH if we want to restrict what we accept for the results fro aggregations, that can always be done later.

try:
result = self.obj.apply(func, args=self.args, **self.kwargs)
result = obj.apply(func, args=self.args, **self.kwargs)
except (ValueError, AttributeError, TypeError):
result = func(self.obj, *self.args, **self.kwargs)
result = func(obj, *self.args, **self.kwargs)
else:
msg = (
f"using {func} in {type(obj).__name__}.agg cannot aggregate and "
f"has been deprecated. Use {type(obj).__name__}.transform to "
f"keep behavior unchanged."
)
warnings.warn(msg, FutureWarning, stacklevel=find_stack_level())

return result

Expand Down
25 changes: 17 additions & 8 deletions pandas/tests/apply/test_frame_apply.py
Original file line number Diff line number Diff line change
Expand Up @@ -1478,8 +1478,8 @@ def test_any_apply_keyword_non_zero_axis_regression():
tm.assert_series_equal(result, expected)


def test_agg_list_like_func_with_args():
# GH 50624
def test_agg_mapping_func_deprecated():
# GH 53325
df = DataFrame({"x": [1, 2, 3]})

def foo1(x, a=1, c=0):
Expand All @@ -1488,17 +1488,26 @@ def foo1(x, a=1, c=0):
def foo2(x, b=2, c=0):
return x + b + c

msg = r"foo1\(\) got an unexpected keyword argument 'b'"
with pytest.raises(TypeError, match=msg):
df.agg([foo1, foo2], 0, 3, b=3, c=4)
# single func already takes the vectorized path
result = df.agg(foo1, 0, 3, c=4)
expected = df + 7
tm.assert_frame_equal(result, expected)

msg = "using .+ in Series.agg cannot aggregate and"

result = df.agg([foo1, foo2], 0, 3, c=4)
with tm.assert_produces_warning(FutureWarning, match=msg):
result = df.agg([foo1, foo2], 0, 3, c=4)
expected = DataFrame(
[[8, 8], [9, 9], [10, 10]],
columns=MultiIndex.from_tuples([("x", "foo1"), ("x", "foo2")]),
[[8, 8], [9, 9], [10, 10]], columns=[["x", "x"], ["foo1", "foo2"]]
)
tm.assert_frame_equal(result, expected)

# TODO: the result below is wrong, should be fixed (GH53325)
with tm.assert_produces_warning(FutureWarning, match=msg):
result = df.agg({"x": foo1}, 0, 3, c=4)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually gave a wrong result in main and in v1.5, which was quite surprising.

expected = DataFrame([2, 3, 4], columns=["x"])
tm.assert_frame_equal(result, expected)


def test_agg_std():
df = DataFrame(np.arange(6).reshape(3, 2), columns=["A", "B"])
Expand Down
22 changes: 22 additions & 0 deletions pandas/tests/apply/test_frame_transform.py
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,28 @@ def test_transform_empty_listlike(float_frame, ops, frame_or_series):
obj.transform(ops)


def test_transform_listlike_func_with_args():
# GH 50624
df = DataFrame({"x": [1, 2, 3]})

def foo1(x, a=1, c=0):
return x + a + c

def foo2(x, b=2, c=0):
return x + b + c

msg = r"foo1\(\) got an unexpected keyword argument 'b'"
with pytest.raises(TypeError, match=msg):
df.transform([foo1, foo2], 0, 3, b=3, c=4)

result = df.transform([foo1, foo2], 0, 3, c=4)
expected = DataFrame(
[[8, 8], [9, 9], [10, 10]],
columns=MultiIndex.from_tuples([("x", "foo1"), ("x", "foo2")]),
)
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize("box", [dict, Series])
def test_transform_dictlike(axis, float_frame, box):
# GH 35964
Expand Down
6 changes: 5 additions & 1 deletion pandas/tests/apply/test_invalid_arg.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@

from itertools import chain
import re
import warnings

import numpy as np
import pytest
Expand Down Expand Up @@ -307,7 +308,10 @@ def test_transform_and_agg_err_series(string_series, func, msg):
# we are trying to transform with an aggregator
with pytest.raises(ValueError, match=msg):
with np.errstate(all="ignore"):
string_series.agg(func)
# GH53325
with warnings.catch_warnings():
warnings.simplefilter("ignore", FutureWarning)
string_series.agg(func)


@pytest.mark.parametrize("func", [["max", "min"], ["max", "sqrt"]])
Expand Down
49 changes: 31 additions & 18 deletions pandas/tests/apply/test_series_apply.py
Original file line number Diff line number Diff line change
Expand Up @@ -108,14 +108,18 @@ def f(x, a=0, b=0, c=0):
return x + a + 10 * b + 100 * c

s = Series([1, 2])
result = s.agg(f, 0, *args, **kwargs)
msg = (
"in Series.agg cannot aggregate and has been deprecated. "
"Use Series.transform to keep behavior unchanged."
)
with tm.assert_produces_warning(FutureWarning, match=msg):
result = s.agg(f, 0, *args, **kwargs)
expected = s + increment
tm.assert_series_equal(result, expected)


def test_agg_list_like_func_with_args():
# GH 50624

def test_agg_mapping_func_deprecated():
# GH 53325
s = Series([1, 2, 3])

def foo1(x, a=1, c=0):
Expand All @@ -124,13 +128,13 @@ def foo1(x, a=1, c=0):
def foo2(x, b=2, c=0):
return x + b + c

msg = r"foo1\(\) got an unexpected keyword argument 'b'"
with pytest.raises(TypeError, match=msg):
s.agg([foo1, foo2], 0, 3, b=3, c=4)

result = s.agg([foo1, foo2], 0, 3, c=4)
expected = DataFrame({"foo1": [8, 9, 10], "foo2": [8, 9, 10]})
tm.assert_frame_equal(result, expected)
msg = "using .+ in Series.agg cannot aggregate and"
with tm.assert_produces_warning(FutureWarning, match=msg):
s.agg(foo1, 0, 3, c=4)
with tm.assert_produces_warning(FutureWarning, match=msg):
s.agg([foo1, foo2], 0, 3, c=4)
with tm.assert_produces_warning(FutureWarning, match=msg):
s.agg({"a": foo1, "b": foo2}, 0, 3, c=4)


def test_series_apply_map_box_timestamps(by_row):
Expand Down Expand Up @@ -391,23 +395,32 @@ def test_apply_map_evaluate_lambdas_the_same(string_series, func, by_row):
assert result == str(string_series)


def test_with_nested_series(datetime_series):
def test_agg_evaluate_lambdas(string_series):
# GH53325
# in the future, the result will be a Series class.

with tm.assert_produces_warning(FutureWarning):
result = string_series.agg(lambda x: type(x))
assert isinstance(result, Series) and len(result) == len(string_series)

with tm.assert_produces_warning(FutureWarning):
result = string_series.agg(type)
assert isinstance(result, Series) and len(result) == len(string_series)


@pytest.mark.parametrize("op_name", ["agg", "apply"])
def test_with_nested_series(datetime_series, op_name):
# GH 2316
# .agg with a reducer and a transform, what to do
msg = "Returning a DataFrame from Series.apply when the supplied function"
with tm.assert_produces_warning(FutureWarning, match=msg):
# GH52123
result = datetime_series.apply(
result = getattr(datetime_series, op_name)(
lambda x: Series([x, x**2], index=["x", "x^2"])
)
expected = DataFrame({"x": datetime_series, "x^2": datetime_series**2})
tm.assert_frame_equal(result, expected)

with tm.assert_produces_warning(FutureWarning, match=msg):
# GH52123
result = datetime_series.agg(lambda x: Series([x, x**2], index=["x", "x^2"]))
tm.assert_frame_equal(result, expected)


def test_replicate_describe(string_series, by_row):
# this also tests a result set that is all scalars
Expand Down
35 changes: 35 additions & 0 deletions pandas/tests/apply/test_series_transform.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,21 @@
import pandas._testing as tm


@pytest.mark.parametrize(
"args, kwargs, increment",
[((), {}, 0), ((), {"a": 1}, 1), ((2, 3), {}, 32), ((1,), {"c": 2}, 201)],
)
def test_agg_args(args, kwargs, increment):
# GH 43357
def f(x, a=0, b=0, c=0):
return x + a + 10 * b + 100 * c

s = Series([1, 2])
result = s.transform(f, 0, *args, **kwargs)
expected = s + increment
tm.assert_series_equal(result, expected)


@pytest.mark.parametrize(
"ops, names",
[
Expand All @@ -28,6 +43,26 @@ def test_transform_listlike(string_series, ops, names):
tm.assert_frame_equal(result, expected)


def test_transform_listlike_func_with_args():
# GH 50624

s = Series([1, 2, 3])

def foo1(x, a=1, c=0):
return x + a + c

def foo2(x, b=2, c=0):
return x + b + c

msg = r"foo1\(\) got an unexpected keyword argument 'b'"
with pytest.raises(TypeError, match=msg):
s.transform([foo1, foo2], 0, 3, b=3, c=4)

result = s.transform([foo1, foo2], 0, 3, c=4)
expected = DataFrame({"foo1": [8, 9, 10], "foo2": [8, 9, 10]})
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize("box", [dict, Series])
def test_transform_dictlike(string_series, box):
# GH 35964
Expand Down