Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert DataFrame.rename to keyword only; simplify axis validation #29140

Merged
merged 18 commits into from
Jan 9, 2020
64 changes: 63 additions & 1 deletion doc/source/whatsnew/v1.0.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -290,6 +290,69 @@ New repr for :class:`~pandas.arrays.IntervalArray`

pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])

``DataFrame.rename`` now only accepts one positional argument
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- :meth:`DataFrame.rename` would previously accept positional arguments that would lead
to ambiguous or undefined behavior. From pandas 1.0, only the very first argument, which
maps labels to their new names along the default axis, is allowed to be passed by position
(:issue:`29136`).

*pandas 0.25.x*

.. code-block:: ipython

In [1]: df = pd.DataFrame([[1]])
In [2]: df.rename({0: 1}, {0: 2})
FutureWarning: ...Use named arguments to resolve ambiguity...
Out[2]:
2
1 1

*pandas 1.0.0*
.. ipython:: python

df.rename({0: 1}, {0: 2})

Note that errors will now be raised when conflicting or potentially ambiguous arguments are provided.

*pandas 0.25.x*

.. code-block:: ipython

In [1]: df.rename({0: 1}, index={0: 2})
Out[1]:
0
1 1

In [2]: df.rename(mapper={0: 1}, index={0: 2})
Out[2]:
0
2 1

*pandas 1.0.0*

.. ipython:: python
:okexcept:

df.rename({0: 1}, index={0: 2})
df.rename(mapper={0: 1}, index={0: 2})

You can still change the axis along which the first positional argument is applied by
supplying the ``axis`` keyword argument.

.. ipython:: python

df.rename({0: 1})
df.rename({0: 1}, axis=1)

If you would like to update both the index and column labels, be sure to use the respective
keywords.

.. ipython:: python

df.rename(index={0: 1}, columns={0: 2})

Extended verbose info output for :class:`~pandas.DataFrame`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Expand Down Expand Up @@ -524,7 +587,6 @@ Optional libraries below the lowest tested version may still work, but are not c

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.


.. _whatsnew_1000.api.other:

Other API changes
Expand Down
1 change: 1 addition & 0 deletions pandas/_typing.py
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@
FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame")

Axis = Union[str, int]
Level = Union[str, int]
Ordered = Optional[bool]
JSONSerializable = Union[PythonScalar, List, Dict]
Axes = Collection
Expand Down
40 changes: 32 additions & 8 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,12 @@
IO,
TYPE_CHECKING,
Any,
Callable,
FrozenSet,
Hashable,
Iterable,
List,
Mapping,
Optional,
Sequence,
Set,
Expand All @@ -38,7 +40,7 @@
from pandas._config import get_option

from pandas._libs import algos as libalgos, lib
from pandas._typing import Axes, Dtype, FilePathOrBuffer
from pandas._typing import Axes, Axis, Dtype, FilePathOrBuffer, Level
from pandas.compat import PY37
from pandas.compat._optional import import_optional_dependency
from pandas.compat.numpy import function as nv
Expand Down Expand Up @@ -3986,7 +3988,25 @@ def drop(
"mapper",
[("copy", True), ("inplace", False), ("level", None), ("errors", "ignore")],
)
def rename(self, *args, **kwargs):
def rename(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change here was to be explicit about what is accepted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to being explicit. The only reason we had *args, **kwargs in the first place was to support the ambiguous case.

self,
mapper: Optional[
Union[Mapping[Hashable, Hashable], Callable[[Hashable], Hashable]]
] = None,
*,
index: Optional[
Union[Mapping[Hashable, Hashable], Callable[[Hashable], Hashable]]
] = None,
columns: Optional[
Union[Mapping[Hashable, Hashable], Callable[[Hashable], Hashable]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth making a name for this in _typing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was thinking the same thing. Any suggestions? LabelReplacement?

] = None,
axis: Optional[Axis] = None,
copy: bool = True,
inplace: bool = False,
level: Optional[Level] = None,
errors: str = "ignore",
) -> "DataFrame":

"""
Alter axes labels.

Expand Down Expand Up @@ -4095,12 +4115,16 @@ def rename(self, *args, **kwargs):
2 2 5
4 3 6
"""
axes = validate_axis_style_args(self, args, kwargs, "mapper", "rename")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note - there is only one other use now of this validate_axis_style_args function (see DataFrame.reindex). The intent here is good, but I think the fact that it mutates the args and kwargs being passed it makes it very difficult to reason about. As a follow up to this would like to replace entirely with something simple that just operates on the axis argument

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything blocking a similar PR for .reindex?

As a follow up to this would like to replace entirely with something simple that just operates on the axis argument

Is this referring to DataFrame.reindex, or validate_axis_style_args? Ideally we can completely remove validate_axis_style_args.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look before because I was trying to keep this focused, but it might not be too much extra effort. I'll take a look and if it gets out of hand let you know

kwargs.update(axes)
# Pop these, since the values are in `kwargs` under different names
kwargs.pop("axis", None)
kwargs.pop("mapper", None)
return super().rename(**kwargs)
return super().rename(
mapper=mapper,
index=index,
columns=columns,
axis=axis,
copy=copy,
inplace=inplace,
level=level,
errors=errors,
)

@Substitution(**_shared_doc_kwargs)
@Appender(NDFrame.fillna.__doc__)
Expand Down
90 changes: 58 additions & 32 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,14 @@
from pandas._config import config

from pandas._libs import Timestamp, iNaT, lib, properties
from pandas._typing import Dtype, FilePathOrBuffer, FrameOrSeries, JSONSerializable
from pandas._typing import (
Axis,
Dtype,
FilePathOrBuffer,
FrameOrSeries,
JSONSerializable,
Level,
)
from pandas.compat import set_function_name
from pandas.compat._optional import import_optional_dependency
from pandas.compat.numpy import function as nv
Expand Down Expand Up @@ -917,7 +924,24 @@ def swaplevel(self: FrameOrSeries, i=-2, j=-1, axis=0) -> FrameOrSeries:
# ----------------------------------------------------------------------
# Rename

def rename(self, *args, **kwargs):
def rename(
self,
mapper: Optional[
Union[Mapping[Hashable, Any], Callable[[Hashable], Hashable]]
] = None,
*,
index: Optional[
Union[Mapping[Hashable, Any], Callable[[Hashable], Hashable]]
] = None,
columns: Optional[
Union[Mapping[Hashable, Any], Callable[[Hashable], Hashable]]
] = None,
axis: Optional[Axis] = None,
copy: bool = True,
inplace: bool = False,
level: Optional[Level] = None,
errors: str = "ignore",
):
"""
Alter axes input function or functions. Function / dict values must be
unique (1-to-1). Labels not contained in a dict / Series will be left
Expand Down Expand Up @@ -1030,44 +1054,46 @@ def rename(self, *args, **kwargs):

See the :ref:`user guide <basics.rename>` for more.
"""
axes, kwargs = self._construct_axes_from_arguments(args, kwargs)
copy = kwargs.pop("copy", True)
inplace = kwargs.pop("inplace", False)
level = kwargs.pop("level", None)
axis = kwargs.pop("axis", None)
errors = kwargs.pop("errors", "ignore")
if axis is not None:
# Validate the axis
self._get_axis_number(axis)

if kwargs:
raise TypeError(
"rename() got an unexpected keyword "
f'argument "{list(kwargs.keys())[0]}"'
)

if com.count_not_none(*axes.values()) == 0:
if mapper is None and index is None and columns is None:
raise TypeError("must pass an index to rename")

self._consolidate_inplace()
if index is not None or columns is not None:
if axis is not None:
raise TypeError(
"Cannot specify both 'axis' and any of 'index' or 'columns'"
)
elif mapper is not None:
raise TypeError(
"Cannot specify both 'mapper' and any of 'index' or 'columns'"
)
else:
# use the mapper argument
if axis in {1, "columns"}:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We accept 3 arguments, mapper, index and columns. mapper can conflict with index or columns if either is provided, which the checks before this take care of. At this point if there aren't any conflicts, I think easiest to just combine the object

Note that this is arguably a little too strict, as this is no longer valid:

df.rename({"key": "val"}, axis=1, index={"key2": "val2"})

But I don't think there is a lot to gain in supporting that given the variety of other options available here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is arguably a little too strict, as this is no longer valid:

I don't recall if that case was originally discussed, but I think it's OK to break it.

columns = mapper
else:
index = mapper

result = self if inplace else self.copy(deep=copy)

# start in the axis order to eliminate too many copies
for axis in range(self._AXIS_LEN):
v = axes.get(self._AXIS_NAMES[axis])
if v is None:
for axis_no, replacements in enumerate((index, columns)):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately I am trying to get rid of all of the internal _AXIS_* functions and mappings in core.generic, which seem like relics of the panel days

If we buy into just having index and columns being the potential axes in that order things can be greatly simplified

if replacements is None:
continue
f = com.get_rename_function(v)
baxis = self._get_block_manager_axis(axis)

ax = self._get_axis(axis_no)
baxis = self._get_block_manager_axis(axis_no)
f = com.get_rename_function(replacements)

if level is not None:
level = self.axes[axis]._get_level_number(level)
level = ax._get_level_number(level)

# GH 13473
if not callable(v):
indexer = self.axes[axis].get_indexer_for(v)
if not callable(replacements):
indexer = ax.get_indexer_for(replacements)
if errors == "raise" and len(indexer[indexer == -1]):
missing_labels = [
label for index, label in enumerate(v) if indexer[index] == -1
label
for index, label in enumerate(replacements)
if indexer[index] == -1
]
raise KeyError(f"{missing_labels} not found in axis")

Expand Down Expand Up @@ -4032,7 +4058,7 @@ def add_prefix(self: FrameOrSeries, prefix: str) -> FrameOrSeries:
f = functools.partial("{prefix}{}".format, prefix=prefix)

mapper = {self._info_axis_name: f}
return self.rename(**mapper)
return self.rename(**mapper) # type: ignore
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't seem to get dict unpacking to place nicely with mypy. @simonjayhawkins


def add_suffix(self: FrameOrSeries, suffix: str) -> FrameOrSeries:
"""
Expand Down Expand Up @@ -4091,7 +4117,7 @@ def add_suffix(self: FrameOrSeries, suffix: str) -> FrameOrSeries:
f = functools.partial("{}{suffix}".format, suffix=suffix)

mapper = {self._info_axis_name: f}
return self.rename(**mapper)
return self.rename(**mapper) # type: ignore

def sort_values(
self,
Expand Down
25 changes: 19 additions & 6 deletions pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,14 +4,15 @@
from io import StringIO
from shutil import get_terminal_size
from textwrap import dedent
from typing import IO, Any, Callable, Hashable, List, Optional
from typing import IO, Any, Callable, Hashable, List, Mapping, Optional, Union
import warnings

import numpy as np

from pandas._config import get_option

from pandas._libs import index as libindex, lib, reshape, tslibs
from pandas._typing import Axis, Level
from pandas.compat.numpy import function as nv
from pandas.util._decorators import Appender, Substitution
from pandas.util._validators import validate_bool_kwarg, validate_percentile
Expand Down Expand Up @@ -3892,7 +3893,16 @@ def align(
broadcast_axis=broadcast_axis,
)

def rename(self, index=None, **kwargs):
def rename(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Series method doesn't completely align with the DataFrame method, but I think this is an easy way to go about it while maintain backwards compat

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the value in accepting axis here, if we don't also have mapper and columns?

I don't see an easy way to accept mapper. That's essentially fulfilled by index.

So maybe my question is: should we also accept columns=? (Not arguing for this btw. I don't know what's best here).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we validate axis at all, or just ignore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here axis gets ignored. Looks like this was implemented as part of #18589

So maybe my question is: should we also accept columns=? (Not arguing for this btw. I don't know what's best here).

Yea I'm not sure myself. I was thinking ideally it would have been great to align the signatures better but for where we are that would cause some API churn. So I guess best to leave unless there's a ringing need to do something here (?)

self,
index=None,
*,
axis=None,
copy=True,
inplace=False,
level=None,
errors="ignore",
):
"""
Alter Series index labels or name.

Expand All @@ -3906,6 +3916,8 @@ def rename(self, index=None, **kwargs):

Parameters
----------
axis : int or str
Unused. Accepted for compatability with DataFrame method only.
index : scalar, hashable sequence, dict-like or function, optional
Functions or dict-like are transformations to apply to
the index.
Expand All @@ -3923,6 +3935,7 @@ def rename(self, index=None, **kwargs):

See Also
--------
DataFrame.rename : Corresponding DataFrame method.
Series.rename_axis : Set the name of the axis.

Examples
Expand All @@ -3949,12 +3962,12 @@ def rename(self, index=None, **kwargs):
5 3
dtype: int64
"""
kwargs["inplace"] = validate_bool_kwarg(kwargs.get("inplace", False), "inplace")

if callable(index) or is_dict_like(index):
return super().rename(index=index, **kwargs)
return super().rename(
index, copy=copy, inplace=inplace, level=level, errors=errors
)
else:
return self._set_name(index, inplace=kwargs.get("inplace"))
return self._set_name(index, inplace=inplace)

@Substitution(**_shared_doc_kwargs)
@Appender(generic.NDFrame.reindex.__doc__)
Expand Down
Loading