API: Added axis argument to rename, reindex #17800

Merged
merged 2 commits into from
Oct 10, 2017
24 changes: 22 additions & 2 deletions doc/source/basics.rst
@@ -1217,6 +1217,15 @@ following can be done:
This means that the reindexed Series's index is the same Python object as the
DataFrame's index.

.. versionadded:: 0.21.0

:meth:`DataFrame.reindex` also supports an "axis-style" calling convention,
where you specify a single ``labels`` argument and the ``axis`` it applies to.

.. ipython:: python

   df.reindex(['c', 'f', 'b'], axis='index')
   df.reindex(['three', 'two', 'one'], axis='columns')

.. seealso::

@@ -1413,12 +1422,23 @@ Series can also be used:

.. ipython:: python

df.rename(columns={'one' : 'foo', 'two' : 'bar'},
index={'a' : 'apple', 'b' : 'banana', 'd' : 'durian'})
df.rename(columns={'one': 'foo', 'two': 'bar'},
index={'a': 'apple', 'b': 'banana', 'd': 'durian'})

If the mapping doesn't include a column/index label, it isn't renamed. Also
extra labels in the mapping don't throw an error.

.. versionadded:: 0.21.0

:meth:`DataFrame.rename` also supports an "axis-style" calling convention, where
you specify a single ``mapper`` and the ``axis`` to apply that mapping to.

.. ipython:: python

   df.rename({'one': 'foo', 'two': 'bar'}, axis='columns')
   df.rename({'a': 'apple', 'b': 'banana', 'd': 'durian'}, axis='index')


The :meth:`~DataFrame.rename` method also provides an ``inplace`` named
parameter that is by default ``False`` and copies the underlying data. Pass
``inplace=True`` to rename the data in place.
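
As a quick check of the two calling conventions documented in this hunk, the sketch below compares the axis-style call with the keyword style for both methods. The frame is a hypothetical stand-in (the user guide's own ``df`` is defined outside this excerpt), and it assumes a pandas release that includes the ``axis`` keyword (0.21.0 or later).

```python
import pandas as pd

# Hypothetical frame; the user guide's actual ``df`` is not part of this excerpt.
df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0], "three": [7.0, 8.0, 9.0]},
    index=["a", "b", "c"],
)

# Axis style: one ``labels`` argument plus the axis it applies to.
by_axis = df.reindex(["c", "f", "b"], axis="index")
# Keyword style: name the axis through the parameter itself.
by_keyword = df.reindex(index=["c", "f", "b"])
same_reindex = by_axis.equals(by_keyword)

# The same pairing for rename: ``mapper`` + ``axis`` versus ``columns=``.
renamed_axis = df.rename({"one": "foo", "two": "bar"}, axis="columns")
renamed_keyword = df.rename(columns={"one": "foo", "two": "bar"})
same_rename = renamed_axis.equals(renamed_keyword)
```

Both spellings should produce the same frame; the label ``'f'`` does not exist in the original index, so its row is filled with missing values in either case.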
34 changes: 34 additions & 0 deletions doc/source/whatsnew/v0.21.0.txt
@@ -111,6 +111,40 @@ For example:
# the following is now equivalent
df.drop(columns=['B', 'C'])

.. _whatsnew_0210.enhancements.rename_reindex_axis:

``rename``, ``reindex`` now also accept axis keyword
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :meth:`DataFrame.rename` and :meth:`DataFrame.reindex` methods have gained
the ``axis`` keyword to specify the axis to target with the operation
(:issue:`12392`).

Here's ``rename``:

.. ipython:: python

   df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
   df.rename(str.lower, axis='columns')
   df.rename(id, axis='index')

And ``reindex``:

.. ipython:: python

   df.reindex(['A', 'B', 'C'], axis='columns')
   df.reindex([0, 1, 3], axis='index')

The "index, columns" style continues to work as before.

.. ipython:: python

   df.rename(index=id, columns=str.lower)
   df.reindex(index=[0, 1, 3], columns=['A', 'B', 'C'])

We *highly* encourage using named arguments to avoid confusion when using either
style.
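
A small sketch of why the two styles should not be combined: assuming a pandas release with the ``axis`` keyword, passing ``axis`` together with ``index`` or ``columns`` is rejected with a ``TypeError``, while the pure axis style works. The helper ``raises_type_error`` is a hypothetical name introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})


def raises_type_error(call):
    """Return True if the zero-argument callable raises TypeError."""
    try:
        call()
    except TypeError:
        return True
    return False


# Mixing the "axis" style with the "index, columns" style is an error.
mixed_reindex = raises_type_error(lambda: df.reindex(index=[0, 1], axis=0))
mixed_rename = raises_type_error(lambda: df.rename(columns=str.lower, axis=1))

# The axis style on its own is fine, even when axis=0 restates the default.
axis_only = df.reindex([0, 1], axis=0)
```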

.. _whatsnew_0210.enhancements.categorical_dtype:

``CategoricalDtype`` for specifying categoricals
136 changes: 132 additions & 4 deletions pandas/core/frame.py
@@ -65,6 +65,7 @@
_values_from_object,
_maybe_box_datetimelike,
_dict_compat,
_all_not_none,
standardize_mapping)
from pandas.core.generic import NDFrame, _shared_docs
from pandas.core.index import (Index, MultiIndex, _ensure_index,
@@ -111,7 +112,13 @@
optional_by="""
by : str or list of str
Name or list of names which refer to the axis items.""",
versionadded_to_excel='')
versionadded_to_excel='',
optional_labels="""labels : array-like, optional
New labels / index to conform the axis specified by 'axis' to.""",
optional_axis="""axis : int or str, optional
Axis to target. Can be either the axis name ('index', 'columns')
or number (0, 1).""",
)

_numeric_only_doc = """numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use
@@ -2776,6 +2783,47 @@ def reindexer(value):

return np.atleast_2d(np.asarray(value))

    def _validate_axis_style_args(self, arg, arg_name, index, columns,
                                  axis, method_name):
        if axis is not None:
            # Using "axis" style, along with a positional arg
Member:
I suppose it would make it a lot harder if the default axis=0 is used instead of None?

Contributor Author:
Right now I raise on

df.reindex(labels, axis=0)
df.reindex(index=labels, axis=0)

If I change the default axis to 0, then I can't detect those cases. I could go either way here.

Member:
Why do you raise on df.reindex(labels, axis=0)? That seems perfectly valid?

Contributor Author:
It's valid, just the axis=0 is redundant. Likewise with columns=labels, axis=1 (I currently raise). Happy to adjust that though.

Member:
Those are two different things, because you can specify either mapper/axis or index/columns.
df.reindex(labels, axis=0) is a case of mapper/axis and is thus perfectly valid (it is even more explicit than leaving out axis=0; since 0 is the default anyway, I don't think we should raise on this), while df.reindex(columns=labels, axis=1) is a mixture of columns and axis, so it is OK to raise on that.

Member:
df.reindex(index=labels, axis=0) is the trickier case, as this one is indeed redundant and mixes the two styles. So in principle I would also raise here, as for df.reindex(columns=labels, axis=1).
But if it is easier implementation-wise to allow it, I think that is OK (although it mixes both idioms, it is at least consistent about which axis is meant, unlike e.g. df.reindex(index=labels, axis=1)).

Contributor Author:
Sorry, #17800 (comment) was incorrect. .reindex(labels, axis=0) should clearly work!

Contributor Author (@TomAugspurger, Oct 6, 2017):
I do raise on the "mixing" case, so I think we agree on what should happen? And I think the current implementation does that. I'll ensure there are tests for all this.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

In [3]: df.reindex([0, 1], axis=0)
Out[3]:
   A  B
0  1  4
1  2  5

In [4]: df.reindex(index=[0, 1], axis=0)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-d6f30fd70cc7> in <module>()
----> 1 df.reindex(index=[0, 1], axis=0)

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py in reindex(self, labels, index, columns, axis, **kwargs)
   2945         index, columns = self._validate_axis_style_args(labels, 'labels',
   2946                                                         index, columns,
-> 2947                                                         axis, 'reindex')
   2948         return super(DataFrame, self).reindex(index=index, columns=columns,
   2949                                               **kwargs)

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py in _validate_axis_style_args(self, arg, arg_name, index, columns, axis, method_name)
   2788                     "\t.{method_name}.rename(index=index, columns=columns)"
   2789                 ).format(arg_name=arg_name, method_name=method_name)
-> 2790                 raise TypeError(msg)
   2791             if axis == 'index':
   2792                 index = arg

TypeError: Can't specify both 'axis' and 'index' or 'columns'. Specify either
        .reindex.rename(labels, axis=axis), or
        .reindex.rename(index=index, columns=columns)

In [5]: df.reindex(columns=[0, 1], axis=1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-02a868f92f79> in <module>()
----> 1 df.reindex(columns=[0, 1], axis=1)

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py in reindex(self, labels, index, columns, axis, **kwargs)
   2945         index, columns = self._validate_axis_style_args(labels, 'labels',
   2946                                                         index, columns,
-> 2947                                                         axis, 'reindex')
   2948         return super(DataFrame, self).reindex(index=index, columns=columns,
   2949                                               **kwargs)

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py in _validate_axis_style_args(self, arg, arg_name, index, columns, axis, method_name)
   2788                     "\t.{method_name}.rename(index=index, columns=columns)"
   2789                 ).format(arg_name=arg_name, method_name=method_name)
-> 2790                 raise TypeError(msg)
   2791             if axis == 'index':
   2792                 index = arg

TypeError: Can't specify both 'axis' and 'index' or 'columns'. Specify either
        .reindex.rename(labels, axis=axis), or
        .reindex.rename(index=index, columns=columns)

            # Both index and columns should be None then
            axis = self._get_axis_name(axis)
            if index is not None or columns is not None:
                msg = (
                    "Can't specify both 'axis' and 'index' or 'columns'. "
                    "Specify either\n"
                    "\t.{method_name}.rename({arg_name}, axis=axis), or\n"
                    "\t.{method_name}.rename(index=index, columns=columns)"
                ).format(arg_name=arg_name, method_name=method_name)
                raise TypeError(msg)
            if axis == 'index':
                index = arg
            elif axis == 'columns':
                columns = arg

        elif _all_not_none(arg, index, columns):
            msg = (
                "Cannot specify all of '{arg_name}', 'index', and 'columns'. "
                "Specify either {arg_name} and 'axis', or 'index' and "
                "'columns'."
            ).format(arg_name=arg_name)
            raise TypeError(msg)

        elif _all_not_none(arg, index):
            # This is the "ambiguous" case, so emit a warning
Contributor:
Maybe worth factoring this function out if it's common with the drop changes? Not sure.

Contributor:
Maybe put it in pandas/util/_validators.py with all the other argument validation code.

            msg = (
                "Interpreting call to '.{method_name}(a, b)' as "
                "'.{method_name}(index=a, columns=b)'. "
                "Use keyword arguments to remove any ambiguity."
            ).format(method_name=method_name)
            warnings.warn(msg, stacklevel=3)
            index, columns = arg, index
        elif index is None:
            # This is for the default axis, like reindex([0, 1])
            index = arg
        return index, columns

    @property
    def _series(self):
        result = {}
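
To make the branches of the validation helper easier to follow, here is a standalone sketch of the same decision logic in plain Python, with no pandas dependency. The function name ``resolve_axis_style_args`` and the shortened messages are assumptions made for illustration; the diff's method additionally formats messages with ``arg_name`` and ``method_name`` and normalises the axis through ``self._get_axis_name``.

```python
import warnings


def resolve_axis_style_args(arg, index=None, columns=None, axis=None):
    """Hypothetical standalone version of the branching in the diff.

    Returns the ``(index, columns)`` pair that the keyword-style
    implementation would receive.
    """
    if axis is not None:
        # "axis" style: index and columns must both be left unset.
        if index is not None or columns is not None:
            raise TypeError(
                "Can't specify both 'axis' and 'index' or 'columns'."
            )
        # Stand-in for self._get_axis_name: map 0/1 to axis names.
        axis_name = {0: "index", 1: "columns"}.get(axis, axis)
        if axis_name == "index":
            index = arg
        elif axis_name == "columns":
            columns = arg
        else:
            raise ValueError("No axis named {!r}".format(axis))
    elif arg is not None and index is not None and columns is not None:
        raise TypeError("Cannot specify all of 'arg', 'index', and 'columns'.")
    elif arg is not None and index is not None:
        # Ambiguous two-positional call: treat it as (index=a, columns=b).
        warnings.warn(
            "Interpreting call as (index=a, columns=b). "
            "Use keyword arguments to remove any ambiguity.",
            stacklevel=2,
        )
        index, columns = arg, index
    elif index is None:
        # Default axis, like reindex([0, 1]).
        index = arg
    return index, columns
```

For example, ``resolve_axis_style_args([0, 1], axis=0)`` resolves to the index axis, while passing both ``index=`` and ``axis=`` raises ``TypeError``, which is the "mixing" case discussed in the review thread.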
@@ -2902,7 +2950,11 @@ def align(self, other, join='outer', axis=None, level=None, copy=True,
broadcast_axis=broadcast_axis)

    @Appender(_shared_docs['reindex'] % _shared_doc_kwargs)
    def reindex(self, index=None, columns=None, **kwargs):
    def reindex(self, labels=None, index=None, columns=None, axis=None,
                **kwargs):
        index, columns = self._validate_axis_style_args(labels, 'labels',
                                                        index, columns,
                                                        axis, 'reindex')
        return super(DataFrame, self).reindex(index=index, columns=columns,
                                              **kwargs)

@@ -2914,8 +2966,84 @@ def reindex_axis(self, labels, axis=0, method=None, level=None, copy=True,
method=method, level=level, copy=copy,
limit=limit, fill_value=fill_value)

    @Appender(_shared_docs['rename'] % _shared_doc_kwargs)
    def rename(self, index=None, columns=None, **kwargs):
    def rename(self, mapper=None, index=None, columns=None, axis=None,
               **kwargs):
        """Alter axes labels.

        Function / dict values must be unique (1-to-1). Labels not contained in
        a dict / Series will be left as-is. Extra labels listed don't throw an
        error.

        See the :ref:`user guide <basics.rename>` for more.

        Parameters
        ----------
        mapper, index, columns : dict-like or function, optional
            Dict-like or function transformations to apply to
            that axis' values. Use either ``mapper`` and ``axis`` to
            specify the axis to target with ``mapper``, or ``index`` and
            ``columns``.
        axis : int or str, optional
            Axis to target with ``mapper``. Can be either the axis name
            ('index', 'columns') or number (0, 1). The default is 'index'.
        copy : boolean, default True
            Also copy underlying data
        inplace : boolean, default False
            Whether to return a new DataFrame. If True then value of copy is
            ignored.
        level : int or level name, default None
            In case of a MultiIndex, only rename labels in the specified
            level.

        Returns
        -------
        renamed : DataFrame

        See Also
        --------
        pandas.DataFrame.rename_axis

        Examples
        --------

        ``DataFrame.rename`` supports two calling conventions:

        * ``(index=index_mapper, columns=columns_mapper, ...)``
        * ``(mapper, axis={'index', 'columns'}, ...)``

        We *highly* recommend using keyword arguments to clarify your
        intent.

        >>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
        >>> df.rename(index=str, columns={"A": "a", "B": "c"})
           a  c
        0  1  4
        1  2  5
        2  3  6

        >>> df.rename(index=str, columns={"A": "a", "C": "c"})
           a  B
        0  1  4
        1  2  5
        2  3  6

        Using axis-style parameters

        >>> df.rename(str.lower, axis='columns')
           a  b
        0  1  4
        1  2  5
        2  3  6

        >>> df.rename({1: 2, 2: 4}, axis='index')
           A  B
        0  1  4
        2  2  5
        4  3  6
        """
        index, columns = self._validate_axis_style_args(mapper, 'mapper',
                                                        index, columns,
                                                        axis, 'rename')
        return super(DataFrame, self).rename(index=index, columns=columns,
                                             **kwargs)
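
The docstring's claims about partial and extra mappings can be verified with a short sketch. It assumes a pandas release that includes the ``axis`` keyword and reuses the small frame from the docstring examples.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# "B" is not in the mapping, so it keeps its name; "C" is not a column,
# so the extra entry is ignored rather than raising.
partial = df.rename({"A": "a", "C": "c"}, axis="columns")

# Axis-style rename on the index: only labels 1 and 2 are remapped.
reindexed_labels = df.rename({1: 2, 2: 4}, axis="index")

# With the default inplace=False, the original frame is left untouched.
original_columns = list(df.columns)
```

Here ``partial`` has columns ``a`` and ``B``, and ``reindexed_labels`` has index labels ``0``, ``2`` and ``4``, matching the docstring output.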
