Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API/COMPAT: support axis=None for logical reduction (reduce over all axes) #21486

Merged
merged 10 commits into from
Jun 26, 2018

Conversation

TomAugspurger
Copy link
Contributor

This is the minimal fix, just to get np.all / np.any working again.

Some followup items:

  1. Expand to all aggregations, not just logical ones
  2. Do logical reductions blockwiwse: DataFrame.any(axis={0, 1}) returns inconsistent values for non-zero timedelta #17667. Currently, we do DataFrame.values, which isn't necessary for logical reductions.

Accepts axis=None as reduce all dims
@TomAugspurger TomAugspurger added Numeric Operations Arithmetic, Comparison, and Logical operations Compat pandas objects compatability with Numpy or Python functions labels Jun 14, 2018
@TomAugspurger TomAugspurger added this to the 0.23.2 milestone Jun 14, 2018


This also provides compatibility with NumPy 1.15, which now dispatches to ``DataFrame.all``.
With NumPy 1.15 and pandas 0.23.1 or earlier, :func:`numpy.all` will not reduce over every axis:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not -> no longer ?

B False
dtype: bool

With pandas 0.23.2, that will correctly return False.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add ", as it did before with numpy < 1.15" ?

@@ -9055,8 +9057,16 @@ def _doc_parms(cls):

Parameters
----------
axis : int, default 0
Select the axis which can be 0 for indices and 1 for columns.
axis : {None, 0 or 'index', 1 or 'columns'}, default None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think default None is correct?
The default for our method is still 0? It's only when doing np.all(..) that we respect the axis=None ?

@@ -3238,6 +3238,8 @@ def _reduce(self, op, name, axis=0, skipna=True, numeric_only=None,

"""
delegate = self._values
if axis is None:
axis = self._stat_axis_number
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this still needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you have a look at this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed this earlier. We still call self._get_axis_number(axis) to validate that the user passed axis is correct. I'll modify things to be clearer.

if skipna is None:
skipna = True
if axis is None:
axis = self._stat_axis_number
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't there be an override of this in case of Panel? (which is the only case where the stat_axis differs from 0)

@@ -6859,6 +6864,13 @@ def f(x):
try:
values = self.values
result = f(values)

if (filter_type == 'bool' and values.dtype.kind == 'O' and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use is_object_dtype

@@ -1159,11 +1159,34 @@ def test_any_all(self):
self._check_bool_op('any', np.any, has_skipna=True, has_bool_only=True)
self._check_bool_op('all', np.all, has_skipna=True, has_bool_only=True)

df = DataFrame(randn(10, 4)) > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make a new test function here

@jorisvandenbossche jorisvandenbossche changed the title Logical agg API/COMPAT: support axis=None for logical reduction (reduce over all axes) Jun 21, 2018
@jorisvandenbossche
Copy link
Member

Tests are still failing: np.any/np.all don't seem to work on timedelta data (at least for certain versions of numpy, locally for me it is failing on 1.13)

@codecov
Copy link

codecov bot commented Jun 21, 2018

Codecov Report

Merging #21486 into master will decrease coverage by 0.02%.
The diff coverage is 85.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21486      +/-   ##
==========================================
- Coverage   91.92%    91.9%   -0.03%     
==========================================
  Files         153      153              
  Lines       49563    49562       -1     
==========================================
- Hits        45559    45548      -11     
- Misses       4004     4014      +10
Flag Coverage Δ
#multiple 90.3% <85.71%> (-0.03%) ⬇️
#single 41.77% <53.57%> (-0.04%) ⬇️
Impacted Files Coverage Δ
pandas/util/_test_decorators.py 92.68% <100%> (+0.18%) ⬆️
pandas/core/generic.py 96.21% <100%> (+0.08%) ⬆️
pandas/core/series.py 94.19% <100%> (ø) ⬆️
pandas/core/panel.py 97.43% <62.5%> (-0.44%) ⬇️
pandas/core/frame.py 97.19% <90.9%> (-0.05%) ⬇️
pandas/util/testing.py 85.27% <0%> (-0.7%) ⬇️
pandas/core/indexing.py 93.37% <0%> (-0.05%) ⬇️
pandas/tseries/offsets.py 97.16% <0%> (-0.04%) ⬇️
pandas/core/dtypes/common.py 94.63% <0%> (-0.02%) ⬇️
pandas/core/indexes/multi.py 94.97% <0%> (-0.01%) ⬇️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b36b451...ae759bd. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Jun 22, 2018

I think you need to change

    @pytest.mark.xfail(
        not _np_version_under1p15,
        reason="failing under numpy-dev gh-19976")
    @pytest.mark.parametrize("axis", [0, 1, None])
    def test_clip_against_frame(self, axis):
        df = DataFrame(np.random.randn(1000, 2))

in pandas/tests/frame/test_analytics.py which was the original failing test, to use your new decorator?

otherwise lgtm.

@TomAugspurger
Copy link
Contributor Author

Good catch. I've removed the xfail entirely, since I think it will pass on all versions now.

@jorisvandenbossche jorisvandenbossche merged commit f7ed7f8 into pandas-dev:master Jun 26, 2018
jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Jul 2, 2018
…axes) (pandas-dev#21486)

* Compat with NumPy 1.15 logical func
* Accepts axis=None as reduce all dims

(cherry picked from commit f7ed7f8)
jorisvandenbossche pushed a commit that referenced this pull request Jul 5, 2018
…axes) (#21486)

* Compat with NumPy 1.15 logical func
* Accepts axis=None as reduce all dims

(cherry picked from commit f7ed7f8)
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
…axes) (pandas-dev#21486)

* Compat with NumPy 1.15 logical func
* Accepts axis=None as reduce all dims
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI: NumPy logical reductions (any, all) fail on DataFrame with NumPy master
3 participants