Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix and test index division by zero #19347

Closed
wants to merge 24 commits into from

Conversation

jbrockmendel
Copy link
Member

Related: #19336, this implements the most important parts of the parametrization requested there.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry


try:
# apply if we have an override
if step:
with np.errstate(all='ignore'):
rstep = step(self._step, other)
rstep = step(left._step, right)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side-note: This is still not great; left may no longer be self, so this may raise an AttributeError. The existing method specifically catches AttributeError so I guess this is intentional.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use getattr

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getattr(left, '_step', left) and remove the AttributeError catching

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getattr(left, '_stop', left)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and remove the AttributeError catching

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope. Until I look at this closely, I'm assuming that the author had a reason for doing it this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok fair enough

result = op(idx, zero)
tm.assert_index_equal(result, expected)
ser_compat = op(Series(idx).astype('i8'), np.array(zero).astype('i8'))
tm.assert_series_equal(ser_compat, Series(result))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a "check that this matches the Series behavior" check. Note that because the Series implementation does not yet get this right for all dtypes, we are casting the Series version to int64 and checking that it matches the one version that does consistently work. Once the Series methods are fixed, this casting can be removed.

Parameters
----------
op : function (operator.add, operator.div, ...)
left : object, usually Index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually Index means?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That the input left is going to be an Index object most of the time, but could be an arbitrary object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know what it means, I am asking to have you elaborate in the note on these cases, IOW why would be it be an arbitrary object, eg.. when the op is reversed?


Returns
-------
result : ndarray
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function is better in missing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

@@ -645,6 +645,44 @@ def fill_zeros(result, x, y, name, fill):
return result


def mask_zero_div_zero(x, y, result):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not dissimilar from fill_zeros which is not used very much at all. maybe can consolidate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First attempt was to edit fill_zeros to make it work, but I ended up changing it more than I thought appropriate. I'll take another look.


Returns
-------
filled_result : ndarray
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the semantics you have here are not explained well. you are mutating result in-place, then returning it. is that intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll flesh it out. The semantics were meant to follow fill_zeros as closely as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok some more expl would be helpful. filling result is ok, you just have to make this really clear.

@@ -17,6 +17,11 @@

from pandas.tests.indexes.common import Base

# For testing division by (or of) zero for Series with length 5, this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be fixtures

@@ -1964,6 +1964,32 @@ def add_nans_panel4d(panel4d):
return panel4d


def gen_zeros(arr_len):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to conftest as its just a fixture

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be re-used in Series tests a few PRs from now. Which conftest should it go into?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can put it in the top-level one pandas/conftest.py I would really like to move more things there

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we pass the arr_len arg to it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't you move this as a function, then create as a fixture.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I move the function to conftest, then from test_numeric create a fixture like (?):

@pytest.fixture(params=[x for x in gen_zeros(5)
                                          if not isinstance(x, Series)])
def zero(request):
    # For testing division by (or of) zero for Series with length 5, this
    # gives several scalar-zeros and length-5 vector-zeros
    return request.param

Do we import conftest? I haven't seen that elsewhere. What's the upside of this over just having zeros in the module namespace?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. you don't need to import conftest. you can put this in the module only if its only used there. the point of conftest is to share.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is only used in 1 place, just move it to the test file itself and make it into a fixture

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations labels Jan 24, 2018
@jreback
Copy link
Contributor

jreback commented Jan 24, 2018

this needs a sub-section in whatsnew to show what has changed (e.g. Int and Range Index with / 0 and such)

@codecov
Copy link

codecov bot commented Jan 24, 2018

Codecov Report

Merging #19347 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19347      +/-   ##
==========================================
- Coverage   91.69%   91.67%   -0.03%     
==========================================
  Files         148      148              
  Lines       48561    48586      +25     
==========================================
+ Hits        44530    44543      +13     
- Misses       4031     4043      +12
Flag Coverage Δ
#multiple 90.04% <100%> (-0.02%) ⬇️
#single 41.71% <22.5%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/range.py 95.65% <100%> (-0.06%) ⬇️
pandas/core/indexes/base.py 96.45% <100%> (ø) ⬆️
pandas/core/missing.py 85.9% <100%> (+1.11%) ⬆️
pandas/plotting/_converter.py 65.22% <0%> (-1.74%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 69cd5fb...be1e2e1. Read the comment docs.

Index Division By Zero Fills Correctly
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Division operations on ``Index`` and subclasses will now fill positive / 0 with ``np.inf``, negative / 0 with ``-np.inf``, and 0 / 0 with ``np.nan``. This matches existing Series behavior.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double backticks on Series. Add the issue number, can be multiple (e.g. this PR number and the xref issue)

In [6]: index = pd.UInt64Index([0, 1])
In [7]: index / np.array([0, 0], dtype=np.uint64)
Out[7]: UInt64Index([0, 0], dtype='uint64')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch the order of these blocks, we always write previous, then current.


Current Behavior:

.. code-block:: ipython
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current should always be ipython, so the code executes.


Previous Behavior:

.. code-block:: ipython
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code blocks here are fine, but you need to actually show the executed code as in an ipython session.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not totally clear on the difference. I'll be pushing shortly with fixes to most other comments and a best-guess as to these.


Returns
-------
filled_result : ndarray
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok some more expl would be helpful. filling result is ok, you just have to make this really clear.

@@ -1964,6 +1964,32 @@ def add_nans_panel4d(panel4d):
return panel4d


def gen_zeros(arr_len):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't you move this as a function, then create as a fixture.


.. code-block:: ipython

index = pd.Index([-1, 0, 1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a code-block needs to have a complete section copy pasted. refer to the sphinx docs


.. ipython:: python

index = pd.Index([-1, 0, 1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

break this into 2 blocks and give some comments

results = op(self, other)
return Index(results, **attrs)
except (ValueError, TypeError, AttributeError,
ZeroDivisionError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when does this raise ZeroDivisionError? isn't he point of the errstate to catch that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when does this raise ZeroDivisionError?

ATM pd.RangeIndex(0, 5) / 0 raises ZeroDivisionError.

isn't he point of the errstate to catch that?

I think errstate is to suppress warnings, but not really sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, that seems wrong, but I guess can cover later


try:
# apply if we have an override
if step:
with np.errstate(all='ignore'):
rstep = step(self._step, other)
rstep = step(left._step, right)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use getattr

@@ -1964,6 +1964,32 @@ def add_nans_panel4d(panel4d):
return panel4d


def gen_zeros(arr_len):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. you don't need to import conftest. you can put this in the module only if its only used there. the point of conftest is to share.

@jbrockmendel
Copy link
Member Author

the point of conftest is to share.

... isn't that also the point of pandas.util.testing?

@jbrockmendel
Copy link
Member Author

use getattr

Huh?

@jbrockmendel jbrockmendel reopened this Jan 25, 2018
@jbrockmendel
Copy link
Member Author

fastparquet errors on appveyor

In [10]: pd.RangeIndex(1, 5) / 0
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-10-4c5e91d516f3> in <module>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need the while traceback with line-number, just the error line (213) itself)


.. code-block:: ipython

In [6]: index = pd.Index([-1, 0, 1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create this an an Int64Index

index = pd.Index([-1, 0, 1])
index / 0

# The result of division by zero should not depend on whether the zero is int or float
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this case (or add above as well). the point of the previous/current display is to make it easy to map up what is changing.


try:
# apply if we have an override
if step:
with np.errstate(all='ignore'):
rstep = step(self._step, other)
rstep = step(left._step, right)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getattr(left, '_stop', left)


try:
# apply if we have an override
if step:
with np.errstate(all='ignore'):
rstep = step(self._step, other)
rstep = step(left._step, right)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and remove the AttributeError catching

nan_mask = (zmask & (x == 0)).ravel()
neginf_mask = (zmask & (x < 0)).ravel()
posinf_mask = (zmask & (x > 0)).ravel()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some comments here

@@ -17,6 +17,11 @@

from pandas.tests.indexes.common import Base

# For testing division by (or of) zero for Series with length 5, this
# gives several scalar-zeros and length-5 vector-zeros
zeros = tm.gen_zeros(5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just make this a fixture, then you don't need to add a parameterize on each function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do this

Index Division By Zero Fills Correctly
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Division operations on ``Index`` and subclasses will now fill positive / 0 with ``np.inf``, negative / 0 with ``-np.inf``, and 0 / 0 with ``np.nan``. This matches existing ``Series`` behavior. (:issue:`19322`, :issue:`19347`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a confusing sentence, its not obvious that 'postive / 0' mean a positive number divided by zero. maybe use double-backticks around positive / 0


try:
# apply if we have an override
if step:
with np.errstate(all='ignore'):
rstep = step(self._step, other)
rstep = step(left._step, right)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do this

@@ -17,6 +17,11 @@

from pandas.tests.indexes.common import Base

# For testing division by (or of) zero for Series with length 5, this
# gives several scalar-zeros and length-5 vector-zeros
zeros = tm.gen_zeros(5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do this

@@ -1964,6 +1964,32 @@ def add_nans_panel4d(panel4d):
return panel4d


def gen_zeros(arr_len):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is only used in 1 place, just move it to the test file itself and make it into a fixture

@jbrockmendel
Copy link
Member Author

Made zeros a fixture in test_numeric as requested, but that broke tests in test_range b/c of pytest magic. Just pushed a commit that reverted to having zeros be a list in test_numeric namespace.

@@ -18,6 +18,16 @@
from pandas.tests.indexes.common import Base


# For testing division by (or of) zero for Series with length 5, this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you misunderstand what I want here

# gives several scalar-zeros and length-5 vector-zeros
 zeros = [box([0] * 5, dtype=dtype)
          for box in [pd.Index, np.array]
          for dtype in [np.int64, np.uint64, np.float64]]
 zeros.extend([np.array(0, dtype=dtype)
              for dtype in [np.int64, np.uint64, np.float64]])
 zeros.extend([0, 0.0, long(0)])

@pytest.mark.fixture(params=zeros)
def zero(request):
    return request.param

then remove the @pytest.mark.parmetrize on all the test functions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment:

Made zeros a fixture in test_numeric as requested, but that broke tests in test_range b/c of pytest magic

AFAICT the problem is caused by the fixture being in the namespace for test_numeric but not for test_range, which has a class that inherits from the test_numeric.Numeric (which defines the relevant tests). I guess we could get around that by putting this in a shared conftest file, but all that accomplishes is making it harder for a reader to find out where things are defined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so move it to the conftest, this is quite standard practice and avoids code duplication.

@@ -18,6 +18,16 @@
from pandas.tests.indexes.common import Base


# For testing division by (or of) zero for Series with length 5, this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so move it to the conftest, this is quite standard practice and avoids code duplication.

@jbrockmendel
Copy link
Member Author

@jreback I think the fixtures are all as requested. LMK.


In [7]: index / 0
Out[7]: Int64Index([0, 0, 0], dtype='int64')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check this after its built to make sure it looks ok.

>>> result # raw numpy result does not fill division by zero
array([0, 0, 0])
>>> mask_zero_div_zero(x, y, result)
array([ inf, nan, -inf])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add to the list to consolidate this module (e.g. the existing fill_zeros)

@jreback jreback added this to the 0.23.0 milestone Feb 6, 2018
@jreback jreback closed this in ed10bf6 Feb 6, 2018
@jreback
Copy link
Contributor

jreback commented Feb 6, 2018

thanks. a couple of comments for the future.

@jbrockmendel jbrockmendel deleted the div_zero2 branch February 6, 2018 04:44
harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
Related: pandas-dev#19336

Author: Brock Mendel <[email protected]>

Closes pandas-dev#19347 from jbrockmendel/div_zero2 and squashes the following commits:

be1e2e1 [Brock Mendel] move fixture to conftest
64b0c08 [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2
aa969f8 [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2
000aefd [Brock Mendel] fix long again
9de356a [Brock Mendel] revert fixture to fix test_range failures
b8cf21d [Brock Mendel] flake8 remove unused import
afedba9 [Brock Mendel] whatsnew clarification
b51c2e1 [Brock Mendel] fixturize
37efd51 [Brock Mendel] make zero a fixture
965f721 [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2
d648ef6 [Brock Mendel] requested edits
1ef3a6c [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2
78de1a4 [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2
0277d9f [Brock Mendel] add ipython output to whatsnew
5d7e3ea [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2
ea75c3c [Brock Mendel] ipython block
6fc61bd [Brock Mendel] elaborate docstring
ca3bf42 [Brock Mendel] Whatsnew section
cd54349 [Brock Mendel] move dispatch_missing to core.missing
06df02a [Brock Mendel] py3 fix
84c74c5 [Brock Mendel] remove operator.div for py3
6acc2f7 [Brock Mendel] fix missing import
e0e89b9 [Brock Mendel] fix and and tests for divmod
969f342 [Brock Mendel] fix and test index division by zero
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants