-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix and test index division by zero #19347
Conversation
|
||
try: | ||
# apply if we have an override | ||
if step: | ||
with np.errstate(all='ignore'): | ||
rstep = step(self._step, other) | ||
rstep = step(left._step, right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side-note: This is still not great; left
may no longer be self
, so this may raise an AttributeError. The existing method specifically catches AttributeError so I guess this is intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use getattr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getattr(left, '_step', left)
and remove the AttributeError catching
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getattr(left, '_stop', left)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and remove the AttributeError catching
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope. Until I look at this closely, I'm assuming that the author had a reason for doing it this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok fair enough
pandas/tests/indexes/test_numeric.py
Outdated
result = op(idx, zero) | ||
tm.assert_index_equal(result, expected) | ||
ser_compat = op(Series(idx).astype('i8'), np.array(zero).astype('i8')) | ||
tm.assert_series_equal(ser_compat, Series(result)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a "check that this matches the Series behavior" check. Note that because the Series implementation does not yet get this right for all dtypes, we are casting the Series version to int64 and checking that it matches the one version that does consistently work. Once the Series methods are fixed, this casting can be removed.
pandas/core/indexes/base.py
Outdated
Parameters | ||
---------- | ||
op : function (operator.add, operator.div, ...) | ||
left : object, usually Index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usually Index means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That the input left
is going to be an Index object most of the time, but could be an arbitrary object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know what it means, I am asking to have you elaborate in the note on these cases, IOW why would be it be an arbitrary object, eg.. when the op is reversed?
pandas/core/indexes/base.py
Outdated
|
||
Returns | ||
------- | ||
result : ndarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function is better in missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
pandas/core/missing.py
Outdated
@@ -645,6 +645,44 @@ def fill_zeros(result, x, y, name, fill): | |||
return result | |||
|
|||
|
|||
def mask_zero_div_zero(x, y, result): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not dissimilar from fill_zeros which is not used very much at all. maybe can consolidate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First attempt was to edit fill_zeros to make it work, but I ended up changing it more than I thought appropriate. I'll take another look.
pandas/core/missing.py
Outdated
|
||
Returns | ||
------- | ||
filled_result : ndarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the semantics you have here are not explained well. you are mutating result in-place, then returning it. is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll flesh it out. The semantics were meant to follow fill_zeros as closely as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok some more expl would be helpful. filling result is ok, you just have to make this really clear.
pandas/tests/indexes/test_numeric.py
Outdated
@@ -17,6 +17,11 @@ | |||
|
|||
from pandas.tests.indexes.common import Base | |||
|
|||
# For testing division by (or of) zero for Series with length 5, this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should be fixtures
pandas/util/testing.py
Outdated
@@ -1964,6 +1964,32 @@ def add_nans_panel4d(panel4d): | |||
return panel4d | |||
|
|||
|
|||
def gen_zeros(arr_len): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to conftest as its just a fixture
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be re-used in Series tests a few PRs from now. Which conftest should it go into?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can put it in the top-level one pandas/conftest.py I would really like to move more things there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we pass the arr_len
arg to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't you move this as a function, then create as a fixture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I move the function to conftest, then from test_numeric create a fixture like (?):
@pytest.fixture(params=[x for x in gen_zeros(5)
if not isinstance(x, Series)])
def zero(request):
# For testing division by (or of) zero for Series with length 5, this
# gives several scalar-zeros and length-5 vector-zeros
return request.param
Do we import conftest? I haven't seen that elsewhere. What's the upside of this over just having zeros
in the module namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. you don't need to import conftest. you can put this in the module only if its only used there. the point of conftest is to share.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is only used in 1 place, just move it to the test file itself and make it into a fixture
this needs a sub-section in whatsnew to show what has changed (e.g. Int and Range Index with / 0 and such) |
Codecov Report
@@ Coverage Diff @@
## master #19347 +/- ##
==========================================
- Coverage 91.69% 91.67% -0.03%
==========================================
Files 148 148
Lines 48561 48586 +25
==========================================
+ Hits 44530 44543 +13
- Misses 4031 4043 +12
Continue to review full report at Codecov.
|
doc/source/whatsnew/v0.23.0.txt
Outdated
Index Division By Zero Fills Correctly | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Division operations on ``Index`` and subclasses will now fill positive / 0 with ``np.inf``, negative / 0 with ``-np.inf``, and 0 / 0 with ``np.nan``. This matches existing Series behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double backticks on Series. Add the issue number, can be multiple (e.g. this PR number and the xref issue)
doc/source/whatsnew/v0.23.0.txt
Outdated
In [6]: index = pd.UInt64Index([0, 1]) | ||
In [7]: index / np.array([0, 0], dtype=np.uint64) | ||
Out[7]: UInt64Index([0, 0], dtype='uint64') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switch the order of these blocks, we always write previous, then current.
doc/source/whatsnew/v0.23.0.txt
Outdated
|
||
Current Behavior: | ||
|
||
.. code-block:: ipython |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current should always be ipython, so the code executes.
doc/source/whatsnew/v0.23.0.txt
Outdated
|
||
Previous Behavior: | ||
|
||
.. code-block:: ipython |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code blocks here are fine, but you need to actually show the executed code as in an ipython session.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not totally clear on the difference. I'll be pushing shortly with fixes to most other comments and a best-guess as to these.
pandas/core/missing.py
Outdated
|
||
Returns | ||
------- | ||
filled_result : ndarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok some more expl would be helpful. filling result is ok, you just have to make this really clear.
pandas/util/testing.py
Outdated
@@ -1964,6 +1964,32 @@ def add_nans_panel4d(panel4d): | |||
return panel4d | |||
|
|||
|
|||
def gen_zeros(arr_len): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't you move this as a function, then create as a fixture.
doc/source/whatsnew/v0.23.0.txt
Outdated
|
||
.. code-block:: ipython | ||
|
||
index = pd.Index([-1, 0, 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a code-block needs to have a complete section copy pasted. refer to the sphinx docs
doc/source/whatsnew/v0.23.0.txt
Outdated
|
||
.. ipython:: python | ||
|
||
index = pd.Index([-1, 0, 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
break this into 2 blocks and give some comments
results = op(self, other) | ||
return Index(results, **attrs) | ||
except (ValueError, TypeError, AttributeError, | ||
ZeroDivisionError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when does this raise ZeroDivisionError? isn't he point of the errstate to catch that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when does this raise ZeroDivisionError?
ATM pd.RangeIndex(0, 5) / 0
raises ZeroDivisionError
.
isn't he point of the errstate to catch that?
I think errstate is to suppress warnings, but not really sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, that seems wrong, but I guess can cover later
|
||
try: | ||
# apply if we have an override | ||
if step: | ||
with np.errstate(all='ignore'): | ||
rstep = step(self._step, other) | ||
rstep = step(left._step, right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use getattr
pandas/util/testing.py
Outdated
@@ -1964,6 +1964,32 @@ def add_nans_panel4d(panel4d): | |||
return panel4d | |||
|
|||
|
|||
def gen_zeros(arr_len): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. you don't need to import conftest. you can put this in the module only if its only used there. the point of conftest is to share.
... isn't that also the point of pandas.util.testing? |
Huh? |
fastparquet errors on appveyor |
doc/source/whatsnew/v0.23.0.txt
Outdated
In [10]: pd.RangeIndex(1, 5) / 0 | ||
--------------------------------------------------------------------------- | ||
ZeroDivisionError Traceback (most recent call last) | ||
<ipython-input-10-4c5e91d516f3> in <module>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the while traceback with line-number, just the error line (213) itself)
doc/source/whatsnew/v0.23.0.txt
Outdated
|
||
.. code-block:: ipython | ||
|
||
In [6]: index = pd.Index([-1, 0, 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create this an an Int64Index
doc/source/whatsnew/v0.23.0.txt
Outdated
index = pd.Index([-1, 0, 1]) | ||
index / 0 | ||
|
||
# The result of division by zero should not depend on whether the zero is int or float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this case (or add above as well). the point of the previous/current display is to make it easy to map up what is changing.
|
||
try: | ||
# apply if we have an override | ||
if step: | ||
with np.errstate(all='ignore'): | ||
rstep = step(self._step, other) | ||
rstep = step(left._step, right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getattr(left, '_stop', left)
|
||
try: | ||
# apply if we have an override | ||
if step: | ||
with np.errstate(all='ignore'): | ||
rstep = step(self._step, other) | ||
rstep = step(left._step, right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and remove the AttributeError catching
nan_mask = (zmask & (x == 0)).ravel() | ||
neginf_mask = (zmask & (x < 0)).ravel() | ||
posinf_mask = (zmask & (x > 0)).ravel() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add some comments here
pandas/tests/indexes/test_numeric.py
Outdated
@@ -17,6 +17,11 @@ | |||
|
|||
from pandas.tests.indexes.common import Base | |||
|
|||
# For testing division by (or of) zero for Series with length 5, this | |||
# gives several scalar-zeros and length-5 vector-zeros | |||
zeros = tm.gen_zeros(5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just make this a fixture, then you don't need to add a parameterize on each function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do this
doc/source/whatsnew/v0.23.0.txt
Outdated
Index Division By Zero Fills Correctly | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Division operations on ``Index`` and subclasses will now fill positive / 0 with ``np.inf``, negative / 0 with ``-np.inf``, and 0 / 0 with ``np.nan``. This matches existing ``Series`` behavior. (:issue:`19322`, :issue:`19347`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a confusing sentence, its not obvious that 'postive / 0' mean a positive number divided by zero. maybe use double-backticks around positive / 0
|
||
try: | ||
# apply if we have an override | ||
if step: | ||
with np.errstate(all='ignore'): | ||
rstep = step(self._step, other) | ||
rstep = step(left._step, right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do this
pandas/tests/indexes/test_numeric.py
Outdated
@@ -17,6 +17,11 @@ | |||
|
|||
from pandas.tests.indexes.common import Base | |||
|
|||
# For testing division by (or of) zero for Series with length 5, this | |||
# gives several scalar-zeros and length-5 vector-zeros | |||
zeros = tm.gen_zeros(5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do this
pandas/util/testing.py
Outdated
@@ -1964,6 +1964,32 @@ def add_nans_panel4d(panel4d): | |||
return panel4d | |||
|
|||
|
|||
def gen_zeros(arr_len): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is only used in 1 place, just move it to the test file itself and make it into a fixture
Made |
pandas/tests/indexes/test_numeric.py
Outdated
@@ -18,6 +18,16 @@ | |||
from pandas.tests.indexes.common import Base | |||
|
|||
|
|||
# For testing division by (or of) zero for Series with length 5, this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you misunderstand what I want here
# gives several scalar-zeros and length-5 vector-zeros
zeros = [box([0] * 5, dtype=dtype)
for box in [pd.Index, np.array]
for dtype in [np.int64, np.uint64, np.float64]]
zeros.extend([np.array(0, dtype=dtype)
for dtype in [np.int64, np.uint64, np.float64]])
zeros.extend([0, 0.0, long(0)])
@pytest.mark.fixture(params=zeros)
def zero(request):
return request.param
then remove the @pytest.mark.parmetrize
on all the test functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous comment:
Made zeros a fixture in test_numeric as requested, but that broke tests in test_range b/c of pytest magic
AFAICT the problem is caused by the fixture being in the namespace for test_numeric but not for test_range, which has a class that inherits from the test_numeric.Numeric (which defines the relevant tests). I guess we could get around that by putting this in a shared conftest file, but all that accomplishes is making it harder for a reader to find out where things are defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so move it to the conftest, this is quite standard practice and avoids code duplication.
pandas/tests/indexes/test_numeric.py
Outdated
@@ -18,6 +18,16 @@ | |||
from pandas.tests.indexes.common import Base | |||
|
|||
|
|||
# For testing division by (or of) zero for Series with length 5, this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so move it to the conftest, this is quite standard practice and avoids code duplication.
@jreback I think the fixtures are all as requested. LMK. |
|
||
In [7]: index / 0 | ||
Out[7]: Int64Index([0, 0, 0], dtype='int64') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check this after its built to make sure it looks ok.
>>> result # raw numpy result does not fill division by zero | ||
array([0, 0, 0]) | ||
>>> mask_zero_div_zero(x, y, result) | ||
array([ inf, nan, -inf]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add to the list to consolidate this module (e.g. the existing fill_zeros)
thanks. a couple of comments for the future. |
Related: pandas-dev#19336 Author: Brock Mendel <[email protected]> Closes pandas-dev#19347 from jbrockmendel/div_zero2 and squashes the following commits: be1e2e1 [Brock Mendel] move fixture to conftest 64b0c08 [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2 aa969f8 [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2 000aefd [Brock Mendel] fix long again 9de356a [Brock Mendel] revert fixture to fix test_range failures b8cf21d [Brock Mendel] flake8 remove unused import afedba9 [Brock Mendel] whatsnew clarification b51c2e1 [Brock Mendel] fixturize 37efd51 [Brock Mendel] make zero a fixture 965f721 [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2 d648ef6 [Brock Mendel] requested edits 1ef3a6c [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2 78de1a4 [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2 0277d9f [Brock Mendel] add ipython output to whatsnew 5d7e3ea [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2 ea75c3c [Brock Mendel] ipython block 6fc61bd [Brock Mendel] elaborate docstring ca3bf42 [Brock Mendel] Whatsnew section cd54349 [Brock Mendel] move dispatch_missing to core.missing 06df02a [Brock Mendel] py3 fix 84c74c5 [Brock Mendel] remove operator.div for py3 6acc2f7 [Brock Mendel] fix missing import e0e89b9 [Brock Mendel] fix and and tests for divmod 969f342 [Brock Mendel] fix and test index division by zero
Related: #19336, this implements the most important parts of the parametrization requested there.
git diff upstream/master -u -- "*.py" | flake8 --diff