Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix IntervalTree handling of NaN #23353

Merged
merged 2 commits into from
Oct 30, 2018
Merged

Conversation

jschendel
Copy link
Member

xref #23327 : As far as I can tell the only reason that PR is failing is because of tests raising a RuntimeWarning due to this bug.

@jschendel jschendel added Bug Interval Interval data type labels Oct 26, 2018
@jschendel jschendel added this to the 0.24.0 milestone Oct 26, 2018
@pep8speaks
Copy link

Hello @jschendel! Thanks for submitting the PR.

@mroeschke
Copy link
Member

Does this apply to NaT as well?

@jschendel
Copy link
Member Author

Does this apply to NaT as well?

IntervalIndex performs an i8 conversion on datetime/timedelta data prior to passing it to IntervalTree, as IntervalTree only supports numeric dtypes, so only need to worry about np.nan in IntervalTree. As part of the i8 conversion NaT is explicitly converted to np.nan instead of NaT's integer value (implemented in #23327 but could maybe be moved here), as it's integer value can be mistakenly interpreted as non-null data once everything's been converted to integers.

@codecov
Copy link

codecov bot commented Oct 26, 2018

Codecov Report

Merging #23353 into master will decrease coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23353      +/-   ##
==========================================
- Coverage   92.22%   92.18%   -0.05%     
==========================================
  Files         169      161       -8     
  Lines       51258    51160      -98     
==========================================
- Hits        47274    47161     -113     
- Misses       3984     3999      +15
Flag Coverage Δ
#multiple 90.61% <ø> (-0.05%) ⬇️
#single 42.24% <ø> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/util/_depr_module.py 0% <0%> (-65.12%) ⬇️
pandas/core/indexes/period.py 91.04% <0%> (-1.24%) ⬇️
pandas/core/dtypes/common.py 94.37% <0%> (-0.49%) ⬇️
pandas/core/dtypes/cast.py 88.92% <0%> (-0.31%) ⬇️
pandas/core/arrays/timedeltas.py 94.3% <0%> (-0.18%) ⬇️
pandas/core/nanops.py 95.05% <0%> (-0.15%) ⬇️
pandas/core/arrays/period.py 97.48% <0%> (-0.09%) ⬇️
pandas/core/series.py 93.89% <0%> (-0.03%) ⬇️
pandas/io/pytables.py 92.43% <0%> (-0.01%) ⬇️
pandas/core/generic.py 96.79% <0%> (-0.01%) ⬇️
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 437f31c...552accb. Read the comment docs.

# GH 23352
left, right = [0, 1, 2, np.nan], [1, 2, 3, np.nan]
with tm.assert_produces_warning(None):
IntervalTree(left, right, leaf_size=2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exactly does this produce though? are the nan's valid? you are just dropping them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent here was to verify that the RuntimeWarning described in the issue is not occurring. I guess this test isn't really necessary, as at least one CI instance is set to fail if warnings are raised. Removed this test, and instead parametrized the tree fixture to include leaf_size that will trigger the warning and cause the invalid behavior described in the issue (so beyond just a warning would cause the asserts to fail if there's a regression).

@jreback jreback merged commit 353a0f9 into pandas-dev:master Oct 30, 2018
@jschendel jschendel deleted the ivtree-nan branch October 30, 2018 14:36
@jreback
Copy link
Contributor

jreback commented Oct 31, 2018

this broke 32 bit builds
https://travis-ci.org/MacPython/pandas-wheels/jobs/448674095

can u have a look @jschendel

@jschendel
Copy link
Member Author

@jreback : it looks like this 32 bit failure is something that's come up before and is currently explicitly skipped in the other test where it occurs:

@pytest.mark.skipif(compat.is_platform_32bit(),
reason="int type mismatch on 32bit")
@pytest.mark.parametrize('leaf_size', [1, 10, 100, 10000])
def test_get_indexer_closed(self, closed, leaf_size):

This was done before my time, so not sure about the exact history here.

My plan is to similarly skip the newly written tests and open a relevant issue to formally document this. It doesn't seem particularly pressing to fix since this has been present for a while, only impacts 32 bit, and there haven't been any issues opened. Can look into fixing it if I'm mistaken, though I don't immediately see what needs to be done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Interval Interval data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: IntervalTree should not have NaN in nodes
4 participants