Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Masking and overflow checks for datetimeindex and timedeltaindex ops #18020

Merged
merged 17 commits into from
Nov 4, 2017
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion pandas/core/indexes/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -679,7 +679,7 @@ def __sub__(self, other):
return self._add_delta(-other)
elif is_integer(other):
return self.shift(-other)
elif isinstance(other, datetime):
elif isinstance(other, (datetime, np.datetime64)):
return self._sub_datelike(other)
elif isinstance(other, Period):
return self._sub_period(other)
Expand Down
6 changes: 4 additions & 2 deletions pandas/core/indexes/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
import pandas.core.dtypes.concat as _concat
from pandas.errors import PerformanceWarning
from pandas.core.common import _values_from_object, _maybe_box
from pandas.core.algorithms import checked_add_with_arr

from pandas.core.indexes.base import Index, _index_shared_docs
from pandas.core.indexes.numeric import Int64Index, Float64Index
Expand Down Expand Up @@ -767,7 +768,7 @@ def _sub_datelike(self, other):
raise TypeError("DatetimeIndex subtraction must have the same "
"timezones or no timezones")
result = self._sub_datelike_dti(other)
elif isinstance(other, datetime):
elif isinstance(other, (datetime, np.datetime64)):
other = Timestamp(other)
if other is libts.NaT:
result = self._nat_new(box=False)
Expand All @@ -777,7 +778,8 @@ def _sub_datelike(self, other):
"timezones or no timezones")
else:
i8 = self.asi8
result = i8 - other.value
result = checked_add_with_arr(i8, -other.value,
arr_mask=self._isnan)
result = self._maybe_mask_results(result,
fill_value=libts.iNaT)
else:
Expand Down
3 changes: 2 additions & 1 deletion pandas/core/indexes/timedeltas.py
Original file line number Diff line number Diff line change
Expand Up @@ -361,7 +361,8 @@ def _add_datelike(self, other):
else:
other = Timestamp(other)
i8 = self.asi8
result = checked_add_with_arr(i8, other.value)
result = checked_add_with_arr(i8, other.value,
arr_mask=self._isnan)
result = self._maybe_mask_results(result, fill_value=iNaT)
return DatetimeIndex(result, name=self.name, copy=False)

Expand Down
34 changes: 34 additions & 0 deletions pandas/tests/indexes/datetimes/test_datetime.py
Original file line number Diff line number Diff line change
Expand Up @@ -447,6 +447,40 @@ def f():
t - offset
pytest.raises(OverflowError, f)

def test_datetimeindex_sub_timestamp_overflow(self):
dtimax = pd.to_datetime(['now', pd.Timestamp.max])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue for these

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no issue for this; I noticed it when tracking down the TimedeltaIndex bug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure there is, the PR number!

dtimin = pd.to_datetime(['now', pd.Timestamp.min])

tsneg = Timestamp('1950-01-01')
ts_neg_variants = [tsneg,
tsneg.to_pydatetime(),
tsneg.to_datetime64().astype('datetime64[ns]'),
tsneg.to_datetime64().astype('datetime64[D]')]

tspos = Timestamp('1980-01-01')
ts_pos_variants = [tspos,
tspos.to_pydatetime(),
tspos.to_datetime64().astype('datetime64[ns]'),
tspos.to_datetime64().astype('datetime64[D]')]

for variant in ts_neg_variants:
with pytest.raises(OverflowError):
dtimax - variant

expected = pd.Timestamp.max.value - tspos.value
for variant in ts_pos_variants:
res = dtimax - variant
assert res[1].value == expected

expected = pd.Timestamp.min.value - tsneg.value
for variant in ts_neg_variants:
res = dtimin - variant
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as below, assert fully the result type.

assert res[1].value == expected

for variant in ts_pos_variants:
with pytest.raises(OverflowError):
dtimin - variant

def test_get_duplicates(self):
idx = DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-02',
'2000-01-03', '2000-01-03', '2000-01-04'])
Expand Down
20 changes: 20 additions & 0 deletions pandas/tests/indexes/timedeltas/test_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -1282,3 +1282,23 @@ def test_add_overflow(self):
result = (to_timedelta([pd.NaT, '5 days', '1 hours']) +
to_timedelta(['7 seconds', pd.NaT, '4 hours']))
tm.assert_index_equal(result, exp)

def test_timedeltaindex_add_timestamp_nat_masking(self):
# GH17991 checking for overflow-masking with NaT
tdinat = pd.to_timedelta(['24658 days 11:15:00', 'NaT'])

tsneg = Timestamp('1950-01-01')
ts_neg_variants = [tsneg,
tsneg.to_pydatetime(),
tsneg.to_datetime64().astype('datetime64[ns]'),
tsneg.to_datetime64().astype('datetime64[D]')]

tspos = Timestamp('1980-01-01')
ts_pos_variants = [tspos,
tspos.to_pydatetime(),
tspos.to_datetime64().astype('datetime64[ns]'),
tspos.to_datetime64().astype('datetime64[D]')]

for variant in ts_neg_variants + ts_pos_variants:
res = tdinat + variant
assert res[1] is pd.NaT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check sub as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also would check both add/sub for the reverse, e.g. variant + tdinat (and -)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well assert fully, e.g.

tm.assert_index_equal(pd.TimedeltaIndex(['NaT', 'NaT']))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to do the full add/sub/radd/rsub matrix... that was kind of what I started out with. The question becomes where to put it, since arithmetic tests are scattered about. My preference would be new test modules test_arithmetic in each of indexes.timedeltas, indexes.datetimes, and indexes.periods where I can a) put these new tests and b) collect the arithmetic tests that are currently scattered about.

See discussion in #18026, #18036.

But for now I'll just edit the contents of the tests already introduced.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well assert fully, e.g.

tm.assert_index_equal(pd.TimedeltaIndex(['NaT', 'NaT']))

The first entry is not NaT, will not be constant across all of the variants (though the variants could be split into two groups over which it should be unchanging)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_arithmetic sounds like a good name. key is to share code as much as possible, via fixtures / parametrization / inheritence. We ideally want these objects to act as similar as possible, so keeping special cases to a minimum is important.

note before we do this, I think splitting out the tz-aware tests to its own hierarchy should be done.

see #17583 and #17694

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for now I'll just edit the contents of the tests already introduced.

Actually as I look at this, I'd much rather close out this bug fix and follow up with the Do It Right approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certainly. bug fixes are good. separate, self-contained refactorings to make things more readable are better!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. After the current deluge of clears up, I'll circle back to this.