-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: fix tz-aware DatetimeIndex + TimedeltaIndex (#17558) #17583
Conversation
Fix minor bug causing DatetimeIndex + TimedeltaIndex to raise an error, and fix another bug causing the sum of a tz-aware DatetimeIndex and a numpy array of timedeltas to incorrectly have timezone applied twice.
# and DatetimeIndex + TimedeltaIndex work as expected | ||
dti = pd.DatetimeIndex([pd.Timestamp("2017/01/01")]) | ||
dti = dti.tz_localize('US/Eastern') | ||
expected = pd.DatetimeIndex([pd.Timestamp("2017/01/01 01:00")]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit : add newline above this one.
td_np = np.array([np.timedelta64(1, 'h')], dtype="timedelta64[ns]") | ||
tm.assert_index_equal(dti + td_np, expected) | ||
tm.assert_index_equal(dti + td_np[0], expected) | ||
tm.assert_index_equal(dti + TimedeltaIndex(td_np), expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just for-loop these checks instead? Also, let's be a little more explicit and write it this way:
actual = dti + ...
tm.assert_index_equal(actual, expected)
pandas/core/indexes/datetimelike.py
Outdated
if is_timedelta64_dtype(other): | ||
return self._add_delta(TimedeltaIndex(other)) | ||
elif is_integer_dtype(other): | ||
return NotImplemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a test that hits this branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but indirectly, without this branch it was breaking a couple of other tests that added a PeriodIndex
and a np.array(dtype=int64)
. I didn't check what the behavior is there, but could add a more direct test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Direct test is certainly appreciated. Go for it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I looked into the cases getting triggered with PeriodIndex
and unfortunately they're buried pretty deeply in internals (e.g. not obvious this would get called from any public API, at least any more so than the tests that were already reaching this). Also @jreback commented that this should only be internal facing, so I figure its better not to "advertise" this API in a test.
pandas/core/indexes/datetimelike.py
Outdated
elif is_integer_dtype(other): | ||
return NotImplemented | ||
else: | ||
raise TypeError("cannot add {typ1} and np.ndarray[{typ2}]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as with the NotImplemented
. Let's make sure we have a test to hit this branch.
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -506,6 +506,7 @@ Indexing | |||
- Bug in ``CategoricalIndex`` reindexing in which specified indices containing duplicates were not being respected (:issue:`17323`) | |||
- Bug in intersection of ``RangeIndex`` with negative step (:issue:`17296`) | |||
- Bug in ``IntervalIndex`` where performing a scalar lookup fails for included right endpoints of non-overlapping monotonic decreasing indexes (:issue:`16417`, :issue:`17271`) | |||
- Bug in adding ``DatetimeIndex`` with a ``TimedeltaIndex`` or a numpy array with ``dtype="timedelta64"`` (:issue:`17558`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not correct; the bug is only with a numpy array of dtype=timedelta64[ns]
(timedelta64 is not a valid dtype for pandas)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "or" was a bit confusing, but there are two separate bugs:
- adding tz-aware
DatetimeIndex
withTimedeltaIndex
- adding tz-aware
DatetimeIndex
with numpy array which is timedelta64-dtype-like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how so? where does DTI + TDI tz-aware not work? your issues is w.r.t. np.array of timedeltas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned it briefly in the issue ticket, although I neglected to post the full stack trace:
In [1]: import numpy as np, pandas as pd
In [2]: dti = pd.DatetimeIndex([pd.Timestamp("2017/01/01")]).tz_localize('US/Eastern')
In [3]: td_np = np.array([np.timedelta64(1, 'ns')])
In [4]: (dti + pd.TimedeltaIndex(td_np))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-20f28143c2bb> in <module>()
----> 1 (dti + pd.TimedeltaIndex(td_np))
/spare/local/azhu/venv-el6-20170811-stonelib/lib/python2.7/site-packages/pandas/tseries/base.pyc in __add__(self, other)
617 from pandas.tseries.offsets import DateOffset
618 if isinstance(other, TimedeltaIndex):
--> 619 return self._add_delta(other)
620 elif isinstance(self, TimedeltaIndex) and isinstance(other, Index):
621 if hasattr(other, '_add_delta'):
/spare/local/azhu/venv-el6-20170811-stonelib/lib/python2.7/site-packages/pandas/tseries/index.pyc in _add_delta(self, delta)
787 new_values = self._add_delta_td(delta)
788 elif isinstance(delta, TimedeltaIndex):
--> 789 new_values = self._add_delta_tdi(delta)
790 # update name when delta is Index
791 name = com._maybe_match_name(self, delta)
/spare/local/azhu/venv-el6-20170811-stonelib/lib/python2.7/site-packages/pandas/tseries/base.pyc in _add_delta_tdi(self, other)
704 mask = (self._isnan) | (other._isnan)
705 new_values[mask] = tslib.iNaT
--> 706 return new_values.view(self.dtype)
707
708 def isin(self, values):
TypeError: data type not understood
pandas/core/indexes/datetimelike.py
Outdated
@@ -651,6 +652,15 @@ def __add__(self, other): | |||
raise TypeError("cannot add {typ1} and {typ2}" | |||
.format(typ1=type(self).__name__, | |||
typ2=type(other).__name__)) | |||
elif isinstance(other, np.ndarray): | |||
if is_timedelta64_dtype(other): | |||
return self._add_delta(TimedeltaIndex(other)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can only accept timedelta64 here, integer is not valid either fro datetimelikes in add
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted in another comment above, the integer case is being used by internals for PeriodIndex
. Example: pandas/tseries/offsets.py:_beg_apply_index#L468
adds base_period (PeriodIndex) + roll (ndarray[dtype=int64])
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no we don't allow this user facting; this is ONLY an internal operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but I don't quite understand what you're getting at -- are you saying there's still something to be done here? I just added the integer branch to prevent aforementioned internally used operation from breaking. (since it returns NotImplemented
as it currently does, it will fall through to ufunc implementation.) I can add a code comment to explain the intent?
pandas/core/indexes/datetimelike.py
Outdated
@@ -731,7 +741,7 @@ def _add_delta_tdi(self, other): | |||
if self.hasnans or other.hasnans: | |||
mask = (self._isnan) | (other._isnan) | |||
new_values[mask] = iNaT | |||
return new_values.view(self.dtype) | |||
return new_values.view('i8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leave this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes adding tz-aware DatetimeIndex
and TimedeltaIndex
, since self.dtype
here is a Datetime64TZDtype
which causes new_values.view(..)
to raise an exception. Would you prefer another solution, say checking if is_datetime64tz_dtype(self.dtype)
first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes use is_datetime64tz_dtype(self.dtype) is the correct idiom. but you'll have to demonstrate this bug specifically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Posted stack trace that hits this line above. So I can change this to
if is_datetime64tz_dtype(self.dtype):
return new_values.view('i8')
return new_values.view(self.dtype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the not the right fix.
tz-aware need to be turned into local i8, added, then turned back into the original timezone via localization. normally we do add i8 directly, BUT you cannot do this with tz-aware, e.g. the days are shifted otherwise (imagine adding days), or don't work when you cross DST.
pandas.tseries.offsets already does all of this. I haven't stepped thru this code lately, but need to hit a similar path to that.
@@ -460,6 +460,20 @@ def test_add_dti_dti(self): | |||
with pytest.raises(TypeError): | |||
dti + dti_tz | |||
|
|||
def test_add_dti_ndarray(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a test which raises for numpy arrays of incorrect type (e.g. not m8[ns]);
# GH 17558 | ||
# Check that tz-aware DatetimeIndex + np.array(dtype="timedelta64") | ||
# and DatetimeIndex + TimedeltaIndex work as expected | ||
dti = pd.DatetimeIndex([pd.Timestamp("2017/01/01")]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can move this entire test to datetimelike.py
; it will hit datetime, period, timedelta that way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should prob move more of these add tests there as well, but can do that in another PR
Hello @azjps! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on September 20, 2017 at 00:50 Hours UTC |
pandas/tests/indexes/datetimelike.py
Outdated
idx = idx.tz_localize("US/Eastern") | ||
|
||
expected = idx + np.timedelta64(1, 'D') | ||
tm.assert_index_equal(idx, expected - np.timedelta64(1, 'D')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to explain the awkwardness here: was trying to figure out how to demonstrate that expected
actually has the properties we expect, e.g. has name = "x"
and is a day ahead of idx
@@ -76,3 +76,27 @@ def test_union(self): | |||
for case in cases: | |||
result = first.union(case) | |||
assert tm.equalContents(result, everything) | |||
|
|||
def test_add_dti_td(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is almost the same as special case of the above test in datetimelike.py
for DatetimeIndex
, if that test looks good I will remove this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comments above you can simply re-define create_index to make this work (in another test class)
The internal implementation of _add_delta_td already correctly adds a datetime index and an array of timedeltas, so use that implementation instead of wrapping in a TimedeltaIndex (which requires a bit more metadata like name to be passed around). Move test to datetimelike to check addition of TimedeltaIndex and np.array(timedelta64) with each of {DatetimeIndex, TimedeltaIndex, PeriodIndex}. Fix the latter two to explicitly support addition with a numpy array. Clarify some comments such as the whatsnew and move tests to test_datetimelike.
Codecov Report
@@ Coverage Diff @@
## master #17583 +/- ##
==========================================
+ Coverage 91.19% 91.2% +<.01%
==========================================
Files 163 163
Lines 49626 49638 +12
==========================================
+ Hits 45258 45270 +12
Misses 4368 4368
Continue to review full report at Codecov.
|
elif isinstance(other, np.ndarray): | ||
if is_timedelta64_dtype(other): | ||
return self._add_delta(other) | ||
elif is_integer_dtype(other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think instead of this, we want to specifically define add in PeriodIndex which handles this case and calls super otherwise.
@@ -731,6 +744,8 @@ def _add_delta_tdi(self, other): | |||
if self.hasnans or other.hasnans: | |||
mask = (self._isnan) | (other._isnan) | |||
new_values[mask] = iNaT | |||
if is_datetime64tz_dtype(self.dtype): | |||
return new_values.view('i8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, see my comments
@@ -801,6 +802,8 @@ def _add_delta(self, delta): | |||
|
|||
if isinstance(delta, (Tick, timedelta, np.timedelta64)): | |||
new_values = self._add_delta_td(delta) | |||
elif isinstance(delta, np.ndarray) and is_timedelta64_dtype(delta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should simply be incorporated into the case right below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by below, do I understand you correctly that we can consolidate as:
if isinstance(delta, (Tick, timedelta, np.timedelta64, np.ndarray)):
new_values = self._add_delta_td(delta)
elif ..:
or are you suggesting to wrap delta
into a TimedeltaIndex
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
but i think we need to check the dtype as well if it's a numpy array (can be separate from this)
if isinstance(delta, (Tick, timedelta, np.timedelta64)): | ||
new_values = self._add_delta_td(delta) | ||
name = self.name | ||
elif isinstance(delta, np.ndarray) and is_timedelta64_dtype(delta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment
# and DatetimeIndex + TimedeltaIndex work as expected | ||
idx = self.create_index() | ||
idx.name = "x" | ||
if isinstance(idx, pd.DatetimeIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use an if like this.
let's create a sub-class that specifically tests tz-aware (so .create_index()) works correctly. (add in datetimes/test_datetimelikes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree with this, but unfortunately a tz-aware-based subclass fails a whole bunch of other common tests, so the scope of this PR would have to be expanded further (for which I'd probably need help). Here's one example to illustrate, basically DatetimeIndex
constructor will read its input list into a (tz-naive) numpy array and "re-tz_localize" based on dtype's timezone:
def test_repr_roundtrip(self):
idx = self.create_index()
> tm.assert_index_equal(eval(repr(idx)), idx)
> raise AssertionError(msg)
E AssertionError: Index are different
E
E Index values are different (100.0 %)
E [left]: DatetimeIndex(['2013-01-01 05:00:00-05:00', '2013-01-02 05:00:00-05:00',
E '2013-01-03 05:00:00-05:00', '2013-01-04 05:00:00-05:00',
E '2013-01-05 05:00:00-05:00'],
E dtype='datetime64[ns, US/Eastern]', freq='D')
E [right]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
E '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00',
E '2013-01-05 00:00:00-05:00'],
E dtype='datetime64[ns, US/Eastern]', freq='D')
(Pdb) repr(idx) # broke up output for readability
"DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',\n"
" '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00',\n"
" '2013-01-05 00:00:00-05:00'],\n"
" dtype='datetime64[ns, US/Eastern]', freq='D')"
(Pdb) eval(repr(idx))
DatetimeIndex(['2013-01-01 05:00:00-05:00', '2013-01-02 05:00:00-05:00',
'2013-01-03 05:00:00-05:00', '2013-01-04 05:00:00-05:00',
'2013-01-05 05:00:00-05:00'],
dtype='datetime64[ns, US/Eastern]', freq='D')
Most of the other test failures can also be treated as variants of this, as these tests also construct a separate DatetimeIndex
instance without tz information, although its possible to (say) modify those tests directly to tz_localize
to UTC and then tz_convert
to appropriate timezone if create_index().tz
is available.
Also the tests related to union, intersection, symmetric_difference, etc try to do these operations against the underlying numpy .values
which are tz-naive, leading to issues. We maybe could get around these by just disabling that part of the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think the right way to do this is to do another PR which separates out the DTI tests to 2 classes for tz-naive and tz-aware and modifies create_index as appropriate. If there are failing tests we can look closer. Want to start on that?
@@ -76,3 +76,27 @@ def test_union(self): | |||
for case in cases: | |||
result = first.union(case) | |||
assert tm.equalContents(result, everything) | |||
|
|||
def test_add_dti_td(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comments above you can simply re-define create_index to make this work (in another test class)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put on hold until testing is unwound.
can you rebase and move the note to 0.21.1 |
can you rebase / update. we have slightly better testing in place now FYI. |
closing as stale |
Fix a bug that was causing a tz-aware DatetimeIndex + TimedeltaIndex to raise an error. (An internal function was documented to return an integer view, but instead was trying to use the DatetimeIndex's dtype.) Fix another bug causing the sum of a tz-aware DatetimeIndex and a numpy array of timedeltas to incorrectly have timezone applied twice, by adding another case in
DatetimeIndex.__add__
.git diff upstream/master -u -- "*.py" | flake8 --diff