Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: tz-aware DatetimeIndex + array(timedelta) gives incorrect result #17558

Closed
azjps opened this issue Sep 17, 2017 · 1 comment · Fixed by #18653
Closed

BUG: tz-aware DatetimeIndex + array(timedelta) gives incorrect result #17558

azjps opened this issue Sep 17, 2017 · 1 comment · Fixed by #18653
Labels
Bug Timezones Timezone data dtype
Milestone

Comments

@azjps
Copy link

azjps commented Sep 17, 2017

Code Sample, a copy-pastable example if possible

In [1]: import numpy as np, pandas as pd

In [2]: dt = pd.DatetimeIndex([pd.Timestamp("2017/01/01")], tz='US/Eastern')  # dtype=datetime64[ns, US/Eastern]

In [3]: td_np = np.array([np.timedelta64(1, 'ns')])  # dtype="timedelta64[ns]"

In [4]: (dt + td_np).values  # Bad, "applies" TZ twice to get 10:00 instead of 05:00!
Out[4]: array(['2017-01-01T10:00:00.000000001'], dtype='datetime64[ns]')

In [5]: (dt + td_np[0]).values
Out[5]: array(['2017-01-01T05:00:00.000000001'], dtype='datetime64[ns]')

In [6]: (dt + pd.Series(td_np)).values
Out[6]: array(['2017-01-01T05:00:00.000000001'], dtype='datetime64[ns]')

In [7]: (dt.tz_convert(None) + td_np).values
Out[7]: array(['2017-01-01T05:00:00.000000001'], dtype='datetime64[ns]')

In [8]: (dt.values + td_np)
Out[8]: array(['2017-01-01T05:00:00.000000001'], dtype='datetime64[ns]')

In [9]: (dt + pd.TimedeltaIndex(td_np))
TypeError: data type not understood

Problem description

(Above is run on 0.20.3) When adding a tz-aware DatetimeIndex and a numpy array of timedeltas, the result incorrectly has the timezone applied twice. There are several related issues on tz-aware DatetimeIndex and timedelta sums on the issue tracker, for example #14022. However, as far as I've read, those issues appear to have been mostly resolved as of ~0.19; this case just slipped through the cracks. I haven't found any other variants of this bug, except maybe that tz-aware DatetimeIndex + TimedeltaIndex raises an exception.

(Just for context, I am not relying on this behavior, just happened to come across it while updating some tests to use pandas 0.20.3, but it could lead to someone silently getting incorrect results.)

I think all that needs to be fixed here is to add another case in pd.DatetimeIndex[OpsMixin].__add__ to handle any array-like with dtype=timedelta64[..]. (Maybe there's a more general upstream solution?) If someone confirms I can submit a patch.

Expected Output

In [4]: (dt + td_np).values
Out[5]: array(['2017-01-01T05:00:00.000000001'], dtype='datetime64[ns]')

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 2.7.6.final.0 python-bits: 64 OS: Linux OS-release: 4.4.75-el6.x86_64.lime.1 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: None.None

pandas: 0.20.3
pytest: 2.9.1
pip: 7.1.0
setuptools: 19.4
Cython: 0.24
numpy: 1.13.1
scipy: 0.17.0
xarray: None
IPython: 5.1.0
sphinx: 1.4.5
patsy: 0.4.1
dateutil: 2.2
pytz: 2015.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 1.5.3
openpyxl: 1.8.6
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: 3.6.0
bs4: 4.4.0
html5lib: None
sqlalchemy: 1.0.12
pymysql: 0.6.3.None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jreback
Copy link
Contributor

jreback commented Sep 17, 2017

This does look buggy. Though looking at the result in numpy space is pretty useless. Numpy does really odd things with tz, IOW it display in local tz (so this makes it really hard to read your example).

In [18]: dt
Out[18]: DatetimeIndex(['2017-01-01 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)

In [19]: dt + td_np[0]
Out[19]: DatetimeIndex(['2017-01-01 00:00:00.000000001-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)

In [20]: dt + td_np
Out[20]: DatetimeIndex(['2017-01-01 05:00:00.000000001-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)

[19] and [20] should be the same

Further, generally it is a good idea to be very explict, IOW

In [21]: dt = pd.DatetimeIndex([pd.Timestamp("2017/01/01")]).tz_localize('US/Eastern')

In [22]: dt
Out[22]: DatetimeIndex(['2017-01-01 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)

This is way more explicit that we are localizing.

But this is a small bug, happy to have a PR to fix.

@jreback jreback added this to the Next Major Release milestone Sep 17, 2017
azjps added a commit to azjps/pandas that referenced this issue Sep 18, 2017
Fix minor bug causing DatetimeIndex + TimedeltaIndex to raise an error,
and fix another bug causing the sum of a tz-aware DatetimeIndex and a
numpy array of timedeltas to incorrectly have timezone applied twice.
azjps added a commit to azjps/pandas that referenced this issue Sep 18, 2017
Fix minor bug causing DatetimeIndex + TimedeltaIndex to raise an error,
and fix another bug causing the sum of a tz-aware DatetimeIndex and a
numpy array of timedeltas to incorrectly have timezone applied twice.
@jreback jreback modified the milestones: Next Major Release, 0.22.0 Dec 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Timezones Timezone data dtype
Projects
None yet
2 participants