
Pandas 0.18.1 and 0.19.0 have different (and confusing) handling of TZ when subtracting Timedelta series vs subtracting single Timedelta #14524

Closed
TimTimMadden opened this issue Oct 27, 2016 · 6 comments
Labels
Duplicate Report · Timedelta · Timezones

Comments


TimTimMadden commented Oct 27, 2016

A small, complete example of the issue

Apologies if this is already a known issue: I've not found anything similar in my googling, but my google-fu may have let me down.

When subtracting a single Timedelta from a tz-aware datetime column, pandas 0.19.0 produces a Series that retains the correct timezone. However, when subtracting a Series of Timedeltas, the timezone is dropped and a dtype of <M8[ns] is returned.

import pandas as pd

df = pd.DataFrame(pd.date_range(start=pd.Timestamp('2016-01-01 00:00:00+00'), end=pd.Timestamp('2016-01-01 23:59:59+00'), freq='H'))
df.columns = ['a']
df.a.dtype # Both 0.19.0 and 0.18.1 here return datetime64[ns, UTC]
df.a # Here, the +00 is visible.

df['b'] = df['a'].subtract(pd.Timedelta(seconds=60*60))
df['b'].dtype # Here, 0.18.1 returns dtype('<M8[ns]'), while 0.19.0 returns datetime64[ns, UTC]
df.b # Here, 0.19.0 will display the +00 while 0.18.1 will drop the TZ alteration

df['c'] = df['a'].subtract(df['b'] - df['a']) # Here, pandas 0.18.1 fails because df['b'] is tz-naive while df['a'] is tz-aware.

df.c.dtype # Here, 0.19.0 returns dtype('<M8[ns]')
df.c # Here, 0.19.0 will not show the +00 TZ alteration.

Expected Output

I am not sure what the expected output should be. From my point of view, subtracting a Timedelta from a tz-aware timestamp should preserve the timezone, which appears to be what changed between 0.18.1 and 0.19.0 for single Timedeltas. However, I also think the behaviour of the two cases above (a single Timedelta vs. a Series of Timedeltas) should be consistent.

Please clarify if I've understood this correctly!

>> df.a.dtype
 datetime64[ns, UTC]
>> df.a
0    2016-01-01 00:00:00+00:00
1    2016-01-01 01:00:00+00:00
2    2016-01-01 02:00:00+00:00
3    2016-01-01 03:00:00+00:00
4    2016-01-01 04:00:00+00:00
5    2016-01-01 05:00:00+00:00
6    2016-01-01 06:00:00+00:00
7    2016-01-01 07:00:00+00:00
8    2016-01-01 08:00:00+00:00
9    2016-01-01 09:00:00+00:00
10   2016-01-01 10:00:00+00:00
11   2016-01-01 11:00:00+00:00
12   2016-01-01 12:00:00+00:00
13   2016-01-01 13:00:00+00:00
14   2016-01-01 14:00:00+00:00
15   2016-01-01 15:00:00+00:00
16   2016-01-01 16:00:00+00:00
17   2016-01-01 17:00:00+00:00
18   2016-01-01 18:00:00+00:00
19   2016-01-01 19:00:00+00:00
20   2016-01-01 20:00:00+00:00
21   2016-01-01 21:00:00+00:00
22   2016-01-01 22:00:00+00:00
23   2016-01-01 23:00:00+00:00
Name: a, dtype: datetime64[ns, UTC]

>> df.b.dtype
datetime64[ns, UTC]
>> df.b
0    2015-12-31 23:00:00+00:00
1    2016-01-01 00:00:00+00:00
2    2016-01-01 01:00:00+00:00
3    2016-01-01 02:00:00+00:00
4    2016-01-01 03:00:00+00:00
5    2016-01-01 04:00:00+00:00
6    2016-01-01 05:00:00+00:00
7    2016-01-01 06:00:00+00:00
8    2016-01-01 07:00:00+00:00
9    2016-01-01 08:00:00+00:00
10   2016-01-01 09:00:00+00:00
11   2016-01-01 10:00:00+00:00
12   2016-01-01 11:00:00+00:00
13   2016-01-01 12:00:00+00:00
14   2016-01-01 13:00:00+00:00
15   2016-01-01 14:00:00+00:00
16   2016-01-01 15:00:00+00:00
17   2016-01-01 16:00:00+00:00
18   2016-01-01 17:00:00+00:00
19   2016-01-01 18:00:00+00:00
20   2016-01-01 19:00:00+00:00
21   2016-01-01 20:00:00+00:00
22   2016-01-01 21:00:00+00:00
23   2016-01-01 22:00:00+00:00
Name: b, dtype: datetime64[ns, UTC]

>> df.c.dtype
datetime64[ns, UTC]
>> df.c
0    2016-01-01 01:00:00+00:00
1    2016-01-01 02:00:00+00:00
2    2016-01-01 03:00:00+00:00
3    2016-01-01 04:00:00+00:00
4    2016-01-01 05:00:00+00:00
5    2016-01-01 06:00:00+00:00
6    2016-01-01 07:00:00+00:00
7    2016-01-01 08:00:00+00:00
8    2016-01-01 09:00:00+00:00
9    2016-01-01 10:00:00+00:00
10   2016-01-01 11:00:00+00:00
11   2016-01-01 12:00:00+00:00
12   2016-01-01 13:00:00+00:00
13   2016-01-01 14:00:00+00:00
14   2016-01-01 15:00:00+00:00
15   2016-01-01 16:00:00+00:00
16   2016-01-01 17:00:00+00:00
17   2016-01-01 18:00:00+00:00
18   2016-01-01 19:00:00+00:00
19   2016-01-01 20:00:00+00:00
20   2016-01-01 21:00:00+00:00
21   2016-01-01 22:00:00+00:00
22   2016-01-01 23:00:00+00:00
23   2016-01-02 00:00:00+00:00
Name: c, dtype: datetime64[ns, UTC]
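For reference, the consistency asked for above can be checked with a minimal sketch. This assumes a modern pandas release in which the Series-of-Timedeltas bug has been fixed; it is an illustration of the expected behaviour, not of 0.19.0's actual output:

```python
import pandas as pd

# A tz-aware datetime Series.
s = pd.Series(pd.date_range('2016-01-01', periods=3, freq='h', tz='UTC'))

delta_scalar = pd.Timedelta(hours=1)
delta_series = pd.Series([pd.Timedelta(hours=1)] * 3)

scalar_result = s - delta_scalar   # subtract a single Timedelta
series_result = s - delta_series   # subtract a Series of Timedeltas

# Both operations should preserve the UTC timezone.
print(scalar_result.dtype)
print(series_result.dtype)
```

In current pandas releases both subtractions return dtype datetime64[ns, UTC], which is the consistent behaviour the report argues for.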

Output of pd.show_versions()

(I have tested in two virtualenvs, the only difference between the two being the python version)

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: None.None

pandas: 0.19.0
nose: 1.3.7
pip: 8.1.2
setuptools: 28.6.0
Cython: None
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

@jorisvandenbossche
Member

@TimTimMadden Thanks for the report.
This is a duplicate of #13905

There is indeed a bug in the arithmetic of timezone-aware datetime series with timedelta series.
You are correct that the other changes you see are improvements in 0.19.0 compared to 0.18 (eg keeping the timezone when subtracting a scalar timedelta).

@jorisvandenbossche jorisvandenbossche added the Timedelta, Timezones, and Duplicate Report labels and removed the Bug label Oct 28, 2016
@jorisvandenbossche jorisvandenbossche added this to the No action milestone Oct 28, 2016
@TimTimMadden
Author

Thanks for the reply, sorry for the duplicate!

For anyone finding this issue in the future, I needed to have two columns agreeing on tz-aware or tz-naive in order to do some time-window aggregation. Trying to re-cast the series using df.c.tz_localize('UTC') failed with the error message: TypeError: Already tz-aware, use tz_convert to convert.

I'm currently using the following workaround: df.c = df.c.apply(lambda t: t.tz_localize('UTC') if pd.notnull(t) else pd.NaT), which is a little messy but works for my purposes.

@jorisvandenbossche
Member

@TimTimMadden I am not fully following your workaround. Why was the apply needed to perform the tz_localize?

@jreback
Contributor

jreback commented Oct 29, 2016

closing as a dupe

@jreback jreback closed this as completed Oct 29, 2016
@TimTimMadden
Author

@jorisvandenbossche Using the Series tz_localize method produced the error message TypeError: Already tz-aware, use tz_convert to convert, even though the contained timestamps had no time zone, so I had to fall back to the .apply(...) loop to call tz_localize on each individual Timestamp. This did confuse me, and it isn't ideal given the performance cost of .apply(...). However, I didn't investigate further, because the issue described in #10390 forced me to revert to pandas 0.18.1, where the timezone-dropping bug applies to both Series of Timedeltas and single Timedeltas, which handily means the .max(axis=1) call completes successfully.

The part of my code that prompted this investigation is pretty critical to my app: it resamples data to a regular time window while also providing a measure of how 'complete' each window is (i.e. what percentage of the normalised window is covered by one or more raw data readings). Having predictable behaviour here is rather important, but for the moment I'd rather have understood, sub-optimal behaviour that works!
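The resample-plus-completeness idea described above can be sketched roughly as follows. This is a hypothetical illustration, not the reporter's actual code; the function name resample_with_coverage and the chosen window/frequency parameters are my own assumptions:

```python
import pandas as pd

def resample_with_coverage(series, window='1h', expected_freq='10min'):
    # Hypothetical sketch: aggregate to a fixed window, and report what
    # fraction of the expected raw samples each window actually contains.
    expected = pd.Timedelta(window) / pd.Timedelta(expected_freq)
    agg = series.resample(window).agg(['mean', 'count'])
    agg['coverage'] = agg['count'] / expected
    return agg

# Irregular tz-aware readings: three samples in the first hour, one in the second.
idx = pd.to_datetime([
    '2016-01-01 00:00', '2016-01-01 00:10', '2016-01-01 00:20',
    '2016-01-01 01:00',
]).tz_localize('UTC')
readings = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

result = resample_with_coverage(readings)
print(result)
```

With a 10-minute expected cadence, the first hourly window holds 3 of 6 expected samples (coverage 0.5) and the second holds 1 of 6. The key point for this issue is that the index stays tz-aware throughout, which the workaround above was needed to guarantee in 0.19.0.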

Thanks for your attention to these issues - I really do think this is a great library and massively appreciate the work you guys put in! I'm poking around in the source, and will open a pull request if I can figure out the root cause. :)

@jorisvandenbossche
Member

jorisvandenbossche commented Oct 30, 2016

@TimTimMadden With your example above, and using 0.19.0, I am not seeing any problem with the localizing of the values:

In [163]: df['c']  # those are not timezone aware (that is a bug, see the other issue), but localizing them works
Out[163]: 
0    2016-01-01 01:00:00
1    2016-01-01 02:00:00
             ...        
22   2016-01-01 23:00:00
23   2016-01-02 00:00:00
Name: c, dtype: datetime64[ns]

In [164]: df['c'].dtype
Out[164]: dtype('<M8[ns]')

In [165]: df['c'].dt.tz_localize('UTC')
Out[165]: 
0    2016-01-01 01:00:00+00:00
1    2016-01-01 02:00:00+00:00
                ...           
22   2016-01-01 23:00:00+00:00
23   2016-01-02 00:00:00+00:00
Name: c, dtype: datetime64[ns, UTC]

If you still have a problem with this, please open a new issue with a reproducible example.
