BUG: resample with tz-aware: Values falls after last bin #15549

Closed
ahcub opened this issue Mar 2, 2017 · 2 comments · Fixed by #18337
Labels: Bug, Resample (resample method)
Milestone: 0.21.1

Comments

ahcub (Contributor) commented Mar 2, 2017

Code Sample, a copy-pastable example if possible

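import pandas as pd

# Two instants (Dec 2015 and Sep 2016) shown in Chicago time; the range
# spans a daylight saving time change
index = pd.DatetimeIndex([1450137600000000000, 1474059600000000000], tz='UTC').tz_convert('America/Chicago')

print(index)

df = pd.DataFrame([1, 2], index=index)

# On pandas 0.19.2 this raises ValueError: Values falls after last bin
print(df.resample('12h', closed='right', label='right').last().ffill())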

Problem description

Resampling does not handle a non-UTC (tz-aware) index correctly when the time range spans a daylight saving time (DST) change.
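A quick check shows why DST is involved: the two timestamps from the example fall on opposite sides of a DST transition, so their UTC offsets differ by one hour. This sketch builds the same index with pd.to_datetime(..., utc=True), which yields the same instants as the DatetimeIndex constructor used above:

```python
import pandas as pd

# Same two instants as in the report: created in UTC, displayed in Chicago time
index = pd.to_datetime([1450137600000000000, 1474059600000000000], utc=True)
index = index.tz_convert("America/Chicago")

# First timestamp is in winter (CST, UTC-6), second in summer (CDT, UTC-5)
offsets = [ts.utcoffset() for ts in index]
for ts, off in zip(index, offsets):
    print(ts, off.total_seconds() / 3600)
```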

The problem occurs in the function _get_time_bins in https://github.com/pandas-dev/pandas/blob/master/pandas/tseries/resample.py, at the line:
binner = labels = DatetimeIndex(freq=self.freq,...
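A simplified illustration of how a DST change can trip this check. It assumes, for illustration only (this is not pandas' actual code path), that the first and last bin edges are snapped to 12-hour boundaries on local wall-clock time, while the range between them is generated in fixed 12-hour steps of absolute time:

```python
import pandas as pd

index = pd.to_datetime(
    [1450137600000000000, 1474059600000000000], utc=True
).tz_convert("America/Chicago")

# Hypothetical edge calculation: snap to 12h boundaries on local wall-clock time
wall = index.tz_localize(None)
first_edge = wall.min().floor("12h").tz_localize("America/Chicago")
last_edge = wall.max().ceil("12h").tz_localize("America/Chicago")

# A 12h range advances in absolute time, so the grid is off by the one-hour
# DST shift and the last generated edge lands before last_edge
bins = pd.date_range(first_edge, last_edge, freq="12h")

print(bins[-1], index.max())
print(bins[-1] < index.max())  # True: the last value falls after the last bin
```

Because the summer end edge is one hour off the winter grid, the last generated edge (13:00 CDT) lands before the last value (16:00 CDT), which is the situation the "Values falls after last bin" error reports.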

This can be solved by converting the axis (ax) to UTC before computing the bins, and converting the result back to the original timezone after the DatetimeIndex is created.

The code would then look like this:

    # remember the original timezone and compute the bins in UTC
    tz = ax.tz
    ax = ax.tz_convert('UTC')
    if len(ax) == 0:
        binner = labels = DatetimeIndex(
            data=[], freq=self.freq, name=ax.name)
        return binner, [], labels

    first, last = ax.min(), ax.max()
    first, last = _get_range_edges(first, last, self.freq,
                                   closed=self.closed,
                                   base=self.base)
    # GH #12037
    # use first/last directly instead of call replace() on them
    # because replace() will swallow the nanosecond part
    # thus last bin maybe slightly before the end if the end contains
    # nanosecond part and lead to `Values falls after last bin` error
    binner = labels = DatetimeIndex(freq=self.freq,
                                    start=first,
                                    end=last,
                                    name=ax.name).tz_convert(tz)  # back to the original tz
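Outside of pandas internals, the idea behind this change can be illustrated with public API calls only. The sketch below is not the patched _get_time_bins; it assumes, for simplicity, that edges are snapped with floor/ceil on UTC times and ignores the closed/label handling. Because both the edges and the 12h steps are computed in UTC, the grid cannot drift across the DST change, and the labels are converted back to the original zone only at the end:

```python
import pandas as pd

index = pd.to_datetime(
    [1450137600000000000, 1474059600000000000], utc=True
).tz_convert("America/Chicago")

tz = index.tz                 # remember the original timezone
ax = index.tz_convert("UTC")  # compute the bins on UTC times

# Simplified edge snapping (an assumption; not pandas' _get_range_edges)
first_edge = ax.min().floor("12h")
last_edge = ax.max().ceil("12h")

# Generate the 12h range in UTC, then convert the labels back to local time
labels = pd.date_range(first_edge, last_edge, freq="12h").tz_convert(tz)

print(labels[0], labels[-1])
print(labels[-1] >= index.max())  # True: the last value is covered
```

Every edge generated this way falls on 00:00 or 12:00 UTC, so the last edge is on the same grid as the first.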

This causes the bins to always be aligned to UTC times rather than to the original timezone, but I think that is acceptable behaviour as well.
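The same approach also works from user code without patching pandas. A minimal sketch, using a hypothetical helper name resample_last_in_utc: convert the index to UTC, resample there, then convert the result back. As noted above, the labels then sit at different local wall-clock hours in winter (CST) and summer (CDT):

```python
import pandas as pd


def resample_last_in_utc(df, rule, **kwargs):
    """Hypothetical helper: resample a tz-aware frame on a UTC index, take the
    last value per bin, and convert the labels back to the original timezone."""
    tz = df.index.tz
    out = df.tz_convert("UTC").resample(rule, **kwargs).last()
    return out.tz_convert(tz)


index = pd.to_datetime(
    [1450137600000000000, 1474059600000000000], utc=True
).tz_convert("America/Chicago")
df = pd.DataFrame([1, 2], index=index)

result = resample_last_in_utc(df, "12h", closed="right", label="right").ffill()

# Labels sit at 18:00/06:00 local time in winter and 19:00/07:00 in summer
print(sorted({int(h) for h in result.index.hour}))  # [6, 7, 18, 19]
```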

Expected Output

I expect resampling to succeed regardless of the selected time range.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.0.2
Cython: None
numpy: 1.12.0
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: 0.7.9.None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

ahcub (Contributor, Author) commented Mar 2, 2017

I can create a pull request with the change described above if it looks OK.

jreback (Contributor) commented Mar 2, 2017

Might be (tangentially) related to #12351 and #12037, though I suspect the tz is messing with this.

Sure, a PR to fix it would be great.

jreback added this to the Next Major Release milestone Mar 2, 2017
jreback changed the title from "ValueError: Values falls after last bin" to "BUG: resample with tz-aware: Values falls after last bin" Mar 2, 2017
ahcub added a commit to ahcub/pandas that referenced this issue Mar 2, 2017
jreback modified the milestones: Next Major Release, 0.21.1 Nov 20, 2017
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue Dec 8, 2017