BUG: Why is resampling with rule='7d' different than resampling with rule='168h'? #44996

Open
2 tasks done
mkp-gebensleben opened this issue Dec 21, 2021 · 3 comments
Labels: Bug, Resample (resample method)

Comments

mkp-gebensleben commented Dec 21, 2021

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/70436514/timestamp-binning-mechanics-when-resampling

Question about pandas

>>> df = pd.DataFrame(index=pd.date_range(start='2021-04-21 01:00:00', end='2021-04-28 01:00', freq='1d'), data=[1]*8)
>>> df.resample(rule='7d', origin='2021-04-29 00:00:00', closed='right', label='right').sum()
            0
2021-04-22  2
2021-04-29  6
>>> df.resample(rule='168h', origin='2021-04-29 00:00:00', closed='right', label='right').sum()
            0
2021-04-22  1
2021-04-29  7
  1. Why does this happen?
  2. Should this happen?

Using pandas 1.3.5

mkp-gebensleben added the Needs Triage and Usage Question labels on Dec 21, 2021.

ms7463 (Contributor) commented Dec 23, 2021

Looks like the issue happens here:

pandas/pandas/core/resample.py

Lines 1656 to 1668 in 5a22750

if self.freq != "D" and is_superperiod(self.freq, "D"):
    if self.closed == "right":
        # GH 21459, GH 9119: Adjust the bins relative to the wall time
        bin_edges = binner.tz_localize(None)
        bin_edges = bin_edges + timedelta(1) - Nano(1)
        bin_edges = bin_edges.tz_localize(binner.tz).asi8
    else:
        bin_edges = binner.asi8
    # intraday values on last day
    if bin_edges[-2] > ax_values.max():
        bin_edges = bin_edges[:-1]
        binner = binner[:-1]

The 7d closed=right resample hits this condition, and this code branch readjusts the bin edges (starting at line 1666). If you run in debug mode and skip over this adjustment, you get the same results in your example.
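To make the effect of that adjustment concrete, here is a minimal sketch that redoes the binning by hand for the example from the issue. It is not the pandas implementation: the bin edges are hard-coded on the assumption that they sit at 2021-04-15, 2021-04-22 and 2021-04-29, and `count_right_closed` is a hypothetical helper written only for this illustration.

```python
import numpy as np
import pandas as pd

# Timestamps from the example in the issue: eight daily points at 01:00.
timestamps = pd.date_range("2021-04-21 01:00", "2021-04-28 01:00", freq="1D")

# Assumed 7-day bin edges anchored at origin 2021-04-29 (hard-coded for illustration).
edges = pd.DatetimeIndex(["2021-04-15", "2021-04-22", "2021-04-29"])


def count_right_closed(values, bin_edges):
    """Hypothetical helper: count values in each right-closed bin (left, right]."""
    # side="right" returns, for each edge, how many values are <= that edge.
    positions = np.searchsorted(values.to_numpy(), bin_edges.to_numpy(), side="right")
    return np.diff(positions).tolist()


# Edges left untouched, as in the '168h' case.
plain_counts = count_right_closed(timestamps, edges)

# Edges shifted by one day minus one nanosecond, mirroring
# `bin_edges + timedelta(1) - Nano(1)` in the snippet above.
adjusted_edges = edges + pd.Timedelta(days=1) - pd.Timedelta(1, unit="ns")
adjusted_counts = count_right_closed(timestamps, adjusted_edges)

print(plain_counts)     # [1, 7]
print(adjusted_counts)  # [2, 6]
```

With the shifted edges, the 2021-04-22 01:00 timestamp falls into the first bin, which matches the 2/6 split reported for rule='7d'.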

This logic, or something similar, seems to have been there for a long time, though it may have been meant for Monthly/Weekly frequencies rather than N*Daily frequencies. Maybe @mroeschke or @jorisvandenbossche can comment on it, since it seems they touched this section of the code most recently.

mroeschke (Member) commented

This looks like a bug: resample has special logic for redefining 'D' in the presence of a DST transition (#41943), which seems to negatively impact when there is no timezone defined in the example above.

This will hopefully be fixed in 2.0 where we want to remove this special casing: #44823
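Until then, one possible way to sidestep the day-based special casing is to spell the rule in hours, since the example above shows '168h' binning on fixed 168-hour windows. The sketch below assumes that fixed 24-hour days are what is wanted (which may not hold across DST transitions); `days_to_hour_rule` is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def days_to_hour_rule(days):
    """Hypothetical helper: express an N-day rule as an hour-based rule string."""
    return f"{days * 24}h"


# Same data as in the issue, as a Series of ones.
index = pd.date_range("2021-04-21 01:00", "2021-04-28 01:00", freq="1D")
series = pd.Series(1, index=index)

rule = days_to_hour_rule(7)  # "168h"
weekly = series.resample(
    rule, origin=pd.Timestamp("2021-04-29"), closed="right", label="right"
).sum()

# The issue reports 1 and 7 for this rule on pandas 1.3.5.
print(weekly)
```

The trade-off is that an hour-based rule no longer follows wall-clock days, so it is only a drop-in replacement when every day in the data is exactly 24 hours long.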

mroeschke added the Bug and Resample labels and removed the Usage Question and Needs Triage labels on Dec 23, 2021.

mkp-gebensleben (Author) commented Jan 3, 2022

@mroeschke

> which seems to negatively impact when there is no timezone defined in the example above

I'm getting the same results when using a timezone.

>>> df = pd.DataFrame(index=pd.date_range(start='2021-04-21 01:00:00', end='2021-04-28 01:00', freq='1d', tz=0), data=[1]*8)
>>> df.resample(rule='7d', origin='2021-04-29 00:00:00+00:00', closed='right', label='right').sum()
                           0
2021-04-22 00:00:00+00:00  2
2021-04-29 00:00:00+00:00  6
>>> df.resample(rule='168h', origin='2021-04-29 00:00:00+00:00', closed='right', label='right').sum()
                           0
2021-04-22 00:00:00+00:00  1
2021-04-29 00:00:00+00:00  7

mkp-gebensleben changed the title from "QST: Why is resampling with rule='7d' different than resampling with rule='168h'?" to "BUG: Why is resampling with rule='7d' different than resampling with rule='168h'?" on Jan 3, 2022.