Fix for BUG: resample with tz-aware index: `Values falls after last bin` #15549 #18337
pandas/tests/test_resample.py
@@ -2719,6 +2719,11 @@ def test_resample_weekly_bug_1726(self):
# it works!
df.resample('W-MON', closed='left', label='left').first()

def test_resample_tz_aware_bug_15549(self):
index = pd.DatetimeIndex([1450137600000000000, 1474059600000000000], tz='UTC').tz_convert('America/Chicago')
Reference issue number.
Sorry, but I already referenced it in the function name. Or do you mean something else?
Guys, can you help me fix the problem with Travis?
Read the output of the failing Travis build here. Your code fails to pass style checks. The failure on the non-blocking build seems unrelated. Might just need to rebase on
Codecov Report
@@            Coverage Diff             @@
##           master   #18337      +/-   ##
==========================================
- Coverage   91.38%   91.34%   -0.04%
==========================================
  Files         164      164
  Lines       49790    49794       +4
==========================================
- Hits        45501    45485      -16
- Misses       4289     4309      +20
Continue to review full report at Codecov.
Codecov Report
@@            Coverage Diff             @@
##           master   #18337      +/-   ##
==========================================
- Coverage   91.36%   91.32%   -0.05%
==========================================
  Files         164      164
  Lines       49718    49733      +15
==========================================
- Hits        45426    45418       -8
- Misses       4292     4315      +23
Continue to review full report at Codecov.
pandas/core/indexes/datetimes.py
@@ -366,9 +366,11 @@ def __new__(cls, data=None,
pass

if data is None:
values_present = kwargs.pop('values_present', False)
what the heck is all of this?
sorry, what do you mean?
That is a change to fix a problem with resampling tz-aware series in one of the edge cases.
Unfortunately I hit this problem often, and based on my understanding of the design I created a solution that fixes my problem without breaking other functionality.
Sorry if the solution doesn't look elegant; if you think there is another one we can use, please let me know.
I am not really sure what you are doing. Your changes should be restricted to pandas/core/resample.py
If you mean that I should not change anything in pandas/core/resample.py, then unfortunately I don't know how to avoid that.
I mean the opposite. That's where the change should be.
I think I can implement a solution with changes only in pandas/core/resample.py.
Let me try.
yeah I think this is a specific change that resample needs, not generically for date_ranges.
pandas/core/resample.py
@@ -1139,7 +1142,8 @@ def _get_time_bins(self, ax):
start=first,
What exactly are you trying to do here? What is the purpose of `values_present`, IOW, what case are you trying to handle?
The case is described in #15549.
Unfortunately, I get the `Values falls after last bin` error when I resample a tz-aware series across a summer/winter time change.
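To make the failure concrete, here is a stdlib-only sketch under a stated simplification: it assumes the outer range edges are rounded in naive wall-clock time (so the one-hour DST jump is invisible to them) while the bins themselves are stepped in absolute time, and it hard-codes fixed CST/CDT offsets instead of using a tz database. The helpers `to_chicago`, `localize_chicago`, `wall_clock_edges` and `absolute_range` are hypothetical names, not pandas internals; the two epoch values are the ones used in the PR's test case.

```python
from datetime import datetime, timedelta, timezone

# Simplified stand-in for America/Chicago in March 2016 (assumption of this
# sketch): CST (UTC-6) before 2016-03-13 08:00 UTC, CDT (UTC-5) from then on.
DST_START_UTC = datetime(2016, 3, 13, 8, 0, tzinfo=timezone.utc)
CST = timezone(timedelta(hours=-6), "CST")
CDT = timezone(timedelta(hours=-5), "CDT")


def to_chicago(ts):
    """Convert an aware datetime to the simplified Chicago offset."""
    return ts.astimezone(CDT if ts >= DST_START_UTC else CST)


def localize_chicago(naive):
    """Attach the simplified Chicago offset to a naive wall-clock time."""
    as_cdt = naive.replace(tzinfo=CDT)
    return as_cdt if as_cdt >= DST_START_UTC else naive.replace(tzinfo=CST)


def wall_clock_edges(first, last, freq):
    """Round first down and last up to multiples of freq, counted from local
    midnight of `first`, using naive wall-clock arithmetic."""
    first_naive = first.replace(tzinfo=None)
    last_naive = last.replace(tzinfo=None)
    midnight = first_naive.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight + ((first_naive - midnight) // freq) * freq
    end = midnight - ((midnight - last_naive) // freq) * freq  # ceiling
    return localize_chicago(start), localize_chicago(end)


def absolute_range(start, end, freq):
    """Edges stepped by freq in absolute (UTC) time, end inclusive."""
    edges, t = [], start.astimezone(timezone.utc)
    while t <= end:
        edges.append(to_chicago(t))
        t += freq
    return edges


freq = timedelta(hours=12)
# The two timestamps from the PR's test case, in seconds since the epoch.
first = to_chicago(datetime.fromtimestamp(1457537600, tz=timezone.utc))
last = to_chicago(datetime.fromtimestamp(1458059600, tz=timezone.utc))

start, end = wall_clock_edges(first, last, freq)  # end: 2016-03-15 12:00 CDT
binner = absolute_range(start, end, freq)
print(binner[-1])          # 2016-03-15 01:00:00-05:00
print(last)                # 2016-03-15 11:33:20-05:00
print(binner[-1] >= last)  # False: the last value falls after the last bin
```

With these inputs the right edge is noon CDT on 2016-03-15, but stepping 12 hours in absolute time from midnight CST on 2016-03-09 reaches only 01:00 CDT before the next step (13:00 CDT) would overshoot the edge, so the last value has no bin to land in.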
So when we are resampling a series, I believe an extra bin should be generated to handle this,
but when generating a DatetimeIndex with, for example, date_range, we should not get the extra bin.
The purpose of values_present is to indicate which of these two cases we are creating the DatetimeIndex for.
hope that makes sense
I understand the case. My point is that you can detect what you need and simply generate different bins at that point, rather than intrude on the implementation of date_range.
the soln that you posted in the issue looks promising. did you try that?
Yep. Last time I made a PR with it, you said that it is not right to resample a tz-aware series by UTC,
and after giving it some thought I think you are right.
It's not a good idea to resample everything by UTC; I also made that change internally at our company and ran into issues with it.
For example, when I resample with B freq, I get a data point on Sunday but not on Friday.
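A sketch of one plausible reason why resampling in UTC shifts business days (an illustration with a fixed UTC-5 offset, not code from the PR): a timestamp on a Friday evening in Chicago is already Saturday in UTC, so bins computed in UTC would place it on a weekend day.

```python
from datetime import datetime, timedelta, timezone

# Fixed CDT offset (UTC-5), assumed here instead of a full tz database.
CDT = timezone(timedelta(hours=-5), "CDT")

friday_evening = datetime(2016, 3, 18, 20, 0, tzinfo=CDT)
as_utc = friday_evening.astimezone(timezone.utc)

print(friday_evening.strftime("%A %H:%M"))  # Friday 20:00
print(as_utc.strftime("%A %H:%M"))          # Saturday 01:00
```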
yeah that might be a bigger change, ok (and maybe not what we want semantically)
pandas/core/resample.py
name=ax.name)

# GH 15549
values_present = isinstance(getattr(self, 'obj', None),
Still not clear why you need this test for whether obj is a Series/DataFrame; that's all it can be.
also need to check that len(binner) >= 2
OK, I was not sure that obj always exists.
I will do the check.
There was actually a mistake: it was supposed to start from binner[-1], not binner[-2]. I fixed it anyway.
@jreback please check the change; it looks like the most elegant version so far.
Looks good, some comments. Please add a whatsnew entry for the 0.21.1 bug fix.
pandas/core/resample.py
@@ -1141,6 +1141,13 @@ def _get_time_bins(self, ax):
tz=tz,
name=ax.name)

# GH 15549
if len(binner) > 1 and binner[-1] < last:
can you add a comment here on what is going on.
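The guard shown in the diff (`len(binner) > 1 and binner[-1] < last`) can be illustrated without pandas. The sketch below assumes integer epoch seconds in place of a `DatetimeIndex`; `extend_short_binner` is a hypothetical name, and using `last + freq` as the upper bound for the extra edges is an assumption of the sketch (the review only states that the extension starts from `binner[-1]`). Starting from `binner[-1]` and dropping the first generated edge avoids duplicating an edge the binner already has.

```python
def extend_short_binner(binner, last, freq):
    """Hypothetical helper mirroring the GH 15549 guard.

    `binner` holds bin edges and `last` the right range edge, both as integer
    epoch seconds; `freq` is the bin width in seconds.
    """
    # Only act when there are at least two edges and the last one is short.
    if len(binner) > 1 and binner[-1] < last:
        # Edges from binner[-1] up to last + freq (inclusive, an assumed
        # bound); drop the first one because binner already contains it.
        extra = list(range(binner[-1], last + freq + 1, freq))
        binner = binner + extra[1:]
    return binner


freq = 12 * 3600
# Short binner ending at 2016-03-15 06:00 UTC (01:00 CDT), as in the DST case.
binner = [1457935200, 1457978400, 1458021600]
last = 1458061200  # right range edge: 2016-03-15 17:00 UTC (12:00 CDT)
print(extend_short_binner(binner, last, freq))
# [1457935200, 1457978400, 1458021600, 1458064800]  (adds 13:00 CDT)
```

When the binner already reaches `last`, or has fewer than two edges, the function returns it unchanged, so the extra bin only appears in the short-binner case.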
pandas/tests/test_resample.py
@@ -2719,6 +2719,12 @@ def test_resample_weekly_bug_1726(self):
# it works!
df.resample('W-MON', closed='left', label='left').first()

def test_resample_tz_aware_bug_15549(self):
index = pd.DatetimeIndex([1450137600000000000, 1474059600000000000],
move the issue number to a comment (and out of the title)
pandas/tests/test_resample.py
index = pd.DatetimeIndex([1450137600000000000, 1474059600000000000],
tz='UTC').tz_convert('America/Chicago')
df = pd.DataFrame([1, 2], index=index)
df.resample('12h', closed='right', label='right').last().ffill()
add an expected and compare using assert_frame_equal
Hello @ahcub! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on November 21, 2017 at 07:36 Hours UTC
@jreback I think the changes you requested have been made. Please let me know if anything else needs to be done or if it's all good.
lgtm. small comments
ping on green
doc/source/whatsnew/v0.21.1.txt
@@ -62,6 +62,7 @@ Bug Fixes
- Bug in ``pd.Series.rolling.skew()`` and ``rolling.kurt()`` with all equal values has floating issue (:issue:`18044`)
- Bug in ``pd.DataFrameGroupBy.count()`` when counting over a datetimelike column (:issue:`13393`)
- Bug in ``pd.concat`` when empty and non-empty DataFrames or Series are concatenated (:issue:`18178` :issue:`18187`)
- Bug in ``DataFrame.resample(...)`` when there is a time change, resampling frequecy is big enough (:issue:`15549`)
say DST here
sp on frequency
pandas/core/resample.py
# GH 15549
# In edge case of tz-aware resapmling binner last index can be
# less than the last variable in data object.
# This leads to `Values falls after last bin` error
This last sentence is not relevant; you are fixing this!
Add why the binner is too short (e.g. because of DST).
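Why the binner comes out short can be shown with plain arithmetic on the 2016 spring-forward day in Chicago: 12 hours of wall-clock time cover only 11 hours of elapsed time, so edges counted on the wall clock and bins stepped in absolute time disagree by one hour. Fixed CST/CDT offsets are assumed here in place of a tz database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets assumed for illustration: CST before the switch, CDT after.
CST = timezone(timedelta(hours=-6), "CST")
CDT = timezone(timedelta(hours=-5), "CDT")

# Midnight before and noon after the 2016-03-13 spring-forward in Chicago.
start = datetime(2016, 3, 13, 0, 0, tzinfo=CST)
end = datetime(2016, 3, 13, 12, 0, tzinfo=CDT)

wall_clock = end.replace(tzinfo=None) - start.replace(tzinfo=None)
elapsed = end - start  # aware subtraction compares the UTC instants
print(wall_clock, elapsed)  # 12:00:00 11:00:00
```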
pandas/tests/test_resample.py
index = pd.DatetimeIndex([1457537600000000000, 1458059600000000000],
tz='UTC').tz_convert('America/Chicago')
df = pd.DataFrame([1, 2], index=index)
res_df = df.resample('12h', closed='right',
call this result
'2016-03-15 01:00:00-05:00',
'2016-03-15 13:00:00-05:00']
index = pd.DatetimeIndex(expected_index_values,
tz='UTC').tz_convert('America/Chicago')
call this expected
pandas/tests/test_resample.py
@@ -2719,6 +2719,34 @@ def test_resample_weekly_bug_1726(self):
# it works!
df.resample('W-MON', closed='left', label='left').first()

def test_resample_tz_aware(self):
Be more explicit in the test name;
this is particular to DST.
Did the changes; let me know if anything else needs to be changed.
@jreback looks like it is green now
thanks!
…andas-dev#15549 (pandas-dev#18337) (cherry picked from commit 8efd1a0)
git diff upstream/master -u -- "*.py" | flake8 --diff