Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

!I fix for BUG: resample with tz-aware: Values falls after last bin #15549 #18337

Merged
merged 32 commits into from
Nov 21, 2017

Conversation

ahcub
Copy link
Contributor

@ahcub ahcub commented Nov 17, 2017

@gfyoung gfyoung added Bug Resample resample method labels Nov 17, 2017
@@ -2719,6 +2719,11 @@ def test_resample_weekly_bug_1726(self):
# it works!
df.resample('W-MON', closed='left', label='left').first()

def test_resample_tz_aware_bug_15549(self):
index = pd.DatetimeIndex([1450137600000000000, 1474059600000000000], tz='UTC').tz_convert('America/Chicago')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference issue number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, but I already referenced it in the function name or you mean something else?

@ahcub
Copy link
Contributor Author

ahcub commented Nov 18, 2017

guys, can you help me to fix problem with travis?
I'm not sure what is wrong there...

@gfyoung
Copy link
Member

gfyoung commented Nov 18, 2017

Read the output of the failing Travis build here. Your code fails to pass style-checks.

The failure on the non-blocking build seems unrelated. Might just need to rebase on master for it to go away on that build.

@codecov
Copy link

codecov bot commented Nov 19, 2017

Codecov Report

Merging #18337 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18337      +/-   ##
==========================================
- Coverage   91.38%   91.34%   -0.04%     
==========================================
  Files         164      164              
  Lines       49790    49794       +4     
==========================================
- Hits        45501    45485      -16     
- Misses       4289     4309      +20
Flag Coverage Δ
#multiple 89.14% <100%> (-0.03%) ⬇️
#single 39.49% <42.85%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 95.49% <100%> (+0.01%) ⬆️
pandas/core/resample.py 96.33% <100%> (+0.17%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.44% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 774030c...fc795c6. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 19, 2017

Codecov Report

Merging #18337 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18337      +/-   ##
==========================================
- Coverage   91.36%   91.32%   -0.05%     
==========================================
  Files         164      164              
  Lines       49718    49733      +15     
==========================================
- Hits        45426    45418       -8     
- Misses       4292     4315      +23
Flag Coverage Δ
#multiple 89.12% <100%> (-0.03%) ⬇️
#single 39.62% <0%> (-0.05%) ⬇️
Impacted Files Coverage Δ
pandas/core/resample.py 96.34% <100%> (+0.18%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.44% <0%> (-1.82%) ⬇️
pandas/core/series.py 94.84% <0%> (-0.19%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/core/internals.py 94.47% <0%> (-0.08%) ⬇️
pandas/core/indexes/base.py 96.43% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.48% <0%> (ø) ⬆️
pandas/core/indexes/period.py 92.92% <0%> (+0.02%) ⬆️
pandas/plotting/_core.py 82.49% <0%> (+0.03%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1647a72...2e77f72. Read the comment docs.

@@ -366,9 +366,11 @@ def __new__(cls, data=None,
pass

if data is None:
values_present = kwargs.pop('values_present', False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what the heck is all of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, what do you mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is a change to fix a problem with resampling for tz-aware series in one of the edge cases
I hit this problem often unfortunately and from my understanding of the design I created a solution that doesn't break other functionality but fixes the problem I have
sorry if the solution doesn't look elegant, if you think there is other one that we can use, please let me know

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure what you are doing. Your changes should be restriced to pandas/core/resample.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you mean that I should not change anything in pandas/core/resample.py then unfortunately I don't know how to avoid that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the opposite. There's where the change should be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can implement a solution with changes in pandas/core/resample.py only I think
let me try

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think this is a specific change that resample needs, not generically for date_ranges.

@@ -366,9 +366,11 @@ def __new__(cls, data=None,
pass

if data is None:
values_present = kwargs.pop('values_present', False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure what you are doing. Your changes should be restriced to pandas/core/resample.py

@@ -1139,7 +1142,8 @@ def _get_time_bins(self, ax):
start=first,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exactly are you trying to do here? what is the purpose of values_present, IOW what case are you trying to handle?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the case is described here #15549
unfortunately, I get values fall after last bin error when I do resampling of tz-aware series with summer-winter time change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in case if we are resampling series I believe an extra bin should be generated to handle this
but in case of generation of the datetimeindex with date_range for example we should not get the extra bin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

purpose of values_present is to be able to say which case is that, that we are creating datetimeindex for

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hope that makes sense

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the case. My point is that you can detect what you need and simply generate different bins at that point, rather than intrude to the implementation of date_range.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the soln that you posted in the issue looks promising. did you try that?

Copy link
Contributor Author

@ahcub ahcub Nov 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, last time I made a PR with it you said that it is not right to resample by UTC a tz aware series
and after giving it a thought I think you are right
it's not a good idea to resample everything by UTC, also I did that change internally for our company and I got issues with it.
like when I resample with B freq, I get a data point on sunday, but not on friday

Copy link
Contributor

@jreback jreback Nov 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that might be a bigger change, ok (and maybe not what we want semantically)

name=ax.name)

# GH 15549
values_present = isinstance(getattr(self, 'obj', None),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still not clear why you need this tests for obj is Series/DataFrame. that 's all it can be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also need to check that len(binner) >= 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I was not sure about that the obj always exists
I will do the check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there was actually a mistake, it supposed to go start from binner[-1] not binner[-2], I fixed it anyway

@ahcub
Copy link
Contributor Author

ahcub commented Nov 20, 2017

@jreback please check the change, it looks like the most elegant version so far

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. some comments. pls add a whatsnew for 0.21.1 bug fix.

@@ -1141,6 +1141,13 @@ def _get_time_bins(self, ax):
tz=tz,
name=ax.name)

# GH 15549
if len(binner) > 1 and binner[-1] < last:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here on what is going on.

@@ -2719,6 +2719,12 @@ def test_resample_weekly_bug_1726(self):
# it works!
df.resample('W-MON', closed='left', label='left').first()

def test_resample_tz_aware_bug_15549(self):
index = pd.DatetimeIndex([1450137600000000000, 1474059600000000000],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move the issue number to a comment (and out of the title)

index = pd.DatetimeIndex([1450137600000000000, 1474059600000000000],
tz='UTC').tz_convert('America/Chicago')
df = pd.DataFrame([1, 2], index=index)
df.resample('12h', closed='right', label='right').last().ffill()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an expected and compare using assert_frame_equal

@pep8speaks
Copy link

pep8speaks commented Nov 20, 2017

Hello @ahcub! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on November 21, 2017 at 07:36 Hours UTC

@ahcub ahcub closed this Nov 20, 2017
@ahcub ahcub reopened this Nov 20, 2017
@ahcub
Copy link
Contributor Author

ahcub commented Nov 21, 2017

@jreback I think the changes you requested were made, please let me know if anything else needs to be made or if all good

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. small comments
ping on green

@@ -62,6 +62,7 @@ Bug Fixes
- Bug in ``pd.Series.rolling.skew()`` and ``rolling.kurt()`` with all equal values has floating issue (:issue:`18044`)
- Bug in ``pd.DataFrameGroupBy.count()`` when counting over a datetimelike column (:issue:`13393`)
- Bug in ``pd.concat`` when empty and non-empty DataFrames or Series are concatenated (:issue:`18178` :issue:`18187`)
- Bug in ``DataFrame.resample(...)`` when there is a time change, resampling frequecy is big enough (:issue:`15549`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say DST here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sp on frequency

# GH 15549
# In edge case of tz-aware resapmling binner last index can be
# less than the last variable in data object.
# This leads to `Values falls after last bin` error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this last sentence is not relevant; you are fixing this!
add why the bonner is too short (eg because of DST)

index = pd.DatetimeIndex([1457537600000000000, 1458059600000000000],
tz='UTC').tz_convert('America/Chicago')
df = pd.DataFrame([1, 2], index=index)
res_df = df.resample('12h', closed='right',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this result

'2016-03-15 01:00:00-05:00',
'2016-03-15 13:00:00-05:00']
index = pd.DatetimeIndex(expected_index_values,
tz='UTC').tz_convert('America/Chicago')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this expected

@@ -2719,6 +2719,34 @@ def test_resample_weekly_bug_1726(self):
# it works!
df.resample('W-MON', closed='left', label='left').first()

def test_resample_tz_aware(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be more explicit in the test name
this is particularl to dst

@ahcub
Copy link
Contributor Author

ahcub commented Nov 21, 2017

did the changes, let me know if anything else is needed to change

@ahcub
Copy link
Contributor Author

ahcub commented Nov 21, 2017

@jreback looks like it is green now
let me know if anything else needs to be changed

@jreback
Copy link
Contributor

jreback commented Nov 21, 2017

this might in theory fix:

#6085 and #12351

@jreback jreback added this to the 0.21.1 milestone Nov 21, 2017
@jreback jreback merged commit 8efd1a0 into pandas-dev:master Nov 21, 2017
@jreback
Copy link
Contributor

jreback commented Nov 21, 2017

thanks!

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017
TomAugspurger pushed a commit that referenced this pull request Dec 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Resample resample method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: resample with tz-aware: Values falls after last bin
4 participants