BUG: df.groupby(TimeGrouper(freq='48min', closed='right')) #6085

Closed
shura-v opened this issue Jan 25, 2014 · 6 comments
Labels
Bug, Needs Info (Clarification about behavior needed to assess issue), Resample (resample method), Timezones (Timezone data dtype)

Comments

shura-v commented Jan 25, 2014

related: #4197, #5694

I'm encountering a bug in pandas 0.13.0:

import pandas
from pandas.tseries.resample import TimeGrouper
df = pandas.read_hdf('d:\\dropbox\\s.h5', 'data')
df.groupby(TimeGrouper(freq='48min', closed='right'))
Traceback (most recent call last):
  File "D:/Dropbox/hf/tests/hdf.py", line 94, in <module>
    main()
  File "D:/Dropbox/hf/tests/hdf.py", line 74, in load
    df.groupby(TimeGrouper(freq='48min', closed='right'))
  File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 2434, in groupby
    sort=sort, group_keys=group_keys, squeeze=squeeze)
  File "C:\Anaconda\lib\site-packages\pandas\core\groupby.py", line 789, in groupby
    return klass(obj, by, **kwds)
  File "C:\Anaconda\lib\site-packages\pandas\core\groupby.py", line 238, in __init__
    level=level, sort=sort)
  File "C:\Anaconda\lib\site-packages\pandas\core\groupby.py", line 1563, in _get_grouper
    gpr = key.get_grouper(obj)
  File "C:\Anaconda\lib\site-packages\pandas\tseries\resample.py", line 109, in get_grouper
    return self._get_time_grouper(obj)[1]
  File "C:\Anaconda\lib\site-packages\pandas\tseries\resample.py", line 115, in _get_time_grouper
    binner, bins, binlabels = self._get_time_bins(axis)
  File "C:\Anaconda\lib\site-packages\pandas\tseries\resample.py", line 150, in _get_time_bins
    bins = lib.generate_bins_dt64(ax_values, bin_edges, self.closed)
  File "lib.pyx", line 935, in pandas.lib.generate_bins_dt64 (pandas\lib.c:14968)
ValueError: Values falls after last bin

This is the maximum length at which it still works fine:

df = pandas.read_hdf('d:\\dropbox\\s.h5', 'data')
df = df[:23748]
df.groupby(TimeGrouper(freq='48min', closed='right'))

It also works when I group with TimeGrouper(freq='48min'), that is, without closed='right'.

Here is the data frame:
https://www.dropbox.com/s/ezrjralhb2yordm/s.h5

INSTALLED VERSIONS
------------------
Python: 2.7.5.final.0
OS: Windows
Release: 8
Processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.13.0
Cython: 0.19.2
Numpy: 1.7.1
Scipy: 0.13.2
statsmodels: 0.5.0
    patsy: 0.2.1
scikits.timeseries: Not installed
dateutil: 1.5
pytz: 2013b
bottleneck: Not installed
PyTables: 3.0.0
    numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.6.2
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: Not installed
sqlalchemy: 0.8.3
lxml: 3.2.3
bs4: 4.3.1
html5lib: Not installed
bigquery: Not installed
apiclient: Not installed

shura-v commented Jan 25, 2014

It also fails with these freqs: '24min', '32min', '80min', '180min'

jreback commented Jan 25, 2014

Use resample instead of using TimeGrouper directly.

The binning currently has a bug with a timezone-aware index; see #4197 and #5694.

I'll link this to the others.
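
For readers on a current pandas release, the two spellings can be sketched roughly as follows. This assumes a modern API that is not part of this thread: `pd.Grouper` stands in for `TimeGrouper`, `resample` needs an explicit aggregation such as `.sum()`, and the `'48min'` alias replaces `'48T'`. The one-minute data is synthetic and timezone-naive, so the timezone binning bug discussed here does not come into play.

```python
import numpy as np
import pandas as pd

# Synthetic, timezone-naive one-minute data (an assumption; the original s.h5 is not used).
index = pd.date_range("2013-09-05 09:30", periods=200, freq="min")
df = pd.DataFrame({"Volume": np.arange(len(index))}, index=index)

# Spelling 1: resample directly on the DatetimeIndex.
via_resample = df.resample("48min", closed="right").sum()

# Spelling 2: groupby with a time-based Grouper (modern replacement for TimeGrouper).
via_grouper = df.groupby(pd.Grouper(freq="48min", closed="right")).sum()

# Both spellings should produce the same bins and the same sums.
same_values = via_resample["Volume"].tolist() == via_grouper["Volume"].tolist()
same_labels = list(via_resample.index) == list(via_grouper.index)
print(same_values, same_labels)
```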

In [1]: df = pd.read_hdf('s.h5','data')

In [2]: df.resample('48T',closed='left')
Out[2]: 
                                Open       High        Low      Close  Volume
Date_Time                                                                    
2013-09-05 09:36:00-04:00  25.764468  25.772553  25.757021  25.765319       0
2013-09-05 10:24:00-04:00  25.917500  25.924167  25.914375  25.920833       0
2013-09-05 11:12:00-04:00  25.920208  25.921458  25.919167  25.920208       0
2013-09-05 12:00:00-04:00  25.923750  25.924792  25.922917  25.923750       0
2013-09-05 12:48:00-04:00  25.897500  25.898125  25.896250  25.896875       0
2013-09-05 13:36:00-04:00  25.917500  25.918542  25.916875  25.918125       0
2013-09-05 14:24:00-04:00  25.920000  25.920000  25.920000  25.920000       0
2013-09-05 15:12:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-05 16:00:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-05 16:48:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-05 17:36:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-05 18:24:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-05 19:12:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-05 20:00:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-05 20:48:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-05 21:36:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-05 22:24:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-05 23:12:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 00:00:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 00:48:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 01:36:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 02:24:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 03:12:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 04:00:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 04:48:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 05:36:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 06:24:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 07:12:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 08:00:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 08:48:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 09:36:00-04:00  25.473191  25.488511  25.462766  25.477872       0
2013-09-06 10:24:00-04:00  25.805000  25.809583  25.804583  25.809167       0
2013-09-06 11:12:00-04:00  25.910833  25.913333  25.908750  25.911667       0
2013-09-06 12:00:00-04:00  25.916875  25.917292  25.916042  25.916458       0
2013-09-06 12:48:00-04:00  25.915000  25.916458  25.914375  25.915417       0
2013-09-06 13:36:00-04:00  25.955208  25.956458  25.954583  25.956250       0
2013-09-06 14:24:00-04:00  25.970000  25.970000  25.970000  25.970000       0
2013-09-06 15:12:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 16:00:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 16:48:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 17:36:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 18:24:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 19:12:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 20:00:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 20:48:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 21:36:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 22:24:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-06 23:12:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-07 00:00:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-07 00:48:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-07 01:36:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-07 02:24:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-07 03:12:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-07 04:00:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-07 04:48:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-07 05:36:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-07 06:24:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-07 07:12:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-07 08:00:00-04:00        NaN        NaN        NaN        NaN     NaN
2013-09-07 08:48:00-04:00        NaN        NaN        NaN        NaN     NaN
                                 ...        ...        ...        ...     ...

[4238 rows x 5 columns]

In [3]: df.resample('48T',closed='right')
ValueError: Values falls after last bin
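
The two calls above differ only in `closed`, which decides which edge of each 48-minute bin is inclusive. A small sketch of that difference, using three made-up timezone-naive timestamps and a current pandas API (explicit `.sum()`, `'48min'` alias), neither of which comes from this thread:

```python
import pandas as pd

# Three hypothetical observations; 00:48 sits exactly on a bin edge.
stamps = pd.to_datetime(["2013-09-05 00:10", "2013-09-05 00:48", "2013-09-05 01:00"])
s = pd.Series([1, 2, 3], index=stamps)

# closed='left': bins are [00:00, 00:48) and [00:48, 01:36),
# so the 00:48 observation belongs to the second bin.
left = s.resample("48min", closed="left").sum()

# closed='right': bins are (00:00, 00:48] and (00:48, 01:36],
# so the 00:48 observation belongs to the first bin.
right = s.resample("48min", closed="right").sum()

print(left.tolist())   # [1, 5]
print(right.tolist())  # [3, 3]
```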

shura-v commented Jan 25, 2014

Thanks. This might be a good workaround for others too:

tz = df.index.tzinfo
df = df.tz_convert('UTC')
resampled = df.resample('48T', closed='right')
df = resampled.tz_convert(tz)
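
A self-contained sketch of the same round trip, for anyone who wants to try it without the original file. Everything here is an assumption for illustration: the data is synthetic, the helper name `resample_via_utc` is hypothetical, and the code targets a current pandas release, where `resample` returns an object that needs an explicit aggregation (`.mean()` here, matching the old default) and `'48min'` replaces `'48T'`.

```python
import numpy as np
import pandas as pd

# Synthetic one-minute bars in a timezone-aware index (stand-in for s.h5).
index = pd.date_range("2013-09-05 09:30", periods=600, freq="min", tz="America/New_York")
df = pd.DataFrame({"Close": np.linspace(25.0, 26.0, len(index))}, index=index)


def resample_via_utc(frame, freq, closed="right"):
    """Resample in UTC, then convert the result back to the original timezone."""
    tz = frame.index.tz                     # remember the original timezone
    in_utc = frame.tz_convert("UTC")        # binning happens on a UTC index
    resampled = in_utc.resample(freq, closed=closed).mean()
    return resampled.tz_convert(tz)         # labels back in local time


result = resample_via_utc(df, "48min")
print(result.head())
```

Note that the bins are laid out on the UTC index, so their local-time alignment depends on the UTC offset in effect.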

jreback commented Jan 25, 2014

Great! Keep in mind there is a very subtle issue w.r.t. DST transitions. It will be worked on in 0.14, as it is a bit non-trivial.

Please join in the effort!
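
One way the UTC round trip can interact with DST is sketched below. Whether this is the exact subtlety meant in the comment above is an assumption. The sketch uses synthetic data, a hypothetical helper `local_bin_offsets`, and a current pandas API. Bins computed on a UTC index start at multiples of 48 minutes from UTC midnight. In New York that lines up with local midnight while the offset is UTC-4 (240 = 5 × 48), but not once the offset becomes UTC-5 (300 is not a multiple of 48).

```python
import pandas as pd


def local_bin_offsets(start, tz="America/New_York", freq="48min"):
    """Bin the data in UTC, then report each label's local minute-of-day modulo 48."""
    idx = pd.date_range(start, periods=240, freq="min", tz=tz)
    s = pd.Series(1, index=idx)
    binned = s.tz_convert("UTC").resample(freq, closed="right").sum()
    labels = binned.tz_convert(tz).index
    minutes = labels.hour * 60 + labels.minute
    return sorted(set(int(m) % 48 for m in minutes))


summer = local_bin_offsets("2013-09-05 09:30")  # daylight time, UTC-4
winter = local_bin_offsets("2013-12-05 09:30")  # standard time, UTC-5

print(summer)  # [0]  -> labels aligned with local midnight
print(winter)  # [36] -> labels shifted relative to local midnight
```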

@mroeschke

@shura-v The file you provided is no longer available. Could you update this issue with a representative example?

mroeschke added the Needs Info (Clarification about behavior needed to assess issue) label Jun 28, 2018
@mroeschke

Please reopen if you have a representative example.
