Truncate on series causes core dump when TZ is specified on index #9243

Closed
echu79 opened this issue Jan 13, 2015 · 6 comments · Fixed by #21612
Labels
Bug Timezones Timezone data dtype
Milestone

Comments

@echu79

echu79 commented Jan 13, 2015

Hello
I am getting a core dump when truncating a series with a timezone specified on the index. Please see below:

This works fine:

import pandas
import datetime
import numpy as np    
rng = pandas.date_range('1/2/2005', '11/1/2010', freq='D')
ts = pandas.Series(np.random.randn(len(rng)), index=rng)
ts.truncate(datetime.datetime(2009,1,1), datetime.datetime(2011,1,1)).resample('D', how='sum')

However this core dumps for me:

import pandas
import datetime
import numpy as np    
rng = pandas.date_range('1/2/2005', '11/1/2010', freq='D', tz='US/Eastern')
ts = pandas.Series(np.random.randn(len(rng)), index=rng)
ts.truncate(datetime.datetime(2009,1,1), datetime.datetime(2011,1,1)).resample('D', how='sum')

The only difference is adding the TZ parameter to the date_range to set the index for a particular timezone.

This was working fine in 0.15, but seems to be broken in 0.15.2.

Thanks

@jreback
Contributor

jreback commented Jan 13, 2015

Please show the output of pd.show_versions().

@echu79
Author

echu79 commented Jan 13, 2015

here you go:

In [99]: pandas.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-431.29.2.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.2
nose: 1.3.0
Cython: 0.21
numpy: 1.9.0
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.3.0
sphinx: 1.0.3
patsy: 0.3.0
dateutil: 1.5
pytz: 2011k
bottleneck: 0.6.0
tables: None
numexpr: 2.4
matplotlib: 1.4.0
openpyxl: 1.5.7
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.6.0
lxml: None
bs4: None
html5lib: 0.90
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.6.1
pymysql: None
psycopg2: None

In [100]:

@jreback
Contributor

jreback commented Jan 14, 2015

Odd, this works OK for me (on OS X, but with the same versions of the dependencies). Can you step through the second example and show where it core dumps?

@jreback jreback added Bug Timezones Timezone data dtype labels Jan 14, 2015
@echu79
Author

echu79 commented Jan 14, 2015

Thanks, I appreciate you investigating this. I've set a breakpoint on the last line in pandas before it core dumps. After this, execution goes into numpy:

In [1]: import pandas

In [2]: import pdb

In [3]: import datetime

In [4]: import numpy as np

In [5]: rng = pandas.date_range('1/2/2005', '11/1/2010', freq='D', tz='US/Eastern')

In [6]: ts = pandas.Series(np.random.randn(len(rng)), index=rng)

In [7]: pdb.run('ts.truncate(datetime.datetime(2009,1,1), datetime.datetime(2011,1,1)).resample("D", how="sum")')

> <string>(1)<module>()
(Pdb) break /opt/python//lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-x86_64.egg/pandas/core/groupby.py:1829
Breakpoint 1 at /opt/python/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-x86_64.egg/pandas/core/groupby.py:1829
(Pdb) cont
/opt/python/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-x86_64.egg/pandas/core/groupby.py(1829)_aggregate()
-> agg_func(result, counts, values, self.bins)
(Pdb) w
/opt/python/lib/python2.7/bdb.py(387)run()
-> exec cmd in globals, locals
<string>(1)<module>()
/opt/python/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-x86_64.egg/pandas/core/generic.py(3005)resample()
-> return sampler.resample(self).finalize(self)
/opt/python/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-x86_64.egg/pandas/tseries/resample.py(85)resample()
-> rs = self._resample_timestamps()
/opt/python/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-x86_64.egg/pandas/tseries/resample.py(286)_resample_timestamps()
-> result = grouped.aggregate(self._agg_method)
/opt/python/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-x86_64.egg/pandas/core/groupby.py(2292)aggregate()
-> return getattr(self, func_or_funcs)(*args, **kwargs)
/opt/python/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-x86_64.egg/pandas/core/groupby.py(106)f()
-> return self._cython_agg_general(alias, numeric_only=numeric_only)
/opt/python/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-x86_64.egg/pandas/core/groupby.py(1091)_cython_agg_general()
-> result, names = self.grouper.aggregate(obj.values, how)
/opt/python/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-x86_64.egg/pandas/core/groupby.py(1527)aggregate()
-> result = self._aggregate(result, counts, values, how, is_numeric)
/opt/python/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-x86_64.egg/pandas/core/groupby.py(1829)_aggregate()
-> agg_func(result, counts, values, self.bins)
(Pdb) n
*** glibc detected *** /opt/python/bin/python: double free or corruption (out): 0x0000000004753510 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3bd5c75e76]
/lib64/libc.so.6[0x3bd5c789b3]
/opt/python/lib/python2.7/site-packages/numpy/core/multiarray.so(+0x1bc17)[0x7f790e20ac17]
/opt/python/lib/python2.7/site-packages/numpy/core/multiarray.so(+0x1ec9e)[0x7f790e20dc9e]
/opt/python/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-x86_64.egg/pandas/algos.so(+0x15631a)[0x7f790267931a]
/opt/python/lib/python2.7/site-packages/pandas-0.15.2-py2.7-linux-x86_64.egg/pandas/algos.so(+0x156fd9)[0x7f7902679fd9]
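
For anyone trying to locate a crash like this without stepping through pdb: on Python 3.3 or later (the report above uses Python 2.7, where this module is not in the standard library), the standard-library faulthandler module can write the Python-level stack when the process dies on a fatal signal such as the SIGABRT that glibc raises after detecting a double free. The following is only a sketch; the temporary log file and the name crash_log are choices made for this example and are not part of the original report.

```python
import faulthandler
import tempfile

# Assumption: a temporary log file is used so the dump survives even when
# stderr is captured or redirected; any open file with a real descriptor works.
crash_log = tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False)

# Install handlers for fatal signals (SIGSEGV, SIGABRT, SIGBUS, ...). If the
# process then dies inside native code, the Python-level stack of every
# thread is written to crash_log before the process exits.
faulthandler.enable(file=crash_log, all_threads=True)

print("crash tracebacks will be written to", crash_log.name)

# The failing call from the report would go here, for example the
# truncate(...).resample(...) line.
```

Starting the interpreter with `python -X faulthandler script.py` enables the same handlers at startup, with the dump going to stderr.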

@echu79
Author

echu79 commented Jan 14, 2015

It looks like the lengths of the args passed to agg_func differ when a timezone (at least US/Eastern) is specified.

CORE DUMPS:
(Pdb) len(result)
669
(Pdb) len(values)
670
(Pdb) len(counts)
669
(Pdb) len(self.bins)
669

WORKS:
-> agg_func(result, counts, values, self.bins)
(Pdb) len(result)
670
(Pdb) len(counts)
670
(Pdb) len(values)
670
(Pdb) len(self.bins)
670

I tried this with GMT, and the lengths of all the args match, just as when no TZ is passed. So it seems to be specific to certain timezones. Maybe it has something to do with daylight saving time?
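
As a rough illustration of how daylight saving time could produce an off-by-one like the 669 vs 670 above, here is a small standard-library sketch. It does not claim to be the root cause inside pandas; it only shows that the truncated range (1 Jan 2009 to 1 Nov 2010, the last value in the index) spans 669 wall-clock days but one hour less in absolute time, because the start is in standard time and the end is in daylight time. The fixed offsets EST and EDT are an assumption standing in for US/Eastern so that no timezone database is needed.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC offsets stand in for US/Eastern.
# Standard time is UTC-5, daylight time is UTC-4.
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

# Truncation start from the report, and the last index value (1 Nov 2010),
# which falls before US daylight saving ended on 7 Nov 2010.
start = datetime(2009, 1, 1, tzinfo=EST)
end = datetime(2010, 11, 1, tzinfo=EDT)

one_day = timedelta(days=1)

# Wall-clock view: drop the offsets and count local midnights.
wall_span = end.replace(tzinfo=None) - start.replace(tzinfo=None)
wall_points = wall_span // one_day + 1

# Absolute view: subtracting aware datetimes with different offsets
# compares them in UTC, so the span comes out one hour short of 669 days.
elapsed = end - start
elapsed_points = elapsed // one_day + 1

print(wall_span, wall_points)    # 669 days, 0:00:00 670
print(elapsed, elapsed_points)   # 668 days, 23:00:00 669
```

With a zone that has no offset change, such as GMT, both views give 670, which is consistent with the GMT observation above.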

@jorisvandenbossche
Member

I can confirm that it is also totally broken for me (Windows 7, Python 2.7, pandas 0.15.2, pytz 2014.9).

@jorisvandenbossche jorisvandenbossche added this to the 0.16.0 milestone Feb 27, 2015
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jreback jreback modified the milestones: Next Major Release, 0.24.0 Jun 25, 2018

3 participants