Uncompatibility of pandas.read_pickle with pytz #6871

Closed
AndrewUshakov opened this issue Apr 11, 2014 · 15 comments
Labels
Bug Compat pandas objects compatability with Numpy or Python functions

Comments

@AndrewUshakov

Test program below:

import datetime, pytz, pandas

pandas.show_versions()

dates = [datetime.datetime(2014,1,1,1,1, tzinfo=pytz.utc),
         datetime.datetime(2014,2,2,2,2, tzinfo=pytz.utc),
         datetime.datetime(2014,3,3,3,3, tzinfo=pytz.utc)]

s = pandas.Series(dates)
s.to_pickle('series.pickle')

# import pickle
# with open('series.pickle', 'rb') as f: s1 = pickle.load(f)

s2 = pandas.read_pickle('series.pickle')

This crashes at the last line with a long stack trace. If I replace "pytz.utc" with "datetime.timezone.utc", the error disappears. The commented-out code fragment works properly in both cases.
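For reference, the substitution described above can be sketched with the standard library alone. This is only an illustration of swapping in datetime.timezone.utc and round-tripping through plain pickle; it uses an in-memory buffer (an assumption, instead of 'series.pickle') and does not exercise pandas.read_pickle at all:

```python
import datetime
import io
import pickle

# Same three timestamps as in the report, but using the standard library's
# UTC object instead of pytz.utc.
utc = datetime.timezone.utc
dates = [datetime.datetime(2014, 1, 1, 1, 1, tzinfo=utc),
         datetime.datetime(2014, 2, 2, 2, 2, tzinfo=utc),
         datetime.datetime(2014, 3, 3, 3, 3, tzinfo=utc)]

# Round-trip through plain pickle using an in-memory buffer.
buffer = io.BytesIO()
pickle.dump(dates, buffer)
buffer.seek(0)
restored = pickle.load(buffer)

print(restored == dates)
```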

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.13.1
Cython: None
numpy: 1.8.1
scipy: 0.13.3
statsmodels: 0.5.0
IPython: None
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.2
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
sqlalchemy: 0.9.4
lxml: 3.3.4
bs4: 4.3.2
html5lib: 0.999
bq: None
apiclient: None
Traceback (most recent call last):
  File "C:\Program Files\Python34\lib\site-packages\pandas\io\pickle.py", line 43, in try_read
    return pc.load(fh, encoding=encoding, compat=False)
  File "C:\Program Files\Python34\lib\site-packages\pandas\compat\pickle_compat.py", line 89, in load
    return up.load()
  File "C:\Program Files\Python34\lib\pickle.py", line 1036, in load
    dispatch[key[0]](self)
  File "C:\Program Files\Python34\lib\site-packages\pandas\compat\pickle_compat.py", line 18, in load_reduce
    if type(args[0]) is type:
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python34\lib\site-packages\pandas\io\pickle.py", line 49, in read_pickle
    return try_read(path)
  File "C:\Program Files\Python34\lib\site-packages\pandas\io\pickle.py", line 46, in try_read
    return pc.load(fh, encoding=encoding, compat=True)
  File "C:\Program Files\Python34\lib\site-packages\pandas\compat\pickle_compat.py", line 89, in load
    return up.load()
  File "C:\Program Files\Python34\lib\pickle.py", line 1036, in load
    dispatch[key[0]](self)
  File "C:\Program Files\Python34\lib\pickle.py", line 1316, in load_newobj_ex
    obj = cls.__new__(cls, *args, **kwargs)
TypeError: Required argument 'shape' (pos 1) not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python34\lib\site-packages\pandas\io\pickle.py", line 43, in try_read
    return pc.load(fh, encoding=encoding, compat=False)
  File "C:\Program Files\Python34\lib\site-packages\pandas\compat\pickle_compat.py", line 89, in load
    return up.load()
  File "C:\Program Files\Python34\lib\pickle.py", line 1036, in load
    dispatch[key[0]](self)
  File "C:\Program Files\Python34\lib\site-packages\pandas\compat\pickle_compat.py", line 18, in load_reduce
    if type(args[0]) is type:
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:/My Documents/PyCharm/LFDM/CalcFlowRate/tpicle.py", line 17, in <module>
    s2 = pandas.read_pickle('series.pickle')
  File "C:\Program Files\Python34\lib\site-packages\pandas\io\pickle.py", line 52, in read_pickle
    return try_read(path, encoding='latin1')
  File "C:\Program Files\Python34\lib\site-packages\pandas\io\pickle.py", line 46, in try_read
    return pc.load(fh, encoding=encoding, compat=True)
  File "C:\Program Files\Python34\lib\site-packages\pandas\compat\pickle_compat.py", line 89, in load
    return up.load()
  File "C:\Program Files\Python34\lib\pickle.py", line 1036, in load
    dispatch[key[0]](self)
  File "C:\Program Files\Python34\lib\pickle.py", line 1316, in load_newobj_ex
    obj = cls.__new__(cls, *args, **kwargs)
TypeError: Required argument 'shape' (pos 1) not found

Process finished with exit code 1

Best regards,
Andrew

@AndrewUshakov AndrewUshakov changed the title Uncompatibility of pandas.read_pickle with pytz Uncompatibility of pandas.read_pickle with pytz Apr 11, 2014
@jreback
Contributor

jreback commented Apr 11, 2014

I'll mark it as a bug, but why would you do this anyhow?

Surely you want a DatetimeIndex(dates).

@jreback jreback added this to the 0.15.0 milestone Apr 11, 2014
@AndrewUshakov
Author

With datetime/pytz, matplotlib "understands" that the axes are date/time; with numpy.datetime64, it doesn't. Please run the example below:

import pandas, datetime, pytz
import matplotlib.pyplot as plt

dates = [datetime.datetime(2014,1,1,1,1, tzinfo=pytz.utc),
         datetime.datetime(2014,2,2,2,2, tzinfo=pytz.utc),
         datetime.datetime(2014,3,3,3,3, tzinfo=pytz.utc)]

s1 = pandas.Series(pandas.DatetimeIndex(dates))
plt.plot(s1, s1)
plt.show()

s2 = pandas.Series(dates)
plt.plot(s2, s2)
plt.show()

@jreback
Contributor

jreback commented Apr 11, 2014

Is there a reason you are not using s1.plot()? It actually works better, as the scale is maintained.

@AndrewUshakov
Author

s1.plot() (from my example) gives an error:

  File "D:\Temp\t.py", line 20, in <module>
    s1.plot()
  File "C:\Program Files\Python34\lib\site-packages\pandas\tools\plotting.py", line 1829, in plot_series
    plot_obj.generate()
  File "C:\Program Files\Python34\lib\site-packages\pandas\tools\plotting.py", line 903, in generate
    self._compute_plot_data()
  File "C:\Program Files\Python34\lib\site-packages\pandas\tools\plotting.py", line 984, in _compute_plot_data
    'plot'.format(numeric_data.__class__.__name__))
TypeError: Empty 'Series': no numeric data to plot

@jreback
Contributor

jreback commented Apr 11, 2014

what exactly are you trying to do?

@alorenzo175
Contributor

I'm running into a similar problem. If I generate a time series with a UTC timezone in Python 2.7.6 and pickle it like this:

import pandas as pd
import numpy as np

drange = pd.date_range('2014-07-01 00:00:00', '2014-07-01 12:00:00', freq='H')
ser = pd.Series(np.random.rand(len(drange)), index=drange).tz_localize('UTC')
ser.to_pickle('/tmp/test2to3_ser')

and then try to read it in Python 3.4.1:

import pandas as pd
pd.read_pickle('/tmp/test2to3_ser')

I get

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-0b7937eff7fa> in <module>()
----> 1 pd.read_pickle('/tmp/test2to3_ser')

/home/tony/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/io/pickle.py in read_pickle(path)
     62     except:
     63         if PY3:
---> 64             return try_read(path, encoding='latin1')
     65         raise

/home/tony/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/io/pickle.py in try_read(path, encoding)
     56             except:
     57                 with open(path, 'rb') as fh:
---> 58                     return pc.load(fh, encoding=encoding, compat=True)
     59 
     60     try:

/home/tony/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/compat/pickle_compat.py in load(fh, encoding, compat, is_verbose)
     87         up.is_verbose = is_verbose
     88 
---> 89         return up.load()
     90     except:
     91         raise

/home/supas/forecasting_python/python3.4.1/lib/python3.4/pickle.py in load(self)
   1034                     raise EOFError
   1035                 assert isinstance(key, bytes_types)
-> 1036                 dispatch[key[0]](self)
   1037         except _Stop as stopinst:
   1038             return stopinst.value

/home/supas/forecasting_python/python3.4.1/lib/python3.4/pickle.py in load_newobj(self)
   1306         args = self.stack.pop()
   1307         cls = self.stack.pop()
-> 1308         obj = cls.__new__(cls, *args)
   1309         self.append(obj)
   1310     dispatch[NEWOBJ[0]] = load_newobj

TypeError: Required argument 'shape' (pos 1) not found
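The frames above show read_pickle retrying with encoding='latin1' on Python 3. As a minimal, standard-library-only sketch of why such a retry exists (it does not reproduce the 'shape' TypeError itself): a Python 2 byte string containing non-ASCII bytes cannot be decoded with the default ASCII encoding when unpickled on Python 3. The bytes literal and the load_with_fallback helper below are hypothetical, written by hand for illustration:

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 byte string 'caf\xe9':
# PROTO 2, SHORT_BINSTRING (length 4), BINPUT 0, STOP.
PY2_PICKLE = b"\x80\x02U\x04caf\xe9q\x00."


def load_with_fallback(data):
    """Try the default decoding first, then retry with latin1.

    Hypothetical helper mirroring the retry visible in the traceback;
    it is not the pandas implementation.
    """
    try:
        return pickle.loads(data)
    except UnicodeDecodeError:
        # latin1 maps every byte to a code point, so decoding cannot fail.
        return pickle.loads(data, encoding="latin1")


print(load_with_fallback(PY2_PICKLE))
```

The retry only addresses string decoding; an object whose reconstruction arguments changed between versions will still fail, which is what the TypeError here reports.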

My system info for python2.7.6 is

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-431.17.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.1
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.0.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.2
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: 1.8.6
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.3.5
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None

and for 3.4.1

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-431.17.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.1
nose: 1.3.3
Cython: 0.20.2
numpy: 1.8.1
scipy: 0.14.0
statsmodels: None
IPython: 2.1.0
sphinx: None
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.2
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: 2.0.4
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.3.5
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.6
pymysql: None
psycopg2: None

@lssilva

lssilva commented Feb 8, 2015

Same problem here

@jreback
Contributor

jreback commented Feb 8, 2015

try in 0.15.2 and master

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@akaihola
Contributor

Edit: We decided to post this as a separate issue (#12163) since it's not identical to this issue (only a similar traceback).


We have this problem when writing pickles with Pandas 0.14.1 and reading them with Pandas 0.17.1. The problem occurs if the pickles contain isodate.UTC objects.

The problem can be reduced to this test case:

my_tz.py:

from datetime import tzinfo

class MyTz(tzinfo):
    def __init__(self):
        pass

write_pickle_test.py (run with Pandas 0.14.1):

import pandas as pd
import cPickle
from my_tz import MyTz

data = pd.Series(), MyTz()
with open('test.pickle', 'wb') as f:
    cPickle.dump(data, f)

read_pickle_test.py (run with Pandas 0.17.1):

import pandas as pd
pd.read_pickle('test.pickle')

Reading the pickle with pickle.load() would fail when trying to load the Series object:

TypeError: _reconstruct: First argument must be a sub-type of ndarray

which is why pd.read_pickle() attempts to use pandas.compat.pickle_compat.load(). But then we get this instead for the MyTz object:

$ python read_pickle.py
Traceback (most recent call last):
  File "read_pickle_test.py", line 2, in <module>
    pd.read_pickle('test.pickle')
  File "pandas/io/pickle.py", line 60, in read_pickle
    return try_read(path)
  File "pandas/io/pickle.py", line 57, in try_read
    return pc.load(fh, encoding=encoding, compat=True)
  File "pandas/compat/pickle_compat.py", line 116, in load
    return up.load()
  File "/usr/lib64/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "pandas/compat/pickle_compat.py", line 16, in load_reduce
    if type(args[0]) is type:
IndexError: tuple index out of range

I see the following problems in pandas.compat.pickle_compat.load_reduce():

(1) It attempts to access args[0] even though args can be empty.

if type(args[0]) is type:
    n = args[0].__name__

(2) The above code is dead code anyway, since n isn't referenced anywhere else in the function. Removing those two lines fixes the unpickling problem completely.

Note that the MyTz class in the test above does implement a proper __init__() method as required in Python documentation for tzinfo: "Special requirement for pickling: A tzinfo subclass must have an __init__ method that can be called with no arguments, else it can be pickled but possibly not unpickled again. This is a technical requirement that may be relaxed in the future." While isodate.UTC violates this requirement, that doesn't seem to be the cause of this problem.

(3) Also, at the very end of load_reduce():

stack[-1] = value

is code that is never reached, since all previous code branches either return or raise. Also, this line is invalid, since value isn't initialized anywhere in the function.
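To illustrate point (1) with the standard library only: some objects reduce to a callable with an empty argument tuple, so any code that indexes args[0] during REDUCE has to check for emptiness first. In the sketch below, first_arg_type_name is a hypothetical helper (not pandas code) and collections.OrderedDict is used merely as a convenient object whose reduce arguments are empty:

```python
import collections
import pickle
import pickletools


def first_arg_type_name(args):
    """Return the name of args[0] if it is a plain class, else None.

    Hypothetical helper: the truthiness check guards the lookup so that
    an empty tuple does not raise IndexError.
    """
    if args and type(args[0]) is type:
        return args[0].__name__
    return None


# An empty OrderedDict reduces to (OrderedDict, (), ...), so its pickle
# contains a REDUCE opcode applied to an empty argument tuple.
payload = pickle.dumps(collections.OrderedDict(), protocol=2)
opcode_names = [op.name for op, arg, pos in pickletools.genops(payload)]

print(opcode_names)
print(first_arg_type_name(()))       # empty args: returns None, no IndexError
print(first_arg_type_name((dict,)))  # non-empty args: returns 'dict'
```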

@gfyoung
Member

gfyoung commented Jul 26, 2016

@jreback : can't reproduce this anymore:

...  # run the code above
>>> import pandas.util.testing as tm
>>> tm.assert_series_equal(s, s2)
>>>

@sinhrks
Member

sinhrks commented Jul 26, 2016

yeah it should be supported by DatetimeTZDtype and tested. Closing.

@sinhrks sinhrks closed this as completed Jul 26, 2016
@ghost

ghost commented Jul 1, 2017

So how to fix this?
When I do
pd.read_pickle('buyapple.pickle')
I get
IndexError: tuple index out of range

@jreback
Contributor

jreback commented Jul 1, 2017

this was fixed quite a long time ago

you would have to show a reproducible example

@ghost

ghost commented Jul 4, 2017

I am using this zipline example, which creates a pickle file as follows:
zipline run -f ../../zipline/examples/buyapple.py --start 2000-1-1 --end 2014-1-1 -o buyapple_out.pickle
Then I try to read the pickle file:

import pandas as pd
perf = pd.read_pickle('buyapple_out.pickle')

which gives me an error:
IndexError: tuple index out of range

@jreback
Contributor

jreback commented Jul 4, 2017

@saitam1 you would have to show a pure pandas example, otherwise you should report this to zipline.
