Dead code breaks unpickling of timezone objects #12163

Closed
akaihola opened this issue Jan 28, 2016 · 4 comments · Fixed by #31161
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions

Comments

@akaihola
Contributor

There is a problem when writing pickles with Pandas 0.14.1 and reading them with Pandas 0.17.1. The problem occurs if the pickles contain isodate.UTC or other timezone objects.

The problem can be reduced to this test case:

my_tz.py:

from datetime import tzinfo

class MyTz(tzinfo):
    def __init__(self):
        pass

write_pickle_test.py (run with Pandas 0.14.1):

import pandas as pd
import cPickle
from my_tz import MyTz

data = pd.Series(), MyTz()
with open('test.pickle', 'wb') as f:
    cPickle.dump(data, f)

read_pickle_test.py (run with Pandas 0.17.1):

import pandas as pd
pd.read_pickle('test.pickle')

Reading the pickle with pickle.load() would fail when trying to load the Series object:

TypeError: _reconstruct: First argument must be a sub-type of ndarray

which is why pd.read_pickle() attempts to use pandas.compat.pickle_compat.load(). But then we get this instead for the MyTz object:

$ python read_pickle_test.py
Traceback (most recent call last):
  File "read_pickle_test.py", line 2, in <module>
    pd.read_pickle('test.pickle')
  File "pandas/io/pickle.py", line 60, in read_pickle
    return try_read(path)
  File "pandas/io/pickle.py", line 57, in try_read
    return pc.load(fh, encoding=encoding, compat=True)
  File "pandas/compat/pickle_compat.py", line 116, in load
    return up.load()
  File "/usr/lib64/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "pandas/compat/pickle_compat.py", line 16, in load_reduce
    if type(args[0]) is type:
IndexError: tuple index out of range
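The two-stage strategy described above (try the stdlib unpickler first, and fall back to the compat loader on any failure) can be sketched roughly as follows. This is a simplified, hypothetical helper: `read_pickle_with_fallback` and its `compat_load` parameter are illustrative names, not pandas API, and the real pandas code (visible in the Python 3.5 traceback further down) also retries with `encoding='latin1'` on Python 3. It shows why the Python 2.7 traceback above reports the fallback's IndexError rather than the original TypeError.

```python
import pickle


def read_pickle_with_fallback(path, compat_load):
    """Try the stdlib unpickler first; on any failure, re-read with compat_load.

    compat_load is a stand-in (an assumption for this sketch) for a more
    permissive loader such as pandas.compat.pickle_compat.load; it receives
    an open binary file object.
    """
    try:
        with open(path, "rb") as fh:
            return pickle.load(fh)
    except Exception:
        # The first error is not re-raised. Python 2 drops it entirely;
        # Python 3 keeps it only as __context__ of any new exception, so
        # the error the caller sees comes from the fallback loader.
        with open(path, "rb") as fh:
            return compat_load(fh)
```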

I see the following problems in pandas.compat.pickle_compat.load_reduce():

(1) It attempts to access args[0] while args can be empty.

if type(args[0]) is type:
    n = args[0].__name__

(2) The above code is dead code anyway, since n isn't referenced anywhere else in the function. Removing those two lines fixes the unpickling problem completely.

Note that the MyTz class in the test above does implement a proper __init__() method as required by the Python documentation for tzinfo: "Special requirement for pickling: A tzinfo subclass must have an __init__ method that can be called with no arguments, else it can be pickled but possibly not unpickled again. This is a technical requirement that may be relaxed in the future." While isodate.UTC violates this requirement, that doesn't seem to be the cause of this problem.

(3) Also, at the very end of load_reduce():

stack[-1] = value

is code that is never reached, since all previous code branches either return or raise. Also, this line is invalid, since value isn't initialized anywhere in the function.
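With points (1) to (3) applied, what remains of the handler is essentially the standard REDUCE logic. The sketch below first shows why args is empty for MyTz (with no `__getinitargs__`, `tzinfo.__reduce__` returns the class plus an empty argument tuple, which is also why the no-argument `__init__` is required), then installs a REDUCE handler without the `args[0]` lookup. It assumes Python 3's pure-Python `pickle._Unpickler`, a private class whose `dispatch` table maps opcode byte values to handler functions; `SafeUnpickler` and `safe_loads` are hypothetical names, and this is not the change that eventually closed the issue.

```python
import io
import pickle
from datetime import tzinfo


class MyTz(tzinfo):
    def __init__(self):
        pass


# Why args can be empty: with no __getinitargs__, tzinfo.__reduce__ returns
# (type(self), (), state), so REDUCE is asked to call MyTz() with no arguments.
reduced = MyTz().__reduce__()
print(reduced[0] is MyTz, reduced[1])  # True ()


def load_reduce(self):
    """REDUCE opcode: pop the argument tuple and call the callable below it."""
    stack = self.stack
    args = stack.pop()       # may be (), so args[0] is never inspected
    func = stack[-1]
    stack[-1] = func(*args)  # replace the callable with its result


class SafeUnpickler(pickle._Unpickler):
    # Copy the table so the override does not leak into pickle._Unpickler.
    dispatch = pickle._Unpickler.dispatch.copy()
    dispatch[pickle.REDUCE[0]] = load_reduce


def safe_loads(data):
    """Unpickle bytes using the REDUCE handler above."""
    return SafeUnpickler(io.BytesIO(data)).load()
```

Because the handler only calls `func(*args)`, an empty tuple simply becomes a no-argument call, and there is no trailing `stack[-1] = value` left to reference an uninitialized name.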


I originally sent this as a comment to issue #6871, but since it's not exactly the same thing (only a similar traceback), I think it deserves a separate issue.

@jreback
Contributor

jreback commented Jan 28, 2016

ok, if you want to submit a patch that passes all current tests (you would also have to add a test for this to the pickle generation script), I would take it.

Very odd that you are sub-classing the tz object, what is the purpose?

@jreback jreback added Compat pandas objects compatability with Numpy or Python functions Difficulty Intermediate labels Jan 28, 2016
@jreback jreback added this to the Next Major Release milestone Jan 28, 2016
@Liam3851
Contributor

I get a similar issue when pickling a DataFrame with a tz-aware index in pandas 0.18 under python 2.7, and unpickling under pandas 0.18 in Python 3.5 (no tz subclassing involved):

Under Python 2.7:

y = pd.DataFrame({"col":[1]}, index=pd.date_range("20160331", "20160331", tz="UTC", freq="T"))
y.to_pickle("C:/temp/withtz.pkl")
y.tz_localize(None).to_pickle("C:/temp/notz.pkl")
pd.read_pickle("C:/temp/withtz.pkl")
Out:
                           col
2016-03-31 00:00:00+00:00    1
pd.read_pickle("C:/temp/notz.pkl")
Out:
            col
2016-03-31    1

Under Python 3.5:

pd.read_pickle("C:/temp/withtz.pkl")
IndexError                                Traceback (most recent call last)
<ipython-input-69-396a6fe8b382> in <module>()
----> 1 pd.read_pickle("C:/temp/test.pkl")

C:\Anaconda3\lib\site-packages\pandas\io\pickle.py in read_pickle(path)
     61     except:
     62         if PY3:
---> 63             return try_read(path, encoding='latin1')
     64         raise

C:\Anaconda3\lib\site-packages\pandas\io\pickle.py in try_read(path, encoding)
     55             except:
     56                 with open(path, 'rb') as fh:
---> 57                     return pc.load(fh, encoding=encoding, compat=True)
     58
     59     try:

C:\Anaconda3\lib\site-packages\pandas\compat\pickle_compat.py in load(fh, encoding, compat, is_verbose)
    116         up.is_verbose = is_verbose
    117
--> 118         return up.load()
    119     except:
    120         raise

C:\Anaconda3\lib\pickle.py in load(self)
   1037                     raise EOFError
   1038                 assert isinstance(key, bytes_types)
-> 1039                 dispatch[key[0]](self)
   1040         except _Stop as stopinst:
   1041             return stopinst.value

C:\Anaconda3\lib\site-packages\pandas\compat\pickle_compat.py in load_reduce(self)
     16     func = stack[-1]
     17
---> 18     if type(args[0]) is type:
     19         n = args[0].__name__
     20

IndexError: tuple index out of range

pd.read_pickle("C:/temp/notz.pkl")
Out[70]:
            col
2016-03-31    1

packages installed:

(Python 2.7)
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.6.7
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: 0.7.2
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.1
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: 0.9.2
apiclient: 1.4.2
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0

(Python 3.5)
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.6.7
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: 4.1.2
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0

@akaihola
Contributor Author

> Very odd that you are sub-classing the tz object, what is the purpose?

@jreback, IIRC the subclassing of tz was an attempt to reduce the failing test case to a minimum. It probably went something like this:

  • had pickle problems when using isodate.UTC
  • tried to create a test case without isodate, with a minimal custom timezone object
  • got a failure which turned out to be for a different reason than with isodate
  • noticed two instances of dead code as well as use of an uninitialized variable in related Pandas code
  • made the bug report to point out that code

@mroeschke
Member

Looks to work on master. Could use a test.

In [18]: from datetime import tzinfo
    ...:
    ...: class MyTz(tzinfo):
    ...:     def __init__(self):
    ...:         pass
    ...:

In [19]: import pickle

In [20]: data = pd.Series(), MyTz()
    ...: with open('test.pickle', 'wb') as f:
    ...:     pickle.dump(data, f)
    ...:

In [21]: pd.read_pickle('test.pickle')
    ...:
Out[21]: (Series([], dtype: float64), <__main__.MyTz at 0x11eaf1310>)
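Since what remains is a test, a regression test along the lines of the session above might look like this. It is a sketch only: the test name and the use of `tempfile` instead of a pytest fixture are assumptions, and the test that was eventually merged for this issue may differ. `MyTz` has to be defined at module level so the unpickler can look it up by name.

```python
import os
import tempfile
from datetime import tzinfo

import pandas as pd


class MyTz(tzinfo):
    """tzinfo subclass with the no-argument __init__ required for pickling."""

    def __init__(self):
        pass


def test_read_pickle_tzinfo_subclass():
    # Same shape as the original report: a tuple of (Series, tzinfo instance).
    data = (pd.Series(dtype="float64"), MyTz())
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.pickle")
        pd.to_pickle(data, path)
        series, tz = pd.read_pickle(path)
    assert series.empty
    assert isinstance(tz, MyTz)
```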

@mroeschke mroeschke added good first issue Needs Info Clarification about behavior needed to assess issue Needs Tests Unit test(s) needed to prevent regressions and removed Compat pandas objects compatability with Numpy or Python functions Needs Info Clarification about behavior needed to assess issue labels Oct 29, 2019
@jreback jreback modified the milestones: Contributions Welcome, 1.1 Jan 20, 2020