to_pickle compression does not work with in-memory buffers #26237

Closed
akhmerov opened this issue Apr 29, 2019 · 12 comments · Fixed by #35736
Labels
Bug IO Pickle read_pickle, to_pickle

@akhmerov

Code Sample, a copy-pastable example if possible

from io import BytesIO
import pandas
pandas.DataFrame([[]]).to_pickle(BytesIO(), compression=None)  # works
pandas.DataFrame([[]]).to_pickle(BytesIO())
# ValueError: Unrecognized compression type: infer (regression in 0.24 from 0.23)
pandas.DataFrame([[]]).to_pickle(BytesIO(), compression='zip')
# AttributeError: 'NoneType' object has no attribute 'find' (in 0.24)
# BadZipFile: File is not a zip file (in 0.22 and before)

Problem description

#22555 is closely related, but I believe this is a different issue because the errors occur at a different place in the code.

I believe the above is an issue because

  • Although the argument is named "path" and the docstring reads path : string File path, the code contains multiple path_or_buf names. I'd be happy to make a PR amending the docstring if anybody confirms that the docstring is imprecise.
  • The code above is actually useful (I want to let the user export a dataframe from a webapp)
  • compression='infer' failing is a regression
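
For the webapp export case, one possible workaround is to do the pickling and compression by hand with the standard library instead of relying on to_pickle. This is only a sketch under stated assumptions, not pandas code: dumps_gzip_pickle and loads_gzip_pickle are hypothetical helper names, and a plain dict stands in for the DataFrame so the snippet needs nothing beyond the standard library. A DataFrame is picklable, so it can be passed as obj in the same way.

```python
import gzip
import pickle
from io import BytesIO


def dumps_gzip_pickle(obj) -> bytes:
    """Pickle obj and gzip-compress it entirely in memory (hypothetical helper)."""
    buf = BytesIO()
    # GzipFile wraps the in-memory buffer. Closing it writes the gzip trailer
    # but leaves buf open, so getvalue() still works afterwards.
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        pickle.dump(obj, gz, protocol=pickle.HIGHEST_PROTOCOL)
    return buf.getvalue()


def loads_gzip_pickle(data: bytes):
    """Inverse of dumps_gzip_pickle; only use on data from a trusted source."""
    return pickle.loads(gzip.decompress(data))


# A dict stands in for the DataFrame here (assumption for illustration).
payload = dumps_gzip_pickle({"a": [1, 2, 3]})
restored = loads_gzip_pickle(payload)
```

The resulting bytes can be sent as a download response body. As with any pickle, the payload should only be loaded when it comes from a trusted source.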

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Linux
OS-release: 5.0.0-13-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.4.1
pip: 19.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.3
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

zesje-bot pushed a commit to zesje/zesje that referenced this issue Apr 29, 2019
@WillAyd added the Bug, IO Data, and Regression labels Apr 29, 2019
@WillAyd
Member

WillAyd commented Apr 29, 2019

Thanks for the report. I assume this is a byproduct of #22011 (cc @dhimmel). Investigation and PRs would certainly be welcome.

@WillAyd added this to the Contributions Welcome milestone Apr 29, 2019
@akhmerov
Author

Is it correct that to_* methods are intended to work with anything that supports a buffer protocol?

@WillAyd
Member

WillAyd commented Apr 29, 2019

Ah, I just realized that to_pickle is only documented as accepting a str for the path argument, so the fact that it previously worked with a buffer was an implementation detail.

That said, most of the IO methods support buffers, so I think it should be possible to extend that here and document it accordingly.

@WillAyd removed the Regression label Apr 29, 2019
@jreback
Contributor

jreback commented Apr 29, 2019

this is a duplicate: #5924

@jreback closed this as completed Apr 29, 2019
@jreback added the Duplicate Report label Apr 29, 2019
@akhmerov
Author

akhmerov commented Apr 29, 2019

@jreback I'm not sure I follow: #5924 is about a different method (read_pickle), and also has nothing to do with compression, whereas without compression to_pickle works.

EDIT: also this issue is not about strings but buffers, #5924 doesn't seem to mention buffers at all.

@dqii

dqii commented Jun 25, 2019

I'm still having this error. When I add ", compression=None)" I get the following error instead:


TypeError Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in try_read(path, encoding)
165 warnings.simplefilter("ignore", Warning)
--> 166 return read_wrapper(lambda f: pkl.load(f))
167 except Exception: # noqa: E722

~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in read_wrapper(func)
147 try:
--> 148 return func(f)
149 finally:

~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in <lambda>(f)
165 warnings.simplefilter("ignore", Warning)
--> 166 return read_wrapper(lambda f: pkl.load(f))
167 except Exception: # noqa: E722

TypeError: file must have 'read' and 'readline' attributes

During handling of the above exception, another exception occurred:

AttributeError Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in try_read(path, encoding)
172 return read_wrapper(
--> 173 lambda f: pc.load(f, encoding=encoding, compat=False))
174 # compat pickle

~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in read_wrapper(func)
147 try:
--> 148 return func(f)
149 finally:

~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in <lambda>(f)
172 return read_wrapper(
--> 173 lambda f: pc.load(f, encoding=encoding, compat=False))
174 # compat pickle

~/miniconda3/lib/python3.7/site-packages/pandas/compat/pickle_compat.py in load(fh, encoding, compat, is_verbose)
219 try:
--> 220 fh.seek(0)
221 if encoding is not None:

~/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5066 return self[name]
-> 5067 return object.__getattribute__(self, name)
5068

AttributeError: 'DataFrame' object has no attribute 'seek'

@dqii

dqii commented Jun 25, 2019

This is the error I get without adding compression=None

"---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in try_read(path, encoding)
165 warnings.simplefilter("ignore", Warning)
--> 166 return read_wrapper(lambda f: pkl.load(f))
167 except Exception: # noqa: E722

~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in read_wrapper(func)
145 compression=compression,
--> 146 is_text=False)
147 try:

~/miniconda3/lib/python3.7/site-packages/pandas/io/common.py in _get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text)
412 msg = 'Unrecognized compression type: {}'.format(compression)
--> 413 raise ValueError(msg)
414

ValueError: Unrecognized compression type: infer"

@akhmerov
Author

@dqii are you using the minimal code snippet I shared above? What is your pandas version and Python version?

@dqii

dqii commented Jun 25, 2019

Sorry, on reflection I realized my error might be different. I was saving a large pandas dataframe. My pandas version is 0.24.2 and my Python version is 3.7.3. I opened a separate issue for my problem: #27029. Sorry about that!

@jorisvandenbossche
Member

@akhmerov I think you are correct that this is a different issue from #5924.

@jorisvandenbossche removed the Duplicate Report label Jun 25, 2019
@guidopetri

I agree with WillAyd, to_pickle() should accept file buffers as well. It seems like it did in pandas 0.24.2 (despite the documentation) but with 0.25.0 it does not anymore.

@robb-brown

The original bug, to_pickle() to a buffer not working with compression='infer', still appears to be broken in the current dev branch, and the fix seems to be very simple. If there isn't a reason it hasn't been fixed, I can provide a PR.
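
As a rough illustration of what resolving 'infer' for buffers could look like, here is a minimal sketch. It is not the pandas implementation: resolve_compression and the extension table are hypothetical, and they only show the rule "infer from the file extension when given a path, fall back to no compression when given a buffer".

```python
import os
from io import BytesIO

# Hypothetical extension table, for illustration only.
_EXTENSION_TO_COMPRESSION = {
    ".gz": "gzip",
    ".bz2": "bz2",
    ".zip": "zip",
    ".xz": "xz",
}


def resolve_compression(path_or_buf, compression="infer"):
    """Turn 'infer' into a concrete compression name or None (hypothetical helper)."""
    # An explicit choice (including None) is passed through unchanged.
    if compression != "infer":
        return compression
    # A buffer has no file name to inspect, so nothing can be inferred.
    if not isinstance(path_or_buf, (str, os.PathLike)):
        return None
    name = str(os.fspath(path_or_buf)).lower()
    for extension, method in _EXTENSION_TO_COMPRESSION.items():
        if name.endswith(extension):
            return method
    return None


buffer_result = resolve_compression(BytesIO())          # buffer: nothing to infer
path_result = resolve_compression("frame.pkl.gz")       # path: extension decides
explicit_result = resolve_compression(BytesIO(), "zip")  # explicit choice is kept
```

With a rule like this, a buffer passed with the default compression='infer' would be written uncompressed instead of raising ValueError.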

@mroeschke removed the IO Data label Apr 3, 2020
@jreback modified the milestones: Contributions Welcome, 1.2 Sep 5, 2020

9 participants