to_pickle compression does not work with in-memory buffers #26237

akhmerov · 2019-04-29T12:06:45Z

Code Sample, a copy-pastable example if possible

from io import BytesIO
import pandas
pandas.DataFrame([[]]).to_pickle(BytesIO(), compression=None)  # works
pandas.DataFrame([[]]).to_pickle(BytesIO())
# ValueError: Unrecognized compression type: infer (regression in 0.24 from 0.23)
pandas.DataFrame([[]]).to_pickle(BytesIO(), compression='zip')
# AttributeError: 'NoneType' object has no attribute 'find' (in 0.24)
# BadZipFile: File is not a zip file (in 0.22 and before)

Problem description

#22555 is closely related, but I believe this is a different issue because the errors occur at a different place in the code.

I believe the above is an issue because

Although the argument is named "path" and the docstring reads path : string File path, the code uses the name path_or_buf in multiple places. I'd be happy to make a PR amending the docstring if anybody confirms that the docstring is imprecise.
The code above is actually useful (I want to let the user export a dataframe from a webapp).
compression='infer' failing is a regression.
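
For the webapp export use case, one way to sidestep to_pickle's compression handling entirely is to pickle and compress by hand with the standard library. This is only a sketch of a workaround, not pandas' own code path: pickle_to_buffer and unpickle_from_buffer are hypothetical helper names, and a plain dict stands in for the DataFrame so the snippet runs without pandas installed.

```python
import gzip
import pickle
from io import BytesIO


def pickle_to_buffer(obj, compress=False):
    """Pickle obj into a fresh BytesIO, optionally gzip-compressing the payload."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    if compress:
        payload = gzip.compress(payload)
    # A BytesIO built from bytes starts at position 0, ready to be read or sent.
    return BytesIO(payload)


def unpickle_from_buffer(buf, compress=False):
    """Inverse of pickle_to_buffer; reads from the buffer's current position."""
    payload = buf.read()
    if compress:
        payload = gzip.decompress(payload)
    return pickle.loads(payload)


# Stand-in for a DataFrame (assumption: any picklable object behaves the same way).
record = {"a": [1, 2], "b": [3, 4]}
buf = pickle_to_buffer(record, compress=True)
restored = unpickle_from_buffer(buf, compress=True)
```

Passing a DataFrame instead of the dict should work the same way, since DataFrames are picklable. The usual caveat applies: only unpickle data from a trusted source.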

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Linux
OS-release: 5.0.0-13-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.4.1
pip: 19.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.3
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None


WillAyd · 2019-04-29T14:42:37Z

Thanks for the report. I assume this is a byproduct of #22011 (cc @dhimmel). Investigation and PRs would certainly be welcome.

akhmerov · 2019-04-29T14:50:11Z

Is it correct that to_* methods are intended to work with anything that supports a buffer protocol?

WillAyd · 2019-04-29T15:01:43Z

Ah, I just realized that to_pickle is only documented as supporting a str argument for the path, so the fact that it worked before on a buffer was an implementation detail.

That said, most of the IO methods support buffers, so I think it should be possible to extend that here and document accordingly.

jreback · 2019-04-29T15:43:41Z

this is a duplicate: #5924

akhmerov · 2019-04-29T16:08:48Z

@jreback I'm not sure I follow: #5924 is about a different method (read_pickle), and also has nothing to do with compression, whereas without compression to_pickle works.

EDIT: also this issue is not about strings but buffers, #5924 doesn't seem to mention buffers at all.

dqii · 2019-06-25T01:44:05Z

I'm still having this error. When I add ", compression=None)" I get the following error instead:

TypeError Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in try_read(path, encoding)
165 warnings.simplefilter("ignore", Warning)
--> 166 return read_wrapper(lambda f: pkl.load(f))
167 except Exception: # noqa: E722

~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in read_wrapper(func)
147 try:
--> 148 return func(f)
149 finally:

~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in <lambda>(f)
165 warnings.simplefilter("ignore", Warning)
--> 166 return read_wrapper(lambda f: pkl.load(f))
167 except Exception: # noqa: E722

TypeError: file must have 'read' and 'readline' attributes

During handling of the above exception, another exception occurred:

AttributeError Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in try_read(path, encoding)
172 return read_wrapper(
--> 173 lambda f: pc.load(f, encoding=encoding, compat=False))
174 # compat pickle

~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in read_wrapper(func)
147 try:
--> 148 return func(f)
149 finally:

~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in <lambda>(f)
172 return read_wrapper(
--> 173 lambda f: pc.load(f, encoding=encoding, compat=False))
174 # compat pickle

~/miniconda3/lib/python3.7/site-packages/pandas/compat/pickle_compat.py in load(fh, encoding, compat, is_verbose)
219 try:
--> 220 fh.seek(0)
221 if encoding is not None:

~/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5066 return self[name]
-> 5067 return object.__getattribute__(self, name)
5068

AttributeError: 'DataFrame' object has no attribute 'seek'

dqii · 2019-06-25T01:47:09Z

This is the error I get without adding compression=None

"---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in try_read(path, encoding)
165 warnings.simplefilter("ignore", Warning)
--> 166 return read_wrapper(lambda f: pkl.load(f))
167 except Exception: # noqa: E722

~/miniconda3/lib/python3.7/site-packages/pandas/io/pickle.py in read_wrapper(func)
145 compression=compression,
--> 146 is_text=False)
147 try:

~/miniconda3/lib/python3.7/site-packages/pandas/io/common.py in _get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text)
412 msg = 'Unrecognized compression type: {}'.format(compression)
--> 413 raise ValueError(msg)
414

ValueError: Unrecognized compression type: infer"

akhmerov · 2019-06-25T07:03:06Z

@dqii are you using the minimal code snippet I shared above? What is your pandas version and Python version?

dqii · 2019-06-25T07:43:24Z

Sorry, on reflection I realized my error might be different. I was saving a large pandas dataframe. My pandas version is 0.24.2 and my Python version is 3.7.3. I made a separate thread for my issue in #27029. Sorry about that!

jorisvandenbossche · 2019-06-25T07:54:49Z

@akhmerov I think you are correct that this is a different issue from #5924

guidopetri · 2019-07-26T09:27:02Z

I agree with WillAyd, to_pickle() should accept file buffers as well. It seems like it did in pandas 0.24.2 (despite the documentation) but with 0.25.0 it does not anymore.

robb-brown · 2020-01-08T03:50:10Z

The original bug, to_pickle() to a buffer not working with compression='infer', appears to still be broken in the current dev branch, and the fix seems to be very simple. If there isn't a reason it hasn't been fixed, I can provide a PR.
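
To illustrate the kind of fix being discussed, here is a rough sketch of how compression='infer' could be resolved before a handle is opened: a buffer has no file name to infer from, so 'infer' falls back to no compression. This is an assumption about the shape of such a fix, not the actual pandas code; resolve_compression and the extension table are hypothetical.

```python
import os
from io import BytesIO

# Illustrative mapping only; not copied from pandas.
_EXTENSION_TO_COMPRESSION = {".gz": "gzip", ".bz2": "bz2", ".zip": "zip", ".xz": "xz"}


def resolve_compression(path_or_buf, compression="infer"):
    """Turn 'infer' into a concrete compression name, or None for no compression."""
    if compression != "infer":
        # An explicit choice (including None) is passed through untouched.
        return compression
    if not isinstance(path_or_buf, (str, os.PathLike)):
        # Buffers carry no extension, so there is nothing to infer from.
        return None
    ext = os.path.splitext(os.fspath(path_or_buf))[1].lower()
    return _EXTENSION_TO_COMPRESSION.get(ext)


buffer_choice = resolve_compression(BytesIO())      # buffer: nothing to infer
path_choice = resolve_compression("frame.pkl.gz")   # path: inferred from ".gz"
```

With this behaviour, the second call in the original code sample would take the same route as compression=None instead of raising ValueError.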

zesje-bot pushed a commit to zesje/zesje that referenced this issue Apr 29, 2019

specify compression in pickling

d1b8b38

closes #290 (see also pandas-dev/pandas#26237)

WillAyd added the Bug, IO Data, and Regression labels Apr 29, 2019

WillAyd added this to the Contributions Welcome milestone Apr 29, 2019

WillAyd removed the Regression label Apr 29, 2019

jreback closed this as completed Apr 29, 2019

jreback added the Duplicate Report label Apr 29, 2019

jorisvandenbossche reopened this Jun 25, 2019

jorisvandenbossche removed the Duplicate Report label Jun 25, 2019

jreback mentioned this issue Oct 19, 2019

ENH: Allow read_pickle to accept file-like objects #29054

Closed

jreback mentioned this issue Nov 22, 2019

to_pickle doesn't support file handles #29790

Closed

jbrockmendel added the IO Pickle label Dec 1, 2019

mroeschke removed the IO Data label Apr 3, 2020

twoertwein mentioned this issue Aug 15, 2020

BUG/ENH: to_pickle/read_pickle support compression for file ojects #35736

Merged


jreback modified the milestones: Contributions Welcome, 1.2 Sep 5, 2020

jreback closed this as completed in #35736 Sep 5, 2020
