BUG: Problem using to_csv with BytesIO #37292

amotl · 2020-10-20T22:35:26Z

Dear people of Pandas,

first things first: Thanks for all of your excellent work conceiving and maintaining Pandas. You know who you are.

With kind regards,
Andreas.

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd
from io import BytesIO

df = pd.DataFrame()
buffer = BytesIO()

df.to_csv(buffer)

Problem description

The snippet above croaks with

TypeError: a bytes-like object is required, not 'str'

We expected this to work. However, we will be happy to learn otherwise. We also had a look at #22555 and #35129 which seem to be related but not exactly on the spot.

The background on this is that we are currently in the process of upgrading Kotori to Python 3 (yeah, we are late to the game). However, coming from this, we can confirm it worked when using Pandas 0.18.1 on Python 2 the other day.

The relevant code is

Thanks already for looking into this!

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : db08276
python : 3.6.9.final.0
python-bits : 64
OS : Darwin
OS-release : 17.7.0
Version : Darwin Kernel Version 17.7.0: Thu Jun 18 21:21:34 PDT 2020; root:xnu-4570.71.82.5~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.1.3
numpy : 1.19.2
pytz : 2018.9
dateutil : 2.8.1
pip : 20.2.3
setuptools : 50.3.0
Cython : None
pytest : 4.6.9
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.3.6
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.8
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None


twoertwein · 2020-10-20T23:23:21Z

#35129 adds support for binary file handles but it is part of the to-be-released 1.2 version.
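A quick way to check whether an installed pandas already accepts binary handles is to attempt the write and catch the `TypeError` from the report. This is only a sketch; the tiny example frame and the `supported` flag are illustrative names, not part of the original report.

```python
from io import BytesIO

import pandas as pd

# Illustrative data; any DataFrame would do.
df = pd.DataFrame({"a": [1, 2]})
buffer = BytesIO()

try:
    # Per the comment above, pandas 1.2+ is expected to accept a binary handle.
    df.to_csv(buffer, index=False)
    supported = True
except TypeError:
    # Older releases (for example 1.1.3) write str and fail on BytesIO.
    supported = False

print(pd.__version__, "binary handle supported:", supported)
```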

twoertwein · 2020-10-20T23:34:02Z

In the meantime, you can wrap your io.BytesIO object in an io.TextIOWrapper (pandas 1.2 will do the same internally).
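A minimal, self-contained sketch of that workaround might look as follows. The example frame is made up. The `newline=""` argument and the `detach()` call are additions beyond the suggestion above: the first avoids newline translation by the wrapper, and the second keeps the `BytesIO` open, since a `TextIOWrapper` closes its underlying buffer when it is closed or garbage-collected.

```python
from io import BytesIO, TextIOWrapper

import pandas as pd

# Illustrative data; any DataFrame would do.
df = pd.DataFrame({"a": [1, 2]})

buffer = BytesIO()
# The wrapper encodes the str chunks written by to_csv into bytes.
wrapper = TextIOWrapper(buffer, encoding="utf-8", newline="")

df.to_csv(wrapper, index=False)

wrapper.flush()   # push any text still held by the wrapper into the buffer
wrapper.detach()  # release the buffer so it stays open after the wrapper is gone

payload = buffer.getvalue()
print(type(payload).__name__, len(payload))
```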

amotl · 2020-10-21T00:09:29Z

Dear Torsten,

thanks for your quick answer.

The problem with using an io.TextIOWrapper in general here is that it would probably work for all text-like output formats:

https://github.com/daq-tools/kotori/blob/0.24.5/kotori/io/protocol/http.py#L599-L619

But we will run into problems when using it to write binary data, such as Excel and so on.

For now, we will be using StringIO objects explicitly for rendering into text-like output formats using to_csv, to_json and to_html. For all others, we will keep the BytesIO.

The idea is to be able to get hold of the in-memory payload content in a generic fashion by using .getvalue() after dispatching to the specific rendering method.

https://github.com/daq-tools/kotori/blob/0.24.5/kotori/io/protocol/http.py#L682-L683

With kind regards,
Andreas.

P.S.: We will check back after upgrading to Pandas 1.2 in December and see how that goes. In the meantime, we will use the workaround outlined above, explicitly using either StringIO or BytesIO, depending on the character of the output format.

twoertwein · 2020-10-21T01:10:55Z

you could probably do something like:

        if suffix in ['csv', 'txt']:
            # http://pandas.pydata.org/pandas-docs/stable/io.html#io-store-in-csv
            charset = 'utf-8'
            wrapper = TextIOWrapper(buffer, encoding=charset)
            df.to_csv(wrapper, header=True, index=False, encoding=charset, date_format='%Y-%m-%dT%H:%M:%S.%fZ')
            wrapper.flush()  # make sure that TextIOWrapper writes the content to buffer

amotl · 2020-10-21T12:35:00Z

Dear Torsten,

thanks for taking the time. I am now doing it like that using StringIO:

        if suffix in ['csv', 'txt']:
            # http://pandas.pydata.org/pandas-docs/stable/io.html#io-store-in-csv
            buffer = StringIO()
            df.to_csv(buffer, header=True, index=False, encoding='utf-8', date_format='%Y-%m-%dT%H:%M:%S.%fZ')
            charset = 'utf-8'

Is there any particular advantage using TextIOWrapper instead?

With kind regards,
Andreas.

twoertwein · 2020-10-21T14:50:24Z

The only advantage in your setup of using BytesIO+TextIOWrapper is that when you call getvalue you always get bytes, and not sometimes bytes and sometimes strings.
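The difference can be seen by rendering the same frame both ways. This is a sketch with made-up data and variable names; the `newline=""` and `detach()` details are assumptions added here so that the two payloads are directly comparable and the buffer stays open.

```python
from io import BytesIO, StringIO, TextIOWrapper

import pandas as pd

# Illustrative data; any DataFrame would do.
df = pd.DataFrame({"a": [1, 2]})

# Variant 1: StringIO, so getvalue() returns str.
text_buffer = StringIO()
df.to_csv(text_buffer, index=False)
text_payload = text_buffer.getvalue()

# Variant 2: BytesIO wrapped in TextIOWrapper, so getvalue() returns bytes.
byte_buffer = BytesIO()
wrapper = TextIOWrapper(byte_buffer, encoding="utf-8", newline="")
df.to_csv(wrapper, index=False)
wrapper.flush()
wrapper.detach()  # keep byte_buffer open once the wrapper goes away
byte_payload = byte_buffer.getvalue()

print(type(text_payload).__name__, type(byte_payload).__name__)
```

With the second variant, code that later calls `.getvalue()` can treat every output format as bytes, whether the renderer wrote text or binary data.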

amotl · 2020-10-21T15:52:12Z

When you call getvalue you always get bytes and not sometimes bytes and sometimes strings.

So true.

I've fiddled a bit with that variant in the code and the test suite on the other side and I am now using BytesIO+TextIOWrapper as you suggested. Thank you so much!

amotl added the "Bug" and "Needs Triage" labels Oct 20, 2020

jreback added the "IO CSV" and "Usage Question" labels and removed the "Bug" and "Needs Triage" labels Oct 20, 2020

jreback added this to the 1.2 milestone Oct 20, 2020

jreback closed this as completed Oct 20, 2020
