Reading csvs through a TextIOWrapper raises OSError: failed to write whole buffer #17428

Open
2 tasks done
tdsmith opened this issue Jul 4, 2024 · 3 comments
Labels
A-io-csv Area: reading/writing CSV files bug Something isn't working P-low Priority: low python Related to Python Polars

Comments

@tdsmith

tdsmith commented Jul 4, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import io
import polars as pl

fixture = """\
first_name,last_name
josé,österlung
additional,content
"""

k = 500  # does not trigger with k=50
long_fixture = fixture * k

# imagine `fixture_bytes` is a large file opened in binary mode:
fixture_bytes = io.BytesIO(long_fixture.encode("latin1"))

# so we need to wrap it in TextIOWrapper to present TextIO to polars:
wrapper = io.TextIOWrapper(fixture_bytes, "latin1", "replace")

pl.read_csv(wrapper)

yields:

OSError: failed to write whole buffer

Log output

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tim/projects/polars-textwrapper-repro/.direnv/python-3.12/lib/python3.12/site-packages/polars/_utils/deprecation.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tim/projects/polars-textwrapper-repro/.direnv/python-3.12/lib/python3.12/site-packages/polars/_utils/deprecation.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tim/projects/polars-textwrapper-repro/.direnv/python-3.12/lib/python3.12/site-packages/polars/_utils/deprecation.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tim/projects/polars-textwrapper-repro/.direnv/python-3.12/lib/python3.12/site-packages/polars/io/csv/functions.py", line 418, in read_csv
    df = _read_csv_impl(
         ^^^^^^^^^^^^^^^
  File "/Users/tim/projects/polars-textwrapper-repro/.direnv/python-3.12/lib/python3.12/site-packages/polars/io/csv/functions.py", line 564, in _read_csv_impl
    pydf = PyDataFrame.read_csv(
           ^^^^^^^^^^^^^^^^^^^^^
OSError: failed to write whole buffer

Issue description

Collecting the data into a StringIO before passing it to polars with e.g.

restringified = io.StringIO(wrapper.read())
pl.read_csv(restringified)

works, though it is no longer a streaming operation, since the whole decoded input is held in memory at once.
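
If holding the whole input in memory is the concern, one possible alternative is to transcode the text stream to a temporary UTF-8 file in fixed-size chunks and hand Polars the file path instead. This is only a sketch of a workaround, not a fix for the underlying bug: the helper name `spool_text_to_utf8_file` and the `chunk_chars` parameter are hypothetical, it trades memory for disk space, and it has not been checked against any particular Polars version.

```python
import os
import tempfile


def spool_text_to_utf8_file(text_stream, chunk_chars=1 << 16):
    """Copy a text stream into a temporary UTF-8 file, chunk by chunk.

    Hypothetical helper: only `chunk_chars` characters are held in memory
    at a time. Returns the path of the temporary file; the caller is
    responsible for deleting it afterwards.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    # newline="" writes line endings exactly as they were read.
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as out:
        while True:
            chunk = text_stream.read(chunk_chars)
            if not chunk:  # empty string signals end of stream
                break
            out.write(chunk)
    return path
```

With the `wrapper` from the reproducible example, this would be used as `path = spool_text_to_utf8_file(wrapper)`, followed by `pl.read_csv(path)` and `os.remove(path)` once the frame has been loaded.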

Expected behavior

Should not crash.

Installed versions

--------Version info---------
Polars:               1.0.0
Index type:           UInt32
Platform:             macOS-14.5-arm64-arm-64bit
Python:               3.12.4 (main, Jul  3 2024, 11:45:52) [Clang 15.0.0 (clang-1500.3.9.4)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                <not installed>
openpyxl:             <not installed>
pandas:               <not installed>
pyarrow:              <not installed>
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@tdsmith tdsmith added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Jul 4, 2024
@raayu83

raayu83 commented Jul 12, 2024

I ran into the same problem while implementing Polars import/export features for pyexasol (the Exasol database client library).
According to the docs, read_csv should accept both IO[str] and IO[bytes], and a TextIOWrapper is an IO[str], so this is probably a bug?

@coastalwhite coastalwhite added P-low Priority: low A-io-csv Area: reading/writing CSV files and removed needs triage Awaiting prioritization by a maintainer labels Jul 12, 2024
@github-project-automation github-project-automation bot moved this to Ready in Backlog Jul 12, 2024
@harrybiddle

I hit this bug too. It's a big problem for me, because it means I can't pass streams (in my case, a stream of data from AWS S3) into Polars. I'm specifically avoiding collecting the data into a StringIO or BytesIO, to minimise peak memory.

@bharathktw

Team,
is there a plan to fix this issue? BytesIO is needed in some cases.


5 participants