BUG: read_csv raises an error when both prefix and names are set to None #42387

Closed
3 tasks done
lhoestq opened this issue Jul 5, 2021 · 6 comments · Fixed by #42690

Labels: Bug, IO CSV, Regression

@lhoestq
Contributor

lhoestq commented Jul 5, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Hi everyone, I have been running into this issue since pandas 1.3.0:

Code Sample, a copy-pastable example

import pandas as pd

pd.read_csv("path/to/any/csv", names=None, prefix=None)

raises

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-681798f605ab> in <module>()
----> 1 pd.read_csv("/content/sample_data/mnist_test.csv", names=None, prefix=None)

2 frames
/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/readers.py in _refine_defaults_read(dialect, delimiter, delim_whitespace, engine, sep, error_bad_lines, warn_bad_lines, on_bad_lines, names, prefix, defaults)
   1304 
   1305     if names is not lib.no_default and prefix is not lib.no_default:
-> 1306         raise ValueError("Specified named and prefix; you can only specify one.")
   1307 
   1308     kwds["names"] = None if names is lib.no_default else names

ValueError: Specified named and prefix; you can only specify one.

Problem description

With names=None and prefix=None those parameters shouldn't be considered as specified, and the code should run as if they were not passed as keyword arguments.

This is due to PR #41446, which changed the default values of those two parameters from None to no_default. As a result, an explicit None is now treated as a user-specified value.
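To make the mechanism concrete, here is a minimal sketch that does not use pandas at all. `_NO_DEFAULT`, `strict_check` and `lenient_check` are hypothetical names introduced for illustration only: the first function mimics the check shown in the traceback, and the second shows one possible way to treat an explicit None like an omitted argument. It is not necessarily how pandas resolved the issue.

```python
# Hypothetical stand-in for the lib.no_default sentinel seen in the traceback.
_NO_DEFAULT = object()

_MESSAGE = "Specified named and prefix; you can only specify one."


def strict_check(names=_NO_DEFAULT, prefix=_NO_DEFAULT):
    """Mimic the 1.3.0 check: anything other than the sentinel counts as specified."""
    if names is not _NO_DEFAULT and prefix is not _NO_DEFAULT:
        raise ValueError(_MESSAGE)
    return {
        "names": None if names is _NO_DEFAULT else names,
        "prefix": None if prefix is _NO_DEFAULT else prefix,
    }


def lenient_check(names=_NO_DEFAULT, prefix=_NO_DEFAULT):
    """Assumed alternative: map the sentinel to None first, then compare against None."""
    names = None if names is _NO_DEFAULT else names
    prefix = None if prefix is _NO_DEFAULT else prefix
    if names is not None and prefix is not None:
        raise ValueError(_MESSAGE)
    return {"names": names, "prefix": prefix}


if __name__ == "__main__":
    # Omitting both arguments passes the strict check.
    print(strict_check())
    # Passing None explicitly trips the strict check, as in the report.
    try:
        strict_check(names=None, prefix=None)
    except ValueError as err:
        print("strict_check raised:", err)
    # The lenient variant accepts the same call.
    print(lenient_check(names=None, prefix=None))
```

The lenient variant still rejects a call where both arguments carry real values, so the "only specify one" rule is preserved.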

Expected Output

The code should load the CSV using the default behavior, as if names and prefix were not passed as keyword arguments.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : f00ed8f
python : 3.7.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.104+
Version : #1 SMP Sat Jun 5 09:50:34 PDT 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.0
numpy : 1.19.5
pytz : 2018.9
dateutil : 2.8.1
pip : 19.3.1
setuptools : 57.0.0
Cython : 0.29.23
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 5.5.0
pandas_datareader: 0.9.0
bs4 : 4.6.3
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.3
odfpy : None
openpyxl : 2.5.9
pandas_gbq : 0.13.3
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.4.18
tables : 3.4.4
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 1.1.0
xlwt : 1.3.0
numba : 0.51.2

lhoestq added the Bug and Needs Triage labels Jul 5, 2021
lithomas1 added the IO CSV and Regression labels and removed the Needs Triage label Jul 5, 2021
lithomas1 added this to the 1.3.1 milestone Jul 5, 2021
@simonjayhawkins
Member

simonjayhawkins commented Jul 5, 2021

> With names=None and prefix=None those parameters shouldn't be considered as specified

In Python, passing None is not the same as not passing optional arguments.

The API docs for 1.2 (https://pandas.pydata.org/pandas-docs/version/1.2.0/reference/api/pandas.read_csv.html) clearly state the accepted types for the names and prefix parameters as names : array-like, optional and prefix : str, optional.

@jreback
Contributor

jreback commented Jul 14, 2021

We would take a PR to fix this.

jreback removed the Closing Candidate label Jul 14, 2021
@V1NAY8

V1NAY8 commented Jul 18, 2021

Eland has a similar issue, because we pass both as None.
At the moment, we work around it by not passing one of them if both are None.
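A workaround along these lines could look like the sketch below. `optional_reader_kwargs` is a hypothetical helper, not Eland's actual code, and it goes slightly further than the comment describes: it omits any of the two arguments whose value is None, so that the reader only ever sees values the caller really set.

```python
def optional_reader_kwargs(names=None, prefix=None):
    """Build keyword arguments, leaving out names/prefix when they are None.

    Hypothetical helper: only these two parameters are filtered, because
    None can be a meaningful value for other reader options.
    """
    candidates = {"names": names, "prefix": prefix}
    return {key: value for key, value in candidates.items() if value is not None}


if __name__ == "__main__":
    # Both unset: nothing is forwarded, so the reader falls back to its defaults.
    print(optional_reader_kwargs())
    # Only names set: prefix is simply not passed.
    print(optional_reader_kwargs(names=["a", "b"]))
    # Intended call site (requires pandas, shown as a comment only):
    # pd.read_csv(path, **optional_reader_kwargs(names=names, prefix=prefix))
```

Filtering only names and prefix is deliberate, since for other read_csv options an explicit None (for example header=None) changes behavior and must not be dropped.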

@lithomas1
Member

lithomas1 commented Jul 23, 2021

@simonjayhawkins Are you fine with patching this? I can send a PR if so, but it would probably miss the 1.3.1 cut.

@simonjayhawkins
Member

Sure. We discussed this on the dev call and patching this is fine. 1.3.1 will probably be released on Sunday.

@lhoestq
Contributor Author

lhoestq commented Jul 26, 2021

Thanks!

5 participants