Passing an empty list to read_csv causes segmentation fault #45957

Closed
santhisenan opened this issue Feb 12, 2022 · 2 comments · Fixed by #46325
Labels
Bug Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv Regression Functionality that used to work in a prior pandas version Segfault Non-Recoverable Error

@santhisenan

To reproduce the issue

Create a virtual environment with Python 3.8.9 (the same behaviour occurs in 3.9.1). Install pandas using pip install pandas. After importing pandas, run pd.read_csv([]).

>>> python
Python 3.8.9 (default, Aug 21 2021, 15:53:23) 
[Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.read_csv([])
fish: Job 1, 'python' terminated by signal SIGSEGV (Address boundary error)

Expected behaviour

Throw an error and exit gracefully. Currently, the Python process crashes.
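
One way to observe the crash without losing the interactive session is to run the reproduction in a child interpreter and inspect its exit status. The following is a minimal sketch, not part of the original report; the helper names run_isolated and describe are hypothetical, and it assumes a POSIX system where a process killed by a signal reports a negative return code.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str) -> int:
    """Run a Python snippet in a child interpreter and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
    )
    return result.returncode


def describe(returncode: int) -> str:
    """Translate an exit status into a short human-readable description."""
    if returncode < 0:
        # On POSIX, a negative status means the child was killed by a signal.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # The snippet from the report; the parent process survives either way.
    status = run_isolated("import pandas as pd; pd.read_csv([])")
    print(describe(status))
```

On an affected install the child would be expected to die from a signal, while a version that raises an exception would simply exit with a non-zero status.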

Versions

OS: MacOS Big Sur 11.6
Python version 3.9.1

Output of pip freeze:

numpy==1.22.2
pandas==1.4.0
python-dateutil==2.8.2
pytz==2021.3
six==1.16.0

Running pd.show_versions() throws the following error, probably related to #44980

>>> import pandas as pd
>>> pd.show_versions()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/santhisenan/.local/share/virtualenvs/pd_versions-uc_eBlwP/lib/python3.9/site-packages/pandas/util/_print_versions.py", line 109, in show_versions
    deps = _get_dependency_info()
  File "/Users/santhisenan/.local/share/virtualenvs/pd_versions-uc_eBlwP/lib/python3.9/site-packages/pandas/util/_print_versions.py", line 88, in _get_dependency_info
    mod = import_optional_dependency(modname, errors="ignore")
  File "/Users/santhisenan/.local/share/virtualenvs/pd_versions-uc_eBlwP/lib/python3.9/site-packages/pandas/compat/_optional.py", line 126, in import_optional_dependency
    module = importlib.import_module(name)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 790, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/Users/santhisenan/.local/share/virtualenvs/pd_versions-uc_eBlwP/lib/python3.9/site-packages/setuptools/__init__.py", line 8, in <module>
    import _distutils_hack.override  # noqa: F401
  File "/Users/santhisenan/.local/share/virtualenvs/pd_versions-uc_eBlwP/lib/python3.9/site-packages/_distutils_hack/override.py", line 1, in <module>
    __import__('_distutils_hack').do_override()
  File "/Users/santhisenan/.local/share/virtualenvs/pd_versions-uc_eBlwP/lib/python3.9/site-packages/_distutils_hack/__init__.py", line 71, in do_override
    ensure_local_distutils()
  File "/Users/santhisenan/.local/share/virtualenvs/pd_versions-uc_eBlwP/lib/python3.9/site-packages/_distutils_hack/__init__.py", line 59, in ensure_local_distutils
    assert '_distutils' in core.__file__, core.__file__
AssertionError: /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/core.py
@mroeschke mroeschke added Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv Bug Segfault Non-Recoverable Error labels Feb 12, 2022
@profMsaif

The problem is in this file:
c_parser_wrapper.py
which uses parsers.pyi
[76]: self._reader = parsers.TextReader(src, **kwds)
src must be an io.TextIOWrapper,
and it will crash if you pass any other object, such as a dict, a set, or even a str.
However, this shouldn't happen; it should throw a ValueError.

The issue is in readers.py

[1213] --> if not isinstance(f, list):
# code for handling the file
# validating the type, etc.

Suggestion:
Remove this line and unindent the body of the if statement.
The value of f (the file or path argument) shouldn't be a list object.
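
The kind of up-front type check being suggested could look roughly like the sketch below. This is not the pandas implementation: validate_source is a hypothetical helper, the rule "has a read() method" is an assumed stand-in for a real file-like check, and the error message is modelled on the one pandas 1.3.5 produced for this input.

```python
import os
from typing import Any


def validate_source(src: Any) -> Any:
    """Hypothetical guard: accept paths and file-like objects, reject the rest."""
    # Strings, bytes and os.PathLike objects are treated as paths.
    if isinstance(src, (str, bytes, os.PathLike)):
        return src
    # Assumption: anything exposing read() counts as file-like.
    if hasattr(src, "read"):
        return src
    # Fail early with a regular exception instead of handing an
    # unsupported object to lower-level parsing code.
    raise ValueError(f"Invalid file path or buffer object type: {type(src)}")
```

With a guard like this, a call such as validate_source([]) raises a ValueError naming the list type before any parser is constructed.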

@phofl phofl added the Regression Functionality that used to work in a prior pandas version label Mar 11, 2022
@phofl phofl added this to the 1.4.2 milestone Mar 11, 2022
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Mar 12, 2022
@simonjayhawkins
Member

NOTE: before #45389 the ValueError was raised in _get_filepath_or_buffer

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_27570/1861184160.py in <module>
----> 1 pd.read_csv([])

~/miniconda3/envs/pandas-1.3.5/lib/python3.10/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312 
    313         return wrapper

~/miniconda3/envs/pandas-1.3.5/lib/python3.10/site-packages/pandas/io/parsers/readers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    584     kwds.update(kwds_defaults)
    585 
--> 586     return _read(filepath_or_buffer, kwds)
    587 
    588 

~/miniconda3/envs/pandas-1.3.5/lib/python3.10/site-packages/pandas/io/parsers/readers.py in _read(filepath_or_buffer, kwds)
    480 
    481     # Create the parser.
--> 482     parser = TextFileReader(filepath_or_buffer, **kwds)
    483 
    484     if chunksize or iterator:

~/miniconda3/envs/pandas-1.3.5/lib/python3.10/site-packages/pandas/io/parsers/readers.py in __init__(self, f, engine, **kwds)
    809             self.options["has_index_names"] = kwds["has_index_names"]
    810 
--> 811         self._engine = self._make_engine(self.engine)
    812 
    813     def close(self):

~/miniconda3/envs/pandas-1.3.5/lib/python3.10/site-packages/pandas/io/parsers/readers.py in _make_engine(self, engine)
   1038             )
   1039         # error: Too many arguments for "ParserBase"
-> 1040         return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
   1041 
   1042     def _failover_to_python(self):

~/miniconda3/envs/pandas-1.3.5/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py in __init__(self, src, **kwds)
     49 
     50         # open handles
---> 51         self._open_handles(src, kwds)
     52         assert self.handles is not None
     53 

~/miniconda3/envs/pandas-1.3.5/lib/python3.10/site-packages/pandas/io/parsers/base_parser.py in _open_handles(self, src, kwds)
    220         Let the readers open IOHandles after they are done with their potential raises.
    221         """
--> 222         self.handles = get_handle(
    223             src,
    224             "r",

~/miniconda3/envs/pandas-1.3.5/lib/python3.10/site-packages/pandas/io/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    607 
    608     # open URLs
--> 609     ioargs = _get_filepath_or_buffer(
    610         path_or_buf,
    611         encoding=encoding,

~/miniconda3/envs/pandas-1.3.5/lib/python3.10/site-packages/pandas/io/common.py in _get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode, storage_options)
    394     if not is_file_like(filepath_or_buffer):
    395         msg = f"Invalid file path or buffer object type: {type(filepath_or_buffer)}"
--> 396         raise ValueError(msg)
    397 
    398     return IOArgs(

ValueError: Invalid file path or buffer object type: <class 'list'>
