[SPARK-40922][PYTHON] Document multiple path support in `pyspark.pandas.read_csv`

### What changes were proposed in this pull request?

As discussed in https://issues.apache.org/jira/browse/SPARK-40922:

> The path argument of `pyspark.pandas.read_csv(path, ...)` currently has type annotation `str` and is documented as
>
>       path : str
>           The path string storing the CSV file to be read.
>
> The implementation however uses `pyspark.sql.DataFrameReader.csv(path, ...)` which does support multiple paths:
>
>        path : str or list
>            string, or list of strings, for input path(s),
>            or RDD of Strings storing CSV rows.
>

This PR updates the type annotation and documentation of the `path` argument of `pyspark.pandas.read_csv`.
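
As a rough illustration of what the widened annotation means for callers, the sketch below accepts either a single path or a list of paths and normalizes both to a list. `normalize_paths` is a hypothetical helper written for this example only; it is not part of pyspark and is not how `read_csv` is implemented.

```python
from typing import List, Union


def normalize_paths(path: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper (not part of pyspark): return path(s) as a list."""
    # A bare string is a single path; wrap it so callers always get a list.
    if isinstance(path, str):
        return [path]
    # Copy the list so the caller's object is not shared.
    return list(path)


single = normalize_paths("data.csv")
multiple = normalize_paths(["data-01.csv", "data-02.csv"])
```

With the new `Union[str, List[str]]` annotation, a type checker accepts both call styles for `path`, matching what `pyspark.sql.DataFrameReader.csv` already supported.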

### Why are the changes needed?

Loading multiple CSV files at once is a useful feature and should be documented.

### Does this PR introduce _any_ user-facing change?
It documents an existing feature.

### How was this patch tested?
No need for tests (so far): only type annotations and docstrings were changed.

Closes #38399 from soxofaan/SPARK-40922-pyspark-pandas-read-csv-multiple-paths.

Authored-by: Stefaan Lippens <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
soxofaan authored and HyukjinKwon committed Oct 27, 2022
1 parent 74668e2 commit 5b5eb23
Showing 1 changed file with 7 additions and 3 deletions.
10 changes: 7 additions & 3 deletions python/pyspark/pandas/namespace.py
@@ -213,7 +213,7 @@ def range(


 def read_csv(
-    path: str,
+    path: Union[str, List[str]],
     sep: str = ",",
     header: Union[str, int, None] = "infer",
     names: Optional[Union[str, List[str]]] = None,
@@ -234,8 +234,8 @@ def read_csv(
     Parameters
     ----------
-    path : str
-        The path string storing the CSV file to be read.
+    path : str or list
+        Path(s) of the CSV file(s) to be read.
     sep : str, default ‘,’
         Delimiter to use. Non empty string.
     header : int, default ‘infer’
Expand Down Expand Up @@ -296,6 +296,10 @@ def read_csv(
Examples
--------
>>> ps.read_csv('data.csv') # doctest: +SKIP
Load multiple CSV files as a single DataFrame:
>>> ps.read_csv(['data-01.csv', 'data-02.csv']) # doctest: +SKIP
"""
# For latin-1 encoding is same as iso-8859-1, that's why its mapped to iso-8859-1.
encoding_mapping = {"latin-1": "iso-8859-1"}
Expand Down
