[SPARK-40922][PYTHON] Document multiple path support in `pyspark.pand…

…as.read_csv` ### What changes were proposed in this pull request? as discussed in https://issues.apache.org/jira/browse/SPARK-40922: > The path argument of `pyspark.pandas.read_csv(path, ...)` currently has type annotation `str` and is documented as > > path : str > The path string storing the CSV file to be read. >The implementation however uses `pyspark.sql.DataFrameReader.csv(path, ...)` which does support multiple paths: > > path : str or list > string, or list of strings, for input path(s), > or RDD of Strings storing CSV rows. > This PR updates the type annotation and documentation of `path` argument of `pyspark.pandas.read_csv` ### Why are the changes needed? Loading multiple CSV files at once is a useful feature to have and should be documented ### Does this PR introduce _any_ user-facing change? it documents and existing feature ### How was this patch tested? No need for tests (so far): only type annotations and docblocks were changed Closes #38399 from soxofaan/SPARK-40922-pyspark-pandas-read-csv-multiple-paths. Authored-by: Stefaan Lippens <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
apache · Oct 27, 2022 · 5b5eb23 · 5b5eb23
1 parent 74668e2
commit 5b5eb23
Showing 1 changed file with 7 additions and 3 deletions.
diff --git a/python/pyspark/pandas/namespace.py b/python/pyspark/pandas/namespace.py
@@ -213,7 +213,7 @@ def range(
 
 
 def read_csv(
-    path: str,
+    path: Union[str, List[str]],
     sep: str = ",",
     header: Union[str, int, None] = "infer",
     names: Optional[Union[str, List[str]]] = None,
@@ -234,8 +234,8 @@ def read_csv(
 
     Parameters
     ----------
-    path : str
-        The path string storing the CSV file to be read.
+    path : str or list
+        Path(s) of the CSV file(s) to be read.
     sep : str, default ‘,’
         Delimiter to use. Non empty string.
     header : int, default ‘infer’
@@ -296,6 +296,10 @@ def read_csv(
     Examples
     --------
     >>> ps.read_csv('data.csv')  # doctest: +SKIP
+
+    Load multiple CSV files as a single DataFrame:
+
+    >>> ps.read_csv(['data-01.csv', 'data-02.csv'])  # doctest: +SKIP
     """
     # For latin-1 encoding is same as iso-8859-1, that's why its mapped to iso-8859-1.
     encoding_mapping = {"latin-1": "iso-8859-1"}