Iss4631 #4994

Merged
merged 2 commits into from
Mar 7, 2025

8 changes: 5 additions & 3 deletions docs/stable/data/csv/overview.md
@@ -87,7 +87,7 @@ Below are parameters that can be passed to the [`read_csv` function](#csv-functi
| `dateformat` | [Date format]({% link docs/stable/sql/functions/dateformat.md %}) used when parsing and writing dates. | `VARCHAR` | (empty) |
| `date_format` | Alias for `dateformat`; only available in the `COPY` statement. | `VARCHAR` | (empty) |
| `decimal_separator` | Decimal separator for numbers. | `VARCHAR` | `.` |
- | `delim` | Delimiter character used to separate columns within each line. Alias for `sep`. | `VARCHAR` | `,` |
+ | `delim` | Delimiter character used to separate columns within each line, e.g., `,` `;` `\t`. The delimiter character can be up to 4 bytes, e.g., 🦆. Alias for `sep`. | `VARCHAR` | `,` |
| `delimiter` | Alias for `delim`; only available in the `COPY` statement. | `VARCHAR` | `,` |
| `escape` | String used to escape the `quote` character within quoted values. | `VARCHAR` | `"` |
| `encoding` | Encoding used by the CSV file. Options are `utf-8`, `utf-16`, `latin-1`. Not available in the `COPY` statement (which always uses `utf-8`). | `VARCHAR` | `utf-8` |
@@ -108,7 +108,7 @@ Below are parameters that can be passed to the [`read_csv` function](#csv-functi
| `rejects_table` | Name of the [temporary table where information on faulty lines is stored]({% link docs/stable/data/csv/reading_faulty_csv_files.md %}#reject-errors). | `VARCHAR` | `reject_errors` |
| `rejects_limit` | Upper limit on the number of faulty lines per file that are recorded in the rejects table. Setting this to `0` means that no limit is applied. | `BIGINT` | `0` |
| `sample_size` | Number of sample lines for [auto detection of parameters]({% link docs/stable/data/csv/auto_detection.md %}). | `BIGINT` | 20480 |
- | `sep` | Delimiter character used to separate columns within each line. Alias for `delim`. | `VARCHAR` | `,` |
+ | `sep` | Delimiter character used to separate columns within each line, e.g., `,` `;` `\t`. The delimiter character can be up to 4 bytes, e.g., 🦆. Alias for `delim`. | `VARCHAR` | `,` |
| `skip` | Number of lines to skip at the start of each file. | `BIGINT` | 0 |
| `store_rejects` | Skip any lines with errors and store them in the rejects table. | `BOOL` | `false` |
| `strict_mode` | Enforces the strictness level of the CSV Reader. When set to `true`, the parser will throw an error upon encountering any issues. When set to `false`, the parser will attempt to read structurally incorrect files. It is important to note that reading structurally incorrect files can cause ambiguity; therefore, this option should be used with caution. | `BOOL` | `true` |
@@ -117,7 +117,9 @@ Below are parameters that can be passed to the [`read_csv` function](#csv-functi
| `types` or `dtypes` or `column_types` | Column types, as either a list (by position) or a struct (by name). See [example]({% link docs/stable/data/csv/tips.md %}#override-the-types-of-specific-columns). | `VARCHAR[]` or `STRUCT` | (empty) |
| `union_by_name` | Align columns from different files [by column name]({% link docs/stable/data/multiple_files/combining_schemas.md %}#union-by-name) instead of position. Using this option increases memory consumption. | `BOOL` | `false` |
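
The "up to 4 bytes" limit stated for `delim` and `sep` refers to the delimiter's UTF-8 encoding. The following is a minimal sketch in plain Python (it does not call DuckDB) that measures the byte length of the delimiters named in the table; the helper name `utf8_len` is hypothetical and used for illustration only.

```python
# Illustration only: measure how many bytes each candidate delimiter
# occupies in UTF-8. The 4-byte limit itself comes from the table above;
# `utf8_len` is a hypothetical helper, not part of DuckDB.

def utf8_len(delimiter: str) -> int:
    """Return the number of bytes the delimiter occupies in UTF-8."""
    return len(delimiter.encode("utf-8"))

candidates = [",", ";", "\t", "🦆"]
byte_lengths = {d: utf8_len(d) for d in candidates}

# A single line that uses the duck emoji as its column separator.
line = "a🦆b🦆c"
fields = line.split("🦆")
```

The ASCII delimiters each take one byte, while the duck emoji takes four, so it sits exactly at the documented limit. In DuckDB itself, such a delimiter would be passed through the `delim` (or `sep`) parameter of `read_csv`.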

- > Tip We recommend the [`iconv` command-line tool](https://linux.die.net/man/1/iconv) to convert files with encodings not supported by `read_csv` to UTF-8. For example:
+ > Tip DuckDB's CSV reader supports UTF-8 (default), UTF-16 and Latin-1 encodings (see the `encoding` option).
+ > To convert files with different encodings, we recommend using the [`iconv` command-line tool](https://linux.die.net/man/1/iconv).
+ >
> ```bash
> iconv -f ISO-8859-2 -t UTF-8 input.csv > input-utf-8.csv
> ```
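
Where `iconv` is not available, the same re-encoding step can be approximated with Python's standard library. This is a rough sketch rather than a replacement for the recommended tool: the function name `convert_to_utf8` and the sample data are assumptions made for illustration, and the file names simply mirror the `iconv` command above.

```python
# Sketch: re-encode a CSV file from ISO-8859-2 to UTF-8 using only the
# standard library. `convert_to_utf8` is a hypothetical helper.
import os
import tempfile

def convert_to_utf8(src_path: str, dst_path: str,
                    source_encoding: str = "iso-8859-2") -> None:
    """Read src_path in source_encoding and write it to dst_path as UTF-8."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)

# Demo with a small, made-up file containing Latin-2 characters.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "input.csv")
dst_path = os.path.join(workdir, "input-utf-8.csv")

original_text = "id,name\n1,Łódź\n"
with open(src_path, "wb") as f:
    f.write(original_text.encode("iso-8859-2"))

convert_to_utf8(src_path, dst_path)

with open(dst_path, "rb") as f:
    converted_bytes = f.read()
```

The source encoding must be known in advance; like `iconv -f`, this sketch does not detect it.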