Skip to content

Commit

Permalink
Merge pull request #4994 from szarnyasg/iss4631
Browse files Browse the repository at this point in the history
Iss4631
  • Loading branch information
szarnyasg authored Mar 7, 2025
2 parents fbf5544 + ada5fe0 commit cce1730
Showing 1 changed file with 5 additions and 3 deletions.
8 changes: 5 additions & 3 deletions docs/stable/data/csv/overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -87,7 +87,7 @@ Below are parameters that can be passed to the [`read_csv` function](#csv-functi
| `dateformat` | [Date format]({% link docs/stable/sql/functions/dateformat.md %}) used when parsing and writing dates. | `VARCHAR` | (empty) |
| `date_format` | Alias for `dateformat`; only available in the `COPY` statement. | `VARCHAR` | (empty) |
| `decimal_separator` | Decimal separator for numbers. | `VARCHAR` | `.` |
| `delim` | Delimiter character used to separate columns within each line. Alias for `sep`. | `VARCHAR` | `,` |
| `delim` | Delimiter character used to separate columns within each line, e.g., `,` `;` `\t`. The delimiter character can be up to 4 bytes, e.g., 🦆. Alias for `sep`. | `VARCHAR` | `,` |
| `delimiter` | Alias for `delim`; only available in the `COPY` statement. | `VARCHAR` | `,` |
| `escape` | String used to escape the `quote` character within quoted values. | `VARCHAR` | `"` |
| `encoding` | Encoding used by the CSV file. Options are `utf-8`, `utf-16`, `latin-1`. Not available in the `COPY` statement (which always uses `utf-8`). | `VARCHAR` | `utf-8` |
Expand All @@ -108,7 +108,7 @@ Below are parameters that can be passed to the [`read_csv` function](#csv-functi
| `rejects_table` | Name of the [temporary table where information on faulty lines is stored]({% link docs/stable/data/csv/reading_faulty_csv_files.md %}#reject-errors). | `VARCHAR` | `reject_errors` |
| `rejects_limit` | Upper limit on the number of faulty lines per file that are recorded in the rejects table. Setting this to `0` means that no limit is applied. | `BIGINT` | `0` |
| `sample_size` | Number of sample lines for [auto detection of parameters]({% link docs/stable/data/csv/auto_detection.md %}). | `BIGINT` | 20480 |
| `sep` | Delimiter character used to separate columns within each line. Alias for `delim`. | `VARCHAR` | `,` |
| `sep` | Delimiter character used to separate columns within each line, e.g., `,` `;` `\t`. The delimiter character can be up to 4 bytes, e.g., 🦆. Alias for `delim`. | `VARCHAR` | `,` |
| `skip` | Number of lines to skip at the start of each file. | `BIGINT` | 0 |
| `store_rejects` | Skip any lines with errors and store them in the rejects table. | `BOOL` | `false` |
| `strict_mode` | Enforces the strictness level of the CSV Reader. When set to `true`, the parser will throw an error upon encountering any issues. When set to `false`, the parser will attempt to read structurally incorrect files. It is important to note that reading structurally incorrect files can cause ambiguity; therefore, this option should be used with caution. | `BOOL` | `true` |
Expand All @@ -117,7 +117,9 @@ Below are parameters that can be passed to the [`read_csv` function](#csv-functi
| `types` or `dtypes` or `column_types` | Column types, as either a list (by position) or a struct (by name). See [example]({% link docs/stable/data/csv/tips.md %}#override-the-types-of-specific-columns). | `VARCHAR[]` or `STRUCT` | (empty) |
| `union_by_name` | Align columns from different files [by column name]({% link docs/stable/data/multiple_files/combining_schemas.md %}#union-by-name) instead of position. Using this option increases memory consumption. | `BOOL` | `false` |

> Tip We recommend the [`iconv` command-line tool](https://linux.die.net/man/1/iconv) to convert files with encodings not supported by `read_csv` to UTF-8. For example:
> Tip DuckDB's CSV reader supports UTF-8 (default), UTF-16 and Latin-1 encordings (see the `encoding` option).
> To convert files with different encodings, we recommend using the [`iconv` command-line tool](https://linux.die.net/man/1/iconv).
>
> ```bash
> iconv -f ISO-8859-2 -t UTF-8 input.csv > input-utf-8.csv
> ```
Expand Down

0 comments on commit cce1730

Please sign in to comment.