Minor: Update documentation for `datafusion.execution.parquet.enable_page_index` (#6342)
alamb authored May 13, 2023
1 parent 4b21b61 commit 90775b4
Showing 2 changed files with 4 additions and 3 deletions.
5 changes: 3 additions & 2 deletions datafusion/common/src/config.rs
@@ -241,8 +241,9 @@ config_namespace! {
config_namespace! {
/// Options related to reading of parquet files
pub struct ParquetOptions {
-/// If true, uses parquet data page level metadata (Page Index) statistics
-/// to reduce the number of rows decoded.
+/// If true, reads the Parquet data page level metadata (the
+/// Page Index), if present, to reduce the I/O and number of
+/// rows decoded.
pub enable_page_index: bool, default = true

/// If true, the parquet reader attempts to skip entire row groups based
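The reworded doc comment says the Page Index is read only if present, and that it cuts both I/O and the number of rows decoded. The following is a minimal sketch of that idea in plain Rust. It assumes a single `i64` column and a `col > threshold` predicate. `PageStats`, `ParquetOptions` (a simplified stand-in for the macro-generated struct) and `select_rows` are hypothetical names, not DataFusion or `parquet` crate APIs. Pages whose maximum cannot satisfy the predicate are skipped. When the option is off, or the file has no Page Index, every row is decoded.

```rust
/// Hypothetical per-page statistics for one `i64` column, standing in for
/// what a Parquet Page Index records (not a real crate type).
struct PageStats {
    /// Unused by the `col > threshold` check below; kept for realism.
    min: i64,
    max: i64,
    num_rows: usize,
}

/// Hypothetical stand-in for the generated `ParquetOptions`; only the option
/// discussed here is modeled, defaulting to `true` as in the diff.
struct ParquetOptions {
    enable_page_index: bool,
}

impl Default for ParquetOptions {
    fn default() -> Self {
        ParquetOptions { enable_page_index: true }
    }
}

/// Returns `(first_row, row_count)` ranges that must be decoded to evaluate
/// `col > threshold`. A page whose `max` is <= `threshold` cannot contain a
/// matching row, so it is neither fetched nor decoded.
fn select_rows(
    opts: &ParquetOptions,
    total_rows: usize,
    page_index: Option<&[PageStats]>,
    threshold: i64,
) -> Vec<(usize, usize)> {
    match page_index {
        Some(pages) if opts.enable_page_index => {
            let mut ranges: Vec<(usize, usize)> = Vec::new();
            let mut first_row = 0;
            for page in pages {
                if page.max > threshold {
                    // Extend the previous range when this page directly follows it.
                    let merged = match ranges.last_mut() {
                        Some(last) if last.0 + last.1 == first_row => {
                            last.1 += page.num_rows;
                            true
                        }
                        _ => false,
                    };
                    if !merged {
                        ranges.push((first_row, page.num_rows));
                    }
                }
                first_row += page.num_rows;
            }
            ranges
        }
        // Option disabled, or the file has no Page Index: decode every row.
        _ => vec![(0, total_rows)],
    }
}
```

For three 100-row pages with maxima 9, 19 and 29 and `threshold = 15`, this sketch selects the single range `(100, 200)`, that is, rows 100 to 299. With `enable_page_index: false` it returns `(0, 300)`.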
2 changes: 1 addition & 1 deletion docs/source/user-guide/configs.md
@@ -49,7 +49,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
| datafusion.execution.collect_statistics | false | Should DataFusion collect statistics after listing files |
| datafusion.execution.target_partitions | 0 | Number of partitions for query execution. Increasing partitions can increase concurrency. Defaults to the number of CPU cores on the system |
| datafusion.execution.time_zone | +00:00 | The default time zone Some functions, e.g. `EXTRACT(HOUR from SOME_TIME)`, shift the underlying datetime according to this time zone, and then extract the hour |
-| datafusion.execution.parquet.enable_page_index | true | If true, uses parquet data page level metadata (Page Index) statistics to reduce the number of rows decoded. |
+| datafusion.execution.parquet.enable_page_index | true | If true, reads the Parquet data page level metadata (the Page Index), if present, to reduce the I/O and number of rows decoded. |
| datafusion.execution.parquet.pruning | true | If true, the parquet reader attempts to skip entire row groups based on the predicate in the query and the metadata (min/max values) stored in the parquet file |
| datafusion.execution.parquet.skip_metadata | true | If true, the parquet reader skip the optional embedded metadata that may be in the file Schema. This setting can help avoid schema conflicts when querying multiple parquet files with schemas containing compatible types but different metadata |
| datafusion.execution.parquet.metadata_size_hint | NULL | If specified, the parquet reader will try and fetch the last `size_hint` bytes of the parquet file optimistically. If not specified, two reads are required: One read to fetch the 8-byte parquet footer and another to fetch the metadata length encoded in the footer |
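The `metadata_size_hint` row relies on how a Parquet file ends. The last 8 bytes are a 4-byte little-endian length of the file metadata, followed by the magic bytes `PAR1`. Without a hint, one read fetches that footer and a second read fetches the metadata whose length the footer encodes. The following standard-library-only sketch shows how a tail fetched using a size hint can make the second read unnecessary. `inspect_tail` and `TailRead` are hypothetical names, not DataFusion or `parquet` crate APIs.

```rust
/// Parquet footer: 4-byte little-endian metadata length + 4-byte magic.
const FOOTER_LEN: usize = 8;
const MAGIC: [u8; 4] = *b"PAR1";

/// Hypothetical result of inspecting bytes fetched from the end of a file.
#[derive(Debug, PartialEq)]
enum TailRead {
    /// The metadata is already inside the fetched tail: one read suffices.
    Complete { metadata_len: usize },
    /// The tail is too short: a second read of `metadata_len` bytes is needed.
    NeedMore { metadata_len: usize },
}

/// `tail` holds the last bytes of a Parquet file, e.g. `size_hint` bytes,
/// or only the 8-byte footer when no hint is configured.
fn inspect_tail(tail: &[u8]) -> Result<TailRead, String> {
    if tail.len() < FOOTER_LEN {
        return Err(format!("need at least {} bytes, got {}", FOOTER_LEN, tail.len()));
    }
    let footer = &tail[tail.len() - FOOTER_LEN..];
    if footer[4..] != MAGIC {
        return Err("missing PAR1 magic".to_string());
    }
    // Decode the little-endian u32 metadata length from the first 4 footer bytes.
    let metadata_len =
        u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]) as usize;
    if tail.len() >= metadata_len.saturating_add(FOOTER_LEN) {
        Ok(TailRead::Complete { metadata_len })
    } else {
        Ok(TailRead::NeedMore { metadata_len })
    }
}
```

The trade-off is that a hint larger than the real metadata fetches a few unneeded bytes, whereas a hint that is too small still falls back to a second read.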
