Move some existing config options to key-value based configuration #2756

Closed
andygrove opened this issue Jun 21, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@andygrove
Member

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
PR #2754 adds an extensible configuration mechanism based on key-value pairs.

This issue proposes to migrate the following existing configuration options in SessionConfig to this approach:

    /// Default batch size while creating new batches. It's especially useful
    /// for buffer-in-memory batches, since creating tiny batches would result
    /// in too much metadata memory consumption.
    pub batch_size: usize,
    /// Number of partitions for query execution. Increasing partitions can increase concurrency.
    pub target_partitions: usize,
    /// Default catalog name for table resolution
    default_catalog: String,
    /// Default schema name for table resolution
    default_schema: String,
    /// Whether the default catalog and schema should be created automatically
    create_default_catalog_and_schema: bool,
    /// Should DataFusion provide access to `information_schema`
    /// virtual tables for displaying schema information
    information_schema: bool,
    /// Should DataFusion repartition data using the join keys to execute joins in parallel
    /// using the provided `target_partitions` level
    pub repartition_joins: bool,
    /// Should DataFusion repartition data using the aggregate keys to execute aggregates in parallel
    /// using the provided `target_partitions` level
    pub repartition_aggregations: bool,
    /// Should DataFusion repartition data using the partition keys to execute window functions in
    /// parallel using the provided `target_partitions` level
    pub repartition_windows: bool,
    /// Should the DataFusion parquet reader use the predicate to prune data
    pub parquet_pruning: bool,
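
For illustration only, here is a minimal sketch of what storing a few of these options as key-value pairs could look like. It is not the mechanism from PR #2754 and not DataFusion's API: `ConfigValue`, `KeyValueConfig`, and `example_config` are hypothetical names. The keys and values mirror entries in the `show all` output quoted in this issue.

```rust
use std::collections::HashMap;

/// Hypothetical value type for one configuration entry.
#[derive(Debug, Clone, PartialEq)]
pub enum ConfigValue {
    Bool(bool),
    UInt(u64),
    Str(String),
}

/// Hypothetical key-value store (an assumption, not DataFusion's real type).
#[derive(Debug, Default)]
pub struct KeyValueConfig {
    entries: HashMap<String, ConfigValue>,
}

impl KeyValueConfig {
    /// Builder-style setter: inserts or overwrites the entry for `key`.
    pub fn set(mut self, key: &str, value: ConfigValue) -> Self {
        self.entries.insert(key.to_string(), value);
        self
    }

    /// Returns the value only if the key exists and holds a boolean.
    pub fn get_bool(&self, key: &str) -> Option<bool> {
        match self.entries.get(key) {
            Some(ConfigValue::Bool(b)) => Some(*b),
            _ => None,
        }
    }

    /// Returns the value only if the key exists and holds an unsigned integer.
    pub fn get_u64(&self, key: &str) -> Option<u64> {
        match self.entries.get(key) {
            Some(ConfigValue::UInt(n)) => Some(*n),
            _ => None,
        }
    }

    /// Returns the value only if the key exists and holds a string.
    pub fn get_str(&self, key: &str) -> Option<&str> {
        match self.entries.get(key) {
            Some(ConfigValue::Str(s)) => Some(s.as_str()),
            _ => None,
        }
    }
}

/// Builds a config holding three of the options listed above.
pub fn example_config() -> KeyValueConfig {
    KeyValueConfig::default()
        .set("datafusion.execution.batch_size", ConfigValue::UInt(8192))
        .set("datafusion.execution.parquet.pruning", ConfigValue::Bool(true))
        .set(
            "datafusion.catalog.default_catalog",
            ConfigValue::Str("datafusion".to_string()),
        )
}
```

In this sketch a typed getter returns `None` when the key is missing or holds a different type, so a caller must handle a misconfigured key explicitly.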

Describe the solution you'd like
As described

Describe alternatives you've considered
None

Additional context
None

@alamb
Contributor

alamb commented Sep 19, 2023

I believe we have ported all these options:

DataFusion CLI v31.0.0
❯ show all;
+------------------------------------------------------------+---------------------------+
| name                                                       | setting                   |
+------------------------------------------------------------+---------------------------+
| datafusion.catalog.create_default_catalog_and_schema       | true                      |
| datafusion.catalog.default_catalog                         | datafusion                |
| datafusion.catalog.default_schema                          | public                    |
| datafusion.catalog.format                                  |                           |
| datafusion.catalog.has_header                              | false                     |
| datafusion.catalog.information_schema                      | true                      |
| datafusion.catalog.location                                |                           |
| datafusion.execution.aggregate.scalar_update_factor        | 10                        |
| datafusion.execution.batch_size                            | 8192                      |
| datafusion.execution.coalesce_batches                      | true                      |
| datafusion.execution.collect_statistics                    | false                     |
| datafusion.execution.parquet.bloom_filter_enabled          | false                     |
| datafusion.execution.parquet.bloom_filter_fpp              |                           |
| datafusion.execution.parquet.bloom_filter_ndv              |                           |
| datafusion.execution.parquet.column_index_truncate_length  |                           |
| datafusion.execution.parquet.compression                   |                           |
| datafusion.execution.parquet.created_by                    | datafusion version 31.0.0 |
| datafusion.execution.parquet.data_page_row_count_limit     | 18446744073709551615      |
| datafusion.execution.parquet.data_pagesize_limit           | 1048576                   |
| datafusion.execution.parquet.dictionary_enabled            |                           |
| datafusion.execution.parquet.dictionary_page_size_limit    | 1048576                   |
| datafusion.execution.parquet.enable_page_index             | true                      |
| datafusion.execution.parquet.encoding                      |                           |
| datafusion.execution.parquet.max_row_group_size            | 1048576                   |
| datafusion.execution.parquet.max_statistics_size           |                           |
| datafusion.execution.parquet.metadata_size_hint            |                           |
| datafusion.execution.parquet.pruning                       | true                      |
| datafusion.execution.parquet.pushdown_filters              | false                     |
| datafusion.execution.parquet.reorder_filters               | false                     |
| datafusion.execution.parquet.skip_metadata                 | true                      |
| datafusion.execution.parquet.statistics_enabled            |                           |
| datafusion.execution.parquet.write_batch_size              | 1024                      |
| datafusion.execution.parquet.writer_version                | 1.0                       |
| datafusion.execution.planning_concurrency                  | 16                        |
| datafusion.execution.sort_in_place_threshold_bytes         | 1048576                   |
| datafusion.execution.sort_spill_reservation_bytes          | 10485760                  |
| datafusion.execution.target_partitions                     | 16                        |
| datafusion.execution.time_zone                             | +00:00                    |
| datafusion.explain.logical_plan_only                       | false                     |
| datafusion.explain.physical_plan_only                      | false                     |
| datafusion.explain.show_statistics                         | false                     |
| datafusion.optimizer.allow_symmetric_joins_without_pruning | true                      |
| datafusion.optimizer.bounded_order_preserving_variants     | false                     |
| datafusion.optimizer.enable_round_robin_repartition        | true                      |
| datafusion.optimizer.enable_topk_aggregation               | true                      |
| datafusion.optimizer.filter_null_join_keys                 | false                     |
| datafusion.optimizer.hash_join_single_partition_threshold  | 1048576                   |
| datafusion.optimizer.max_passes                            | 3                         |
| datafusion.optimizer.prefer_hash_join                      | true                      |
| datafusion.optimizer.repartition_aggregations              | true                      |
| datafusion.optimizer.repartition_file_min_size             | 10485760                  |
| datafusion.optimizer.repartition_file_scans                | true                      |
| datafusion.optimizer.repartition_joins                     | true                      |
| datafusion.optimizer.repartition_sorts                     | true                      |
| datafusion.optimizer.repartition_windows                   | true                      |
| datafusion.optimizer.skip_failed_rules                     | false                     |
| datafusion.optimizer.top_down_join_key_reordering          | true                      |
| datafusion.sql_parser.dialect                              | generic                   |
| datafusion.sql_parser.enable_ident_normalization           | true                      |
| datafusion.sql_parser.parse_float_as_decimal               | false                     |
+------------------------------------------------------------+---------------------------+
60 rows in set. Query took 0.003 seconds.
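
To make the correspondence explicit, the sketch below pairs each `SessionConfig` field from the original request with the key in the listing above that appears to replace it. The pairing is inferred from matching names only and has not been checked against the DataFusion source. `MIGRATED_OPTIONS` and `key_for_field` are hypothetical names introduced for this example.

```rust
/// (old `SessionConfig` field, key from the `show all` output).
/// The pairing is an assumption based on matching names.
pub const MIGRATED_OPTIONS: [(&str, &str); 10] = [
    ("batch_size", "datafusion.execution.batch_size"),
    ("target_partitions", "datafusion.execution.target_partitions"),
    ("default_catalog", "datafusion.catalog.default_catalog"),
    ("default_schema", "datafusion.catalog.default_schema"),
    (
        "create_default_catalog_and_schema",
        "datafusion.catalog.create_default_catalog_and_schema",
    ),
    ("information_schema", "datafusion.catalog.information_schema"),
    ("repartition_joins", "datafusion.optimizer.repartition_joins"),
    (
        "repartition_aggregations",
        "datafusion.optimizer.repartition_aggregations",
    ),
    ("repartition_windows", "datafusion.optimizer.repartition_windows"),
    ("parquet_pruning", "datafusion.execution.parquet.pruning"),
];

/// Looks up the key paired with an old field name, if there is one.
pub fn key_for_field(field: &str) -> Option<&'static str> {
    MIGRATED_OPTIONS
        .iter()
        .find(|(old_field, _)| *old_field == field)
        .map(|(_, key)| *key)
}
```

Each of the ten fields in the request has a candidate key in the output, which is consistent with all of them having been ported.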

@alamb alamb closed this as completed Sep 19, 2023