
fix: Sort on single struct should fallback to Spark #811

Merged: 1 commit into apache:main on Aug 12, 2024

Conversation

@viirya (Member) commented Aug 11, 2024

Which issue does this PR close?

Closes #807.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 33.80%. Comparing base (4fe43ad) to head (44855de).

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #811      +/-   ##
============================================
- Coverage     33.94%   33.80%   -0.14%     
+ Complexity      874      870       -4     
============================================
  Files           112      112              
  Lines         42916    42914       -2     
  Branches       9464     9452      -12     
============================================
- Hits          14567    14507      -60     
- Misses        25379    25428      +49     
- Partials       2970     2979       +9     


| Config | Description | Default Value |
|--------|-------------|---------------|
| spark.comet.scan.enabled | Whether to enable Comet scan. When this is turned on, Spark will use Comet to read Parquet data source. Note that to enable native vectorized execution, both this config and 'spark.comet.exec.enabled' need to be enabled. By default, this config is true. | true |
| spark.comet.scan.preFetch.enabled | Whether to enable pre-fetching feature of CometScan. By default is disabled. | false |
| spark.comet.scan.preFetch.threadNum | The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. By default it is 2. Note that more pre-fetching threads means more memory requirement to store pre-fetched row groups. | 2 |
| spark.comet.shuffle.preferDictionary.ratio | The ratio of total values to distinct values in a string column to decide whether to prefer dictionary encoding when shuffling the column. If the ratio is higher than this config, dictionary encoding will be used on shuffling string column. This config is effective if it is higher than 1.0. By default, this config is 10.0. Note that this config is only used when `spark.comet.exec.shuffle.mode` is `jvm`. | 10.0 |
| spark.comet.sparkToColumnar.supportedOperatorList | A comma-separated list of operators that will be converted to Comet columnar format when 'spark.comet.sparkToColumnar.enabled' is true | Range,InMemoryTableScan |
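
For context, a minimal sketch (not part of this PR, and assuming Comet is already on the application classpath) of how the configuration entries above might be set on a Spark session:

```scala
// Illustrative only (not part of this PR): how the configuration entries documented
// above might be set. The session setup is generic Spark code; the config keys come
// from the table above.
import org.apache.spark.sql.SparkSession

object CometConfigExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("comet-config-example")
      .master("local[*]")
      // Read Parquet sources through Comet scan.
      .config("spark.comet.scan.enabled", "true")
      // Optional pre-fetching for CometScan, with two pre-fetch threads.
      .config("spark.comet.scan.preFetch.enabled", "true")
      .config("spark.comet.scan.preFetch.threadNum", "2")
      // Prefer dictionary encoding when shuffling string columns whose
      // total/distinct value ratio exceeds 10.0 (JVM shuffle mode only).
      .config("spark.comet.shuffle.preferDictionary.ratio", "10.0")
      .getOrCreate()

    spark.stop()
  }
}
```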
Contributor:

nit: Shall we use ` instead of ' in these config descriptions?

@viirya (Member, Author):

This is not changed by this PR. I think a previous PR changed it but did not update the document.

@viirya (Member, Author):

The document is updated automatically when running `make release` locally.

@viirya viirya merged commit 071c780 into apache:main Aug 12, 2024
75 checks passed
@viirya (Member, Author) commented Aug 12, 2024

Thanks @huaxingao

@viirya viirya deleted the fix_sort branch August 12, 2024 05:48
@@ -2501,6 +2501,13 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde with CometExprShim

    case SortExec(sortOrder, _, child, _)
        if isCometOperatorEnabled(op.conf, CometConf.OPERATOR_SORT) =>
      // TODO: Remove this constraint when we upgrade to new arrow-rs including
      // https://github.com/apache/arrow-rs/pull/6225
      if (child.output.length == 1 && child.output.head.dataType.isInstanceOf[StructType]) {
Member:

As we add support for other types, do we need to update this to make it recursive, so that we also check for Map or Array types containing a struct?

@viirya (Member, Author):

Let me add more data types here according to arrow-rs.
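
A hypothetical sketch of the recursive check discussed above (this is not the code merged in this PR; it only illustrates checking Array and Map types for nested structs, using standard Spark SQL types):

```scala
// Hypothetical sketch, not the merged change: a recursive check that would also
// catch Array and Map types that contain a StructType anywhere in their element
// or key/value types.
import org.apache.spark.sql.types._

object ContainsStructCheck {
  def containsStruct(dt: DataType): Boolean = dt match {
    case _: StructType                  => true
    case ArrayType(elementType, _)      => containsStruct(elementType)
    case MapType(keyType, valueType, _) => containsStruct(keyType) || containsStruct(valueType)
    case _                              => false
  }
}

// Usage mirroring the single-column condition in the diff above:
//   child.output.length == 1 && ContainsStructCheck.containsStruct(child.output.head.dataType)
```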

himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
Labels: None yet

Projects: None yet

Development: Successfully merging this pull request may close these issues: "Fallback to Spark if sort on unsupported cases"

4 participants