fix: Sort on single struct should fallback to Spark #811
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##              main     #811      +/-  ##
============================================
- Coverage    33.94%   33.80%   -0.14%
+ Complexity     874      870       -4
============================================
  Files          112      112
  Lines        42916    42914       -2
  Branches      9464     9452      -12
============================================
- Hits         14567    14507      -60
- Misses       25379    25428      +49
- Partials      2970     2979       +9
```

View full report in Codecov by Sentry.
| Config | Description | Default Value |
|--------|-------------|---------------|
| spark.comet.scan.enabled | Whether to enable Comet scan. When this is turned on, Spark will use Comet to read Parquet data source. Note that to enable native vectorized execution, both this config and 'spark.comet.exec.enabled' need to be enabled. By default, this config is true. | true |
| spark.comet.scan.preFetch.enabled | Whether to enable pre-fetching feature of CometScan. By default is disabled. | false |
| spark.comet.scan.preFetch.threadNum | The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. By default it is 2. Note that more pre-fetching threads means more memory requirement to store pre-fetched row groups. | 2 |
| spark.comet.shuffle.preferDictionary.ratio | The ratio of total values to distinct values in a string column to decide whether to prefer dictionary encoding when shuffling the column. If the ratio is higher than this config, dictionary encoding will be used on shuffling string column. This config is effective if it is higher than 1.0. By default, this config is 10.0. Note that this config is only used when `spark.comet.exec.shuffle.mode` is `jvm`. | 10.0 |
| spark.comet.sparkToColumnar.supportedOperatorList | A comma-separated list of operators that will be converted to Comet columnar format when 'spark.comet.sparkToColumnar.enabled' is true | Range,InMemoryTableScan |
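As a rough sketch of how the configs in the table above might be applied (assuming a Spark build with the Comet plugin on the classpath; the session setup itself is standard Spark, only the `spark.comet.*` keys come from the table):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: enable Comet scan plus pre-fetching via session config.
// Both scan and exec flags are needed for native vectorized execution,
// per the description of spark.comet.scan.enabled above.
val spark = SparkSession
  .builder()
  .appName("comet-config-sketch")
  .config("spark.comet.scan.enabled", "true")
  .config("spark.comet.exec.enabled", "true")
  .config("spark.comet.scan.preFetch.enabled", "true")
  .config("spark.comet.scan.preFetch.threadNum", "2")
  .getOrCreate()
```

The same keys can equally be passed as `--conf` flags to `spark-submit`.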
nit: Shall we use ` instead of '
This is not changed by this PR. I think a previous PR changed it but didn't update the document.
The document is updated automatically when running `make release` locally.
Thanks @huaxingao
```scala
@@ -2501,6 +2501,13 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde with CometExprShim

    case SortExec(sortOrder, _, child, _)
        if isCometOperatorEnabled(op.conf, CometConf.OPERATOR_SORT) =>
      // TODO: Remove this constraint when we upgrade to new arrow-rs including
      // https://github.com/apache/arrow-rs/pull/6225
      if (child.output.length == 1 && child.output.head.dataType.isInstanceOf[StructType]) {
```
As we add support for other types, do we need to update this to make it recursive so that we check for Map or Array containing struct?
Let me add more data types here according to arrow-rs.
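The recursive check the reviewer suggests could be sketched like this (a hypothetical helper, not code from this PR; `containsStruct` is an assumed name, and the types come from Spark's `org.apache.spark.sql.types`):

```scala
import org.apache.spark.sql.types._

// Sketch: recursively check whether a DataType contains a StructType
// anywhere in its nesting, so a fallback guard would also catch arrays
// and maps whose elements, keys, or values are structs.
def containsStruct(dt: DataType): Boolean = dt match {
  case _: StructType => true
  case ArrayType(elementType, _) => containsStruct(elementType)
  case MapType(keyType, valueType, _) =>
    containsStruct(keyType) || containsStruct(valueType)
  case _ => false
}
```

With such a helper, the guard in the hunk above could test `containsStruct(child.output.head.dataType)` instead of the direct `isInstanceOf[StructType]` check.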
(cherry picked from commit 071c780)
Which issue does this PR close?
Closes #807.
Rationale for this change
What changes are included in this PR?
How are these changes tested?