apache · viirya · Aug 1, 2024 · Jul 30, 2024
diff --git a/common/src/main/scala/org/apache/comet/CometConf.scala b/common/src/main/scala/org/apache/comet/CometConf.scala
@@ -222,8 +222,8 @@ object CometConf extends ShimCometConf {
     conf("spark.comet.columnar.shuffle.memorySize")
       .doc(
         "The optional maximum size of the memory used for Comet columnar shuffle, in MiB. " +
-          "Note that this config is only used when `spark.comet.columnar.shuffle.enabled` is " +
-          "true. Once allocated memory size reaches this config, the current batch will be " +
+          "Note that this config is only used when `spark.comet.exec.shuffle.mode` is " +
+          "`jvm`. Once allocated memory size reaches this config, the current batch will be " +
           "flushed to disk immediately. If this is not configured, Comet will use " +
           "`spark.comet.shuffle.memory.factor` * `spark.comet.memoryOverhead` as " +
           "shuffle memory size. If final calculated value is larger than Comet memory " +
@@ -259,7 +259,7 @@ object CometConf extends ShimCometConf {
       "prefer dictionary encoding when shuffling the column. If the ratio is higher than " +
       "this config, dictionary encoding will be used on shuffling string column. This config " +
       "is effective if it is higher than 1.0. By default, this config is 10.0. Note that this " +
-      "config is only used when 'spark.comet.columnar.shuffle.enabled' is true.")
+      "config is only used when `spark.comet.exec.shuffle.mode` is `jvm`.")
     .doubleConf
     .createWithDefault(10.0)
 

diff --git a/docs/source/user-guide/configs.md b/docs/source/user-guide/configs.md
@@ -48,4 +48,4 @@ Comet provides the following configuration settings.
 | spark.comet.scan.enabled | Whether to enable Comet scan. When this is turned on, Spark will use Comet to read Parquet data source. Note that to enable native vectorized execution, both this config and 'spark.comet.exec.enabled' need to be enabled. By default, this config is true. | true |
 | spark.comet.scan.preFetch.enabled | Whether to enable pre-fetching feature of CometScan. By default is disabled. | false |
 | spark.comet.scan.preFetch.threadNum | The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. By default it is 2. Note that more pre-fetching threads means more memory requirement to store pre-fetched row groups. | 2 |
-| spark.comet.shuffle.preferDictionary.ratio | The ratio of total values to distinct values in a string column to decide whether to prefer dictionary encoding when shuffling the column. If the ratio is higher than this config, dictionary encoding will be used on shuffling string column. This config is effective if it is higher than 1.0. By default, this config is 10.0. Note that this config is only used when 'spark.comet.columnar.shuffle.enabled' is true. | 10.0 |
+| spark.comet.shuffle.preferDictionary.ratio | The ratio of total values to distinct values in a string column to decide whether to prefer dictionary encoding when shuffling the column. If the ratio is higher than this config, dictionary encoding will be used on shuffling string column. This config is effective if it is higher than 1.0. By default, this config is 10.0. Note that this config is only used when `spark.comet.exec.shuffle.mode` is `jvm`. | 10.0 |
diff --git a/docs/source/user-guide/installation.md b/docs/source/user-guide/installation.md
@@ -150,5 +150,5 @@ Some cluster managers may require additional configuration, see <https://spark.a
 To enable columnar shuffle which supports all partitioning and basic complex types, one more config is required:
 
 ```
---conf spark.comet.columnar.shuffle.enabled=true
+--conf spark.comet.exec.shuffle.mode=jvm
 ```
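
Note for anyone with existing launch scripts: the rename above means an old `--conf` value needs translating. A minimal sketch of that translation follows. `migrate_comet_shuffle_conf` is a hypothetical helper, not part of Comet, and it assumes only what this diff shows, namely that `spark.comet.columnar.shuffle.enabled=true` becomes `spark.comet.exec.shuffle.mode=jvm`. Any other setting is passed through unchanged, since the diff does not say how other values map.

```shell
# Hypothetical helper (not part of Comet): rewrite one "key=value" conf string.
# Only the true -> jvm mapping is taken from this diff; everything else is
# returned unchanged because the diff does not define a mapping for it.
migrate_comet_shuffle_conf() {
  case "$1" in
    spark.comet.columnar.shuffle.enabled=true)
      printf '%s\n' "spark.comet.exec.shuffle.mode=jvm"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}
```

It could then be used as `--conf "$(migrate_comet_shuffle_conf "$OLD_CONF")"`, where `OLD_CONF` is whatever setting the script passed before.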