[BUG] Fix IT discrepancy depending on TEST_PARALLEL #6044

Merged · 3 commits · Aug 16, 2022

Conversation

@res-life (Collaborator) commented Jul 21, 2022

Fixes #5714

Changes:

  • Use the spark.jars configuration instead of spark.executor.extraClassPath and spark.driver.extraClassPath.
    There are two code paths depending on TEST_PARALLEL:
    if ((${#TEST_PARALLEL_OPTS[@]} > 0));
    then
        exec python "${RUN_TESTS_COMMAND[@]}" "${TEST_PARALLEL_OPTS[@]}" "${TEST_COMMON_OPTS[@]}"
    else
        # We set the GPU memory size to be a constant value even if only running with a parallelism of 1
        # because it helps us have consistent test runs.
        exec "$SPARK_HOME"/bin/spark-submit --jars "${ALL_JARS// /,}" \
            --driver-java-options "$PYSP_TEST_spark_driver_extraJavaOptions" \
            $SPARK_SUBMIT_FLAGS \
            --conf 'spark.rapids.memory.gpu.allocSize='"$PYSP_TEST_spark_rapids_memory_gpu_allocSize" \
            "${RUN_TESTS_COMMAND[@]}" "${TEST_COMMON_OPTS[@]}"
    fi

Update the first path to also use spark.jars, which is the same as --jars.

spark.executor.extraClassPath is deprecated; see:
https://spark.apache.org/docs/latest/configuration.html

    spark.executor.extraClassPath:
    Extra classpath entries to prepend to the classpath of executors.
    This exists primarily for backwards-compatibility with older versions of Spark.
    Users typically should not need to set this option.

The spark.jars configuration is also documented at the above link.

  • Update the bash script to test for the jar file's existence before setting JAR_PATH; see the sketch below.
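
A minimal sketch of that existence check, assuming the script's LOCAL_JAR_PATH variable; the glob pattern and fallback below are illustrative, not the exact code in the PR:

# If the glob matches nothing, bash leaves the literal pattern in CANDIDATE,
# so test for a real file before committing to the path.
CANDIDATE=$(echo "$LOCAL_JAR_PATH"/rapids-4-spark-integration-tests*.jar)
if [[ -f "$CANDIDATE" ]]; then
    JAR_PATH="$CANDIDATE"
else
    JAR_PATH=""   # keep empty so a non-existent jar never reaches spark.jars
fi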

Signed-off-by: Chong Gao <[email protected]>

@res-life (Collaborator, Author)

build

@res-life (Collaborator, Author)

Note: this should be re-targeted to 22.10.

@tgravescs marked this pull request as draft July 21, 2022 12:51
@tgravescs (Collaborator)

Put in draft since this seems to be for 22.10.

@gerashegalov (Collaborator) left a comment

Looks good, just minor comments.

@res-life (Collaborator, Author)

The following command yields an invalid path (xxx/spark-avro*.jar) when there are no spark-avro jars; it should return an empty string instead.

AVRO_JARS=$(echo "$LOCAL_JAR_PATH"/spark-avro*.jar)

This will cause the following error if a non-existent jar is put into spark.jars:

22/07/22 04:03:12 ERROR SparkContext: Failed to add /home/non-exist.jar to Spark environment
java.io.FileNotFoundException: Jar /home/non-exist.jar not found
	at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1949)
	at org.apache.spark.SparkContext.addJar(SparkContext.scala:2004)
	at org.apache.spark.SparkContext.$anonfun$new$12(SparkContext.scala:507)
	at org.apache.spark.SparkContext.$anonfun$new$12$adapted(SparkContext.scala:507)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:507)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.lang.Thread.run(Thread.java:748)
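
As an aside, another way to guard against the unmatched-glob problem (not the approach this PR ends up taking) is bash's nullglob option, which makes an unmatched glob expand to nothing:

shopt -s nullglob
avro_candidates=( "$LOCAL_JAR_PATH"/spark-avro*.jar )   # empty array when nothing matches
shopt -u nullglob
if (( ${#avro_candidates[@]} > 0 )); then
    AVRO_JARS="${avro_candidates[*]}"
else
    AVRO_JARS=""
fi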

@gerashegalov (Collaborator)

The following command yields an invalid path (xxx/spark-avro*.jar) when there are no spark-avro jars; it should return an empty string instead.

AVRO_JARS=$(echo "$LOCAL_JAR_PATH"/spark-avro*.jar)

Note that I suggested replacing echo with readlink in #6044 (comment):

$ AVRO_JARS=$(readlink -f /non/existing/path/spark-avro*.jar)
$ echo -n "$AVRO_JARS" | wc -c
0

@res-life (Collaborator, Author) commented Jul 22, 2022

A bare readlink -f can't produce the right answer; see below.
See the latest revision of the code: readlink has now been added to canonicalize the path.

$ ls /home/chongg/local-disk/code/spark-rapids/integration_tests/target
run_dir
# the integration-tests jar is not present

$ ls /home/chongg/local-disk/code/spark-rapids/integration_tests/target/rapids-4-spark-integration-tests*-spark330.jar
ls: cannot access '/home/chongg/local-disk/code/spark-rapids/integration_tests/target/rapids-4-spark-integration-tests*-spark330.jar': No such file or directory
# ls confirms the integration-tests jar does not exist

$ readlink -f /home/chongg/local-disk/code/spark-rapids/integration_tests/target/rapids-4-spark-integration-tests*-spark330.jar
/home/chongg/local-disk/code/spark-rapids/integration_tests/target/rapids-4-spark-integration-tests*-spark330.jar
# but readlink -f still prints this non-existing path, so it can't give the right answer


@gerashegalov (Collaborator)

Re #6044 (comment):
I inadvertently gave you -f, a switch I often use with readlink, which allows the leaf path component to be absent. But there is also -e, which requires all path components to exist.

https://linuxcommand.org/lc3_man_pages/readlink1.html
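
For illustration, a quick transcript of the difference (the /tmp/demo path is hypothetical):

$ mkdir -p /tmp/demo                 # an existing directory containing no jars
$ readlink -f /tmp/demo/missing.jar  # -f tolerates a missing leaf component
/tmp/demo/missing.jar
$ readlink -e /tmp/demo/missing.jar  # -e requires every component to exist
$ echo $?
1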

@res-life (Collaborator, Author)

build

@res-life (Collaborator, Author)

Re #6044 (comment): I inadvertently gave you -f, a switch I often use with readlink, which allows the leaf path component to be absent. But there is also -e, which requires all path components to exist.

https://linuxcommand.org/lc3_man_pages/readlink1.html

Good idea, done.

@gerashegalov (Collaborator) left a comment

LGTM

gerashegalov previously approved these changes Jul 28, 2022

@gerashegalov (Collaborator) left a comment

LGTM.

Please verify that the script works with Iceberg. From talking to @jlowe, I recall it was sensitive to the extraClassPath vs --jars classloader difference:

if [[ "$ICEBERG_SPARK_VER" < "3.3" ]]; then
# Classloader config is here to work around classloader issues with
# --packages in distributed setups, should be fixed by
# https://github.com/NVIDIA/spark-rapids/pull/5646
SPARK_SUBMIT_FLAGS="$BASE_SPARK_SUBMIT_ARGS $SEQ_CONF \
--conf spark.rapids.force.caller.classloader=false \
--packages org.apache.iceberg:iceberg-spark-runtime-${ICEBERG_SPARK_VER}_2.12:${ICEBERG_VERSION} \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
--conf spark.sql.catalog.spark_catalog.type=hadoop \
--conf spark.sql.catalog.spark_catalog.warehouse=/tmp/spark-warehouse-$$" \
./run_pyspark_from_build.sh -m iceberg --iceberg

@res-life changed the base branch from branch-22.08 to branch-22.10 July 29, 2022 05:05
@res-life (Collaborator, Author)

build

gerashegalov previously approved these changes Jul 29, 2022
@res-life (Collaborator, Author)

build

@gerashegalov marked this pull request as ready for review July 29, 2022 08:37
@gerashegalov (Collaborator)

build

@sameerz added the "test (Only impacts tests)" label Jul 29, 2022
Commit: …raClassPath and spark.driver.extraClassPath

Signed-off-by: Chong Gao <[email protected]>
@res-life (Collaborator, Author) commented Aug 1, 2022

build

@res-life marked this pull request as draft August 1, 2022 09:28
@res-life (Collaborator, Author) commented Aug 1, 2022

Investigating the class-not-found issue when running rapids_shuffle_smoke_test:
https://github.com/NVIDIA/spark-rapids/blob/branch-22.08/jenkins/spark-premerge-build.sh#L91

22/08/01 05:25:08 INFO SecurityManager: Changing view acls groups to: 
22/08/01 05:25:08 INFO SecurityManager: Changing modify acls groups to: 
22/08/01 05:25:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/08/01 05:25:08 INFO TransportClientFactory: Successfully created connection to premerge-ci-1-jenkins-rapids-premerge-github-5274-b5jn1-05nrp/10.233.110.96:39717 after 3 ms (0 ms spent in bootstraps)
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1748)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:393)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:382)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: com.nvidia.spark.rapids.spark311.RapidsShuffleManager
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:207)
	at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:275)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:338)
	at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:205)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:442)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	... 4 more

@gerashegalov (Collaborator)

Investigating the class not found issue when running rapids_shuffle_smoke_test https://github.com/NVIDIA/spark-rapids/blob/branch-22.08/jenkins/spark-premerge-build.sh#L91

This confirms my previous finding in #5796 that extraClassPath has been there from the beginning to deal with the bug in Spark Standalone. We can work around it and still remain consistent by inspecting whether spark.shuffle.manager is part of the config
https://github.com/NVIDIA/spark-rapids/blob/branch-22.10/integration_tests/run_pyspark_from_build.sh#L236
to decide whether to use --jars or extraClassPath. Then it is clearly documentable when to use which option.

@res-life (Collaborator, Author) commented Aug 2, 2022

build

@gerashegalov (Collaborator)

build

@res-life blossom-ci is disabled on draft PRs.

@pxLi maybe we could move the check into the yml, because there seems to be a Boolean "draft" field on the pull_request object: https://stackoverflow.com/questions/68349031/only-run-actions-on-non-draft-pull-request

Comment on lines 246 to 249
else
# If specified master, set `spark.executor.extraClassPath` due to issue https://github.com/NVIDIA/spark-rapids/issues/5796
# Remove this line if the issue is fixed
export PYSP_TEST_spark_executor_extraClassPath="${ALL_JARS}"

This is not what I meant in #6044 (comment). Whether to use extraClassPath is not decided based on the master. I just meant we need a similar check; let us undo this change.

We want to inspect whether PYSP_TEST_spark_shuffle_manager is set outside of `if ((NUM_LOCAL_EXECS > 0)); then ... else ... fi`. Please refer to the comment for L202.

Comment on lines 263 to 264
# `spark.jars` is the same as `--jars`, e.g.: --jars a.jar,b.jar...
exec "$SPARK_HOME"/bin/spark-submit --conf spark.jars=${PYSP_TEST_spark_jars} \

If the above is acceptable, here we can do:

Suggested change:

-    # `spark.jars` is the same as `--jars`, e.g.: --jars a.jar,b.jar...
-    exec "$SPARK_HOME"/bin/spark-submit --conf spark.jars=${PYSP_TEST_spark_jars} \
+    if [[ -n "$PYSP_TEST_spark_jars" ]]; then
+        jarOpts=(--conf spark.jars="$PYSP_TEST_spark_jars")
+    elif [[ -n "$PYSP_TEST_spark_driver_extraClassPath" ]]; then
+        jarOpts=(--driver-class-path "$PYSP_TEST_spark_driver_extraClassPath")
+    fi
+    # `spark.jars` is the same as `--jars`, e.g.: --jars a.jar,b.jar...
+    exec "$SPARK_HOME"/bin/spark-submit "${jarOpts[@]}" \

Comment on lines 198 to 202
export PYSP_TEST_spark_driver_extraClassPath="${ALL_JARS// /:}"
export PYSP_TEST_spark_executor_extraClassPath="${ALL_JARS// /:}"
export PYSP_TEST_spark_jars="${ALL_JARS}"

Here we can have something to the tune of:

    if [[ "${PYSP_TEST_spark_shuffle_manager}" =~ "RapidsShuffleManager" ]]; then
        export PYSP_TEST_spark_driver_extraClassPath="${ALL_JARS// /:}"
        export PYSP_TEST_spark_executor_extraClassPath="${ALL_JARS// /:}"
    else
        export PYSP_TEST_spark_jars="${ALL_JARS}"
    fi
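
(The ClassNotFoundException earlier in this thread suggests why this distinction matters: the executor's SparkEnv instantiates the spark.shuffle.manager class during startup, before jars shipped via --jars have been fetched onto the executor classpath, so a RapidsShuffleManager run still needs extraClassPath.)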

@pxLi (Member) commented Aug 3, 2022

build

@res-life blossom-ci is disabled on draft PRs.

@pxLi maybe we could move the check into the yml, because there seems to be a Boolean "draft" field on the pull_request object: https://stackoverflow.com/questions/68349031/only-run-actions-on-non-draft-pull-request

blossom-ci should work on draft PRs. This one actually timed out due to the above issues.

@gerashegalov (Collaborator)

blossom-ci should work on draft PRs. This one actually timed out due to the above issues.

Thanks @pxLi , good to know

@res-life marked this pull request as ready for review August 11, 2022 05:41
@res-life (Collaborator, Author)

build

@gerashegalov (Collaborator) left a comment

LGTM

Labels: test (Only impacts tests)
Closes: [BUG] discrepancy in the plugin jar deployment in run_pyspark_from_build.sh depending on TEST_PARALLEL (#5714)
5 participants