[BUG] Fix IT discrepancy depending on TEST_PARALLEL #6044
Conversation
build

Note: should re-target 22.10

Put in draft since this seems intended for 22.10.
looks good, just minor comments
The following command gets an invalid path if the jar does not exist; this will cause an error. Note that I suggested replacing:

```shell
$ AVRO_JARS=$(readlink -f /non/existing/path/spark-avro*.jar)
$ echo -n "$AVRO_JARS" | wc -c
0
```
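Since `readlink -f` on a path whose parent directories do not exist prints nothing, a guard can fail fast before the empty result is used. This is only a sketch of that idea; `resolve_jar` and the paths are hypothetical, not names from the actual script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve a jar path only when the glob matches a real file.
resolve_jar() {
  local pattern="$1"
  # compgen -G succeeds only if the glob expands to at least one existing path
  if compgen -G "$pattern" > /dev/null; then
    readlink -f $pattern
  else
    echo "ERROR: no jar matches $pattern" >&2
    return 1
  fi
}

AVRO_JARS=$(resolve_jar "/non/existing/path/spark-avro*.jar") || AVRO_JARS=""
# AVRO_JARS stays empty instead of silently holding an invalid path
echo -n "$AVRO_JARS" | wc -c
```

With this, downstream code can test `-z "$AVRO_JARS"` instead of passing a bogus path to spark-submit.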
A single
re #6044 (comment)

build

Good idea, done.
LGTM
LGTM,
please verify that the script works with Iceberg. Talking to @jlowe, I recall it was sensitive to extraClassPath vs --jars classloading.
spark-rapids/jenkins/spark-tests.sh
Lines 181 to 192 in e0a7cd2
```shell
if [[ "$ICEBERG_SPARK_VER" < "3.3" ]]; then
  # Classloader config is here to work around classloader issues with
  # --packages in distributed setups, should be fixed by
  # https://github.com/NVIDIA/spark-rapids/pull/5646
  SPARK_SUBMIT_FLAGS="$BASE_SPARK_SUBMIT_ARGS $SEQ_CONF \
    --conf spark.rapids.force.caller.classloader=false \
    --packages org.apache.iceberg:iceberg-spark-runtime-${ICEBERG_SPARK_VER}_2.12:${ICEBERG_VERSION} \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
    --conf spark.sql.catalog.spark_catalog.type=hadoop \
    --conf spark.sql.catalog.spark_catalog.warehouse=/tmp/spark-warehouse-$$" \
    ./run_pyspark_from_build.sh -m iceberg --iceberg
```
build

build

build
…raClassPath and spark.driver.extraClassPath Signed-off-by: Chong Gao <[email protected]>
Compare b1612c4 to 422a074
build

Investigating the

This confirms my previous finding #5796 that extraClassPath has been there from the beginning to deal with the bug in Spark Standalone. We can work around it and still remain consistent by inspecting whether

build
@res-life blossom-ci is disabled on draft PRs. @pxLi maybe we could move the check to the yml, because there seems to be a Boolean "draft" field on the pull_request object: https://stackoverflow.com/questions/68349031/only-run-actions-on-non-draft-pull-request
```shell
else
  # If master is specified, set `spark.executor.extraClassPath` due to issue https://github.com/NVIDIA/spark-rapids/issues/5796
  # Remove this line if the issue is fixed
  export PYSP_TEST_spark_executor_extraClassPath="${ALL_JARS}"
```
This is not what I meant in #6044 (comment). Whether to use extraClassPath is not decided based on the master; I just meant we need a similar check. Let us undo this change.
We want to inspect whether `PYSP_TEST_spark_shuffle_manager` is set outside `if ((NUM_LOCAL_EXECS > 0)); then ... else ... fi`. Please refer to the comment for L202.
```shell
# `spark.jars` is the same as `--jars`, e.g.: --jars a.jar,b.jar...
exec "$SPARK_HOME"/bin/spark-submit --conf spark.jars=${PYSP_TEST_spark_jars} \
```
If the above is acceptable, here we can replace

```shell
# `spark.jars` is the same as `--jars`, e.g.: --jars a.jar,b.jar...
exec "$SPARK_HOME"/bin/spark-submit --conf spark.jars=${PYSP_TEST_spark_jars} \
```

with

```shell
if [[ -n "$PYSP_TEST_spark_jars" ]]; then
  jarOpts=(--conf spark.jars="$PYSP_TEST_spark_jars")
elif [[ -n "$PYSP_TEST_spark_driver_extraClassPath" ]]; then
  jarOpts=(--driver-class-path "$PYSP_TEST_spark_driver_extraClassPath")
fi
# `spark.jars` is the same as `--jars`, e.g.: --jars a.jar,b.jar...
exec "$SPARK_HOME"/bin/spark-submit "${jarOpts[@]}" \
```
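The proposed precedence (use `spark.jars` when set, otherwise fall back to `--driver-class-path`) can be exercised without Spark at all. This is a minimal sketch where `echo` stands in for the real `spark-submit` invocation and the jar names are made up:

```shell
#!/usr/bin/env bash
# Sketch of the suggested jarOpts selection logic, runnable standalone.
build_jar_opts() {
  jarOpts=()
  if [[ -n "$PYSP_TEST_spark_jars" ]]; then
    jarOpts=(--conf spark.jars="$PYSP_TEST_spark_jars")
  elif [[ -n "$PYSP_TEST_spark_driver_extraClassPath" ]]; then
    jarOpts=(--driver-class-path "$PYSP_TEST_spark_driver_extraClassPath")
  fi
}

# Case 1: spark.jars wins when both are set
PYSP_TEST_spark_jars="a.jar,b.jar"
PYSP_TEST_spark_driver_extraClassPath="a.jar:b.jar"
build_jar_opts
echo "${jarOpts[@]}"   # --conf spark.jars=a.jar,b.jar

# Case 2: fall back to the driver classpath when spark.jars is empty
PYSP_TEST_spark_jars=""
build_jar_opts
echo "${jarOpts[@]}"   # --driver-class-path a.jar:b.jar
```

Using an array for `jarOpts` keeps word splitting safe when the options are later spliced into the `spark-submit` command line.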
```shell
export PYSP_TEST_spark_driver_extraClassPath="${ALL_JARS// /:}"
export PYSP_TEST_spark_executor_extraClassPath="${ALL_JARS// /:}"
export PYSP_TEST_spark_jars="${ALL_JARS}"
```
here we can have something to the tune of

```shell
if [[ "${PYSP_TEST_spark_shuffle_manager}" =~ "RapidsShuffleManager" ]]; then
  export PYSP_TEST_spark_driver_extraClassPath="${ALL_JARS// /:}"
  export PYSP_TEST_spark_executor_extraClassPath="${ALL_JARS// /:}"
else
  export PYSP_TEST_spark_jars="${ALL_JARS}"
fi
```
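A quick standalone check of this branch, with made-up jar names and shuffle-manager class: the `${ALL_JARS// /:}` expansion rewrites the space-separated jar list as a colon-separated classpath for the `extraClassPath` case, while the `else` branch keeps the list as-is for `spark.jars`.

```shell
#!/usr/bin/env bash
# Standalone sketch of the suggested branch; the values are illustrative only.
ALL_JARS="rapids.jar cudf.jar"
PYSP_TEST_spark_shuffle_manager="com.nvidia.spark.rapids.RapidsShuffleManager"

if [[ "${PYSP_TEST_spark_shuffle_manager}" =~ "RapidsShuffleManager" ]]; then
  # A quoted right-hand side of =~ is matched literally, so this is a
  # substring test for "RapidsShuffleManager".
  export PYSP_TEST_spark_driver_extraClassPath="${ALL_JARS// /:}"
  export PYSP_TEST_spark_executor_extraClassPath="${ALL_JARS// /:}"
else
  export PYSP_TEST_spark_jars="${ALL_JARS}"
fi

echo "$PYSP_TEST_spark_driver_extraClassPath"   # rapids.jar:cudf.jar
```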
blossom-ci should work on draft PRs. This one actually timed out due to the above issues.

Thanks @pxLi, good to know.

build
LGTM
Fixes #5714

Changes:
- Use the `spark.jars` configuration instead of `spark.executor.extraClassPath` and `spark.driver.extraClassPath`. There are 2 paths depending on `TEST_PARALLEL`; update the first path to also use `spark.jars`, which is the same as `--jars`.
- `spark.executor.extraClassPath` is deprecated, see: https://spark.apache.org/docs/latest/configuration.html. We can also find the `spark.jars` configuration at the above link.
- Update the bash script to test jar file existence before setting JAR_PATH.

Signed-off-by: Chong Gao [email protected]