SparkUCX new protocol. (#8)
Signed-off-by: Peter Rudenko <[email protected]>
petro-rudenko authored Feb 11, 2022
1 parent 0ca7081 commit 1b67e97
Showing 51 changed files with 1,898 additions and 2,592 deletions.
27 changes: 9 additions & 18 deletions README.md
@@ -3,39 +3,37 @@ UCX for Apache Spark is a high performance ShuffleManager plugin for Apache Spar
that are supported by [UCX](https://github.com/openucx/ucx#supported-transports), to perform Shuffle data transfers in Spark jobs.

## Runtime requirements
* Apache Spark 2.3/2.4/3.0
* Apache Spark 2.4/3.0
* Java 8+
* Installed UCX of version 1.10+, and [UCX supported transport hardware](https://github.com/openucx/ucx#supported-transports).
* Installed UCX of version 1.12+, and [UCX supported transport hardware](https://github.com/openucx/ucx#supported-transports).

## Installation

### Obtain UCX for Apache Spark
Please use the ["Releases"](https://github.com/NVIDIA/sparkucx/releases) page to download SparkUCX jar file
for your spark version (e.g. ucx-spark-1.0-for-spark-2.4.0-jar-with-dependencies.jar).
for your spark version (e.g. ucx-spark-1.1-for-spark-2.4.0-jar-with-dependencies.jar).
Put ucx-spark jar file in $SPARK_UCX_HOME on all the nodes in your cluster.
<br>If you would like to build the project yourself, please refer to the ["Build"](https://github.com/NVIDIA/sparkucx#build) section below.

Ucx binaries **must** be in Spark classpath on every Spark Master and Worker.
Ucx binaries **must** be in Spark classpath on every Spark Worker.
It can be obtained by installing the latest version from [Ucx release page](https://github.com/openucx/ucx/releases)

### Configuration

Provide Spark the location of the SparkUCX plugin jars and ucx shared binaries by using the extraClassPath option.

```
spark.driver.extraClassPath $SPARK_UCX_HOME/spark-ucx-1.0-for-spark-2.4.0-jar-with-dependencies.jar:$UCX_PREFIX/lib
spark.driver.extraClassPath $SPARK_UCX_HOME/spark-ucx-1.0-for-spark-2.4.0-jar-with-dependencies.jar
spark.executor.extraClassPath $SPARK_UCX_HOME/spark-ucx-1.0-for-spark-2.4.0-jar-with-dependencies.jar:$UCX_PREFIX/lib
```
To enable the UCX for Apache Spark Shuffle Manager plugin, add the following configuration property
to spark (e.g. in $SPARK_HOME/conf/spark-defaults.conf):

```
spark.shuffle.manager org.apache.spark.shuffle.UcxShuffleManager
spark.executorEnv.UCX_ERROR_SIGNALS ""
```
For spark-3.0 version add SparkUCX ShuffleIO plugin:
```
spark.shuffle.sort.io.plugin.class org.apache.spark.shuffle.compat.spark_3_0.UcxLocalDiskShuffleDataIO
```
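
The configuration properties in this hunk can be collected into one file. The following is a minimal sketch, assuming the later of each duplicated line is the post-change version (the page shows no +/- markers): the driver classpath carries only the plugin jar, while the executor classpath also carries the UCX libraries. The default paths for `SPARK_UCX_HOME` and `UCX_PREFIX`, the jar file name, and the `CONF_FILE` variable are placeholders for illustration, not values taken from the commit.

```shell
# Hypothetical locations; override them to match your cluster.
SPARK_UCX_HOME=${SPARK_UCX_HOME:-/opt/sparkucx}
UCX_PREFIX=${UCX_PREFIX:-/opt/ucx}
# Assumed jar name; use the file you actually downloaded or built.
SPARK_UCX_JAR=${SPARK_UCX_JAR:-$SPARK_UCX_HOME/ucx-spark-1.1-for-spark-2.4-jar-with-dependencies.jar}
# Written to a temporary file by default so nothing real is overwritten.
CONF_FILE=${CONF_FILE:-$(mktemp)}

# Driver gets the plugin jar only; executors also get the UCX shared libraries.
cat > "$CONF_FILE" <<EOF
spark.driver.extraClassPath $SPARK_UCX_JAR
spark.executor.extraClassPath $SPARK_UCX_JAR:$UCX_PREFIX/lib
spark.shuffle.manager org.apache.spark.shuffle.UcxShuffleManager
spark.executorEnv.UCX_ERROR_SIGNALS ""
EOF

cat "$CONF_FILE"
```

To apply it for real, point `CONF_FILE` at `$SPARK_HOME/conf/spark-defaults.conf` after reviewing the generated lines.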


### Build

@@ -44,15 +42,8 @@ Building the SparkUCX plugin requires [Apache Maven](http://maven.apache.org/) a
Build instructions:

```
% git clone https://github.com/openucx/sparkucx
% git clone https://github.com/nvidia/sparkucx
% cd sparkucx
% mvn -DskipTests clean package -Pspark-2.4
% mvn -DskipTests clean package -Pspark-3.0
```
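
The name of the jar produced by the build can be predicted from the `finalName` pattern in `pom.xml` (`${project.artifactId}-${project.version}-for-${project.activeProfiles[0].id}`). The sketch below assumes the `-jar-with-dependencies.jar` suffix seen in the CI configuration; `ucx_spark_jar_name` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: build the expected jar file name from the pom.xml
# version and the Maven profile id passed with -P.
ucx_spark_jar_name() {
    local version="$1"   # the <version> in pom.xml, e.g. 1.1
    local profile="$2"   # the Maven profile id, e.g. spark-3.0
    printf 'ucx-spark-%s-for-%s-jar-with-dependencies.jar\n' "$version" "$profile"
}

# Example: the jar expected under target/ after building with -Pspark-3.0.
ucx_spark_jar_name 1.1 spark-3.0
```

This matches the `SPARK_UCX_JAR` path exported in `buildlib/azure-pipelines.yml`, where the profile comes from `$(profile_version)`.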

### Performance

UCX for Apache Spark plugin is built to provide the best performance out-of-the-box, and provides multiple configuration options to further tune UCX for Apache Spark per-job.
For more information on how to setup [HiBench](https://github.com/Intel-bigdata/HiBench) benchmark and reproduce results, please refer to [Accelerated Apache SparkUCX 2.4/3.0 cluster deployment](https://docs.mellanox.com/pages/releaseview.action?pageId=19819236).

![Performance results](https://docs.mellanox.com/download/attachments/19819236/image2020-1-23_15-39-14.png)

6 changes: 2 additions & 4 deletions buildlib/azure-pipelines.yml
@@ -37,19 +37,17 @@ stages:
mavenSetM2Home: true
publishJUnitResults: false
goals: "clean package"
options: "-B -Dmaven.repo.local=$(System.DefaultWorkingDirectory)/target/.deps -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn -Pspark-$(profile_version)"
options: "-B -Dmaven.repo.local=$(System.DefaultWorkingDirectory)/target/.deps -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn -Pspark-$(profile_version)"
- bash: |
set -xeE
module load dev/jdk-1.8 tools/spark-$(spark_version)
source buildlib/test.sh
if [[ $(get_rdma_device_iface) != "" ]]
then
export UCX_BRANCH=v1.12.x
export SPARK_UCX_JAR=$(System.DefaultWorkingDirectory)/target/ucx-spark-1.0-for-spark-$(profile_version)-jar-with-dependencies.jar
export SPARK_UCX_JAR=$(System.DefaultWorkingDirectory)/target/ucx-spark-1.1-for-spark-$(profile_version)-jar-with-dependencies.jar
export SPARK_LOCAL_DIRS=$(System.DefaultWorkingDirectory)/target/spark
export SPARK_VERSION=$(spark_version)
export UCX_LIB=$HPCX_UCX_DIR/lib
cd $(System.DefaultWorkingDirectory)/target/
run_tests
else
13 changes: 7 additions & 6 deletions buildlib/test.sh
@@ -26,7 +26,7 @@ NODELIST=${NODELIST:="localhost"}

UCX_LIB=${UCX_LIB:=${LD_LIBRARY_PATH}}

SPARK_UCX_JAR=${SPARK_UCX_JAR:=$PWD/spark-ucx-1.0-for-spark-2.4-jar-with-dependencies.jar}
SPARK_UCX_JAR=${SPARK_UCX_JAR:=$PWD/spark-ucx-1.1-for-spark-2.4-jar-with-dependencies.jar}

PROCESSES_PER_INSTANCE=${PROCESSES_PER_INSTANCE:=2}

@@ -106,7 +106,7 @@ build_ucx() {
git clone -b ${UCX_BRANCH} --depth=1 https://github.com/openucx/ucx.git && cd ucx
./autogen.sh
mkdir build && cd build
../contrib/configure-release-mt --with-java --prefix=$PWD
../contrib/configure-release-mt --prefix=$PWD
make -j `nproc`
make install
UCX_LIB=$PWD/lib/
@@ -115,15 +115,16 @@ build_ucx() {
setup_configuration() {
mkdir -p ${SPARK_CONF_DIR}

echo ${NODELIST} | tr -s ' ' '\n' >> "${SPARK_CONF_DIR}/slaves"
echo ${NODELIST} | tr -s ' ' '\n' > "${SPARK_CONF_DIR}/slaves"

cat <<-EOF > ${SPARK_CONF_DIR}/spark-defaults.conf
spark.shuffle.manager org.apache.spark.shuffle.UcxShuffleManager
spark.shuffle.sort.io.plugin.class org.apache.spark.shuffle.compat.spark_3_0.UcxLocalDiskShuffleDataIO
spark.shuffle.readHostLocalDisk.enabled false
spark.driver.extraClassPath ${SPARK_UCX_JAR}:${UCX_LIB}
spark.executor.extraClassPath ${SPARK_UCX_JAR}:${UCX_LIB}
spark.driver.extraClassPath ${SPARK_UCX_JAR}
spark.executor.extraClassPath ${SPARK_UCX_JAR}:${UCX_LIB}:${UCX_LIB}/ucx
spark.shuffle.ucx.driver.port $(( ${SPARK_MASTER_PORT} + 1 ))
spark.executorEnv.UCX_ERROR_SIGNALS ""
spark.executorEnv.UCX_LOG_LEVEL trace
EOF

cat <<-EOF > ${SPARK_CONF_DIR}/spark-env.sh
64 changes: 1 addition & 63 deletions pom.xml
@@ -11,7 +11,7 @@ See file LICENSE for terms.
<modelVersion>4.0.0</modelVersion>
<groupId>org.openucx</groupId>
<artifactId>ucx-spark</artifactId>
<version>1.0</version>
<version>1.1</version>
<name>${project.artifactId}</name>
<description>
A high-performance, scalable and efficient shuffle manager plugin for Apache Spark,
@@ -34,53 +34,10 @@ See file LICENSE for terms.
</properties>

<profiles>
<profile>
<id>spark-2.1</id>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<excludes>
<exclude>**/spark_3_0/**</exclude>
<exclude>**/spark_2_4/**</exclude>
</excludes>
</configuration>
</plugin>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<configuration>
<excludes>
<exclude>**/spark_3_0/**</exclude>
<exclude>**/spark_2_4/**</exclude>
</excludes>
</configuration>
</plugin>
</plugins>
</build>
<properties>
<spark.version>2.1.0</spark.version>
<sonar.exclusions>**/spark_3_0/**, **/spark_2_4/**</sonar.exclusions>
<scala.version>2.11.12</scala.version>
<scala.compat.version>2.11</scala.compat.version>
</properties>
</profile>
<profile>
<id>spark-2.4</id>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<excludes>
<exclude>**/spark_3_0/**</exclude>
<exclude>**/spark_2_1/**</exclude>
</excludes>
</configuration>
</plugin>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
@@ -107,16 +64,6 @@ See file LICENSE for terms.
</activation>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<excludes>
<exclude>**/spark_2_1/**</exclude>
<exclude>**/spark_2_4/**</exclude>
</excludes>
</configuration>
</plugin>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
@@ -156,15 +103,6 @@ See file LICENSE for terms.
<build>
<finalName>${project.artifactId}-${project.version}-for-${project.activeProfiles[0].id}</finalName>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>