# Speed up build: unnecessary invalidation in the incremental recompile mode [databricks] #9698

Merged: 30 commits (from `speedupBuildInfo` into `branch-23.12`) on Nov 17, 2023.

## Commits (30)
- `7f7d4cb` Consolidate delta-lake poms (gerashegalov, Nov 10, 2023)
- `91e718a` skip build-info if revision unchanged (gerashegalov, Nov 11, 2023)
- `afb81b2` scala2.13 (gerashegalov, Nov 11, 2023)
- `da86cda` compile output as dependency (gerashegalov, Nov 11, 2023)
- `9684214` scala2.13 common (gerashegalov, Nov 11, 2023)
- `f6dd0e5` revert to 4.3.0 (gerashegalov, Nov 11, 2023)
- `3d0fe74` scala2.13 (gerashegalov, Nov 11, 2023)
- `fd8bbd9` downgrade scala-maven-plugin (gerashegalov, Nov 12, 2023)
- `f79d88b` incremental compile (gerashegalov, Nov 12, 2023)
- `868e36b` incremental end-to-end (gerashegalov, Nov 13, 2023)
- `ca299c8` may achieve with a single aggregate module (gerashegalov, Nov 13, 2023)
- `2a35697` remove aggrgator-tmp (gerashegalov, Nov 14, 2023)
- `74da55a` Merge remote-tracking branch 'origin/branch-23.12' into speedupBuildInfo (gerashegalov, Nov 14, 2023)
- `8ea3f76` delete diff (gerashegalov, Nov 14, 2023)
- `6157a5b` Revert delta-lake changes (gerashegalov, Nov 14, 2023)
- `baff166` scala2.13 (gerashegalov, Nov 14, 2023)
- `56f6ecd` revert unneeded changes (gerashegalov, Nov 14, 2023)
- `9cc11f2` change execution id (gerashegalov, Nov 14, 2023)
- `7ea535a` Replace `skip` with `maven.scaladoc.skip` (gerashegalov, Nov 14, 2023)
- `c3d9740` robuster jar swapping logic (gerashegalov, Nov 14, 2023)
- `b4ec26e` scala2.13 sync (gerashegalov, Nov 14, 2023)
- `5e4764a` Merge remote-tracking branch 'origin/branch-23.12' into speedupBuildInfo (gerashegalov, Nov 14, 2023)
- `d92fc17` Initialize dirs (gerashegalov, Nov 15, 2023)
- `3f144ad` Fix equals check, regen scala2.13 (gerashegalov, Nov 15, 2023)
- `6af04b5` use copy instead of move (gerashegalov, Nov 15, 2023)
- `db14701` undo compiler plugin change (gerashegalov, Nov 15, 2023)
- `f2051e3` rapids.jni.unpack.skip (gerashegalov, Nov 15, 2023)
- `e5aecbc` Apply suggestions from code review (gerashegalov, Nov 15, 2023)
- `5efd1c2` reviews (gerashegalov, Nov 15, 2023)
- `24ca3c7` Fixes (gerashegalov, Nov 16, 2023)
## Files changed
### .github/workflows/mvn-verify-check.yml (1 addition, 1 deletion)

```diff
@@ -27,7 +27,7 @@ env:
   COMMON_MVN_FLAGS: >-
     -Ddist.jar.compress=false
     -DskipTests
-    -Dskip
+    -Dmaven.scaladoc.skip
 
 jobs:
   get-shim-versions-from-dist:
```
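Commit `7ea535a` renames the bare `-Dskip` property to the more descriptive `maven.scaladoc.skip` everywhere it is used, presumably to avoid a property name so generic that it is easy to misread or collide with. A hedged example of the resulting local invocation, reusing the common flags from the workflow above:

```bash
# Run verify with the CI's common flags: uncompressed dist jar,
# no tests, and scaladoc generation skipped
mvn verify -Ddist.jar.compress=false -DskipTests -Dmaven.scaladoc.skip
```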
### CONTRIBUTING.md (9 additions, 3 deletions)

````diff
@@ -218,19 +218,25 @@ for a single Spark version Shim alone.
 To this end in a pre-production build you can set the Boolean property
 `dist.jar.compress` to `false`, its default value is `true`.
 
+Furthermore, after the first build execution on the clean repository, the spark-rapids-jni
+SNAPSHOT dependency typically does not change until the next nightly CI build, or until the next
+install to the local Maven repo if you are working on a change to the native code. So you can save
+the significant time otherwise spent repeatedly unpacking these dependencies by adding
+`-Drapids.jni.unpack.skip` to the `dist` build command.
+
 The time saved is more significant if you are merely changing
 the `aggregator` module, or the `dist` module, or just incorporating changes from
 [spark-rapids-jni](https://github.com/NVIDIA/spark-rapids-jni/blob/branch-23.04/CONTRIBUTING.md#local-testing-of-cross-repo-contributions-cudf-spark-rapids-jni-and-spark-rapids)
 
 For example, to quickly repackage `rapids-4-spark` after the
 initial `./build/buildall` you can iterate by invoking
 ```Bash
-mvn package -pl dist -PnoSnapshots -Ddist.jar.compress=false
+mvn package -pl dist -PnoSnapshots -Ddist.jar.compress=false -Drapids.jni.unpack.skip
 ```
 
 or similarly
 ```Bash
-./build/buildall --rebuild-dist-only --option="-Ddist.jar.compress=false"
+./build/buildall --rebuild-dist-only --option="-Ddist.jar.compress=false -Drapids.jni.unpack.skip"
 ```
 
 ## Code contributions
@@ -282,7 +288,7 @@ Before proceeding with importing spark-rapids into IDEA or switching to a differ
 profile, execute the install phase with the corresponding `buildver`, e.g. for Spark 3.4.0:
 
 ```bash
-mvn clean install -Dbuildver=340 -Dskip -DskipTests
+mvn clean install -Dbuildver=340 -Dmaven.scaladoc.skip -DskipTests
 ```
 
 ##### Importing the project
````
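Per the new CONTRIBUTING text, the spark-rapids-jni SNAPSHOT rarely changes between nightly CI builds, so repeated unpacking is pure overhead in a local edit-compile loop. A quick, machine-dependent way to observe the saving:

```bash
# First iteration unpacks the spark-rapids-jni SNAPSHOT into dist/target as usual
time mvn package -pl dist -PnoSnapshots -Ddist.jar.compress=false

# Later iterations can reuse the previously unpacked contents
time mvn package -pl dist -PnoSnapshots -Ddist.jar.compress=false -Drapids.jni.unpack.skip
```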
### aggregator/pom.xml (70 additions, 2 deletions)

```diff
@@ -39,6 +39,10 @@
         <rapids.shade.package>com.nvidia.shaded.spark</rapids.shade.package>
         <rapids.compressed.artifact>false</rapids.compressed.artifact>
         <rapids.shim.jar.test.phase>none</rapids.shim.jar.test.phase>
+        <rapids.default.jar.excludePattern>**/*</rapids.default.jar.excludePattern>
+        <rapids.default.jar.phase>initialize</rapids.default.jar.phase>
+        <!-- let Maven register the attached artifact, which we later replace -->
+        <rapids.shim.jar.phase>initialize</rapids.shim.jar.phase>
     </properties>
     <dependencies>
         <dependency>
@@ -73,7 +77,6 @@
                 <artifactId>maven-shade-plugin</artifactId>
                 <configuration>
                     <shadedArtifactAttached>true</shadedArtifactAttached>
-                    <shadedClassifierName>${spark.version.classifier}</shadedClassifierName>
                     <artifactSet>
                         <excludes>
                             <exclude>org.slf4j:*</exclude>
@@ -108,13 +111,78 @@
                 <executions>
                     <execution>
                         <id>main-${spark.version.classifier}</id>
-                        <phase>package</phase>
+                        <phase>compile</phase>
                         <goals>
                             <goal>shade</goal>
                         </goals>
                     </execution>
                 </executions>
             </plugin>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-antrun-plugin</artifactId>
+                <executions>
+                    <execution>
+                        <id>init-dirs</id>
+                        <phase>initialize</phase>
+                        <goals><goal>run</goal></goals>
+                        <configuration>
+                            <target>
+                                <mkdir dir="${project.build.outputDirectory}"/>
+                            </target>
+                        </configuration>
+                    </execution>
+                    <execution>
+                        <id>generate-build-info</id>
+                        <phase>none</phase>
+                    </execution>
+                    <execution>
+                        <id>create-aggregator-for-downstream-if-content-changed</id>
+                        <goals><goal>run</goal></goals>
+                        <phase>process-classes</phase>
+                        <configuration>
+                            <target>
+                                <taskdef resource="net/sf/antcontrib/antcontrib.properties"/>
+                                <property name="realAggJar"
+                                          location="${project.build.outputDirectory}/../${project.build.finalName}-shaded.jar"/>
+                                <property name="aggJarForDownstream"
+                                          location="${project.build.outputDirectory}/../${project.build.finalName}-${spark.version.classifier}.jar"/>
+                                <property name="newClassesDir" location="${project.build.outputDirectory}/../new-classes"/>
+                                <property name="oldClassesDir" location="${project.build.outputDirectory}/../old-classes"/>
+                                <echo>Checking if need to recreate: ${aggJarForDownstream}</echo>
+                                <!-- using diff instead of checksum to deal with the expected zip metadata diff -->
+
+                                <!-- make sure we start with a clean new dir -->
+                                <mkdir dir="${newClassesDir}"/>
+                                <delete dir="${newClassesDir}"/>
+                                <unzip src="${realAggJar}" dest="${newClassesDir}"/>
+                                <mkdir dir="${oldClassesDir}"/>
+
+                                <exec executable="diff"
+                                      resultproperty="diff.result">
+                                    <arg value="-q"/>
+                                    <arg value="-r"/>
+                                    <arg value="${oldClassesDir}"/>
+                                    <arg value="${newClassesDir}"/>
+                                </exec>
+
+                                <ac:if xmlns:ac="antlib:net.sf.antcontrib">
+                                    <equals arg1="0" arg2="${diff.result}"/>
+                                    <then>
+                                        <echo>Aggregator jar unchanged</echo>
+                                    </then>
+                                    <else>
+                                        <echo>Aggregator jar changed, recreating final jar</echo>
+                                        <delete dir="${oldClassesDir}"/>
+                                        <move file="${newClassesDir}" tofile="${oldClassesDir}"/>
+                                        <copy file="${realAggJar}" tofile="${aggJarForDownstream}"/>
+                                    </else>
+                                </ac:if>
+                            </target>
+                        </configuration>
+                    </execution>
+                </executions>
+            </plugin>
             <plugin>
                 <groupId>org.jacoco</groupId>
                 <artifactId>jacoco-maven-plugin</artifactId>
```
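The new `create-aggregator-for-downstream-if-content-changed` execution is what prevents unnecessary invalidation: the shaded jar is rebuilt on every run, but the jar that downstream modules depend on is only replaced when its class files actually differ. Contents are compared with `diff -r` rather than a checksum because zip metadata, such as entry timestamps, changes on every repackaging. A rough shell equivalent of the Ant logic, with illustrative jar names:

```bash
#!/usr/bin/env bash
# Sketch of the content-change check; paths and jar names are illustrative.
set -euo pipefail

real_agg_jar=target/aggregator-shaded.jar      # rebuilt by maven-shade-plugin every run
downstream_jar=target/aggregator-spark341.jar  # what downstream modules resolve
new_classes=target/new-classes
old_classes=target/old-classes

rm -rf "$new_classes"                 # make sure we start with a clean new dir
mkdir -p "$new_classes" "$old_classes"
unzip -q "$real_agg_jar" -d "$new_classes"

if diff -qr "$old_classes" "$new_classes" > /dev/null; then
  echo "Aggregator jar unchanged"     # downstream timestamps stay intact
else
  echo "Aggregator jar changed, recreating final jar"
  rm -rf "$old_classes"
  mv "$new_classes" "$old_classes"    # remember these contents for the next run
  cp "$real_agg_jar" "$downstream_jar"
fi
```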
### build/buildall (2 additions, 2 deletions)

```diff
@@ -262,7 +262,7 @@ function build_single_shim() {
     -DskipTests \
     -Dbuildver="$BUILD_VER" \
     -Drat.skip="$SKIP_CHECKS" \
-    -Dskip \
+    -Dmaven.scaladoc.skip \
     -Dmaven.scalastyle.skip="$SKIP_CHECKS" \
     -pl aggregator -am > "$LOG_FILE" 2>&1 || {
       [[ "$LOG_FILE" != "/dev/tty" ]] && echo "$LOG_FILE:" && tail -20 "$LOG_FILE" || true
@@ -303,5 +303,5 @@ time (
   echo "Resuming from $joinShimBuildFrom build only using $BASE_VER"
   $MVN $FINAL_OP -rf $joinShimBuildFrom $MODULE_OPT $MVN_PROFILE_OPT $INCLUDED_BUILDVERS_OPT \
     -Dbuildver="$BASE_VER" \
-    -DskipTests -Dskip
+    -DskipTests -Dmaven.scaladoc.skip
 )
```
### dist/pom.xml (2 additions)

```diff
@@ -45,6 +45,7 @@
         <dist.jar.name>${project.build.directory}/${project.build.finalName}-${jni.classifier}.jar</dist.jar.name>
         <dist.jar.pom.url>jar:file:${dist.jar.name}!/META-INF/maven/${project.groupId}/${project.artifactId}/pom.xml</dist.jar.pom.url>
         <rapids.default.jar.phase>none</rapids.default.jar.phase>
+        <rapids.jni.unpack.skip>false</rapids.jni.unpack.skip>
     </properties>
     <profiles>
         <profile>
@@ -447,6 +448,7 @@ self.log("... OK")
                             <goal>unpack</goal>
                         </goals>
                         <configuration>
+                            <skip>${rapids.jni.unpack.skip}</skip>
                             <artifactItems>
                                 <!-- if add new artifacts, should set `overWrite` as true -->
                                 <artifactItem>
```
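The unpack execution in `dist/pom.xml` now honors a `rapids.jni.unpack.skip` property that defaults to `false`, so regular builds are unaffected and developers opt out per invocation. The CONTRIBUTING examples rely on Maven's flag semantics:

```bash
# default behavior: the spark-rapids-jni SNAPSHOT is unpacked on every dist build
mvn package -pl dist -PnoSnapshots

# a bare -Dname sets the property to the string "true", skipping the unpack
mvn package -pl dist -PnoSnapshots -Drapids.jni.unpack.skip
```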
### integration_tests/src/assembly/bin.xml (1 addition, 1 deletion)

```diff
@@ -47,7 +47,7 @@
         <outputDirectory>integration_tests</outputDirectory>
     </file>
     <file>
-        <source>${project.build.directory}/extra-resources/rapids4spark-version-info.properties</source>
+        <source>${project.build.outputDirectory}/rapids4spark-version-info.properties</source>
         <outputDirectory>integration_tests</outputDirectory>
     </file>
 </files>
```
### jenkins/databricks/build.sh (1 addition, 1 deletion)

```diff
@@ -150,7 +150,7 @@ $MVN_CMD -B -Ddatabricks -Dbuildver=$BUILDVER clean package -DskipTests $MVN_OPT
 if [[ "$WITH_DEFAULT_UPSTREAM_SHIM" != "0" ]]; then
   echo "Building the default Spark shim and creating a two-shim dist jar"
   UPSTREAM_BUILDVER=$($MVN_CMD help:evaluate -q -pl dist -Dexpression=buildver -DforceStdout)
-  $MVN_CMD -B package -pl dist -am -DskipTests -Dskip $MVN_OPT \
+  $MVN_CMD -B package -pl dist -am -DskipTests -Dmaven.scaladoc.skip $MVN_OPT \
     -Dincluded_buildvers=$UPSTREAM_BUILDVER,$BUILDVER
 fi
 
```
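For reference, with a hypothetical upstream `buildver` of `341` and a Databricks `BUILDVER` of `332db` (both values illustrative), the command above would expand to roughly:

```bash
mvn -B package -pl dist -am -DskipTests -Dmaven.scaladoc.skip \
    -Dincluded_buildvers=341,332db
```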
### jenkins/spark-premerge-build.sh (1 addition, 1 deletion)

```diff
@@ -29,7 +29,7 @@ fi
 
 CUDA_CLASSIFIER=${CUDA_CLASSIFIER:-'cuda11'}
 MVN_CMD="mvn -Dmaven.wagon.http.retryHandler.count=3"
-MVN_BUILD_ARGS="-Drat.skip=true -Dskip -Dmaven.scalastyle.skip=true -Dcuda.version=$CUDA_CLASSIFIER"
+MVN_BUILD_ARGS="-Drat.skip=true -Dmaven.scaladoc.skip -Dmaven.scalastyle.skip=true -Dcuda.version=$CUDA_CLASSIFIER"
 
 mvn_verify() {
   echo "Run mvn verify..."
```
### pom.xml (54 additions, 18 deletions)

```diff
@@ -819,6 +819,7 @@
         <cloudera.repo.enabled>false</cloudera.repo.enabled>
         <bloop.installPhase>install</bloop.installPhase>
         <bloop.configDirectory>${spark.rapids.source.basedir}/.bloop</bloop.configDirectory>
+        <build.info.path>${project.build.outputDirectory}/rapids4spark-version-info.properties</build.info.path>
     </properties>
 
     <dependencyManagement>
@@ -966,30 +967,64 @@
                             </target>
                         </configuration>
                     </execution>
+                    <execution>
+                        <id>setup-dirs</id>
+                        <phase>initialize</phase>
+                        <goals><goal>run</goal></goals>
+                        <configuration>
+                            <target>
+                                <mkdir dir="${project.build.directory}/extra-resources"/>
+                                <mkdir dir="${project.build.directory}/tmp"/>
+                            </target>
+                        </configuration>
+                    </execution>
                     <execution>
                         <id>generate-build-info</id>
                         <phase>generate-resources</phase>
                         <configuration>
                             <!-- Execute the shell script to generate the plugin build information. -->
                             <target name="build-info">
-                                <mkdir dir="${project.build.directory}/extra-resources"/>
-                                <mkdir dir="${project.build.directory}/tmp"/>
-                                <exec executable="bash"
-                                      output="${project.build.directory}/extra-resources/rapids4spark-version-info.properties"
-                                      resultproperty="build-info.exitCode"
-                                      errorproperty="build-info.errorMsg"
-                                      failonerror="false">
-                                    <arg value="${spark.rapids.source.basedir}/build/build-info"/>
-                                    <arg value="${project.version}"/>
-                                    <arg value="${spark-rapids-jni.version}"/>
+                                <exec executable="git" outputproperty="git.head.revision">
+                                    <arg value="rev-parse"/>
+                                    <arg value="HEAD"/>
                                 </exec>
-                                <fail message="exec build-info.sh failed, exit code is ${build-info.exitCode}, error msg is ${build-info.errorMsg}">
-                                    <condition>
-                                        <not>
-                                            <equals arg1="${build-info.exitCode}" arg2="0"/>
-                                        </not>
-                                    </condition>
-                                </fail>
+                                <property file="${build.info.path}" prefix="saved.build-info"/>
+                                <echo>
+                                    Comparing git revisions:
+                                    previous=${saved.build-info.revision}
+                                    current=${git.head.revision}
+                                </echo>
+                                <taskdef resource="net/sf/antcontrib/antcontrib.properties"/>
+                                <ac:if xmlns:ac="antlib:net.sf.antcontrib">
+                                    <equals arg1="${git.head.revision}" arg2="${saved.build-info.revision}"/>
+                                    <then>
+                                        <echo>
+                                            Git revisions unchanged: skipping version info file generation.
+                                            Delete ${build.info.path} or mvn clean if regeneration desired.
+                                            This will force full Scala code rebuild in downstream modules.
+                                        </echo>
+                                    </then>
+                                    <else>
+                                        <echo>Generating new version info file</echo>
+                                        <mkdir dir="${project.build.outputDirectory}"/>
+                                        <exec executable="bash"
+                                              output="${build.info.path}"
+                                              resultproperty="build-info.exitCode"
+                                              errorproperty="build-info.errorMsg"
+                                              failonerror="false">
+                                            <arg value="${spark.rapids.source.basedir}/build/build-info"/>
+                                            <arg value="${project.version}"/>
+                                            <arg value="${spark-rapids-jni.version}"/>
+                                        </exec>
+                                        <fail message="exec build-info.sh failed, exit code is ${build-info.exitCode}, error msg is ${build-info.errorMsg}">
+                                            <condition>
+                                                <not>
+                                                    <equals arg1="${build-info.exitCode}" arg2="0"/>
+                                                </not>
+                                            </condition>
+                                        </fail>
+                                    </else>
+                                </ac:if>
                             </target>
                         </configuration>
@@ -1049,6 +1084,7 @@
             <plugin>
                 <groupId>org.apache.maven.plugins</groupId>
                 <artifactId>maven-compiler-plugin</artifactId>
+                <version>3.11.0</version>
                 <executions>
                     <execution>
                         <id>default-compile</id>
@@ -1118,8 +1154,8 @@
                         <arg>-Xfatal-warnings</arg>
                         <!-- #endif scala-2.12 -->
                         <arg>-Wconf:cat=lint-adapted-args:e</arg>
-                        <arg>-Xsource:2.13</arg>
                         <!-- #if scala-2.13 --><!--
+                        <arg>-Xsource:2.13</arg>
                         <arg>-Ywarn-unused:locals,patvars,privates</arg>
                         <arg>-Wconf:cat=deprecation:wv,any:e</arg>
                         <arg>-Wconf:cat=scaladoc:wv</arg>
```
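The rewritten `generate-build-info` target is the heart of the fix: `rapids4spark-version-info.properties` now lives in the build output directory and is only regenerated when `git rev-parse HEAD` differs from the `revision` recorded in the existing file, so an unchanged checkout no longer dirties the file and no longer forces a full Scala rebuild in downstream modules. A rough shell equivalent of the Ant logic, with the Maven properties replaced by illustrative variables:

```bash
#!/usr/bin/env bash
# Sketch of the revision check in the generate-build-info Ant target.
# BUILD_INFO, VERSION, and JNI_VERSION stand in for the Maven properties.
set -u
BUILD_INFO=target/classes/rapids4spark-version-info.properties
VERSION=23.12.0-SNAPSHOT       # ${project.version}, illustrative
JNI_VERSION=23.12.0-SNAPSHOT   # ${spark-rapids-jni.version}, illustrative

current=$(git rev-parse HEAD)
# the properties file records the build's commit under the "revision" key
previous=$(sed -n 's/^revision=//p' "$BUILD_INFO" 2>/dev/null || true)

if [[ "$current" == "$previous" ]]; then
  echo "Git revisions unchanged: skipping version info file generation."
else
  echo "Generating new version info file"
  mkdir -p "$(dirname "$BUILD_INFO")"
  bash build/build-info "$VERSION" "$JNI_VERSION" > "$BUILD_INFO"
fi
```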