[BUG] non-deterministic compiled SQLExecPlugin.class with scala 2.13 deployment #9571
After a couple of tests, I confirmed that the issue shows up after the deploy command https://github.com/NVIDIA/spark-rapids/blob/branch-23.12/jenkins/spark-nightly-build.sh#L120-L124 and that it can be reproduced locally.
After the installation command, check scala2.13/{sql-plugin|aggregator}/target/spark331 (or /.m2). B. Execute the deploy command for the intermediate packages.
After the deployment command, check scala2.13/{sql-plugin|aggregator}/target/spark331 (or /.m2) again. It looks like rapids-4-spark-sql_2.13 gets recompiled with a different result during the deploy process, which causes the issue in CI.
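The "check the directory, run the next command, check again" comparison above can be scripted. The following is a minimal sketch, not part of the project's tooling: the helper names `snapshot` and `changed` are hypothetical, and it assumes that comparing SHA-256 digests of the `.class` files is a good enough proxy for "compiled with a different result".

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each .class file under root (path relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*.class"))
    }


def changed(before, after):
    """Return the relative paths whose digest differs or that exist in only one snapshot."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

One possible use: call `snapshot` on scala2.13/sql-plugin/target/spark331 after the installation command, call it again on the same directory after the deployment command, and pass both results to `changed` to list the class files that were rewritten with different bytes.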
Some simpler repro steps that do not require access to the internal repository or any CI Jenkins settings, and do not spend time building unrelated projects in the repo:
Verified that the deploy goal itself is not directly the problem; rather, the deploy goal implicitly re-runs the install goal (and all previous goals like compile). Here are even simpler repro steps:
For some reason SQLExecPlugin and 22 other files are getting recompiled in the sql-plugin project as part of the prerequisite phases executed before the deploy phase, and it's during that recompile that the change occurs. I verified from Maven |
I verified that if we avoid doing an install run first followed by a deploy and instead directly build everything during a single deploy build, we end up with |
The affected functions are part of scala.Function1, so on a hunch, I looked for other classes in sql-plugin that are using that trait. Other classes are affected by this if recompiled in isolation (e.g.: GpuLog) (i.e.:
If I hack out the use of ScalaStack and have the code just use scala.collection.mutable.Stack directly for the 2.13 build, the |
Describe the bug
We saw our CI constantly fail with "not bitwise-identical across shims". After comparing the generated SQLExecPlugin.class in different Spark shims, I found that the only diff appears between mvn install and the mvn deploy that follows it: A$ vs A
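To locate a small difference such as "A$ vs A" inside two builds of the same class file, a byte-level comparison can help. This is a minimal sketch under stated assumptions: the function name `first_difference` is hypothetical, and it only reports the first differing offset with some surrounding bytes. It does not parse the class file format, so a structured dump (for example from `javap -v`) may be easier to read for anything beyond a quick look.

```python
def first_difference(a: bytes, b: bytes, context: int = 16):
    """Return (offset, a_window, b_window) at the first differing byte, or None if equal."""
    limit = min(len(a), len(b))
    for offset in range(limit):
        if a[offset] != b[offset]:
            break
    else:
        # No mismatch in the shared prefix: either identical or one is a prefix of the other.
        if len(a) == len(b):
            return None
        offset = limit
    start = max(0, offset - context)
    return offset, a[start:offset + context], b[start:offset + context]
```

The two inputs would typically come from `Path(...).read_bytes()` on the copy of SQLExecPlugin.class produced by each build.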
Steps/Code to reproduce bug
Run with the internal rapids_scala213_nightly-dev-github pipeline (duplicate the pipeline for testing).
UPDATE: local repro steps are in #9571 (comment)
Expected behavior
After the build, SQLExecPlugin.class should always have an identical SHA value across the different shims; otherwise, move SQLExecPlugin.class out of the check.
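The expectation above amounts to "one class, one digest, regardless of shim". A minimal sketch of such a check follows. It is not the project's actual CI check: the function names are hypothetical, and it assumes a layout in which each shim has its own subdirectory (such as spark331) under a common directory, with the class at the same relative path in each.

```python
import hashlib
from pathlib import Path


def digests_across_shims(target_dir, relative_class):
    """Map shim directory name -> SHA-256 of relative_class, for shims that contain it."""
    result = {}
    for shim_dir in sorted(Path(target_dir).iterdir()):
        if not shim_dir.is_dir():
            continue
        candidate = shim_dir / relative_class
        if candidate.is_file():
            result[shim_dir.name] = hashlib.sha256(candidate.read_bytes()).hexdigest()
    return result


def is_bitwise_identical(digests):
    """True when at most one distinct digest was collected."""
    return len(set(digests.values())) <= 1
```

When `is_bitwise_identical` returns False, the mapping returned by `digests_across_shims` shows which shim directories disagree.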