SIGBUS JVM error seen with bazel version 6.3.0 #23146

Closed
ryanmacdonald opened this issue Jul 30, 2024 · 9 comments
Labels
team-Core (Skyframe, bazel query, BEP, options parsing, bazelrc), type: bug, untriaged

Comments

@ryanmacdonald

ryanmacdonald commented Jul 30, 2024

Description of the bug:

I'm seeing the following SIGBUS error signature while running a large Scala build with Bazel v6.3.0:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007f90f3c68a39, pid=2, tid=3
#
# JRE version:  (11.0.15+10) (build )
# Java VM: OpenJDK 64-Bit Server VM (11.0.15+10-LTS, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xc21a39]  PerfMemory::alloc(unsigned long)+0x59
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /usr/local/blah/hs_err_pid2.log
#
#

I see that a few previously filed GitHub issues (e.g., here and here) resolved this error by adding the --sandbox_tmpfs_path=/tmp flag. However, when I do this, I see:

ERROR:
1722306775.109326470: src/main/tools/linux-sandbox.cc:152: calling pipe(2)...
1722306775.109397060: src/main/tools/linux-sandbox.cc:171: calling clone(2)...
1722306775.118793770: src/main/tools/linux-sandbox.cc:180: linux-sandbox-pid1 has PID 498031
1722306775.118867250: src/main/tools/linux-sandbox-pid1.cc:681: Pid1Main started
1722306775.119031019: src/main/tools/linux-sandbox.cc:197: done manipulating pipes
1722306775.156080366: src/main/tools/linux-sandbox-pid1.cc:275: tmpfs: /tmp
1722306775.163422045: src/main/tools/linux-sandbox-pid1.cc:285: working dir: /usr/local/home/ryanmacdonald/.cache/bazel/_bazel_ryanmacdonald/537546fbafb6167a7c1dbd6d108126ed/sandbox/linux-sandbox/20/execroot/
1722306775.282923973: src/main/tools/linux-sandbox-pid1.cc:320: writable: /usr/local/home/ryanmacdonald/.cache/bazel/_bazel_ryanmacdonald/537546fbafb6167a7c1dbd6d108126ed/sandbox/linux-sandbox/20/execroot/darwinn_tpu
1722306775.282977563: src/main/tools/linux-sandbox-pid1.cc:320: writable: /tmp/cloud/batch/004684105
src/main/tools/linux-sandbox-pid1.cc:329: "mount(/tmp/cloud/batch/004684105, /tmp/cloud/batch/004684105, nullptr, MS_BIND | MS_REC, nullptr)": No such file

The error occurs sporadically, in about 30-50% of the trials I've run.

Which category does this issue belong to?

Core

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

N/A

Which operating system are you running Bazel on?

Red Hat EL 8.10

What is the output of bazel info release?

release 6.3.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

error: No such remote 'origin'
2c29c0091687076a5145aa71bce95422f0de70f3

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

github-actions bot added the team-Core (Skyframe, bazel query, BEP, options parsing, bazelrc) label on Jul 30, 2024
@ryanmacdonald
Author

ryanmacdonald commented Jul 30, 2024

Worth noting that I see a similar error when I attempt to use Bazel v7.2.1 without the --sandbox_tmpfs_path=/tmp flag:

ERROR: /workspace/us/cbf/user/ryanmacdonald/BUILD:9:19: Executing genrule //pkg-preprocess failed: (Exit 1): linux-sandbox failed: error executing Genrule command 
  (cd /usr/local/home/ryanmacdonald/.cache/bazel/_bazel_ryanmacdonald/997005025ac9f0daba86b61b2c3d2ad0/sandbox/linux-sandbox/40/execroot/_main && \
src/main/tools/linux-sandbox-pid1.cc:320: "mount(/tmp/cloud/batch/004741035, /tmp/cloud/batch/004741035, nullptr, MS_BIND | MS_REC, nullptr)": No such file or directory

@fmeum
Collaborator

fmeum commented Jul 30, 2024

Could you share the Bazel command you are running with all its flags as well as the directory in which it runs? What's in /tmp/cloud?

@ryanmacdonald
Author

Full bazel command line:

bazel build --define proj=foo --compilation_mode=opt --remote_cache=<remote cache url> --extra_toolchains=@local_jdk//:all --sandbox_tmpfs_path=/tmp //path/to/target

/tmp/cloud/ has a batch subdirectory, which in turn contains a number of subdirectories with 9-digit names:

/tmp/cloud/batch> ls
001301595  003002190  004112935  005060954  005683908  013316919  014160521  014516820  014571466  014625115
001693463  003240165  004841924  005061070  007954108  013504958  014170704  014527071  014574927  014632679
001695585  003240703  004932092  005543441  008888472  013630435  014317761  014541580  014582087  014641326
002331398  003424045  004961724  005543559  009281013  014079770  014324237  014546661  014585025  014647758
002353218  003545566  005044664  005623208  012103415  014096863  014375605  014553554  014600433
003002110  003860811  005045323  005683519  012720324  014102594  014401783  014571380  014609712

All these dirs are empty.

I tried adding --spawn_strategy=processwrapper-sandbox to these options. That seems to remove the linux-sandbox-pid1.cc error, but the build now fails with a Java stack overflow about 15% of the time.

@fmeum
Collaborator

fmeum commented Jul 31, 2024

Do you know where these /tmp/cloud folders come from? Bazel may mount them if you were to run the build under /tmp, but it looks like you aren't. If these directories are updated concurrently, that could explain the failure.

@ryanmacdonald
Author

Ah, I forgot to mention that I'm also setting --output_base=/tmp/bazel_build_<unique_id> for the builds that produced the above errors.

These /tmp/cloud dirs are apparently created by our runner: each job is given its own /tmp/cloud/batch/<#> dir, so that within the job $TMPDIR evaluates to that unique /tmp/cloud/batch/<#> dir.

Should I be doing something like --output_base="$TMPDIR" and --sandbox_tmpfs_path="$TMPDIR"?

@ryanmacdonald
Author

Hey @fmeum, any other thoughts on this?

@fmeum
Collaborator

fmeum commented Aug 6, 2024

You may be running into #23217, albeit with a different error message. Does your build succeed without TMPDIR set?
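
One way to run this check without touching the runner's configuration is to drop TMPDIR for a single invocation only. The sketch below is an illustration, not something posted in the thread: it uses a small `sh -c` probe in place of the real build so that it can be run anywhere, and the variable name `seen_by_child` is made up for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Make sure TMPDIR is set in this shell, as it would be inside a runner job.
# (Falls back to /tmp here; in the reported setup it would be a
# /tmp/cloud/batch/<#> directory.)
export TMPDIR="${TMPDIR:-/tmp}"

# `env -u NAME cmd` runs cmd with NAME removed from its environment only;
# the current shell keeps its own TMPDIR.
seen_by_child="$(env -u TMPDIR sh -c 'echo "${TMPDIR:-unset}"')"
echo "child process sees TMPDIR=${seen_by_child}"

# For the actual check, the probe would be replaced by the build itself, e.g.:
#   env -u TMPDIR bazel build <same flags as before> //path/to/target
```

If the build passes consistently with TMPDIR removed, that points at the per-job /tmp/cloud/batch/<#> directory, which is the path the sandbox log reports failing to mount.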

@meteorcloudy
Member

@oquenchil Can you take a look?

@oquenchil
Contributor

This was fixed by upgrading to 7.0.1 (which contains Fabian's fix #20749), then using hermetic tmp and passing --sandbox_add_mount_pair for a special required directory that was being hidden from the system's /tmp.
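
For readers hitting the same failure, the flags named in this comment might be combined roughly as follows. This is a minimal sketch under stated assumptions, not the exact command used to resolve the issue: `/tmp/required_dir` is a hypothetical placeholder (the thread does not name the directory), and `--incompatible_sandbox_hermetic_tmp` is assumed to be what "hermetic tmp" refers to. The script only prints the resulting command so it can be reviewed before being run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholder: the directory under /tmp that actions still need
# to see. The thread does not say which directory this was.
REQUIRED_DIR="/tmp/required_dir"

# Collect the sandbox flags in an array so each flag stays a single word.
bazel_flags=(
  # Assumed to be the "hermetic tmp" setting mentioned above.
  --incompatible_sandbox_hermetic_tmp
  # Re-expose the one required directory inside the sandbox.
  "--sandbox_add_mount_pair=${REQUIRED_DIR}"
)

# Print the command instead of executing it.
cmd="bazel build ${bazel_flags[*]} //path/to/target"
echo "${cmd}"
```

`--sandbox_add_mount_pair` also accepts a `source:target` pair when the directory should appear at a different path inside the sandbox.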
