Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java RUNFILES_MANIFEST logic is lacking #20676

Open
DavidZbarsky-at opened this issue Dec 25, 2023 · 5 comments
Open

Java RUNFILES_MANIFEST logic is lacking #20676

DavidZbarsky-at opened this issue Dec 25, 2023 · 5 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Rules-Java Issues for Java rules type: bug

Comments

@DavidZbarsky-at
Copy link

Description of the bug:

This logic seem incorrect in recent Bazel nightlies. I would expect it to look more like so.

This breaks Java builds with --noenable_runfiles, especially anything using rules_jvm_external since it has an internal java binary (AddJarManifestEntry) that it runs on 3p deps, which fails at runtime with the following:

ERROR: /private/var/tmp/_bazel_david.zbarsky/5cb4fae43cf358e621837c0f9b3546d5/external/maven/BUILD:4727:11: Stamping the manifest of @@maven//:org_apache_kerby_kerb_server failed: (Exit 127): sandbox-exec failed: error executing StampJarManifest command 
  (cd /private/var/tmp/_bazel_david.zbarsky/5cb4fae43cf358e621837c0f9b3546d5/sandbox/darwin-sandbox/6890/execroot/_main && \
  exec env - \
    TMPDIR=/var/folders/ll/86ylz6g91fnf6x2c00jywg3m0000gq/T/ \
  /usr/bin/sandbox-exec -f /private/var/tmp/_bazel_david.zbarsky/5cb4fae43cf358e621837c0f9b3546d5/sandbox/darwin-sandbox/6890/sandbox.sb /var/tmp/_bazel_david.zbarsky/install/d6daf49e33736f131b503f89e218c8fe/process-wrapper '--timeout=0' '--kill_delay=15' '--stats=/private/var/tmp/_bazel_david.zbarsky/5cb4fae43cf358e621837c0f9b3546d5/sandbox/darwin-sandbox/6890/stats.out' bazel-out/darwin_arm64-opt-exec-ST-13d3ddad9198/bin/external/rules_jvm_external~5.3/private/tools/java/com/github/bazelbuild/rules_jvm_external/jar/AddJarManifestEntry --source bazel-out/darwin_arm64-fastbuild/bin/external/maven/org/apache/kerby/kerb-server/1.0.1/kerb-server-1.0.1.jar --output bazel-out/darwin_arm64-fastbuild/bin/external/maven/org/apache/kerby/kerb-server/1.0.1/processed_kerb-server-1.0.1.jar --manifest-entry Target-Label:@@maven//:org_apache_kerby_kerb_server)
grep: /private/var/tmp/_bazel_david.zbarsky/5cb4fae43cf358e621837c0f9b3546d5/sandbox/darwin-sandbox/6890/execroot/_main/bazel-out/darwin_arm64-opt-exec-ST-13d3ddad9198/bin/external/rules_jvm_external~5.3/private/tools/java/com/github/bazelbuild/rules_jvm_external/jar/AddJarManifestEntry.runfiles/MANIFEST: No such file or directory
grep: /private/var/tmp/_bazel_david.zbarsky/5cb4fae43cf358e621837c0f9b3546d5/sandbox/darwin-sandbox/6890/execroot/_main/bazel-out/darwin_arm64-opt-exec-ST-13d3ddad9198/bin/external/rules_jvm_external~5.3/private/tools/java/com/github/bazelbuild/rules_jvm_external/jar/AddJarManifestEntry.runfiles/MANIFEST: No such file or directory
grep: /private/var/tmp/_bazel_david.zbarsky/5cb4fae43cf358e621837c0f9b3546d5/sandbox/darwin-sandbox/6890/execroot/_main/bazel-out/darwin_arm64-opt-exec-ST-13d3ddad9198/bin/external/rules_jvm_external~5.3/private/tools/java/com/github/bazelbuild/rules_jvm_external/jar/AddJarManifestEntry.runfiles/MANIFEST: No such file or directory
grep: /private/var/tmp/_bazel_david.zbarsky/5cb4fae43cf358e621837c0f9b3546d5/sandbox/darwin-sandbox/6890/execroot/_main/bazel-out/darwin_arm64-opt-exec-ST-13d3ddad9198/bin/external/rules_jvm_external~5.3/private/tools/java/com/github/bazelbuild/rules_jvm_external/jar/AddJarManifestEntry.runfiles/MANIFEST: No such file or directory
bazel-out/darwin_arm64-opt-exec-ST-13d3ddad9198/bin/external/rules_jvm_external~5.3/private/tools/java/com/github/bazelbuild/rules_jvm_external/jar/AddJarManifestEntry: line 399: exec: ::: not found

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

No response

Which operating system are you running Bazel on?

OSX

What is the output of bazel info release?

release 7.0.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

Yes, presumably from when the manifest file location changed name.

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@sgowroji
Copy link
Member

Hi @DavidZbarsky-at,  Can you provide complete steps to reproduce at our end. Thanks!

@DavidZbarsky-at
Copy link
Author

Here's the setup:

(env-18.16.0) david.zbarsky@JF6FQ9PXP9 example % cat .bazelversion 
last_green
(env-18.16.0) david.zbarsky@JF6FQ9PXP9 example % cat WORKSPACE
(env-18.16.0) david.zbarsky@JF6FQ9PXP9 example % cat .bazelrc 
common --java_runtime_version=remotejdk_11
(env-18.16.0) david.zbarsky@JF6FQ9PXP9 example % cat BUILD
genrule(
    name = "gen_bin",
    cmd = "echo 'public class Bin { public static void main(String[] args) {} }' > $@",
    outs = ['Bin.java']
)

java_binary(
    name = "Bin",
    main_class = "Bin",
    srcs = ["Bin.java"],
)

genrule(
    name = "gen",
    tools = [":Bin"],
    cmd = "$(location :Bin) && touch $@",
    outs = ["fake"],
)

You can repro the failure with bazel build --noenable_runfiles :fake.

Note that bazel run --noenable_runfiles :Bin and bazel build :fake both work, i.e. the runfiles only fail to be located when the java binary is used as a tool in another action and MANIFEST_ONLY mode.

@fmeum
Copy link
Collaborator

fmeum commented Dec 27, 2023

The reproducer passes for me with --spawn_strategy=local. I think that the failure with sandboxed execution occurs because the sandbox always relies on the runfiles directory, even with --noenable_runfiles.

Not trying to make this more complicated then it needs to be, but I think that the real root cause of this is a design issue rather than a bug. RUNFILES_MANIFEST_ONLY is set at analysis time, but whether a runfiles directory is available depends on the chosen execution strategy: sandboxed as well as remote execution always use the runfiles directory, local execution may not.

The proper fix would be to get rid of RUNFILES_MANIFEST_ONLY everywhere and replace it with a mechanism that detects the availability of the runfiles directory at runtime. Runfiles libraries such as @bazel_tools//tools/cpp/runfiles "solve" this by always using the manifest if it exists, but parsing the manifest if the runfiles directory exists is wasteful.

For example, we may be able to check whether the runfiles directory is populated by looking for any directory under it - unfortunately, pre-Bzlmod the workspace name isn't fixed, so we can't just look for its directory.

@lberki @tjgq Do you see any potential problems with such an approach? Also @laszlocsomor in case you still like to think about runfiles questions :-)

fmeum added a commit to fmeum/bazel that referenced this issue Dec 29, 2023
Fixes two separate but related issues:
* On all OSes, with `--nobuild_runfile_links`, flipping
  `--enable_runfiles` from on to off did not result in runfiles
  being cleared due to a missing call to
  `SymlinkTreeHelper#clearRunfilesDirectory` that was only added to
  `SymlinkTreeStrategy` in f84329e.
* On Windows only, with `--nobuild_runfile_links`, flipping
  `--enable_runfiles` from off to on did not result in runfiles
  being created since the logic in `RunfilesTreeUpdater` would see a
  copy of the manifest instead of a symlink.

Work towards bazelbuild#20676
Fixes bazelbuild#19333
fmeum added a commit to fmeum/bazel that referenced this issue Dec 29, 2023
Fixes two separate but related issues:
* On all OSes, with `--nobuild_runfile_links`, flipping
  `--enable_runfiles` from on to off did not result in runfiles
  being cleared due to a missing call to
  `SymlinkTreeHelper#clearRunfilesDirectory` that was only added to
  `SymlinkTreeStrategy` in f84329e.
* On Windows only, with `--nobuild_runfile_links`, flipping
  `--enable_runfiles` from off to on did not result in runfiles
  being created since the logic in `RunfilesTreeUpdater` would see a
  copy of the manifest instead of a symlink.

Work towards bazelbuild#20676
Fixes bazelbuild#19333
fmeum added a commit to fmeum/bazel that referenced this issue Dec 29, 2023
Fixes two separate but related issues:
* On all OSes, with `--nobuild_runfile_links`, flipping
  `--enable_runfiles` from on to off did not result in runfiles
  being cleared due to a missing call to
  `SymlinkTreeHelper#clearRunfilesDirectory` that was only added to
  `SymlinkTreeStrategy` in f84329e.
* On Windows only, with `--nobuild_runfile_links`, flipping
  `--enable_runfiles` from off to on did not result in runfiles
  being created since the logic in `RunfilesTreeUpdater` would see a
  copy of the manifest instead of a symlink.

Work towards bazelbuild#20676
Fixes bazelbuild#19333
@lberki
Copy link
Contributor

lberki commented Jan 2, 2024

Yeah, checking whether the runfiles tree exists at run-time instead of using an environment variable that is not even correct in some cases sounds like a better approach.

@hvadehra WDYT?

(the runfiles logic in Bazel is kinda crufty; I'm trying to fix it bit by bit whenever I have the time, but there is only so much I can do in my discretionary time)

@fmeum
Copy link
Collaborator

fmeum commented Jan 2, 2024

I am fixing some incrementality issues with the runfiles directory in #20695. Along the way, I also found that I need some way to distinguish an "empty" runfiles directory (with --noenable_runfiles) from a populated one - having an _runfiles_manifest_only file at the root, next to _repo_mapping, would do that and also be a clear sign for runfiles libraries to use the manifest instead.

copybara-service bot pushed a commit that referenced this issue Jan 3, 2024
Fixes three separate but related issues with `--nobuild_runfile_links`:
* On all OSes, flipping `--enable_runfiles` from on to off did not result in runfiles being cleared due to a missing call to `SymlinkTreeHelper#clearRunfilesDirectory` that was only added to `SymlinkTreeStrategy` in f84329e.
* On Windows only, flipping `--enable_runfiles` from off to on did not result in runfiles being created since the logic in `RunfilesTreeUpdater` would see a copy of the manifest instead of a symlink. This is fixed by additionally checking whether the runfiles directory contains the first runfile on Windows. If runfiles were disabled before, it won't, otherwise it will.
* On all OSes, `--noenable_runfiles` was ignored by `bazel run`, with runfiles always being created. This is fixed by using `RunfilesTreeUpdater` instead of custom and incorrect logic.

With the fixed behavior, the runfiles tree for tests is now cleared right before the test spawn is executed, which makes the test action unable to create its working directory, the subdirectory of the runfiles directory corresponding to the workspace name. To fix this and also get rid of the inconsistency of having another action write into the runfiles tree, this logic is moved into the `SymlinkTreeHelper` and thus applied to all runfiles trees.

Work towards #20676
Fixes #19333

Closes #20695.

PiperOrigin-RevId: 595369235
Change-Id: I32efc0e6a1d29291470ce5eafcc69bbd9ab276b9
@hvadehra hvadehra added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Jan 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Rules-Java Issues for Java rules type: bug
Projects
None yet
Development

No branches or pull requests

7 participants