CANCELLED: Cancelled by user exception with bazel 6.0.0 and a remote cache via grpc #17298
This exception is not correlated with poor remote-cache performance, as per buchgr/bazel-remote#629 (comment).
We've also been seeing this. I usually see it in our CI builds, and only very rarely when building on my dev machine. I was finally able to reproduce it on my dev machine by building, then building again with our CI configuration:
```
# Log options read from rc files, to aid debugging.
common:jenkins --announce_rc
# Collect profile information.
build:jenkins --profile=build_profile.json.gz
build:jenkins --experimental_announce_profile_path
# Build with optimizations.
build:jenkins -c opt
# Include time and version stamping information in build artifacts.
# Note that this makes the build less reproducible, so should be avoided
# for unofficial builds as it reduces the effectiveness of caching.
build:jenkins --stamp
# Show complete command lines, environment variables, and outputs from failed
# build actions.
build:jenkins --verbose_failures
# Build everything, including non-test targets, for `bazel test //...`
test:jenkins --nobuild_tests_only
# Include detailed information about test failures.
test:jenkins --test_summary=detailed
# Log which tests have inappropriate timeouts set, rather than a single
# warning that such tests exist.
test:jenkins --test_verbose_timeout_warnings
# Drop the in-memory build state cache early to reduce the build system's memory
# consumption. See https://docs.bazel.build/memory-saving-mode.html.
# This means subsequent builds need to re-run all of the build rule logic to
# rebuild the action graph from scratch, but this doesn't matter in CI because
# we shut down the server anyway.
build:jenkins --discard_analysis_cache
build:jenkins --notrack_incremental_state
build:jenkins --nokeep_state_after_build
# Don't spam the console, since probably no one is looking at it in real time.
build:jenkins --show_progress_rate_limit=2
# Download as little as possible beyond the tarballs.
build:jenkins --remote_download_outputs=minimal
build:jenkins --experimental_remote_download_regex=.*((-[^/]*.tar.zst)|/test.xml)$
# Avoid OOM due to too much parallelism on jenkins build machines.
build:jenkins --local_ram_resources=HOST_RAM*.25
# Have bazel track actual memory available on the system, rather than assuming
# every action is using what it thinks it will use.
build:jenkins --experimental_local_memory_estimate
```
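(For context: `common:jenkins`, `build:jenkins`, and `test:jenkins` entries in a `.bazelrc` only take effect when the config is requested explicitly, e.g. `bazel test --config=jenkins //...`; a plain local build ignores them, which is why reproducing this locally meant opting into the CI flags.)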
@adam-azarchs: which cache server are you using?
One managed by EngFlow (who are already aware of this issue, but I thought I'd share the logs here as well in case they help).
OK, since this reproduces with two different cache servers (EngFlow's and bazel-remote), that suggests it's probably a bazel issue.
I managed to get it to reproduce that error again, but without the flags unique to our CI builds and without any other builds coming earlier in the logs, so those flags were a red herring, as were the earlier builds. This was still a build with no files changed, but I'd just switched back to bazel 6.0.0 after doing a build with 5.2.3, so it of course restarted the server. I also noticed this later in the log:

Though perhaps that's redundant with the exception trace.
…ent. (#17438)

The gRPC remote execution client frequently "converts" gRPC calls into `ListenableFuture`s by setting a `SettableFuture` in the `onCompleted` or `onError` gRPC stub callbacks. If the future has direct-executor callbacks, those callbacks will execute with the gRPC `Context` of the freshly completed call. That is problematic if the `Context` was canceled (canceling the call `Context` is good hygiene after completing a gRPC call) and the future callback goes on to make further gRPC calls. Therefore, this change removes all usage of gRPC `Context` cancellation.

It would be nice if there were instead some way to avoid leaking `Context`s between calls, rather than having to totally forswear `Context` cancellation. However, I can't see a good way to enforce proper isolation.

Fixes #17298. Closes #17426.

PiperOrigin-RevId: 507730469
Change-Id: Iea74acad4592952700e41d34672f6478de509d5e
Co-authored-by: Benjamin Peterson <[email protected]>
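To make the failure mode concrete, here is a minimal, hypothetical Java sketch of the pattern the commit message describes (the class name and literal values are invented for illustration; this is not Bazel's actual code). A `SettableFuture` is completed from inside a gRPC call's `Context` immediately after that `Context` is cancelled, so a `directExecutor()` listener on the future runs under the cancelled `Context`, and any gRPC call it starts from there would fail with `CANCELLED`:

```java
import com.google.common.util.concurrent.MoreExecutors;
import com.google.common.util.concurrent.SettableFuture;
import io.grpc.Context;

final class CancelledContextSketch {
  public static void main(String[] args) {
    SettableFuture<String> future = SettableFuture.create();
    Context.CancellableContext callCtx = Context.current().withCancellation();

    // A direct-executor listener runs on whatever thread (and under whatever
    // gRPC Context) happens to complete the future.
    future.addListener(
        () -> {
          // A follow-up gRPC call started here would inherit Context.current()
          // -- the already-cancelled callCtx -- and fail with CANCELLED
          // ("Cancelled by user" in the issue's stack trace).
          System.out.println(
              "listener sees cancelled = " + Context.current().isCancelled());
        },
        MoreExecutors.directExecutor());

    // What an onCompleted()/onError() stub callback effectively does: it runs
    // inside the call's Context, cancels it as post-call hygiene, and then
    // completes the future -- firing the listener under the cancelled Context.
    callCtx.run(
        () -> {
          callCtx.cancel(null);   // the "hygiene" cancellation
          future.set("response"); // listener fires synchronously, prints "true"
        });
  }
}
```

This is why the fix removes call-`Context` cancellation outright: if the call `Context` is never cancelled, a direct-executor listener can never inherit a cancelled `Context` from the call that completed the future.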
Description of the bug:
Have you found anything relevant by searching the web?
Nothing else found.
Any other information, logs, or outputs that you want to share?
Also reported at buchgr/bazel-remote#629; it may be related to slow remote cache performance.