gRPC resource exhaustion errors during BEP upload #12050
Comments
This actually looks like a buildbuddy bug a few folks have asked me about recently, not a Bazel issue. Your BES upload is larger than the gRPC default limit of 4 MB, hence the message you see here (you're sending 10.6 MB). If you're using a fork of buildbuddy, which I assume you are, then you can go and fix this yourself like so:
If you'd like to take a look at alternative BES implementations, feel free to give us a shout :)
Hey @zachgrayio, thanks for your quick response. Is this something that gets configured on the server or on the client? I found this grpc-gateway comment and this Stack Overflow response suggesting it was a client-side config. Maybe it needs to be configured in both places? I can't find references to the grpc MaxXMsgSize configs on the Bazel project though. Also, someone ran into this issue with Buildbarn as well (granted, they could also have the bug if it's a server-side thing).
In this case I think it's a missing server option (grpc.MaxRecvMsgSize()).
Here's some background on this issue: gRPC has a built-in maximum message size controlled by the receiver (in this case the buildbuddy service). The default value in Java is 4 MiB. Bazel does not automatically limit itself to the server-defined maximum message size. Doing so is difficult, as some of the proto messages in the Build Event Protocol / Service are inherently monolithic, and cannot be automatically broken into separate messages. As such, we were targeting a maximum size of about 50 MiB. Depending on which event is too large, you may be able to reduce the event size by setting Unfortunately, the error message above contains the error code, but not which message caused it.
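To make the receiver-side limit concrete, here is a minimal sketch in plain Python that models the check described above. It does not use the gRPC library; `receive`, `ResourceExhausted`, and the constants are hypothetical names chosen for illustration, and the 4 MiB default and roughly 50 MiB target are the figures quoted in this thread.

```python
MIB = 1024 * 1024
DEFAULT_MAX_RECV_BYTES = 4 * MIB  # default receiver limit quoted in the thread


class ResourceExhausted(Exception):
    """Hypothetical stand-in for a gRPC RESOURCE_EXHAUSTED status."""


def receive(payload: bytes, max_recv_bytes: int = DEFAULT_MAX_RECV_BYTES) -> int:
    """Accept a payload only if it fits under the receiver's limit.

    The sender is not consulted: the receiver alone decides whether the
    message is too large, which is why raising the limit is a server change.
    """
    if len(payload) > max_recv_bytes:
        raise ResourceExhausted(
            f"message of {len(payload)} bytes exceeds limit of {max_recv_bytes} bytes"
        )
    return len(payload)
```

With the default limit, a payload of about 10.6 MiB is rejected, while the same payload is accepted once the receiver raises its limit to 50 MiB. The sender's behavior is identical in both cases.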
Or set
Hey @ulfjack thanks for the tips.
Do you know where this is specified? Mostly just curious.
Interesting. It might make more sense to increase it on the receiver if the client's already allowing 50 MiB. We might look into that approach.
I'm curious, how come this might also help? The docs for
I am not aware of any place where that is publicly documented. This is my personal recollection from working on the BEP.
I think it makes sense for BES implementations to provide a knob to allow larger than default packets. However, there are also reasons for preferring smaller packets (e.g., preventing service outages due to memory exhaustion), and there are knobs in Bazel to adjust that as well.
The original BEP design had a repeated field representing a flat list of all 'important' outputs of a configured target. However, this turned out to be problematic because some configured targets have a huge list of such outputs. We then migrated to a nested-set style listing of important outputs. However, this is technically an incompatible change, and so we added the
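The difference between a flat list of outputs and a nested-set style listing can be illustrated with a small model. This is only a sketch under assumptions: `FileSet`, `flatten`, and `nested_messages` are hypothetical names, the dictionary layout is not the actual BEP wire format, and the example data is made up. The point is that a flat list puts every transitive file into one message, while a nested listing emits one smaller message per set and refers to child sets by id.

```python
from dataclasses import dataclass, field


@dataclass
class FileSet:
    """Hypothetical stand-in for one nested set of output files."""
    set_id: str
    files: list = field(default_factory=list)      # direct files of this set
    children: list = field(default_factory=list)   # child FileSet objects


def flatten(root: FileSet) -> list:
    """Flat listing: every transitive file ends up in a single message."""
    seen, out, stack = set(), [], [root]
    while stack:
        node = stack.pop()
        if node.set_id in seen:
            continue  # a shared set is only expanded once
        seen.add(node.set_id)
        out.extend(node.files)
        stack.extend(node.children)
    return out


def nested_messages(root: FileSet) -> dict:
    """Nested listing: one message per set, with direct files and child ids."""
    messages, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node.set_id in messages:
            continue  # a shared set is only emitted once
        messages[node.set_id] = {
            "files": list(node.files),
            "file_sets": [child.set_id for child in node.children],
        }
        stack.extend(node.children)
    return messages


# Made-up example: a root set with 10 child sets of 100 files each.
leaves = [FileSet(f"s{i}", [f"out/{i}_{j}.o" for j in range(100)]) for i in range(10)]
root = FileSet("root", ["out/app"], leaves)
flat = flatten(root)
nested = nested_messages(root)
largest = max(len(m["files"]) + len(m["file_sets"]) for m in nested.values())
```

In this model the flat message carries every file at once, whereas the largest nested message only carries the direct entries of a single set. A set with a very large number of direct files would still produce a large message, so the nested form reduces the worst case without eliminating it.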
Thanks for reporting @SrodriguezO and for the background @ulfjack. We've bumped the default max grpc limit in buildbuddy-io/buildbuddy@7cd6929 which should go live in the next release (targeting this afternoon). Configurability incoming as well. Feel free to upstream your changes in the future @zachgrayio!
Closing this as it doesn't seem to be a Bazel bug after all. Thank you @siggisim for the quick turnaround!
This flag reduces the largest proto size, which helps avoid sharp edges with remote execution systems (e.g., bazelbuild#12050). RELNOTES[INC]: --legacy_important_outputs now has a default of false.
This flag reduces the largest BES proto size, which helps avoid sharp edges with remote execution systems (e.g., bazelbuild#12050). RELNOTES[INC]: --legacy_important_outputs now has a default of false.
*** Reason for rollback *** Breaking ResultStore customers. RELNOTES[INC]: --legacy_important_outputs default reverted to true. *** Original change description *** Set --legacy_important_outputs to false by default. This flag reduces the largest proto size, which helps avoid sharp edges with remote execution systems (e.g., #12050). Closes #14353. PiperOrigin-RevId: 450067034
Description of the problem:
gRPC resource exhaustion errors during BEP upload when a very large build completes very quickly due to cache hits.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
We often run into it when a very large build completes very quickly due to cache hits (and --bes_backend is specified).
What operating system are you running Bazel on?
Ubuntu 18.04
What's the output of bazel info release?
release 3.3.0
Have you found anything relevant by searching the web?
Similar issue encountered:
Any other information, logs, or outputs that you want to share?
Error logs:
Bazel Exit Code:
Other Info: