Remote execution build time regression with Bazel 5.1+ when action has many inputs #15872
Comments
I will take a look at this and providing a repro with |
@coeuvre it's not as obvious, but using https://github.com/bazelbuild/rules_nodejs/tree/stable/examples/create-react-app does appear to reproduce this pretty reliably. (Run I added a tiny bit of logging, and it looks like this action has ~30k inputs. This is much smaller than the action that caused me to report this, but it seems pretty clearly affected too. With Bazel 5.2, the "upload missing inputs" step in the profile consistently takes > 1.5 sec on my 16 core/64gb dev machine. With the commit reverted, I haven't seen it take longer than 800ms. (Rather un-scientifically, I was just modifying the text in |
I can't reproduce this using https://github.com/bazelbuild/rules_nodejs/tree/stable/examples/create-react-app. There are ~188K inputs for the final With Bazel 5.2.0, the "upload missing inputs" step on my machine takes ~2.3 sec. But it's the same as Bazel 4.1.0 and Bazel 5.0.0. |
Hmm, that's definitely very strange. I added a simple log line to print
I made a more specific repro: see https://github.com/clint-stripe/action_many_inputs.
Using 100k, it's very obvious running with 5.2, or 5.2 with the PR reverted: With the revert: In all cases, these are after Bazel has started up and the action has been run once, so we are comparing between runs where the inputs have already been uploaded and nothing about Bazel startup is included in the profile. There's not a lot of detail in the profiles, but I've attached them in case it's helpful. |
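For anyone who wants to compare the attached profiles numerically rather than by eye, here is a minimal sketch. It assumes the profile is uncompressed JSON in the Chrome trace event format (a top-level `traceEvents` list whose complete events have `"ph": "X"` and a `dur` in microseconds), and that the step shows up as an event named "upload missing inputs", as it does in the profile view discussed in this thread. The helper names and the sample numbers are made up for illustration.

```python
import json


def load_profile(path: str) -> dict:
    # Assumption: the profile was written uncompressed; a gzip-compressed
    # profile would need to be opened with gzip.open instead.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def total_duration_ms(profile: dict, event_name: str) -> float:
    """Sum the durations of all complete events with the given name, in ms."""
    events = profile.get("traceEvents", [])
    total_us = sum(
        e.get("dur", 0)
        for e in events
        if e.get("ph") == "X" and e.get("name") == event_name
    )
    return total_us / 1000.0


# Hand-written sample standing in for a real profile (values are invented).
sample_profile = {
    "traceEvents": [
        {"name": "upload missing inputs", "ph": "X", "ts": 0, "dur": 1_500_000},
        {"name": "upload missing inputs", "ph": "X", "ts": 10, "dur": 500_000},
        {"name": "execute remotely", "ph": "X", "ts": 20, "dur": 250_000},
    ]
}

upload_ms = total_duration_ms(sample_profile, "upload missing inputs")
```

Running the same function over a profile from each Bazel version (via `load_profile`) gives one number per run to compare.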
Thanks for your repro! I think I now know where the regression is. To "upload missing inputs", we need 3 steps: We already uploaded missing inputs during previous builds, so c) should be identical for both cases. The mentioned change didn't touch I added more traces to the profile and tested with your provided repro. At HEAD, a) took ~3.4s and b) took ~3.1s. With the revert, a) took ~83ms and b) took ~3s. The reason I couldn't reproduce this before is probably that b) took more time than a) (I am on my home network), so I didn't notice. BTW, which remote server did you test with? I will work on the fix.
Ah, awesome! Thank you so much for the quick reply -- really appreciate you taking a look here. We have an internal remote execution cluster, described a bit in a recent blog post. My tests were on a cloud VM, hitting the cluster in the same region, so it makes sense we have different network situations here too :) |
Also add more tests. Fixes bazelbuild#15872.
The regression was introduced in 702df84, where we essentially create a subscriber for each digest to subscribe to the result of `findMissingBlobs`. This change updates the code to avoid creating so many subscribers while maintaining the same functionality. Fixes bazelbuild#15872. Closes bazelbuild#15890. PiperOrigin-RevId: 463826260 Change-Id: Id0b1c7c309fc9653a47c5df95c609b34e6510cde
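To make the subscriber fan-out concrete, here is a conceptual sketch in plain Python. It is not Bazel's code and does not use RxJava: `MissingBlobsResult` is a hypothetical stand-in for an asynchronous `findMissingBlobs` result, and both `plan_uploads_*` helpers are invented for illustration. It only shows that one subscriber per digest and one shared subscriber can yield the same upload list while registering very different numbers of callbacks.

```python
from typing import Callable, Iterable, List, Set


class MissingBlobsResult:
    """Hypothetical stand-in for an async findMissingBlobs result."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[Set[str]], None]] = []

    def subscribe(self, callback: Callable[[Set[str]], None]) -> None:
        self._callbacks.append(callback)

    def complete(self, missing: Set[str]) -> int:
        """Deliver the result; return how many subscribers were notified."""
        for callback in self._callbacks:
            callback(missing)
        return len(self._callbacks)


def plan_uploads_per_digest(digests: Iterable[str], result: MissingBlobsResult) -> List[str]:
    """One subscriber per digest: N callbacks for N inputs."""
    to_upload: List[str] = []

    def make_callback(digest: str) -> Callable[[Set[str]], None]:
        def on_result(missing: Set[str]) -> None:
            if digest in missing:
                to_upload.append(digest)
        return on_result

    for digest in digests:
        result.subscribe(make_callback(digest))
    return to_upload


def plan_uploads_single(digests: Iterable[str], result: MissingBlobsResult) -> List[str]:
    """One shared subscriber that walks all digests when the result arrives."""
    to_upload: List[str] = []
    all_digests = list(digests)

    def on_result(missing: Set[str]) -> None:
        to_upload.extend(d for d in all_digests if d in missing)

    result.subscribe(on_result)
    return to_upload


# Invented example data: 1000 inputs, two of which the server is missing.
digests = [f"d{i}" for i in range(1000)]
missing = {"d3", "d7"}

per_digest_result = MissingBlobsResult()
per_digest_uploads = plan_uploads_per_digest(digests, per_digest_result)
per_digest_subscribers = per_digest_result.complete(missing)

single_result = MissingBlobsResult()
single_uploads = plan_uploads_single(digests, single_result)
single_subscribers = single_result.complete(missing)
```

With hundreds of thousands of inputs, the per-digest variant allocates one callback object per input even when almost nothing is missing, which fits the allocation-heavy behavior described in the report.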
Description of the bug:
As of Bazel 5.1 (specifically, the change in #15091 -- cc @coeuvre and @Wyverald), we have seen an action take considerably more time to execute remotely. We previously saw <5 seconds spent in "upload missing inputs", and this PR has increased that to 90+ seconds in CI and several minutes on developer machines.
This action has around 600k inputs, though we see very long execution time even when only ~5 of them change. It appears that this is allocating much much more than previously, though I am not very familiar with rxjava, let alone its internals 😄
With these changes, it's also impossible to interrupt Bazel; this helpfully prints a stack trace of the thread though:
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
I don't have a minimal repro readily available. If necessary, I suspect we could reproduce this with `rules_nodejs` pretty easily, given that `node_modules` tend to have a lot of input files. Let me know if this would be helpful!
Which operating system are you running Bazel on?
Linux
What is the output of `bazel info release`?
release 5.2.0
If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.
No response
What's the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD`?
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response