
Serialized MD5 calls slow down ∅ builds #1210

Closed
schroederc opened this issue Apr 28, 2016 · 13 comments
Labels: P2 We'll consider working on this in future. (Assignee optional), type: bug
@schroederc
Contributor

#1042, #923, and #835 are running into the same problem from different perspectives. I'm adding this issue for the core problem in Bazel: it serializes the hashing of artifacts larger than 4K.

Even in the best case for Bazel, a null build with no changes at all, Kythe's build is rather slow because of this (we have quite a few artifacts larger than 4K):

$ bazel build //...
INFO: Found 692 targets...
INFO: Elapsed time: 5.564s, Critical Path: 4.66s

The suggestion to use --noexperimental_check_output_file (from #923) does not improve this situation:

$ bazel build --noexperimental_check_output_file //...
INFO: Found 692 targets...
INFO: Elapsed time: 5.206s, Critical Path: 4.64s

However, if I rebuild Bazel to remove the synchronization on the MD5_LOCK in DigestUtils.java, the null build is much nicer:

$ bazel build //...
INFO: Found 692 targets...
INFO: Elapsed time: 1.289s, Critical Path: 0.58s

The reasoning for this serialization in DigestUtils.java does not seem to apply universally (as stated in #835 and made apparent by the above experiment):

// We'll have to read file content in order to calculate the digest. In that case
// it would be beneficial to serialize those calculations since there is a high
// probability that MD5 will be requested for multiple output files simultaneously.
// Exception is made for small (<=4K) files since they will not likely to introduce
// significant delays (at worst they will result in two extra disk seeks by
// interrupting other reads).
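For illustration, here is a rough sketch of the pattern that comment describes (hypothetical code, not the actual DigestUtils implementation): every file above the 4K threshold funnels through a single lock, so only one large artifact is hashed at a time.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class SerializedDigestSketch {
  private static final Object MD5_LOCK = new Object();
  private static final int SERIALIZATION_THRESHOLD = 4096; // the 4K cutoff from the comment above

  static byte[] md5(Path path, long fileSize) throws IOException {
    if (fileSize > SERIALIZATION_THRESHOLD) {
      synchronized (MD5_LOCK) { // all large-file hashing is serialized here
        return hash(path);
      }
    }
    return hash(path); // small files bypass the lock
  }

  private static byte[] hash(Path path) throws IOException {
    try {
      return MessageDigest.getInstance("MD5").digest(Files.readAllBytes(path));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }
}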

As for a solution, here are my two cents. If there is evidence that the serialization does benefit some builds, then I would like to see a flag to adjust this on a per-machine basis (via .bazelrc), or maybe have Bazel figure it out on its own if it can. If there isn't any evidence that the serialization helps, then perhaps it's time to remove it altogether.

@meteorcloudy added the "type: bug" and "P2 We'll consider working on this in future. (Assignee optional)" labels Apr 29, 2016
@damienmg
Contributor

Try using --watchfs

@damienmg
Contributor

(that's a startup flag, so it has to be placed before the build command, and it only works in server mode)

@schroederc
Contributor Author

With --watchfs:

$ bazel --watchfs build //...
INFO: Found 695 targets...
INFO: Elapsed time: 5.241s, Critical Path: 4.65s

Even if that did help, wouldn't it just bypass the problem for null builds? Wouldn't the serialization still affect any other build?

@damienmg
Contributor

Wait 5s for a null build with watchfs?

For the MD5 lock, you might be interested in cl/113053143 (internal reference); there is a long discussion about that, though I am unsure what the result was.

@schroederc
Contributor Author

As an update, I've found that a middle ground actually works better. Completely eliminating the lock does slow down builds with a large number of changes (and also produces a lot of "slow read" log messages). However, turning the lock into a semaphore with 20 permits (just an arbitrary number) worked pretty well for both the null build and a clean build. Maybe a flag could be added to tune the permit count, with a default larger than 1.
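A minimal sketch of that middle ground (hypothetical, not the actual patch): swap the single lock for a java.util.concurrent.Semaphore, so up to 20 large files can be hashed concurrently while the number of simultaneous reads stays bounded.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.Semaphore;

final class BoundedDigestSketch {
  private static final Semaphore MD5_PERMITS = new Semaphore(20); // "just some number"

  static byte[] md5(Path path) throws IOException, InterruptedException {
    MD5_PERMITS.acquire(); // only blocks once 20 hashes are already in flight
    try {
      return MessageDigest.getInstance("MD5").digest(Files.readAllBytes(path));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    } finally {
      MD5_PERMITS.release();
    }
  }
}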

@AustinSchuh
Contributor

Consider seeding the semaphore with the minimum of the number of disk IO credits available and cores * 1.5.
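One possible way to seed that permit count (a sketch that assumes the IO-credit bound is supplied externally, e.g. from a flag, since the JVM has no portable way to query it):

// Hypothetical helper: permits = min(ioCreditCap, ceil(cores * 1.5)), never less than 1.
static int initialMd5Permits(int ioCreditCap) {
  int byCores = (int) Math.ceil(Runtime.getRuntime().availableProcessors() * 1.5);
  return Math.max(1, Math.min(ioCreditCap, byCores));
}

The result would then feed new Semaphore(initialMd5Permits(...)), with the cap coming from a .bazelrc-settable flag.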

bazel-io pushed a commit that referenced this issue Jun 6, 2016
…ultiple threads when calculating the MD5 hash even for large files. Might improve performance when using an SSD.

Fixes #835 and #1210.

--
MOS_MIGRATED_REVID=124128233
@philwo
Member

philwo commented Jun 7, 2016

Could you please check whether this flag helps and report back? :) With the flag set, the MD5 lock is bypassed. I agree that using a Semaphore would be even better; I can have a look at that if you think this is going in the right direction.
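Assuming the flag referenced here is --experimental_multi_threaded_digest (it is named later in this thread), the null-build check would look like the earlier ones:

$ bazel build --experimental_multi_threaded_digest //...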

@schroederc
Contributor Author

Thanks for the progress! 👍

Full build w/o flag: INFO: Elapsed time: 322.923s, Critical Path: 239.11s
Null build w/o flag: INFO: Elapsed time: 2.013s, Critical Path: 1.33s

Full build w/ flag: INFO: Elapsed time: 327.246s, Critical Path: 265.28s
Null build w/ flag: INFO: Elapsed time: 0.885s, Critical Path: 0.20s

This matches what I saw with my first change: builds with lots of changes slow down slightly, and builds with few changes perform better.

@werkt
Contributor

werkt commented Nov 7, 2016

Is there a plan for making this less experimental? I doubt we're going to see the extended filesystem md5 attribute out here in the wild anytime soon...

@philwo
Member

philwo commented Nov 11, 2016

@werkt We could enable this by default if we had a reliable way of detecting whether the workspace and output_base are on an SSD... any idea?

@bsilver8192
Contributor

For Linux, once you figure out what filesystem they're on, /sys/class/block/${DEVICE}/queue/rotational gives you the information. In my (limited) experimentation just now, device mapper seems to propagate it correctly too. Does that work?
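For what it's worth, a Linux-only sketch of that check (a hypothetical helper; mapping the output_base path to its backing block device, including device-mapper and LVM stacks, is the part left out here). The sysfs file reads "0" for non-rotational devices (SSD/NVMe) and "1" for spinning disks.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class RotationalCheckSketch {
  // Returns true if the given block device (e.g. "sda" or "dm-0") reports itself as rotational.
  static boolean isRotational(String device) throws IOException {
    String flag =
        Files.readString(Paths.get("/sys/class/block", device, "queue", "rotational")).trim();
    return "1".equals(flag);
  }
}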

@dapirian

dapirian commented Aug 1, 2018

Why was this closed? We're also noticing this issue as of 0.16.

@philwo
Member

philwo commented Aug 1, 2018

@dapirian Could you please try with the --experimental_multi_threaded_digest flag? Please let me know if it helps. The only thing left to do is either make this the default (and accept that it will degrade performance a lot for people still using HDDs) or add some auto-detection to only enable it when the output_base is on an SSD (but we don't have a cross-platform and reliable solution for this).
