
Serialized MD5 calls slow down ∅ builds #1210

Closed
schroederc opened this issue Apr 28, 2016 · 13 comments
Labels: P2 We'll consider working on this in future. (Assignee optional), type: bug
@schroederc
Contributor

#1042, #923, and #835 are running into the same problem from different perspectives. I'm adding this issue for the core problem in Bazel: it serializes the hashing of artifacts larger than 4K.

Even in the best case for Bazel, a null build with no changes at all, Kythe's build is rather slow because of this (we have quite a few artifacts larger than 4K):

$ bazel build //...
INFO: Found 692 targets...
INFO: Elapsed time: 5.564s, Critical Path: 4.66s

The suggestion to use --noexperimental_check_output_file (from #923) does not improve this situation:

$ bazel build --noexperimental_check_output_file //...
INFO: Found 692 targets...
INFO: Elapsed time: 5.206s, Critical Path: 4.64s

However, if I rebuild Bazel to remove the synchronization on the MD5_LOCK in DigestUtils.java, the null build is much nicer:

$ bazel build //...
INFO: Found 692 targets...
INFO: Elapsed time: 1.289s, Critical Path: 0.58s

The reasoning for this serialization in DigestUtils.java does not seem to apply universally (as stated in #835 and made apparent by the above experiment):

// We'll have to read file content in order to calculate the digest. In that case
// it would be beneficial to serialize those calculations since there is a high
// probability that MD5 will be requested for multiple output files simultaneously.
// Exception is made for small (<=4K) files since they will not likely to introduce
// significant delays (at worst they will result in two extra disk seeks by
// interrupting other reads).
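For illustration, here is a rough sketch of the pattern that comment describes (hypothetical code, not the actual DigestUtils implementation): every file above the 4K threshold funnels through a single lock, so only one large artifact is hashed at a time.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class SerializedDigestSketch {
  private static final Object MD5_LOCK = new Object();
  private static final int SERIALIZATION_THRESHOLD = 4096; // the 4K cutoff from the comment above

  static byte[] md5(Path path, long fileSize) throws IOException {
    if (fileSize > SERIALIZATION_THRESHOLD) {
      synchronized (MD5_LOCK) { // all large-file hashing is serialized here
        return hash(path);
      }
    }
    return hash(path); // small files bypass the lock
  }

  private static byte[] hash(Path path) throws IOException {
    try {
      return MessageDigest.getInstance("MD5").digest(Files.readAllBytes(path));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }
}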

As for a solution, here are my two cents. If there is evidence that the serialization does benefit some builds, then I would like to see a flag to adjust this on a per-machine basis (via .bazelrc), or maybe have Bazel figure it out on its own if it can. If there isn't any evidence that the serialization helps, then perhaps it's time to remove it altogether.

@meteorcloudy added the "type: bug" and "P2 We'll consider working on this in future. (Assignee optional)" labels Apr 29, 2016
@damienmg
Contributor

Try using --watchfs

@damienmg
Contributor

(that's a startup flag, so it has to be placed before the build command, and it only works in server mode)

@schroederc
Contributor Author

With --watchfs:

$ bazel --watchfs build //...
INFO: Found 695 targets...
INFO: Elapsed time: 5.241s, Critical Path: 4.65s

Even if that did help, wouldn't it just bypass the problem for null builds? Wouldn't the serialization still affect any other build?

@damienmg
Contributor

Wait 5s for a null build with watchfs?

For the MD5 lock, you might be interested in cl/113053143 (internal reference); there is a long discussion about that, though I am unsure what the result was.

@schroederc
Contributor Author

As an update, I've found that a middle ground actually works better. Completely eliminating the lock does slow down builds with a large number of changes (and also produces a lot of "slow read" log messages). However, turning the lock into a semaphore with 20 permits (just an arbitrary number) worked pretty well for both the null build and a clean build. Maybe a flag could be added to tune the permit count, with a default larger than 1.
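A minimal sketch of that middle ground (hypothetical, not the actual patch): swap the single lock for a java.util.concurrent.Semaphore, so up to 20 large files can be hashed concurrently while the number of simultaneous reads stays bounded.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.Semaphore;

final class BoundedDigestSketch {
  private static final Semaphore MD5_PERMITS = new Semaphore(20); // "just some number"

  static byte[] md5(Path path) throws IOException, InterruptedException {
    MD5_PERMITS.acquire(); // only blocks once 20 hashes are already in flight
    try {
      return MessageDigest.getInstance("MD5").digest(Files.readAllBytes(path));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    } finally {
      MD5_PERMITS.release();
    }
  }
}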

@AustinSchuh
Contributor

Consider seeding the semaphore with the minimum of the number of disk IO credits available and cores * 1.5.
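One possible way to seed that permit count (a sketch that assumes the IO-credit bound is supplied externally, e.g. from a flag, since the JVM has no portable way to query it):

// Hypothetical helper: permits = min(ioCreditCap, ceil(cores * 1.5)), never less than 1.
static int initialMd5Permits(int ioCreditCap) {
  int byCores = (int) Math.ceil(Runtime.getRuntime().availableProcessors() * 1.5);
  return Math.max(1, Math.min(ioCreditCap, byCores));
}

The result would then feed new Semaphore(initialMd5Permits(...)), with the cap coming from a .bazelrc-settable flag.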

bazel-io pushed a commit that referenced this issue Jun 6, 2016
…ultiple threads when calculating the MD5 hash even for large files. Might improve performance when using an SSD.

Fixes #835 and #1210.

--
MOS_MIGRATED_REVID=124128233
@philwo
Member

philwo commented Jun 7, 2016

Could you please check whether this flag helps and report back? :) With the flag set, the MD5 lock is bypassed. I agree that using a Semaphore would be even better; I can have a look at that if you think this is going in the right direction.
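Assuming the flag referenced here is --experimental_multi_threaded_digest (it is named later in this thread), the null-build check would look like the earlier ones:

$ bazel build --experimental_multi_threaded_digest //...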

@schroederc
Contributor Author

Thanks for the progress! 👍

Full build w/o flag: INFO: Elapsed time: 322.923s, Critical Path: 239.11s
Null build w/o flag: INFO: Elapsed time: 2.013s, Critical Path: 1.33s

Full build w/ flag: INFO: Elapsed time: 327.246s, Critical Path: 265.28s
Null build w/ flag: INFO: Elapsed time: 0.885s, Critical Path: 0.20s

This matches what I saw with my first change: builds with lots of changes slow down slightly, and builds with few changes perform better.

@werkt
Contributor

werkt commented Nov 7, 2016

Is there a plan for making this less experimental? I doubt we're going to see the extended filesystem md5 attribute out here in the wild anytime soon...

@philwo
Member

philwo commented Nov 11, 2016

@werkt We could enable this by default if we had a reliable way of detecting whether the workspace and output_base are on an SSD... any idea?

@bsilver8192
Contributor

For Linux, once you figure out what filesystem they're on, /sys/class/block/${DEVICE}/queue/rotational gives you the information. In my (limited) experimentation just now, device mapper seems to propagate it correctly too. Does that work?
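For what it's worth, a Linux-only sketch of that check (a hypothetical helper; mapping the output_base path to its backing block device, including device-mapper and LVM stacks, is the part left out here). The sysfs file reads "0" for non-rotational devices (SSD/NVMe) and "1" for spinning disks.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class RotationalCheckSketch {
  // Returns true if the given block device (e.g. "sda" or "dm-0") reports itself as rotational.
  static boolean isRotational(String device) throws IOException {
    String flag =
        Files.readString(Paths.get("/sys/class/block", device, "queue", "rotational")).trim();
    return "1".equals(flag);
  }
}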

@dapirian

dapirian commented Aug 1, 2018

Why was this closed? We're also noticing this issue as of 0.16.

@philwo
Member

philwo commented Aug 1, 2018

@dapirian Could you please try with the --experimental_multi_threaded_digest flag? Please let me know if it helps. The only thing left to do is either make this the default (and accept that it will degrade performance a lot for people still using HDDs) or add some auto-detection to only enable it when the output_base is on an SSD (but we don't have a cross-platform and reliable solution for this).
