Use a larger buffer size for `java.util.zip.*Stream` classes

`DeflaterInputStream`, `GZIPInputStream`, `GZIPOutputStream`, and `InflaterInputStream` all use an internal byte buffer of 512 bytes by default. Whenever the wrapped stream exceeds this size, a full copy to a new buffer occurs, and the buffer increases in increments of the same size. For example, a stream of length 2K will be copied four times.

Increasing the size of the buffer we use can result in significant reductions in CPU usage (read: copies).

Examples in the repository
--------------------------

There are already two places where we increase the default size of these buffers:

- `//src/main/java/com/google/devtools/build/lib/bazel/repository/TarGzFunction.java`
- `//src/main/java/com/google/devtools/build/lib/bazel/repository/downloader/HttpStream.java`

Prior art
---------

There is an open enhancement issue in the OpenJDK tracker on this which contains a benchmark for `InflaterOutputStream`:

> Increase the default, internal buffer size of the Streams in `java.util.zip`
> https://bugs.openjdk.org/browse/JDK-8242864

A similar change was merged in for JDK15+ in 2020:

> Improve performance of `InflaterOutputStream.write()`
> https://bugs.openjdk.org/browse/JDK-8242848

Providing a simple benchmark
----------------------------

I'm inlining a simple `jmh` benchmark and the results underneath it for one `GZIPInputStream` case.

The benchmark:

```
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@Fork(1)
@Threads(1)
@Warmup(iterations = 2)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class GZIPInputStreamBenchmark {

  // Uncompressed input sizes, in bytes.
  @Param({"1024", "3072", "9216"})
  long inputLength;

  // Internal buffer sizes passed to GZIPInputStream, in bytes.
  @Param({"512", "1024", "4096", "8192"})
  int bufferSize;

  private byte[] content;

  @Setup(Level.Iteration)
  public void setup() throws IOException {
    var baos = new ByteArrayOutputStream();
    // No need to set the buffer size on this as it's a one-time cost for setup and not counted in the result.
    var gzip = new GZIPOutputStream(baos);
    var inputBytes = generateRandomByteArrayOfLength(inputLength);
    gzip.write(inputBytes);
    gzip.finish();
    this.content = baos.toByteArray();
  }

  @Benchmark
  @BenchmarkMode(Mode.AverageTime)
  public void getGzipInputStream(Blackhole bh) throws IOException {
    try (var is = new ByteArrayInputStream(this.content);
        var gzip = new GZIPInputStream(is, bufferSize)) {
      bh.consume(gzip.readAllBytes());
    }
  }

  byte[] generateRandomByteArrayOfLength(long length) {
    var random = new Random();
    var intStream = random.ints(0, 5000).limit(length).boxed();
    return intStream.collect(
        ByteArrayOutputStream::new,
        // write(int) stores only the low-order byte of each value.
        (baos, i) -> baos.write(i.intValue()),
        (baos1, baos2) -> baos1.write(baos2.toByteArray(), 0, baos2.size())
    ).toByteArray();
  }
}
```

The results:

```
Benchmark                                    (bufferSize)  (inputLength)  Mode  Cnt      Score    Error  Units
GZIPInputStreamBenchmark.getGzipInputStream           512           1024  avgt    5   3207.217 ± 24.919  ns/op
GZIPInputStreamBenchmark.getGzipInputStream           512           3072  avgt    5   5874.191 ±  5.827  ns/op
GZIPInputStreamBenchmark.getGzipInputStream           512           9216  avgt    5  15567.345 ± 93.281  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          1024           1024  avgt    5   2580.566 ± 14.566  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          1024           3072  avgt    5   4154.582 ± 16.016  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          1024           9216  avgt    5   9942.521 ± 61.215  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          4096           1024  avgt    5   2150.255 ± 52.770  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          4096           3072  avgt    5   2289.185 ± 71.396  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          4096           9216  avgt    5   5656.891 ± 28.499  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          8192           1024  avgt    5   2177.427 ± 30.896  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          8192           3072  avgt    5   2517.390 ± 21.296  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          8192           9216  avgt    5   5227.932 ± 55.525  ns/op
```

Co-authored-by: Kushal Pisavadia <[email protected]>

Closes #20316.

PiperOrigin-RevId: 588444920
Change-Id: I1fb47f0b08dcb8d72f3e2c43534c33d60efb87f2
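
For readers unfamiliar with the constructors involved, here is a minimal sketch of how a caller can pass an explicit buffer size to the `java.util.zip` stream classes discussed in this commit. The class name `GzipBufferSizeExample` and the constant `BUFFER_SIZE` (set to 8192 here) are hypothetical and chosen only for illustration; the commit message does not state which size was adopted at each call site.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBufferSizeExample {
  // Hypothetical value for illustration; not taken from the commit itself.
  static final int BUFFER_SIZE = 8192;

  // Compresses the input using a GZIPOutputStream with an explicit buffer size.
  static byte[] compress(byte[] input) throws IOException {
    var baos = new ByteArrayOutputStream();
    try (var gzip = new GZIPOutputStream(baos, BUFFER_SIZE)) {
      gzip.write(input);
    }
    return baos.toByteArray();
  }

  // Decompresses the input using a GZIPInputStream with an explicit buffer size.
  static byte[] decompress(byte[] compressed) throws IOException {
    try (var gzip =
        new GZIPInputStream(new ByteArrayInputStream(compressed), BUFFER_SIZE)) {
      return gzip.readAllBytes();
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] original =
        "hello, larger buffers ".repeat(1000).getBytes(StandardCharsets.UTF_8);
    byte[] roundTripped = decompress(compress(original));
    // Checks that the round trip preserves the data regardless of buffer size.
    System.out.println(Arrays.equals(original, roundTripped));
  }
}
```

The buffer size only affects how many bytes are moved per internal read or write, so the decompressed output is the same for any positive size; the benchmark above suggests the gain flattens out somewhere between 4096 and 8192 bytes for these input lengths.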