improve unsafe Decompression Performance ~4%
This is another attempt to replace the compiler's aggressive loop unrolling, after the failed attempt #69 (which wrote out of bounds in some cases).

The compiler's unrolling is avoided by unrolling manually, but less aggressively. Decompression performance improves by ca. 4%, except on the smallest test case.
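To make the shape of the change concrete, here is a minimal safe-Rust sketch of the same pattern: a 2x manual unroll with a one-byte tail. The function, its slice-and-index form, and all names are illustrative only; the committed code in the diff below does the same thing on raw pointers.

```rust
// Illustrative sketch (hypothetical helper, not the committed code): copy `len`
// bytes within one buffer, two single-byte copies per iteration plus a tail.
// The modest fixed unroll keeps the generated code small instead of letting the
// compiler emit a branch-heavy auto-vectorized expansion.
fn copy_overlapping_unrolled(buf: &mut [u8], mut src: usize, mut dst: usize, len: usize) {
    let dst_end = dst + len;
    // Main loop: exactly two single-byte copies per iteration.
    while dst + 1 < dst_end {
        buf[dst] = buf[src];
        src += 1;
        dst += 1;
        buf[dst] = buf[src];
        src += 1;
        dst += 1;
    }
    // Tail: `len` may be odd, so at most one byte remains.
    if dst < dst_end {
        buf[dst] = buf[src];
    }
}

fn main() {
    // Copying forward by one byte extends a run of the first byte.
    let mut buf = [b'x', 0, 0, 0, 0];
    copy_overlapping_unrolled(&mut buf, 0, 1, 4);
    assert_eq!(&buf, b"xxxxx");
}
```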
PSeitz committed May 27, 2023
1 parent ab128d1 commit 756e2f0
Showing 1 changed file with 13 additions and 2 deletions: src/block/decompress.rs
```diff
@@ -64,13 +64,24 @@ unsafe fn duplicate_overlapping(
     // This is the same strategy used by the reference C implementation https://github.com/lz4/lz4/pull/772
     output_ptr.write(0u8);
     let dst_ptr_end = output_ptr.add(match_length);
-    while *output_ptr < dst_ptr_end {
-        // Note that we copy 4 bytes, instead of one.
+    while output_ptr.add(1) < dst_ptr_end {
+        // Note that this loop unrolling is done manually, so that the compiler
+        // doesn't do it in an awful way.
+        // Without that, the compiler would unroll/auto-vectorize the copy with a lot of branches.
+        // This is not what we want, as large overlapping copies are not that common.
         core::ptr::copy(start, *output_ptr, 1);
         start = start.add(1);
         *output_ptr = output_ptr.add(1);
+
+        core::ptr::copy(start, *output_ptr, 1);
+        start = start.add(1);
+        *output_ptr = output_ptr.add(1);
     }
+
+    if *output_ptr < dst_ptr_end {
+        core::ptr::copy(start, *output_ptr, 1);
+        *output_ptr = output_ptr.add(1);
+    }
 }
```
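Why copy byte-by-byte at all, rather than issuing one bulk copy? In LZ77-style decompression an overlapping match reads bytes that the copy itself has just written, so the copy must proceed strictly front to back. A hedged sketch of that semantics (the helper and its Vec-based form are illustrative, not lz4_flex API):

```rust
// Illustrative helper (not part of lz4_flex): resolve an LZ77-style match by
// copying `match_length` bytes starting `offset` bytes back in the output,
// strictly front to back so freshly written bytes can be re-read.
fn resolve_match(output: &mut Vec<u8>, offset: usize, match_length: usize) {
    let mut src = output.len() - offset;
    for _ in 0..match_length {
        let byte = output[src];
        output.push(byte);
        src += 1;
    }
}

fn main() {
    // Offset 1 with a longer match repeats the last byte, i.e. run-length encoding.
    let mut out = b"ab".to_vec();
    resolve_match(&mut out, 1, 4);
    assert_eq!(out, b"abbbbb");

    // Offset 2 cycles through the last two bytes.
    let mut out = b"abc".to_vec();
    resolve_match(&mut out, 2, 4);
    assert_eq!(out, b"abcbcbc");
}
```

A single bulk copy of the whole range would read the old, not-yet-written bytes and produce wrong output whenever the offset is smaller than the match length, which is why the loop in the diff advances one byte at a time.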
