hash/crc32: use faster method for CRC32-C #16107

RaduBerinde · 2016-06-18T14:24:18Z

The hardware (SSE4.2) assisted CRC32-C implementation can be made faster (from my experience about 2.5 - 2.7 times faster if checksumming enough bytes e.g. 512+),

The gist of the idea is that (on recent Intel platforms at least) the CRC32 instruction takes three cycles but three of them can be pipelined in parallel if they don't depend on each other; so we can split the buffer into three, checksum the three pieces in "parallel" (interleaving the CRC32 instructions) and then combine the results using some tables.

This article goes into more depth: http://www.drdobbs.com/parallel/fast-parallelized-crc-computation-using/229401411

I can take a stab at implementing it at some point when I have some time of my hands, unless there are objections.

ianlancetaylor · 2016-06-20T17:40:50Z

If you can make it faster while producing the same results, by all means go for it. Thanks.

RaduBerinde · 2016-06-24T21:06:15Z

Working on a decent initial version. Sneak peek:

CastagnoliCrc15B-4    597MB/s ± 0%    662MB/s ± 1%   +10.86%  (p=0.029 n=4+4)
CastagnoliCrc40B-4   1.74GB/s ± 0%   1.68GB/s ± 1%    -3.35%  (p=0.029 n=4+4)
CastagnoliCrc1KB-4   7.99GB/s ± 0%  14.14GB/s ± 0%   +76.89%  (p=0.029 n=4+4)
CastagnoliCrc4KB-4   8.84GB/s ± 0%  18.15GB/s ± 0%  +105.38%  (p=0.029 n=4+4)
CastagnoliCrc32KB-4  9.11GB/s ± 1%  19.40GB/s ± 0%  +112.94%  (p=0.029 n=4+4)

gopherbot · 2016-08-21T19:00:27Z

CL https://golang.org/cl/24471 mentions this issue.

gopherbot · 2016-08-28T01:00:22Z

CL https://golang.org/cl/27931 mentions this issue.

The algorithm is explained in the comments. The improvement in throughput is about 1.4x for buffers between 500b-4Kb and 2.5x-2.6x for larger buffers. Additionally, we no longer initialize the software tables if SSE4.2 is available. Adding a test for the SSE implementation (restricted to amd64 and amd64p32). Benchmarks on a Haswell i5-4670 @ 3.4 GHz: name old time/op new time/op delta CastagnoliCrc15B-4 21.9ns ± 1% 22.9ns ± 0% +4.45% CastagnoliCrc15BMisaligned-4 22.6ns ± 0% 23.4ns ± 0% +3.43% CastagnoliCrc40B-4 23.3ns ± 0% 23.9ns ± 0% +2.58% CastagnoliCrc40BMisaligned-4 25.4ns ± 0% 26.1ns ± 0% +2.86% CastagnoliCrc512-4 72.6ns ± 0% 52.8ns ± 0% -27.33% CastagnoliCrc512Misaligned-4 76.3ns ± 1% 56.3ns ± 0% -26.18% CastagnoliCrc1KB-4 128ns ± 1% 89ns ± 0% -30.04% CastagnoliCrc1KBMisaligned-4 130ns ± 0% 88ns ± 0% -32.65% CastagnoliCrc4KB-4 461ns ± 0% 187ns ± 0% -59.40% CastagnoliCrc4KBMisaligned-4 463ns ± 0% 191ns ± 0% -58.77% CastagnoliCrc32KB-4 3.58µs ± 0% 1.35µs ± 0% -62.22% CastagnoliCrc32KBMisaligned-4 3.58µs ± 0% 1.36µs ± 0% -61.84% name old speed new speed delta CastagnoliCrc15B-4 684MB/s ± 1% 655MB/s ± 0% -4.32% CastagnoliCrc15BMisaligned-4 663MB/s ± 0% 641MB/s ± 0% -3.32% CastagnoliCrc40B-4 1.72GB/s ± 0% 1.67GB/s ± 0% -2.69% CastagnoliCrc40BMisaligned-4 1.58GB/s ± 0% 1.53GB/s ± 0% -2.82% CastagnoliCrc512-4 7.05GB/s ± 0% 9.70GB/s ± 0% +37.59% CastagnoliCrc512Misaligned-4 6.71GB/s ± 1% 9.09GB/s ± 0% +35.43% CastagnoliCrc1KB-4 7.98GB/s ± 1% 11.46GB/s ± 0% +43.55% CastagnoliCrc1KBMisaligned-4 7.86GB/s ± 0% 11.70GB/s ± 0% +48.75% CastagnoliCrc4KB-4 8.87GB/s ± 0% 21.80GB/s ± 0% +145.69% CastagnoliCrc4KBMisaligned-4 8.83GB/s ± 0% 21.39GB/s ± 0% +142.25% CastagnoliCrc32KB-4 9.15GB/s ± 0% 24.22GB/s ± 0% +164.62% CastagnoliCrc32KBMisaligned-4 9.16GB/s ± 0% 24.00GB/s ± 0% +161.94% Fixes #16107. Change-Id: Ibe50ea76574674ce0571ef31c31015e0ed66b907 Reviewed-on: https://go-review.googlesource.com/27931 Run-TryBot: Brad Fitzpatrick <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Brad Fitzpatrick <[email protected]>

ianlancetaylor added this to the Go1.8 milestone Jun 20, 2016

gopherbot closed this as completed in 54d7de7 Aug 27, 2016

This was referenced Aug 28, 2016

hash/crc32: port faster CRC32-C implementation from AMD64 to AMD64P32 #16908

Closed

hash/crc32: clean up logic around tables #16909

Closed

golang locked and limited conversation to collaborators Aug 28, 2017

gopherbot added the FrozenDueToAge label Aug 28, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hash/crc32: use faster method for CRC32-C #16107

hash/crc32: use faster method for CRC32-C #16107

RaduBerinde commented Jun 18, 2016

ianlancetaylor commented Jun 20, 2016

RaduBerinde commented Jun 24, 2016

gopherbot commented Aug 21, 2016

gopherbot commented Aug 28, 2016

hash/crc32: use faster method for CRC32-C #16107

hash/crc32: use faster method for CRC32-C #16107

Comments

RaduBerinde commented Jun 18, 2016

ianlancetaylor commented Jun 20, 2016

RaduBerinde commented Jun 24, 2016

gopherbot commented Aug 21, 2016

gopherbot commented Aug 28, 2016