System.IO.Compression.Crc32Helper should use Crc32 intrinsics #40244

Open
benaadams opened this issue Aug 2, 2020 · 7 comments
@benaadams
Member

Rather than interoping to zlib, Crc32Helper should use the Crc32 intrinsics when they are available on Arm and x86/x64:

// Calculate CRC based on the old CRC and the new bytes
public static unsafe uint UpdateCrc32(uint crc32, byte[] buffer, int offset, int length)
{
    Debug.Assert((buffer != null) && (offset >= 0) && (length >= 0) && (offset <= buffer.Length - length));
    fixed (byte* bufferPtr = &buffer[offset])
    {
        return Interop.zlib.crc32(crc32, bufferPtr, length);
    }
}

public static unsafe uint UpdateCrc32(uint crc32, ReadOnlySpan<byte> buffer)
{
    fixed (byte* bufferPtr = &MemoryMarshal.GetReference(buffer))
    {
        return Interop.zlib.crc32(crc32, bufferPtr, buffer.Length);
    }
}
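The incremental pattern above (the previous CRC is fed back in with each call) is the same one zlib itself exposes; a Python sketch using `zlib.crc32`, which wraps the same polynomial, shows that chunked updates match a one-shot computation:

```python
import zlib

data = b"The quick brown fox jumps over the lazy dog"

# One-shot CRC over the whole buffer...
whole = zlib.crc32(data)

# ...equals chunked updates that thread the running CRC through each call,
# which is the contract UpdateCrc32(crc32, buffer, offset, length) relies on.
crc = 0
for i in range(0, len(data), 7):
    crc = zlib.crc32(data[i:i + 7], crc)

assert crc == whole
```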

/cc @tannergooding

@benaadams benaadams added the tenet-performance Performance related issue label Aug 2, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.IO.Compression untriaged New issue has not been triaged by the area owner labels Aug 2, 2020
@danmoseley
Member

According to #2036, the intrinsics use different polynomials. If I understand right, zlib uses 0x04C11DB7/0xEDB88320 and SSE uses 0x1EDC6F41. Wikipedia has a large table.

Is that right?
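The polynomial mismatch is easy to confirm numerically. A bit-at-a-time Python sketch (using the reflected forms: 0xEDB88320 for zlib's CRC-32, 0x82F63B78 for the CRC-32C that the SSE4.2 CRC32 instruction computes) gives different checksums for the standard "123456789" check input:

```python
import binascii

def crc32_bitwise(data: bytes, poly: int) -> int:
    """Reflected bit-at-a-time CRC with the given reversed polynomial."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

check = b"123456789"
crc32  = crc32_bitwise(check, 0xEDB88320)  # zlib / gzip / PNG polynomial
crc32c = crc32_bitwise(check, 0x82F63B78)  # Castagnoli, what SSE4.2 CRC32 computes

assert crc32 == binascii.crc32(check) == 0xCBF43926
assert crc32c == 0xE3069283  # different polynomial, different checksum
```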

@benaadams
Member Author

Sigh, might be; I think you might have to use PCLMULQDQ instead
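As a rough illustration of why PCLMULQDQ helps (plain Python, not the actual SIMD code): the instruction performs carry-less multiplication over GF(2), and CRC folding is built from such multiplications reduced modulo the CRC polynomial, so it works with any polynomial, including zlib's 0xEDB88320, unlike the fixed-polynomial SSE4.2 CRC32 instruction:

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2) polynomial) multiplication: the primitive that
    PCLMULQDQ provides in hardware on 64-bit operands."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition over GF(2) is XOR, so no carries propagate
        a <<= 1
        b >>= 1
    return result

# (x + 1) * (x + 1) = x^2 + 1 over GF(2): 0b11 * 0b11 = 0b101
assert clmul(0b11, 0b11) == 0b101
# (x^2 + 1) * (x + 1) = x^3 + x^2 + x + 1: 0b101 * 0b11 = 0b1111
assert clmul(0b101, 0b11) == 0b1111
```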

@JimBobSquarePants

Tada! Take what you want and feed back any improvements you can make.

https://github.com/SixLabors/ImageSharp/blob/f4f689ce67ecbcc35cebddba5aacb603e6d1068a/src/ImageSharp/Formats/Png/Zlib/Crc32.cs

@danmoseley danmoseley added this to the Future milestone Aug 2, 2020
@carlossanlop carlossanlop removed the untriaged New issue has not been triaged by the area owner label Aug 6, 2020
@brantburnett
Contributor

I believe this can now be done using the vectorized implementation found in System.IO.Hashing (it uses PCLMULQDQ on Intel and equivalents on ARM).

My concern is that System.IO.Compression appears to be an always included part of the framework, while System.IO.Hashing is distributed on NuGet. We'd need to make System.IO.Hashing always available. I'm not sure what the procedures would be for getting such a change approved, or if it's worth it.

@stephentoub
Member

Yeah, I don't think we're going to want to pull NonCryptographicHashAlgorithm down into corelib. We could consider alternate ways of sharing the code / implementation with System.IO.Compression, though. Note that the checksum is only used there as part of ZipArchive when writing out an entry. It is, however, a meaningful chunk of the time required to do so, something around 30% for typical use if memory serves.
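To illustrate where that checksum shows up (a Python analogue via the `zipfile` module, since both implementations follow the ZIP spec): each entry's header stores a CRC-32 of the uncompressed data, computed while the entry is written out:

```python
import io
import zipfile
import zlib

data = b"hello zip" * 1000
buf = io.BytesIO()

with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("entry.txt", data)  # CRC-32 is computed during this write

with zipfile.ZipFile(buf) as archive:
    stored_crc = archive.getinfo("entry.txt").CRC

# The stored header field matches zlib's CRC-32 of the raw bytes.
assert stored_crc == zlib.crc32(data)
```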

@brantburnett
Contributor

I did some testing to see if this improvement might be worthwhile. Here are the results I'm seeing on my Intel Windows machine. The test was compressing a single file to a ZipArchive using a similar approach to the other compression tests in the microbenchmarks. I may be able to squeeze out a bit more by using the static Update method if it's accessible by internalizing Crc32.

The biggest risk would be old Intel CPUs that don't support the intrinsics required for vectorization. I would suspect that the zlib scalar implementation may be better for those cases. The upside is we avoid a GC pin and transition and gain vectorization when available.
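For reference, the scalar fallback on such CPUs is typically the classic Sarwate table-driven algorithm, one lookup per byte; a Python sketch of it, assuming zlib's reflected polynomial:

```python
import zlib

# Precompute the 256-entry lookup table for the reflected polynomial 0xEDB88320.
TABLE = []
for i in range(256):
    c = i
    for _ in range(8):
        c = (c >> 1) ^ (0xEDB88320 if c & 1 else 0)
    TABLE.append(c)

def update_crc32(crc: int, buffer: bytes) -> int:
    """Scalar, table-driven CRC-32 update: one table lookup per input byte."""
    crc ^= 0xFFFFFFFF
    for byte in buffer:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF

# Matches zlib's result and the well-known check value for "123456789".
assert update_crc32(0, b"123456789") == zlib.crc32(b"123456789") == 0xCBF43926
```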

BenchmarkDotNet=v0.13.2.2052-nightly, OS=Windows 11 (10.0.22621.1635)
Intel Core i7-10850H CPU 2.70GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=8.0.100-preview.3.23178.7
[Host] : .NET 8.0.0 (8.0.23.17408), X64 RyuJIT AVX2
Job-DUHTZA : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
Job-MWSTQA : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

| Method   | Job     | file             |       Mean |    Error |   StdDev |     Median |        Min |        Max | Ratio | RatioSD |
|--------- |-------- |----------------- |-----------:|---------:|---------:|-----------:|-----------:|-----------:|------:|--------:|
| Compress | Interop | TestDocument.pdf | 3,102.3 us | 29.86 us | 27.93 us | 3,106.4 us | 3,060.1 us | 3,170.0 us |  1.00 |    0.00 |
| Compress | C#      | TestDocument.pdf | 3,046.6 us | 21.40 us | 20.02 us | 3,045.1 us | 3,016.4 us | 3,080.2 us |  0.98 |    0.01 |
| Compress | Interop | alice29.txt      | 3,978.3 us | 24.40 us | 22.82 us | 3,971.5 us | 3,950.2 us | 4,026.7 us |  1.00 |    0.00 |
| Compress | C#      | alice29.txt      | 3,953.8 us | 33.50 us | 27.98 us | 3,943.8 us | 3,909.5 us | 4,015.5 us |  0.99 |    0.01 |
| Compress | Interop | sum              |   662.0 us | 11.95 us | 11.18 us |   658.1 us |   650.2 us |   684.0 us |  1.00 |    0.00 |
| Compress | C#      | sum              |   639.7 us |  2.26 us |  1.77 us |   640.3 us |   637.1 us |   642.2 us |  0.96 |    0.02 |

@stephentoub
Member

Thanks, @brantburnett. Just a few percent in throughput isn't worth bringing this functionality down from System.IO.Hashing nor including a large amount of code in System.IO.Compression. Based on these numbers, I'd speculate that the zlib being used already does some amount of vectorization in its crc32 calculation, e.g. the zlib from Intel that .NET uses on Windows does:
https://github.com/dotnet/runtime/blob/main/src/native/external/zlib-intel/crc_folding.c
