System.IO.Compression.Crc32Helper should use Crc32 intrinsics #40244
Comments
According to #2036, the intrinsics use different polynomials. If I understand correctly, zlib uses 0x04C11DB7 (0xEDB88320 in reversed form), while the SSE4.2 instruction uses 0x1EDC6F41 (CRC-32C). Wikipedia has a large table of these. Is that right?
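To make the polynomial mismatch concrete, here is a minimal sketch in C of a bitwise, reflected CRC parameterized only by the reversed polynomial. The function names (`crc32_reflected`, `crc32_zlib_poly`, `crc32c_castagnoli`) are made up for illustration and are not taken from zlib or the runtime. The constant 0x82F63B78 is the reversed form of 0x1EDC6F41. Real implementations use lookup tables, PCLMULQDQ folding, or hardware instructions instead of a bit loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise, reflected CRC with the usual CRC-32 conventions:
   initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF.
   Illustrative only; this is not how zlib or the runtime computes it. */
static uint32_t crc32_reflected(uint32_t poly_reversed,
                                const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right one bit; fold in the polynomial if a 1 fell out. */
            crc = (crc & 1u) ? (crc >> 1) ^ poly_reversed : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* CRC-32 as used by zlib and ZIP: polynomial 0x04C11DB7, reversed 0xEDB88320. */
static uint32_t crc32_zlib_poly(const unsigned char *data, size_t len)
{
    return crc32_reflected(0xEDB88320u, data, len);
}

/* CRC-32C (Castagnoli), the polynomial behind the SSE4.2 crc32 instruction:
   0x1EDC6F41, reversed 0x82F63B78. */
static uint32_t crc32c_castagnoli(const unsigned char *data, size_t len)
{
    return crc32_reflected(0x82F63B78u, data, len);
}
```

Run on the nine ASCII bytes of "123456789", the two functions should return the standard check values 0xCBF43926 (CRC-32) and 0xE3069283 (CRC-32C). The results differ for the same input, so the SSE4.2 instruction cannot produce the checksum a ZIP entry expects.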
Sigh, it might be; I think you might have to use PCLMULQDQ instead.
Tada! Take what you want and feed back any improvements you can make.
I believe that this can now be done using the vectorized implementation found in System.IO.Hashing (it uses PCLMULQDQ on Intel and equivalents on ARM). My concern is that System.IO.Compression appears to be an always-included part of the framework, while System.IO.Hashing is distributed on NuGet. We'd need to make System.IO.Hashing always available. I'm not sure what the procedures would be for getting such a change approved, or if it's worth it.
Yeah, I don't think we're going to want to pull NonCryptographicHashAlgorithm down into corelib. We could consider alternate ways of sharing the code / implementation with System.IO.Compression, though. Note that the checksum is only used there as part of ZipArchive when writing out an entry. It is, however, a meaningful chunk of the time required to do so, something around 30% for typical use if memory serves.
I did some testing to see if this improvement might be worthwhile. Here are the results I'm seeing on my Intel Windows machine. The test was compressing a single file to a ZipArchive using a similar approach to the other compression tests in the microbenchmarks.

I may be able to squeeze out a bit more by using the static Update method if it's accessible by internalizing Crc32. The biggest risk would be old Intel CPUs that don't support the intrinsics required for vectorization. I would suspect that the zlib scalar implementation may be better for those cases. The upside is we avoid a GC pin and transition and gain vectorization when available.

BenchmarkDotNet=v0.13.2.2052-nightly, OS=Windows 11 (10.0.22621.1635)
PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true IterationTime=250.0000 ms
Thanks, @brantburnett. Just a few percent in throughput isn't worth bringing this functionality down from System.IO.Hashing nor including a large amount of code in System.IO.Compression. Based on these numbers, I'd speculate that the zlib being used already does some amount of vectorization in its crc32 calculation, e.g. the zlib from Intel that .NET uses on Windows does:
Rather than interoping to zlib, Crc32Helper should use the Crc32 intrinsics when they are available on Arm and x86/x64:
runtime/src/libraries/System.IO.Compression/src/System/IO/Compression/Crc32Helper.ZLib.cs
Lines 11 to 26 in 995224d
/cc @tannergooding