-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize XxHash3 #77756
Optimize XxHash3 #77756
Conversation
Tagging subscribers to this area: @dotnet/area-system-io Issue DetailsHey there!
So overall, this PR brings x2 performance improvement on large buffers and lower the performance gap with the native version (~10%). Small note not covered by this PR: while checking with the native version, I noticed that the hash is different (for both the current and optimized version vs native). Will have to dig further why.
|
Another note: When I checked the implementation on a macOS/m1Pro I noticed that the ARM64 implementation is significantly slower than the x86 version (e.g x3). I haven't checked, but I saw on the xxHash repository that they are doing things slightly differently for ARM platform (e.g maybe using aligned loads). Will have to dig this one as well. |
I bet it's because of this: #76641 (comment)
|
src/libraries/System.IO.Hashing/src/System/IO/Hashing/XxHash3.cs
Outdated
Show resolved
Hide resolved
What native implementation are you calling and what are you comparing it against? I'd validated all of the test data against a locally built xxhash.dll calling XXH3_64bits_withSeed. |
src/libraries/System.IO.Hashing/src/System/IO/Hashing/XxHash3.cs
Outdated
Show resolved
Hide resolved
Yeah, you can see the comment I left in the code about it and my question about better ideas for dealing with it: |
Yes, I validated against a locally built xxhash.
The C# bytes are read from the destination span, casting to a So the bytes are actually in reverse order. Is it expected? |
Tests are failing, will dig further into these errors tonight. |
Tagging subscribers to this area: @dotnet/area-system-security, @vcsjones Issue DetailsHey there!
So overall, this PR brings x2 performance improvement on large buffers and lower the performance gap with the native version (~10%). Small note not covered by this PR: while checking with the native version, I noticed that the hash is different (for both the current and optimized version vs native). Will have to dig further why.
|
Commit b72972c should fix the tests failing on the scalar path. |
Not sure I understand the discussion about the endianess (intermixed with whether it should be commented on the XML doc API) but I concur that using Big Endian is really confusing and counter intuitive. Who is going to expect this frankly? Last time I programmed a big endian processor that was 30 years ago with the Motorola 68000 😅 XXHash is following the native endian order (except for 128 bits where they are using 2 64 bits) and I would hope that we could keep the same logic so that is doesn't cause such trouble when trying to match with the C++ implementation. |
I don't see any APIs that are outputting bytes, only numbers. Can you elaborate? |
Sorry, I meant that they output numbers (except for the 128 bits which are 2x64 bit numbers) that thus follow the native endianness. Unless there is something in the .NET HashAlgorithm that mandates that all of them should always emit byte[] in big endian form for any underlying hash algorithm? |
That's where #76279 will come in.
The abstract base type here only has methods which output bytes rather than numbers. The xxhash implementations emit those bytes as big endian because of this: |
Oh, right, I can see it mentioned in the header file here That makes sense then, thanks! |
FWIW, I don't see 50% improvement. On my machine the best I see is ~25%:
Still, a nice bump. I don't love the manual loop unrolling and associated code increase, but it's probably still worth it. Thanks. |
Oh interesting. What's your CPU? |
|
Thanks, I don't have a recent Intel CPU, but I will check on a older Kaby-Lake how it behaves. In the meantime, I received a Windows Dev Kit 2023 with ARM64 and the results are:
Which gives a similar performance boost to x2 times, but it is still 2x slower than the native version. |
Hey there!
This is a PR to optimize XxHash3 for large buffers (> 240 bytes) by trying to avoid loading/storing the accumulators on each 64 bytes but instead doing it after a whole batch.
XXH3Native
is the C++ versionXXXH3
is the current C# versionXXH3Optimized
is this PR C# versionSo overall, this PR brings x2 performance improvement on large buffers and lower the performance gap with the native version (~10%).
Small note not covered by this PR: while checking with the native version, I noticed that the hash is different (for both the current and optimized version vs native). Will have to dig further why.