Improve hashing and equality allocations/performance #5304

drewnoakes · 2023-07-10T10:47:56Z

Bug

Fixes: NuGet/Home#12746

Regression? Last working version: N/A

Description

Improve the performance of hashing and equality checks.

Create a HashCodeCombiner.AddUnorderedSequence that avoids needing calls to OrderBy that cause buffers to be allocated and sort algorithms to run.
Improve NoAllocEnumerate for IList<T> to no longer allocate a fallback enumerator, and instead enumerate the list directly by index.
Use NoAllocEnumerate in HashCodeCombiner.
Improve EqualityUtility.OrderedEquals when sequence contains a single item, again, avoiding OrderBy.
Fix incorrect calls to HashCodeCombiner.GetHashCode where callers should be using CombinedHash instead. The former boxes the struct in order to call the virtual ValueType.GetHashCode. Convert HashCodeCombiner to a ref struct to prevent this occurring in future.
Fix places where strings were being treated as sequences of characters when adding to a hash.
Remove HashCodeCombiner.CheckInitialized. This method has no practical utility in the existing code. It's possible to define the initial seed in a field initializer in newer versions of C#. Calling it from every method reduces that chance that calls can be inlined, and introduces branches that hurt perf.
Null annotate some APIs.
Add unit test class for HashCodeCombiner.

PR Checklist

`OrderBy` internally allocates storage for the items. That buffer can be avoided when there's only one item.

This avoids a boxed (allocating) call to ValueType.GetHashCode.

Avoids enumerator allocations and improves performance.

This method has no practical utility in the existing code. It's possible to define the initial seed in a field initializer in newer versions of C#. Calling it from every method reduces that chance that calls can be inlined, and introduces branches that hurt perf.

This type is only for short-term use on the stack. Making it a ref struct also prevents accidentally calling `GetHashCode` on the instance rather than `CombinedHash`, as the former would try to box the struct which is not allowed for `ref struct`.

Avoids using a fallback enumerator when the type is found to be `IList<T>` as that type can be enumerated by the struct enumerator directly by index. Also change the representation of the enumerator type from int to enum, to reduce the use of magic numbers.

Use `NoAllocEnumerate` in `foreach`, and implement unordered sequence methods that don't require data to be sorted in order to produce consistent hashes. Add unit tests for HashCodeCombiner to validate these new algorithms, and apply them throughout the solution in places where `GetHashCode` methods were previously allocating buffers and sorting items via `OrderBy`.

nkolev92

LGTM!

nkolev92 · 2023-07-14T05:56:27Z

build/Shared/HashCodeCombiner.cs

        }

+        internal int CombinedHash => _combinedHash.GetHashCode();


I think in general when making refactorings, making minor stylistic changes only makes things harder to review sa every time it is reviewed, people would review something that's a no-op.

From memory the IDE showed a diagnostic suggesting this change. I try to reduce the number of diagnostics in the gutter/margin as part of refactoring/tidy ups.

If this style is not preferred, perhaps remove the diagnostic from the solution in the .editorconfig file. As it stands, it seems like a request to change this.

In #9, we touch on that, https://github.com/NuGet/NuGet.Client/blob/dev/docs/coding-guidelines.md.

The general idea is that it's preferred to use that in new code, but part of the internal feedback for the turnaround of some of these PRs is the sheer number of line changes.

nkolev92 · 2023-07-14T06:12:48Z

Can you please create a matching issue?

Thanks!

drewnoakes · 2023-07-14T07:11:59Z

NuGet/Home#12746

nkolev92 · 2023-07-14T16:46:57Z

The CI is failing due to a machine timeout :(
I retriggered that and hopefully it passes.

@NuGet/nuget-client can someone shepherd this?

Builds on the work started in #5304. Rather than ordering items to add them to the hash code in a stable fashion, we re-use the new `AddUnorderedSequence` method. That method does not need to order its inputs, meaning we can avoid allocating buffers for the inputs, and running sort algorithms on those items. Instead we can just enumerate the items once while computing the hash, keeping running state on the stack.

drewnoakes added 10 commits July 10, 2023 12:13

Add null annotations

c5ddf5e

Don't order a single value

b3ca559

`OrderBy` internally allocates storage for the items. That buffer can be avoided when there's only one item.

Remove redundant null check

cea758d

Use CombinedHash over GetHashCode

751180b

This avoids a boxed (allocating) call to ValueType.GetHashCode.

Treat strings as objects not sequences

c07676b

Avoids enumerator allocations and improves performance.

Remove redundant GetHashCode call

8bcef33

Spelling

77c8e3f

Use consistent type argument name

5484f89

Make HashCodeCombiner a ref struct

e9837a2

This type is only for short-term use on the stack. Making it a ref struct also prevents accidentally calling `GetHashCode` on the instance rather than `CombinedHash`, as the former would try to box the struct which is not allowed for `ref struct`.

drewnoakes requested a review from a team as a code owner July 10, 2023 10:47

drewnoakes changed the title ~~Dev drnoakes hashing and equality performance~~ Improve hashing and equality allocations/performance Jul 10, 2023

drewnoakes added 2 commits July 10, 2023 20:58

drewnoakes force-pushed the dev-drnoakes-hashing-and-equality-performance branch from 648900f to 40d8aff Compare July 10, 2023 10:58

drewnoakes added the Perf label Jul 10, 2023

nkolev92 approved these changes Jul 14, 2023

View reviewed changes

nkolev92 added the Community PRs created by someone not in the NuGet team label Jul 14, 2023

donnie-msft self-assigned this Jul 14, 2023

donnie-msft merged commit 49cd1d8 into dev Jul 17, 2023

donnie-msft deleted the dev-drnoakes-hashing-and-equality-performance branch July 17, 2023 23:29

drewnoakes mentioned this pull request Jul 21, 2023

Remove allocations in hash code computations #5325

Merged

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve hashing and equality allocations/performance #5304

Improve hashing and equality allocations/performance #5304

drewnoakes commented Jul 10, 2023 •

edited

Loading

nkolev92 left a comment

nkolev92 Jul 14, 2023

drewnoakes Jul 14, 2023

nkolev92 Jul 14, 2023

nkolev92 commented Jul 14, 2023

drewnoakes commented Jul 14, 2023

nkolev92 commented Jul 14, 2023

Improve hashing and equality allocations/performance #5304

Improve hashing and equality allocations/performance #5304

Conversation

drewnoakes commented Jul 10, 2023 • edited Loading

Bug

Description

PR Checklist

nkolev92 left a comment

Choose a reason for hiding this comment

nkolev92 Jul 14, 2023

Choose a reason for hiding this comment

drewnoakes Jul 14, 2023

Choose a reason for hiding this comment

nkolev92 Jul 14, 2023

Choose a reason for hiding this comment

nkolev92 commented Jul 14, 2023

drewnoakes commented Jul 14, 2023

nkolev92 commented Jul 14, 2023

drewnoakes commented Jul 10, 2023 •

edited

Loading