-
Notifications
You must be signed in to change notification settings - Fork 701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve hashing and equality allocations/performance #5304
Improve hashing and equality allocations/performance #5304
Conversation
`OrderBy` internally allocates storage for the items. That buffer can be avoided when there's only one item.
This avoids a boxed (allocating) call to ValueType.GetHashCode.
Avoids enumerator allocations and improves performance.
This method has no practical utility in the existing code. It's possible to define the initial seed in a field initializer in newer versions of C#. Calling it from every method reduces that chance that calls can be inlined, and introduces branches that hurt perf.
This type is only for short-term use on the stack. Making it a ref struct also prevents accidentally calling `GetHashCode` on the instance rather than `CombinedHash`, as the former would try to box the struct which is not allowed for `ref struct`.
Avoids using a fallback enumerator when the type is found to be `IList<T>` as that type can be enumerated by the struct enumerator directly by index. Also change the representation of the enumerator type from int to enum, to reduce the use of magic numbers.
Use `NoAllocEnumerate` in `foreach`, and implement unordered sequence methods that don't require data to be sorted in order to produce consistent hashes. Add unit tests for HashCodeCombiner to validate these new algorithms, and apply them throughout the solution in places where `GetHashCode` methods were previously allocating buffers and sorting items via `OrderBy`.
648900f
to
40d8aff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
} | ||
|
||
internal int CombinedHash => _combinedHash.GetHashCode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in general when making refactorings, making minor stylistic changes only makes things harder to review sa every time it is reviewed, people would review something that's a no-op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From memory the IDE showed a diagnostic suggesting this change. I try to reduce the number of diagnostics in the gutter/margin as part of refactoring/tidy ups.
If this style is not preferred, perhaps remove the diagnostic from the solution in the .editorconfig
file. As it stands, it seems like a request to change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #9, we touch on that, https://github.com/NuGet/NuGet.Client/blob/dev/docs/coding-guidelines.md.
The general idea is that it's preferred to use that in new code, but part of the internal feedback for the turnaround of some of these PRs is the sheer number of line changes.
Can you please create a matching issue? Thanks! |
The CI is failing due to a machine timeout :( @NuGet/nuget-client can someone shepherd this? |
Builds on the work started in #5304. Rather than ordering items to add them to the hash code in a stable fashion, we re-use the new `AddUnorderedSequence` method. That method does not need to order its inputs, meaning we can avoid allocating buffers for the inputs, and running sort algorithms on those items. Instead we can just enumerate the items once while computing the hash, keeping running state on the stack.
Builds on the work started in #5304. Rather than ordering items to add them to the hash code in a stable fashion, we re-use the new `AddUnorderedSequence` method. That method does not need to order its inputs, meaning we can avoid allocating buffers for the inputs, and running sort algorithms on those items. Instead we can just enumerate the items once while computing the hash, keeping running state on the stack.
Bug
Fixes: NuGet/Home#12746
Regression? Last working version: N/A
Description
Improve the performance of hashing and equality checks.
HashCodeCombiner.AddUnorderedSequence
that avoids needing calls toOrderBy
that cause buffers to be allocated and sort algorithms to run.NoAllocEnumerate
forIList<T>
to no longer allocate a fallback enumerator, and instead enumerate the list directly by index.NoAllocEnumerate
inHashCodeCombiner
.EqualityUtility.OrderedEquals
when sequence contains a single item, again, avoidingOrderBy
.HashCodeCombiner.GetHashCode
where callers should be usingCombinedHash
instead. The former boxes the struct in order to call the virtualValueType.GetHashCode
. ConvertHashCodeCombiner
to aref struct
to prevent this occurring in future.HashCodeCombiner.CheckInitialized
. This method has no practical utility in the existing code. It's possible to define the initial seed in a field initializer in newer versions of C#. Calling it from every method reduces that chance that calls can be inlined, and introduces branches that hurt perf.HashCodeCombiner
.PR Checklist
PR has a meaningful title
PR has a linked issue.
Described changes
Tests
Documentation