Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve hashing and equality allocations/performance #5304

Merged
merged 12 commits into from
Jul 17, 2023

Conversation

drewnoakes
Copy link
Contributor

@drewnoakes drewnoakes commented Jul 10, 2023

Bug

Fixes: NuGet/Home#12746

Regression? Last working version: N/A

Description

Improve the performance of hashing and equality checks.

  • Create a HashCodeCombiner.AddUnorderedSequence that avoids needing calls to OrderBy that cause buffers to be allocated and sort algorithms to run.
  • Improve NoAllocEnumerate for IList<T> to no longer allocate a fallback enumerator, and instead enumerate the list directly by index.
  • Use NoAllocEnumerate in HashCodeCombiner.
  • Improve EqualityUtility.OrderedEquals when sequence contains a single item, again, avoiding OrderBy.
  • Fix incorrect calls to HashCodeCombiner.GetHashCode where callers should be using CombinedHash instead. The former boxes the struct in order to call the virtual ValueType.GetHashCode. Convert HashCodeCombiner to a ref struct to prevent this occurring in future.
  • Fix places where strings were being treated as sequences of characters when adding to a hash.
  • Remove HashCodeCombiner.CheckInitialized. This method has no practical utility in the existing code. It's possible to define the initial seed in a field initializer in newer versions of C#. Calling it from every method reduces that chance that calls can be inlined, and introduces branches that hurt perf.
  • Null annotate some APIs.
  • Add unit test class for HashCodeCombiner.

PR Checklist

  • PR has a meaningful title

  • PR has a linked issue.

  • Described changes

  • Tests

    • Automated tests added
    • OR
    • Test exception
    • OR
    • N/A
  • Documentation

    • Documentation PR or issue filled
    • OR
    • N/A

`OrderBy` internally allocates storage for the items. That buffer can be avoided when there's only one item.
This avoids a boxed (allocating) call to ValueType.GetHashCode.
Avoids enumerator allocations and improves performance.
This method has no practical utility in the existing code. It's possible to define the initial seed in a field initializer in newer versions of C#. Calling it from every method reduces that chance that calls can be inlined, and introduces branches that hurt perf.
This type is only for short-term use on the stack. Making it a ref struct also prevents accidentally calling `GetHashCode` on the instance rather than `CombinedHash`, as the former would try to box the struct which is not allowed for `ref struct`.
@drewnoakes drewnoakes requested a review from a team as a code owner July 10, 2023 10:47
@drewnoakes drewnoakes changed the title Dev drnoakes hashing and equality performance Improve hashing and equality allocations/performance Jul 10, 2023
Avoids using a fallback enumerator when the type is found to be `IList<T>` as that type can be enumerated by the struct enumerator directly by index.

Also change the representation of the enumerator type from int to enum, to reduce the use of magic numbers.
Use `NoAllocEnumerate` in `foreach`, and implement unordered sequence methods that don't require data to be sorted in order to produce consistent hashes.

Add unit tests for HashCodeCombiner to validate these new algorithms, and apply them throughout the solution in places where `GetHashCode` methods were previously allocating buffers and sorting items via `OrderBy`.
@drewnoakes drewnoakes force-pushed the dev-drnoakes-hashing-and-equality-performance branch from 648900f to 40d8aff Compare July 10, 2023 10:58
Copy link
Member

@nkolev92 nkolev92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

}

internal int CombinedHash => _combinedHash.GetHashCode();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general when making refactorings, making minor stylistic changes only makes things harder to review sa every time it is reviewed, people would review something that's a no-op.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From memory the IDE showed a diagnostic suggesting this change. I try to reduce the number of diagnostics in the gutter/margin as part of refactoring/tidy ups.

If this style is not preferred, perhaps remove the diagnostic from the solution in the .editorconfig file. As it stands, it seems like a request to change this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #9, we touch on that, https://github.com/NuGet/NuGet.Client/blob/dev/docs/coding-guidelines.md.

The general idea is that it's preferred to use that in new code, but part of the internal feedback for the turnaround of some of these PRs is the sheer number of line changes.

@nkolev92
Copy link
Member

Can you please create a matching issue?

Thanks!

@drewnoakes
Copy link
Contributor Author

NuGet/Home#12746

@nkolev92
Copy link
Member

The CI is failing due to a machine timeout :(
I retriggered that and hopefully it passes.

@NuGet/nuget-client can someone shepherd this?

@nkolev92 nkolev92 added the Community PRs created by someone not in the NuGet team label Jul 14, 2023
@donnie-msft donnie-msft self-assigned this Jul 14, 2023
@donnie-msft donnie-msft merged commit 49cd1d8 into dev Jul 17, 2023
@donnie-msft donnie-msft deleted the dev-drnoakes-hashing-and-equality-performance branch July 17, 2023 23:29
drewnoakes added a commit that referenced this pull request Jul 21, 2023
Builds on the work started in #5304.

Rather than ordering items to add them to the hash code in a stable fashion, we re-use the new `AddUnorderedSequence` method. That method does not need to order its inputs, meaning we can avoid allocating buffers for the inputs, and running sort algorithms on those items. Instead we can just enumerate the items once while computing the hash, keeping running state on the stack.
drewnoakes added a commit that referenced this pull request Jul 23, 2023
Builds on the work started in #5304.

Rather than ordering items to add them to the hash code in a stable fashion, we re-use the new `AddUnorderedSequence` method. That method does not need to order its inputs, meaning we can avoid allocating buffers for the inputs, and running sort algorithms on those items. Instead we can just enumerate the items once while computing the hash, keeping running state on the stack.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Community PRs created by someone not in the NuGet team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve hashing and equality allocations/performance
3 participants