Reduce worst-case alg complexity of MemoryExtensions.IndexOfAny #53652
Conversation
Tagging subscribers to this area: @GrabYourPitchforks, @carlossanlop

Issue Details

Based on a comment thread at #53115 (comment). In a nutshell, given a large enough needle set, the code pattern below may have O(n^2 * l) complexity, where n is the length of the haystack and l is the number of needles.

Span<T> haystack = GetHaystack();
Span<T> needles = GetNeedles();
while (true)
{
int idxOfFirstNeedle = haystack.IndexOfAny(needles);
if (idxOfFirstNeedle >= 0)
{
DoSomethingWith(haystack.Slice(0, idxOfFirstNeedle));
haystack = haystack.Slice(idxOfFirstNeedle + 1);
}
else
{
DoSomethingWith(haystack); // remainder
break;
}
}

This PR reduces the algorithmic complexity of this slice-and-loop pattern to worst-case O(n * l), which is what most callers probably expect. The downside is a higher constant factor, which will dominate when the haystack is small or when the first needle has a much higher probability of being found than any subsequent needle.

I'm intentionally keeping this PR simple; a future PR can introduce vectorization or any other optimization technique to help get some of this perf back. The unit tests introduced as part of this PR should help catch future regressions in this area.

I don't have perf numbers right now but should be able to share them by the end of the week if there's interest in seeing them. Per investigation in the previously linked thread, I don't expect this to impact typical libraries code, as MemoryExtensions.[Last]IndexOfAny already has logic to handle specific needle counts typical of perf-critical code paths.
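For intuition, here is a minimal sketch of the shape that yields the worst-case O(n * l) bound; the name IndexOfAnySketch is hypothetical, and this is not the actual SpanHelpers.T.cs code. The outer loop walks the haystack once and tests every needle at each position, so each call costs time proportional to the distance to the first match, and the calls in the slice-and-loop pattern sum to O(n * l).

using System;

static int IndexOfAnySketch<T>(ReadOnlySpan<T> haystack, ReadOnlySpan<T> needles)
    where T : IEquatable<T>
{
    // Visit each haystack position once, checking all l needles before moving on.
    for (int i = 0; i < haystack.Length; i++)
    {
        for (int j = 0; j < needles.Length; j++)
        {
            if (haystack[i].Equals(needles[j]))
            {
                return i; // stop at the first matching position
            }
        }
    }
    return -1;
}

By contrast, an implementation that runs a full IndexOf over the remaining haystack once per needle can pay O(m * l) per call, where m is the remaining haystack length, even when the first match is near the start; summed over the slicing loop, that is how the pattern degrades to O(n^2 * l).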
Reviewed files:

src/libraries/System.Memory/tests/Span/IndexOfAny.AlgorithmicComplexity.cs
src/libraries/System.Private.CoreLib/src/System/SpanHelpers.T.cs
Thanks for doing this.
Edit: It's because …
Benchmark results

Full benchmark app: https://github.com/GrabYourPitchforks/ConsoleApplicationBenchmark/blob/471fa6822a5c5e5fecfc14196e54dbfe82a0f20c/ConsoleAppBenchmark/IndexOfAnyRunner.cs

[Benchmark]
public int SliceInALoop()
{
    var haystack = _haystack;
    _ = haystack.Length; // allow JIT to prove not null
    ReadOnlySpan<T> haystackSpan = haystack;

    var needles = _needles;
    _ = needles.Length; // allow JIT to prove not null
    ReadOnlySpan<T> needlesSpan = needles;

    while (true)
    {
        int idx = haystackSpan.IndexOfAny(needlesSpan);
        if (idx < 0)
        {
            return haystackSpan.Length; // length of final slice
        }
        haystackSpan = haystackSpan.Slice(idx + 1);
    }
}
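For reference, a [Benchmark] method like the one above is typically driven with BenchmarkDotNet; the entry point below is a sketch, and the Program class is an assumption rather than part of the linked benchmark app.

using BenchmarkDotNet.Running;

public class Program
{
    // Discovers every benchmark class in this assembly and lets the
    // command line select which benchmarks to run.
    public static void Main(string[] args)
        => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
}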
Discussion

This benchmark tests the slice-and-loop pattern from the PR description. Don't read too deeply into the benchmark showing that the new …
// and when this is called in a loop, we want the entire loop to be bounded by O(n * l)
// rather than O(n^2 * l).

if (typeof(T).IsValueType)
@GrabYourPitchforks, for chars, should we do the same thing string.IndexOfAny does, using ProbabilisticMap? Then presumably string.IndexOfAny could just always delegate to MemoryExtensions.IndexOfAny rather than special-casing specific lengths?
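For readers unfamiliar with it, the idea behind a probabilistic map is a small bitmap filter built once from the needles, so that most non-matching haystack chars are rejected with a couple of bit tests before any full needle scan. The sketch below illustrates that idea only; CharBitmapSketch is a hypothetical type, not the real ProbabilisticMap internals.

using System;

// 256-bit filter over the needles. Queries can return false positives but
// never false negatives, so a "might contain" hit still needs a real
// needles.Contains check before reporting a match.
readonly struct CharBitmapSketch
{
    private readonly uint[] _bits; // 8 * 32 = 256 bits

    public CharBitmapSketch(ReadOnlySpan<char> needles)
    {
        _bits = new uint[8];
        foreach (char c in needles)
        {
            Set((byte)c);            // low byte of every needle
            if (c > 0xFF)
            {
                Set((byte)(c >> 8)); // high byte, for non-Latin-1 needles
            }
        }
    }

    private void Set(byte b) => _bits[b >> 5] |= 1u << (b & 31);
    private bool Test(byte b) => (_bits[b >> 5] & (1u << (b & 31))) != 0;

    // True means "might be a needle"; false means "definitely not a needle".
    public bool MightContain(char c)
        => Test((byte)c) && (c <= 0xFF || Test((byte)(c >> 8)));
}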
It also occurs to me that for really large sets, we could invert the vectorization, and for each character in the haystack, vectorize the searching of the needles. Presumably that would require a relatively large set of input characters, though, e.g. at least 8.
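As a rough sketch of that inversion (illustrative only; IndexOfAnyInverted is a hypothetical helper, and char data would need to be reinterpreted as ushort, e.g. via MemoryMarshal.Cast, since Vector<char> isn't supported):

using System;
using System.Numerics;

static class InvertedSearchSketch
{
    // Vectorize over the needles instead of the haystack: broadcast each
    // haystack element into a vector and compare it against the needles a
    // vector-width at a time. Only worthwhile for fairly large needle sets.
    public static int IndexOfAnyInverted(ReadOnlySpan<ushort> haystack, ReadOnlySpan<ushort> needles)
    {
        for (int i = 0; i < haystack.Length; i++)
        {
            var current = new Vector<ushort>(haystack[i]); // broadcast one element
            int j = 0;
            for (; j <= needles.Length - Vector<ushort>.Count; j += Vector<ushort>.Count)
            {
                if (Vector.EqualsAny(current, new Vector<ushort>(needles.Slice(j))))
                {
                    return i;
                }
            }
            for (; j < needles.Length; j++) // scalar tail for leftover needles
            {
                if (haystack[i] == needles[j])
                {
                    return i;
                }
            }
        }
        return -1;
    }
}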