MemoryExtensions.Replace(Span<T>, T, T) implemented #76337
…and in the scalar loop
Note: This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs: please make sure the API implementation in the src *.cs file is documented with triple-slash comments, so the PR reviewers can sign off on that change.
Tagging subscribers to this area: @dotnet/area-system-memory
Issue Details
| Author: | gfoidl |
|---|---|
| Assignees: | - |
| Labels: | |
| Milestone: | - |
Some notes for review.
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void ReplaceValueType<T>(ref T src, ref T dst, T oldValue, T newValue, nuint length) where T : struct
For `string.Replace(char, char)` to work, the signature has to look like this. Cf. #75322 (comment)
Codegen-wise (at least on Windows) the `length` is passed via the stack. This method will be inlined, so that's no problem, and for the vectorized code path below (not inlined) the cost will be amortized. So I think this isn't a problem.
For my edification, how much worse is it if `string.Replace` is implemented as a memcpy and then an in-place replace? 2x worse? Less than that? I'm wondering whether the two-arg case is important enough that we should be exposing it publicly, or if the in-place replace is sufficient. We went with only the two-arg case for endianness reversal.
The change would look like 64c7e8136aad79f074842b3ac3fc95aea78a35f8.
For the benchmarks I just tested the worst case*, i.e. the first char is a match, so the whole string gets copied and is then iterated over again from the beginning to replace the matches.
* string consisting entirely of `-`, then `string.Replace('-', '+')`
Numbers look like this (AVX2 machine):
| Method | Length | Mean | Median | Ratio |
|----------- |------- |----------:|----------:|------:|
| OutOfPlace | 7 | 34.65 ns | 34.10 ns | 1.00 |
| Inplace | 7 | 34.32 ns | 33.12 ns | 1.00 |
| | | | | |
| OutOfPlace | 8 | 26.95 ns | 26.08 ns | 1.00 |
| Inplace | 8 | 27.53 ns | 27.27 ns | 1.01 |
| | | | | |
| OutOfPlace | 15 | 24.33 ns | 24.33 ns | 1.00 |
| Inplace | 15 | 45.03 ns | 43.91 ns | 1.85 |
| | | | | |
| OutOfPlace | 16 | 25.90 ns | 24.98 ns | 1.00 |
| Inplace | 16 | 32.42 ns | 32.27 ns | 1.16 |
| | | | | |
| OutOfPlace | 31 | 26.86 ns | 26.69 ns | 1.00 |
| Inplace | 31 | 36.23 ns | 36.21 ns | 1.35 |
| | | | | |
| OutOfPlace | 100 | 37.64 ns | 37.18 ns | 1.00 |
| Inplace | 100 | 50.73 ns | 50.67 ns | 1.35 |
| | | | | |
| OutOfPlace | 500 | 108.17 ns | 107.71 ns | 1.00 |
| Inplace | 500 | 131.40 ns | 129.90 ns | 1.22 |
| | | | | |
| OutOfPlace | 1000 | 192.27 ns | 191.45 ns | 1.00 |
| Inplace | 1000 | 223.64 ns | 221.66 ns | 1.19 |
For the best-case, i.e. no match, the runtime should be equal, as the IndexOf(oldValue)-scan is the only thing executing.
Then of course there are all variations in between.
PS: numbers are a bit flaky on re-runs, but the given numbers are quite representative (on my machine).
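To make the two benchmarked variants concrete, here is a minimal sketch of what "OutOfPlace" and "Inplace" could look like. The class and method names are hypothetical and not taken from the PR; the out-of-place variant uses a plain scalar loop rather than the vectorized helper, and the in-place variant assumes a runtime where `MemoryExtensions.Replace(Span<T>, T, T)` is available (.NET 8 or later).

```csharp
using System;

// Hypothetical illustration; names are not from the PR.
public static class ReplaceVariants
{
    // Out-of-place: a single pass reads from the source string and
    // writes the (possibly replaced) chars into the new string.
    public static string OutOfPlace(string s, char oldValue, char newValue) =>
        string.Create(s.Length, (s, oldValue, newValue), static (dst, state) =>
        {
            ReadOnlySpan<char> src = state.s;
            for (int i = 0; i < src.Length; i++)
            {
                char c = src[i];
                dst[i] = c == state.oldValue ? state.newValue : c;
            }
        });

    // In-place: first copy the whole source (memcpy), then make a second
    // pass over the destination and replace within it.
    public static string CopyThenInPlace(string s, char oldValue, char newValue) =>
        string.Create(s.Length, (s, oldValue, newValue), static (dst, state) =>
        {
            state.s.AsSpan().CopyTo(dst);
            // Assumes MemoryExtensions.Replace(Span<T>, T, T), i.e. .NET 8+.
            dst.Replace(state.oldValue, state.newValue);
        });
}
```

Both variants produce the same string; the difference is only that the second one touches the destination memory twice, which is the extra cost the table above tries to quantify.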
I see failures from CI like
This wasn't on my table --
`ReplaceValueType` is called from `string.Replace(char, char)`, so the `Debug.Assert` was in the wrong position: at entry to the method, non-accelerated platforms are allowed to call it.
Intentionally leave one iteration off, as the remaining elements are done vectorized anyway. This eliminates the less probable case (cf. dotnet#76337 (comment)) that the last vector is done twice.
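The "leave one iteration off" idea can be sketched as follows. This is an illustration only: it uses `System.Numerics.Vector<T>` and span slicing for brevity, which is an assumption and not necessarily how the PR's `ReplaceValueType` is written, and the class and helper names are hypothetical. The main loop stops strictly before the start of the last full vector, and that last vector is then processed exactly once, overlapping already-processed elements if the length is not a multiple of the vector width.

```csharp
using System;
using System.Numerics;

// Hypothetical illustration; not the PR's implementation.
public static class VectorizedReplaceSketch
{
    public static void Replace(Span<ushort> span, ushort oldValue, ushort newValue)
    {
        int count = Vector<ushort>.Count;

        // Scalar path for short inputs or when vectors are not accelerated.
        if (!Vector.IsHardwareAccelerated || span.Length < count)
        {
            for (int i = 0; i < span.Length; i++)
            {
                if (span[i] == oldValue) span[i] = newValue;
            }
            return;
        }

        var oldVec = new Vector<ushort>(oldValue);
        var newVec = new Vector<ushort>(newValue);
        int lastStart = span.Length - count; // start of the last full vector

        // Note the strict '<': with '<=' the vector at lastStart would be
        // handled here and then again below when Length is a multiple of count.
        for (int i = 0; i < lastStart; i += count)
        {
            ReplaceOne(span.Slice(i, count), oldVec, newVec);
        }

        // Remaining 1..count elements, done vectorized. This may overlap the
        // previous iteration, which is harmless because replacing is idempotent.
        ReplaceOne(span.Slice(lastStart, count), oldVec, newVec);
    }

    private static void ReplaceOne(Span<ushort> block, Vector<ushort> oldVec, Vector<ushort> newVec)
    {
        var v = new Vector<ushort>(block);
        Vector<ushort> mask = Vector.Equals(v, oldVec);          // all-ones lanes where equal
        Vector.ConditionalSelect(mask, newVec, v).CopyTo(block); // pick newValue in those lanes
    }
}
```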
bc85ce3 to ebcbb9f
Thanks!
In CI there are lots of
I've re-triggered the CI, let's see if it helps.
Description
Fixes #75322
Replaces some open-coded loops that I found (or that were linked in the issue), and forwarded `string.Replace(char, char)` to this new span-based replace.
Benchmarks
Run on x64 with AVX2.
`string.Replace(char, char)`
Is this slight regression OK?
For me yes, as it avoids code duplication and it's a few ns.
Others
For replacement of the open-coded loops I didn't run benchmarks, as in each iteration there is a `call` to `IndexOf`. Now in the worst case there is one `call` to `ReplaceValueTypeVectorized`.
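As a rough illustration of the kind of open-coded loop meant here, the sketch below contrasts a loop that calls `IndexOf` once per match with a single call to the span-based replace. The class and method names are hypothetical, the loop is a generic example rather than one of the actual call sites changed in the PR, and the single-call variant assumes `MemoryExtensions.Replace(Span<T>, T, T)` is available (.NET 8 or later).

```csharp
using System;

// Hypothetical illustration; not one of the PR's actual call sites.
public static class OpenCodedLoopSketch
{
    // Open-coded pattern: every match costs one IndexOf call.
    public static void ReplaceWithIndexOfLoop(Span<char> span, char oldValue, char newValue)
    {
        Span<char> remaining = span;
        int idx;
        while ((idx = remaining.IndexOf(oldValue)) >= 0)
        {
            remaining[idx] = newValue;
            remaining = remaining.Slice(idx + 1); // continue after the match
        }
    }

    // Replacement: one call that handles all matches.
    public static void ReplaceWithSingleCall(Span<char> span, char oldValue, char newValue)
        => span.Replace(oldValue, newValue);
}
```

For an input with many matches the first variant makes one call per match, while the second makes a single call regardless of the number of matches.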