Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MemoryExtensions.Replace(Span<T>, T, T) implemented #76337

Merged
merged 19 commits into from
Nov 11, 2022

Conversation

gfoidl
Copy link
Member

@gfoidl gfoidl commented Sep 28, 2022

Description

Fixes #75322

Replaces some open coded loops that I found (or that were linked in the issue), and forwared string.Replace(char, char) to this new span-based replace.

Benchmarks

Run on x64 with AVX2.

string.Replace(char, char)

|         Method | Length |      Mean | Ratio |
|--------------- |------- |----------:|------:|
|        Default |      7 |  26.92 ns |  1.00 |
| ViaSpanHelpers |      7 |  23.40 ns |  0.87 |     <- less code in the scalar path
|                |        |           |       |
|        Default |      8 |  29.98 ns |  1.00 |
| ViaSpanHelpers |      8 |  21.30 ns |  0.71 |     <- Vector128<ushort> (= 8 chars) can be used, as opposed to Vector<ushort> (= 16 chars)
|                |        |           |       |
|        Default |     15 |  37.62 ns |  1.00 |
| ViaSpanHelpers |     15 |  22.46 ns |  0.60 |     <- Vector128<ushort> (= 8 chars) can be used, as opposed to Vector<ushort> (= 16 chars)
|                |        |           |       |
|        Default |     16 |  21.11 ns |  1.00 |
| ViaSpanHelpers |     16 |  22.54 ns |  1.07 |     <- the call to the vectorized path, as opposed to vectorized path inlined in string.Replace
|                |        |           |       |
|        Default |     31 |  23.18 ns |  1.00 |
| ViaSpanHelpers |     31 |  23.82 ns |  1.03 |     <- same for the rest, but as longer the length, the less shows the cost of the call
|                |        |           |       |
|        Default |    100 |  33.61 ns |  1.00 |
| ViaSpanHelpers |    100 |  35.31 ns |  1.06 |
|                |        |           |       |
|        Default |    500 |  94.92 ns |  1.00 |
| ViaSpanHelpers |    500 |  96.93 ns |  1.02 |
|                |        |           |       |
|        Default |   1000 | 172.06 ns |  1.00 |
| ViaSpanHelpers |   1000 | 175.28 ns |  1.02 |

Is this slight regression OK?
For me yes, as it avoids code duplication and it's a few ns.

Others

For replacement of the open coded loops I didn't run benchmarks, as in each iteration there is a call to IndexOf.
Now in the worst case there is one call to ReplaceValueTypeVectorized.

@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Sep 28, 2022
@ghost
Copy link

ghost commented Sep 28, 2022

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Fixes #75322

Replaces some open coded loops that I found (or that were linked in the issue), and forwared string.Replace(char, char) to this new span-based replace.

Benchmarks

Run on x64 with AVX2.

string.Replace(char, char)

|         Method | Length |      Mean | Ratio |
|--------------- |------- |----------:|------:|
|        Default |      7 |  26.92 ns |  1.00 |
| ViaSpanHelpers |      7 |  23.40 ns |  0.87 |     <- less code in the scalar path
|                |        |           |       |
|        Default |      8 |  29.98 ns |  1.00 |
| ViaSpanHelpers |      8 |  21.30 ns |  0.71 |     <- Vector128<ushort> (= 8 chars) can be used, as opposed to Vector<ushort> (= 16 chars)
|                |        |           |       |
|        Default |     15 |  37.62 ns |  1.00 |
| ViaSpanHelpers |     15 |  22.46 ns |  0.60 |     <- Vector128<ushort> (= 8 chars) can be used, as opposed to Vector<ushort> (= 16 chars)
|                |        |           |       |
|        Default |     16 |  21.11 ns |  1.00 |
| ViaSpanHelpers |     16 |  22.54 ns |  1.07 |     <- the call to the vectorized path, as opposed to vectorized path inlined in string.Replace
|                |        |           |       |
|        Default |     31 |  23.18 ns |  1.00 |
| ViaSpanHelpers |     31 |  23.82 ns |  1.03 |     <- same for the rest, but as longer the length, the less shows the cost of the call
|                |        |           |       |
|        Default |    100 |  33.61 ns |  1.00 |
| ViaSpanHelpers |    100 |  35.31 ns |  1.06 |
|                |        |           |       |
|        Default |    500 |  94.92 ns |  1.00 |
| ViaSpanHelpers |    500 |  96.93 ns |  1.02 |
|                |        |           |       |
|        Default |   1000 | 172.06 ns |  1.00 |
| ViaSpanHelpers |   1000 | 175.28 ns |  1.02 |

Is this slight regression OK?
For me yes, as it avoids code duplication and it's a few ns.

Others

For replacement of the open coded loops I didn't run benchmarks, as in each iteration there is a call to IndexOf.
Now in the worst case there is one call to ReplaceValueTypeVectorized.

Author: gfoidl
Assignees: -
Labels:

area-System.Memory, new-api-needs-documentation

Milestone: -

Copy link
Member Author

@gfoidl gfoidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes for review.

}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void ReplaceValueType<T>(ref T src, ref T dst, T oldValue, T newValue, nuint length) where T : struct
Copy link
Member Author

@gfoidl gfoidl Sep 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For string.Replace(char, char) to work, the signature has to look like this. Cf. #75322 (comment)

Codegen-wise (at least on windows) the length is passed via stack. This method will be inlined, so no problem and below for the vectorized code-path (not inlined) the cost will be amortized. So I think this isn't a problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my edification, how much worse it it if string.Replace is implemented as a memcpy and then an in-place replace? 2x worse? Less than that? I'm wondering whether the two-arg case is important enough that we should be exposing it publicly, or if the in-place replace is sufficient. We went with only the two-arg case for endianness reversal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change would look like 64c7e8136aad79f074842b3ac3fc95aea78a35f8.

For the benchmarks I just tested the worst-case*, i.e. the first char is a match, so the whole string gets copied and then again from the beginning iterated over to replace -- $O(2n)$ instead of $O(n)$ with the out-of-place approach (current PR).

* string consisting all of -, then string.Replace('-', '+')

Numbers look like (AVX2 machine):

|     Method | Length |      Mean |    Median | Ratio |
|----------- |------- |----------:|----------:|------:|
| OutOfPlace |      7 |  34.65 ns |  34.10 ns |  1.00 |
|    Inplace |      7 |  34.32 ns |  33.12 ns |  1.00 |
|            |        |           |           |       |
| OutOfPlace |      8 |  26.95 ns |  26.08 ns |  1.00 |
|    Inplace |      8 |  27.53 ns |  27.27 ns |  1.01 |
|            |        |           |           |       |
| OutOfPlace |     15 |  24.33 ns |  24.33 ns |  1.00 |
|    Inplace |     15 |  45.03 ns |  43.91 ns |  1.85 |
|            |        |           |           |       |
| OutOfPlace |     16 |  25.90 ns |  24.98 ns |  1.00 |
|    Inplace |     16 |  32.42 ns |  32.27 ns |  1.16 |
|            |        |           |           |       |
| OutOfPlace |     31 |  26.86 ns |  26.69 ns |  1.00 |
|    Inplace |     31 |  36.23 ns |  36.21 ns |  1.35 |
|            |        |           |           |       |
| OutOfPlace |    100 |  37.64 ns |  37.18 ns |  1.00 |
|    Inplace |    100 |  50.73 ns |  50.67 ns |  1.35 |
|            |        |           |           |       |
| OutOfPlace |    500 | 108.17 ns | 107.71 ns |  1.00 |
|    Inplace |    500 | 131.40 ns | 129.90 ns |  1.22 |
|            |        |           |           |       |
| OutOfPlace |   1000 | 192.27 ns | 191.45 ns |  1.00 |
|    Inplace |   1000 | 223.64 ns | 221.66 ns |  1.19 |

For the best-case, i.e. no match, the runtime should be equal, as the IndexOf(oldValue)-scan is the only thing executing.
Then of course there are all variations in between.

PS: numbers are a bit flaky on re-runs, but the given numbers are quite representative (no my machine).

@gfoidl
Copy link
Member Author

gfoidl commented Sep 29, 2022

I see failures from CI like

JIT/opt/Vectorization/UnrollEqualsStartsWIth/UnrollEqualsStartsWIth.sh
Process terminated. Assertion failed.
   at System.SpanHelpers.ReplaceValueType[T](T& src, T& dst, T oldValue, T newValue, UIntPtr length)
   at System.String.Replace(Char oldChar, Char newChar)
   at UnrollEqualsStartsWIth.RunTests(Type type)
   at UnrollEqualsStartsWIth.Main()

This wasn't on my table -- will address them tomorrow. (done with eec7376, found it more quickly than thought initially)

ReplaceValueType is called from string.Replace(char, char) so the Debug.Assert was on wrong position, as at entry to method non accelerated platforms are allowed to call it.
Intentionally leave one iteration off, as the remaining elements are done vectorized anyway. This eliminates the less probable case (cf. dotnet#76337 (comment)) that the last vector is done twice.
@gfoidl gfoidl force-pushed the memoryextensions_replace branch from bc85ce3 to ebcbb9f Compare October 10, 2022 10:18
Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void ReplaceValueType<T>(ref T src, ref T dst, T oldValue, T newValue, nuint length) where T : struct
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my edification, how much worse it it if string.Replace is implemented as a memcpy and then an in-place replace? 2x worse? Less than that? I'm wondering whether the two-arg case is important enough that we should be exposing it publicly, or if the in-place replace is sufficient. We went with only the two-arg case for endianness reversal.

@gfoidl
Copy link
Member Author

gfoidl commented Nov 7, 2022

In CI there are lots of

2022-11-07T19:29:54.8813391Z   Retrying 'FindPackagesByIdAsync' for source 'https://pkgs.dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_packaging/49a1bb2b-12b0-475f-adbd-1560fc76be38/nuget/v3/flat2/system.text.encoding.extensions/index.json'.
2022-11-07T19:29:54.8815846Z   Response status code does not indicate success: 500 (Internal Server Error - Request was blocked due to exceeding usage of resource 'Concurrency' in namespace 'IPAddress'. For more information on why your request was blocked, see the topic "Rate limits" on the Microsoft Web site (https://go.microsoft.com/fwlink/?LinkId=823950). (DevOps Activity ID: 98AA3880-47F6-42F4-923A-307431AC4A03)).
2022-11-07T19:29:59.6386982Z   Retrying 'FindPackagesByIdAsync' for source 'https://pkgs.dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_packaging/1a5f89f6-d8da-4080-b15f-242650c914a8/nuget/v3/flat2/system.reflection.extensions/index.json'.
...
2022-11-07T19:32:12.8485509Z   Response status code does not indicate success: 503 (Service Unavailable - TF10216: Azure DevOps services are currently unavailable. Try again later. Activity Id: e149fcd3-05f1-4295-9808-801bb4d7b9cf (DevOps Activity ID: E149FCD3-05F1-4295-9808-801BB4D7B9CF)).
2022-11-07T19:32:14.7782231Z   Retrying 'FindPackagesByIdAsync' for source 'https://pkgs.dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_packaging/a65e5cb4-26c0-410f-9457-06db3c5254be/nuget/v3/flat2/runtime.linux-arm64.microsoft.netcore.runtime.jit.tools/index.json'.
2022-11-07T19:32:14.7785140Z   Response status code does not indicate success: 503 (Service Unavailable - TF10216: Azure DevOps services are currently unavailable. Try again later. Activity Id: 33ad97cf-a40e-4562-85fa-49f8be050393 (DevOps Activity ID: 33AD97CF-A40E-4562-85FA-49F8BE050393)).
2022-11-07T19:32:14.9826360Z   Failed to download package 'System.Runtime.4.1.0' from 'https://pkgs.dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_packaging/45bacae2-5efb-47c8-91e5-8ec20c22b4f8/nuget/v3/flat2/system.runtime/4.1.0/system.runtime.4.1.0.nupkg'.
...

@adamsitnik
Copy link
Member

In CI there are lots of

I've re-triggered the CI, let's see if it helps.

@stephentoub stephentoub merged commit 005e280 into dotnet:main Nov 11, 2022
@gfoidl gfoidl deleted the memoryextensions_replace branch November 11, 2022 21:17
@ghost ghost locked as resolved and limited conversation to collaborators Dec 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Memory community-contribution Indicates that the PR has been added by a community member new-api-needs-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[API Proposal]: MemoryExtensions.Replace(Span<T>, T, T)
3 participants