-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Add MemoryExtensions Reverse API to get parity with array #26381
Conversation
@dotnet-bot test Windows x86 Release Build |
Assert.Equal<byte>(expected, actual); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having so many different tests is overkill given the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would guess that these were copied from the Array.Reverse tests and modified to work with Span. If that is the case, then having test parity with Array.Reverse is worthwhile regardless of the simplicity of the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At minimum, I'd remove the big memory tests - we're having trouble keeping the ones we have running as it is. Given that System.Memory seems be a dumping ground and has 37,000+ tests already, I'd favor careful monitoring of code coverage and fewer but better targeted tests.
(btw, if Array.Reverse is now using the simple implementation, I think the same argument can be made for those tests.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would guess that these were copied from the Array.Reverse tests
The original Array.Reverse
had a pretty complex implementation because of it was half-reflection based, with multiple special fast paths implemented in C++ in the runtime.
This one has a very straightforward implementation, so the testing of many different variants is not warranted.
if Array.Reverse is now using the simple implementation
It does not quite - the non-generic one at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would guess that these were copied from the Array.Reverse tests and modified to work with Span.
This one has a very straightforward implementation, so the testing of many different variants is not warranted.
They were not taken from the Array.Reverse tests, but I will take a look at them now and try to simplify/reduce the test set. Are there other tests aside from the ones listed below:
https://github.com/dotnet/coreclr/blob/master/tests/src/CoreMangLib/cti/system/array/arrayreverse1.cs
https://github.com/dotnet/coreclr/blob/master/tests/src/CoreMangLib/cti/system/array/arrayreverse2.cs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main Array.Reverse tests live in CoreFX: https://github.com/dotnet/corefx/blob/master/src/System.Runtime/tests/System/ArrayTests.cs
The tests under https://github.com/dotnet/coreclr/blob/master/tests/src/CoreMangLib/ are not great and redundant with what we have in CoreFX. I hope we will get rid of them: https://github.com/dotnet/coreclr/issues/12782
We have some System.Data.SqlClient tests failing on Windows.10.Amd64.Open / Windows.81.Amd64.Open:
Any ideas why? Here is the filed issue: https://github.com/dotnet/corefx/issues/26382 cc @saurabh500, @corivera |
@ahsonkhan according to @MattGal here, they recently fixed a CI issue that was hiding some failures. My brotli PR is almost all red for the same reason. |
From @saurabh500, #26306 (comment)
|
{ | ||
T temp = Unsafe.Add(ref p, i); | ||
Unsafe.Add(ref p, i) = Unsafe.Add(ref p, j); | ||
Unsafe.Add(ref p, j) = temp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this end up calculating each of the locations twice, or does the JIT see and remove the duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JIT does remove one of them, but not both (see the lea instruction). Although, the calculations are done as part of the mov/lea instructions.
00007ff8`72c60940 488b01 mov rax,qword ptr [rcx] ds:00000008`d577e178=0000018ccee62c88
00007ff8`72c60943 33d2 xor edx,edx
00007ff8`72c60945 8b4908 mov ecx,dword ptr [rcx+8]
00007ff8`72c60948 ffc9 dec ecx
00007ff8`72c6094a 85c9 test ecx,ecx
00007ff8`72c6094c 7e21 jle ConsoleApp16!Program.Reverse[[System.Int32, System.Private.CoreLib]](System.Span`1<Int32>)+0x2f (00007ff8`72c6096f)
00007ff8`72c6094e 4c63c2 movsxd r8,edx
00007ff8`72c60951 468b0c80 mov r9d,dword ptr [rax+r8*4]
00007ff8`72c60955 4e8d0480 lea r8,[rax+r8*4]
00007ff8`72c60959 4c63d1 movsxd r10,ecx
00007ff8`72c6095c 468b1c90 mov r11d,dword ptr [rax+r10*4]
00007ff8`72c60960 458918 mov dword ptr [r8],r11d
00007ff8`72c60963 46890c90 mov dword ptr [rax+r10*4],r9d
00007ff8`72c60967 ffc2 inc edx
00007ff8`72c60969 ffc9 dec ecx
00007ff8`72c6096b 3bd1 cmp edx,ecx
00007ff8`72c6096d 7cdf jl ConsoleApp16!Program.Reverse[[System.Int32, System.Private.CoreLib]](System.Span`1<Int32>)+0xe (00007ff8`72c6094e)
00007ff8`72c6096f c3 ret
But, we could re-write it as follows to produce optimal disassembly (courtesy of Levi):
// Version A
public static unsafe void Reverse<T>(Span<T> span)
{
if (span.Length > 1)
{
ref T p = ref MemoryMarshal.GetReference(span);
IntPtr i = IntPtr.Zero;
IntPtr j = (IntPtr)span.Length - 1;
do
{
var temp1 = Unsafe.Add(ref p, i);
var temp2 = Unsafe.Add(ref p, j);
Unsafe.Add(ref p, i) = temp2;
Unsafe.Add(ref p, j) = temp1;
i += 1;
j -= 1;
} while (i.ToPointer() < j.ToPointer());
}
}
00007fff`98502740 488b01 mov rax,qword ptr [rcx] ds:0000003f`fa97dbb8=0000022fca27ae90
00007fff`98502743 8b5108 mov edx,dword ptr [rcx+8]
00007fff`98502746 83fa01 cmp edx,1
00007fff`98502749 7e25 jle EndianReverse!EndianReverse.Program.Reverse[[System.Byte, System.Private.CoreLib]](System.Span`1<Byte>)+0x30 (00007fff`98502770)
00007fff`9850274b 33c9 xor ecx,ecx
00007fff`9850274d 4863d2 movsxd rdx,edx
00007fff`98502750 48ffca dec rdx
00007fff`98502753 440fb60408 movzx r8d,byte ptr [rax+rcx]
00007fff`98502758 440fb60c10 movzx r9d,byte ptr [rax+rdx]
00007fff`9850275d 44880c08 mov byte ptr [rax+rcx],r9b
00007fff`98502761 44880410 mov byte ptr [rax+rdx],r8b
00007fff`98502765 48ffc1 inc rcx
00007fff`98502768 48ffca dec rdx
00007fff`9850276b 483bca cmp rcx,rdx
00007fff`9850276e 72e3 jb EndianReverse!EndianReverse.Program.Reverse[[System.Byte, System.Private.CoreLib]](System.Span`1<Byte>)+0x13 (00007fff`98502753)
00007fff`98502770 c3 ret
However, for some reason, the following (sub-optimal) implementation ends up doing slightly better in the micro-benchmarks (especially for Guids):
// Version B
public static unsafe void Reverse<T>(this Span<T> span)
{
if (span.Length > 1)
{
ref T p = ref MemoryMarshal.GetReference(span);
IntPtr i = IntPtr.Zero;
IntPtr j = (IntPtr)span.Length - 1;
do
{
T temp = Unsafe.Add(ref p, i);
Unsafe.Add(ref p, i) = Unsafe.Add(ref p, j);
Unsafe.Add(ref p, j) = temp;
i += 1;
j -= 1;
} while (i.ToPointer() < j.ToPointer());
}
}
00007fff`98512740 488b01 mov rax,qword ptr [rcx] ds:000000d5`db37dd08=00000218b916ae90
00007fff`98512743 8b5108 mov edx,dword ptr [rcx+8]
00007fff`98512746 83fa01 cmp edx,1
00007fff`98512749 7e28 jle EndianReverse!EndianReverse.Program.Reverse[[System.Byte, System.Private.CoreLib]](System.Span`1<Byte>)+0x33 (00007fff`98512773)
00007fff`9851274b 33c9 xor ecx,ecx
00007fff`9851274d 4863d2 movsxd rdx,edx
00007fff`98512750 48ffca dec rdx
00007fff`98512753 440fb60408 movzx r8d,byte ptr [rax+rcx]
00007fff`98512758 4c8d0c08 lea r9,[rax+rcx]
00007fff`9851275c 440fb61410 movzx r10d,byte ptr [rax+rdx]
00007fff`98512761 458811 mov byte ptr [r9],r10b
00007fff`98512764 44880410 mov byte ptr [rax+rdx],r8b
00007fff`98512768 48ffc1 inc rcx
00007fff`9851276b 48ffca dec rdx
00007fff`9851276e 483bca cmp rcx,rdx
00007fff`98512771 72e0 jb EndianReverse!EndianReverse.Program.Reverse[[System.Byte, System.Private.CoreLib]](System.Span`1<Byte>)+0x13 (00007fff`98512753)
00007fff`98512773 c3 ret
Should I change it to the version that produces the most optimal codegen? I can't really explain the micro-benchmark results that I am seeing though since they conflict with the disassembly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably the codegen for Guids looks differently different; not just that it has an extra lea =>r9 over the "A" version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would stick with the original, especially if we've used that idiomatic sequence for swapping array elements elsewhere. Best not to have too many variant patterns for the same thing if we can help it.
If you think the codegen for that version can be improved, please open an issue over in CoreCLR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed an issue and keeping the implementation unchanged:
https://github.com/dotnet/coreclr/issues/15937
span = actual; | ||
span.Slice(2, 0).Reverse(); | ||
|
||
Assert.Equal<byte>(expected, span.ToArray()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a test of a empty span
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is testing that reversing an empty slice of a span does not reverse anything in the backing array.
I propose changing the core loop implementation of Original ref T p = ref Unsafe.As<byte, T>(ref array.GetRawSzArrayData());
int i = index;
int j = index + length - 1;
while (i < j)
{
T temp = Unsafe.Add(ref p, i);
Unsafe.Add(ref p, i) = Unsafe.Add(ref p, j);
Unsafe.Add(ref p, j) = temp;
i++;
j--;
} (I've omitted the bounds checks at the beginning of the routine. This routine is compiled for 00007ff8`c205542d 4883c110 add rcx,10h
00007ff8`c2055431 8bc2 mov eax,edx
00007ff8`c2055433 428d5400ff lea edx,[rax+r8-1]
00007ff8`c2055438 3bc2 cmp eax,edx
00007ff8`c205543a 7d21 jge 00007ff8`c205545d ; break if i >= j
00007ff8`c205543c 4c63c0 movsxd r8,eax ; start of loop
00007ff8`c205543f 468b0c81 mov r9d,dword ptr [rcx+r8*4]
00007ff8`c2055443 4e8d0481 lea r8,[rcx+r8*4]
00007ff8`c2055447 4c63d2 movsxd r10,edx
00007ff8`c205544a 468b1c91 mov r11d,dword ptr [rcx+r10*4]
00007ff8`c205544e 458918 mov dword ptr [r8],r11d
00007ff8`c2055451 46890c91 mov dword ptr [rcx+r10*4],r9d
00007ff8`c2055455 ffc0 inc eax
00007ff8`c2055457 ffca dec edx
00007ff8`c2055459 3bc2 cmp eax,edx
00007ff8`c205545b 7cdf jl 00007ff8`c205543c ; end of loop
The main loop is 33 bytes of codegen consisting of 11 instructions. New if (length >= 1)
{
ByReference<T> src = new ByReference<T>(ref Unsafe.Add(ref Unsafe.As<byte, T>(ref array.GetRawSzArrayData()), index));
ByReference<T> dest = new ByReference<T>(ref Unsafe.Add(ref src.Value, (IntPtr)length - 1));
do
{
T temp = src.Value;
src.Value = dest.Value;
dest.Value = temp;
src = new ByReference<T>(ref Unsafe.Add(ref src.Value, 1));
dest = new ByReference<T>(ref Unsafe.Add(ref dest.Value, -1));
} while (Unsafe.IsBefore(ref src.Value, ref dest.Value));
} 00007ff8`c32615ed 4585c0 test r8d,r8d
00007ff8`c32615f0 7e2c jle 00007ff8`c326161e ; break if length <= 1
00007ff8`c32615f2 4883c110 add rcx,10h
00007ff8`c32615f6 4863c2 movsxd rax,edx
00007ff8`c32615f9 488d0c81 lea rcx,[rcx+rax*4]
00007ff8`c32615fd 4963c0 movsxd rax,r8d
00007ff8`c3261600 48ffc8 dec rax
00007ff8`c3261603 488d0481 lea rax,[rcx+rax*4]
00007ff8`c3261607 8b11 mov edx,dword ptr [rcx] ; start of loop
00007ff8`c3261609 448b00 mov r8d,dword ptr [rax]
00007ff8`c326160c 448901 mov dword ptr [rcx],r8d
00007ff8`c326160f 8910 mov dword ptr [rax],edx
00007ff8`c3261611 4883c104 add rcx,4
00007ff8`c3261615 4883c0fc add rax,0FFFFFFFFFFFFFFFCh
00007ff8`c3261619 483bc8 cmp rcx,rax
00007ff8`c326161c 72e9 jb 00007ff8`c3261607 ; end of loop The main loop is 23 bytes of codegen consisting of 8 instructions. The "indirect reference computed from a base and an offset with a known stride length" behavior is replaced with a simple reference. This is accomplished by incrementing / decrementing the
[Intrinsic]
[NonVersionable]
[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal static bool IsBefore<T>(ref T left, ref T right)
{
throw new PlatformNotSupportedException();
// ldarg.0
// ldarg.1
// clt.un
// ret
} This change results in an approx. 25 - 30% speedup on my machine where T's size is <= the native word length (including references). There are diminishing returns if T is a larger struct like |
|
You may want create an API proposal issue to add |
Perhaps the results above can be used as an additional argument to prioritize development of the ref reassignment feature given the large perf gains we can milk from it? |
@agocke @jaredpar When can we expect "ref reassigment" feature to land? We can use it to get a pretty significant perf wins as demonstrated by @GrabYourPitchforks |
API proposal for |
In that case, I will keep the original implementation, as is. |
…efx#26381) * Add MemoryExtensions Reverse API to get parity with array * Fix tests and add large byte array test * Remove some redundant tests and the big memory test. Commit migrated from dotnet/corefx@ac70074
Fixes https://github.com/dotnet/corefx/issues/25850
Implementation is similar to Array.Reverse: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Array.cs#L1565-L1587
cc @jkotas, @KrzysztofCwalina, @dotnet/corefxlab-contrib, @atsushikan