IL Generation: struct and struct records methods are slower #5136
Comments
I changed the benchmark and I get something even more disturbing:
BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
Intel Core i7-4650U CPU 1.70GHz (Haswell), 1 CPU, 4 logical and 2 physical cores
.NET Core SDK=2.1.300
[Host] : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT DEBUG
Core : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT
Job=Core Runtime=Core
For all these tests, I'm using a struct that contains a single int, and an operation next that builds a new struct containing an incremented value. The tests with the suffix _f call a function instead of the Next property. The disturbing thing is that for structs and struct records, the property version is 6.5 times slower than the others. |
Here is the code for the different structs:
[<Struct>]
type SingleStruct =
    val value: int
    new(value) = { value = value }
    member x.Next = SingleStruct(x.value + 1)
let snext (x:SingleStruct) = SingleStruct(x.value + 1)
[<Struct>]
type SingleCase = SingleCase of int
    with
    member x.Next = let (SingleCase v) = x in SingleCase(v+1)
let scnext (SingleCase v) = SingleCase(v+1)
[<Struct>]
type SingleRec = { value : int }
    with
    member x.Next = { value = x.value + 1 }
let rnext x = { value = x.value + 1 } |
And the benchmark:
[<CoreJob>]
type Benchm2() =
    [<Benchmark>]
    member __.Struct() =
        let mutable s = SingleStruct(0)
        for _ in 1 .. 100000 do
            s <- s.Next
    [<Benchmark>]
    member __.Struct_f() =
        let mutable s = SingleStruct(0)
        for _ in 1 .. 100000 do
            s <- snext s
    [<Benchmark>]
    member __.Union() =
        let mutable s = SingleCase 0
        for _ in 1 .. 100000 do
            s <- s.Next
    [<Benchmark>]
    member __.Union_f() =
        let mutable s = SingleCase 0
        for _ in 1 .. 100000 do
            s <- scnext s
    [<Benchmark>]
    member __.Rec() =
        let mutable s = { value = 0 }
        for _ in 1 .. 100000 do
            s <- s.Next
    [<Benchmark>]
    member __.Rec_f() =
        let mutable s = { value = 0 }
        for _ in 1 .. 100000 do
            s <- rnext s
    [<Benchmark(Baseline=true)>]
    member __.Int() =
        let mutable s = 0
        for _ in 1 .. 100000 do
            s <- s + 1 |
Several interesting points when looking at the IL:
Here is the inlined part of the loop for SingleStruct when using the .Next property:
and the version when using the function:
Same number of instructions, but with a slightly different layout for the first 3 instructions. The second one gets simplified by the JIT to code equivalent to using an Int32 directly; the first one takes 6.5 times longer. |
@thinkbeforecoding You pass the struct by value to the functions, and I guess it is passed by value to the extension methods too. Fun fact: you can get near native int speed with ref structs (at least with the C# implementation right now). Unfortunately the .NET Core 2.1 F# compiler does not have Span support yet (ref struct shipped with this), but you can try it out with .NET (afaik it shipped with VS 15.8): #4888
using System;
using System.Runtime.CompilerServices;
namespace CSharpStruct
{
    public ref struct CSharpStruct
    {
        private int _value;
        public CSharpStruct(int value)
        {
            _value = value;
        }
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public CSharpStruct Next()
        {
            return new CSharpStruct(_value+1);
        }
    }
}
An example project, based on your code, is attached: |
Strange that you have 1.5x time for Struct Union where I have something closer to 1.0x ?? |
@thinkbeforecoding I guess it's due to the different method alignment and platform ABI (and optimization levels / bugs) for passing structs: https://github.com/dotnet/coreclr/issues/16619 and https://github.com/dotnet/coreclr/issues/6264 My benchmark was run on Ubuntu 16.04 x86_64 & .NET Core 2.1. |
Another run, with a small modification: passing byrefs to the snext and rnext functions makes it a lot slower. At least the ref struct performance is consistent.
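For readers following along, a by-reference variant of rnext might look roughly like the sketch below. The names rnextByRef and runByRef are hypothetical (they are not taken from the attached project), and the sketch assumes F# 4.5 or later for inref support; the actual benchmark may have used a plain byref instead.

```fsharp
[<Struct>]
type SingleRec = { value : int }

// By-value version, as in the original benchmark.
let rnext (x: SingleRec) = { value = x.value + 1 }

// Hypothetical by-reference version: takes a read-only reference (inref)
// to the struct instead of a copy. Assumes F# 4.5 or later.
let rnextByRef (x: inref<SingleRec>) = { value = x.value + 1 }

// Hypothetical driver mirroring the benchmark loop body.
let runByRef (n: int) =
    let mutable s = { value = 0 }
    for _ in 1 .. n do
        // &s takes the address of the mutable local and passes it as inref.
        s <- rnextByRef &s
    s
```

Both versions compute the same result; the thread is only concerned with how the generated IL and the JIT treat the two calling conventions.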
|
I haven't followed the whole discussion but I looked at the IL difference between
Is it just some random alignment problem or is there a genuine issue here? |
This feels related to #2688. Basically I think
and
should both respect the byrefness of
and
They should become:
and
|
@abelbraaksma Ah no, I don't think that's the cause of the difference between Rec and Rec_f after all. It's still a good thing to fix however |
@dsyme The performance overhead seems to be a legitimate issue here. It also seems that a small difference in the IL can make a huge difference in performance. A 5x performance penalty for passing a struct by reference seems just too high to me. Update: It seems that Rec and Rec_fByRef have exactly the same generated bytecode.
Rec:
Rec_f:
Rec_fByRef:
The updated benchmark is here: The performance fluctuation seems too large (~50%) to be explained by method and loop alignment, but I have not investigated it yet. |
Well, it seems that #2688 is very similar to, or the same as, the issue here: fsharp/fslang-design#287 (comment) and the response here: fsharp/fslang-design#287 (comment) "Basically we need to rejig the pattern match compiler in F# so it can accept an LValue as its starting point rather than RValue. It's fairly conceptually straightforward but needs care." |
@zpodlovics Yes, it's the same issue |
https://github.com/Microsoft/visualfsharp/compare/master...dsyme:fix-5136?expand=1 is a quick draft of a possible fix for
This actually fixes the perf problems in this thread (I think the perf improvements are from the second though in other cases the first will help a lot as well)
|
Awesome! |
You probably meant @zpodlovics? Though I'm honored ;) |
Closing as #5148 is merged. |
Doing a diff between the generated code for structs and struct records, I noticed that an extra copy is made in methods using copy and update.
Repro steps
Expected behavior
IL should be the same
Actual behavior
There are a few extra instructions for struct records:
The rest is identical.
Known workarounds
Use struct types.
Related information
Is this needed? The Structure.NextLine seems OK without it.
These lines seem to be generated by a call to the mkAddrGet function (TastOps.fs) for byref value types.