Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Move DateTime for Unix to shared partition #22383

Merged
merged 4 commits into from
Feb 8, 2019
Merged

Conversation

marek-safar
Copy link

No description provided.

@marek-safar marek-safar force-pushed the shared2 branch 2 times, most recently from 7bceafe to 04711bb Compare February 3, 2019 22:02
Copy link
Member

@filipnavara filipnavara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a small conflict with dotnet/corert#6932. Actually not a conflict, just a duplication of the Interop prototype. Otherwise the approach is similar to what I did in #21848, so I agree with it, but it may have small performance penalty. Likely smaller than on Windows in my benchmarks since the P/Invoke helpers are implemented differently.

@marek-safar
Copy link
Author

It's not a conflict, your change actually makes the unification simpler which is great. I don't know what's the perf difference between fcall and pinvoke for coreclr (hopefully not significant)

@filipnavara
Copy link
Member

I don't know what's the perf difference between fcall and pinvoke for coreclr (hopefully not significant)

Bad news: It is significant enough that it still matters on hot paths. When benchmarking UtcNow on 64-bit Windows it was ~2.5ns per P/Invoke call, or slightly less than 10% performance degradation on the whole API (which is basically dominated by the P/Invoke).

Good news: It is way less than I expected and it can likely by improved. Furthermore, the numbers could be different on Unix and they are definitely different on 32-bit Windows due to differences in implementation of CORINFO_HELP_INIT_PINVOKE_FRAME.

@filipnavara
Copy link
Member

I already have the benchmark handy, so I will run it on Mac tomorrow when I get to office.

@filipnavara
Copy link
Member

filipnavara commented Feb 4, 2019

This on .NET Core 3.0 Preview 2 runtime:

Method Mean Error StdDev
UtcNow [.NET Core 3.0 Preview 2] 33.59 ns 0.0863 ns 0.0720 ns
UtcNow [P/Invoke] 38.80 ns 0.0483 ns 0.0452 ns

I don't know if the difference is only the P/Invoke or if there's some other possible explanation for it (eg. System.Private.CoreLib being crossgen-ed for the first benchmark, while the second benchmark running as application code). I'm trying to figure out how to get more detailed picture of what causes it. Technically these native APIs should work with blittable types and there should be no marshalling overhead. Aside from that the differences between P/Invoke and InternalCall that I know about are 1) P/Invoke switches the thread to preemptive GC mode and 2) erects a frame on the stack. I would expect some overhead, but not nearly this big. My previous measurements showed that the overhead on 32-bit Windows was the lowest (~5%) and I expected macOS to be in the same ballpark.

@filipnavara
Copy link
Member

For completeness (Mono 5.18.0.245 runtime):

Method Mean Error StdDev
UtcNow [Mono 5.18.0.245] 30.61 ns 0.0263 ns 0.0220 ns
UtcNow [P/Invoke] 35.38 ns 0.0869 ns 0.0813 ns

[DllImport(Libraries.SystemNative, EntryPoint = "SystemNative_GetSystemTimeAsTicks")]
internal static extern long GetSystemTimeAsTicks();
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference moving forward is for files like this that already exist in corefx to be moved in a corefx PR to the right shared location, and then have it be mirrored over to coreclr/corert/mono. I realize that adds a small amount of friction, in that there's a small delay, but it's helpful for history, and also for avoiding the duplication that we end up getting when we forget to subsequently move corefx over to the shared one, and we end up with the same file multiple times and corefx not using the same one that coreclr/corert/mono are using.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already done that way for this particular one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already done that way for this particular one.

Oh, ok, thanks. I commented on it because it's showing up here as new in the diff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that you restated the general guidance because I had to learn it too. The PR has a "conflict" with the file that was moved (the contents are identical, history in CoreFX is preserved).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, this PR just hasn't been rebased yet. Thanks.

@jkotas
Copy link
Member

jkotas commented Feb 7, 2019

shared/Interop/Unix/System.Native/Interop.GetSystemTimeAsTicks.cs(9,10): error CS0246: The type or namespace name 'DllImportAttribute' could not be found (are you missing a using directive or an assembly reference?)

@jkotas
Copy link
Member

jkotas commented Feb 7, 2019

@stephentoub Do you think that the performance regression that this introduces is acceptable?

@@ -1115,7 +1115,7 @@
<Compile Include="$(MSBuildThisFileDirectory)Interop\Unix\System.Native\Interop.GetPwUid.cs" />
<Compile Include="$(MSBuildThisFileDirectory)Interop\Unix\System.Native\Interop.GetRandomBytes.cs" />
<Compile Include="$(MSBuildThisFileDirectory)Interop\Unix\System.Native\Interop.GetSystemTimeAsTicks.cs" />
<Compile Include="$(MSBuildThisFileDirectory)Interop\Unix\System.Native\Interop.GetTimestamp.cs" Condition="'$(FeaturePortableThreadPool)' == 'true'" />
<Compile Include="$(MSBuildThisFileDirectory)Interop\Unix\System.Native\Interop.GetTimestamp.cs" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need to remove the condition here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, it was a bad rebase conflict resolution on my side

@stephentoub
Copy link
Member

Do you think that the performance regression that this introduces is acceptable?

Am I reading things correctly that the regression is due to trying to share effectively one additional line of code? If so, doesn't seem worth the regression. e.g. we could still share the file but just have:

return new DateTime(((ulong)(Interop.Sys.GetSystemTimeAsTicks() + UnixEpochTicks)) | KindUtc);

instead be:

return new DateTime(((ulong)(GetSystemTimeAsTicks() + UnixEpochTicks)) | KindUtc);

and have GetSystemTimeAsTicks be implemented to call Interop.Sys.GetSystemTimeAsTicks on mono/corert and GetSystemTimeAsFileTime in coreclr, or something along those lines, and that would avoid the regression?

@marek-safar
Copy link
Author

Am I reading things correctly that the regression is due to trying to share effectively one additional line of code?

Well, you can read it that way but it's actually about sharing GetSystemTimeAsTicks implementation. I think it'd be beneficial to have a single implementation for PAL outside of VM/runtime. Right now that's achieved by using p-invoke to implementation which lives in CoreFX. I don't know the history behind it but if this is sub-optimal setup performance wise we should perhaps address this in general as I'd be surprised if this method is the only one where it matters.

@stephentoub
Copy link
Member

but it's actually about sharing GetSystemTimeAsTicks implementation

That can be shared without regressing coreclr, though, right? The implementation you're trying to share is Interop.Sys.GetSystemTimeAsTicks, so I'm simply suggesting shielding the call to that such that coreclr continues to have its existing faster path.

if this is sub-optimal setup performance wise we should perhaps address this in general

If there are things we can do to improve P/Invoke overheads, we should. But I suspect that won't happen for 3.0.

@filipnavara
Copy link
Member

...or something along those lines, and that would avoid the regression?

Yes, it would. Same thing could be done for the Windows version of the file. It still bothers me to do that because of inferior P/Invoke performance, but it's improvement over the current state of affairs.

My previous measurements showed that the overhead on 32-bit Windows was the lowest (~5%) and I expected macOS to be in the same ballpark.

I found one flaw in the conclusions of my benchmarks. On Windows in my PR there was a minor performance improvement due to moving the GetSystemTimeAsFileTime / GetSystemTimePreciseAsFileTime selection to managed code. The JIT tiered recompilation eventually compiles it completely away. The P/Invoke hurts the code path, but this gets a tiny little bit of the performance regression back.

@filipnavara
Copy link
Member

If there are things we can do to improve P/Invoke overheads, we should. But I suspect that won't happen for 3.0.

I was looking into it and it doesn't look easy. Making CORINFO_HELP_INIT_PINVOKE_FRAME inlinable on 64-bit Windows would likely help on the Windows side of things. The generated x86 code is slightly better than the x64 code in that respect. That won't help on Linux/macOS side though.

I was even considering something like let-me-shoot-in-the-foot-and-do-unsafe-PInvoke-optimizations CoreLib-only attribute. A runtime unaware of this attribute would generate standard P/Invoke with frame and GC preemptive mode. CoreCLR could interpret the attribute and treat it more like an internal call and avoid some of the overhead. Not very happy about this solution, but it could be implemented relatively quickly with no impact on code not marked with such special attribute.

@jkotas
Copy link
Member

jkotas commented Feb 7, 2019

if this is sub-optimal setup performance wise

PInvoke setup has to handle PInvokes that take arbitrary amount of time - we do not want to block the GC for arbitrary amount of time. It is about as good as it gets for the general case. We may be able to shave like two instructions out of it if we try, but that won't move the needle.

The existing FCall path avoids the PInvoke overhead. It is ok to do that because of we know the GetSystemTimeAsFileTime function should take less than microsecond (blocking GC for a microsecond is ok) and does not do any other problematic actions like taking locks that can deadlock with GC.

If we wanted to avoid the PInvoke overhead for PInvokes in the above category, we would need to introduce @filipnavara's let-me-shoot-in-the-foot-and-do-unsafe-PInvoke-optimizations attribute to mark them. The runtime would recognize the attribute and skip the proper PInvoke setup for these. I suspect that it would be another one of those helps-in-microbenchmarks features like AggressiveInlining that folks would use incorrectly half of the time. It would be a proper way to do what https://github.com/dotnet/coreclr/issues/22320 is asking for.

let-me-shoot-in-the-foot-and-do-unsafe-PInvoke-optimizations CoreLib-only attribute.

I do not think it would be worth it to do as CoreLib-only feature. The FCalls serve the purpose for CoreLib. I understand that we would be able to share a bit more code via System.Native if we had this attribute, but it is not worth it just for that. If we do this, it should be proper public facing feature, tested, documented, etc.

@filipnavara
Copy link
Member

The FCalls serve the purpose for CoreLib.

The annoying thing about them is that you have to implement the unmanaged code and at the same time you lose the JIT optimizations and portability to other runtimes.

For the straight-to-Win/PAL-API calls the fact that you have to use unmanaged code is just a barrier to entry that is much higher than let-me-shoot-in-the-foot-and-do-unsafe-PInvoke-optimizations attribute. It's literally 10x more code. Sure, people won't misuse it like AggressiveInlining, but is it worth it? I don't know.

Missing the JIT optimizations is something much more subtle and non-obvious. The implementation of GetSystemTimeAsFileTime is a good example. The unmanaged code has unguarded code for deciding whether GetSystemTimeAsFileTime/GetSystemTimePreciseAsFileTime should be used. It looks like a missing optimization, but in theory it's a bug. The code could be executed in parallel from two different threads. Since the decision to use one or the other is not deterministic each of the threads can come to a different conclusion on which method should be used. In one thread DateTime.UtcNow can return precise value, while on the other it can return the imprecise one. Very unlikely, but this can result in observable behavior where the time flows backwards (if you have a second time source and synchronize the threads very precisely).

Implementing this correctly in managed code is trivial (static readonly property with assigned initial value computed by a method). Aside from the correctness you also get additional optimization from the tiered compilation where the static readonly property is completely eliminated from the code on recompilation. It is just an indirect call in the unmanaged code, but even that has some tiny measurable impact.

@marek-safar
Copy link
Author

I have picked up simple case intentionally to explore this and will #if-def it out for CoreCLR to land this PR but I think it's worth coming up with a better solution instead of risking different behaviour because the code lives at different places where it should actually be identical.

@stephentoub
Copy link
Member

I have picked up simple case intentionally to explore this

Next time maybe call that out in the PR description 😉

@marek-safar
Copy link
Author

Next time maybe call that out in the PR description

It'd then never land ;-)

@filipnavara
Copy link
Member

Next time maybe call that out in the PR description

It'd then never land ;-)

Sorry for spoiling the fun :) If I didn't try it before on exactly the same piece of code it could have swept through silently.

@stephentoub
Copy link
Member

It'd then never land

I think you're joking 😄 but just in case, let's try to be very upfront about these kinds of things rather than hoping that tradeoffs being made to favor shareability over performance go unnoticed by reviewers.

@marek-safar
Copy link
Author

yeah, that joke didn't go down well. The intention was not to hide anything and the performance was discussed from the first comment on. I have to admit I am surprised by the perf impact on CoreCLR though and didn't expect that.

@jkotas
Copy link
Member

jkotas commented Feb 7, 2019

annoying thing about them is that you have to implement the unmanaged code

Yep, we would love to implement the whole runtime in C#. We do not have all the right tools for it yet. We have a lot more tools for it compared to few years ago. Note that new tools that help with this are public features (e.g. ref locals to give one example) that everybody benefits from. It makes the cost/benefit ratio favorable. Similar tools added just for the use of the runtime itself tend to have a lot less favorable cost/benefit ratio (have been there before).

in theory it's a bug. The code could be executed in parallel from two different threads. Since the decision to use one or the other is not deterministic each of the threads can come to a different conclusion on which method should be used

#22466

@jkotas jkotas merged commit 4c480a5 into dotnet:master Feb 8, 2019
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corefx that referenced this pull request Feb 8, 2019
* Move DateTime for Unix to shared partition

* Keep CoreCLR specific implementation

Signed-off-by: dotnet-bot <[email protected]>
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/mono that referenced this pull request Feb 8, 2019
* Move DateTime for Unix to shared partition

* Keep CoreCLR specific implementation

Signed-off-by: dotnet-bot <[email protected]>
marek-safar added a commit to mono/mono that referenced this pull request Feb 8, 2019
* Move DateTime for Unix to shared partition

* Keep CoreCLR specific implementation

Signed-off-by: dotnet-bot <[email protected]>
stephentoub pushed a commit to dotnet/corefx that referenced this pull request Feb 8, 2019
* Move DateTime for Unix to shared partition

* Keep CoreCLR specific implementation

Signed-off-by: dotnet-bot <[email protected]>
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corert that referenced this pull request Feb 8, 2019
* Move DateTime for Unix to shared partition

* Keep CoreCLR specific implementation

Signed-off-by: dotnet-bot <[email protected]>
jkotas pushed a commit to dotnet/corert that referenced this pull request Feb 8, 2019
* Move DateTime for Unix to shared partition

* Keep CoreCLR specific implementation

Signed-off-by: dotnet-bot <[email protected]>
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Move DateTime for Unix to shared partition

* Keep CoreCLR specific implementation


Commit migrated from dotnet/coreclr@4c480a5
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants