-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Move DateTime for Unix to shared partition #22383
Conversation
7bceafe
to
04711bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a small conflict with dotnet/corert#6932. Actually not a conflict, just a duplication of the Interop prototype. Otherwise the approach is similar to what I did in #21848, so I agree with it, but it may have small performance penalty. Likely smaller than on Windows in my benchmarks since the P/Invoke helpers are implemented differently.
It's not a conflict, your change actually makes the unification simpler which is great. I don't know what's the perf difference between fcall and pinvoke for coreclr (hopefully not significant) |
Bad news: It is significant enough that it still matters on hot paths. When benchmarking Good news: It is way less than I expected and it can likely by improved. Furthermore, the numbers could be different on Unix and they are definitely different on 32-bit Windows due to differences in implementation of CORINFO_HELP_INIT_PINVOKE_FRAME. |
I already have the benchmark handy, so I will run it on Mac tomorrow when I get to office. |
This on .NET Core 3.0 Preview 2 runtime:
I don't know if the difference is only the P/Invoke or if there's some other possible explanation for it (eg. System.Private.CoreLib being crossgen-ed for the first benchmark, while the second benchmark running as application code). I'm trying to figure out how to get more detailed picture of what causes it. Technically these native APIs should work with blittable types and there should be no marshalling overhead. Aside from that the differences between P/Invoke and InternalCall that I know about are 1) P/Invoke switches the thread to preemptive GC mode and 2) erects a frame on the stack. I would expect some overhead, but not nearly this big. My previous measurements showed that the overhead on 32-bit Windows was the lowest (~5%) and I expected macOS to be in the same ballpark. |
For completeness (Mono 5.18.0.245 runtime):
|
[DllImport(Libraries.SystemNative, EntryPoint = "SystemNative_GetSystemTimeAsTicks")] | ||
internal static extern long GetSystemTimeAsTicks(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference moving forward is for files like this that already exist in corefx to be moved in a corefx PR to the right shared location, and then have it be mirrored over to coreclr/corert/mono. I realize that adds a small amount of friction, in that there's a small delay, but it's helpful for history, and also for avoiding the duplication that we end up getting when we forget to subsequently move corefx over to the shared one, and we end up with the same file multiple times and corefx not using the same one that coreclr/corert/mono are using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already done that way for this particular one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already done that way for this particular one.
Oh, ok, thanks. I commented on it because it's showing up here as new in the diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that you restated the general guidance because I had to learn it too. The PR has a "conflict" with the file that was moved (the contents are identical, history in CoreFX is preserved).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, this PR just hasn't been rebased yet. Thanks.
|
@stephentoub Do you think that the performance regression that this introduces is acceptable? |
@@ -1115,7 +1115,7 @@ | |||
<Compile Include="$(MSBuildThisFileDirectory)Interop\Unix\System.Native\Interop.GetPwUid.cs" /> | |||
<Compile Include="$(MSBuildThisFileDirectory)Interop\Unix\System.Native\Interop.GetRandomBytes.cs" /> | |||
<Compile Include="$(MSBuildThisFileDirectory)Interop\Unix\System.Native\Interop.GetSystemTimeAsTicks.cs" /> | |||
<Compile Include="$(MSBuildThisFileDirectory)Interop\Unix\System.Native\Interop.GetTimestamp.cs" Condition="'$(FeaturePortableThreadPool)' == 'true'" /> | |||
<Compile Include="$(MSBuildThisFileDirectory)Interop\Unix\System.Native\Interop.GetTimestamp.cs" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really need to remove the condition here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, it was a bad rebase conflict resolution on my side
Am I reading things correctly that the regression is due to trying to share effectively one additional line of code? If so, doesn't seem worth the regression. e.g. we could still share the file but just have: return new DateTime(((ulong)(Interop.Sys.GetSystemTimeAsTicks() + UnixEpochTicks)) | KindUtc); instead be: return new DateTime(((ulong)(GetSystemTimeAsTicks() + UnixEpochTicks)) | KindUtc); and have GetSystemTimeAsTicks be implemented to call Interop.Sys.GetSystemTimeAsTicks on mono/corert and GetSystemTimeAsFileTime in coreclr, or something along those lines, and that would avoid the regression? |
Well, you can read it that way but it's actually about sharing GetSystemTimeAsTicks implementation. I think it'd be beneficial to have a single implementation for PAL outside of VM/runtime. Right now that's achieved by using p-invoke to implementation which lives in CoreFX. I don't know the history behind it but if this is sub-optimal setup performance wise we should perhaps address this in general as I'd be surprised if this method is the only one where it matters. |
That can be shared without regressing coreclr, though, right? The implementation you're trying to share is Interop.Sys.GetSystemTimeAsTicks, so I'm simply suggesting shielding the call to that such that coreclr continues to have its existing faster path.
If there are things we can do to improve P/Invoke overheads, we should. But I suspect that won't happen for 3.0. |
Yes, it would. Same thing could be done for the Windows version of the file. It still bothers me to do that because of inferior P/Invoke performance, but it's improvement over the current state of affairs.
I found one flaw in the conclusions of my benchmarks. On Windows in my PR there was a minor performance improvement due to moving the GetSystemTimeAsFileTime / GetSystemTimePreciseAsFileTime selection to managed code. The JIT tiered recompilation eventually compiles it completely away. The P/Invoke hurts the code path, but this gets a tiny little bit of the performance regression back. |
I was looking into it and it doesn't look easy. Making I was even considering something like let-me-shoot-in-the-foot-and-do-unsafe-PInvoke-optimizations CoreLib-only attribute. A runtime unaware of this attribute would generate standard P/Invoke with frame and GC preemptive mode. CoreCLR could interpret the attribute and treat it more like an internal call and avoid some of the overhead. Not very happy about this solution, but it could be implemented relatively quickly with no impact on code not marked with such special attribute. |
PInvoke setup has to handle PInvokes that take arbitrary amount of time - we do not want to block the GC for arbitrary amount of time. It is about as good as it gets for the general case. We may be able to shave like two instructions out of it if we try, but that won't move the needle. The existing FCall path avoids the PInvoke overhead. It is ok to do that because of we know the GetSystemTimeAsFileTime function should take less than microsecond (blocking GC for a microsecond is ok) and does not do any other problematic actions like taking locks that can deadlock with GC. If we wanted to avoid the PInvoke overhead for PInvokes in the above category, we would need to introduce @filipnavara's let-me-shoot-in-the-foot-and-do-unsafe-PInvoke-optimizations attribute to mark them. The runtime would recognize the attribute and skip the proper PInvoke setup for these. I suspect that it would be another one of those helps-in-microbenchmarks features like
I do not think it would be worth it to do as CoreLib-only feature. The FCalls serve the purpose for CoreLib. I understand that we would be able to share a bit more code via |
The annoying thing about them is that you have to implement the unmanaged code and at the same time you lose the JIT optimizations and portability to other runtimes. For the straight-to-Win/PAL-API calls the fact that you have to use unmanaged code is just a barrier to entry that is much higher than let-me-shoot-in-the-foot-and-do-unsafe-PInvoke-optimizations attribute. It's literally 10x more code. Sure, people won't misuse it like Missing the JIT optimizations is something much more subtle and non-obvious. The implementation of Implementing this correctly in managed code is trivial (static readonly property with assigned initial value computed by a method). Aside from the correctness you also get additional optimization from the tiered compilation where the |
I have picked up simple case intentionally to explore this and will #if-def it out for CoreCLR to land this PR but I think it's worth coming up with a better solution instead of risking different behaviour because the code lives at different places where it should actually be identical. |
Next time maybe call that out in the PR description 😉 |
It'd then never land ;-) |
Sorry for spoiling the fun :) If I didn't try it before on exactly the same piece of code it could have swept through silently. |
I think you're joking 😄 but just in case, let's try to be very upfront about these kinds of things rather than hoping that tradeoffs being made to favor shareability over performance go unnoticed by reviewers. |
yeah, that joke didn't go down well. The intention was not to hide anything and the performance was discussed from the first comment on. I have to admit I am surprised by the perf impact on CoreCLR though and didn't expect that. |
Yep, we would love to implement the whole runtime in C#. We do not have all the right tools for it yet. We have a lot more tools for it compared to few years ago. Note that new tools that help with this are public features (e.g.
|
* Move DateTime for Unix to shared partition * Keep CoreCLR specific implementation Signed-off-by: dotnet-bot <[email protected]>
* Move DateTime for Unix to shared partition * Keep CoreCLR specific implementation Signed-off-by: dotnet-bot <[email protected]>
* Move DateTime for Unix to shared partition * Keep CoreCLR specific implementation Signed-off-by: dotnet-bot <[email protected]>
* Move DateTime for Unix to shared partition * Keep CoreCLR specific implementation Signed-off-by: dotnet-bot <[email protected]>
* Move DateTime for Unix to shared partition * Keep CoreCLR specific implementation Signed-off-by: dotnet-bot <[email protected]>
* Move DateTime for Unix to shared partition * Keep CoreCLR specific implementation Signed-off-by: dotnet-bot <[email protected]>
* Move DateTime for Unix to shared partition * Keep CoreCLR specific implementation Commit migrated from dotnet/coreclr@4c480a5
No description provided.