Undoing boxing doesn't work with C# 7 pattern matching #10195
The |
Was curious if the prototype stack allocation code (see #4584) could unblock this, but not quite. What the

One might imagine that the devirtualization code could always ask for the unboxed entry if there is one (or the initial devirtualization check with the VM could return a flag saying this is the boxed entry), and then we could update the devirtualized call to invoke the unboxed entry on the internals of the boxed object. We would know at this point that the call could not make the boxed object escape (actually that is more generally true: calls to value class methods can't make their "this" pointers escape). So even if the method did not get inlined, the stack alloc code might kick in and remove the newobj. But we'd still end up making a copy, and it's possible that copy or the payload pointer math might confuse us and block stack allocation.

Still think we are better off pursuing the "Multi-use Box" ideas where, instead of having a Box temp, we forward the Box itself around and refcount it, so that if we manage to knock out enough uses we can remove the newobj that way. In this case that might mean using the single-def local non-null info I've also been prototyping to remove the early null check after the isinst, so that the dual-use box is immediately reduced to single-use and normal devirt + unboxing + inlining would apply. |
@AndyAyersMS this seems more important to optimize as pattern matching becomes more prevalent. |
@davidfowl agreed. Unfortunately it's not easy. We need to do this early in the jit, but the "multi-use" box optimization requires a moderate amount of analysis (in particular: finding all the consumers of the box and verifying that none of them can modify the boxed value), and our powers of deduction early in the jit are limited. |
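To illustrate why the consumers of a multi-use box have to be checked for mutation, here is a minimal sketch. The names ICounter, Counter, MutationDemo and MultiUse are hypothetical and do not come from this thread; the point is only that every use of the pattern variable sees the same box, so a write made through one use is visible to the next.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical interface and struct, for illustration only.
interface ICounter
{
    void Increment();
    int Value { get; }
}

struct Counter : ICounter
{
    private int _value;

    // Mutates the instance it is called on; through the interface that is the boxed copy.
    public void Increment() => _value++;

    public int Value => _value;
}

static class MutationDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int MultiUse<T>(T thing)
    {
        // 'thing' is boxed once; 'c' refers to that single box for all three uses.
        if (thing is ICounter c)
        {
            c.Increment();
            c.Increment();
            return c.Value; // observes both increments because the box is shared
        }
        return -1;
    }
}
```

Calling MutationDemo.MultiUse(new Counter()) returns 2. If each use were rewritten to operate on a fresh copy of the original value, the increments would be lost, which is why an optimization that removes the box presumably needs to prove that no consumer writes to it.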
Suspect we won't get to this in .NET 9, so moving to future. |
Equation changes when there are multiple usages:
[MethodImpl(MethodImplOptions.NoInlining)]
static void Cast<T>(T thing)
{
if (thing is IAnimal)
{
var animal = (IAnimal)thing;
animal.MakeSound();
animal.MakeSound();
}
}
[MethodImpl(MethodImplOptions.NoInlining)]
static void Pattern<T>(T thing)
{
if (thing is IAnimal animal)
{
animal.MakeSound();
animal.MakeSound();
}
}
Now Pattern() has a better perf score. With a loop, codegen becomes identical:
[MethodImpl(MethodImplOptions.NoInlining)]
static void Cast<T>(T thing)
{
if (thing is IAnimal)
{
var animal = ((IAnimal)thing);
for (int i = 0; i < 10_000; i++)
animal.MakeSound();
}
}
[MethodImpl(MethodImplOptions.NoInlining)]
static void Pattern<T>(T thing)
{
if (thing is IAnimal animal)
{
for (int i = 0; i < 10_000; i++)
animal.MakeSound();
}
}
so it's essentially the single-use scenario for which |
Enable object stack allocation for ref classes and extend the support to include boxed value classes. Use a specialized unbox helper for stack allocated boxes, both to avoid apparent escape of the box by the helper, and to ensure all box field accesses are visible to the JIT. Update the local address visitor to rewrite trees involving address of stack allocated boxes in some cases to avoid address exposure. Disable old promotion for stack allocated boxes (since we have no field handles) and allow physical promotion to enregister the box method table and/or payload as appropriate. In OSR methods handle the fact that the stack allocation may actually have been a heap allocation by the Tier0 method.

The analysis TP cost is around 0.4-0.7% (notes below). Boxes are much less likely to escape than ref classes (roughly 90% of boxes escape, 99.8% of ref classes escape). Codegen impact is diminished somewhat because many of the boxes are dead and were already getting optimized away.

Fixes #4584, #9118, #10195, #11192, #53585, #58554, #85570

---------

Co-authored-by: Jakob Botsch Nielsen <[email protected]>
Co-authored-by: Jan Kotas <[email protected]>
Fixed by #103361 |
Short version: I was reading @stephentoub's article Performance Improvements in .NET Core 2.1. I noticed that his example for avoiding boxing allocations thanks to dotnet/coreclr#14698 uses an is check followed by a cast, when in C# 7, the same code could be simplified using pattern matching. So I was wondering if using C# 7 features also results in the same efficient code. It turns out it doesn't and I think this should be improved.
More details:
Consider this code:
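The original listing is not reproduced at this point, so the following is only a minimal sketch of the kind of code being compared. It is assembled from the names mentioned elsewhere in the thread (IAnimal, Dog, MakeSound, Cast, Pattern); the generic signatures follow the later comment, the explicit interface implementation is inferred from the Dog.IAnimal.MakeSound name, and the BoxingDemo class and Sounds counter are hypothetical additions that make the calls observable.

```csharp
using System;
using System.Runtime.CompilerServices;

interface IAnimal
{
    void MakeSound();
}

// A value type: converting it to IAnimal requires boxing.
struct Dog : IAnimal
{
    // Hypothetical counter so the effect of each call can be checked.
    public static int Sounds;

    // Explicit interface implementation (assumed from the Dog.IAnimal.MakeSound name).
    void IAnimal.MakeSound() => Sounds++;
}

static class BoxingDemo
{
    // Type check followed by a cast: the box is created and consumed once.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void Cast<T>(T thing)
    {
        if (thing is IAnimal)
        {
            ((IAnimal)thing).MakeSound();
        }
    }

    // C# 7 pattern matching: the box is stored in a local typed as the interface,
    // then null-checked and used for the call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void Pattern<T>(T thing)
    {
        if (thing is IAnimal animal)
        {
            animal.MakeSound();
        }
    }
}
```

Both BoxingDemo.Cast(new Dog()) and BoxingDemo.Pattern(new Dog()) have the same observable behavior; the difference under discussion is only in how the JIT treats the box.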
The IL for the relevant methods is:
Notice how in Pattern, the boxed object is saved to a local variable (typed as the interface).
The disassembly from .NET Core 2.1.0-preview2-26406-04 win10-x64 is:
Notice how for Cast, almost all the code, including the boxing allocation, is optimized away (the remaining mov seems to be unnecessary, but that's not really relevant here). But for Pattern, all the code is still there, including an allocation and a non-inlined call to Dog.IAnimal.MakeSound.
The two versions of the code do the same thing, so I think they should have comparable performance. Especially since the pattern matching version is more readable and I suspect it's also going to be more common in new code than the other version.
How hard would it be to make this optimization work even in the pattern matching version?
If it would be too hard to perform this optimization in the JIT, is there a reasonable way for the C# compiler to emit IL that would be optimized?
cc (?): @AndyAyersMS, @benaadams, @justinvp
category:cq
theme:importer
skill-level:expert
cost:medium