Initial implementation of object stack allocation #20814
Variables of TYP_STRUCT with non-value class handles represent stack-allocated objects. Temporarily disable promotion of fields of stack-allocated objects.
The feature is off by default; I will check that there are no diffs.
@AndyAyersMS @echesakovMSFT @dotnet/jit-contrib PTAL
@jkotas PTAL at the change in jitinterface.cpp
src/jit/objectalloc.h
Outdated
return false;
}

if (comp->info.compCompHnd->isValueClass(clsHnd))
Since you've already paid the cost to fetch the class attributes above, you can just check for CORINFO_FLG_VALUECLASS here.
Done.
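A minimal sketch of the suggestion above, assuming a made-up `ClassInfoSource` type and made-up `SKETCH_FLG_*` bits (these are not the real JIT/EE interface or the real `CORINFO_FLG_*` values): fetch the attribute word once, then answer each question with a bit test instead of another interface call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits; the real CORINFO_FLG_* values are not reproduced here.
enum : std::uint32_t
{
    SKETCH_FLG_VALUECLASS = 1u << 0,
    SKETCH_FLG_ABSTRACT   = 1u << 1
};

// Hypothetical stand-in for an interface whose queries are comparatively costly.
struct ClassInfoSource
{
    explicit ClassInfoSource(std::uint32_t attribs) : m_attribs(attribs), queryCount(0)
    {
    }

    std::uint32_t GetClassAttribs()
    {
        ++queryCount; // count how often the "expensive" query is made
        return m_attribs;
    }

    std::uint32_t m_attribs;
    int           queryCount;
};

// Query once, then test individual bits on the cached result.
bool PassesAttributeChecks(ClassInfoSource& source)
{
    const std::uint32_t attribs = source.GetClassAttribs();

    if ((attribs & SKETCH_FLG_ABSTRACT) != 0)
    {
        return false;
    }

    // No second interface call is needed to learn whether this is a value class.
    if ((attribs & SKETCH_FLG_VALUECLASS) != 0)
    {
        return false;
    }

    return true;
}
```

Which attribute bits should actually reject a class is a policy decision of the real code; the two checks here only illustrate the pattern.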
src/jit/objectalloc.cpp
Outdated
if (comp->lvaCount > 0)
{
m_EscapingPointers = BitVecOps::MakeEmpty(&m_bitVecTraits);
m_ConnGraphAdjacencyMatrix = new (comp->getAllocator()) BitSetShortLongRep[comp->lvaCount];
It would be good to add a new CMK_ value for ObjectAllocator.
Done.
src/jit/objectalloc.cpp
Outdated
const bool lclVarsOnly = true;
const bool computeStack = true;

comp->fgWalkTreePre(&stmt->gtStmtExpr, BuildConnGraphVisitor, &callbackData, lclVarsOnly, computeStack);
I'd recommend using GenTreeVisitor instead of fgWalk functions (which are wrappers around GenTreeVisitor). It tends to be slightly faster and avoids the hassle with extra static functions, casts to walk data and whatnot.
Thanks for the suggestion. Done.
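To illustrate the trade-off raised above, here is a small sketch contrasting a function-pointer walker that passes state through a `void*` with a visitor class that carries its own state. `Node`, `WalkPre`, `TreeVisitor`, and `SumVisitor` are hypothetical names invented for this example; they are not the JIT's `GenTree`, `fgWalkTreePre`, or `GenTreeVisitor`, and the real classes take more parameters.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal tree node; not the JIT's GenTree.
struct Node
{
    int                value;
    std::vector<Node*> children;
};

// Style 1: function-pointer walker. State travels through an untyped pointer,
// so every callback has to cast it back to the right type.
typedef void (*PreOrderCallback)(Node* node, void* data);

void WalkPre(Node* node, PreOrderCallback callback, void* data)
{
    callback(node, data);
    for (Node* child : node->children)
    {
        WalkPre(child, callback, data);
    }
}

static void SumCallback(Node* node, void* data)
{
    *static_cast<int*>(data) += node->value;
}

// Style 2: visitor base class using static dispatch (CRTP). The derived class
// holds its own typed state and the visit call can be inlined.
template <typename TVisitor>
class TreeVisitor
{
public:
    void Walk(Node* node)
    {
        static_cast<TVisitor*>(this)->PreOrderVisit(node);
        for (Node* child : node->children)
        {
            Walk(child);
        }
    }
};

class SumVisitor : public TreeVisitor<SumVisitor>
{
public:
    int sum = 0;

    void PreOrderVisit(Node* node)
    {
        sum += node->value;
    }
};
```

Both styles visit the same nodes in the same order; the visitor version needs no static helper function and no casts on the walk data.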
{
bool spcOptimizationEnabled = SPCOptimizationsEnabled();

CallTestAndVerifyAllocation(AllocateSimpleClassAndAddFields, 12, !spcOptimizationEnabled);
These allocation amounts are in bytes? If so they may vary by architecture...
No, these are return values from the methods. I'm not checking the exact allocation bytes, just whether heap allocation amount changed.
Glad to see this moving along. Left some notes but nothing that would block merging this.
src/jit/objectalloc.cpp
Outdated
assert(parentStack != nullptr);
int parentIndex = 1;

bool done = false;
Just a thought: I think this code would read a bit better if we inverted the sense of this and had a keepChecking flag instead of a done flag.
Done.
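As a rough illustration of the `keepChecking` shape, the sketch below walks a list of ancestors and keeps going only while a parent merely forwards the value. The `ParentKind` enum, the function name, and the choice of which kinds forward or consume the value are assumptions made for this example, not the logic in objectalloc.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parent kinds, for illustration only.
enum class ParentKind
{
    Comma,   // forwards the value to its own parent
    Cast,    // forwards the value to its own parent
    Compare, // consumes the value without storing it anywhere
    Call     // treated conservatively: the value may escape
};

// ancestors[0] is the immediate parent, ancestors[1] its parent, and so on.
bool MayEscapeViaParents(const std::vector<ParentKind>& ancestors)
{
    bool        mayEscape    = true; // conservative default
    bool        keepChecking = true;
    std::size_t index        = 0;

    while (keepChecking)
    {
        if (index >= ancestors.size())
        {
            // Ran out of parents: nothing consumed the value, so it cannot escape here.
            mayEscape = false;
            break;
        }

        // Assume this parent settles the question unless a case below says otherwise.
        keepChecking = false;

        switch (ancestors[index])
        {
            case ParentKind::Compare:
                mayEscape = false;
                break;

            case ParentKind::Comma:
            case ParentKind::Cast:
                // The value flows through; look at the next ancestor.
                ++index;
                keepChecking = true;
                break;

            case ParentKind::Call:
                // Keep the conservative default.
                break;
        }
    }

    return mayEscape;
}
```

With the flag named this way, only the cases that continue the walk have to mention it, and the loop condition reads positively.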
src/jit/objectalloc.cpp
Outdated
{
if (!objectAllocator->IsLclVarEscaping(lclNum))
{
JITDUMP("V%02u first escapes (2) via [%06u]\n", lclNum, tree->gtTreeID);
Probably don't need the (2) here. Also we seem to use dspTreeID to dump the IDs.
Done.
@@ -2298,7 +2300,8 @@ unsigned CEEInfo::getClassGClayout (CORINFO_CLASS_HANDLE clsHnd, BYTE* gcPtrs)
{
// Get offset into the value class of the first pointer field (includes a +Object)
size_t cbSeriesSize = pByValueSeries->GetSeriesSize() + pMT->GetBaseSize();
size_t cbOffset = pByValueSeries->GetSeriesOffset() - OBJECT_SIZE;
size_t cbSeriesOffset = pByValueSeries->GetSeriesOffset();
This layout is not version resilient. This should have assert and/or throw platform not supported that will make sure that the JIT won't ever get the non-resilient layout when compiling for R2R.
I added an assert here and also in getHeapClassSize.
JIT/EE changes look good to me.
I disabled the feature for x86, which uses a different JIT32_GCENCODER, and disabled the new tests on x86. I don't want to make changes to JIT32_GCENCODER. My plan is to implement retyping of the trees involving pointers to stack-allocated objects so that no special handling in gc encoders is needed.
@AndyAyersMS @echesakov @mikedn @jkotas I addressed all review feedback, PTAL.
LGTM.
Are you going to post the enable diff summary here?
Yes, I'm running the diffs on the latest version, will post when I have the results.
99d7fc4 to b51ff00
Diffs with optimization enabled:
|
A typical diff at allocation site:
becomes
|
The size of the additional initialization in the prologue depends on whether there are other initializations and on the shape of the object.
or
|
@AndyAyersMS I pushed a commit where I changed the type of the tree used to rewrite uses from TYP_BYREF to TYP_I_IMPL. I was hitting this assert with TYP_BYREF: Line 14305 in f72025c
|
Seems reasonable. |
Also add an assert to getHeapClassSize to ensure it's not called in R2R cross-version-bubble.
a28cf6e to d7e47b1
…hout gc fields.
This change implements a conservative flow-insensitive escape analysis and stack allocation of non-box objects without gc fields.
Handling of objects with gc fields, box objects, and fixed-size arrays is future work.
Escape analysis is based on the one described here: https://www.cc.gatech.edu/~harrold/6340/cs6340_fall2009/Readings/choi99escape.pdf
Main limitations of this version of the escape analysis:
1. The analysis is flow-insensitive.
2. The analysis is intra-procedural and only sees the current method and the inlined methods.
3. The analysis assumes that references passed to non-pure-helper calls escape.
4. The analysis assumes that any references assigned to fields of objects escape.
Some of these limitations will be removed in future work.
I started with prior prototypes from @echesakovMSFT and @AndyAyersMS and extended and refactored parts of them. I also added tests for cases that are currently handled or will be handled soon.
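The flow-insensitive analysis described in this commit message can be pictured with the sketch below. It is a simplified model under stated assumptions, not the code in objectalloc.cpp: `EscapeSketch` and its methods are hypothetical names, locals are plain indices, and `std::vector<bool>` stands in for the JIT's bit vectors. An edge from `dst` to `src` is recorded for an assignment `dst = src`, escaping uses are marked directly, and escape information is then propagated along edges to a fixed point without regard to statement order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a flow-insensitive escape analysis.
class EscapeSketch
{
public:
    explicit EscapeSketch(std::size_t localCount)
        : m_adjacency(localCount, std::vector<bool>(localCount, false)), m_escaping(localCount, false)
    {
    }

    // Record `dst = src`: if dst escapes, whatever src refers to escapes too.
    void AddAssignment(std::size_t dst, std::size_t src)
    {
        assert(dst < m_escaping.size() && src < m_escaping.size());
        m_adjacency[dst][src] = true;
    }

    // Record a use that is treated as escaping, for example passing the local
    // to a call or storing it into a field of another object.
    void MarkEscaping(std::size_t lcl)
    {
        assert(lcl < m_escaping.size());
        m_escaping[lcl] = true;
    }

    // Propagate escape information along edges until nothing changes.
    void ComputeClosure()
    {
        const std::size_t count   = m_escaping.size();
        bool              changed = true;

        while (changed)
        {
            changed = false;
            for (std::size_t a = 0; a < count; a++)
            {
                if (!m_escaping[a])
                {
                    continue;
                }
                for (std::size_t b = 0; b < count; b++)
                {
                    if (m_adjacency[a][b] && !m_escaping[b])
                    {
                        m_escaping[b] = true;
                        changed       = true;
                    }
                }
            }
        }
    }

    // Valid only after ComputeClosure has run.
    bool CanStackAllocate(std::size_t lcl) const
    {
        assert(lcl < m_escaping.size());
        return !m_escaping[lcl];
    }

private:
    std::vector<std::vector<bool>> m_adjacency;
    std::vector<bool>              m_escaping;
};
```

Because the result does not depend on the order in which assignments and escaping uses are recorded, a local that escapes on any path is treated as escaping everywhere, which is what makes the analysis conservative.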
d7e47b1 to 4e57598
I verified no diffs with the optimization off.
@dotnet-bot test this
@dotnet-bot test Windows_NT arm64 Cross Debug Innerloop Build
What characteristics must a reference type and its instance have to be allocated on the stack?
@omariom Currently any of the following will block stack allocation:
|
May I ask why the limit is 8Kb (1KB)?
It's a placeholder number for now, we will tune this when we are able to handle more cases.
@erozenfeld Thanks for the quick reply. Just a reminder, struct promotion currently can unwrap struct objs <= 128 bytes on x64. While tuning escape analysis, we may be able to improve the size limitation for struct promotion as well. But I am not sure if there are other concerns (e.g., register pressure).
I would suggest that the tuning of struct promotion heuristics should be considered relatively independent of object stack allocation. And there are certainly other concerns, not limited to register pressure. Struct promotion currently creates a local for each field of a struct, even if only one of them is frequently accessed, and the JIT suffers - both throughput and code quality - when there are too many locals. Prior to bumping the limit we may want to look into the feasibility of independently promoting just the frequently referenced fields. I had thought there was an issue for that, but I can't find it at the moment.
Oh, that sounds pretty cool, thanks!
Why can't a string be stack allocated? Given that they are usually the most commonly allocated object type.
@ayende Stack allocation of constant-sized strings and constant-sized arrays is in the plan in #20253 (although we have to be careful with strings since they may be interned). Array and string allocation representation in the jit is different from the representation of allocations of other objects so this initial implementation doesn't include them. For example, the size of the strings and arrays is determined from constructor arguments and not from the type itself.
Given that an object allocated in a loop is where this would be of most benefit, is there any work planned to allow that?
@YairHalberstadt This is on the list but it's not the highest priority at the moment. If you have examples from real-world code where allocations performed in a loop can and should be moved to the stack please share them.
@erozenfeld This is quite common in Linq code, where I will often call, e.g.:
    using System.Collections.Generic;
    using System.Linq;
    public class C
    {
        public void M(List<int[]> arrays, List<int[]> newArrays)
        {
            foreach(var array in arrays)
            {
                int i = 42;
                newArrays.Add(array.Where(x => x > i).ToArray());
            }
        }
    }
This generates the following:
    public class C
    {
        [CompilerGenerated]
        private sealed class <>c__DisplayClass0_0
        {
            public int i;
            internal bool <M>b__0(int x)
            {
                return x > i;
            }
        }
        public void M(List<int[]> arrays, List<int[]> newArrays)
        {
            List<int[]>.Enumerator enumerator = arrays.GetEnumerator();
            try
            {
                while (enumerator.MoveNext())
                {
                    int[] current = enumerator.Current;
                    <>c__DisplayClass0_0 <>c__DisplayClass0_ = new <>c__DisplayClass0_0();
                    <>c__DisplayClass0_.i = 42;
                    newArrays.Add(current.Where(<>c__DisplayClass0_.<M>b__0).ToArray());
                }
            }
            finally
            {
                ((IDisposable)enumerator).Dispose();
            }
        }
    }
Note that whilst the same display class could theoretically be reused on each iteration of the loop, Roslyn can't know this since it doesn't do escape analysis, and by the time it gets to the JIT, it is probably too late to do that.
}

//------------------------------------------------------------------------
// CanLclVarEscape: Returns true iff local variable can
Typo in "iff", same with other "ifs" in this file :)
"iff" means "if and only if".
oh didn't know that. nevermind then.
f4ed491: Add JitObjectStackAllocation config option.
b19ec61: Allow creation of variables of TYP_STRUCT with non-value class handles.
Variables of TYP_STRUCT with non-value class handles represent stack-allocated objects.
Temporarily disable promotion of fields of stack-allocated objects.
323d3b3: Make getClassGClayout work with class types.
Also add an assert to getHeapClassSize to ensure it's not
called in R2R cross-version-bubble.
d7e47b1: Implement escape analysis and stack allocation of non-box objects without gc fields.
This change implements a conservative flow-insensitive escape analysis and stack allocation
of non-box objects without gc fields.
Handling of objects with gc fields, box objects, and fixed-size arrays is future work.
Escape analysis is based on the one described here: https://www.cc.gatech.edu/~harrold/6340/cs6340_fall2009/Readings/choi99escape.pdf
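Main limitations of this version of the escape analysis:
1. The analysis is flow-insensitive.
2. The analysis is intra-procedural and only sees the current method and the inlined methods.
3. The analysis assumes that references passed to non-pure-helper calls escape.
4. The analysis assumes that any references assigned to fields of objects escape.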
Some of these limitations will be removed in future work.
I started with prior prototypes from @echesakovMSFT and @AndyAyersMS and extended and refactored
parts of them.