-
Notifications
You must be signed in to change notification settings - Fork 2.6k
[skip ci] Update first-class-structs doc #26416
Conversation
@dotnet/jit-contrib @mikedn @tannergooding PTAL |
Normalizing Struct Types | ||
------------------------ | ||
We would like to facilitate full enregistration of structs with the following properties: | ||
1. Its fields are infrequently accessed, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm skeptical that enregistering structs only when fields are infrequently accessed is going to provide a lot of value. If fields aren't accessed then it all mostly resumes to struct copying. So that
x = f.a;
y = f.a;
with CSE and enregistration would hopefully generate just
mov eax, [rsi+8]
mov ebx, eax
Nice but I'm not sure how common such scenarios are.
Ultimately we'll probably need to consider the possibility of LCL_FLD
working on enregistered struct by generating appropriate shift and masking code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue arises when a struct is only passed or returned in registers. We currently don't have a mechanism for keeping such values in a register, unless they're enregisterable structs (i.e. SIMD* today) or have a single field of the same type as the register.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, that case is really ugly today. Speaking of struct return, I also have concerns about the general case where the struct is returned via a buffer and this transform is done pretty early in phase ordering. My initial impression is that it would be better to postpone that to lowering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, though I confess that I've less experience in making changes to the handling of struct returns.
|
||
We propose to create the following additional types to be used where struct types of the given size are | ||
passed and/or returned in registers: | ||
* `TYP_STRUCT1`, `TYP_STRUCT2`, `TYP_STRUCT4`, `TYP_STRUCT8` (on 64-bit systems) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that adding more struct types is a good thing. If anything, there are places in the JIT that have tables indexed by var_types and those tables will grow bigger. In particular, VN has AFAIR a bidimensional array where one of the dimensions is var_types and it's largish enough that adding/remove a type registers in the memory allocation stats.
I'm also not sure how these would interact with ClassLayout
- would we need layout for a node having TYP_STRUCT2
? I'd say yes because discarding layout/handle information has occassionally proved to be problematic. But then we end up with duplicate info - the size is encoded in both node's type and layout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about TYP_STRUCT16
?
The System V ABI defines __int128
as an ABI primitive, and while it is passed in register as if it were struct __int128 { int64_t low, high; };
, it has additional packing/alignment requirements (much like TYP_SIMD16
and TYP_SIMD32
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would envision that these types would be handled, going forward, in the same way that SIMD types are today. Today you can have multiple TYP_SIMD16
values with different class handles. The specialized struct types would only imply size and register file.
It may be the case that this kind of thing could be deferred to register allocation, and dealt with without adding new types, but I think that would add complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that __int128
would be best handled as a special case, since, unlike the other special structs it would not be referenced as a single entity, adding, I think, undue complexity in backend components.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Struct with specific alignment requirements - I guess we can simply add alignment info to ClassLayout
:). Well, adding more info to it would make it bigger but the typical number of layouts a method needs is usually very low. And even if it's not it's still likely to be better than querying the VM multiple times or storing the same info in multiple places (e.g. in fgArgTabEntry
objects).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've checked the memory usage issue I remembered about - it's there. Simply adding these 4 types to the list results in a 0.5% regression in memory usage, mostly thanks to VN. Perhaps there's a way to avoid that but...
At least on Linux x64, more structs can be returned in registers than on Windows x64 - a 3 byte struct is returned in eax
and passed in, say, edi
.
More generally, IMO it should be possible to enregister any structure small enough to fit in a register. You never known when it could be useful (think that for example on ARM you have ubfx
/sbfx
instructions which makes accessing fields of an enregistered structs cheap compared to x64 where you typically need 2 instructions to shift and zero/sign extended).
As such I don't think specific size TYP_STRUCT
types help. It's not going to be very practical to later add even more TYP_STRUCT3
, TYP_STRUCT5
and so on.
I'll have to take a closer look at LSRA code but my expectancy is that for a OBJ
node of suitable size the ref position/interval building code simply builds a ref position that asks for a suitable GPR (or why not, XMM) register and that's all there is to it.
I noticed that LSRA's interval has a RegisterType registerType
where RegisterType
is actually var_types
. I find that a bit strange considering that ultimately there are just GPR and XMM registers and the finer distinction between types (TYP_SHORT
vs. TYP_USHORT
) is irrelevant to LSRA. Even requesting a byte sized registers on x86 isn't done by using TYP_BYTE
but rather by restricting the register candidate set to those that support byte access.
And the funny thing is that LSRA too contributes to the memory regression that occurs when adding more types. It too has arrays indexed by type (e.g. maxSpill
) even though only a couple of entries are actually used.
|
||
These struct-typed nodes are created by the importer, but transformed in morph, and so are not | ||
encountered by most phases of the JIT: | ||
* `GT_INDEX`: This is transformed to a `GT_IND` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be good to mention somewhere the horrible complexity associated with array of structs element access - the IND
must be preserved due to the array info map where it acts as a key. And then the whole element access tree tends to end up wrapped in OBJ(ADDR(...))
. FWIW I'd love to get rid of that. At a minimum we should be able to change the IND
to OBJ
and avoid wrapping. We should also be able to remove the class handle from the array info map and instead used the one provided by OBJ
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add that, though it's been a while since I've run into that complexity.
* Additional child `gtDynamicSize` | ||
* Note that these aren't really struct types; they represent dynamically sized blocks | ||
of arbitrary data. | ||
* For `GT_LCL_VAR` nodes, the shape information is obtained via the `LclVarDsc` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention LCL_FLD
as well? ClassLayout
was started with the goal of allowing LCL_FLD
struct typed nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LclFld with layout support: aa086e4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. That was an accidental omission.
* `GT_IND`: in this case the LHS is expected to provide the struct handle | ||
* Proposed: `GT_IND` would no longer be used for struct types | ||
* `GT_CALL` | ||
* `GT_RET_EXPR` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, GT_RET_EXPR
after morph?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. It should probably be GTK_NOTLIR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and GTK_NOTMIR
if we had such a thing. Lacking that we should probably assert that we don't see it in global morph, it's supposed to be gone after inlining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And apparently the case GT_RET_EXPR
in gtGetStructHandleIfPresent
isn't reached, though it's hard to be 100% sure as there are quite a few callers of gtGetStructHandleIfPresent
.
* `GT_RET_EXPR` | ||
* `GT_LCL_VAR` | ||
* `GT_LCL_FLD` | ||
* Proposed: These would only be used for actual fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to disagree about the "actual fields" part. I don't like field sequences too much but recently I played enough with them and concluded that they work well enough at least for now. That includes NotAField
.
Another thing I'm chasing is the removal of IsLocalAddrExpression checks. They're horrible in many ways, including blocking the removal of ASG nodes. And for that I need that tracked/SSA variables are only those that are always accessed via LCL_VAR
and LCL_FLD
nodes, any indirect access would leave the variable address exposed. To minimize such circumstances I need LCL_FLD
to be flexible and handle all kind of odd scenarios. From what I've seen so far it can do that just fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I think the statement is too strong. What I'm mostly interested in is eliminating the use of GT_LCL_FLD
in situations where it is referring to the entire struct, but by a different type (e.g. as the LHS of an assignment, or as an argument to a call). I hope that we can get to the point where, at least prior to Lowering
, any full reference to a local struct is a GT_LCL_VAR
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the case of accessing an entire var using LCL_FLD
is a bit more weird. Perhaps we can wrap the LCL_VAR
in a BITCAST
or other similar, new node to signal reinterpretation. Or perhaps just allow some level of type mismatching to occur in IR, though that would probably require a bit more care. In any case, I would like to have a way that avoids the use of indirections to perform reinterpretation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the idea would be that the cases where we currently use LCL_FLD
would be either
- handled in
Lowering
viaBITCAST
when they are used to load a struct into a register, or - handled in codegen by deferring the handling of register-sized copies
|
||
* Importer | ||
* Vector types are normalized to the appropriate `TYP_SIMD*` type. Other struct nodes have `TYP_STRUCT`. | ||
* Proposed: all struct-valued nodes that are created with a class handle will retain an index into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit - the ClassLayout index is only needed when there are space constraints (e.g. there's no room in LclFld
for a new pointer but we can fit in a uint16_t
index. It can also be useful in VN. Otherwise using a ClassLayout pointer is preferrable.
* Proposed: This transformation would be made even if it is passed in non-matching | ||
registers. The necessary transformations for correct code generation would be | ||
made in `Lowering`. | ||
* If it is passed by reference, a copy is made, and that copy is passed according to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if copies required by implicit by ref parameters shouldn't be done in lowering. It's not clear if there's any advtange in doing those copies earlier as the required temps are address exposed and thus ignored by optimizer for most purposes. Perhaps we can insert these copies after the liveness pass before LSRA - the last use information in produces should allow the elision of some copies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I'll add it.
* Transform struct values that are passed to or returned from calls by creating one or more of the following: | ||
* `GT_FIELD_LIST` of `GT_LCL_FLD` when the struct is non-enregisterable and is passed in multiple | ||
registers. | ||
* `GT_BITCAST` when a promoted floating point field of a single-field struct is passed in an integer register. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BITCAST
nodes could also be inserted during morph, not clear if it's useful. I suppose some could be CSE candidates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think they would be all that useful as CSE candidates, though I suppose it could potentially eliminate a move between register files. I'd propose that we leave this as proposed, to avoid moving BITCAST
nodes up in the phase ordering. We can change that later if it looks worthwhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In genral, BITCAST
nodes should anyway move up in phase ordering. I already have a change that produces such node early, to avoid blocking variable enregistration due to float/int reinterpretation and it does show some CQ improvements. The only problem I've yet to solve is decomposition of the double/long case on 32 bit targets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with having them appear early, but I'd still prefer that we defer any code that's purely ABI-related to Lowering
.
* Fully promote structs of pointer size or less that have fewer field references than calls | ||
or returns. | ||
* Investigate whether it would be useful to re-type single-field structs, rather than creating new lclVars. | ||
This would complicate type analysis when copied, passed or returned, but would avoid unnecessarily expanding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For passing IMO we should see if we can allow some sort of implicit reinterpretation to occur in IR (e.g. pass TYP_INT to single field struct typed call arg node). Of course, that means that the call arg should no longer attempt to recover the handle from the arg tree but rather keep track of the original type from the call signature. ClassLayout + no-GT_LIST based experiment here: 317ca47
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that idea, though I suspect it would be a fair bit of work to tease out all the dependencies on the args retaining their struct types.
On a related note, the HFA queries seem to result in a lot of JIT/EE traffic when HFA types are in use. It might be nice to cache that information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any problem to include HFA related info in ClassLayout
. In fact the commit mentioned above does just that as an experiment. It doesn't actually cache anything right now but it did move HFA related queries to ClassLayout
just to get an idea how it may look.
|
||
### <a name="Block-Assignments"/>Improve and Simplify Block Assignment Morphing | ||
|
||
* `fgMorphOneAsgBlockOp` should probably be eliminated, and its functionality either moved to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried. Main issue is that the current codegen for unrolled block ops is kind of poor. Codegen fix mostly done here: #21711
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll mention that.
* The cache includes entries for opaque "struct" values of a given size, as well as for | ||
structs with known class handles | ||
|
||
### Defer ABI-specific transformations to `Lowering`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any thoughts about getting in my removal of GT_LIST
node from calls and field lists before doing any more significant work in call morphing and start getting conflicts left and right? 😁 It's nearly done (e.g #26392)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll work on reviewing that today. It's another big one ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, might be a bit too early to do that. I'll have to remove a ton of commits because only the first 10 or so are relevant. The rest are experiments that I wanted to pass through CI testing. Also, the ARM part is missing (I have it locally but it's buggy ATM).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, then I'll have a quick look and you can ping me when you think it's ready for an actual review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry about the confusion. #26392 is actually fine, I was thinking about a different PR that needs cleanup.
I'm going to merge this, further changes can be made as needed. |
Update first-class-structs doc Commit migrated from dotnet/coreclr@930a7b9
No description provided.