Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

[skip ci] Update first-class-structs doc #26416

Merged
merged 3 commits into from
Aug 29, 2019

Conversation

CarolEidt
Copy link

No description provided.

@CarolEidt
Copy link
Author

@dotnet/jit-contrib @mikedn @tannergooding PTAL
I haven't updated the sample IR at the end - not sure it's worth the exercise; opinions welcome.
There may be additional issues that could be addressed with these changes; if you know of any please mention them.

Normalizing Struct Types
------------------------
We would like to facilitate full enregistration of structs with the following properties:
1. Its fields are infrequently accessed, and
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skeptical that enregistering structs only when fields are infrequently accessed is going to provide a lot of value. If fields aren't accessed then it all mostly resumes to struct copying. So that

x = f.a;
y = f.a;

with CSE and enregistration would hopefully generate just

mov eax, [rsi+8]
mov ebx, eax

Nice but I'm not sure how common such scenarios are.
Ultimately we'll probably need to consider the possibility of LCL_FLD working on enregistered struct by generating appropriate shift and masking code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue arises when a struct is only passed or returned in registers. We currently don't have a mechanism for keeping such values in a register, unless they're enregisterable structs (i.e. SIMD* today) or have a single field of the same type as the register.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, that case is really ugly today. Speaking of struct return, I also have concerns about the general case where the struct is returned via a buffer and this transform is done pretty early in phase ordering. My initial impression is that it would be better to postpone that to lowering.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, though I confess that I've less experience in making changes to the handling of struct returns.


We propose to create the following additional types to be used where struct types of the given size are
passed and/or returned in registers:
* `TYP_STRUCT1`, `TYP_STRUCT2`, `TYP_STRUCT4`, `TYP_STRUCT8` (on 64-bit systems)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that adding more struct types is a good thing. If anything, there are places in the JIT that have tables indexed by var_types and those tables will grow bigger. In particular, VN has AFAIR a bidimensional array where one of the dimensions is var_types and it's largish enough that adding/remove a type registers in the memory allocation stats.

I'm also not sure how these would interact with ClassLayout - would we need layout for a node having TYP_STRUCT2? I'd say yes because discarding layout/handle information has occassionally proved to be problematic. But then we end up with duplicate info - the size is encoded in both node's type and layout.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about TYP_STRUCT16?

The System V ABI defines __int128 as an ABI primitive, and while it is passed in register as if it were struct __int128 { int64_t low, high; };, it has additional packing/alignment requirements (much like TYP_SIMD16 and TYP_SIMD32).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would envision that these types would be handled, going forward, in the same way that SIMD types are today. Today you can have multiple TYP_SIMD16 values with different class handles. The specialized struct types would only imply size and register file.
It may be the case that this kind of thing could be deferred to register allocation, and dealt with without adding new types, but I think that would add complexity.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that __int128 would be best handled as a special case, since, unlike the other special structs it would not be referenced as a single entity, adding, I think, undue complexity in backend components.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Struct with specific alignment requirements - I guess we can simply add alignment info to ClassLayout :). Well, adding more info to it would make it bigger but the typical number of layouts a method needs is usually very low. And even if it's not it's still likely to be better than querying the VM multiple times or storing the same info in multiple places (e.g. in fgArgTabEntry objects).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked the memory usage issue I remembered about - it's there. Simply adding these 4 types to the list results in a 0.5% regression in memory usage, mostly thanks to VN. Perhaps there's a way to avoid that but...

At least on Linux x64, more structs can be returned in registers than on Windows x64 - a 3 byte struct is returned in eax and passed in, say, edi.

More generally, IMO it should be possible to enregister any structure small enough to fit in a register. You never known when it could be useful (think that for example on ARM you have ubfx/sbfx instructions which makes accessing fields of an enregistered structs cheap compared to x64 where you typically need 2 instructions to shift and zero/sign extended).

As such I don't think specific size TYP_STRUCT types help. It's not going to be very practical to later add even more TYP_STRUCT3, TYP_STRUCT5 and so on.

I'll have to take a closer look at LSRA code but my expectancy is that for a OBJ node of suitable size the ref position/interval building code simply builds a ref position that asks for a suitable GPR (or why not, XMM) register and that's all there is to it.

I noticed that LSRA's interval has a RegisterType registerType where RegisterType is actually var_types. I find that a bit strange considering that ultimately there are just GPR and XMM registers and the finer distinction between types (TYP_SHORT vs. TYP_USHORT) is irrelevant to LSRA. Even requesting a byte sized registers on x86 isn't done by using TYP_BYTE but rather by restricting the register candidate set to those that support byte access.

And the funny thing is that LSRA too contributes to the memory regression that occurs when adding more types. It too has arrays indexed by type (e.g. maxSpill) even though only a couple of entries are actually used.


These struct-typed nodes are created by the importer, but transformed in morph, and so are not
encountered by most phases of the JIT:
* `GT_INDEX`: This is transformed to a `GT_IND`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be good to mention somewhere the horrible complexity associated with array of structs element access - the IND must be preserved due to the array info map where it acts as a key. And then the whole element access tree tends to end up wrapped in OBJ(ADDR(...)). FWIW I'd love to get rid of that. At a minimum we should be able to change the IND to OBJ and avoid wrapping. We should also be able to remove the class handle from the array info map and instead used the one provided by OBJ.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add that, though it's been a while since I've run into that complexity.

* Additional child `gtDynamicSize`
* Note that these aren't really struct types; they represent dynamically sized blocks
of arbitrary data.
* For `GT_LCL_VAR` nodes, the shape information is obtained via the `LclVarDsc`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention LCL_FLD as well? ClassLayout was started with the goal of allowing LCL_FLD struct typed nodes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LclFld with layout support: aa086e4

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. That was an accidental omission.

* `GT_IND`: in this case the LHS is expected to provide the struct handle
* Proposed: `GT_IND` would no longer be used for struct types
* `GT_CALL`
* `GT_RET_EXPR`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, GT_RET_EXPR after morph?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. It should probably be GTK_NOTLIR

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and GTK_NOTMIR if we had such a thing. Lacking that we should probably assert that we don't see it in global morph, it's supposed to be gone after inlining.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And apparently the case GT_RET_EXPR in gtGetStructHandleIfPresent isn't reached, though it's hard to be 100% sure as there are quite a few callers of gtGetStructHandleIfPresent.

* `GT_RET_EXPR`
* `GT_LCL_VAR`
* `GT_LCL_FLD`
* Proposed: These would only be used for actual fields.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to disagree about the "actual fields" part. I don't like field sequences too much but recently I played enough with them and concluded that they work well enough at least for now. That includes NotAField.

Another thing I'm chasing is the removal of IsLocalAddrExpression checks. They're horrible in many ways, including blocking the removal of ASG nodes. And for that I need that tracked/SSA variables are only those that are always accessed via LCL_VAR and LCL_FLD nodes, any indirect access would leave the variable address exposed. To minimize such circumstances I need LCL_FLD to be flexible and handle all kind of odd scenarios. From what I've seen so far it can do that just fine.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I think the statement is too strong. What I'm mostly interested in is eliminating the use of GT_LCL_FLD in situations where it is referring to the entire struct, but by a different type (e.g. as the LHS of an assignment, or as an argument to a call). I hope that we can get to the point where, at least prior to Lowering, any full reference to a local struct is a GT_LCL_VAR.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the case of accessing an entire var using LCL_FLD is a bit more weird. Perhaps we can wrap the LCL_VAR in a BITCAST or other similar, new node to signal reinterpretation. Or perhaps just allow some level of type mismatching to occur in IR, though that would probably require a bit more care. In any case, I would like to have a way that avoids the use of indirections to perform reinterpretation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the idea would be that the cases where we currently use LCL_FLD would be either

  • handled in Lowering via BITCAST when they are used to load a struct into a register, or
  • handled in codegen by deferring the handling of register-sized copies


* Importer
* Vector types are normalized to the appropriate `TYP_SIMD*` type. Other struct nodes have `TYP_STRUCT`.
* Proposed: all struct-valued nodes that are created with a class handle will retain an index into
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit - the ClassLayout index is only needed when there are space constraints (e.g. there's no room in LclFld for a new pointer but we can fit in a uint16_t index. It can also be useful in VN. Otherwise using a ClassLayout pointer is preferrable.

* Proposed: This transformation would be made even if it is passed in non-matching
registers. The necessary transformations for correct code generation would be
made in `Lowering`.
* If it is passed by reference, a copy is made, and that copy is passed according to
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if copies required by implicit by ref parameters shouldn't be done in lowering. It's not clear if there's any advtange in doing those copies earlier as the required temps are address exposed and thus ignored by optimizer for most purposes. Perhaps we can insert these copies after the liveness pass before LSRA - the last use information in produces should allow the elision of some copies.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I'll add it.

* Transform struct values that are passed to or returned from calls by creating one or more of the following:
* `GT_FIELD_LIST` of `GT_LCL_FLD` when the struct is non-enregisterable and is passed in multiple
registers.
* `GT_BITCAST` when a promoted floating point field of a single-field struct is passed in an integer register.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BITCAST nodes could also be inserted during morph, not clear if it's useful. I suppose some could be CSE candidates.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think they would be all that useful as CSE candidates, though I suppose it could potentially eliminate a move between register files. I'd propose that we leave this as proposed, to avoid moving BITCAST nodes up in the phase ordering. We can change that later if it looks worthwhile.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In genral, BITCAST nodes should anyway move up in phase ordering. I already have a change that produces such node early, to avoid blocking variable enregistration due to float/int reinterpretation and it does show some CQ improvements. The only problem I've yet to solve is decomposition of the double/long case on 32 bit targets.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with having them appear early, but I'd still prefer that we defer any code that's purely ABI-related to Lowering.

* Fully promote structs of pointer size or less that have fewer field references than calls
or returns.
* Investigate whether it would be useful to re-type single-field structs, rather than creating new lclVars.
This would complicate type analysis when copied, passed or returned, but would avoid unnecessarily expanding
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For passing IMO we should see if we can allow some sort of implicit reinterpretation to occur in IR (e.g. pass TYP_INT to single field struct typed call arg node). Of course, that means that the call arg should no longer attempt to recover the handle from the arg tree but rather keep track of the original type from the call signature. ClassLayout + no-GT_LIST based experiment here: 317ca47

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea, though I suspect it would be a fair bit of work to tease out all the dependencies on the args retaining their struct types.
On a related note, the HFA queries seem to result in a lot of JIT/EE traffic when HFA types are in use. It might be nice to cache that information.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any problem to include HFA related info in ClassLayout. In fact the commit mentioned above does just that as an experiment. It doesn't actually cache anything right now but it did move HFA related queries to ClassLayout just to get an idea how it may look.


### <a name="Block-Assignments"/>Improve and Simplify Block Assignment Morphing

* `fgMorphOneAsgBlockOp` should probably be eliminated, and its functionality either moved to
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried. Main issue is that the current codegen for unrolled block ops is kind of poor. Codegen fix mostly done here: #21711

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll mention that.

* The cache includes entries for opaque "struct" values of a given size, as well as for
structs with known class handles

### Defer ABI-specific transformations to `Lowering`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts about getting in my removal of GT_LIST node from calls and field lists before doing any more significant work in call morphing and start getting conflicts left and right? 😁 It's nearly done (e.g #26392)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll work on reviewing that today. It's another big one ;-)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, might be a bit too early to do that. I'll have to remove a ton of commits because only the first 10 or so are relevant. The rest are experiments that I wanted to pass through CI testing. Also, the ARM part is missing (I have it locally but it's buggy ATM).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, then I'll have a quick look and you can ping me when you think it's ready for an actual review.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry about the confusion. #26392 is actually fine, I was thinking about a different PR that needs cleanup.

@CarolEidt
Copy link
Author

I'm going to merge this, further changes can be made as needed.

@CarolEidt CarolEidt merged commit 930a7b9 into dotnet:master Aug 29, 2019
@CarolEidt CarolEidt deleted the 1stClassStructsDoc branch August 29, 2019 15:14
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants