-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimization: allow inlining of tags into anonymous structs of a tagged union #19917
Comments
Apologies if I didn't fully understand the issue, however I do want to point out two things. First, packed unions exist: const UnderPacked = packed union {
fee: f32,
fie: packed struct {
a: i32,
b: u8,
c: u8,
},
foe: void,
}; The size of this data structure is Second: |
For your example, it isn't necessary to use the This issue relates to tagged unions, not bare unions. For a tagged union, the enum is part of the data structure and can be used for dispatch at runtime via |
You reminded me of something I noticed and forgot to include in the issue, which is that this version: const UnderPacked = union(Tag) {
fee: f32,
fie: packed struct {
a: i32,
b: u8,
c: u8,
},
foe: void,
}; Actually has a size of 16, not 12. So using the This also has an explanation: the least size of backing integer which can support the packed struct is 8, since the CPU has no native As I understand it, it wouldn't be legal for the compiler to inline the tag into a packed struct, because |
Does #1922 cover your use case? |
@Vexu As far as I understand, this issue post points out an improvement to be employed automatically by the compiler, without requiring the user to manually/explicitly think about the layout. |
@rohlem that's correct, this would automatically rearrange the struct type to have padding equal to the tag width as the first field, when otherwise allowed (so neither In the comment about how unions lay out a packed struct, I considered adding a few words about how it would be nice to be able to define the tag as being part of the packed struct, which is more #1922 related. Such a feature would be useful for Also, #1922 is about putting an enum field in a consistent location within a bare union, not a tagged union. I would suggest that improving the automatic packing of tagged unions would make this less useful/necessary. It isn't a complete replacement, because this still requires that the tag will be at the zero offset, and a user might want the tag elsewhere for e.g. FFI reasons. A suggestion which is related to the proposal is that the documentation and specification of Zig should record that the tag in a tagged union will always be at the zero offset. If core would prefer to reserve the right to put the tag somewhere else, that's understandable. I don't know for a fact that the tag is at the zero offset in all cases, either, all I do know is that this would explain the behavior I'm seeing, and also that it's the obvious thing to do for a variety of reasons. Edit: this comment in #1922 suggests that Zig currently puts the tag at the end of the union, not the beginning. I don't know if that's still current, but it's worth pointing out that which end gets chosen is irrelevant to this proposal: according to classic C struct packing doctrine, either ascending or descending size order for struct fields will achieve an optimal layout. I'm by no means expert at codegen, but it strikes me as disadvantageous if the tag is more than one machine word separated from the zero offset, which might be a good reason to favor the zero offset for large structs, or indeed in general. I'm open to correction on that, or indeed, anything else I've said. |
I don't think this qualifies as a proposal, because it wouldn't change the semantics
of the language. If that's wrong, my apologies.
I've been working on translating a VM into Zig, and ran into the following:
The first size is 12, the second is 8.
I believe I understand what's going on here. The tag has to come first in tagged
unions, for good and suffient reasons, and also, a struct must have the alignment of
its largest member. The data size of the
.fie
struct is 6, but thei32
hasalignment requirements which forces two bytes of padding, and an alignment of four
for the pointer. That in turn forces another three bytes of padding between the tag
and the struct.
However, it should be legal to inline the tag, if the struct is anonymously defined
inside the union.
FiePacked
is a demonstration of what that would look like. Acareful eye will notice that
FiePacked
has an unused byte, and because of Calignment rules, that byte is after
c
.I would bet that the Zig compiler in its current form will always put one of the
accessible fields of a struct at the 0 offset, since there's never a good reason not
to. But switching on the tag is a good reason! You can't even get at the
.fie
variant of the union without touching the tag in some fashion.
This doesn't actually have to be limited to anonymous structs defined inside the
union, although maybe it should be.
For example, a definition like this:
Could, if within a single compilation unit, result in
OuterFie
getting packed like this:The size remains the same, since it's dictated by the
i32
. Obviously there areoptions as to where the other byte of padding could go, I put it in the same place as
the invisible padding of
FiePacked
.I see some potential problems with this. The main one is that layout of all structs
would have to be deferred until the compiler has looked inside every tagged union,
making something which is currently entirely local into a whole-program analysis.
There may also be efficiency reasons to have the pointer address at offset 0 point to
meaningful data. It's not clear how pronounced this effect would be, or if it would
even show up at all, since structs don't really exist in machine code. But stored
pointers to structs certainly do, so I doubt the performance impact of keeping all
useful data at an offset to those pointers would be zero.
However, for an anonymous struct defined within the union, like the first example, I
would say it's a pretty clear win to consider the tag as a potential field within the
struct's layout, one which has to be the first field, but is considered in laying
out the rest of the struct. This would result in
UnderPacked.fie
being laid outidentically to
FiePacked
, with the difference that there's not
field to accessonce the union instance has been switched on.
The effect on layout of adopting the anonymous-only version of this proposal should
be minimal, the effect on codegen might also be minimal, and it might not be. When a
compiler can implicitly rely on the zero offset of a struct having some sensible
value, it's easy for that assumption to show up in the implementation.
But the feature would be an immediate win for me. I'd rather not implement the
opcodes as a bunch of distinct structs in an anonymous union, because at that point
I'm using
extern
to make sure that the distinguishing field is in the same place inmemory, I lose type safety, comptime potential, generic dispatch, everything which
makes tagged unions nice.
The text was updated successfully, but these errors were encountered: