Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More control over the data layout of tagged unions #1922

Open
Hejsil opened this issue Feb 6, 2019 · 13 comments
Open

More control over the data layout of tagged unions #1922

Hejsil opened this issue Feb 6, 2019 · 13 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@Hejsil
Copy link
Contributor

Hejsil commented Feb 6, 2019

Currently we have a great level of control over the data layout of structs and enums using packed and custom tag types. Sadly tagged unions are quite limited in this aspect since we have no control over where the tag is placed. Imagine if we could do this:

packed union {
    // Normal fields allowed. They are always present, no matter the tag.
    some_field: u8,

    // Specify, that the tag is between some_field and some_field2
    @tagHere(enum(u8){A, B, C}),
    some_field2: u16,

    // The tag maps to one of the fields in here
    @unionHere(
    A: u8,
    B: u16,
    C: u32,    
    ),
};

This is just some made of syntax (plz don't accept this) but it get my point across. This would allow us to write the union's bytes to disk/over the internet/whatever and treat the received bytes as the union. No need to serialize and deserialize. The packed union already specifies a cross platform memory layout (after #649 that is). I already do this a lot with packed structs and it keeps code very simple.

The reason I also want to allow struct fields between the tag and the data is, that some binary formats currently encodes data like this (my usecase is here)

@Hejsil
Copy link
Contributor Author

Hejsil commented Feb 6, 2019

Or maybe, allow some kind of annotation to specify that one field in a struct is the tag of another field.

packed struct {
    some_field: u8,
    @tagOf("data") tag: enum(u8) {A,B,C},
    some_field2: u16,
    data: packed union {
        A: u8,
        B: u16,
        C: u32,    
    ),
}

@Hejsil
Copy link
Contributor Author

Hejsil commented Feb 6, 2019

Related #1072

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Feb 6, 2019
@andrewrk andrewrk added this to the 0.5.0 milestone Feb 6, 2019
@andrewrk
Copy link
Member

andrewrk commented Feb 6, 2019

Interesting idea.

Note the current "zig way" to do this is to use a bare union (which has a secret safety tag field) and then manually manage your tag wherever you want. You could still switch on the tag, but instead of Zig unwrapping for you, you will do safety-checked direct access of the corresponding union field.

That you want to make this work with packed suggests a particular use case: ability to use the same data structures in serialized form and in memory form. (Related #1512) I think this requires some thought. I think there are some pitfalls with this approach and I don't want to accidentally encourage Zig programmers to fall into them. I'll keep this proposal in mind, and comment here if I stumble upon a good example of a problem where it could be solved using this approach.

@daurnimator
Copy link
Contributor

daurnimator commented Feb 7, 2019

I have a few good use cases for this. One well known use case to consider is nan-packing.

My idea for solving this was allowing the programmer to implement their own "tag" function:

packed union {
    Double: f64,
    // This would be a great use case for anonymous struct members!
    Packed: packed struct {
        _: u1, // sign_bit. not used in this example
        exponent: u11, // exponent: must be ones
        mantissa: packet struct {
            _: u51, // could use this for all sorts of tagged types.
            Bool: u1,
        } // must take up 52 bits
    },
    fn @unionType(self: @This()) type {
        if (self.Packed.exponent == maxInt(@typeOf(self.Packed.exponent))) {
            return @typeOf(self.Double);
        } else {
            return @typeOf(self.Packed.mantissa.Bool);
        }
    }
};

@Hejsil
Copy link
Contributor Author

Hejsil commented Feb 7, 2019

I'll throw in use cases when I find them.
Extracted from real code:

pub const Trainer = packed struct {
    party_type: PartyType,
    class: u8,
    ...
    ai: lu32,
    party: Party,

    pub fn partyAt(trainer: *Trainer, index: usize, data: []u8) !*PartyMemberBase {
        return switch (trainer.party_type) {
            PartyType.None => &(try trainer.party.None.toSlice(data))[index].base,
            PartyType.Item => &(try trainer.party.Item.toSlice(data))[index].base,
            PartyType.Moves => &(try trainer.party.Moves.toSlice(data))[index].base,
            PartyType.Both => &(try trainer.party.Both.toSlice(data))[index].base,
        };
    }
};

pub const PartyType = packed enum(u8) {
    None = 0b00,
    Item = 0b10,
    Moves = 0b01,
    Both = 0b11,
};

pub const Party = packed union {
    None: Slice(PartyMemberNone),
    Item: Slice(PartyMemberItem),
    Moves: Slice(PartyMemberMoves),
    Both: Slice(PartyMemberBoth),
};

Here, the party field is a union of "slices" and the party_type is its tag. Currently I'm using partyAt as a safe helper, and @fieldToParentPtr to get from PartyMemberBase to PartyMemberX.

@andrewrk andrewrk modified the milestones: 0.5.0, 0.6.0 Sep 20, 2019
@andrewrk andrewrk modified the milestones: 0.6.0, 0.7.0 Dec 31, 2019
@hollmmax
Copy link
Contributor

I'd like to use something like this for parsing/generation of instructions.
It doesn't make sense to serialize/deserialize the data, as it gets used once, and then thrown away.
Combined with #3380 , this would greatly simplify maintenance of ZASM and other code handling binary-formatted data.

@andrewrk andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 10, 2020
@somethingelseentirely
Copy link

I could also use this for a super densely packed Adaptive Radix Tree implementation.
Since all my pointers are 64 byte aligned, I could use it as a pointer when the lower 6 bit are 0,
store 7 byte of the key, use it to tag of the enum I'm pointing to, and have a tag that compresses 0s. 😅

@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021
@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 23, 2021
@gwenzek
Copy link
Contributor

gwenzek commented Feb 8, 2022

I think this could be interesting to interact with C code.
Looking at linux/input.h

struct input_value {
	__u16 type;
	__u16 code;
	__s32 value;
};

Depending on the type field code can take values from different enums, and value has also a different meaning depending on the (type,code) combination, and may not be needed.

I'd like to represent this struct in Zig code with a proper union where type would be used as the tag field,
and code, value would be represented by different structs depending on the type.
This would make it less error prone to interpret the value, or to produce value input_value struct.

@andrewrk andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022
@Romixal
Copy link

Romixal commented Aug 9, 2022

I think you should be able to push tagged unions representing events or commands into some tight buffer to read off later, or on a different thread. This is a pretty common pattern.
As it currently stands, zig slaps tag at the end of a union which makes it inconvenient to parse.

As it typically done you put the tag at the top for when you read the binary data you know how many following bytes belong to an entry. Currently you can precompute possible sizes of a union for each tag, but you have to manually put the tag at the top while writing, and then you can't just reinterpret a blob of data as a tagged union so you have to reassemble it.

To solve this particular issue, you can just place a tag at the top. However I would agree that having a general solution would be better.

@nektro
Copy link
Contributor

nektro commented Aug 9, 2022

as mentioned in the first comment, the general solution to control where the tag goes is to use enum/unoin disjointly instead of as union(enum). I do like this proposal because it mentions the idea of "common fields" which might be proposed somewhere else but im not sure.

@ityonemo
Copy link
Contributor

ityonemo commented Aug 12, 2022

Was thinking about packed structs and NaN boxing, so I came up with this (this is the most aggressive version), then @SpexGuy helpfully pointed me to this thread.

const E = enum(u12) { 
    Int = 0xFFF,
    String = 0xFFE,
    Pointer = 0xFFD,
}

const P = packed union(?E){
    Int: u32,
    String: [6]u8,
    Pointer: u48,
    null: f64,
}

const std = @import("std");
const expect = std.testing.expect;

fn tag_of(p: P) u12 {return @intCast(u12, (@bitCast(u64, p) | 0xFFF0_0000_0000_0000) >> 52)}

test "NaN-boxed float" {
    const with_int = P{.Int = 1234}};
    const with_string = P{.String  = "foo123"}};
    const as_float: P = 1.234; // default type can coerce in

    try expect(with_int.Int == 1234);
    try expectEqual(with_string.String, "foo123");
    try expect(as_float == 1.234); // default type can coerce out.

    try expect(tag_of(with_int) == 0xFFF);
    try expect(tag_of(with_string) == 0xFFE);

    // 1.234 => 0x3ff3be76c8b43958
    try expect(tag_of(as_float) == 0x3FF);
}

features/questions:

  • no new keywords/builtins
  • tag location defaults to highest order bits after the enumerated types
    • see options for fine-grained tag location below.
  • allows you to have a "default type" that isn't checked for tags
    • note that this is critical for ergonomic (aka safe) NaN boxing.
    • is it safety-checked behaviour to disallow writing known tags using the default type into the appropriate bits?
    • do we allow vanilla (unpacked) tagged unions to ALSO support null?
    • should case have an else block to obtain nulls? or match on null? or should we use an if (...) |...| block?

Allowing fine-grained position that isn't at the end could look something like this:

const E = enum(u8) {
  ShortInt = 0x00,
  ShortString = 0x01,
  ConfigFlags = 0x02,
  EmptyPayload = 0x03,
}

const P = packed union(E) {
  const this = @This();
  ShortInt: packed struct {
    @tag(this): E,
    i: u16,
  },
  ShortString: packed struct {
    @tag(this): E,
    s: [3]u8
  },
  ConfigFlags: packed struct {
   @tag(this): E,
   flag1: u1,
   state: u3,
   flag2: u1,
  },
  EmptyPayload: u0, // note this is implicitly ok because there are 0 bytes between the start and the tag, so it fits.
}

At the cost of having an extra layer of indirection to access the values, but would still be HUGELY useful in deserializing certain types of over-the-wire protocols. Note that this would require a (possibly awkward) builtin function. Tag alignment would be comptime checked, and possibly implicit if the union'd value is smaller than the number of bits that precedes the tag.

It MIGHT be possible to have packed structs with tags coerce to and from their tagless equivalent if they have only one parameter, so:

test "coercion of packed structs" {
  const short_int = P{.ShortInt = 47};
  const short_string = P{.ShortString = "foo"};
  const flags = P{.ConfigFlags = .{.flag1 = 1, .state = 2, .flag2 = 0};
  
  try expect(short_int == 47);
  try expectEquals(short_string, "foo");
  try expect(flags.flag1 == 1);
}

@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023
@andrewrk andrewrk modified the milestones: 0.13.0, 0.12.0 Jul 9, 2023
@jayschwa
Copy link
Contributor

jayschwa commented Oct 9, 2023

This proposal could be nice for manipulating x86 segment descriptors. Without getting in the weeds, the value of two bits (i.e. tag) changes the meaning of neighboring bit flags.

@tauoverpi
Copy link
Contributor

This would be nice for representing sensor configuration for my action selection system instead of having to rely on the current:

pub const Sensor = packed struct(u32) {
    function: Function, // u7
    push: bool = false,
    cache_index: u5 = 0,
    namespace: Namespace, // u3
    name: Name, // u16

    pub const Namespace = meta.Enum(config.namespaces, u3, .fields);
    pub const Name = name: {
        const old = @typeInfo(config.namespaces).Struct.fields;
        var fields: [old.len]std.builtin.Type.UnionField = undefined;

        for (&fields, old) |*field, o| {
            field.* = .{
                .name = o.name,
                .type = meta.Enum(o.type, u16, .decls),
                .alignment = 0,
            };
        }

        break :name @Type(.{ .Union = .{
            .layout = .Packed,
            .tag_type = Namespace,
            .fields = &fields,
            .decls = &.{},
        } });
    };
};

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests