-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More control over the data layout of tagged unions #1922
Comments
Or maybe, allow some kind of annotation to specify that one field in a packed struct {
some_field: u8,
@tagOf("data") tag: enum(u8) {A,B,C},
some_field2: u16,
data: packed union {
A: u8,
B: u16,
C: u32,
),
} |
Related #1072 |
Interesting idea. Note the current "zig way" to do this is to use a bare union (which has a secret safety tag field) and then manually manage your tag wherever you want. You could still switch on the tag, but instead of Zig unwrapping for you, you will do safety-checked direct access of the corresponding union field. That you want to make this work with |
I have a few good use cases for this. One well known use case to consider is nan-packing. My idea for solving this was allowing the programmer to implement their own "tag" function: packed union {
Double: f64,
// This would be a great use case for anonymous struct members!
Packed: packed struct {
_: u1, // sign_bit. not used in this example
exponent: u11, // exponent: must be ones
mantissa: packet struct {
_: u51, // could use this for all sorts of tagged types.
Bool: u1,
} // must take up 52 bits
},
fn @unionType(self: @This()) type {
if (self.Packed.exponent == maxInt(@typeOf(self.Packed.exponent))) {
return @typeOf(self.Double);
} else {
return @typeOf(self.Packed.mantissa.Bool);
}
}
}; |
I'll throw in use cases when I find them.
Here, the |
I'd like to use something like this for parsing/generation of instructions. |
I could also use this for a super densely packed Adaptive Radix Tree implementation. |
I think this could be interesting to interact with C code. struct input_value {
__u16 type;
__u16 code;
__s32 value;
}; Depending on the I'd like to represent this struct in Zig code with a proper |
I think you should be able to push tagged unions representing events or commands into some tight buffer to read off later, or on a different thread. This is a pretty common pattern. As it typically done you put the tag at the top for when you read the binary data you know how many following bytes belong to an entry. Currently you can precompute possible sizes of a union for each tag, but you have to manually put the tag at the top while writing, and then you can't just reinterpret a blob of data as a tagged union so you have to reassemble it. To solve this particular issue, you can just place a tag at the top. However I would agree that having a general solution would be better. |
as mentioned in the first comment, the general solution to control where the tag goes is to use |
Was thinking about packed structs and NaN boxing, so I came up with this (this is the most aggressive version), then @SpexGuy helpfully pointed me to this thread. const E = enum(u12) {
Int = 0xFFF,
String = 0xFFE,
Pointer = 0xFFD,
}
const P = packed union(?E){
Int: u32,
String: [6]u8,
Pointer: u48,
null: f64,
}
const std = @import("std");
const expect = std.testing.expect;
fn tag_of(p: P) u12 {return @intCast(u12, (@bitCast(u64, p) | 0xFFF0_0000_0000_0000) >> 52)}
test "NaN-boxed float" {
const with_int = P{.Int = 1234}};
const with_string = P{.String = "foo123"}};
const as_float: P = 1.234; // default type can coerce in
try expect(with_int.Int == 1234);
try expectEqual(with_string.String, "foo123");
try expect(as_float == 1.234); // default type can coerce out.
try expect(tag_of(with_int) == 0xFFF);
try expect(tag_of(with_string) == 0xFFE);
// 1.234 => 0x3ff3be76c8b43958
try expect(tag_of(as_float) == 0x3FF);
} features/questions:
Allowing fine-grained position that isn't at the end could look something like this: const E = enum(u8) {
ShortInt = 0x00,
ShortString = 0x01,
ConfigFlags = 0x02,
EmptyPayload = 0x03,
}
const P = packed union(E) {
const this = @This();
ShortInt: packed struct {
@tag(this): E,
i: u16,
},
ShortString: packed struct {
@tag(this): E,
s: [3]u8
},
ConfigFlags: packed struct {
@tag(this): E,
flag1: u1,
state: u3,
flag2: u1,
},
EmptyPayload: u0, // note this is implicitly ok because there are 0 bytes between the start and the tag, so it fits.
} At the cost of having an extra layer of indirection to access the values, but would still be HUGELY useful in deserializing certain types of over-the-wire protocols. Note that this would require a (possibly awkward) builtin function. Tag alignment would be comptime checked, and possibly implicit if the union'd value is smaller than the number of bits that precedes the tag. It MIGHT be possible to have packed structs with tags coerce to and from their tagless equivalent if they have only one parameter, so: test "coercion of packed structs" {
const short_int = P{.ShortInt = 47};
const short_string = P{.ShortString = "foo"};
const flags = P{.ConfigFlags = .{.flag1 = 1, .state = 2, .flag2 = 0};
try expect(short_int == 47);
try expectEquals(short_string, "foo");
try expect(flags.flag1 == 1);
} |
This proposal could be nice for manipulating x86 segment descriptors. Without getting in the weeds, the value of two bits (i.e. tag) changes the meaning of neighboring bit flags. |
This would be nice for representing sensor configuration for my action selection system instead of having to rely on the current: pub const Sensor = packed struct(u32) {
function: Function, // u7
push: bool = false,
cache_index: u5 = 0,
namespace: Namespace, // u3
name: Name, // u16
pub const Namespace = meta.Enum(config.namespaces, u3, .fields);
pub const Name = name: {
const old = @typeInfo(config.namespaces).Struct.fields;
var fields: [old.len]std.builtin.Type.UnionField = undefined;
for (&fields, old) |*field, o| {
field.* = .{
.name = o.name,
.type = meta.Enum(o.type, u16, .decls),
.alignment = 0,
};
}
break :name @Type(.{ .Union = .{
.layout = .Packed,
.tag_type = Namespace,
.fields = &fields,
.decls = &.{},
} });
};
}; |
Currently we have a great level of control over the data layout of
struct
s andenum
s usingpacked
and custom tag types. Sadly taggedunion
s are quite limited in this aspect since we have no control over where the tag is placed. Imagine if we could do this:This is just some made of syntax (plz don't accept this) but it get my point across. This would allow us to write the union's bytes to disk/over the internet/whatever and treat the received bytes as the union. No need to serialize and deserialize. The
packed union
already specifies a cross platform memory layout (after #649 that is). I already do this a lot withpacked struct
s and it keeps code very simple.The reason I also want to allow struct fields between the tag and the data is, that some binary formats currently encodes data like this (my usecase is here)
The text was updated successfully, but these errors were encountered: