Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add language support for C-compatible bit-fields #3064

Closed
wants to merge 10 commits into from

Conversation

mahkoh
Copy link
Contributor

@mahkoh mahkoh commented Jan 12, 2021

This RFC adds support for bit-fields in repr(C) structs by

  • introducing a new attribute bitfield(N) that can be applied to integer
    fields,
  • allowing such annotated fields to be unnamed.

Rendered

@mahkoh mahkoh changed the title Add language support for C-compatible bitfields Add language support for C-compatible bit-fields Jan 12, 2021
@mahkoh
Copy link
Contributor Author

mahkoh commented Jan 12, 2021

This is a rewrite of #1449 with concerns addressed.

In particular:

  • The overflow behavior is now fully specified.
  • The bit-field width must now be a literal.
  • Bit-fields are now restricted to repr(C) structs and their layout is defined in terms of the corresponding layout used by Clang.

On top of the features in the previous RFC, this RFC adds syntax for unnamed bit-fields. This is necessary because such fields have different effects on the layout than named bit-fields.

Other questions from the other RFC:

  • Why not use some kind of code generation?

    Any code generation needs to have access to the types of all fields in the struct. These types can usually only be determined during compilation. See the last motivating example in the RFC.

  • Why not a more general approach that allows us to specify arbitrary struct layout?

    The problem in the first place is that we do not know the layout until during compilation.

  • Is this really needed? I thought C bitfields were an esoteric feature nobody uses.

    There are over 400 bit-fields in the Linux kernel user-space API.

@programmerjake
Copy link
Member

A good future extension would be allowing bitfield widths and/or types to be generic parameters

@Kixiron
Copy link
Member

Kixiron commented Jan 12, 2021

What value do the bits in an unnamed field hold? Will they have an undefined value (and hence be ub to read in any aspect, including casting/transmuting from the struct as a whole), a guaranteed value (0 I'd guess?) or simply an unspecified value (allowing reading the field without undefined behavior but still not allowing the actual value to be relied on)?

@Lokathor
Copy link
Contributor

I would hope they have simple "unspecified value". AKA "they can be whatever, no niche there". This lets bitfield types be transmuted to and from plain integer types with far less worry.

@Kixiron
Copy link
Member

Kixiron commented Jan 12, 2021

I most definitely agree, I was just bringing it up so it gets added to the rfc

Comment on lines 193 to 195
A field named `_` is called an *unnamed* field. All other fields are called
*named* fields. Each named field annotated with `bitfield(N)` occupies `N` bits
of storage.
of storage. Unnamed fields do not occupy any storage.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really clarify anything, the term "storage" is really ambiguous and the wording of "unnamed fields do not occupy any storage" implies that unnamed fields don't take up any space whatsoever, which isn't correct

Copy link
Contributor Author

@mahkoh mahkoh Jan 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implies that unnamed fields don't take up any space whatsoever, which isn't correct

That is the intended reading. Unnamed fields only affect the layout but are not themselves members of the struct. See also the equivalent notation for unnamed fields using attributes in the RFC.

In C++, unnamed fields are described as "not being members" of their surrounding struct. In C the language is not as simple but this difference is hopefully irrelevant in practice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The space still exists though, those bits don't just disappear right?

Copy link
Contributor Author

@mahkoh mahkoh Jan 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know what you mean by "the space exists". Space between fields in structs "exists" but the behavior of that space is not defined in great detail as far as I can tell.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's the problem. You have to say what's going on with that. Even if C doesn't, in Rust we care about those sorts of things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This RFC is not the place to resolve these issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue seems to be the most relevant: rust-lang/unsafe-code-guidelines#156

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to fix all of uninit memory for rust, you just have to say something about the spare bits in this one case.

It can be "the bits are unspecified" (my pick), or "the bits are niche", or "the bits are always zero", or whatever, but the RFC shouldn't leave this unanswered.

Copy link
Contributor Author

@mahkoh mahkoh Jan 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you get such language merged into the language reference for padding bits in general, then I will refer to it. Otherwise I do not see why this RFC should document the state of padding bits of

#[repr(C)]
struct X {
	a: u8,
	#[bitfield(8)] _: u8,
	b: u16,
}

when the state of padding bits of

#[repr(C)]
struct X {
	a: u8,
	b: u16,
}

is not documented.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, this RFC is in a position in which it can entirely bypass the issues of previous code with uninit memory since it can simply declare what the state of "padding bits" should be, which should almost definitely be unspecified

@Lokathor
Copy link
Contributor

Future Extension:

  • bitfields that are enums with an int repr
  • bitfields that are repr(transparent) over an int type

let X { a, b, d } = x;
```

Just like in C, you cannot take the address of a bit-field:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, potentially, allow taking the address of a bitfield if and only if it's aligned to a byte boundary. But I don't think we should. So, this seems fine.

Co-authored-by: Josh Triplett <[email protected]>
(resp. 0-extended) to the size of the type of the field. (1-extended means
that the new bits will have the value of the most significant bit. In
particular, bit-fields with signed types with width 1 can only store the
values `0` and `-1`.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for specifying this behavior precisely.


A field named `_` is called an *unnamed* field. All other fields are called
*named* fields. Each named field annotated with `bitfield(N)` occupies `N` bits
of storage. Unnamed fields do not occupy any storage.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this needs clarification.

Suggested change
of storage. Unnamed fields do not occupy any storage.
of storage. Unnamed bit-fields act like padding within the structure; they affect the positioning of other fields, but are not themselves accessible. They act in every way like padding bits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wary of the "in every way like padding bits" phrasing because it brings to mind undefined padding bytes (a la this comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rewritten the reference section to make it clear that _ bit-fields are not fields.


If the width of the bit-field is `0`, then the name of the field must be `_`.

A field named `_` is called an *unnamed* field. All other fields are called
Copy link
Member

@joshtriplett joshtriplett Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that we already have a concept of unnamed fields of struct or union type, which have a different semantic. For clarity, I'd suggest:

Suggested change
A field named `_` is called an *unnamed* field. All other fields are called
A bit-field named `_` is called an *unnamed bit-field*. All other bit-fields are called

Or, for that matter, perhaps directly call it a padding bit-field.

struct are possible unless all accesses are immutable. Therefore, unlike in C,
the compiler can implement access to bit-fields with any kind of load/write
instruction, even if such an instruction overlaps the memory locations of other
fields in the struct.
Copy link
Member

@joshtriplett joshtriplett Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to leave the behavior quite this unspecified. I would prefer to have it explicitly specified that a store to a bitfield is permitted to be combined with other stores to bitfields or other fields in a single operation, but that the operation will not be performed via a wider write than a single machine word that affects fields other than those being written, unless the entire structure is actually being written.

People do use bitfields for flags within larger structures, where other parts of the structure might be concurrently accessed while under a lock, and they assume that a write to one of those bitfields will not overwrite a lock-protected value. We should be precise enough that people know how they can avoid that.

Copy link
Contributor Author

@mahkoh mahkoh Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People do use bitfields for flags within larger structures, where other parts of the structure might be concurrently accessed while under a lock

You're right, I didn't take into account that other fields might have interior mutability. I'll replace this by the usual language:

The memory locations of two different fields within a struct do not overlap if

  • at least one of them is not a bit-field or
  • they are separated, in declaration order, by at least one field
    • that is not a bit-field or
    • that is a bit-field with width 0.

"Memory location" is the language used by C to determine when two unsynchronized concurrent accesses are well-defined.

Copy link
Member

@joshtriplett joshtriplett Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds reasonable to me, assuming that it includes associated text explaining what can and can't be done in the same memory operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've chosen a somewhat different formulation to avoid pulling in the C memory model:

When a bit-field field is accessed, the abstract machine may also access
adjacent bit-field fields but not fields that are separated from the field by a
StructBodyElement that is not a bit-field field. (Note: This paragraph restricts
the kinds of loads and stores the compiler can perform when accessing a
bit-field. This paragraph does not need to be specially advertised to users as
the inability to take references to bit-field fields makes it impossible to
access adjacent bit-field fields in otherwise sound code.)

- When the assignment happens, the compiler writes the bits on the rhs to
the correct positions in the struct.

This is a more structured solution but would need to be fleshed out.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree both that this might be a good long-term solution (and bit-sized types would be useful for other purposes), and that this should not block the implementation of bitfields.

If, in the future, we add bit-sized types, we can specify how to put them in a repr(C) struct and have them behave like a bitfield. That should not stop us from providing the most straightforward implementation of bitfields without introducing bit-sized types.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- On Windows, Clang and GCC produce different layouts for packed structs:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should specify, explicitly, that the behavior on MSVC targets must match that of MSVC.

(We don't, today, need to provide a means of controlling the ms_struct flag. We may need to eventually, though.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@retep998 Please give your opinion on which language, if any, should be added to this RFC and which language should be included in the reference.

Nevertheless, such fields appear like regular fields in the rustdoc output if
they have the `pub` visibility.

If the width of the bit-field is `0`, then the name of the field must be `_`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please specify the behavior of bitfield(0) in this RFC. In general, I don't have any objection to defining this feature in terms of C-compatible bitfield types. However, I'd like to have a self-contained explanation for bitfield(0).

Copy link
Contributor Author

@mahkoh mahkoh Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think the behavior of zero-width bit fields is specified enough to make this worth it. I think the most than can be said is that a zero-width bit field causes the storage unit allocated for the next bit field to be aligned at least to the next 8-bit boundary (the next byte). But this is probably not what you had in mind.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is the purpose of a zero width bitfield, maybe it would be better to specify that with an attribute (possibly an argument to the bitfield attribute) on the next bitfield. Zero width bitfield seem very confusing to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think the behavior of zero-width bit fields is specified enough to make this worth it.

That, to me, seems like a much stronger argument for specifying it explicitly, to make sure that the expected semantic is clear, and to make it feasible to test this.

I think the most than can be said is that a zero-width bit field causes the storage unit allocated for the next bit field to be aligned at least to the next 8-bit boundary (the next byte).

My understanding of the semantic was, roughly, that a zero-width bitfield will "finish" filling out the current in-progress field of the type specified for the zero-width bitfield.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no reason to specify this in the RFC. After all, Rust might in the future be ported to systems where the behavior is different. In any case, as per the RFC, the behavior will be described in the reference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do: I feel that the RFC would be underspecified without it.

The RFC can say that the canonical definition is C-compatible, and then give the behavior for current platforms, with a note that if a target's C-compatible bitfield definition differs Rust will do the same. That will still help the definition be clearer and more self-contained.

Putting the definition in the reference seems like a good idea, but I think the RFC should have at least draft text specifying this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and then give the behavior for current platforms

The behavior for current platforms differs widely. I'm currently working on another RFC that will describe the behavior of repr(C) in detail. Let's defer this point until I've posted that RFC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Rust Reference prefers that there is a canonical decision made about its contents and not to be the place where those decisions are made.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not supposed to make a decision only to document it. The decision has already be made by the system compiler of the target. That's msvc on the msvc targets, gcc on the mingw and Linux targets, and clang otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've written a crate that implements the layout algorithms used my msvc, gcc, and clang. The code contains a large number of test cases that compare its output to the output of the C compilers. It should therefore have a high accuracy.

These layout algorithms are far too complicated to be described in prose. The reference can refer to this crate instead.

struct.

Nevertheless, such fields appear like regular fields in the rustdoc output if
they have the `pub` visibility.
Copy link
Member

@joshtriplett joshtriplett Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we just prohibit the use of pub (or any visibility) on _ fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want _ to appear in rustdoc if I write a sys-wrapper for a C API that uses unnamed bit fields. On the other hand, sometimes such wrappers are supposed to have private fields. How would rustdoc decide if it should show the field or not?

With visibility modifiers this is not a problem and the author can also add doc comments if necessary. E.g. if the original C struct documents that field.

@joshtriplett
Copy link
Member

Future Extension:

* bitfields that are enums with an int repr

* bitfields that are repr(transparent) over an int type

I agree that both of those are useful future extensions, and that they shouldn't be in the initial version of the RFC.

I'd like to see them in the future possibilities section, though.

Comment on lines +180 to +182
UnnamedStructElements are used when determining the layout of the struct but are
otherwise ignored. Since they are not fields, they cannot be accessed, do not
appear in the construction of a struct, etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
UnnamedStructElements are used when determining the layout of the struct but are
otherwise ignored. Since they are not fields, they cannot be accessed, do not
appear in the construction of a struct, etc.
UnnamedStructElements are used when determining the layout of the struct but are
otherwise ignored. Since they are not fields, they cannot be accessed, do not
appear in the construction of a struct, etc. However, the contents of UnnamedStructElements
are unspecified, allowing reading from and transmuting to/from bitfield structs without incurring
undefined behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume that structs could have associated types:

struct X {
	type MyType = i32;
	field1: u8,
	field2: u16,
}

Does the statement

However, the contents of AssociatedTypes are unspecified, allowing reading from and transmuting to/from structs with associated types without incurring undefined behavior.

make sense to you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course not, because types don't hold values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AssociatedType is not a type. It's a non-terminal in the Rust grammar. It's a part of the syntax, and syntax does not exist at runtime. As such, they don't have contents except in the syntax sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you agree they have no content, then I don't understand what you're getting at.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kixiron said what I was trying to say.

Essentially, if the spare bits are defined to be anything other than "initialized but unspecified" then I can make a more safe version of bitfields, by hand, today, than this RFC would allow for.

Copy link
Member

@joshtriplett joshtriplett Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see why this was initially confusing.

It'd be conceptually straightforward to interpret the unnamed fields as "padding", except for the bitfield(0) case. But then it's necessary to specify the behavior of those fields.

You're proposing, instead, that unnamed "fields" (including bitfield(0) fields) don't define padding themselves; rather, they define the positioning of bitfields, and the positioning of bitfields within a larger structure defines padding. And that padding behaves like any other padding.

I think this explanation could use further in-RFC clarification to make the approach more obvious, and to separate the concept of "unnamed 'fields' defining where fields are placed" from "padding defined by where fields are placed". But given some additional explanation in that area, I don't object to the idea of making this RFC entirely orthogonal to any definition of padding..

I do think that many practical uses of bitfields would benefit from some additional definition of padding. But I also know that defining the behavior of padding is a much more controversial topic, and I don't want to see bitfields held up behind that topic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're proposing, instead, that unnamed "fields" (including bitfield(0) fields) don't define padding themselves; rather, they define the positioning of bitfields, and the positioning of bitfields within a larger structure defines padding.

No I think that's too narrow. Unnamed "fields" also affect the positioning of regular fields and the overall alignment. They are layout modifiers like repr(align) except that they are written inside the struct body.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, reading further into the updated description, I think the new approach of not calling them "fields" provides the necessary clarity. (The distinction of "bit-field fields" is perhaps a little awkward, but nonetheless clear.)

```

But this proposal is flawed because, in C, both the width and the underlying
type influence the layout. The Zig proposal throws the underlying type away.
Copy link

@clarfonthey clarfonthey Jan 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an extremely small point that I didn't know that affects the proposal a lot. How does it affect it?

This to me is an important aspect that justifies using this method instead of Zig's.

Shouldn't just be a footnote.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the first motivating example should make this clear.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. It doesn't do that great a job of providing motivation beyond "this is how it affects C."

Why not just require additional alignment attributes instead of using different types for bitfields? I feel like these should be weighed as additional options.

Compatibility with C definitions could definitely be enough to justify it, but it should be explained better in the RFC rather than assuming every reader knows what it means and how it affects their code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't do that great a job of providing motivation beyond "this is how it affects C."

Being compatible with C without having to write target-specific code is, in fact, the motivation of the RFC and this is made pretty clear. Please read the discussion of the original RFC where at least 50% of the over 90 comments were trying to argue that what we actually want is arbitrarily sized integers and custom struct layouts.

@clarfonthey
Copy link

I'm extremely torn. On one hand, there needs to be a functional way to support bitfields on repr(C) structs with full compatibility with C, and on the other, this proposal is absolutely incompatible with a more rusty but less-immediately-implementable version like repr(bitpacked) that I mentioned in my generic integers RFC.

Feel like at least some comments would be warranted on why it's reasonable to have a dedicated syntax for bitfields in repr(C) structs without an easy path toward doing so for repr(rust) and repr(packed) in a similar way.

@Lokathor
Copy link
Contributor

Doing this for repr(C) doesn't affect the possibility for doing this with repr(rust) or repr(bitpacked) or anything else.

@clarfonthey
Copy link

I disagree; if you just design features exactly for individual use cases you end up with far more features than necessary. It's a valid criticism to say that C-style bitfields are necessary and there doesn't need to be a Rusty counterpart, but if you think that there should be a Rusty version in the future then the feature should be compatible with it.

I don't think that having to distinguish between a 3-bit u8 and a 3-bit u16 is something that should be possible in a Rusty version, and as it stands, a Rusty version could potentially look much more different than this.

@Lokathor
Copy link
Contributor

a 3-bit u8 is almost surely 1 byte and a 3-bit u16 is almost surely 2 bytes, under any repr, so that would distinguish them plenty.

Regardless of that: It's already the case that repr(rust) and repr(C) do very different things in select cases (eg, MyStruct(u8,u16,u8)), so having repr(rust) and repr(C) bitfields also be different doesn't seem like a stretch at all.

@clarfonthey
Copy link

a 3-bit u8 is almost surely 1 byte and a 3-bit u16 is almost surely 2 bytes, under any repr, so that would distinguish them plenty.

Respectfully disagree. When you specify bitfields you're specifying bits, under the assumption that adjacent fields will be squashed together. What happens when you specify a 2-bit u8 and then a 4-bit u16? Does that mean one byte with a value limited to 2 bits and two bytes with a value limited to 4 bits? What about a collective two bytes with six bits squashed into one byte and the rest left alone? Or maybe one byte with two bits and one with four and both aligned to two bytes?

It's not at all trivial what it means to limit bits, expecting them to be squashed together, while still having them be in a different container. In a repr(Rust) variant I can imagine it being very reasonable to simply request bits and have them by arranged in the struct wherever they fit, including inside niche areas. In cases like this I can imagine bitfields being used to reduce total space used without explicitly caring where they're placed.

@Lokathor
Copy link
Contributor

Alright, fair.

But none of that really affects how a repr(C) bitfield struct would work. So this RFC doesn't actually block any of that sort of discussion eventually happening.

@joshtriplett
Copy link
Member

I think it's reasonable to define a #[bitfield(N)] attribute for use with repr(C). If, in the future, we define a repr(Rust) mechanism for bitfields, we can potentially reuse the attribute, while defining different semantics. Conversely, if in the future we define sized integer types, we can define how those types fit into structures and relate that to this RFC. Either way, I don't think the resulting total surface area will be substantially larger than if we started with the larger and more complex problem to begin with.

I do believe in thinking about potential future expansion and aiming to be compatible with that, but I don't think this RFC needs to expand to cover that entire potential future space. The proposal here seems like the most straightforward way to provide C-compatible bitfields, and I believe that the proposal is forward-compatible with other things we may want to do in the future.

@tmccombs
Copy link

So, if you had something like the following:

#[repr(C)]
struct A {
    #[bitfield(3)] a: u8,
    #[bitfield(4)]  _: u8,
    #[bitfield(1)] b: bool,
}

#[repr(C)]
union B {
      a: A,
      all: u8,
}

fn main() {
    let b1= B { a: A { a: 1, b: true } };
    let u = unsafe { b1.all };
    let b2 =  B { all: u };
 }

what is the behavior for the read of b1.all? Is this undefined behavior since the value of u would contain data read from padding bits? Or would u have an unspecified value, without invoking unwanted "undefined behavior" (meaning you don't know which bits in u correspond to which bitfields, but you do know that b2 will be a copy of b1). Or will the value of u be target-specific, but well-defined for each supported target (with the padding bytes either specified to be 0, or to be unspecified values)?

I also wonder if it would be better to express the padding with syntax like

#[bitfield(1)] field1: bool,
#[bitfield(4, padding_before = 2)] field2: u16, // include 2 bits of padding before this field
#[bitfield(2, break)] field3: u8, // start new allocation, equivalent to a zero-width bitfield before this field in c. name subject to bikeshed

The advantage is that it doesn't require creating dummy fields that aren't actually part of the struct, and it doesn't require any special treatment for a bitfield of width 0 (so documentation can just say that bitfields must always be of width 1 or greater).

@skade
Copy link
Contributor

skade commented Jan 18, 2021

After reading this RFC and the alternatives, I'm a bit confused why this solution was chosen to be RFCed instead of the BitField<T, N> variant. The alternative seems pretty extensive and coherent and the only argument I could find against it in that alternative is "This is a more structured solution but would need to be fleshed out.".

- Relaxing the requirement that `N` must be a literal.
- Bit-fields with types that are enums with integer representation.
- Bit-fields with types that are transparent wrapper structs around valid
bit-field types.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bit-field types.
bit-field types.
- Have a way to specify which compiler the layout should be compatible with (gcc, clang, MSVCC, etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm currently working on a comprehensive RFC that will bring various enhancements to repr(C) types including allowing you to opt into MSVC layout on MinGW targets. But that RFC will still take a few more weeks until it's ready.

@tmccombs
Copy link

The alternative seems pretty extensive and coherent and the only argument I could find against it in that alternative is "This is a more structured solution but would need to be fleshed out."

Well, and that the type is rather magical. Specifically:

  • you can't take the address of values that are of this type
  • The type parameter doesn't have a constraint, but not all types can be used there (could probably be remedied by making a special trait for types that can be used there)
  • Consecutive fields of this type would have special behavior with respect to layout, and how the data is read and written depends not just on the BitField type itself, but on fields next to it in the struct
  • You probably would only be able to use this type as a field in a struct (maybe possible to use as a stack variable, but that probably isn't very useful)
  • It couldn't implement traits like AddAssign. Interestingly, the API mentioned also wouldn't work, because the set method takes a &mut self which is impossible to get since you can't take an address of the Bitfield.

It's also unclear on how you would accomplish the equivalent of unnamed bitfields, which are used to add padding bits.

@MauriceKayser
Copy link

I don't want to pollute the RFC too much, but maybe my bitfield crate can serve as some kind of inspiration. It supports:

  • bools and C-like enums as bit flags + enumerability over flags, if C-like enums are used
  • Primitive types and C-like enums as multi-bit fields
  • Explicit and implicit positioning and sizing of fields and flags
  • Optional core::fmt::Debug and core::fmt::Display implementations
  • Compile-time overlap and boundary checking.

@alercah
Copy link
Contributor

alercah commented Feb 4, 2021

I am happy to see this! One potential suggestion: also allow a pointer type to be used as a bit-field type, provided its number bits is at least (size - log_2(alignment)). The semantics in this case would go beyond C but allow for the common "reuse low bits of the pointer" optimizations to be directly supported as well.

@fstirlitz
Copy link

I'm voting this down, since I'd rather see bitfields implemented in terms of a combination of:

I realise this is going to be a much more ambitious route, but I believe it will pay off.

@Lokathor
Copy link
Contributor

Lokathor commented Feb 4, 2021

What would be the benefits of the more ambitious route? None immediately come to mind.

@programmerjake
Copy link
Member

One benefit is uint<N>, which is also needed for SIMD bitmasks, can more effectively describe the actual values that the field can store. To handle C ABI differences for different base types for bitfields, we could annotate fields using #[repr(u32)] or something:

struct S {
    uint32_t a : 5;
    uint16_t b : 5;
    uint16_t : 0, c : 5;
};

could translate to:

#[repr(C)]
pub struct S {
    #[repr(u32)]
    a: uint<5>,
    #[repr(u16)]
    b: uint<5>,
    #[repr(u16)]
    #[repr(separator(u16), u16)]
    c: uint<5>,
}

The default repr for bit fields can be the bitwidth rounded up to the next power-of-2 with the same signedness, so #[repr(u16)] on a uint<10> field can be omitted.

@mahkoh
Copy link
Contributor Author

mahkoh commented Feb 12, 2021

I've pushed the following changes:

  • bool bit-fields are no longer restricted to width 0 or 1. This is to increase compatibility with MSVC which allows bool bit-fields up to size 8.

  • The RFC now specifies that the layout of structs that have no C equivalent on the target is unspecified, similar to repr(Rust). This is nothing new, as the layout of structs containing () or i128 was already unspecified on some targets.

  • I've replaced Clang by the target's system compiler as the normative C compiler for the target. Appendix A contains a map from LLVM targets to their system compiler. I've composed this map myself on a best-effort basis. The layout algorithms used by Clang, GCC, and MSVC are incompatible, making this change necessary. The repc crate documents the layout algorithms of these compilers.

    The upside is that this removes the ambiguity around structure layout on Windows.

  • New future possibility: Unions with bit-fields. This is currently blocked by the requirement that unions have at least one field.

  • New future possibility: Opting into the MSVC layout on *-windows-gnu targets so that you can interact with OS APIs using bit-fields in a MinGW environment. Note that GCC on MinGW already uses a layout algorithm that is compatible with the MSVC algorithm in many cases. As per the RFC, rustc would use this algorithm by default.

I'll soon-ish post a separate RFC that will address some of these problems.

@rlgloden
Copy link

rlgloden commented Mar 4, 2021

I know this RFC is specifically targeting C compatibility ...

I am coming to Rust from Ada which had pretty powerful struct layout capabilities. Basically Ada allows you to optionally specify the byte offset and the bit range for each field of a record (including the record discriminant field if it was a variant record), e.g. the equivalent of the Rust structs and Rust enums:

   -- example of an optional Ada record layout (actual record definition not shown)
   for My_Example_Type use record
      Weight   at 0  range   0 .. 11;
      Type     at 0  range  13 .. 14;
      Valid    at 0  range  15 .. 15;
      Color    at 0  range  16 .. 31;
      Count    at 4  range   0 .. 7;
      Flags    at 4  range   8 .. 31;
   end record;

Not suggesting you use this format as it is a little verbose ... but suggesting the concept of specifying absolute locations for structs/enums vs. specifying field sizes and hoping for the best. This is very very important when defining large structs/enums that map to a hardware interfaces, packed messages, or OS devices. The compiler can easily verify the bitfield definitions are correctly sized for the field types and don't overlap. And if you need a gap in the layout of a struct/enum, you just leave a gap in the layout definition. Ada solved quite a few pesky embedded / system programming problems that are worth borrowing.

Will defining enum layouts also be supported in your approach? I really hope so!!! We found specifying enum layouts (called variant records in Ada) to be a very powerful way to define / interact with hardware interfaces and to support packed messages in the embedded world.

@tmccombs
Copy link

tmccombs commented Mar 4, 2021

@rlgloden I think one difficulty with that approach is that for compatibility with bitfields in c/c++, the offsets and alignments can actually be different depending on the target ABI.

@BurntSushi
Copy link
Member

Moderation note: mahkoh is no longer participating in the Rust community. If someone else is interested in working on this, they are welcome to open a new PR.

After consulting with the lang team, I'm closing this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.