Should niches/ABI be part of the layout of a type? #122

Closed
gnzlbg opened this issue Apr 19, 2019 · 27 comments
Labels
A-layout: Topic: Related to data structure layout (`#[repr]`)
C-terminology: Category: Discussing terminology -- which term to use, how to define it, adding it to the glossary

Comments

@gnzlbg
Contributor

gnzlbg commented Apr 19, 2019

The current definition of layout (https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#layout) does not consider "niches" part of the type layout.

In this thread (#120 (comment)) it was argued that we might want to change that and make them part of the layout of a type.

If we do that, we need to change the glossary and state that &mut T and *mut T don't have the same layout, because they don't have the same "niches".
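To make the difference concrete, here is a minimal sketch (not part of the original comment; the helper name `size_align` is made up for illustration). It shows that the two pointer types agree on size and alignment but behave differently once wrapped in `Option`. The last assertion reflects how current compilers behave rather than a documented guarantee.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: returns (size, align) of `T`.
fn size_align<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

fn main() {
    // `&mut u32` and `*mut u32` agree on size and alignment...
    assert_eq!(size_align::<&mut u32>(), size_align::<*mut u32>());

    // ...but a reference is never null, so `Option` can reuse the null
    // bit pattern for `None` and needs no extra space.
    assert_eq!(size_of::<Option<&mut u32>>(), size_of::<&mut u32>());

    // A raw pointer may be null, so `Option` has to store a tag elsewhere.
    assert!(size_of::<Option<*mut u32>>() > size_of::<*mut u32>());

    println!("(size, align) of both pointer types: {:?}", size_align::<*mut u32>());
}
```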

cc @eddyb

@gnzlbg gnzlbg added the A-layout Topic: Related to data structure layout (`#[repr]`) label Apr 19, 2019
@gnzlbg
Contributor Author

gnzlbg commented Jun 22, 2019

@RalfJung Niches can have at least two different sources: invalid representations, which are part of validity (e.g. in &T), and padding, which is certainly a part of layout (e.g. in (u8, u16)).

The current definition of layout in the glossary includes padding by omission (e.g. by saying that field offsets are part of layout), but does not explicitly mention it. Maybe we should explicitly mention padding in the definition of layout, and mention that padding bits introduce niches in the representation.

@gnzlbg
Contributor Author

gnzlbg commented Jun 22, 2019

So I think that *mut T and &mut T should have the same layout - what they don't have is the same validity invariant.

This would narrow this question to whether (u8, u16) and (u8, u8, u16) should have the same layout. They do not have the same niches, but all bit-patterns are valid for both types.

@Lokathor
Contributor

It would be a very useful property if two types with the same layout, when used as the concrete type of a generic type, produced the same layout of the overall type. In other words, Option<T> has the same concrete layout for any two concrete types you put as T as long as those two types have the same layout as each other.

I think this makes for a rule that is very easy to understand and teach. I don't think there are any optimization possibilities lost.

However, if we want to have such a rule then niche needs to count as part of the layout.
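As an illustration of why size and alignment alone are not enough for such a rule, consider this sketch (the helper `size_align` is a hypothetical name, and the size of `Option<u32>` is an observation about current compilers, not a guarantee). `u32` and `NonZeroU32` agree on size and alignment, yet `Option` of each comes out differently.

```rust
use std::mem::{align_of, size_of};
use std::num::NonZeroU32;

/// Hypothetical helper: returns (size, align) of `T`.
fn size_align<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

fn main() {
    // Same size and alignment for the two "inner" types.
    assert_eq!(size_align::<u32>(), size_align::<NonZeroU32>());

    // `NonZeroU32` has a niche (the value 0), so `Option` adds nothing.
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<NonZeroU32>());

    // `u32` has no niche, so `Option<u32>` needs room for a tag.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());
}
```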

@gnzlbg
Contributor Author

gnzlbg commented Jun 22, 2019

All representations are valid for padding bytes, so they cannot introduce niches.

@RalfJung
Member

RalfJung commented Jun 23, 2019

Niches can have at least two different sources: [...] and padding

All representations are valid for padding bytes, so they cannot introduce niches.

Just to be clear: the latter is correct, the former is not. Padding cannot be used for niches.
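A rough way to observe this, assuming `#[repr(C)]` structs in place of the tuples mentioned earlier (tuple layout is unspecified, so `Padded`, `Dense` and `option_overhead` are illustrative names only): with current compilers, `Option` grows both the padded and the unpadded struct, i.e. the padding byte is not reused for the discriminant.

```rust
use std::mem::size_of;

// One padding byte sits between the two fields.
#[allow(dead_code)]
#[repr(C)]
struct Padded(u8, u16);

// Same size and alignment, but no padding at all.
#[allow(dead_code)]
#[repr(C)]
struct Dense(u8, u8, u16);

/// Hypothetical helper: extra bytes that `Option<T>` needs on top of `T`.
fn option_overhead<T>() -> usize {
    size_of::<Option<T>>() - size_of::<T>()
}

fn main() {
    assert_eq!(size_of::<Padded>(), size_of::<Dense>());

    // Neither type has a niche, padding or not, so `Option` grows both.
    assert!(option_overhead::<Padded>() > 0);
    assert!(option_overhead::<Dense>() > 0);

    println!(
        "overhead: Padded = {}, Dense = {}",
        option_overhead::<Padded>(),
        option_overhead::<Dense>()
    );
}
```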

invalid representations, which are part of validity

That is indeed the justification for a niche. But not every ruled-out bit pattern is a niche. For example, 0x1 is not a valid bit pattern for &u32, and yet it is (currently) not part of the niche. Hence I think we have to consider niches and validity as separate terms. They are connected by a "soundness theorem" saying that all values in the niche are never valid for the type.

However, if we want to have such a rule then niche needs to count as part of the layout.

Yes, that's basically my point. We need some name for "all the things of T that we need to know when computing the size and alignment of Wrapper<T>". Those things are size, alignment and niche. (Field offsets are not part of it though!)

So, what do we call that thing? "Layout" seems like a reasonable term. So "layout" would, by definition, consist of the size, alignment and niche of a type.

But there are other things that are relevant for a type in this context, and that are sometimes considered part of "layout", namely the function call ABI and the offsets of the fields (if the type has any). In particular, TyLayout in rustc includes this additional data. But this means that e.g. u32 and Option<NonZeroU32>, while having the same (size, align, niche), don't have the same TyLayout.

So maybe (size, align, niche) should be called something else, to capture precisely the property @Lokathor was mentioning? Or maybe TyLayout should be renamed to TyLayoutAndAbiAndFields (working title)?

@RalfJung
Member

RalfJung commented Jun 23, 2019

What seems really odd though is to include ABI and fields, but exclude the niche. I think that is just an oversight.

So my proposal is to update the docs to include "niche" in the definition of "layout". Or does anyone have a case where talking about (size, align, fields, abi) and excluding the niche is useful? @gnzlbg you seem to have that in mind when suggesting that &mut T and *mut T should be considered to have the same layout.

That still leaves open the question of what to call (size, align, niche), though.

I would certainly not include the validity invariant in whatever a layout is, that's way more information than we actually need. Abstracting it to a "niche" the way rustc does is useful I think.

@hanna-kruppe

hanna-kruppe commented Jun 23, 2019

I would prefer to exclude "ABI" (meaning how it is passed by value, not the more general sense of Application Binary Interface) from "layout":

  • "layout" of course comes from and suggests memory layout, which that "ABI" doesn't influence
  • it goes along nicely with wording such as "a newtype has the same layout as its only field, but only has the same ABI if it's repr(transparent)" (which is essentially how we've been explaining the importance of repr(transparent) all along).

That is, I propose "layout = (size, align, fields, niche)". This would maybe entail renaming TyLayout but ¯\_(ツ)_/¯ IMO it's a misnomer anyway.

I also think it's fine that this is more information than is needed for "computing the layout of Wrapper<T> from the layout of T". If the distinction is important we can make up a term like "layout shape" or "layout without field offsets" for it (or just spell out the inputs to the layout computation in more detail), but for most of the things we discuss in terms of "layout" (e.g., whether type punning is ok, whether layout computation is deterministic, etc.) the field offsets do potentially matter.

@RalfJung
Member

whether layout computation is deterministic

We'd definitely want to include the ABI in that one though.

whether type punning is ok

Well, type punning is okay between Option<NonZeroU32> and u32 even though they don't have the same "field offsets" (the latter doesn't even have any fields).
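A small sketch of this particular pun, relying on the standard library's documented representation guarantees for `Option<NonZeroU32>` (the function names are hypothetical, and safe code would normally use `NonZeroU32::new` and `map_or` instead):

```rust
use std::mem::transmute;
use std::num::NonZeroU32;

/// Hypothetical helper: reinterpret a `u32` as `Option<NonZeroU32>`.
fn u32_to_option(x: u32) -> Option<NonZeroU32> {
    // SAFETY: both types have the same size, and every `u32` bit pattern is
    // a valid `Option<NonZeroU32>` (0 maps to `None`).
    unsafe { transmute::<u32, Option<NonZeroU32>>(x) }
}

/// Hypothetical helper: the reverse direction.
fn option_to_u32(x: Option<NonZeroU32>) -> u32 {
    // SAFETY: same size, and every bit pattern is a valid `u32`.
    unsafe { transmute::<Option<NonZeroU32>, u32>(x) }
}

fn main() {
    assert_eq!(u32_to_option(0), None);
    assert_eq!(u32_to_option(7), NonZeroU32::new(7));
    assert_eq!(option_to_u32(NonZeroU32::new(7)), 7);
    assert_eq!(option_to_u32(None), 0);
}
```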

@hanna-kruppe

We'd definitely want to include the ABI in that one though.

Sure, just say "ABI and layout is deterministic [w.r.t. ...]".

Well, type punning is okay between Option<NonZeroU32> and u32 even though they don't have the same "field offsets" (the latter doesn't even have any fields).

Yes, there is lots of type punning between types that aren't comparable wrt fields, and there are also other things (e.g., validity and safety invariants) to keep in mind when type-punning. Field offsets are just part of the story.

@RalfJung
Member

I feel it makes sense to exclude fields, because then equality of layout is a necessary condition for type punning of T inside an arbitrary Wrapper<T> to make any sense.

Turns out the docs already have a definition of layout, and it doesn't agree with the glossary: https://doc.rust-lang.org/stable/reference/type-layout.html says

The layout of a type is its size, alignment, and the relative offsets of its fields.

So this does not include ABI, nor niche. I think this is unlike anything that any one of us has been proposing. ;)

@hanna-kruppe

I feel it makes sense to exclude fields, because then equality of layout is a necessary condition for type punning of T inside an arbitrary Wrapper<T> to make any sense.

I can understand that, but there are many incompatible substitutions for X, Y in "if we define layout as X then property Y can be stated concisely in terms of layout". I don't know how to resolve that; arguing about which is more important seems miserable and unlikely to help.

But I don't have very strong opinions about most of the definition anyway. I am very serious about this one point though: specification terms that Rust users recognize from informal/pre-formal discussion should bear some resemblance to this informal/preexisting meaning. From that angle, "layout" absolutely must include field offsets. After all, where fields are located is a major part of how the type is laid out in memory, and fiddling with that is a large source of layout optimizations.

I care less about whether niche and ABI are in or out, definitely not enough to argue at length about it. Users think about those comparatively rarely. But telling users that #[repr(C)] struct Foo(u8, u32); and #[repr(C)] struct Bar(u32, u8); "have the same layout" is just plain misleading.
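For illustration, a minimal sketch of that example (the helpers `offset_in`, `foo_u32_offset` and `bar_u32_offset` are hypothetical names). The two structs agree on size and alignment, but the `u32` field lives at a different offset in each.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[repr(C)]
struct Foo(u8, u32);

#[allow(dead_code)]
#[repr(C)]
struct Bar(u32, u8);

/// Hypothetical helper: byte distance from the start of `base` to `field`,
/// where `field` must be a reference to a field of `base`.
fn offset_in<T, F>(base: &T, field: &F) -> usize {
    (field as *const F as usize) - (base as *const T as usize)
}

/// Offset of the `u32` field inside `Foo`.
fn foo_u32_offset() -> usize {
    let v = Foo(0, 0);
    offset_in(&v, &v.1)
}

/// Offset of the `u32` field inside `Bar`.
fn bar_u32_offset() -> usize {
    let v = Bar(0, 0);
    offset_in(&v, &v.0)
}

fn main() {
    // Size and alignment agree...
    assert_eq!(size_of::<Foo>(), size_of::<Bar>());
    assert_eq!(align_of::<Foo>(), align_of::<Bar>());

    // ...but the `u32` is not in the same place.
    assert_ne!(foo_u32_offset(), bar_u32_offset());
    println!("u32 offset: Foo = {}, Bar = {}", foo_u32_offset(), bar_u32_offset());
}
```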

@RalfJung
Member

But telling users that #[repr(C)] struct Foo(u8, u32); and #[repr(C)] struct Bar(u32, u8); "have the same layout" is just plain misleading.

That's fair. I think you convinced me that field offsets should be included, also for consistency with existing docs.

So the contentious points seem to be whether ABI and/or niche are included -- and if not, what we call the thing that includes them as well.

The current definition of layout in the glossary includes padding by omission (e.g. by saying that field offsets are part of layout), but does not explicitly mention it. Maybe we should explicitly mention padding in the definition of layout, and mention that padding bits introduce niches in the representation.

IMO it is pretty clear that when you define where the fields are, that also defines the gaps between the fields, i.e., padding. Once the field offsets are fixed, there is no freedom left for where to put padding. But it might make sense to state this explicitly.

@gnzlbg
Contributor Author

gnzlbg commented Jun 24, 2019

I feel it makes sense to exclude fields, because then equality of layout is a necessary condition for type punning of T inside an arbitrary Wrapper<T> to make any sense.

What do you mean by type punning?

The alignment of T and Wrapper<T> does not need to match for mem::transmute to be ok - only their size needs to match. So AFAICT, one can type pun types of different sizes, alignment, niches, and fields, depending on how one does the type punning. The requirements for the type punning will depend on what the particular API requires, but "layout equality" is probably too strong a requirement.
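A small sketch of the transmute case (the function name `bytes_to_u32` is hypothetical): `[u8; 4]` and `u32` have the same size but, on most targets, different alignment, and the by-value transmute is still fine.

```rust
use std::mem::{align_of, transmute};

/// Hypothetical helper: reinterpret four bytes as a `u32` in native byte order.
fn bytes_to_u32(bytes: [u8; 4]) -> u32 {
    // SAFETY: both types are 4 bytes and every bit pattern is a valid `u32`.
    // `transmute` moves the value, so the source's alignment does not matter.
    unsafe { transmute::<[u8; 4], u32>(bytes) }
}

fn main() {
    println!(
        "align of [u8; 4] = {}, align of u32 = {}",
        align_of::<[u8; 4]>(),
        align_of::<u32>()
    );

    let bytes = [0x78, 0x56, 0x34, 0x12];
    // Same result as the safe standard-library conversion.
    assert_eq!(bytes_to_u32(bytes), u32::from_ne_bytes(bytes));
}
```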

@Lokathor
Contributor

People don't just pun with transmute, they also pun with slice casting stuff. I actually had to put it in a crate recently, link, because people kept wanting to do it but getting it wrong.

However, as you say, even a slice cast doesn't require layout equality.
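As a sketch of what such a slice cast can look like (this is not the crate mentioned above; `as_bytes` is a made-up name), here is the easy direction, from `&[u32]` to `&[u8]`, where the target alignment is no stricter than the source:

```rust
use std::mem::size_of_val;
use std::slice;

/// Hypothetical helper: view a slice of `u32` as raw bytes.
fn as_bytes(words: &[u32]) -> &[u8] {
    // SAFETY: `u8` has alignment 1, `u32` has no padding bytes, the byte
    // length is exactly the size of the input slice, and the returned
    // lifetime is tied to the input borrow.
    unsafe { slice::from_raw_parts(words.as_ptr().cast::<u8>(), size_of_val(words)) }
}

fn main() {
    let words = [1u32, 2];
    let bytes = as_bytes(&words);
    assert_eq!(bytes.len(), 8);
    assert_eq!(&bytes[..4], &1u32.to_ne_bytes()[..]);
    assert_eq!(&bytes[4..], &2u32.to_ne_bytes()[..]);
}
```

The reverse direction (bytes to `u32`) additionally has to check alignment and length, which is where hand-written casts tend to go wrong.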

@eddyb
Member

eddyb commented Jun 25, 2019

I would prefer to exclude "ABI" (meaning how it is passed by value,

Nit: I think we can maybe refer to this as "call ABI"? As in, the ABI of a given type wrt calling conventions?

@RalfJung
Member

I'd love that. "ABI" is such an overloaded term...

@Ixrec
Contributor

Ixrec commented Jun 25, 2019

For the reader semi-confused by the phrase "the ABI w.r.t. calling conventions": is "call ABI" actually different from "calling convention"? Or are they just synonymous? Or is one of them a superset of the other?

@eddyb
Member

eddyb commented Jun 25, 2019

@Ixrec It's... complicated. One way to look at it is that each calling convention takes in argument/return types' "call ABI" and lowers that to passing those argument/return values in registers and/or the stack.

E.g. the "call ABI" of (i32, ()) and that of i32 are the same (scalar, more specifically a 32-bit integer, signed), which means that every calling convention (SysV, Windows stdcall, etc.) must treat them the same.

It gets trickier with an aggregate "call ABI", because some calling conventions introspect the layout, even recursing through the fields (x86_64 SysV being the most complex AFAIK).

I guess the confusing bit is I could say "call ABI (of a type)" and "call ABI (of a platform)", the latter being more or less a "calling convention" (but I'm less sure here).

There's also the more gnarly distinction between what LLVM lowers itself, and what the frontend has to lower, but I would think we'd paper over that in this context.

@gnzlbg
Contributor Author

gnzlbg commented Jun 25, 2019

@Ixrec

For the reader semi-confused by the phrase "the ABI w.r.t. calling conventions": is "call ABI" actually different from "calling convention"?

They are different. The "calling convention" is an agreement between the caller (of a function) and the callee (the function) about how to interface, for example, the function arguments (amongst many other things). Both need to agree on this.

In Rust, most functions follow the Rust calling convention, but you can also choose extern "C" fn.., extern "fastcall" fn..., etc.

These calling conventions classify the types that you can pass around into different categories, e.g., in some calling conventions, an i32 function argument is a SCALAR and a struct Foo { ... } is an AGGREGATE. Depending on the category, the calling convention might define that the argument is passed in register X, or that it must be put on the stack frame, or passed in some other way.

These categories are what is meant here by "call ABI" of the type, and the same type can be a SCALAR in one calling convention, and an AGGREGATE in another. This is fine: the calling convention is part of the function type so both the caller and the callee agree on the category.
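A minimal sketch of "the calling convention is part of the function type" (the function names `add_rust` and `add_c` are illustrative): the two function pointer types below are distinct types, even though the argument and return types are identical.

```rust
// Uses the default (unspecified) Rust calling convention.
fn add_rust(a: i32, b: i32) -> i32 {
    a + b
}

// Uses the platform's C calling convention.
extern "C" fn add_c(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    let f: fn(i32, i32) -> i32 = add_rust;
    let g: extern "C" fn(i32, i32) -> i32 = add_c;

    // The next line would be rejected by the type checker, because the
    // calling convention in the pointer type does not match the function:
    // let h: fn(i32, i32) -> i32 = add_c;

    assert_eq!(f(1, 2), g(1, 2));
}
```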

People often want to use, e.g., wrappers like struct Wrapper(i32); when interfacing with C functions that expect an i32. This does not work "as is" because i32 and Wrapper(i32) can have a different "call ABI" (their category is not necessarily the same). Applying repr(transparent) to Wrapper gives it the same category as i32 solving this problem.
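A sketch of the `repr(transparent)` case, with a Rust-defined `extern "C"` function standing in for a real C function (`double` and `call_with_wrapper` are hypothetical names). It leans on the rule that a `repr(transparent)` struct is ABI-compatible with its single non-trivial field; without the attribute, the reinterpretation below would not be justified.

```rust
use std::mem::transmute;

#[allow(dead_code)]
#[repr(transparent)]
struct Wrapper(i32);

// Stand-in for a C function that expects a plain `i32`.
extern "C" fn double(x: i32) -> i32 {
    x * 2
}

/// Hypothetical helper: call `double` through a pointer type that takes `Wrapper`.
fn call_with_wrapper(w: Wrapper) -> i32 {
    let f: extern "C" fn(i32) -> i32 = double;
    // SAFETY: `Wrapper` is `repr(transparent)` over `i32`, so the two
    // signatures are ABI-compatible.
    let g: extern "C" fn(Wrapper) -> i32 = unsafe { transmute(f) };
    g(w)
}

fn main() {
    assert_eq!(call_with_wrapper(Wrapper(21)), 42);
}
```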

@eddyb
Member

eddyb commented Jun 26, 2019

These categories are what is meant here by "call ABI" of the type, and the same type can be a SCALAR in one calling convention, and an AGGREGATE in another

This is inaccurate, or at least misleading, as we have a scalar/aggregate distinction that the calling convention can't contest: it might still pass an aggregate in registers or a scalar on the stack, but it can't tell apart struct Wrapper(i32); from i32 (with or without repr(transparent)) - only repr(C) makes that an aggregate (which e.g. the x86_64 SysV calling convention will still pass in a register).

@RalfJung
Member

RalfJung commented Aug 11, 2019

@eddyb proposed in private conversation to use "ABI" (instead of layout) as the overarching term here, and then categorize that into memory ABI (= layout?), call ABI, and maybe more.

@gnzlbg
Contributor Author

gnzlbg commented Aug 12, 2019

@eddyb proposed in private conversation to use "ABI" (instead of layout) as the overarching term here, and then categorize that into memory ABI (= layout?), call ABI,

I like that.

What would "memory ABI" be? Just size+align? If so, I don't think we should call that "layout". The term is a bit overloaded, and we use it, e.g., in the context of "layout optimizations", which do apply to "niche"s as well.

Also, where do "niche"s go there? I suppose we could have a "value ABI" that includes padding+niches, where the difference is that "niche"s can be used for layout optimization while "padding" cannot. Maybe we need a better word for this than "padding", and this could also tie nicely with "value representation".

@gnzlbg
Contributor Author

gnzlbg commented Aug 12, 2019

For example, the "value ABI" for bool (for all currently supported platforms) could be that bool has no padding, and can only take the values 0 and 1 - everything else is a "niche".
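A quick sketch of what that looks like from the outside (the helper `bool_values` is a made-up name; the `Option` sizes are observations about current compilers rather than documented guarantees):

```rust
use std::mem::size_of;

/// Hypothetical helper: the integer values of the two valid `bool`s.
fn bool_values() -> (u8, u8) {
    (false as u8, true as u8)
}

fn main() {
    // `bool` is one byte, and only 0 and 1 are valid values.
    assert_eq!(size_of::<bool>(), 1);
    assert_eq!(bool_values(), (0, 1));

    // The remaining bit patterns are available as niches, so wrapping in
    // `Option` (even twice) does not grow the type.
    assert_eq!(size_of::<Option<bool>>(), 1);
    assert_eq!(size_of::<Option<Option<bool>>>(), 1);
}
```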

@eddyb
Member

eddyb commented Aug 12, 2019

I suppose we could have a "value ABI"

One way to be more precise about this is to talk about "memory" vs "immediate".

As for what "memory ABI" would be: size, align, field offsets and "largest niche" offset/range. You could implement an entirely compliant Rust compiler with it except for FFI calling conventions.

@RalfJung RalfJung added the C-terminology Category: Discussing terminology -- which term to use, how to define it, adding it to the glossary label Aug 14, 2019
@RalfJung
Member

The issue with these "X ABIs" is that often people will say just "ABI" when they mean "call ABI". At least that's my experience.

But otherwise, I do like this proposal. Maybe we also need a "nesting ABI" that includes the niche, whereas "memory ABI" only includes size + alignment?

@RalfJung RalfJung changed the title from "Should niches be part of the layout of a type ?" to "Should niches/ABI be part of the layout of a type?" Sep 2, 2020
@RalfJung
Member

RalfJung commented Sep 2, 2020

I think this has not been mentioned in this issue yet, quoting from #153 (but said meeting was more than a year ago):

The conclusion in the meeting was that we should avoid the term "layout" in the reference, and instead define in the glossary which components a layout can have, and then always spell that out explicitly, probably with an abbreviation like "they have the same SAN-layout" [size, alignment, niche].
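To give a feel for what comparing "SAN-layouts" could mean, here is a rough sketch. `SanProbe` and `san_probe` are invented for illustration and are not an API of rustc or the reference. Stable Rust does not expose niches directly, so the `has_niche` field is only a heuristic: it checks whether `Option<T>` fits in the same space as `T` on the current compiler.

```rust
use std::mem::{align_of, size_of};
use std::num::NonZeroU32;

/// Hypothetical approximation of (size, alignment, niche).
#[derive(Debug, PartialEq, Eq)]
struct SanProbe {
    size: usize,
    align: usize,
    /// Heuristic: true if `Option<T>` needs no space beyond `T`.
    has_niche: bool,
}

fn san_probe<T>() -> SanProbe {
    SanProbe {
        size: size_of::<T>(),
        align: align_of::<T>(),
        has_niche: size_of::<Option<T>>() == size_of::<T>(),
    }
}

fn main() {
    // Same size and alignment, different niche.
    assert_eq!(san_probe::<u32>().size, san_probe::<NonZeroU32>().size);
    assert_eq!(san_probe::<u32>().align, san_probe::<NonZeroU32>().align);
    assert_ne!(san_probe::<u32>(), san_probe::<NonZeroU32>());

    // `Option<NonZeroU32>` has used up the niche, so it probes like `u32`.
    assert_eq!(san_probe::<u32>(), san_probe::<Option<NonZeroU32>>());

    println!("{:?}", san_probe::<NonZeroU32>());
}
```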

@JakobDegen
Contributor

Closing as a duplicate in favor of #304.


7 participants