Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clarify what is UB #149
clarify what is UB #149
Changes from 14 commits
dfb93b2
1dcafde
280761a
86d9e2c
3241c00
909b14c
f59eca2
738a338
0f51082
efb5086
df5ff63
c41d492
1f613d8
c73730b
bd6215e
7386b5c
864625f
64bf0a5
86a89ae
c5778a1
7703c18
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of our bickering about unions does actually wrap back around to this point:
is
union { a: bool; b: bool; } = 3
"producing an invalid value as a field of a compound type"?(we can probably gloss over this, but it is something to make clearer when we have a better answer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The answer is "we don't know yet, see rust-lang/unsafe-code-guidelines#73".
So yes, this is a good question, and one that I would prefer we could skip over for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reword for consistency:
str
that isn't valid utf8There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe we want to skip this entirely because this is just a library invariant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to think about this precise wording. I think you can break this out of here and fold it into "reading uninitialized memory". I want to have some wording along the lines of "you can't tell the compiler a value definitely exists where only uninitialized memory resides", but this needs to be reinforced by documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a type-based property though. The following is fine:
but the following is not:
I would actually remove the "Reading uninitialized memory" clause in favor of this type-based one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave this a shot, let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so i get what you're saying but two things:
A) Pedantic aside that probably doesn't matter: do we actually care about guaranteeing
let x: MaybeUninit<u8> = mem::uninitialized();
is valid? I suppose that's important if you want to be able to cast a pointer to a fresh allocation to be&mut MaybeUninit
and not do dummy writes ofuninit()
to be able to call its methods? Hmm... (sub-aside: i'm a bit uncomfortable with technical discussion appealing to mem::uninitialized since it's busted in a bunch of ways, so it doesn't actually have a coherent meaning. Fresh allocations, however, work fine as well-defined uninitialized memory the compiler knows about.)B) I was using a subtler definition of "a value exists", which agrees with your statement that it's type-based. Basically empty types don't have values; they can't. So by having a
union { (), T }
you are, with types, telling the compiler that it may always be the case that a value doesn't exist there (but it's not opaque like asm or volatile, so the compiler can clearly observe reads and writes and reason about the memory's bytes).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, my definition of empty things being valueless fails to handle
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's some Proposed Terminology for the sake of this conversation. This is not necessarily useful for a formal model, but it is helping me make sense of the concepts in my head:
a value is a property a byte or collection of bytes has. Bytes may have the secret 257th value of "uninitialized".
An instance of a type is something you get from constructing that type. Moving or destroying an instance relinquishes a piece of code's access to that instance. Those instances do not necessarily have a unique, tracked, identity, but their ability to reach a point in the program serves as a kind of proof (more on that below). The Copy trait just gives permission to all code that has acquired an instance to mint more instances with the same value.
I am not 100% comfortable with the notion of "construction", as room must be made for transmutes, plain-old-data, and deserializers. I think this basically boils down to a similar situation as double-drops where any type is freely allowed to assume its instances aren't double-dropped with UB at stake, but a type may randomly decide it's fine to be double-dropped. There's no trait/property for this, it's just a thing a type can randomly be fine with. It's very contrived to come up with such a type, so we don't care about this much outside of the very specific niche of standards and trivia.
Similarly, types must be able to assume that there is meaning to the existence of an instance, also with UB at stake. But types may allow instances to be minted without any constructor. Which is to say, for some types it may be fine to have more instances of a type "live" in a program than the number of its constructors that have run. This is a relatively important fact about a type, but just like with doubledropabble it's not something with a standardized trait/property. (perhaps one can just appeal to access to a type's record initialization syntax as permission to mint as many instances as one pleases without even using that syntax?)
Anyway.
Certain types may have forbidden values, indicating that if a piece of code claims to have an instance of a type, the memory that instance occupies must not have certain values.
Zero-sized types do not have values, but they do still have instances. A logical consequence of this is that we cannot have the concept of a ZST with a forbidden value. Nor can we speak of an "uninitialized" ZST. It doesn't make sense.
From this, I would like to assert that
!
does not in fact fall under the umbrella of forbidden values: it is simply forbidden to claim that you have an instance of!
, because it does not have a constructor. If you claim to have an instance of!
, you are lying, and we have made that lie unconditional UB (as is our right as type authors).Some additional context, from the practical perspective:
In the stdlib we frequently appeal to the idea that, logically speaking, there is always an instance of a zero-sized type everywhere you look. So e.g.
(73 as *const ()).read()
is not UB ("you can pull instances from the aether").This logically falls out from needing to be able to implement a type like Vec which doesn't like, reserve an address in memory for its copies of the ZSTs. It just dumps them into the aether and then pulls them from the aether when it needs to.
However it is dangerous to pull values from the aether like this. It is an important property of rust that privacy lets abstraction authors use the existence of a instance of a type they control serve as a proof that certain things have occurred, and this extends to ZSTs. So for instance one might use a ZST as a token giving permission to access a global, and so pulling that token from the aether could cause UB.
The obvious rule of thumb here is that you should only be able to pull as many instances from the aether as you dumped there. Hence
Vec<ZST>
is fine, because it is basically just a counter that tracks how many instances of the ZST that this Vec has been given and subsequently dumped into the aether. This counter prevents safe code from pulling extra instances from the aether (forging them).This jives well with
!
being illegal to claim to have an instance of: you "can't" pull an instance from the aether that you did not dump there unless you have access to the type's constructor, and no one has access to that constructor.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
postscript: claiming to have an instance of
union MyUnion { a: T, b: U }
is a claim that you have either an instance of T or an instance of U stored in that memory (this similarly follows for enums, but discriminants make it messier to talk about so let's ignore them).If T is a ZST (as in MaybeUninit), it follows that it is always valid for a MyUnion instance to be uninitialized, as that is always conformant with the claim that "there exists an instance of T here". Similarly if T is fine to instantiate, it's fine for U to be illegal to instantiate (
!
), as "p or contradiction" is just "p".An interesting claim that falls out of what I just wrote here is that if T and U have mutually invalid values, the compiler may conclude that those mutually-invalid values are invalid for a MyUnion to have. So a union (or enum) is not actually completely opaque to the compiler. It's just that if one arm is
()
, the compiler has to let you do whatever you want with the memory/instantiation. I am lightly worried this contradicts something important, but I'm not sure! I do think this is consistent with how we compute "niches" for enum packing optimizations, I'm just having trouble convincing myself that it's ok to peek through someone's bare union if niches happen to line up. If it's not, the union author can use the MaybeUninit trick to make the compiler back off (similar to how people add PhantomData fields all the time), so I don't think it's a big deal.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so that sounds more like what I wold call a "(Rust) Abstract (Machine) Byte".
Is this a list of bytes, of values, or something more mathematically abstract?
To me this indicates your concepts are flawed. I feel the fact that
!
has 0 "inhabitants" and()
has 1 andbool
has 2 should nicely fall out of the same concept. I have described such a concept in the UCG repo, but it is still work-in-progress.In particular, "ZSTs have no values" sounds just so wrong to me. The type
()
obviously has 1 value, and that's()
! (We sadly write them the same in Rust, but you get what I mean.) You even said yourself that "value" has something to do with a "collection of bytes"; if I interpret that freely as "list of bytes" then the type()
has one "value", and that's the empty list."No value" to me sounds like "empty type", and that raises some bad memories about how many people call
void
in C "empty". That's not accurate.void
is exactly like()
, it conveys no information, but it isn't "empty". The way to formally model "no information" is to say that "there is only one possible answer", AKA "the answer was already known so it doesn't even have to be given", AKA a type with exactly 1 possible value.Information-theoretically this also works perfectly: you need
log2(N)
many bits to encode which of N answers is given, andlog2(1) = 0
. So you need 0 bits to encode the()
type.The way I would phrase it is that copying/moving a ZST is a NOP. There is nothing to copy, operationally. As you said,
Vec
won't pull "more instance of a ZST out of this air as exist in the vector", so logically speaking, this is not really out-of-thin air -- it's just that you cannot see the moves that are happening.We've had cases where our handling of ZST did not follow this model, but I think those were bugs.
Agreed. However, in my terminology, those privacy-induced extra properties arise from the safety invariant, not the validity invariant.
All of that said, I think much of it is not relevant for a discussion about "what is UB". Even if
Vec
did "duplicate" a ZST, that would not be instantaneous UB! We cannot make the Abstract Machine understand the safety invariant, it can be arbitrarily complicated after all and is often user-defined.In other words, I claim that for any slice of ZST, doubling its length does not cause undefined behavior. Doing this may lead to UB later, when some code that assumed its safety invariant is not broken runs, but the act of doubling the slice length itself is perfectly fine, from the perspective of the Abstract Machine.
We have some disagreements here. ;)
I think treating unions this way is very, very complicated because when we go about defining the Abstract Machine in a concrete, operational way (like an interpreter), we cannot know which variant the union is in. It is because of that that I am strongly pushing for "unions have no active variant". https://github.com/rust-lang/rfcs/blob/master/text/2514-union-initialization-and-drop.md deliberately did not have such a notion, and these days the Reference states:
Unions have no notion of an "active field".
I think you are re-tracing a lot of rust-lang/unsafe-code-guidelines#73 right now. ;) The OP of that thread already contains some code that I (and many in the UCG) think is legal, but that you just ruled out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't properly consider the claim of unions "having an instance" in the context of field-by-field re-initialization into another type (which is of course extremely important). But my intended claim here is exactly the one gnzlbg is championing in that thread (and I have replied there to that effect) -- only if niches align is a niche actually available. Hence "mutually invalid values" (the intersection of their value domains, in your terminology). This is consistent with the way we make you create a union by providing an entire instance of one of the union's cases. Like, by default to create a union you have to very materially have an instance.
But yeah once you start using the union you can enter a super-position of instances, which certainly makes it awkward to talk about. I am absolutely not intending to suggest there is an "active member" as that is a horribly cursed concept.
Certainly though, if you ever do
&mut my_union.a
, in my understanding of the current reference semantics, you are making a very material claim of having an instance of that type. And it would also be very difficult for us to make unions robustly opaque, in the way asm!() blocks and volatile accesses are.Something still really bothers about the rules of ! falling out the rules for value domains. Like yes I understand the notion of 0 vs 1 inhabitants (although thank you for reminding me of that term), but inhabitants don't feel like the right abstraction for talking about packing bits? Like you also appealed to the notion of "a list of bytes", which I agree can reasonably include the empty list, but if I ask for a list, you're not allowed to come back with "well, here's no list at all, that's a list right".
Gonna keep reading the relevant things over and mulling it over. I am under the impression I'm just in a weird headspace and I'll eventually come around to your position on just having ! slot in here naturally.
Question for you: is
union { a: !, b: ! }
uninstantiable? i.e. is it equivalent to!
? In my mind it should be equivalent to!
, but it seems like perhaps you would like it to be instantiable?More broadly: to what extent is
union { a: T, b: T}
not justT
? Your definition of unions seems to want it to be equivalent toMaybeUninit<T>
(ignoring Drop as uninteresting here), which is strange, because that isn't the definition it uses!(assume repr(rust) here, if you find my proposal for unions in rust-lang/unsafe-code-guidelines#73 compelling)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair. At this point we entered rust-lang/unsafe-code-guidelines#77: does a reference
&mut T
have to point to a "valid"T
even when theT
is not actually loaded from memory? The document here for now, conservatively, says "yes", but I actually think we should delay this until the pointer is actually "followed" and aT
is materialized. But that is a separate discussion.That's why I also have a relation in the mathematical sense that relates "abstract values" (the view of the world where
bool
has 2 inhabitants,()
has 1 and!
has 0) with the way they are represented on the machine (as lists of bytes). If we assume canonical representaiton for a second (which not all types have, but the three we are considering here do), then that means there are exactly 2 byte lists that represent a validbool
(both lists have length 1:[0x00]
and[0x01]
), there is exactly 1 byte list that represents a valid()
(the empty list[]
), and there is no byte list that represents a valid!
.So, I think this translates fine even if you switch between the abstract view of a type (the "value" view) and its concrete view (the "byte list" view).
I personally would prefer for a union to be just a "bag of bytes": its abstract values are all byte lists of appropriate lengths, and the types in the fields don't matter at all (except for their size and alignment) until some field is actually used.
My view of a union of size
N
is basically that is is the same as[MaybeUninit<u8>; N]
, and that we also offer field access as a convenient way to transmute that type to the type of the field (truncating the data if the field does not cover the entire union).There are problems with that view, unfortunately. But they are not related to
!
.So, with that view, the union you are asking about is indeed "instantiable"; it has size 0 and the empty byte list is valid for that type. But accessing any of its fields is a transmute-to-
!
and therefore UB.What do you mean by "the definition it uses"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggested reword and massive clarification:
Many have trouble accepting the consequences of invalid values, so they merit some extra discussion. The claim being made here is a very strong one, so read carefully.
A value is produced whenever it is assigned, passed to something, or returned from something. Keep in mind references get to assume their referents are valid, so you can't even create a reference to an invalid value. Additionally, uninitialized memory is always invalid, so you can't assign it to anything, pass it to anything, return it from anything, or take a reference to it. (Padding bytes are not technically part of a value's memory, and so may be left uninitialized.)
In simple and blunt terms: you cannot ever even suggest the existence of an invalid value. No, it's not ok if you "don't use" or "don't read" the value. Invalid values are instant Undefined Behaviour. The only correct way to manipulate memory that could be invalid is with raw pointers using methods like
write
andcopy
. If you want to leave a local variable or struct field uninitialized (or otherwise invalid), you must use a union or enum which clearly indicates at the type level that this memory may contain no values (see MaybeUninit for details).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I applied most of your suggestions but this one is big enough that it is probably easier to hand the PR off to you. ;) I'd love to do a pass over what you got when you are done, if you don't mind.
I like this new text, as usual in you very pointed style! One comment though:
That's not true for
MaybeUninit
.