do not claim that transmute is like memcpy #99614
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any …
(rust-highfive has picked a reviewer for you, use r? to override)
@rustbot label +T-libs-api +T-lang -T-libs
library/core/src/intrinsics.rs (outdated diff)
 ///
 /// `transmute` is semantically equivalent to a bitwise move of one type
 /// into another. It copies the bits from the source value into the
-/// destination value, then forgets the original. It's equivalent to C's
-/// `memcpy` under the hood, just like `transmute_copy`.
+/// destination value, then forgets the original. Note that source and destination
+/// are passed by-value, which means if `T` or `U` contains padding, that padding
+/// might *not* be preserved by `transmute`.
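To make the by-value versus by-reference distinction concrete, here is a small sketch (not part of the PR) contrasting `std::mem::transmute`, which takes its argument by value, with `std::mem::transmute_copy`, which reads through a reference the way `memcpy` does. The helper names are hypothetical, and `u32`/`[u8; 4]` were chosen because neither type has padding, so both routes are expected to yield the same bytes.

```rust
use std::mem;

/// Hypothetical helper: reinterpret a `u32` as bytes via a by-value `transmute`.
fn bytes_by_value(x: u32) -> [u8; 4] {
    // SAFETY: `u32` and `[u8; 4]` have the same size, neither has padding,
    // and every bit pattern is valid for both.
    unsafe { mem::transmute::<u32, [u8; 4]>(x) }
}

/// Hypothetical helper: read the same bytes through a reference with `transmute_copy`.
fn bytes_by_ref(x: &u32) -> [u8; 4] {
    // SAFETY: `transmute_copy` reads `size_of::<[u8; 4]>()` bytes from behind `x`,
    // which is exactly the size of `u32`; it also copes with unaligned reads.
    unsafe { mem::transmute_copy::<u32, [u8; 4]>(x) }
}
```

This only exercises the padding-free case. The doc change is about what happens when `T` or `U` does contain padding, where `transmute` makes no promise to carry those bytes across.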
I don't believe programmers will anticipate "the top bits of a boolean" to be what they think of as "padding". I also think the "bitwise move" may throw people off. People's deep intuition is that `move` means no innate effect on the object unless it is hurled with great force.
Like, I might imagine the Rust AM executes something that looks like:
```rust
// Like a "real" machine register, but contains potentially any number of bytes.
struct Register([u8]);
// Not a pointer or a reference, but simply the location of something in the AM.
// This can even include something being held in a register.
struct Address;
pub const unsafe extern "rust-abstract-machine" fn transmute<T, U>(arg: T) -> U {
    let mut src_addr = Address::load_address_from(arg);
    let value = Register::load_as_type::<U>(&mut src_addr);
    src_addr.destroy_range_of::<T>(arg); // Burn our bridges behind us.
    return value
}
```
The catch here is that this is a "move of the bits", but my understanding is that what we are really doing is creating a new value derived from the original argument. If the Rust AM knows an arbitrary bit must be set or unset in the new type `U`, the effect of `Register::load_as_type::<U>` may be to automatically always set or unset that bit. Or it may be preserved exactly as-is. Or it may check whether the bit is correctly set or unset in the original `T` and then, if it is, finish creating `U` with the appropriate bytes, or if it isn't, pull the bytes from `getrandom()` instead. "It's UB, I ain't gotta explain shit!"
...In other words, with `mem::transmute`, we are in fact hurling things with great force, but then, if it was a valid transmutation, we find there's *ahem* padding at the end, enough to soften the landing.
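The point about a bit that "must be set or unset" in `U` is easiest to see with `bool`, whose only valid bit patterns are `0` and `1`. A sketch of the usual defensive pattern, with a hypothetical function name: check validity first, and only then reinterpret.

```rust
use std::mem;

/// Hypothetical helper: turn a byte into a `bool` only if the byte is a valid `bool`.
fn byte_to_bool(b: u8) -> Option<bool> {
    match b {
        // SAFETY: `bool` has the same size as `u8`, and 0 and 1 are exactly
        // its valid bit patterns (`false` and `true` respectively).
        0 | 1 => Some(unsafe { mem::transmute::<u8, bool>(b) }),
        // Any other byte would be an invalid `bool`; transmuting it is UB,
        // so there is no "what does the result look like" to reason about.
        _ => None,
    }
}
```

In real code `b != 0`, or a plain `match` with no `transmute` at all, does the job safely; the `unsafe` arm is kept here only to show where the validity obligation sits.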
> I don't believe programmers will anticipate "the top bits of a boolean" to be what they think of as "padding".

Neither do I, so I am not sure what you are alluding to here.
I am not quite sure what to make of your comment -- do you have some wording suggestions?
Note that nothing here is even specific to `transmute`. Any by-value passing of arguments works this way.

> what we are really doing is creating a new value that was derived from the original argument

We are serializing the argument to memory using one format (`T`), and then deserializing using another format (`U`).
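The "serialize as `T`, deserialize as `U`" framing matches how the standard library documents `f32::to_bits`: it is currently identical to `transmute::<f32, u32>`. A small sketch checking that equivalence in both directions (the helper names are hypothetical):

```rust
use std::mem;

/// Hypothetical helper: write an `f32` out in its own format, read the bits back as a `u32`.
fn f32_as_u32(x: f32) -> u32 {
    // SAFETY: both types are 4 bytes, neither has padding,
    // and every bit pattern is a valid `u32`.
    unsafe { mem::transmute::<f32, u32>(x) }
}

/// Hypothetical helper: the reverse direction.
fn u32_as_f32(bits: u32) -> f32 {
    // SAFETY: every 32-bit pattern is a valid `f32` (possibly a NaN).
    unsafe { mem::transmute::<u32, f32>(bits) }
}
```

Here `f32_as_u32(1.0)` should equal `1.0f32.to_bits()`, which is `0x3f80_0000`.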
> > I don't believe programmers will anticipate "the top bits of a boolean" to be what they think of as "padding".
>
> Neither do I, so I am not sure what you are alluding to here.

Ah, I mostly meant that with this wording I think the transformation created in #96140 would still be surprising.
Oh, I should have looked more carefully at the issue I am linking. Oops.
Yeah, that issue is just complaining about UB code not doing what its authors expect it to do. We already say:
/// Both types must have the same size. Neither the original, nor the result,
/// may be an [invalid value](../../nomicon/what-unsafe-does.html).
Do you think that needs to be clarified somehow?
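The first requirement quoted above, equal sizes, is enforced by the compiler: a `transmute` between types of different sizes does not compile. A sketch with hypothetical helpers and an explicit compile-time assertion restating the same invariant; the expected value is built by hand from native-endian bytes so the comparison holds on both little- and big-endian targets.

```rust
use std::mem::{self, size_of};

// Compile-time check restating the size requirement that `transmute` itself enforces.
const _: () = assert!(size_of::<[u16; 2]>() == size_of::<u32>());

/// Hypothetical helper: reinterpret two `u16`s as one `u32`.
fn pair_as_u32(pair: [u16; 2]) -> u32 {
    // SAFETY: same size, no padding in either type, every bit pattern valid for `u32`.
    unsafe { mem::transmute::<[u16; 2], u32>(pair) }
}

/// Hypothetical reference version that lays out the same bytes explicitly.
fn pair_as_u32_by_hand(pair: [u16; 2]) -> u32 {
    let mut bytes = [0u8; 4];
    bytes[..2].copy_from_slice(&pair[0].to_ne_bytes());
    bytes[2..].copy_from_slice(&pair[1].to_ne_bytes());
    u32::from_ne_bytes(bytes)
}
```

Changing the source type to `[u16; 3]` in `pair_as_u32` would be rejected at compile time (error E0512). The second requirement, valid values, is not checked for you.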
Well, I think the main detail is that programmers often think of UB in an unhelpful way that is more misleading than informative. Here we have an especially UB-prone function that people are trying to form expectations about, so it might help to reiterate the usual concerns about UB, just to set expectations.
Because... compiling while risking UB is not quite "aha, I detect UB, gotcha! nasal demons!" Yet I think that's the folkloric understanding. It's more "for all possible traces of control flow through this function, I may select a machine encoding of this function that produces the correct results assuming `mem::transmute`'s invariants were upheld, and may go wildly wrong if they were not." This is... the "same thing" to logicians, yes? But programmers are often not logicians, even when they are comfortable with logic.
So since this is a Wildly Unsafe function that does Wildly Unsafe transformations yet is nonetheless "necessary", in a certain sense, at least for now, I think it might be helpful to reiterate some form of the usual "these invariants must be upheld, and the compiler may 'help' by inflicting them on your program in a way it deems appropriate, such as (but not limited to) replacing invalid values with valid ones, or removing code that would have resulted in producing an invalid value (for example, the entire function body that contains an invalid call to `mem::transmute`, which may include your entire program if the compiler has also made certain inlining decisions)."
Okay, I have expanded the wording a bit. I didn't want to go to quite as much length as you did, though -- the docs link to the reference page on UB, so if necessary such clarification should be added there, IMO.
b963b10 to aed5cf3
@thomcc this is another stable unsafe fn documentation clarification; could you take a look?
At a glance this seems fine to me and just clarifies what we already documented (I don't think this changes guarantees at all), but I'll leave it to the assigned reviewer.
Co-authored-by: Jubilee <[email protected]>
Josh seems to have a big review backlog, so I was hoping someone else could take over. :)
Hm, fair enough. I'll review after dinner. r? @thomcc
LGTM. This just moves things around and/or documents things that already don't work. I left a few notes but don't feel the need to change anything; r=me if you want to land it as-is.
@bors r- I simply haven't gotten around to reading your comments yet.^^
Ah! My bad, I'll wait longer next time. Looks good to me. @bors r+ rollup
Rollup of 5 pull requests
Successful merges:
 - rust-lang#99371 (Remove synchronization from Windows `hashmap_random_keys`)
 - rust-lang#99614 (do not claim that transmute is like memcpy)
 - rust-lang#99738 (rustdoc: avoid inlining modules with duplicate names)
 - rust-lang#99800 (Fix futex module imports on wasm+atomics)
 - rust-lang#100079 (Replace `* -> vec` with `-> vec` in docs)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Saying transmute is like memcpy is not a well-formed statement, since memcpy is by-ref whereas transmute is by-val. The by-val nature of transmute inherently means that padding is lost along the way. (This is not specific to transmute, this is how all by-value operations work.) So adjust the docs to clarify this aspect.
Cc @workingjubilee
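For readers unsure what "padding" means here, a sketch with a hypothetical `#[repr(C)]` struct: under `repr(C)` layout, a `u8` followed by a `u16` leaves one unused byte between the fields so that the `u16` is 2-byte aligned. That byte belongs to no field and is not part of the value, which is why a by-value operation such as `transmute` has no obligation to carry it along. `std::mem::offset_of!` requires Rust 1.77 or newer.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical type with one byte of padding under `repr(C)` layout rules.
#[repr(C)]
struct Padded {
    a: u8,  // offset 0
    // 1 padding byte sits here so that `b` starts at a 2-byte-aligned offset
    b: u16, // offset 2
}

/// Offset of field `b`, as computed by the compiler.
fn b_offset() -> usize {
    offset_of!(Padded, b)
}

/// Bytes of `Padded` that belong to no field.
fn padding_bytes() -> usize {
    size_of::<Padded>() - (size_of::<u8>() + size_of::<u16>())
}
```

By the same reasoning, transmuting a `Padded` into `[u8; 4]` would be UB: the padding byte is uninitialized, and an uninitialized byte is never a valid `u8`.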