do not claim that transmute is like memcpy #99614

Merged
bors merged 4 commits into rust-lang:master from the transmute-is-not-memcpy branch
Aug 3, 2022

Conversation

RalfJung
Member

@RalfJung RalfJung commented Jul 22, 2022

Saying transmute is like memcpy is not a well-formed statement, since memcpy is by-ref whereas transmute is by-val. The by-val nature of transmute inherently means that padding is lost along the way. (This is not specific to transmute, this is how all by-value operations work.) So adjust the docs to clarify this aspect.
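
As an illustration of the by-value versus by-reference distinction (a minimal sketch, not code from this PR): `Tagged` and the two helper functions below are hypothetical names, and the sketch assumes a target where `u16` has alignment 2, so `repr(C)` places one padding byte between the fields. Both helpers read only the bytes that belong to fields; the padding byte is deliberately never read, because nothing guarantees its contents after a by-value operation.

```rust
use std::mem::{self, MaybeUninit};
use std::ptr;

/// Hypothetical example type. With `repr(C)` the layout is:
/// byte 0 = `tag`, byte 1 = padding, bytes 2..4 = `value`.
#[repr(C)]
#[derive(Clone, Copy)]
struct Tagged {
    tag: u8,
    value: u16,
}

/// By-value: `x` is moved into `transmute` and a new value is moved out.
/// The padding byte may or may not survive, so the result is kept as
/// `MaybeUninit<u8>` and byte 1 is never read.
fn field_bytes_by_value(x: Tagged) -> (u8, [u8; 2]) {
    // Compiles only if `Tagged` is exactly 4 bytes, which pins the layout above.
    let raw: [MaybeUninit<u8>; 4] = unsafe { mem::transmute(x) };
    // SAFETY: bytes 0, 2 and 3 hold the initialized fields `tag` and `value`.
    unsafe { (raw[0].assume_init(), [raw[2].assume_init(), raw[3].assume_init()]) }
}

/// By-reference: an untyped byte copy out of the memory that `x` points to,
/// which is the operation `memcpy` actually corresponds to.
fn field_bytes_by_ref(x: &Tagged) -> (u8, [u8; 2]) {
    let mut raw = [MaybeUninit::<u8>::uninit(); 4];
    unsafe {
        // SAFETY: source and destination are both 4 bytes and do not overlap.
        ptr::copy_nonoverlapping(
            (x as *const Tagged).cast::<MaybeUninit<u8>>(),
            raw.as_mut_ptr(),
            mem::size_of::<Tagged>(),
        );
        // SAFETY: as above, only the field bytes are read.
        (raw[0].assume_init(), [raw[2].assume_init(), raw[3].assume_init()])
    }
}
```

For a value such as `Tagged { tag: 7, value: 0x1234 }`, both helpers agree on the field bytes. What differs is only what may happen to byte 1, which is exactly the part a program is not allowed to rely on.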

Cc @workingjubilee

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jul 22, 2022
@rustbot
Collaborator

rustbot commented Jul 22, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-highfive
Collaborator

r? @joshtriplett

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 22, 2022
@RalfJung
Member Author

@rustbot label +T-libs-api +T-lang -T-libs

@rustbot rustbot added T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jul 22, 2022
Comment on lines 1212 to 1217
  ///
  /// `transmute` is semantically equivalent to a bitwise move of one type
  /// into another. It copies the bits from the source value into the
- /// destination value, then forgets the original. It's equivalent to C's
- /// `memcpy` under the hood, just like `transmute_copy`.
+ /// destination value, then forgets the original. Note that source and destination
+ /// are passed by-value, which means if `T` or `U` contains padding, that padding
+ /// might *not* be preserved by `transmute`.
Member

@workingjubilee workingjubilee Jul 22, 2022

I don't believe programmers will anticipate "the top bits of a boolean" to be what they think of as "padding". I also think the "bitwise move" may throw people off. People's deep intuition is that move means no innate effect on the object unless it is hurled with great force.

Like, I might imagine the Rust AM executes something that looks like

// Like a "real" machine register, but contains potentially any number of bytes.
struct Register([u8]);

// Not a pointer or a reference, but simply the location of something in the AM.
// This can even include something being held in a register.
struct Address;

pub const unsafe extern "rust-abstract-machine" fn transmute<T,U>(arg: T) -> U {
    let mut src_addr = Address::load_address_from(arg);
    let value = Register::load_as_type::<U>(&mut src_addr);
    src_addr.destroy_range_of::<T>(arg); // Burn our bridges behind us.
    return value
}

The catch here is that this is a "move of the bits", but my understanding is what we are really doing is creating a new value that was derived from the original argument. If the Rust AM knows an arbitrary bit must be set or unset in the new type U, the effect of Register::load_as_type::<U> may be to automatically always set or unset that bit. Or it may be preserved exactly as-is. Or it may check if the bit is correctly set or unset in the original T and then, if it is, finish creating U with the appropriate bytes, or if it isn't, pull the bytes from getrandom() instead. "It's UB, I ain't gotta explain shit!"

...In other words, with mem::transmute, we are in fact hurling things with great force, but then, if it was a valid transmutation, we find there's, ahem, padding at the end, enough to soften the landing.

Member Author

> I don't believe programmers will anticipate "the top bits of a boolean" to be what they think of as "padding".

Neither do I, so I am not sure what you are alluding to here.

I am not quite sure what to make of your comment -- do you have some wording suggestions?

Note that nothing here is even specific to transmute. Any by-value passing of arguments works this way.

> what we are really doing is creating a new value that was derived from the original argument

We are serializing the argument to memory using one format (T), and then deserializing using another format (U).
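
The "serialize as `T`, deserialize as `U`" reading can be sketched in safe code for one concrete pair of types. This is only an illustration under assumptions: the function names are hypothetical, and `T = [u16; 2]`, `U = u32` were picked because neither type has padding or invalid bit patterns, so the two routes must agree.

```rust
use std::mem;

/// Encode the argument to bytes using the format of `[u16; 2]`,
/// then decode those same bytes using the format of `u32`.
fn reinterpret_via_bytes(x: [u16; 2]) -> u32 {
    let first = x[0].to_ne_bytes(); // serialize element 0 in native byte order
    let second = x[1].to_ne_bytes(); // serialize element 1 in native byte order
    let bytes = [first[0], first[1], second[0], second[1]];
    u32::from_ne_bytes(bytes) // deserialize the 4 bytes as a `u32`
}

/// The same reinterpretation expressed with `transmute`.
fn reinterpret_via_transmute(x: [u16; 2]) -> u32 {
    // SAFETY: both types are 4 bytes, and every bit pattern is a valid `u32`.
    unsafe { mem::transmute::<[u16; 2], u32>(x) }
}
```

The numeric result depends on the target's endianness, so the useful comparison is between the two functions rather than against a fixed constant.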

Member

> > I don't believe programmers will anticipate "the top bits of a boolean" to be what they think of as "padding".
>
> Neither do I, so I am not sure what you are alluding to here.

Ah, I mostly meant that with this wording I think the transformation created in #96140 would still be surprising.

Member Author

Oh I should have looked more carefully at the issue I am linking. oops

Yeah that is just complaining about UB code not doing what they expect it to do. We already say

    /// Both types must have the same size. Neither the original, nor the result,
    /// may be an [invalid value](../../nomicon/what-unsafe-does.html).

Do you think that needs to be clarified somehow?
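
For readers wondering what upholding the "no invalid value" requirement looks like in practice, here is a minimal sketch (not from the PR; `bool_from_u8` is a hypothetical helper). It checks that the byte is one of the two valid `bool` bit patterns before transmuting, so the invalid case is rejected explicitly instead of being left as UB.

```rust
use std::mem;

/// Convert a byte to `bool` only when the byte is a valid `bool` bit pattern.
fn bool_from_u8(byte: u8) -> Option<bool> {
    match byte {
        // SAFETY: `u8` and `bool` have the same size, and 0 and 1 are exactly
        // the valid bit patterns of `bool`.
        0 | 1 => Some(unsafe { mem::transmute::<u8, bool>(byte) }),
        // Any other byte would be an invalid `bool`; transmuting it is UB,
        // so it is rejected up front.
        _ => None,
    }
}
```

In real code the same check is simply `byte != 0` or a `match` without any `unsafe`; the transmute is kept here only to show where the validity requirement applies.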

Member

@workingjubilee workingjubilee Jul 23, 2022

Well, I think the main detail is that programmers often think of UB in an unhelpful way that is more misleading than informative, and here we have an especially UB-prone function, which people are trying to expect things of, so it might help to reiterate the usual concerns of UB just to set expectations more.

Because... compiling while risking UB is not quite "aha, I detect UB, gotcha! nasal demons!" Yet I think that's the folkloric understanding. It's more "for all possible traces of control flow through this function, I may select a machine encoding of this function that produces the correct results assuming mem::transmute's invariants were upheld, and may go wildly wrong if they were not." This is... the "same thing" to logicians, yes? But programmers are often not logicians, even when they are comfortable with logic.

So since this is a Wildly Unsafe function that does Wildly Unsafe transformations yet is nonetheless "necessary", in a certain sense, at least for now, I think it might be helpful to reiterate some form of the usual "these invariants must be upheld, and the compiler may 'help' by inflicting them on your program in a way it deems appropriate, such as (but not limited to) replacing invalid values with valid ones, or removing code that would have resulted in producing an invalid value (for example, the entire function body that contains an invalid call to mem::transmute, which may include your entire program if the compiler has also made certain inlining decisions)."

Member Author

Okay, I have expanded the wording a bit. I didn't want to go into quite as much length as you did though -- the docs link to the reference page on UB, so if necessary such clarification should be added there, IMO.

@RalfJung RalfJung force-pushed the transmute-is-not-memcpy branch from b963b10 to aed5cf3 Compare July 23, 2022 12:17
@RalfJung
Member Author

@thomcc this is another stable unsafe fn documentation clarification; could you take a look?

@thomcc
Member

thomcc commented Jul 31, 2022

At a glance this seems fine to me and just clarifies what we already documented (I don't think this changes guarantees at all), but I'll leave it to the assigned reviewer.

Co-authored-by: Jubilee <[email protected]>
@RalfJung
Member Author

Josh seems to have a big review backlog so I was hoping someone else could take over. :)

@thomcc
Member

thomcc commented Jul 31, 2022

Hm, fair enough. I'll review after dinner.

r? @thomcc

@rust-highfive rust-highfive assigned thomcc and unassigned joshtriplett Jul 31, 2022
Member

@thomcc thomcc left a comment

LGTM. This just moves things around and/or documents things that already don't work. I left a few notes but don't feel the need to change anything; r=me if you want to land it as-is.

@thomcc
Member

thomcc commented Aug 2, 2022

Okay, I'm going to assume that if @RalfJung wanted to change anything he would have. Feel free to submit a follow-up if you just didn't notice.

@bors r+ rollup

@bors
Contributor

bors commented Aug 2, 2022

📌 Commit c4aca2b has been approved by thomcc

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 2, 2022
@RalfJung
Member Author

RalfJung commented Aug 2, 2022

@bors r-

I simply haven't gotten around to reading your comments yet.^^

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 2, 2022
@thomcc
Member

thomcc commented Aug 3, 2022

Ah! My bad, I'll wait longer next time. Looks good to me.

@bors r+ rollup

@bors
Contributor

bors commented Aug 3, 2022

📌 Commit da3e11f has been approved by thomcc

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 3, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 3, 2022
Rollup of 5 pull requests

Successful merges:

 - rust-lang#99371 (Remove synchronization from Windows `hashmap_random_keys`)
 - rust-lang#99614 (do not claim that transmute is like memcpy)
 - rust-lang#99738 (rustdoc: avoid inlining modules with duplicate names)
 - rust-lang#99800 (Fix futex module imports on wasm+atomics)
 - rust-lang#100079 (Replace `* -> vec` with `-> vec` in docs)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit cb9932e into rust-lang:master Aug 3, 2022
@rustbot rustbot added this to the 1.64.0 milestone Aug 3, 2022
@RalfJung RalfJung deleted the transmute-is-not-memcpy branch August 3, 2022 22:01