Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce Box::default stack copies in debug mode #136089

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jwong101
Copy link
Contributor

The Box::new(T::default()) implementation of Box::default only
had two stack copies in debug mode, compared to the current version,
which has four. By avoiding creating any MaybeUninit<T>'s and just writing
T directly to the Box pointer, the stack usage in debug mode remains
the same as the old version.

Another option would be to mark Box::write as #[inline(always)],
and change it's implementation to to avoid calling MaybeUninit::write
(which creates a MaybeUninit<T> on the stack) and to use ptr::write instead.

Fixes: #136043

The `Box::new(T::default())` implementation of `Box::default` only
had two stack copies in debug mode, compared to the current version,
which has four. By avoiding creating any `MaybeUninit<T>`'s and just writing
`T` directly to the `Box` pointer, the stack usage in debug mode remains
the same as the old version.
@rustbot
Copy link
Collaborator

rustbot commented Jan 26, 2025

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 26, 2025
// extra stack copies of `T` in debug mode.
//
// See https://github.com/rust-lang/rust/issues/136043 for more context.
ptr::write(&raw mut *x as *mut T, T::default());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the stack frame of ptr::write also involve a stack allocation since it takes the argument by value? A more robust solution would be to use copy_nonoverlapping from the local in Box::default directly to the heap.

Copy link
Contributor Author

@jwong101 jwong101 Jan 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the stack frame of ptr::write also involve a stack allocation since it takes the argument by value?

From what I've observed, ptr::write doesn't actually create a new alloca, and essentially just calls memcpy directly on the argument(assuming the argument is passed is big enough and not something like a usize). Godbolt. Let me check how we lower the ptr::write intrinsic to make sure of this though.

A more robust solution would be to use copy_nonoverlapping from the local in Box::default directly to the heap.

Unfortunately, we need to also mem::forget1 the actual value after copying it, or create a ManuallyDrop<T>. The mem::forget copy doesn't appear to be eliminated in debug mode, and ManuallyDrop::<T>::default() call copies T::default to an additional alloca on its stack frame. I created two additional functions in the Godbolt link to showcase this codegen behavior.

Footnotes

  1. Technically we can skip this if needs_drop::<T>() is false.

@saethlin
Copy link
Member

Another option would be to mark Box::write as #[inline(always)]

I advise against this, and I'm glad the PR doesn't do it. We currently inline inline(always) calls even in unoptimized builds, but the attribute is explicitly a hint that can be completely ignored. This inlining in debug builds can have runaway compile time cost.

@jwong101
Copy link
Contributor Author

jwong101 commented Jan 28, 2025

I was looking into where we do the optimization of not creating an alloca in ptr::write's stack frame in debug mode, and I found something weird.

Given the following code:

const NUM: usize = 1000000;

// Thing is not `Copy`, but has no drop glue
pub struct Thing([u8; NUM]);

#[no_mangle]
pub fn src(move_thing: fn(Thing), capture_arg: fn(&Thing), maybe_use_y: fn(), y: Thing) {
    capture_arg(&y);
    move_thing(y);
    maybe_use_y();
}

There's no alloca created in debug mode:

define void @src(ptr %move_thing, ptr %capture_arg, ptr %maybe_use_y, ptr align 1 %y) unnamed_addr {
start:
  %maybe_use_y.dbg.spill = alloca [8 x i8], align 8
  %capture_arg.dbg.spill = alloca [8 x i8], align 8
  %move_thing.dbg.spill = alloca [8 x i8], align 8
  store ptr %move_thing, ptr %move_thing.dbg.spill, align 8
  store ptr %capture_arg, ptr %capture_arg.dbg.spill, align 8
  store ptr %maybe_use_y, ptr %maybe_use_y.dbg.spill, align 8
  call void %capture_arg(ptr align 1 %y)
  call void %move_thing(ptr align 1 %y)
  call void %maybe_use_y()
  ret void
}

However, there is an alloca in Release mode:

define void @src(ptr nocapture noundef nonnull readonly %move_thing, ptr nocapture noundef nonnull readonly %capture_arg, ptr nocapture noundef nonnull readonly %maybe_use_y, ptr noalias nocapture noundef readonly align 1 dereferenceable(1000000) %y) unnamed_addr {
start:
  %0 = alloca [1000000 x i8], align 1
  tail call void %capture_arg(ptr noalias noundef nonnull readonly align 1 dereferenceable(1000000) %y)
  call void @llvm.lifetime.start.p0(i64 1000000, ptr nonnull %0)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 1 dereferenceable(1000000) %0, ptr noundef nonnull align 1 dereferenceable(1000000) %y, i64 1000000, i1 false)
  call void %move_thing(ptr noalias nocapture noundef nonnull align 1 dereferenceable(1000000) %0)
  call void @llvm.lifetime.end.p0(i64 1000000, ptr nonnull %0)
  tail call void %maybe_use_y()
  ret void
}

I haven't been keeping up with Rust's opsem, however, the last time I checked, the legality1 of this was unresolved as per rust-lang/unsafe-code-guidelines#416 and rust-lang/unsafe-code-guidelines#188.

cc @saethlin since you're already in this thread and I think this is in your area of expertise

Footnotes

  1. Personally, I think that moving out of a local/argument should invalidate all previous borrows, unless there's a bunch of unsafe code that this breaks.

@saethlin
Copy link
Member

I don't follow your reasoning. You're looking at LLVM IR and trying to reason about MIR semantics, and that is not going to go well. The relevant MIR optimization that changes how arguments are passed is GVN, which turns a Move operand into a Copy operand: https://godbolt.org/z/sEWz399Mj

I also don't think this discussion belongs on this PR, if you want to ask opsem questions you should go to the Zulip.

@safinaskar
Copy link
Contributor

@rustbot label A-box

@rustbot rustbot added the A-box Area: Our favorite opsem complication label Feb 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-box Area: Our favorite opsem complication S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash writing PNG from a C++ background thread on macOS Rust 1.84.0
5 participants