Reduce `Box::default` stack copies in debug mode #136089
The `Box::new(T::default())` implementation of `Box::default` had only two stack copies in debug mode, whereas the current version has four. By avoiding the creation of any `MaybeUninit<T>` values and writing `T` directly through the `Box` pointer, stack usage in debug mode stays the same as in the old version.
// extra stack copies of `T` in debug mode.
//
// See https://github.com/rust-lang/rust/issues/136043 for more context.
ptr::write(&raw mut *x as *mut T, T::default());
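For context, the diff line above can be placed in a standalone function. This is only a sketch of the technique the description names, not the actual standard-library implementation: `box_default_in_place` is a hypothetical name, and the code assumes a toolchain where `Box::new_uninit` and `&raw mut` are available (Rust 1.82 or later).

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical stand-in for `Box::default`, written outside the standard library.
pub fn box_default_in_place<T: Default>() -> Box<T> {
    // Allocate uninitialized heap storage for one `T`.
    let mut x: Box<MaybeUninit<T>> = Box::new_uninit();
    unsafe {
        // Write the default value straight through the box pointer, without
        // first wrapping it in a `MaybeUninit<T>` on the stack.
        // If `T::default()` panics, `x` is freed without dropping any `T`.
        ptr::write(&raw mut *x as *mut T, T::default());
        // SAFETY: the allocation was fully initialized by the write above.
        x.assume_init()
    }
}

fn main() {
    let boxed: Box<[u64; 32]> = box_default_in_place();
    println!("first element: {}", boxed[0]);
}
```

Whether this really saves stack copies depends on how the compiler lowers the call in debug mode, which is what the review discussion is about.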
Doesn't the stack frame of `ptr::write` also involve a stack allocation, since it takes the argument by value? A more robust solution would be to use `copy_nonoverlapping` from the local in `Box::default` directly to the heap.
> Doesn't the stack frame of `ptr::write` also involve a stack allocation since it takes the argument by value?

From what I've observed, `ptr::write` doesn't actually create a new alloca, and essentially just calls `memcpy` directly on the argument (assuming the argument passed is big enough and not something like a `usize`). Godbolt. Let me check how we lower the `ptr::write` intrinsic to make sure of this, though.
> A more robust solution would be to use copy_nonoverlapping from the local in Box::default directly to the heap.

Unfortunately, we also need to `mem::forget`[^1] the actual value after copying it, or create a `ManuallyDrop<T>`. The `mem::forget` copy doesn't appear to be eliminated in debug mode, and the `ManuallyDrop::<T>::default()` call copies `T::default` to an additional alloca on its stack frame. I created two additional functions in the Godbolt link to showcase this codegen behavior.

[^1]: Technically we can skip this if `needs_drop::<T>()` is false.
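The two `copy_nonoverlapping` variants discussed above might look roughly like the following. This is a hedged reconstruction, not the code from the Godbolt link; the function names `box_default_forget` and `box_default_manually_drop` are hypothetical, and `Box::new_uninit` assumes Rust 1.82 or later.

```rust
use std::mem::{self, ManuallyDrop, MaybeUninit};
use std::ptr;

/// Variant 1 (hypothetical): copy from a local, then `mem::forget` the local.
pub fn box_default_forget<T: Default>() -> Box<T> {
    let mut x: Box<MaybeUninit<T>> = Box::new_uninit();
    let value = T::default();
    unsafe {
        // Bitwise-copy the local into the heap allocation.
        ptr::copy_nonoverlapping(&value as *const T, x.as_mut_ptr(), 1);
        // The heap copy now owns the value; forgetting the local prevents a
        // double drop. Passing `value` by value here is the move the comment
        // above says is not eliminated in debug mode.
        mem::forget(value);
        x.assume_init()
    }
}

/// Variant 2 (hypothetical): build the local inside `ManuallyDrop` so it is never dropped.
pub fn box_default_manually_drop<T: Default>() -> Box<T> {
    let mut x: Box<MaybeUninit<T>> = Box::new_uninit();
    // `ManuallyDrop<T>` implements `Default` when `T` does.
    let value = ManuallyDrop::<T>::default();
    unsafe {
        ptr::copy_nonoverlapping(&*value as *const T, x.as_mut_ptr(), 1);
        x.assume_init()
    }
}

fn main() {
    let a: Box<String> = box_default_forget();
    let b: Box<Vec<u8>> = box_default_manually_drop();
    println!("{} {}", a.len(), b.len());
}
```

Both variants are functionally equivalent to `Box::new(T::default())`; the difference under discussion is only how many stack slots each produces in unoptimized builds.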
I advise against this, and I'm glad the PR doesn't do it. We currently inline
I was looking into where we do the optimization of not creating an alloca. Given the following code:

const NUM: usize = 1000000;
// Thing is not `Copy`, but has no drop glue
pub struct Thing([u8; NUM]);
#[no_mangle]
pub fn src(move_thing: fn(Thing), capture_arg: fn(&Thing), maybe_use_y: fn(), y: Thing) {
    capture_arg(&y);
    move_thing(y);
    maybe_use_y();
}

There's no alloca created for `y` in debug mode:

define void @src(ptr %move_thing, ptr %capture_arg, ptr %maybe_use_y, ptr align 1 %y) unnamed_addr {
start:
  %maybe_use_y.dbg.spill = alloca [8 x i8], align 8
  %capture_arg.dbg.spill = alloca [8 x i8], align 8
  %move_thing.dbg.spill = alloca [8 x i8], align 8
  store ptr %move_thing, ptr %move_thing.dbg.spill, align 8
  store ptr %capture_arg, ptr %capture_arg.dbg.spill, align 8
  store ptr %maybe_use_y, ptr %maybe_use_y.dbg.spill, align 8
  call void %capture_arg(ptr align 1 %y)
  call void %move_thing(ptr align 1 %y)
  call void %maybe_use_y()
  ret void
}

However, there is an alloca in release mode:

define void @src(ptr nocapture noundef nonnull readonly %move_thing, ptr nocapture noundef nonnull readonly %capture_arg, ptr nocapture noundef nonnull readonly %maybe_use_y, ptr noalias nocapture noundef readonly align 1 dereferenceable(1000000) %y) unnamed_addr {
start:
  %0 = alloca [1000000 x i8], align 1
  tail call void %capture_arg(ptr noalias noundef nonnull readonly align 1 dereferenceable(1000000) %y)
  call void @llvm.lifetime.start.p0(i64 1000000, ptr nonnull %0)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 1 dereferenceable(1000000) %0, ptr noundef nonnull align 1 dereferenceable(1000000) %y, i64 1000000, i1 false)
  call void %move_thing(ptr noalias nocapture noundef nonnull align 1 dereferenceable(1000000) %0)
  call void @llvm.lifetime.end.p0(i64 1000000, ptr nonnull %0)
  tail call void %maybe_use_y()
  ret void
}

I haven't been keeping up with Rust's opsem; however, the last time I checked, the legality of this was unresolved, as per rust-lang/unsafe-code-guidelines#416 and rust-lang/unsafe-code-guidelines#188.

cc @saethlin, since you're already in this thread and I think this is in your area of expertise.
I don't follow your reasoning. You're looking at LLVM IR and trying to reason about MIR semantics, and that is not going to go well. The relevant MIR optimization that changes how arguments are passed is GVN, which turns a `Move` operand into a `Copy` operand: https://godbolt.org/z/sEWz399Mj

I also don't think this discussion belongs on this PR; if you want to ask opsem questions, you should go to the Zulip.
@rustbot label A-box
Another option would be to mark `Box::write` as `#[inline(always)]` and change its implementation to avoid calling `MaybeUninit::write` (which creates a `MaybeUninit<T>` on the stack), using `ptr::write` instead.

Fixes: #136043
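As a rough illustration of that alternative, the shape of such a `Box::write` could be sketched as a free function. The name `write_boxed` is hypothetical and this is not the standard library's code; it only shows the idea of combining `#[inline(always)]` with `ptr::write` in place of `MaybeUninit::write`, assuming Rust 1.82 or later for `Box::new_uninit`.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical free-function analogue of `Box::write`.
#[inline(always)]
pub fn write_boxed<T>(mut boxed: Box<MaybeUninit<T>>, value: T) -> Box<T> {
    unsafe {
        // Write `value` directly into the allocation instead of going through
        // `MaybeUninit::write`, which wraps it in a `MaybeUninit<T>` first.
        ptr::write(boxed.as_mut_ptr(), value);
        // SAFETY: the allocation was just initialized.
        boxed.assume_init()
    }
}

fn main() {
    let b = write_boxed(Box::new_uninit(), String::from("hello"));
    println!("{b}");
}
```

Under this design, `Box::default` could stay expressed in terms of the write helper, at the cost of forcing inlining in every caller.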