Dest prop: Support removing writes when this unblocks optimizations #105813
Conversation
Some changes occurred to MIR optimizations. cc @rust-lang/wg-mir-opt
Do we have any tests for this? Could you comment on the tradeoffs relative to an implementation along the lines proposed in #105190?
I think so. It seemed simple enough to include for now, so I've kept it in. I can also add
I'm not so sure that these are really alternatives. That PR is approximately equivalent to deciding that we don't want to have a distinction between copy/move in MIR. That's a decision we could possibly make, but it has broader implications than just this optimization/PR. In the meantime, this seems like a reasonable option for unblocking opts.
I was thinking about changing move to copy where there is de facto no difference between the two, which is more or less what that pull request is doing. In call terminators, when an argument is passed indirectly, changing move to copy introduces an additional copy. Why would the net result be an improvement in this case?
I see commentary about Call terminators, so I'm just sharing some of what I've learned through experimentation. Consider this perf run: #105190 (comment), which only converts Move to Copy when the type is a pointer or smaller than a pointer. This is a common-sense optimization I would suggest when writing Rust, so there should be no way that the regressions here are from making the final codegen worse. But as a MIR transform, it produces a compile-time regression because it breaks this optimization in Inline (compiler/rustc_mir_transform/src/inline.rs, lines 683 to 689 at 35a99ee).
Just blindly converting all `Move`s to `Copy`s …

So I don't know what the effect of this additional copy is, but I'm extremely wary of our ability to assess its importance at the moment, because of how Inline and DestinationPropagation are tangled up without this PR.
Two things to consider here:
```rust
match op {
    Operand::Move(place) => {
        if self.should_remove_write_at(place.local, location) {
            *op = Operand::Copy(*place);
```
This is only valid if `place` is sized, as unsized values can't be copied. I don't think `should_remove_write_at` can return true for unsized places (maybe there is a case where an unsized place can be re-initialized after a move, though?), but it may be a good idea to check anyway.
Hm, this is a theoretical problem in future versions of dest prop I think. God, I really wish we could just treat unsized values as completely unsupported. I'll add a check.
Thanks for bringing this up. Indeed, with inlining the difference extends beyond the case where the arguments are passed indirectly. With recent changes that disabled top-down inlining, I think you could avoid this negative interaction by running the pass after inlining (or leaving …
Wouldn't it be unnecessary at that point? If there is no difference in behavior,
I don't think this is likely to be what happens. I expect that this particular difference in behavior will go away, but probably others will remain. In other words, this change is useful if we end up in a world where there is a reason to have a distinction between `Move` and `Copy`.

In any case, this is speculation about the future. I imagine that today the vast majority of the code this is transforming looks like this:

```
_5 = _3;
foo(move _5);
// into
foo(_3);
```

In the case of by-ref parameters, I'd expect this to be neutral in terms of codegen. In the case of by-val parameters, it does lead to better codegen though, modulo LLVM optimizations (I'd hope LLVM figures this out but wouldn't depend on it). To me that means that we should consider the second version the canonical one and thus do the transform.
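For readers less familiar with MIR, the rewrite above corresponds roughly to this surface-Rust analogue (function and variable names are hypothetical, for illustration only): a temporary that exists solely to be passed to a call is elided, and the two versions are observably identical.

```rust
fn foo(x: [u64; 4]) -> u64 {
    x[0]
}

// Before dest prop: roughly `_5 = _3; foo(move _5)`
fn with_temp(a: [u64; 4]) -> u64 {
    let tmp = a;
    foo(tmp)
}

// After dest prop: roughly `foo(_3)`
fn without_temp(a: [u64; 4]) -> u64 {
    foo(a)
}

fn main() {
    let a = [7u64, 1, 2, 3];
    // Same observable behavior; only the generated code differs.
    assert_eq!(with_temp(a), without_temp(a));
    println!("{}", with_temp(a));
}
```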
It is unavoidable for unsized arguments (as copying those is impossible) and a perf improvement for sized arguments if there is no prior copy. Honestly, it would be nice if either most of those prior copies could be avoided during MIR building, or copies were turned into moves after dest prop where possible.
A perf improvement for compile time or the generated code? I'm looking for examples of this effect I can inspect.
Both compile time and generated code. It would avoid an unnecessary copy. @pcwalton is currently working on trying to make LLVM elide memcpy calls where possible, but that comes at a compile-time cost AFAIK, and I believe it won't eliminate all eliminable copies, as LLVM knows less about the semantics of Rust than rustc itself does. And I would expect most memcpy calls only to be elided when compiling in release mode (or when using Cranelift), so the extra copies have an even bigger runtime impact in those cases.
Do you have any examples of the compile-time impact that are not the situation I explained above? Runtime impacts would also be concerning, but I'm not sure we have any benchmarks or examples to even demonstrate the issue. Do we? We have the additional problem with this PR that we are discussing a MIR opt which is not accessible to rustc-perf. So without even the ability to collect the usual benchmark data, I'm not really sure there is anything to base a decision on.
Currently all call arguments are preceded by a copy (except for unsized arguments), so changing a move to a copy operand, when that allows removing this copy, has exactly zero impact. In fact I think it will codegen to effectively the same LLVM IR. Maybe it would even make sense to only support move operands in function calls? A copy operand is implemented as a copy to a temporary followed by handling this temporary as a move argument anyway. Changing arbitrary move operands to copies when there is no prior copy for dest prop to remove, however, does have an effect.
Ahhhhh, that makes sense, thanks. So it would make the most sense to have a separate pass which converts never-used-again copies to moves before codegen.
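A toy sketch of what such a copy-to-move pass could look like (the mini-IR and all names here are hypothetical, not rustc's actual data structures): walk the operand list backwards, tracking which locals are still used later, and turn any `Copy` of a local that is never used again into a `Move`.

```rust
use std::collections::HashSet;

// Hypothetical mini-IR: each operand reads a numbered local.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operand {
    Copy(usize),
    Move(usize),
}

// Convert never-used-again copies into moves by scanning backwards.
fn copies_to_moves(ops: &mut [Operand]) {
    let mut used_later: HashSet<usize> = HashSet::new();
    for op in ops.iter_mut().rev() {
        let local = match *op {
            Operand::Copy(l) | Operand::Move(l) => l,
        };
        // A Copy whose local has no later use can safely become a Move.
        if matches!(op, Operand::Copy(_)) && !used_later.contains(&local) {
            *op = Operand::Move(local);
        }
        used_later.insert(local);
    }
}

fn main() {
    let mut ops = vec![Operand::Copy(1), Operand::Copy(1), Operand::Copy(2)];
    copies_to_moves(&mut ops);
    // Only the last read of each local becomes a Move.
    assert_eq!(ops, vec![Operand::Copy(1), Operand::Move(1), Operand::Move(2)]);
    println!("{:?}", ops);
}
```

This is purely illustrative: a real pass would have to reason about basic blocks, borrows, and re-initialization, not just a straight-line operand list.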
Moving copies directly into the call terminators sounds reasonable as a normal form. I don't quite see why it would be an improvement for code generation, though. Can you give an example?

In that case we should probably limit the number of move operands changed into copies per propagation to at most one. Consider for example:

```rust
pub fn f<T: Copy>(a: T) {
    g(a, a)
}

#[inline(never)]
pub fn g<T: Copy>(a: T, b: T) {}
```

With the changes from this pull request, before the call to … In contrast, the existing implementation ends up with only one copy of …
☔ The latest upstream changes (presumably #106103) made this pull request unmergeable. Please resolve the merge conflicts.
(I've said this before but just to reiterate: I consider this at least as much a problem with our semantics for unsized values as anything else)
I don't think this is the right strategy. I can expand on this more on Zulip if you like, but (and this was not clear to me until recently either) "implicit copy" semantics are the better semantics for function calls in a high level IR like MIR.
Yeah, I can add that, although this is a bit of an oversimplified heuristic. Doing anything smarter seems really difficult though |
Do we have any evidence for worse codegen to back up limiting the pass itself?
If an argument is passed indirectly, then for Copy and Constant operands a temporary copy is made before the call. In contrast, for Move operands the copy is omitted. The implementation is in compiler/rustc_codegen_ssa/src/mir/block.rs, lines 1024 to 1036 at f6cc345.
Without any cost-benefit analysis, the cost of the transformation might exceed the benefit, as in the example given in #105813 (comment):

```rust
type T = [u64; 32];

#[no_mangle]
pub fn f(a: T, b: fn(_: T, _: T)) {
    b(a, a)
}
```

`rustc --crate-type=lib -O a.rs -Zmir-enable-passes=+DestinationPropagation -Zfuel=a=2 -Zunpretty=mir`:

```
fn f(_1: [u64; 32], _2: fn([u64; 32], [u64; 32])) -> () {
    debug a => _1;                   // in scope 0 at a.rs:3:10: 3:11
    debug b => _2;                   // in scope 0 at a.rs:3:16: 3:17
    let mut _0: ();                  // return place in scope 0 at a.rs:3:35: 3:35
    let mut _3: [u64; 32];           // in scope 0 at a.rs:4:7: 4:8

    bb0: {
        StorageLive(_3);             // scope 0 at a.rs:4:7: 4:8
        _3 = _1;                     // scope 0 at a.rs:4:7: 4:8
        _0 = move _2(move _3, move _1) -> bb1; // scope 0 at a.rs:4:5: 4:12
    }

    bb1: {
        StorageDead(_3);             // scope 0 at a.rs:4:11: 4:12
        return;                      // scope 0 at a.rs:5:2: 5:2
    }
}
```

`rustc --crate-type=lib -O a.rs -Zmir-enable-passes=+DestinationPropagation -Zfuel=a=2 --emit llvm-ir`:

```
define void @f(ptr noalias nocapture noundef dereferenceable(256) %a, ptr nocapture noundef nonnull readonly %b) unnamed_addr #0 {
start:
  %_3 = alloca [32 x i64], align 8
  call void @llvm.lifetime.start.p0(i64 256, ptr nonnull %_3)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(256) %_3, ptr noundef nonnull align 8 dereferenceable(256) %a, i64 256, i1 false)
  call void %b(ptr noalias nocapture noundef nonnull dereferenceable(256) %_3, ptr noalias nocapture noundef nonnull dereferenceable(256) %a)
  call void @llvm.lifetime.end.p0(i64 256, ptr nonnull %_3)
  ret void
}
```

`rustc --crate-type=lib -O a.rs -Zmir-enable-passes=+DestinationPropagation -Zfuel=a=3 -Zunpretty=mir`:

```
fn f(_1: [u64; 32], _2: fn([u64; 32], [u64; 32])) -> () {
    debug a => _1;                   // in scope 0 at a.rs:3:10: 3:11
    debug b => _2;                   // in scope 0 at a.rs:3:16: 3:17
    let mut _0: ();                  // return place in scope 0 at a.rs:3:35: 3:35

    bb0: {
        _0 = move _2(_1, _1) -> bb1; // scope 0 at a.rs:4:5: 4:12
    }

    bb1: {
        return;                      // scope 0 at a.rs:5:2: 5:2
    }
}
```

`rustc --crate-type=lib -O a.rs -Zmir-enable-passes=+DestinationPropagation -Zfuel=a=3 --emit llvm-ir`:

```
define void @f(ptr noalias nocapture noundef readonly dereferenceable(256) %a, ptr nocapture noundef nonnull readonly %b) unnamed_addr #0 {
start:
  %0 = alloca [32 x i64], align 8
  %1 = alloca [32 x i64], align 8
  call void @llvm.lifetime.start.p0(i64 256, ptr nonnull %1)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(256) %1, ptr noundef nonnull align 8 dereferenceable(256) %a, i64 256, i1 false)
  call void @llvm.lifetime.start.p0(i64 256, ptr nonnull %0)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(256) %0, ptr noundef nonnull align 8 dereferenceable(256) %a, i64 256, i1 false)
  call void %b(ptr noalias nocapture noundef nonnull dereferenceable(256) %1, ptr noalias nocapture noundef nonnull dereferenceable(256) %0)
  call void @llvm.lifetime.end.p0(i64 256, ptr nonnull %1)
  call void @llvm.lifetime.end.p0(i64 256, ptr nonnull %0)
  ret void
}
```
Ah okay, that makes sense. We regress back to current output, so here DestinationPropagation just fails to improve final codegen 👍
Add a regression test for argument copies with DestinationPropagation

This example, as a codegen test: rust-lang#105813 (comment)

r? `@tmiasko`
```diff
-for (src, candidates) in candidates.c.iter() {
-    if merged_locals.contains(*src) {
+for (src, candidates) in candidates.c.drain() {
+    if merged_locals.contains(src) {
         continue;
     }
     let Some(dest) =
```
```diff
-let Some(dest) =
+let Some((dest, writes_to_remove)) =
```
@JakobDegen Ping from triage: can you post your status on this PR? It has sat idle for a few months.
I'm investing my time in Lir right now, so I'm not going to push this further. I'll keep the branch around; if someone else wants to pick it up, they're welcome to.
The logic this uses is quite simple: When we collect information about writes, we mark some of these writes as "removable." In cases where a removable write would normally invalidate an optimization candidate, we then instead keep the candidate around and mark that particular write as being in need of removing (if we do decide to use the candidate). When we finally merge candidates, we remove all those writes.
Removable writes are `Operand::Move` (removable by turning into `Operand::Copy`) and `Deinit` statements (removable by deleting the statement).

This increases the number of times that the opt fires on a `-Zbuild-std` build of regex by ~10%.

r? @tmiasko
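The bookkeeping described above can be sketched with a toy model (all types and names here are hypothetical, not the actual dest prop implementation): once a candidate is chosen for merging, each write that was marked removable for its local is either rewritten (a `Move` becomes a `Copy`) or deleted outright (a `Deinit`).

```rust
// Hypothetical mini-MIR: writes involving a numbered local.
#[derive(Debug, Clone, PartialEq)]
enum Write {
    Move(usize),   // removable by turning into Copy
    Copy(usize),
    Deinit(usize), // removable by deleting the statement
}

// Apply the removals for locals that actually got merged.
fn apply_removals(body: Vec<Write>, merged: &[usize]) -> Vec<Write> {
    body.into_iter()
        .filter_map(|w| match w {
            // A Move of a merged local becomes a Copy instead of
            // invalidating the merge candidate.
            Write::Move(l) if merged.contains(&l) => Some(Write::Copy(l)),
            // A Deinit of a merged local is simply deleted.
            Write::Deinit(l) if merged.contains(&l) => None,
            other => Some(other),
        })
        .collect()
}

fn main() {
    let body = vec![Write::Move(1), Write::Deinit(1), Write::Move(2)];
    let out = apply_removals(body, &[1]);
    // Local 1 was merged: its Move became a Copy and its Deinit is gone;
    // local 2 was not merged and is untouched.
    assert_eq!(out, vec![Write::Copy(1), Write::Move(2)]);
    println!("{:?}", out);
}
```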