
Dest prop: Support removing writes when this unblocks optimizations #105813

Closed
wants to merge 2 commits into from

Conversation

JakobDegen
Contributor

The logic this uses is quite simple: When we collect information about writes, we mark some of these writes as "removable." In cases where a removable write would normally invalidate an optimization candidate, we then instead keep the candidate around and mark that particular write as being in need of removing (if we do decide to use the candidate). When we finally merge candidates, we remove all those writes.

Removable writes are Operand::Move (removable by turning into Operand::Copy) and Deinit statements (removable by deleting the statement).
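A rough sketch of the Move case (hypothetical MIR; the locals and calls are invented for illustration):

```
// Merging _5 into _3 would turn `move _5` into a move of _3 that
// conflicts with the later read of _3, so the candidate would
// normally be dropped:
_5 = _3;
foo(move _5);
bar(_3);

// With this PR the move is marked removable, the merge goes through,
// and the Move is downgraded to a Copy:
foo(_3);
bar(_3);
```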

This increases the number of times that the opt fires on a -Zbuild-std build of regex by ~10%.

r? @tmiasko

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 17, 2022
@rustbot
Collaborator

rustbot commented Dec 17, 2022

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@tmiasko
Contributor

tmiasko commented Dec 17, 2022

Do we have any tests for Deinit statements? Does it require implementing support for projections first?

Could you comment on the tradeoffs relative to an implementation along the lines proposed in #105190?

@JakobDegen
Contributor Author

Do we have any tests for Deinit statements? Does it require implementing support for projections first?

I think so. It seemed simple enough to include for now, so I've kept it in. I can also add Deinit statements to custom MIR pretty soon, which will at least allow us to write a test.

Could you comment on the tradeoffs relative to an implementation along the lines proposed in #105190?

I'm not so sure that these are really alternatives. That PR is approximately equivalent to deciding that we don't want to have a distinction between copy/move in MIR. That's a decision we could possibly make, but it has broader implications than just this opt/PR. In the meantime, this seems like a reasonable option for unblocking opts.

@tmiasko
Contributor

tmiasko commented Dec 18, 2022

Could you comment on the tradeoffs relative to an implementation along the lines proposed in #105190?

I'm not so sure that these are really alternatives. That PR is approximately equivalent to deciding that we don't want to have a distinction between copy/move in MIR. That's a decision we could possibly make, but it has broader implications than just this opt/PR. In the meantime, this seems like a reasonable option for unblocking opts.

I was thinking about changing move to copy where there is de facto no difference between the two, which is more or less what that pull request is doing. In call terminators, when an argument is passed indirectly, changing move to copy introduces an additional copy. Why would the net result be an improvement in this case?

@saethlin
Member

I see commentary about Call terminators, so I'm just sharing some of what I've learned through experimentation.

Consider this perf run: #105190 (comment), which only converts Move to Copy when the type is a pointer or smaller than a pointer. This is a common-sense optimization I would suggest when writing Rust, so there should be no way that the regressions here come from making the final codegen worse. But as a MIR transform, it produces a compile-time regression because it breaks this optimization in Inline:

// Reuse the operand if it is a moved temporary.
if let Operand::Move(place) = &arg
    && let Some(local) = place.as_local()
    && caller_body.local_kind(local) == LocalKind::Temp
{
    return local;
}
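A hedged sketch of what that breakage means in practice (hypothetical MIR; locals invented):

```
// Before inlining, with a moved temporary:
_2 = _1;
_0 = callee(move _2) -> bb1;   // Inline can reuse _2 as the callee's argument local

// If a pass first turns the move into a copy:
_2 = _1;
_0 = callee(_2) -> bb1;        // Inline can no longer reuse _2 and has to
                               // materialize a fresh local plus an extra copy
```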

Just blindly converting all Operand::Move to Operand::Copy is this perf run: #105190 (comment), and while its regressions are worse than those of the pointer-sized run, the magnitude of the regressions is only slightly larger.

So I don't know what the effect of this additional copy is, but I'm extremely wary of our ability to assess its importance at the moment, because of how Inline and DestinationPropagation are tangled up without this PR.

@JakobDegen
Contributor Author

Why would the net result be an improvement in this case?

Two things to consider here:

  1. Not doing copies on Move operands to calls is likely wrong anyway and we should discuss fixing it at some point. If/when we fix that, this will definitely be desirable.
  2. It may not remain the case forever that non-call moves behave like copies. There is still open discussion about this, but I would expect that to become false at some point.

match op {
    Operand::Move(place) => {
        if self.should_remove_write_at(place.local, location) {
            *op = Operand::Copy(*place);
Member

This is only valid if place is sized, as unsized values can't be copied. I don't think should_remove_write_at can return true for unsized places (though maybe there is a case where an unsized place can be re-initialized after a move?), but it may be a good idea to check anyway.

Contributor Author

Hm, this is a theoretical problem in future versions of dest prop I think. God, I really wish we could just treat unsized values as completely unsupported. I'll add a check.

@tmiasko
Contributor

tmiasko commented Dec 22, 2022

Consider this perf run: #105190 (comment) which only converts Move to Copy when the type is a pointer or smaller than a pointer. ... But as a MIR transform, it produces a compile time regression because it breaks this optimization in Inline:

Thanks for bringing this up. Indeed, with inlining, the difference extends beyond the case where arguments are passed indirectly. With the recent changes that disabled top-down inlining, I think you could avoid this negative interaction by running the pass after inlining (or by leaving Move operands in call terminators alone).

Not doing copies on Move operands to calls is likely wrong anyway and we should discuss fixing it at some point. If/when we fix that, this will definitely be desirable.

Wouldn't it be unnecessary at that point? If there is no difference in behavior, Move can be replaced with Copy without cost and the existing implementation works as is. On the other hand, in cases where Move elides a copy, each Move changed into Copy has a cost similar to the benefit of a propagation.

@JakobDegen
Contributor Author

If there is no difference in behavior

I don't think this is likely to be what happens. I expect that this particular difference in behavior will go away, but probably others will remain. In other words, this change is useful if we end up in a world where there is a reason to have a distinction between Move and Copy, but eliding copies in calls is not a legal use of that distinction.

In any case, this is speculation about the future. I imagine that today the vast majority of the code this is transforming looks like this:

_5 = _3;
foo(move _5);
// into
foo(_3);

In the case of by-ref parameters, I'd expect this to be neutral in terms of codegen. In the case of by-val parameters, it does lead to better codegen though, modulo LLVM optimizations (I'd hope LLVM figures this out but wouldn't depend on it). To me that means that we should consider the second version the canonical one and thus do the transform.

@bjorn3
Member

bjorn3 commented Dec 22, 2022

Not doing copies on Move operands to calls is likely wrong anyway and we should discuss fixing it at some point. If/when we fix that, this will definitely be desirable.

It is unavoidable for unsized arguments (as copying those is impossible) and a perf improvement for sized arguments when there is no prior copy. Honestly, it would be nice if either most of those prior copies could be avoided during MIR building, or Copy operands were turned into Move after dest prop where possible.

@saethlin
Member

A perf improvement for compile time or for the generated code? I'm looking for examples of this effect that I can inspect.

@bjorn3
Member

bjorn3 commented Dec 22, 2022

Both compile time and generated code; it would avoid an unnecessary copy. @pcwalton is currently working on trying to make LLVM elide memcpy calls where possible, but that comes at a compile-time cost AFAIK, and I believe it won't eliminate all eliminatable copies, as LLVM knows less about the semantics of Rust than rustc itself does. And I would expect most memcpy calls to be elided only when compiling in release mode (or when using Cranelift), so the extra copies have an even bigger runtime impact in the other cases.

@saethlin
Member

Do you have any examples of the compile time impact that are not the situation I explained above?

Runtime impacts would also be concerning, but I'm not sure we have any benchmarks or examples to even demonstrate the issue. Do we?

We have the additional problem with this PR that we are discussing a MIR opt which is not accessible by rustc-perf. So without even the ability to collect the usual benchmark data I'm not really sure there is anything to base a decision on.

@bjorn3
Member

bjorn3 commented Dec 22, 2022

Currently all call arguments are preceded by a copy (except for unsized arguments), so changing a move to a copy operand, if that allows removing this copy, has exactly zero impact. In fact, I think it will codegen to effectively the same LLVM IR. Maybe it would even make sense to only support move operands in function calls? A copy operand is implemented as a copy to a temporary followed by handling this temporary as a move argument anyway. Changing arbitrary move operands to copies when there is no prior copy for dest prop to remove, however, does have an effect.
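Roughly, the equivalence looks like this (hypothetical MIR; `_tmp` is an invented local):

```
// `f(_1)` (a Copy operand, for an indirectly-passed argument) is
// codegenned as if it were:
_tmp = _1;
f(move _tmp);
// so rewriting `_2 = _1; f(move _2)` into `f(_1)` should produce the
// same machine code: the copy just moves from MIR into codegen.
```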

@saethlin
Member

Ahhhhh that makes sense, thanks.

So it would make the most sense to have a separate pass which converts never-used-again copies to moves before codegen.
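A toy sketch of that idea (not rustc code; all types and names here are invented): walk a straight-line block backwards, tracking which locals are still read later, and downgrade a Copy operand to Move when its local is dead afterwards.

```rust
use std::collections::HashSet;

#[derive(Clone, Debug, PartialEq)]
enum Operand {
    Copy(usize),
    Move(usize),
}

#[derive(Clone, Debug, PartialEq)]
enum Stmt {
    /// dest = src
    Assign { dest: usize, src: Operand },
    /// A call whose result is discarded.
    Call { args: Vec<Operand> },
}

fn local_of(op: &Operand) -> usize {
    match op {
        Operand::Copy(l) | Operand::Move(l) => *l,
    }
}

/// Convert `Copy(l)` to `Move(l)` wherever `l` is never read again.
fn copies_to_moves(block: &mut [Stmt]) {
    // Locals read by some later statement (or a later operand of the
    // same call): those must stay Copy.
    let mut live_later: HashSet<usize> = HashSet::new();
    for stmt in block.iter_mut().rev() {
        match stmt {
            Stmt::Assign { dest, src } => {
                live_later.remove(dest);
                if let Operand::Copy(l) = *src {
                    if !live_later.contains(&l) {
                        *src = Operand::Move(l);
                    }
                }
                live_later.insert(local_of(src));
            }
            Stmt::Call { args } => {
                // Right-to-left, so `f(copy _1, copy _1)` keeps one
                // Copy instead of becoming an invalid double move.
                for a in args.iter_mut().rev() {
                    if let Operand::Copy(l) = *a {
                        if !live_later.contains(&l) {
                            *a = Operand::Move(l);
                        }
                    }
                    live_later.insert(local_of(a));
                }
            }
        }
    }
}
```

On `_2 = _1; f(copy _2, copy _1)` this yields `f(move _2, move _1)` while leaving the earlier copy of `_1` intact, since `_1` is still read by the call. A real pass would of course need per-block dataflow rather than a single linear scan.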

@tmiasko
Contributor

tmiasko commented Dec 22, 2022

Moving copies directly into the call terminators sounds reasonable as a normal form. I don't quite see why it would be an improvement for code generation, though. Can you give an example?

In that case we probably should limit the number of move operands changed into copy per propagation to at most one. Consider for example:

pub fn f<T: Copy>(a: T) {
    g(a, a)
}

#[inline(never)]
pub fn g<T: Copy>(a: T, b: T) {}

With changes from this pull request, before the call to g, two copies of _1 argument will be made:

    bb0: {
        _0 = g::<T>(_1, _1) -> bb1;
    }

In contrast, the existing implementation ends up with only one copy of _1 argument:

    bb0: {
        _2 = _1;
        _0 = g::<T>(move _1, move _2) -> bb1
    }

@bors
Contributor

bors commented Dec 24, 2022

☔ The latest upstream changes (presumably #106103) made this pull request unmergeable. Please resolve the merge conflicts.

@JakobDegen
Contributor Author

It is unavoidable for unsized arguments (as copying those is impossible)

(I've said this before but just to reiterate: I consider this at least as much a problem with our semantics for unsized values as anything else)

Maybe it would even make sense to only support move operands in function calls?

I don't think this is the right strategy. I can expand on this more on Zulip if you like, but (and this was not clear to me until recently either) "implicit copy" semantics are the better semantics for function calls in a high level IR like MIR.

In that case we probably should limit the number of move operands changed into copy per propagation to at most one.

Yeah, I can add that, although this is a bit of an oversimplified heuristic. Doing anything smarter seems really difficult, though.

@saethlin
Member

Do we have any evidence for worse codegen to back up limiting the pass itself?

@tmiasko
Contributor

tmiasko commented Dec 30, 2022

Do we have any evidence for worse codegen to back up limiting the pass itself?

If an argument is passed indirectly, then for Copy and Constant operands a temporary copy is made before the call. In contrast, for Move operands a copy is omitted. The implementation is in:

// The callee needs to own the argument memory if we pass it
// by-ref, so make a local copy of non-immediate constants.
match (arg, op.val) {
    (&mir::Operand::Copy(_), Ref(_, None, _))
    | (&mir::Operand::Constant(_), Ref(_, None, _)) => {
        let tmp = PlaceRef::alloca(bx, op.layout);
        bx.lifetime_start(tmp.llval, tmp.layout.size);
        op.val.store(bx, tmp);
        op.val = Ref(tmp.llval, None, tmp.align);
        copied_constant_arguments.push(tmp);
    }
    _ => {}
}

Without any cost-benefit analysis, the cost of the transformation might exceed the benefit, as in the example given in #105813 (comment).

type T = [u64; 32];
#[no_mangle]
pub fn f(a: T, b: fn(_: T, _: T)) {
    b(a, a)
}
rustc --crate-type=lib -O a.rs -Zmir-enable-passes=+DestinationPropagation -Zfuel=a=2 -Zunpretty=mir
fn f(_1: [u64; 32], _2: fn([u64; 32], [u64; 32])) -> () {
    debug a => _1;                       // in scope 0 at a.rs:3:10: 3:11
    debug b => _2;                       // in scope 0 at a.rs:3:16: 3:17
    let mut _0: ();                      // return place in scope 0 at a.rs:3:35: 3:35
    let mut _3: [u64; 32];               // in scope 0 at a.rs:4:7: 4:8

    bb0: {
        StorageLive(_3);                 // scope 0 at a.rs:4:7: 4:8
        _3 = _1;                         // scope 0 at a.rs:4:7: 4:8
        _0 = move _2(move _3, move _1) -> bb1; // scope 0 at a.rs:4:5: 4:12
    }

    bb1: {
        StorageDead(_3);                 // scope 0 at a.rs:4:11: 4:12
        return;                          // scope 0 at a.rs:5:2: 5:2
    }
}
rustc --crate-type=lib -O a.rs -Zmir-enable-passes=+DestinationPropagation -Zfuel=a=2 --emit llvm-ir
define void @f(ptr noalias nocapture noundef dereferenceable(256) %a, ptr nocapture noundef nonnull readonly %b) unnamed_addr #0 {
start:
  %_3 = alloca [32 x i64], align 8
  call void @llvm.lifetime.start.p0(i64 256, ptr nonnull %_3)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(256) %_3, ptr noundef nonnull align 8 dereferenceable(256) %a, i64 256, i1 false)
  call void %b(ptr noalias nocapture noundef nonnull dereferenceable(256) %_3, ptr noalias nocapture noundef nonnull dereferenceable(256) %a)
  call void @llvm.lifetime.end.p0(i64 256, ptr nonnull %_3)
  ret void
}
rustc --crate-type=lib -O a.rs -Zmir-enable-passes=+DestinationPropagation -Zfuel=a=3 -Zunpretty=mir
fn f(_1: [u64; 32], _2: fn([u64; 32], [u64; 32])) -> () {
    debug a => _1;                       // in scope 0 at a.rs:3:10: 3:11
    debug b => _2;                       // in scope 0 at a.rs:3:16: 3:17
    let mut _0: ();                      // return place in scope 0 at a.rs:3:35: 3:35

    bb0: {
        _0 = move _2(_1, _1) -> bb1;     // scope 0 at a.rs:4:5: 4:12
    }

    bb1: {
        return;                          // scope 0 at a.rs:5:2: 5:2
    }
}
rustc --crate-type=lib -O a.rs -Zmir-enable-passes=+DestinationPropagation -Zfuel=a=3 --emit llvm-ir
define void @f(ptr noalias nocapture noundef readonly dereferenceable(256) %a, ptr nocapture noundef nonnull readonly %b) unnamed_addr #0 {
start:
  %0 = alloca [32 x i64], align 8
  %1 = alloca [32 x i64], align 8
  call void @llvm.lifetime.start.p0(i64 256, ptr nonnull %1)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(256) %1, ptr noundef nonnull align 8 dereferenceable(256) %a, i64 256, i1 false)
  call void @llvm.lifetime.start.p0(i64 256, ptr nonnull %0)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(256) %0, ptr noundef nonnull align 8 dereferenceable(256) %a, i64 256, i1 false)
  call void %b(ptr noalias nocapture noundef nonnull dereferenceable(256) %1, ptr noalias nocapture noundef nonnull dereferenceable(256) %0)
  call void @llvm.lifetime.end.p0(i64 256, ptr nonnull %1)
  call void @llvm.lifetime.end.p0(i64 256, ptr nonnull %0)
  ret void
}

@saethlin
Member

Ah okay that makes sense. We regress back to current output, so here DestinationPropagation just fails to improve final codegen 👍

@saethlin
Member

saethlin commented Jan 4, 2023

Operand::Move is also used in casts. So even if we can't merge the full version of this until we fix another part of the compiler, I think there is value in enabling it just for Operand::Move that isn't in a call terminator.
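For reference, a non-call Move can look roughly like this (hypothetical MIR; the cast kind is illustrative):

```
_2 = move _1 as *const u8 (PtrToPtr);   // Move operand outside any call terminator
```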

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 11, 2023
…iasko

Add a regression test for argument copies with DestinationPropagation

This example, as a codegen test: rust-lang#105813 (comment)

r? `@tmiasko`
@cjgillot cjgillot added the A-mir-opt Area: MIR optimizations label Jan 21, 2023
@tmiasko tmiasko added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 21, 2023
-for (src, candidates) in candidates.c.iter() {
-    if merged_locals.contains(*src) {
+for (src, candidates) in candidates.c.drain() {
+    if merged_locals.contains(src) {
         continue;
     }
     let Some(dest) =
Contributor

Suggested change
-let Some(dest) =
+let Some((dest, writes_to_remove)) =

@JohnCSimon
Member

@JakobDegen Ping from triage: Can you post your status on this PR? This has sat idle for a few months.

@JakobDegen
Contributor Author

I'm investing my time in Lir right now, so I'm not going to push this further. I'll keep the branch around; if someone else wants to pick it up, they're welcome to.

@JakobDegen JakobDegen closed this Apr 30, 2023