Worse codegen with `mem::take(vec)` than on stable #103840

clubby789 · 2022-11-01T16:12:32Z

With this code

pub fn foo(t: &mut Vec<usize>) {
    let mut taken = std::mem::take(t);
    taken.pop();
    *t = taken;
}

Stable produces

playground::foo:
	sub	rsp, 24
	movups	xmm0, xmmword ptr [rdi]
	movaps	xmmword ptr [rsp], xmm0
	mov	rax, qword ptr [rdi + 16]
	xor	ecx, ecx
	sub	rax, 1
	cmovae	rcx, rax
	mov	qword ptr [rdi + 16], rcx
	add	rsp, 24
	ret

Whereas beta/nightly produces

playground::foo:
	push	r15
	push	r14
	push	rbx
	mov	rbx, rdi
	mov	r14, qword ptr [rdi + 8]
	mov	r15, qword ptr [rdi + 16]
	xorps	xmm0, xmm0
	movups	xmmword ptr [rdi + 8], xmm0
	mov	rsi, qword ptr [rdi + 8]
	test	rsi, rsi
	je	.LBB0_2
	shl	rsi, 3
	mov	edi, 8
	mov	edx, 8
	call	qword ptr [rip + __rust_dealloc@GOTPCREL]

.LBB0_2:
	xor	eax, eax
	sub	r15, 1
	cmovae	rax, r15
	mov	qword ptr [rbx + 8], r14
	mov	qword ptr [rbx + 16], rax
	pop	rbx
	pop	r14
	pop	r15
	ret
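
The extra branch and `__rust_dealloc` call in the beta/nightly output appear to correspond to dropping the empty placeholder that `mem::take` leaves in `*t`, which happens when `*t = taken` overwrites it. Below is a minimal sketch of that desugaring. `foo_spelled_out` is a hypothetical name used only for illustration, and the comments describe the language semantics rather than the exact MIR the compiler emits.

```rust
use std::mem;

/// The function from the report, unchanged.
pub fn foo(t: &mut Vec<usize>) {
    let mut taken = mem::take(t);
    taken.pop();
    *t = taken;
}

/// Hypothetical spelled-out equivalent, for illustration only.
pub fn foo_spelled_out(t: &mut Vec<usize>) {
    // `mem::take(t)` replaces `*t` with its default value. For `Vec` that is
    // `Vec::new()`, which has capacity 0 and does not allocate.
    let mut taken = mem::replace(t, Vec::new());
    taken.pop();
    // `*t = taken` moves `taken` in and drops the previous value of `*t`,
    // which is the empty placeholder.
    let placeholder = mem::replace(t, taken);
    // Dropping an unallocated Vec frees nothing. The regression is that the
    // optimizer no longer proves the capacity is 0 here, so the capacity
    // check and the dealloc call stay in the generated code.
    drop(placeholder);
}
```

Both functions behave identically at runtime; only the generated code differs between compiler versions.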

searched nightlies: from nightly-2022-07-02 to nightly-2022-07-03
regressed nightly: nightly-2022-07-03
searched commit range: 46b8c23...f2d9393
regressed commit: 0075bb4

bisected with cargo-bisect-rustc v0.6.4

Host triple: x86_64-unknown-linux-gnu

@rustbot label +regression-from-stable-to-nightly +A-mir-opt-inlining

nikic · 2022-11-01T17:59:33Z

Godbolt: https://rust.godbolt.org/z/4GTrh1EGx

Result IR can be further optimized by GVN, so this might be addressable on the LLVM side.

nikic · 2022-11-02T16:48:15Z

Looks like this got a bit worse on LLVM main because an additional assume is being preserved: https://llvm.godbolt.org/z/95eMe6j7q

Anyway, there is a phase ordering problem here. MemCpyOpt runs after GVN, and only at that point do we convert the memcpy into a memset, which makes the following load from it easy to fold.

An easy fix would probably be to support memset in InstCombine load store forwarding. But this is no longer going to fix this issue due to the aforementioned assume issue. Ugh.
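
As a rough source-level analogue of the pattern described above (not the IR from the Godbolt links), the sketch below zeroes a range with a memset-like call and then reads a value back from that range. Forwarding the stored zero to the load would make the result a constant. `zero_then_read` and the three-word array standing in for the Vec fields are assumptions made for illustration.

```rust
use std::ptr;

/// Hypothetical stand-in for "memset a range, then load from it".
pub fn zero_then_read(fields: &mut [usize; 3]) -> usize {
    // SAFETY: elements 1 and 2 are in bounds of the three-element array,
    // and all-zero bytes are a valid `usize`.
    unsafe {
        // Sets 2 * size_of::<usize>() bytes to zero, like a 16-byte memset
        // on a 64-bit target.
        ptr::write_bytes(fields.as_mut_ptr().add(1), 0, 2);
    }
    // This load reads memory that was just zeroed, so it can be folded to 0
    // once the optimizer forwards the memset value to the load.
    fields[1]
}
```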

nikic · 2022-11-03T10:13:50Z

Upstream patch for InstCombine: https://reviews.llvm.org/D137323

An alternative solution would be to move MemCpyOpt prior to GVN, but I'm not sure whether that would cause other issues.

apiraino · 2022-11-03T11:12:45Z

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

nikic · 2022-11-03T15:08:45Z

Upstream patch for SimplifyCFG: https://reviews.llvm.org/D137339

Together these produce the following final IR:

define void @_ZN7example3foo17h9f11ae7042742a8dE(ptr noalias nocapture noundef align 8 dereferenceable(24) %t) unnamed_addr #0 personality ptr @rust_eh_personality {
start:
  %taken.sroa.6.0.t.sroa_idx = getelementptr inbounds i8, ptr %t, i64 8
  %taken.sroa.6.0.copyload5 = load i64, ptr %taken.sroa.6.0.t.sroa_idx, align 8, !alias.scope !2, !noalias !6
  %taken.sroa.7.0.t.sroa_idx = getelementptr inbounds i8, ptr %t, i64 16
  %taken.sroa.7.0.copyload6 = load i64, ptr %taken.sroa.7.0.t.sroa_idx, align 8, !alias.scope !2, !noalias !6
  tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) %taken.sroa.6.0.t.sroa_idx, i8 0, i64 16, i1 false)
  %0 = icmp eq i64 %taken.sroa.7.0.copyload6, 0
  %1 = add i64 %taken.sroa.7.0.copyload6, -1
  %spec.select = select i1 %0, i64 0, i64 %1
  store i64 %taken.sroa.6.0.copyload5, ptr %taken.sroa.6.0.t.sroa_idx, align 8
  store i64 %spec.select, ptr %taken.sroa.7.0.t.sroa_idx, align 8
  ret void
}

Ignoring the opportunity to form a usub.sat, this is optimal.
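
For reference, the `icmp`/`add`/`select` sequence in the IR above computes a saturating subtraction of the length. A small sketch of the equivalence at the Rust level follows; both function names are hypothetical.

```rust
/// Mirrors the select in the IR: 0 if the length is 0, otherwise len - 1.
pub fn new_len_select(len: usize) -> usize {
    if len == 0 { 0 } else { len - 1 }
}

/// The same computation written as an unsigned saturating subtraction.
pub fn new_len_saturating(len: usize) -> usize {
    len.saturating_sub(1)
}
```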

clubby789 · 2022-12-29T17:00:00Z

Nightly now compiles to

example::foo:
        mov     rax, qword ptr [rdi + 16]
        xor     ecx, ecx
        sub     rax, 1
        cmovae  rcx, rax
        mov     qword ptr [rdi + 16], rcx
        ret

nikic · 2022-12-29T17:15:41Z

Needs codegen test.

clubby789 · 2022-12-29T17:23:22Z

Would just // CHECK-NOT: __rust_dealloc work?

nikic · 2022-12-29T21:12:41Z

> Would just // CHECK-NOT: __rust_dealloc work?

Sounds reasonable.
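
A sketch of what such a test could look like, assuming the FileCheck-style conventions of the rust-lang/rust codegen test suite. The header directive syntax and the crate-type note in the comments are assumptions and are not copied from the merged test.

```rust
// compile-flags: -O
// (Assumption: the test harness reads the directive above to enable
// optimizations, and the real test file would also declare
// `#![crate_type = "lib"]` so that no `main` is required.)

pub fn foo(t: &mut Vec<usize>) {
    // FileCheck directive: the optimized IR must not contain a call to the
    // deallocation function.
    // CHECK-NOT: __rust_dealloc
    let mut taken = std::mem::take(t);
    taken.pop();
    *t = taken;
}
```

A check like this only asserts the absence of one symbol, so it says nothing about the rest of the generated code.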

Add codegen test for issue 103840 Closes rust-lang#103840

the8472 · 2023-02-16T22:25:27Z

Reopening because the fact that this works on nightly is not really reliable behavior. #106790 and #108106 both change the Vec field order, and in each case that breaks the test.

the8472 · 2023-04-29T19:49:50Z

I'm no longer having issues with the codegen test; the LLVM 16 upgrade seems to have made it more reliable.

rustbot added A-mir-opt-inlining Area: MIR inlining regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Nov 1, 2022

nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Nov 1, 2022

rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Nov 3, 2022

nikic self-assigned this Nov 3, 2022

clubby789 closed this as completed Dec 29, 2022

nikic reopened this Dec 29, 2022

nikic added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Dec 29, 2022

nikic removed their assignment Dec 29, 2022

clubby789 mentioned this issue Dec 29, 2022

Add codegen test for issue 103840 #106272

Merged

bors closed this as completed in 23b1cc1 Jan 2, 2023

Aaron1011 pushed a commit to Aaron1011/rust that referenced this issue Jan 6, 2023

Auto merge of rust-lang#106272 - clubby789:codegen-test-103840, r=nikic

3cf246c

Add codegen test for issue 103840 Closes rust-lang#103840

the8472 mentioned this issue Jan 14, 2023

add more niches to rawvec #106790

Merged

the8472 reopened this Feb 16, 2023

JohnTitor removed the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Feb 21, 2023

Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023

the8472 closed this as completed Apr 29, 2023
