Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Worse codegen with mem::take(vec) than on stable #103840

Closed
clubby789 opened this issue Nov 1, 2022 · 11 comments · Fixed by #106272
Closed

Worse codegen with mem::take(vec) than on stable #103840

clubby789 opened this issue Nov 1, 2022 · 11 comments · Fixed by #106272
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt-inlining Area: MIR inlining I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@clubby789
Copy link
Contributor

clubby789 commented Nov 1, 2022

With this code

pub fn foo(t: &mut Vec<usize>) {
    let mut taken = std::mem::take(t);
    taken.pop();
    *t = taken;
}

Stable produces

playground::foo:
	sub	rsp, 24
	movups	xmm0, xmmword ptr [rdi]
	movaps	xmmword ptr [rsp], xmm0
	mov	rax, qword ptr [rdi + 16]
	xor	ecx, ecx
	sub	rax, 1
	cmovae	rcx, rax
	mov	qword ptr [rdi + 16], rcx
	add	rsp, 24
	ret

Whereas beta/nightly produces

playground::foo:
	push	r15
	push	r14
	push	rbx
	mov	rbx, rdi
	mov	r14, qword ptr [rdi + 8]
	mov	r15, qword ptr [rdi + 16]
	xorps	xmm0, xmm0
	movups	xmmword ptr [rdi + 8], xmm0
	mov	rsi, qword ptr [rdi + 8]
	test	rsi, rsi
	je	.LBB0_2
	shl	rsi, 3
	mov	edi, 8
	mov	edx, 8
	call	qword ptr [rip + __rust_dealloc@GOTPCREL]

.LBB0_2:
	xor	eax, eax
	sub	r15, 1
	cmovae	rax, r15
	mov	qword ptr [rbx + 8], r14
	mov	qword ptr [rbx + 16], rax
	pop	rbx
	pop	r14
	pop	r15
	ret

searched nightlies: from nightly-2022-07-02 to nightly-2022-07-03
regressed nightly: nightly-2022-07-03
searched commit range: 46b8c23...f2d9393
regressed commit: 0075bb4

bisected with cargo-bisect-rustc v0.6.4

Host triple: x86_64-unknown-linux-gnu

@rustbot label +regression-from-stable-to-nightly +A-mir-opt-inlining
@rustbot rustbot added A-mir-opt-inlining Area: MIR inlining regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Nov 1, 2022
@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Nov 1, 2022
@nikic
Copy link
Contributor

nikic commented Nov 1, 2022

Godbolt: https://rust.godbolt.org/z/4GTrh1EGx

Result IR can be further optimized by GVN, so this might be addressable on the LLVM side.

@nikic
Copy link
Contributor

nikic commented Nov 2, 2022

Looks like this got a bit worse on LLVM main because an additional assume is being preserved: https://llvm.godbolt.org/z/95eMe6j7q

Anyway, there is a phase ordering problem here. MemCpyOpt runs after GVN, and only at that point do we convert the memcpy into a memset, which makes the following load from it easy to fold.

An easy fix would probably be to support memset in InstCombine load store forwarding. But this is no longer going to fix this issue due to the aforementioned assume issue. Ugh.

@nikic
Copy link
Contributor

nikic commented Nov 3, 2022

Upstream patch for InstCombine: https://reviews.llvm.org/D137323

An alternative solution would be to move MemCpyOpt prior to GVN, but I'm not sure whether that would cause other issues.

@apiraino
Copy link
Contributor

apiraino commented Nov 3, 2022

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

@rustbot rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Nov 3, 2022
@nikic
Copy link
Contributor

nikic commented Nov 3, 2022

Upstream patch for SimplifyCFG: https://reviews.llvm.org/D137339

Together these produce the following final IR:

define void @_ZN7example3foo17h9f11ae7042742a8dE(ptr noalias nocapture noundef align 8 dereferenceable(24) %t) unnamed_addr #0 personality ptr @rust_eh_personality {
start:
  %taken.sroa.6.0.t.sroa_idx = getelementptr inbounds i8, ptr %t, i64 8
  %taken.sroa.6.0.copyload5 = load i64, ptr %taken.sroa.6.0.t.sroa_idx, align 8, !alias.scope !2, !noalias !6
  %taken.sroa.7.0.t.sroa_idx = getelementptr inbounds i8, ptr %t, i64 16
  %taken.sroa.7.0.copyload6 = load i64, ptr %taken.sroa.7.0.t.sroa_idx, align 8, !alias.scope !2, !noalias !6
  tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) %taken.sroa.6.0.t.sroa_idx, i8 0, i64 16, i1 false)
  %0 = icmp eq i64 %taken.sroa.7.0.copyload6, 0
  %1 = add i64 %taken.sroa.7.0.copyload6, -1
  %spec.select = select i1 %0, i64 0, i64 %1
  store i64 %taken.sroa.6.0.copyload5, ptr %taken.sroa.6.0.t.sroa_idx, align 8
  store i64 %spec.select, ptr %taken.sroa.7.0.t.sroa_idx, align 8
  ret void
}

Ignoring the opportunity to form a usub.sat, this is optimal.

@nikic nikic self-assigned this Nov 3, 2022
@clubby789
Copy link
Contributor Author

Nightly now compiles to

example::foo:
        mov     rax, qword ptr [rdi + 16]
        xor     ecx, ecx
        sub     rax, 1
        cmovae  rcx, rax
        mov     qword ptr [rdi + 16], rcx
        ret

@nikic nikic reopened this Dec 29, 2022
@nikic nikic added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Dec 29, 2022
@nikic nikic removed their assignment Dec 29, 2022
@nikic
Copy link
Contributor

nikic commented Dec 29, 2022

Needs codegen test.

@clubby789
Copy link
Contributor Author

Would just // CHECK-NOT: __rust_dealloc work?

@nikic
Copy link
Contributor

nikic commented Dec 29, 2022

Would just // CHECK-NOT: __rust_dealloc work?

Sounds reasonable.

@bors bors closed this as completed in 23b1cc1 Jan 2, 2023
Aaron1011 pushed a commit to Aaron1011/rust that referenced this issue Jan 6, 2023
@the8472 the8472 reopened this Feb 16, 2023
@the8472
Copy link
Member

the8472 commented Feb 16, 2023

Reopening because it working on nightly is not really reliable behavior. #106790 and #108106 both change vec field order and in each case it breaks the test.

@JohnTitor JohnTitor removed the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Feb 21, 2023
@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023
@the8472
Copy link
Member

the8472 commented Apr 29, 2023

I'm no longer having issues with the codegen test, LLVM 16 upgrade seems to have made it more reliable.

@the8472 the8472 closed this as completed Apr 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt-inlining Area: MIR inlining I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants