-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MachineCopyPropagation] When the source of PreviousCopy is undef, we cannot replace sub register #74682
Conversation
@llvm/pr-subscribers-llvm-regalloc Author: DianQK (DianQK) ChangesFixes #74680. Full diff: https://github.com/llvm/llvm-project/pull/74682.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index a032b31a1fc7c..fdbdaa0a6d641 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -443,6 +443,9 @@ static bool isNopCopy(const MachineInstr &PreviousCopy, MCRegister Src,
return true;
if (!TRI->isSubRegister(PreviousSrc, Src))
return false;
+ // When the source of PreviousCopy is undef, we cannot replace sub register.
+ if (CopyOperands->Source->isUndef())
+ return false;
unsigned SubIdx = TRI->getSubRegIndex(PreviousSrc, Src);
return SubIdx == TRI->getSubRegIndex(PreviousDef, Def);
}
diff --git a/llvm/test/CodeGen/AArch64/machine-cp-undef.mir b/llvm/test/CodeGen/AArch64/machine-cp-undef.mir
new file mode 100644
index 0000000000000..015b01fcc30b0
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/machine-cp-undef.mir
@@ -0,0 +1,145 @@
+# RUN: llc -o - %s -O3 -mcpu=apple-m1 -mtriple=arm64-apple-macos | FileCheck %s
+
+--- |
+ target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
+ target triple = "arm64-apple-macosx11.0.0"
+
+ ; Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+ define fastcc noundef zeroext i1 @foo(i32 %self.0.val, i8 %self.4.val) unnamed_addr #0 {
+ start:
+ switch i32 %self.0.val, label %bb2 [
+ i32 0, label %bb9
+ i32 1, label %bb4
+ i32 2, label %bb5
+ i32 3, label %bb1
+ ]
+
+ bb2: ; preds = %start
+ unreachable
+
+ bb4: ; preds = %start
+ br label %bb9
+
+ bb5: ; preds = %start
+ br label %bb9
+
+ bb1: ; preds = %start
+ %trunc.not = icmp eq i8 %self.4.val, 0
+ br label %bb9
+
+ bb9: ; preds = %bb1, %bb5, %bb4, %start
+ %_0.0.off0 = phi i1 [ true, %bb5 ], [ true, %bb4 ], [ false, %start ], [ %trunc.not, %bb1 ]
+ ret i1 %_0.0.off0
+ }
+
+ attributes #0 = { mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) "frame-pointer"="non-leaf" "target-cpu"="apple-m1" }
+
+ !llvm.module.flags = !{!0, !1}
+ !llvm.ident = !{!2}
+
+ !0 = !{i32 8, !"PIC Level", i32 2}
+ !1 = !{i32 7, !"PIE Level", i32 2}
+ !2 = !{!"rustc version 1.76.0-nightly (0e2dac837 2023-12-04)"}
+
+...
+---
+name: foo
+alignment: 4
+exposesReturnsTwice: false
+legalized: false
+regBankSelected: false
+selected: false
+failedISel: false
+tracksRegLiveness: true
+hasWinCFI: false
+callsEHReturn: false
+callsUnwindInit: false
+hasEHCatchret: false
+hasEHScopes: false
+hasEHFunclets: false
+isOutlined: false
+debugInstrRef: false
+failsVerification: false
+tracksDebugUserValues: true
+registers: []
+liveins:
+ - { reg: '$w0', virtual-reg: '' }
+ - { reg: '$w1', virtual-reg: '' }
+frameInfo:
+ isFrameAddressTaken: false
+ isReturnAddressTaken: false
+ hasStackMap: false
+ hasPatchPoint: false
+ stackSize: 0
+ offsetAdjustment: 0
+ maxAlignment: 1
+ adjustsStack: false
+ hasCalls: false
+ stackProtector: ''
+ functionContext: ''
+ maxCallFrameSize: 0
+ cvBytesOfCalleeSavedRegisters: 0
+ hasOpaqueSPAdjustment: false
+ hasVAStart: false
+ hasMustTailInVarArgFunc: false
+ hasTailCall: false
+ localFrameSize: 0
+ savePoint: ''
+ restorePoint: ''
+fixedStack: []
+stack: []
+entry_values: []
+callSites: []
+debugValueSubstitutions: []
+constants: []
+machineFunctionInfo: {}
+jumpTable:
+ kind: label-difference32
+ entries:
+ - id: 0
+ blocks: [ '%bb.5', '%bb.2', '%bb.3', '%bb.4' ]
+body: |
+ bb.0.start:
+ successors: %bb.1(0x80000000)
+ liveins: $w0, $w1
+
+ ; CHECK: mov w8, w0
+ ; CHECK-NOT: mov x8, x0
+ ; CHECK: ldrb w11, [x9, x8]
+ renamable $w8 = COPY $w0
+ renamable $w8 = ORRWrs $wzr, killed renamable $w8, 0, implicit-def $x8
+ renamable $w0 = COPY $wzr
+
+ bb.1.start:
+ successors: %bb.5(0x20000000), %bb.2(0x20000000), %bb.3(0x20000000), %bb.4(0x20000000)
+ liveins: $w0, $w1, $x8
+
+ renamable $x9 = MOVaddrJT target-flags(aarch64-page) %jump-table.0, target-flags(aarch64-pageoff, aarch64-nc) %jump-table.0
+ early-clobber renamable $x10, dead early-clobber renamable $x11 = JumpTableDest32 killed renamable $x9, killed renamable $x8, %jump-table.0
+ BR killed renamable $x10
+
+ bb.2.bb4:
+ successors: %bb.5(0x80000000)
+
+ renamable $w0 = MOVi32imm 1
+ B %bb.5
+
+ bb.3.bb5:
+ successors: %bb.5(0x80000000)
+
+ renamable $w0 = MOVi32imm 1
+ B %bb.5
+
+ bb.4.bb1:
+ successors: %bb.5(0x80000000)
+ liveins: $w1
+
+ dead $wzr = ANDSWri killed renamable $w1, 7, implicit-def $nzcv
+ renamable $w0 = CSINCWr $wzr, $wzr, 1, implicit killed $nzcv
+
+ bb.5.bb9:
+ liveins: $w0
+
+ RET_ReallyLR implicit $w0
+
+...
|
@llvm/pr-subscribers-backend-aarch64 Author: DianQK (DianQK) ChangesFixes #74680. Full diff: https://github.com/llvm/llvm-project/pull/74682.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index a032b31a1fc7c..fdbdaa0a6d641 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -443,6 +443,9 @@ static bool isNopCopy(const MachineInstr &PreviousCopy, MCRegister Src,
return true;
if (!TRI->isSubRegister(PreviousSrc, Src))
return false;
+ // When the source of PreviousCopy is undef, we cannot replace sub register.
+ if (CopyOperands->Source->isUndef())
+ return false;
unsigned SubIdx = TRI->getSubRegIndex(PreviousSrc, Src);
return SubIdx == TRI->getSubRegIndex(PreviousDef, Def);
}
diff --git a/llvm/test/CodeGen/AArch64/machine-cp-undef.mir b/llvm/test/CodeGen/AArch64/machine-cp-undef.mir
new file mode 100644
index 0000000000000..015b01fcc30b0
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/machine-cp-undef.mir
@@ -0,0 +1,145 @@
+# RUN: llc -o - %s -O3 -mcpu=apple-m1 -mtriple=arm64-apple-macos | FileCheck %s
+
+--- |
+ target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
+ target triple = "arm64-apple-macosx11.0.0"
+
+ ; Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+ define fastcc noundef zeroext i1 @foo(i32 %self.0.val, i8 %self.4.val) unnamed_addr #0 {
+ start:
+ switch i32 %self.0.val, label %bb2 [
+ i32 0, label %bb9
+ i32 1, label %bb4
+ i32 2, label %bb5
+ i32 3, label %bb1
+ ]
+
+ bb2: ; preds = %start
+ unreachable
+
+ bb4: ; preds = %start
+ br label %bb9
+
+ bb5: ; preds = %start
+ br label %bb9
+
+ bb1: ; preds = %start
+ %trunc.not = icmp eq i8 %self.4.val, 0
+ br label %bb9
+
+ bb9: ; preds = %bb1, %bb5, %bb4, %start
+ %_0.0.off0 = phi i1 [ true, %bb5 ], [ true, %bb4 ], [ false, %start ], [ %trunc.not, %bb1 ]
+ ret i1 %_0.0.off0
+ }
+
+ attributes #0 = { mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) "frame-pointer"="non-leaf" "target-cpu"="apple-m1" }
+
+ !llvm.module.flags = !{!0, !1}
+ !llvm.ident = !{!2}
+
+ !0 = !{i32 8, !"PIC Level", i32 2}
+ !1 = !{i32 7, !"PIE Level", i32 2}
+ !2 = !{!"rustc version 1.76.0-nightly (0e2dac837 2023-12-04)"}
+
+...
+---
+name: foo
+alignment: 4
+exposesReturnsTwice: false
+legalized: false
+regBankSelected: false
+selected: false
+failedISel: false
+tracksRegLiveness: true
+hasWinCFI: false
+callsEHReturn: false
+callsUnwindInit: false
+hasEHCatchret: false
+hasEHScopes: false
+hasEHFunclets: false
+isOutlined: false
+debugInstrRef: false
+failsVerification: false
+tracksDebugUserValues: true
+registers: []
+liveins:
+ - { reg: '$w0', virtual-reg: '' }
+ - { reg: '$w1', virtual-reg: '' }
+frameInfo:
+ isFrameAddressTaken: false
+ isReturnAddressTaken: false
+ hasStackMap: false
+ hasPatchPoint: false
+ stackSize: 0
+ offsetAdjustment: 0
+ maxAlignment: 1
+ adjustsStack: false
+ hasCalls: false
+ stackProtector: ''
+ functionContext: ''
+ maxCallFrameSize: 0
+ cvBytesOfCalleeSavedRegisters: 0
+ hasOpaqueSPAdjustment: false
+ hasVAStart: false
+ hasMustTailInVarArgFunc: false
+ hasTailCall: false
+ localFrameSize: 0
+ savePoint: ''
+ restorePoint: ''
+fixedStack: []
+stack: []
+entry_values: []
+callSites: []
+debugValueSubstitutions: []
+constants: []
+machineFunctionInfo: {}
+jumpTable:
+ kind: label-difference32
+ entries:
+ - id: 0
+ blocks: [ '%bb.5', '%bb.2', '%bb.3', '%bb.4' ]
+body: |
+ bb.0.start:
+ successors: %bb.1(0x80000000)
+ liveins: $w0, $w1
+
+ ; CHECK: mov w8, w0
+ ; CHECK-NOT: mov x8, x0
+ ; CHECK: ldrb w11, [x9, x8]
+ renamable $w8 = COPY $w0
+ renamable $w8 = ORRWrs $wzr, killed renamable $w8, 0, implicit-def $x8
+ renamable $w0 = COPY $wzr
+
+ bb.1.start:
+ successors: %bb.5(0x20000000), %bb.2(0x20000000), %bb.3(0x20000000), %bb.4(0x20000000)
+ liveins: $w0, $w1, $x8
+
+ renamable $x9 = MOVaddrJT target-flags(aarch64-page) %jump-table.0, target-flags(aarch64-pageoff, aarch64-nc) %jump-table.0
+ early-clobber renamable $x10, dead early-clobber renamable $x11 = JumpTableDest32 killed renamable $x9, killed renamable $x8, %jump-table.0
+ BR killed renamable $x10
+
+ bb.2.bb4:
+ successors: %bb.5(0x80000000)
+
+ renamable $w0 = MOVi32imm 1
+ B %bb.5
+
+ bb.3.bb5:
+ successors: %bb.5(0x80000000)
+
+ renamable $w0 = MOVi32imm 1
+ B %bb.5
+
+ bb.4.bb1:
+ successors: %bb.5(0x80000000)
+ liveins: $w1
+
+ dead $wzr = ANDSWri killed renamable $w1, 7, implicit-def $nzcv
+ renamable $w0 = CSINCWr $wzr, $wzr, 1, implicit killed $nzcv
+
+ bb.5.bb9:
+ liveins: $w0
+
+ RET_ReallyLR implicit $w0
+
+...
|
a979b85
to
fd2d627
Compare
c35fa40
to
efbd7f1
Compare
… cannot replace sub register The `postrapseudos` may replace `mov w8, w0` with `mov x8, x0`.
efbd7f1
to
fd364a3
Compare
Hello. I think that if you removed undef from the first instruction the result would still be incorrect. With:
The second instruction will zero-extend the w0 register to x8. It would be OK to remove the first instruction (it is dead), it is not OK to remove the second if something is relying on the top bits being zero. I assume that's what goes wrong in your case? The top bits are not zero into the function? |
I'm also curious about it, but this transformation has been around for a long time, so I'm assuming it's correct. I think there exists some other pass solution to this problem here.
Yes. Please see the link https://llvm.godbolt.org/z/zv3vz8zcs. _call_from_main:
sub sp, sp, #32
stp x29, x30, [sp, #16]
add x29, sp, #16
mov w8, #1
str w8, [sp, #4] ;;; We save `0x00000001` on stack
add x0, sp, #4 ;;; Save the address
bl _device_create_texture
ldp x29, x30, [sp, #16]
add sp, sp, #32
ret
_device_create_texture:
sub sp, sp, #32
stp x29, x30, [sp, #16]
add x29, sp, #16
ldr x8, [x0] ;;; Load 8-byte, get `0x6f60400000000001`
str x8, [sp]
ldr w9, [x0, #8]
str w9, [sp, #8]
ubfx x1, x8, #32, #8
mov x0, x8
bl __ZN12repro_11790213TextureFormat17required_features17h5e4d5de64322154bE
..
ret
__ZN12repro_11790213TextureFormat17required_features17h5e4d5de64322154bE:
mov x8, x0 ;;; x8/x0 is `0x6f60400000000001`
mov w0, #0
adrp x9, LJTI10_0@PAGE
add x9, x9, LJTI10_0@PAGEOFF
adr x10, LBB10_1
ldrb w11, [x9, x8] ;;; We expect x8 is `0x0000000000000001` In this function, we can't guarantee that mov x8, x0
mov w8, w0
mov w8, w8 If I use __ZN12repro_11790213TextureFormat17required_features17h5e4d5de64322154bE:
sub sp, sp, #16
str w1, [sp, #4]
subs w8, w0, #0
mov w8, w8 ; This looks like a transformation that guarantees that x8 is `0x0000000000000001`. After that I found the following code in llvm-project/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp Lines 4422 to 4429 in 0808be4
If we lower the code to |
@nikic Would you approve of this? Or do we need to add other codegen participants to review? |
I don't believe the undef is the issue - I think the issue is that AArch64InstrInfo::isCopyInstrImpl is saying that a W-reg orr is a copy, even if it is really a zextend because the entire X output register is depended upon. Can you try and add something to isCopyInstImpl instead, that says: If the register is virtual then it should not be a subreg, and if the register is physical then there should not be an implicit def of the X reg. Would that solve your issue, and would it cause other problems in AArch64 codegen? Thanks |
Thanks for the explanation.
I'm not sure if I'm misunderstanding anything. But I tried to file the #75184 PR. It breaks the original commit 5372658. It looks like it's because we're generating the mov instruction for the ORRWsr. (At this point this is a copy instruction?) I'm not sure exactly what is going wrong. Maybe we need to add a new method to TargetRegisterInfo? Or are you going to fix this issue? I'm not familiar with MIR. |
Thanks. It sounds like there are not a lot of code changes, which is a good sign. I didn't expect the debug problems though. I'll try and take a look at the patch. Perhaps you are right that we need a new method for the debug info to use. |
Based on llvm-project/llvm/docs/InstrRefDebugInfo.md Lines 125 to 127 in fd8fa31
isCopyLikeInstr method? This should be a good name while changing very little.I 'll give it a try later.
|
I tried adding |
… the X register (#75184) Follows #74682 (comment). Fixes #74680.
… the X register (llvm#75184) Follows llvm#74682 (comment). Fixes llvm#74680. (cherry picked from commit 7649d22)
… the X register (llvm#75184) Follows llvm#74682 (comment). Fixes llvm#74680. (cherry picked from commit 7649d22)
… the X register (#75184) Follows llvm/llvm-project#74682 (comment). Fixes #74680.
Fixes #74680.