Incorrect handling of lateout pairs in inline asm #57550

Open
newpavlov opened this issue Sep 4, 2022 · 3 comments

@newpavlov

newpavlov commented Sep 4, 2022

The issue was discovered while looking into the source of rust-lang/rust#101346.

The following Rust function:

use std::arch::asm;

pub fn foo() -> u32 {
    let t1: u32;
    let t2: u32;
    unsafe {
        asm!(
            "mov {0:e}, 1",
            "mov eax, 42",
            lateout(reg) t1,
            lateout("eax") t2,
            options(nostack),
        );
    }
    t1
}

It is compiled into the following incorrect assembly, where both outputs share eax and the function returns 42 instead of 1:

example::foo:
        mov     eax, 1
        mov     eax, 42
        ret

Godbolt link: https://rust.godbolt.org/z/Yb9v7WobM

LLVM incorrectly reuses the same register for a pair of lateout operands if it can see that one of them is not used later.

@asl
Collaborator

asl commented Sep 4, 2022

@newpavlov Please attach LLVM IR that could be used to reproduce the issue. Thanks!

@newpavlov
Author

https://llvm.godbolt.org/z/qxrfd7fj3

define i32 @foo() unnamed_addr #0 {
start:
  %0 = tail call { i32, i32 } asm inteldialect "mov ${0:k}, 1\0Amov eax, 42", "=r,={ax},~{dirflag},~{fpsr},~{flags}"() #1, !srcloc !2
  %1 = extractvalue { i32, i32 } %0, 0
  ret i32 %1
}

!0 = !{i32 7, !"PIC Level", i32 2}
!1 = !{i32 2, !"RtLibUseGOT", i32 1}
!2 = !{i32 0, i32 108, i32 136}

@nikic
Contributor

nikic commented Sep 4, 2022

Just to be clear, the problem here is that with an =r,={ax} constraint string, both output registers are allocated to eax.
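To make the violated property concrete, here is a minimal sketch in Rust of the check that a valid allocation should pass: no two output operands of one asm statement may land in overlapping registers. The helper names `family` and `outputs_conflict` are hypothetical, and the alias table covers only a few x86 registers; this is an illustration, not how LLVM represents register aliasing.

```rust
/// Map a few x86 register names to a canonical family so that
/// overlapping names such as `ax`, `eax`, and `rax` compare equal.
/// Hypothetical helper; only a handful of registers are covered.
fn family(reg: &str) -> &str {
    match reg {
        "al" | "ax" | "eax" | "rax" => "a",
        "bl" | "bx" | "ebx" | "rbx" => "b",
        "cl" | "cx" | "ecx" | "rcx" => "c",
        "dl" | "dx" | "edx" | "rdx" => "d",
        other => other,
    }
}

/// Returns true if any two output operands were assigned overlapping registers.
fn outputs_conflict(assigned: &[&str]) -> bool {
    for (i, a) in assigned.iter().enumerate() {
        for b in &assigned[i + 1..] {
            if family(a) == family(b) {
                return true;
            }
        }
    }
    false
}

fn main() {
    // Final allocation in the miscompiled output: `=r` -> eax, `={ax}` -> eax.
    println!("final allocation conflicts: {}", outputs_conflict(&["eax", "eax"]));
    // A valid allocation: `=r` -> ecx, `={ax}` -> eax.
    println!("valid allocation conflicts: {}", outputs_conflict(&["ecx", "eax"]));
}
```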

It looks like we originally get a correct allocation to ecx followed by a copy to eax, but the copy is removed by machine copy propagation:

# *** IR Dump After Stack Slot Coloring (stack-slot-coloring) ***:
# Machine code for function foo: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, TracksDebugUserValues

0B	bb.0.start:
16B	  INLINEASM &"mov ${0}, 1\0Amov ${1}, 42" [inteldialect], $0:[regdef:GR32], def renamable $ecx, $1:[regdef], implicit-def dead $eax
32B	  $eax = COPY killed renamable $ecx
48B	  RET 0, $eax

# End machine code for function foo.

# *** IR Dump After Machine Copy Propagation Pass (machine-cp) ***:
# Machine code for function foo: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, TracksDebugUserValues

bb.0.start:
  INLINEASM &"mov ${0}, 1\0Amov ${1}, 42" [inteldialect], $0:[regdef:GR32], def $eax, $1:[regdef], implicit-def dead $eax
  RET 0, $eax
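
The two dumps suggest which guard was missing when the copy was folded away. The following toy model in Rust sketches that guard: folding `dst = COPY src` into the instruction that defines `src` is refused when that instruction already defines or uses `dst`. The `Inst` type and `fold_copy` function are hypothetical and heavily simplified; they are not LLVM's data structures, and an actual fix in the machine-cp pass may look quite different.

```rust
/// A toy machine instruction: the registers it defines and the ones it uses.
/// Simplified model for illustration, not LLVM's MachineInstr.
#[derive(Debug, Clone, PartialEq)]
struct Inst {
    defs: Vec<&'static str>,
    uses: Vec<&'static str>,
}

/// Try to fold `dst = COPY src` into the instruction that defines `src`
/// by renaming that def to `dst`. The fold is refused when the defining
/// instruction already defines or uses `dst`, because two distinct values
/// would then share one register.
fn fold_copy(def_inst: &Inst, src: &str, dst: &'static str) -> Option<Inst> {
    let touches_dst = def_inst
        .defs
        .iter()
        .chain(&def_inst.uses)
        .any(|r| *r == dst);
    let defines_src = def_inst.defs.iter().any(|r| *r == src);
    if touches_dst || !defines_src {
        return None;
    }
    let mut folded = def_inst.clone();
    for r in folded.defs.iter_mut() {
        if *r == src {
            *r = dst;
        }
    }
    Some(folded)
}

fn main() {
    // INLINEASM defines ecx (the `=r` output) and eax (the `={ax}` output).
    let inline_asm = Inst { defs: vec!["ecx", "eax"], uses: vec![] };
    // `eax = COPY ecx` must not be folded here: eax is already defined.
    println!("{:?}", fold_copy(&inline_asm, "ecx", "eax"));

    // An instruction that only defines ecx can safely have the copy folded.
    let simple = Inst { defs: vec!["ecx"], uses: vec![] };
    println!("{:?}", fold_copy(&simple, "ecx", "eax"));
}
```

In this model the inline asm case is rejected even though the second def of eax is dead, since the asm body still writes eax after it writes the first output.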
