
[BPF] Make llvm-objdump disasm default cpu v4 #102166

Merged
merged 1 commit into llvm:main on Aug 7, 2024

Conversation

yonghong-song (Contributor)

Currently, with the following example,

  $ cat t.c
  void foo(int a, _Atomic int *b)
  {
    *b &= a;
  }
  $ clang --target=bpf -O2 -c -mcpu=v3 t.c
  $ llvm-objdump -d t.o
  t.o:    file format elf64-bpf

  Disassembly of section .text:

  0000000000000000 <foo>:
       0:       c3 12 00 00 51 00 00 00 <unknown>
       1:       95 00 00 00 00 00 00 00 exit

Basically, the default cpu for llvm-objdump is v1, which cannot decode this insn properly.

If we add --mcpu=v3 to the llvm-objdump command line, we get

  $ llvm-objdump -d --mcpu=v3 t.o

  t.o:    file format elf64-bpf

  Disassembly of section .text:

  0000000000000000 <foo>:
       0:       c3 12 00 00 51 00 00 00 w1 = atomic_fetch_and((u32 *)(r2 + 0x0), w1)
       1:       95 00 00 00 00 00 00 00 exit

Now the atomic_fetch_and insn is decoded properly. Using the latest cpu version, --mcpu=v4, decodes it properly as well, just like --mcpu=v3 above.

To avoid the '<unknown>' decoding above with a plain 'llvm-objdump -d t.o', this patch sets the default cpu for llvm-objdump to the current highest cpu version, v4, in ELFObjectFileBase::tryGetCPUName(). The cpu version in ELFObjectFileBase::tryGetCPUName() will need to be adjusted in the future whenever the cpu version is bumped, e.g. to v5. This approach also aligns with gcc-bpf, as discussed in [1].

Six BPF unit tests are affected by this change. I changed the expected output for three of them and added --mcpu=v1 to the other three, to cover both the default (cpu v4) behavior and the explicit --mcpu=v1 behavior.

[1] https://lore.kernel.org/bpf/6f32c0a1-9de2-4145-92ea-be025362182f@linux.dev/T/#m0f7e63c390bc8f5a5523e7f2f0537becd4205200
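With this patch applied, the plain invocation with no --mcpu flag should produce the same decoding as the explicit --mcpu=v3/v4 runs above. For illustration, the expected output for the same t.o (not a captured run):

  $ llvm-objdump -d t.o

  t.o:    file format elf64-bpf

  Disassembly of section .text:

  0000000000000000 <foo>:
       0:       c3 12 00 00 51 00 00 00 w1 = atomic_fetch_and((u32 *)(r2 + 0x0), w1)
       1:       95 00 00 00 00 00 00 00 exit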

@llvmbot added the mc (Machine (object) code) and llvm:binary-utilities labels on Aug 6, 2024
@yonghong-song requested a review from MaskRay on August 6, 2024 at 15:38
llvmbot (Member) commented on Aug 6, 2024

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-llvm-binary-utilities

Author: None (yonghong-song)



Full diff: https://github.com/llvm/llvm-project/pull/102166.diff

7 Files Affected:

  • (modified) llvm/lib/Object/ELFObjectFile.cpp (+2)
  • (modified) llvm/test/CodeGen/BPF/objdump_atomics.ll (+1-1)
  • (modified) llvm/test/CodeGen/BPF/objdump_cond_op.ll (+1-1)
  • (modified) llvm/test/CodeGen/BPF/objdump_imm_hex.ll (+2-2)
  • (modified) llvm/test/CodeGen/BPF/objdump_static_var.ll (+2-2)
  • (modified) llvm/test/MC/BPF/insn-unit.s (+7-7)
  • (modified) llvm/test/MC/BPF/load-store-32.s (+1-1)
diff --git a/llvm/lib/Object/ELFObjectFile.cpp b/llvm/lib/Object/ELFObjectFile.cpp
index 53c3de06d118c..f79c233d93fe8 100644
--- a/llvm/lib/Object/ELFObjectFile.cpp
+++ b/llvm/lib/Object/ELFObjectFile.cpp
@@ -441,6 +441,8 @@ std::optional<StringRef> ELFObjectFileBase::tryGetCPUName() const {
   case ELF::EM_PPC:
   case ELF::EM_PPC64:
     return StringRef("future");
+  case ELF::EM_BPF:
+    return StringRef("v4");
   default:
     return std::nullopt;
   }
diff --git a/llvm/test/CodeGen/BPF/objdump_atomics.ll b/llvm/test/CodeGen/BPF/objdump_atomics.ll
index 3ec364f7368b5..c4cb16b2c3641 100644
--- a/llvm/test/CodeGen/BPF/objdump_atomics.ll
+++ b/llvm/test/CodeGen/BPF/objdump_atomics.ll
@@ -2,7 +2,7 @@
 
 ; CHECK-LABEL: test_load_add_32
 ; CHECK: c3 21
-; CHECK: r2 = atomic_fetch_add((u32 *)(r1 + 0), r2)
+; CHECK: w2 = atomic_fetch_add((u32 *)(r1 + 0), w2)
 define void @test_load_add_32(ptr %p, i32 zeroext %v) {
 entry:
   atomicrmw add ptr %p, i32 %v seq_cst
diff --git a/llvm/test/CodeGen/BPF/objdump_cond_op.ll b/llvm/test/CodeGen/BPF/objdump_cond_op.ll
index 3b2e6c1922fc4..c64a0f2f29382 100644
--- a/llvm/test/CodeGen/BPF/objdump_cond_op.ll
+++ b/llvm/test/CodeGen/BPF/objdump_cond_op.ll
@@ -1,4 +1,4 @@
-; RUN: llc -mtriple=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex -d - | FileCheck %s
+; RUN: llc -mtriple=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex --mcpu=v1 -d - | FileCheck %s
 
 ; Source Code:
 ; int gbl;
diff --git a/llvm/test/CodeGen/BPF/objdump_imm_hex.ll b/llvm/test/CodeGen/BPF/objdump_imm_hex.ll
index 1760bb6b6c521..38b93e8a39b55 100644
--- a/llvm/test/CodeGen/BPF/objdump_imm_hex.ll
+++ b/llvm/test/CodeGen/BPF/objdump_imm_hex.ll
@@ -53,8 +53,8 @@ define i32 @test(i64, i64) local_unnamed_addr #0 {
   %14 = phi i32 [ %12, %10 ], [ %7, %4 ]
   %15 = phi i32 [ 2, %10 ], [ 1, %4 ]
   store i32 %14, ptr @gbl, align 4
-; CHECK-DEC: 63 12 00 00 00 00 00 00         *(u32 *)(r2 + 0) = r1
-; CHECK-HEX: 63 12 00 00 00 00 00 00         *(u32 *)(r2 + 0x0) = r1
+; CHECK-DEC: 63 12 00 00 00 00 00 00         *(u32 *)(r2 + 0) = w1
+; CHECK-HEX: 63 12 00 00 00 00 00 00         *(u32 *)(r2 + 0x0) = w1
   br label %16
 
 ; <label>:16:                                     ; preds = %13, %8
diff --git a/llvm/test/CodeGen/BPF/objdump_static_var.ll b/llvm/test/CodeGen/BPF/objdump_static_var.ll
index a91074ebddd46..b743d82fe5e3d 100644
--- a/llvm/test/CodeGen/BPF/objdump_static_var.ll
+++ b/llvm/test/CodeGen/BPF/objdump_static_var.ll
@@ -1,5 +1,5 @@
-; RUN: llc -mtriple=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex -d - | FileCheck --check-prefix=CHECK %s
-; RUN: llc -mtriple=bpfeb -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex -d - | FileCheck --check-prefix=CHECK %s
+; RUN: llc -mtriple=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex --mcpu=v1 -d - | FileCheck --check-prefix=CHECK %s
+; RUN: llc -mtriple=bpfeb -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex --mcpu=v1 -d - | FileCheck --check-prefix=CHECK %s
 
 ; src:
 ;   static volatile long a = 2;
diff --git a/llvm/test/MC/BPF/insn-unit.s b/llvm/test/MC/BPF/insn-unit.s
index 84735d196030d..e0a4864837798 100644
--- a/llvm/test/MC/BPF/insn-unit.s
+++ b/llvm/test/MC/BPF/insn-unit.s
@@ -34,9 +34,9 @@
   r6 = *(u16 *)(r1 + 8)  // BPF_LDX | BPF_H
   r7 = *(u32 *)(r2 + 16) // BPF_LDX | BPF_W
   r8 = *(u64 *)(r3 - 30) // BPF_LDX | BPF_DW
-// CHECK-64: 71 05 00 00 00 00 00 00 	r5 = *(u8 *)(r0 + 0)
-// CHECK-64: 69 16 08 00 00 00 00 00 	r6 = *(u16 *)(r1 + 8)
-// CHECK-64: 61 27 10 00 00 00 00 00 	r7 = *(u32 *)(r2 + 16)
+// CHECK-64: 71 05 00 00 00 00 00 00 	w5 = *(u8 *)(r0 + 0)
+// CHECK-64: 69 16 08 00 00 00 00 00 	w6 = *(u16 *)(r1 + 8)
+// CHECK-64: 61 27 10 00 00 00 00 00 	w7 = *(u32 *)(r2 + 16)
 // CHECK-32: 71 05 00 00 00 00 00 00 	w5 = *(u8 *)(r0 + 0)
 // CHECK-32: 69 16 08 00 00 00 00 00 	w6 = *(u16 *)(r1 + 8)
 // CHECK-32: 61 27 10 00 00 00 00 00 	w7 = *(u32 *)(r2 + 16)
@@ -47,9 +47,9 @@
   *(u16 *)(r1 + 8) = r8   // BPF_STX | BPF_H
   *(u32 *)(r2 + 16) = r9  // BPF_STX | BPF_W
   *(u64 *)(r3 - 30) = r10 // BPF_STX | BPF_DW
-// CHECK-64: 73 70 00 00 00 00 00 00 	*(u8 *)(r0 + 0) = r7
-// CHECK-64: 6b 81 08 00 00 00 00 00 	*(u16 *)(r1 + 8) = r8
-// CHECK-64: 63 92 10 00 00 00 00 00 	*(u32 *)(r2 + 16) = r9
+// CHECK-64: 73 70 00 00 00 00 00 00 	*(u8 *)(r0 + 0) = w7
+// CHECK-64: 6b 81 08 00 00 00 00 00 	*(u16 *)(r1 + 8) = w8
+// CHECK-64: 63 92 10 00 00 00 00 00 	*(u32 *)(r2 + 16) = w9
 // CHECK-32: 73 70 00 00 00 00 00 00 	*(u8 *)(r0 + 0) = w7
 // CHECK-32: 6b 81 08 00 00 00 00 00 	*(u16 *)(r1 + 8) = w8
 // CHECK-32: 63 92 10 00 00 00 00 00 	*(u32 *)(r2 + 16) = w9
@@ -57,7 +57,7 @@
 
   lock *(u32 *)(r2 + 16) += r9  // BPF_STX | BPF_W | BPF_XADD
   lock *(u64 *)(r3 - 30) += r10 // BPF_STX | BPF_DW | BPF_XADD
-// CHECK-64: c3 92 10 00 00 00 00 00 	lock *(u32 *)(r2 + 16) += r9
+// CHECK-64: c3 92 10 00 00 00 00 00 	lock *(u32 *)(r2 + 16) += w9
 // CHECK-32: c3 92 10 00 00 00 00 00 	lock *(u32 *)(r2 + 16) += w9
 // CHECK: db a3 e2 ff 00 00 00 00 	lock *(u64 *)(r3 - 30) += r10
 
diff --git a/llvm/test/MC/BPF/load-store-32.s b/llvm/test/MC/BPF/load-store-32.s
index 826b13b1a48cc..996d696e91a0c 100644
--- a/llvm/test/MC/BPF/load-store-32.s
+++ b/llvm/test/MC/BPF/load-store-32.s
@@ -1,6 +1,6 @@
 # RUN: llvm-mc -triple bpfel -filetype=obj -o %t %s
 # RUN: llvm-objdump --no-print-imm-hex --mattr=+alu32 -d -r %t | FileCheck --check-prefix=CHECK-32 %s
-# RUN: llvm-objdump --no-print-imm-hex -d -r %t | FileCheck %s
+# RUN: llvm-objdump --no-print-imm-hex --mcpu=v1 -d -r %t | FileCheck %s
 
 // ======== BPF_LDX Class ========
   w5 = *(u8 *)(r0 + 0)   // BPF_LDX | BPF_B

@yonghong-song requested review from 4ast and eddyz87 on August 6, 2024 at 15:38
yonghong-song (Contributor, Author)

cc @jemarch

4ast (Member) left a comment

lgtm

yonghong-song (Contributor, Author)

@4ast PPC uses a 'future' cpu, so they do not need to update tryGetCPUName(). Do we need to add a 'latest' cpu flavor to avoid updating tryGetCPUName()? I am not 100% sure about this, since we may update tryGetCPUName() very infrequently as we do not bump the cpu version very often. WDYT?

// CHECK-64: 71 05 00 00 00 00 00 00 r5 = *(u8 *)(r0 + 0)
// CHECK-64: 69 16 08 00 00 00 00 00 r6 = *(u16 *)(r1 + 8)
// CHECK-64: 61 27 10 00 00 00 00 00 r7 = *(u32 *)(r2 + 16)
// CHECK-64: 71 05 00 00 00 00 00 00 w5 = *(u8 *)(r0 + 0)
Contributor


Orthogonal to this change, but I find this disassembly difference between CPU versions quite annoying. It seems better to avoid multiple textual representations of the same instruction encoding.

eddyz87 (Contributor) left a comment

Otherwise the change looks good, but I agree that having "latest" would be a tad nicer.

@yonghong-song merged commit 0395868 into llvm:main on Aug 7, 2024 (10 checks passed)
peilin-ye added a commit to peilin-ye/llvm-project that referenced this pull request Sep 13, 2024
As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v5.

A "load_acquire" is a BPF_LDX instruction with a new mode modifier,
BPF_MEMACQ ("acquiring atomic load").  Similarly, a "store_release" is a
BPF_STX instruction with another new mode modifier, BPF_MEMREL
("releasing atomic store").

BPF_MEMACQ and BPF_MEMREL share the same numeric value, 0x7 (or 0b111).
For example:

  long foo(long *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  f9 10 00 00 00 00 00 00 r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00 exit

Opcode 0xf9, or 0b11111001, can be decoded as:

  0b 111        11     001
     BPF_MEMACQ BPF_DW BPF_LDX

Similarly:

  void bar(short *ptr, short val) {
      __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
  }

bar() can be compiled to:

  eb 21 00 00 00 00 00 00 store_release((u16 *)(r1 + 0x0), w2)
  95 00 00 00 00 00 00 00 exit

Opcode 0xeb, or 0b11101011, can be decoded as:

  0b 111        01    011
     BPF_MEMREL BPF_H BPF_STX

Inline assembly is also supported.  For example:

  asm volatile("%0 = load_acquire((u64 *)(%1 + 0x0))" :
               "=r"(ret) : "r"(ptr) : "memory");

Let 'llvm-objdump -d' use -mcpu=v5 by default, just like commit
0395868 ("[BPF] Make llvm-objdump disasm default cpu v4
(llvm#102166)").

Add two macros, __BPF_FEATURE_LOAD_ACQUIRE and
__BPF_FEATURE_STORE_RELEASE, to let developers detect these new features
in source code.  They can also be disabled using two new llc options,
-disable-load-acquire and -disable-store-release, respectively.

[1] https://lore.kernel.org/all/[email protected]/
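For reference, the mode/size/class breakdowns above follow the standard BPF opcode layout for the load/store classes: the instruction class sits in the low 3 bits, the size in bits 3-4, and the mode modifier in the top 3 bits. Below is a minimal, self-contained C sketch that reproduces the two decodings; the 0x7 mode value for the proposed BPF_MEMACQ/BPF_MEMREL comes from the commit message above (the BPF_MEMACQREL name is chosen here only for illustration), while the other constants are the existing BPF encodings:

  #include <stdio.h>

  /* BPF opcode field extraction (mirrors linux/bpf_common.h):
   * class = low 3 bits, size = bits 3-4, mode = top 3 bits. */
  #define BPF_CLASS(code) ((code) & 0x07)
  #define BPF_SIZE(code)  ((code) & 0x18)
  #define BPF_MODE(code)  ((code) & 0xe0)

  /* Proposed in the series above: BPF_MEMACQ / BPF_MEMREL share mode 0x7 (0b111). */
  #define BPF_MEMACQREL (0x7 << 5)   /* 0xe0 */

  static const char *class_name(unsigned op) {
      switch (BPF_CLASS(op)) {
      case 0x01: return "BPF_LDX";
      case 0x03: return "BPF_STX";
      default:   return "other";
      }
  }

  static const char *size_name(unsigned op) {
      switch (BPF_SIZE(op)) {
      case 0x00: return "BPF_W";
      case 0x08: return "BPF_H";
      case 0x10: return "BPF_B";
      default:   return "BPF_DW";   /* 0x18 */
      }
  }

  int main(void) {
      unsigned ops[] = { 0xf9, 0xeb };   /* load_acquire and store_release opcodes above */
      for (int i = 0; i < 2; i++) {
          unsigned op = ops[i];
          printf("0x%02x: mode=%s size=%s class=%s\n", op,
                 BPF_MODE(op) == BPF_MEMACQREL ? "BPF_MEMACQ/BPF_MEMREL" : "other",
                 size_name(op), class_name(op));
      }
      return 0;   /* prints BPF_DW/BPF_LDX for 0xf9 and BPF_H/BPF_STX for 0xeb */
  }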
peilin-ye added a commit to peilin-ye/llvm-project that referenced this pull request Sep 16, 2024
peilin-ye added a commit to peilin-ye/llvm-project that referenced this pull request Sep 20, 2024
As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v5.

A "load_acquire" is a BPF_LDX instruction with a new mode modifier,
BPF_MEMACQ ("acquiring atomic load").  Similarly, a "store_release" is a
BPF_STX instruction with another new mode modifier, BPF_MEMREL
("releasing atomic store").

BPF_MEMACQ and BPF_MEMREL share the same numeric value, 0x7 (or 0b111).
For example:

  long foo(long *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  f9 10 00 00 00 00 00 00 r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00 exit

Opcode 0xf9, or 0b11111001, can be decoded as:

  0b 111        11     001
     BPF_MEMACQ BPF_DW BPF_LDX

Similarly:

  void bar(short *ptr, short val) {
      __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
  }

bar() can be compiled to:

  eb 21 00 00 00 00 00 00 store_release((u16 *)(r1 + 0x0), w2)
  95 00 00 00 00 00 00 00 exit

Opcode 0xeb, or 0b11101011, can be decoded as:

  0b 111        01    011
     BPF_MEMREL BPF_H BPF_STX

Inline assembly is also supported.  For example:

  asm volatile("%0 = load_acquire((u64 *)(%1 + 0x0))" :
               "=r"(ret) : "r"(ptr) : "memory");

Let 'llvm-objdump -d' use -mcpu=v5 by default, just like commit
0395868 ("[BPF] Make llvm-objdump disasm default cpu v4
(llvm#102166)").

Add two macros, __BPF_FEATURE_LOAD_ACQUIRE and
__BPF_FEATURE_STORE_RELEASE, to let developers detect these new features
in source code.  They can also be disabled using two new llc options,
-disable-load-acquire and -disable-store-release, respectively.

Also use ACQUIRE or RELEASE if user requested weaker memory orders
(RELAXED or CONSUME) until we actually support them.  Requesting a
stronger memory order (i.e. SEQ_CST) will cause an error.

[1] https://lore.kernel.org/all/[email protected]/
peilin-ye added a commit to peilin-ye/llvm-project that referenced this pull request Sep 24, 2024
peilin-ye added a commit to peilin-ye/llvm-project that referenced this pull request Sep 30, 2024
@yonghong-song deleted the fix-llvm-objdump branch on February 8, 2025 at 06:14