
[BPF] Make llvm-objdump disasm default cpu v4 #102166

Merged
merged 1 commit into llvm:main on Aug 7, 2024

Conversation

yonghong-song (Contributor)

Currently, with the following example,

  $ cat t.c
  void foo(int a, _Atomic int *b)
  {
    *b &= a;
  }
  $ clang --target=bpf -O2 -c -mcpu=v3 t.c
  $ llvm-objdump -d t.o
  t.o:    file format elf64-bpf

  Disassembly of section .text:

  0000000000000000 <foo>:
       0:       c3 12 00 00 51 00 00 00 <unknown>
       1:       95 00 00 00 00 00 00 00 exit

Basically, the default cpu for llvm-objdump is v1, which cannot decode this insn properly.

If we add --mcpu=v3 to the llvm-objdump command line, we get

  $ llvm-objdump -d --mcpu=v3 t.o

  t.o:    file format elf64-bpf

  Disassembly of section .text:

  0000000000000000 <foo>:
       0:       c3 12 00 00 51 00 00 00 w1 = atomic_fetch_and((u32 *)(r2 + 0x0), w1)
       1:       95 00 00 00 00 00 00 00 exit

Now the atomic_fetch_and insn is decoded properly. Using the latest cpu version, --mcpu=v4, decodes it properly as well, just like --mcpu=v3 above.

To avoid the '<unknown>' decoding above with a plain 'llvm-objdump -d t.o', this patch sets the default cpu for llvm-objdump to the current highest cpu version, v4, in ELFObjectFileBase::tryGetCPUName(). The cpu version in ELFObjectFileBase::tryGetCPUName() will need to be adjusted in the future whenever the cpu version is bumped, e.g. to v5. This approach also aligns with gcc-bpf, as discussed in [1].

Six BPF unit tests are affected by this change. I changed the expected output for three of them and added --mcpu=v1 to the other three, to cover both the default (cpu v4) behavior and the explicit --mcpu=v1 behavior.

[1] https://lore.kernel.org/bpf/6f32c0a1-9de2-4145-92ea-be025362182f@linux.dev/T/#m0f7e63c390bc8f5a5523e7f2f0537becd4205200
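With this patch applied, the plain invocation with no --mcpu flag should produce the same decoding as the explicit --mcpu=v3/v4 runs above. For illustration, the expected output for the same t.o (not a captured run):

  $ llvm-objdump -d t.o

  t.o:    file format elf64-bpf

  Disassembly of section .text:

  0000000000000000 <foo>:
       0:       c3 12 00 00 51 00 00 00 w1 = atomic_fetch_and((u32 *)(r2 + 0x0), w1)
       1:       95 00 00 00 00 00 00 00 exit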

@llvmbot added the mc (Machine (object) code) and llvm:binary-utilities labels on Aug 6, 2024
@yonghong-song requested a review from MaskRay on August 6, 2024 at 15:38
llvmbot (Member) commented on Aug 6, 2024

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-llvm-binary-utilities

Author: None (yonghong-song)



Full diff: https://github.com/llvm/llvm-project/pull/102166.diff

7 Files Affected:

  • (modified) llvm/lib/Object/ELFObjectFile.cpp (+2)
  • (modified) llvm/test/CodeGen/BPF/objdump_atomics.ll (+1-1)
  • (modified) llvm/test/CodeGen/BPF/objdump_cond_op.ll (+1-1)
  • (modified) llvm/test/CodeGen/BPF/objdump_imm_hex.ll (+2-2)
  • (modified) llvm/test/CodeGen/BPF/objdump_static_var.ll (+2-2)
  • (modified) llvm/test/MC/BPF/insn-unit.s (+7-7)
  • (modified) llvm/test/MC/BPF/load-store-32.s (+1-1)
diff --git a/llvm/lib/Object/ELFObjectFile.cpp b/llvm/lib/Object/ELFObjectFile.cpp
index 53c3de06d118c..f79c233d93fe8 100644
--- a/llvm/lib/Object/ELFObjectFile.cpp
+++ b/llvm/lib/Object/ELFObjectFile.cpp
@@ -441,6 +441,8 @@ std::optional<StringRef> ELFObjectFileBase::tryGetCPUName() const {
   case ELF::EM_PPC:
   case ELF::EM_PPC64:
     return StringRef("future");
+  case ELF::EM_BPF:
+    return StringRef("v4");
   default:
     return std::nullopt;
   }
diff --git a/llvm/test/CodeGen/BPF/objdump_atomics.ll b/llvm/test/CodeGen/BPF/objdump_atomics.ll
index 3ec364f7368b5..c4cb16b2c3641 100644
--- a/llvm/test/CodeGen/BPF/objdump_atomics.ll
+++ b/llvm/test/CodeGen/BPF/objdump_atomics.ll
@@ -2,7 +2,7 @@
 
 ; CHECK-LABEL: test_load_add_32
 ; CHECK: c3 21
-; CHECK: r2 = atomic_fetch_add((u32 *)(r1 + 0), r2)
+; CHECK: w2 = atomic_fetch_add((u32 *)(r1 + 0), w2)
 define void @test_load_add_32(ptr %p, i32 zeroext %v) {
 entry:
   atomicrmw add ptr %p, i32 %v seq_cst
diff --git a/llvm/test/CodeGen/BPF/objdump_cond_op.ll b/llvm/test/CodeGen/BPF/objdump_cond_op.ll
index 3b2e6c1922fc4..c64a0f2f29382 100644
--- a/llvm/test/CodeGen/BPF/objdump_cond_op.ll
+++ b/llvm/test/CodeGen/BPF/objdump_cond_op.ll
@@ -1,4 +1,4 @@
-; RUN: llc -mtriple=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex -d - | FileCheck %s
+; RUN: llc -mtriple=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex --mcpu=v1 -d - | FileCheck %s
 
 ; Source Code:
 ; int gbl;
diff --git a/llvm/test/CodeGen/BPF/objdump_imm_hex.ll b/llvm/test/CodeGen/BPF/objdump_imm_hex.ll
index 1760bb6b6c521..38b93e8a39b55 100644
--- a/llvm/test/CodeGen/BPF/objdump_imm_hex.ll
+++ b/llvm/test/CodeGen/BPF/objdump_imm_hex.ll
@@ -53,8 +53,8 @@ define i32 @test(i64, i64) local_unnamed_addr #0 {
   %14 = phi i32 [ %12, %10 ], [ %7, %4 ]
   %15 = phi i32 [ 2, %10 ], [ 1, %4 ]
   store i32 %14, ptr @gbl, align 4
-; CHECK-DEC: 63 12 00 00 00 00 00 00         *(u32 *)(r2 + 0) = r1
-; CHECK-HEX: 63 12 00 00 00 00 00 00         *(u32 *)(r2 + 0x0) = r1
+; CHECK-DEC: 63 12 00 00 00 00 00 00         *(u32 *)(r2 + 0) = w1
+; CHECK-HEX: 63 12 00 00 00 00 00 00         *(u32 *)(r2 + 0x0) = w1
   br label %16
 
 ; <label>:16:                                     ; preds = %13, %8
diff --git a/llvm/test/CodeGen/BPF/objdump_static_var.ll b/llvm/test/CodeGen/BPF/objdump_static_var.ll
index a91074ebddd46..b743d82fe5e3d 100644
--- a/llvm/test/CodeGen/BPF/objdump_static_var.ll
+++ b/llvm/test/CodeGen/BPF/objdump_static_var.ll
@@ -1,5 +1,5 @@
-; RUN: llc -mtriple=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex -d - | FileCheck --check-prefix=CHECK %s
-; RUN: llc -mtriple=bpfeb -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex -d - | FileCheck --check-prefix=CHECK %s
+; RUN: llc -mtriple=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex --mcpu=v1 -d - | FileCheck --check-prefix=CHECK %s
+; RUN: llc -mtriple=bpfeb -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex --mcpu=v1 -d - | FileCheck --check-prefix=CHECK %s
 
 ; src:
 ;   static volatile long a = 2;
diff --git a/llvm/test/MC/BPF/insn-unit.s b/llvm/test/MC/BPF/insn-unit.s
index 84735d196030d..e0a4864837798 100644
--- a/llvm/test/MC/BPF/insn-unit.s
+++ b/llvm/test/MC/BPF/insn-unit.s
@@ -34,9 +34,9 @@
   r6 = *(u16 *)(r1 + 8)  // BPF_LDX | BPF_H
   r7 = *(u32 *)(r2 + 16) // BPF_LDX | BPF_W
   r8 = *(u64 *)(r3 - 30) // BPF_LDX | BPF_DW
-// CHECK-64: 71 05 00 00 00 00 00 00 	r5 = *(u8 *)(r0 + 0)
-// CHECK-64: 69 16 08 00 00 00 00 00 	r6 = *(u16 *)(r1 + 8)
-// CHECK-64: 61 27 10 00 00 00 00 00 	r7 = *(u32 *)(r2 + 16)
+// CHECK-64: 71 05 00 00 00 00 00 00 	w5 = *(u8 *)(r0 + 0)
+// CHECK-64: 69 16 08 00 00 00 00 00 	w6 = *(u16 *)(r1 + 8)
+// CHECK-64: 61 27 10 00 00 00 00 00 	w7 = *(u32 *)(r2 + 16)
 // CHECK-32: 71 05 00 00 00 00 00 00 	w5 = *(u8 *)(r0 + 0)
 // CHECK-32: 69 16 08 00 00 00 00 00 	w6 = *(u16 *)(r1 + 8)
 // CHECK-32: 61 27 10 00 00 00 00 00 	w7 = *(u32 *)(r2 + 16)
@@ -47,9 +47,9 @@
   *(u16 *)(r1 + 8) = r8   // BPF_STX | BPF_H
   *(u32 *)(r2 + 16) = r9  // BPF_STX | BPF_W
   *(u64 *)(r3 - 30) = r10 // BPF_STX | BPF_DW
-// CHECK-64: 73 70 00 00 00 00 00 00 	*(u8 *)(r0 + 0) = r7
-// CHECK-64: 6b 81 08 00 00 00 00 00 	*(u16 *)(r1 + 8) = r8
-// CHECK-64: 63 92 10 00 00 00 00 00 	*(u32 *)(r2 + 16) = r9
+// CHECK-64: 73 70 00 00 00 00 00 00 	*(u8 *)(r0 + 0) = w7
+// CHECK-64: 6b 81 08 00 00 00 00 00 	*(u16 *)(r1 + 8) = w8
+// CHECK-64: 63 92 10 00 00 00 00 00 	*(u32 *)(r2 + 16) = w9
 // CHECK-32: 73 70 00 00 00 00 00 00 	*(u8 *)(r0 + 0) = w7
 // CHECK-32: 6b 81 08 00 00 00 00 00 	*(u16 *)(r1 + 8) = w8
 // CHECK-32: 63 92 10 00 00 00 00 00 	*(u32 *)(r2 + 16) = w9
@@ -57,7 +57,7 @@
 
   lock *(u32 *)(r2 + 16) += r9  // BPF_STX | BPF_W | BPF_XADD
   lock *(u64 *)(r3 - 30) += r10 // BPF_STX | BPF_DW | BPF_XADD
-// CHECK-64: c3 92 10 00 00 00 00 00 	lock *(u32 *)(r2 + 16) += r9
+// CHECK-64: c3 92 10 00 00 00 00 00 	lock *(u32 *)(r2 + 16) += w9
 // CHECK-32: c3 92 10 00 00 00 00 00 	lock *(u32 *)(r2 + 16) += w9
 // CHECK: db a3 e2 ff 00 00 00 00 	lock *(u64 *)(r3 - 30) += r10
 
diff --git a/llvm/test/MC/BPF/load-store-32.s b/llvm/test/MC/BPF/load-store-32.s
index 826b13b1a48cc..996d696e91a0c 100644
--- a/llvm/test/MC/BPF/load-store-32.s
+++ b/llvm/test/MC/BPF/load-store-32.s
@@ -1,6 +1,6 @@
 # RUN: llvm-mc -triple bpfel -filetype=obj -o %t %s
 # RUN: llvm-objdump --no-print-imm-hex --mattr=+alu32 -d -r %t | FileCheck --check-prefix=CHECK-32 %s
-# RUN: llvm-objdump --no-print-imm-hex -d -r %t | FileCheck %s
+# RUN: llvm-objdump --no-print-imm-hex --mcpu=v1 -d -r %t | FileCheck %s
 
 // ======== BPF_LDX Class ========
   w5 = *(u8 *)(r0 + 0)   // BPF_LDX | BPF_B

@yonghong-song requested review from 4ast and eddyz87 on August 6, 2024 at 15:38
yonghong-song (Contributor, Author)

cc @jemarch

4ast (Member) left a comment

lgtm

yonghong-song (Contributor, Author)

@4ast PPC uses a 'future' cpu, so they do not need to update tryGetCPUName(). Do we need to add a 'latest' cpu flavor to avoid updating tryGetCPUName()? I am not 100% sure about this, since we may update tryGetCPUName() very infrequently as we do not bump the cpu version very often. WDYT?

// CHECK-64: 71 05 00 00 00 00 00 00 r5 = *(u8 *)(r0 + 0)
// CHECK-64: 69 16 08 00 00 00 00 00 r6 = *(u16 *)(r1 + 8)
// CHECK-64: 61 27 10 00 00 00 00 00 r7 = *(u32 *)(r2 + 16)
// CHECK-64: 71 05 00 00 00 00 00 00 w5 = *(u8 *)(r0 + 0)
Contributor


Orthogonal to this change, but I find this disassembly difference between CPU versions quite annoying. It seems better to avoid multiple textual representations of the same instruction encoding.

eddyz87 (Contributor) left a comment

Otherwise the change looks good, but I agree that having "latest" would be a tad nicer.

@yonghong-song merged commit 0395868 into llvm:main on Aug 7, 2024 (10 checks passed)
peilin-ye added a commit to peilin-ye/llvm-project that referenced this pull request Sep 13, 2024
As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v5.

A "load_acquire" is a BPF_LDX instruction with a new mode modifier,
BPF_MEMACQ ("acquiring atomic load").  Similarly, a "store_release" is a
BPF_STX instruction with another new mode modifier, BPF_MEMREL
("releasing atomic store").

BPF_MEMACQ and BPF_MEMREL share the same numeric value, 0x7 (or 0b111).
For example:

  long foo(long *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  f9 10 00 00 00 00 00 00 r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00 exit

Opcode 0xf9, or 0b11111001, can be decoded as:

  0b 111        11     001
     BPF_MEMACQ BPF_DW BPF_LDX

Similarly:

  void bar(short *ptr, short val) {
      __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
  }

bar() can be compiled to:

  eb 21 00 00 00 00 00 00 store_release((u16 *)(r1 + 0x0), w2)
  95 00 00 00 00 00 00 00 exit

Opcode 0xeb, or 0b11101011, can be decoded as:

  0b 111        01    011
     BPF_MEMREL BPF_H BPF_STX

Inline assembly is also supported.  For example:

  asm volatile("%0 = load_acquire((u64 *)(%1 + 0x0))" :
               "=r"(ret) : "r"(ptr) : "memory");

Let 'llvm-objdump -d' use -mcpu=v5 by default, just like commit
0395868 ("[BPF] Make llvm-objdump disasm default cpu v4
(llvm#102166)").

Add two macros, __BPF_FEATURE_LOAD_ACQUIRE and
__BPF_FEATURE_STORE_RELEASE, to let developers detect these new features
in source code.  They can also be disabled using two new llc options,
-disable-load-acquire and -disable-store-release, respectively.

[1] https://lore.kernel.org/all/[email protected]/
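For reference, the mode/size/class breakdowns above follow the standard BPF opcode layout for the load/store classes: the instruction class sits in the low 3 bits, the size in bits 3-4, and the mode modifier in the top 3 bits. Below is a minimal, self-contained C sketch that reproduces the two decodings; the 0x7 mode value for the proposed BPF_MEMACQ/BPF_MEMREL comes from the commit message above (the BPF_MEMACQREL name is chosen here only for illustration), while the other constants are the existing BPF encodings:

  #include <stdio.h>

  /* BPF opcode field extraction (mirrors linux/bpf_common.h):
   * class = low 3 bits, size = bits 3-4, mode = top 3 bits. */
  #define BPF_CLASS(code) ((code) & 0x07)
  #define BPF_SIZE(code)  ((code) & 0x18)
  #define BPF_MODE(code)  ((code) & 0xe0)

  /* Proposed in the series above: BPF_MEMACQ / BPF_MEMREL share mode 0x7 (0b111). */
  #define BPF_MEMACQREL (0x7 << 5)   /* 0xe0 */

  static const char *class_name(unsigned op) {
      switch (BPF_CLASS(op)) {
      case 0x01: return "BPF_LDX";
      case 0x03: return "BPF_STX";
      default:   return "other";
      }
  }

  static const char *size_name(unsigned op) {
      switch (BPF_SIZE(op)) {
      case 0x00: return "BPF_W";
      case 0x08: return "BPF_H";
      case 0x10: return "BPF_B";
      default:   return "BPF_DW";   /* 0x18 */
      }
  }

  int main(void) {
      unsigned ops[] = { 0xf9, 0xeb };   /* load_acquire and store_release opcodes above */
      for (int i = 0; i < 2; i++) {
          unsigned op = ops[i];
          printf("0x%02x: mode=%s size=%s class=%s\n", op,
                 BPF_MODE(op) == BPF_MEMACQREL ? "BPF_MEMACQ/BPF_MEMREL" : "other",
                 size_name(op), class_name(op));
      }
      return 0;   /* prints BPF_DW/BPF_LDX for 0xf9 and BPF_H/BPF_STX for 0xeb */
  }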
peilin-ye added a commit to peilin-ye/llvm-project that referenced this pull request Sep 16, 2024
peilin-ye added a commit to peilin-ye/llvm-project that referenced this pull request Sep 20, 2024
As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v5.

A "load_acquire" is a BPF_LDX instruction with a new mode modifier,
BPF_MEMACQ ("acquiring atomic load").  Similarly, a "store_release" is a
BPF_STX instruction with another new mode modifier, BPF_MEMREL
("releasing atomic store").

BPF_MEMACQ and BPF_MEMREL share the same numeric value, 0x7 (or 0b111).
For example:

  long foo(long *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  f9 10 00 00 00 00 00 00 r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00 exit

Opcode 0xf9, or 0b11111001, can be decoded as:

  0b 111        11     001
     BPF_MEMACQ BPF_DW BPF_LDX

Similarly:

  void bar(short *ptr, short val) {
      __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
  }

bar() can be compiled to:

  eb 21 00 00 00 00 00 00 store_release((u16 *)(r1 + 0x0), w2)
  95 00 00 00 00 00 00 00 exit

Opcode 0xeb, or 0b11101011, can be decoded as:

  0b 111        01    011
     BPF_MEMREL BPF_H BPF_STX

Inline assembly is also supported.  For example:

  asm volatile("%0 = load_acquire((u64 *)(%1 + 0x0))" :
               "=r"(ret) : "r"(ptr) : "memory");

Let 'llvm-objdump -d' use -mcpu=v5 by default, just like commit
0395868 ("[BPF] Make llvm-objdump disasm default cpu v4
(llvm#102166)").

Add two macros, __BPF_FEATURE_LOAD_ACQUIRE and
__BPF_FEATURE_STORE_RELEASE, to let developers detect these new features
in source code.  They can also be disabled using two new llc options,
-disable-load-acquire and -disable-store-release, respectively.

Also use ACQUIRE or RELEASE if user requested weaker memory orders
(RELAXED or CONSUME) until we actually support them.  Requesting a
stronger memory order (i.e. SEQ_CST) will cause an error.

[1] https://lore.kernel.org/all/[email protected]/
peilin-ye added a commit to peilin-ye/llvm-project that referenced this pull request Sep 24, 2024
peilin-ye added a commit to peilin-ye/llvm-project that referenced this pull request Sep 30, 2024
@yonghong-song deleted the fix-llvm-objdump branch on February 8, 2025 at 06:14