[BPF] Make llvm-objdump disasm default cpu v4 #102166
Conversation
Currently, with the following example,

$ cat t.c
void foo(int a, _Atomic int *b)
{
  *b &= a;
}
$ clang --target=bpf -O2 -c -mcpu=v3 t.c
$ llvm-objdump -d t.o

t.o:    file format elf64-bpf

Disassembly of section .text:

0000000000000000 <foo>:
       0:       c3 12 00 00 51 00 00 00        <unknown>
       1:       95 00 00 00 00 00 00 00        exit

Basically, the default cpu for llvm-objdump is v1 and it cannot decode the insn properly. If we add --mcpu=v3 to the llvm-objdump command line, we get

$ llvm-objdump -d --mcpu=v3 t.o

t.o:    file format elf64-bpf

Disassembly of section .text:

0000000000000000 <foo>:
       0:       c3 12 00 00 51 00 00 00        w1 = atomic_fetch_and((u32 *)(r2 + 0x0), w1)
       1:       95 00 00 00 00 00 00 00        exit

and the atomic_fetch_and insn is decoded properly. The latest cpu version, --mcpu=v4, decodes it just as well as --mcpu=v3 above.

To avoid the '<unknown>' decoding with a plain 'llvm-objdump -d t.o', this patch makes v4, the current highest cpu version, the default cpu for llvm-objdump in ELFObjectFileBase::tryGetCPUName(). The cpu version in ELFObjectFileBase::tryGetCPUName() will need to be bumped whenever a newer cpu version (e.g. v5) is introduced. This approach also aligns with gcc-bpf, as discussed in [1].

Six bpf unit tests are affected by this change. I changed the expected output for three of them and added --mcpu=v1 to the other three, to exercise the default (cpu v4) behavior and the explicit --mcpu=v1 behavior, respectively.

[1] https://lore.kernel.org/bpf/6f32c0a1-9de2-4145-92ea-be025362182f@linux.dev/T/#m0f7e63c390bc8f5a5523e7f2f0537becd4205200
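For readers who want to see why cpu v1 cannot print this instruction, the 8-byte encoding can be split by hand. Below is a minimal standalone sketch, not LLVM's disassembler; the constant values follow the public BPF instruction-set encoding and the names are spelled out purely for illustration, assuming the little-endian (bpfel) layout of the object file above.

```cpp
// Standalone sketch only -- this is not LLVM's disassembler. The field masks
// follow the public BPF instruction-set encoding; little-endian (bpfel)
// layout is assumed, matching the object file above.
#include <cstdint>
#include <cstdio>

int main() {
  const uint8_t insn[8] = {0xc3, 0x12, 0x00, 0x00, 0x51, 0x00, 0x00, 0x00};

  unsigned cls  = insn[0] & 0x07;  // 0x03 = BPF_STX (store-from-register class)
  unsigned size = insn[0] & 0x18;  // 0x00 = BPF_W   (32-bit access)
  unsigned mode = insn[0] & 0xe0;  // 0xc0 = BPF_ATOMIC mode modifier
  unsigned dst  = insn[1] & 0x0f;  // 2 -> r2 holds the pointer
  unsigned src  = insn[1] >> 4;    // 1 -> w1 is the operand (and fetch result)
  unsigned imm  = (unsigned)insn[4] | ((unsigned)insn[5] << 8) |
                  ((unsigned)insn[6] << 16) | ((unsigned)insn[7] << 24);
  // imm = 0x51 = BPF_AND (0x50) | BPF_FETCH (0x01) -> atomic_fetch_and

  printf("class=0x%02x size=0x%02x mode=0x%02x dst=r%u src=r%u imm=0x%x\n",
         cls, size, mode, dst, src, imm);
  // The v1 decoder only knows the legacy lock-add (XADD) form of this opcode,
  // so the BPF_AND|BPF_FETCH immediate comes out as <unknown>; with
  // --mcpu=v3 or v4 it is printed as
  //   w1 = atomic_fetch_and((u32 *)(r2 + 0x0), w1)
  return 0;
}
```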
@llvm/pr-subscribers-mc @llvm/pr-subscribers-llvm-binary-utilities

Author: None (yonghong-song)

Full diff: https://github.com/llvm/llvm-project/pull/102166.diff

7 Files Affected:
diff --git a/llvm/lib/Object/ELFObjectFile.cpp b/llvm/lib/Object/ELFObjectFile.cpp
index 53c3de06d118c..f79c233d93fe8 100644
--- a/llvm/lib/Object/ELFObjectFile.cpp
+++ b/llvm/lib/Object/ELFObjectFile.cpp
@@ -441,6 +441,8 @@ std::optional<StringRef> ELFObjectFileBase::tryGetCPUName() const {
case ELF::EM_PPC:
case ELF::EM_PPC64:
return StringRef("future");
+ case ELF::EM_BPF:
+ return StringRef("v4");
default:
return std::nullopt;
}
diff --git a/llvm/test/CodeGen/BPF/objdump_atomics.ll b/llvm/test/CodeGen/BPF/objdump_atomics.ll
index 3ec364f7368b5..c4cb16b2c3641 100644
--- a/llvm/test/CodeGen/BPF/objdump_atomics.ll
+++ b/llvm/test/CodeGen/BPF/objdump_atomics.ll
@@ -2,7 +2,7 @@
; CHECK-LABEL: test_load_add_32
; CHECK: c3 21
-; CHECK: r2 = atomic_fetch_add((u32 *)(r1 + 0), r2)
+; CHECK: w2 = atomic_fetch_add((u32 *)(r1 + 0), w2)
define void @test_load_add_32(ptr %p, i32 zeroext %v) {
entry:
atomicrmw add ptr %p, i32 %v seq_cst
diff --git a/llvm/test/CodeGen/BPF/objdump_cond_op.ll b/llvm/test/CodeGen/BPF/objdump_cond_op.ll
index 3b2e6c1922fc4..c64a0f2f29382 100644
--- a/llvm/test/CodeGen/BPF/objdump_cond_op.ll
+++ b/llvm/test/CodeGen/BPF/objdump_cond_op.ll
@@ -1,4 +1,4 @@
-; RUN: llc -mtriple=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex -d - | FileCheck %s
+; RUN: llc -mtriple=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex --mcpu=v1 -d - | FileCheck %s
; Source Code:
; int gbl;
diff --git a/llvm/test/CodeGen/BPF/objdump_imm_hex.ll b/llvm/test/CodeGen/BPF/objdump_imm_hex.ll
index 1760bb6b6c521..38b93e8a39b55 100644
--- a/llvm/test/CodeGen/BPF/objdump_imm_hex.ll
+++ b/llvm/test/CodeGen/BPF/objdump_imm_hex.ll
@@ -53,8 +53,8 @@ define i32 @test(i64, i64) local_unnamed_addr #0 {
%14 = phi i32 [ %12, %10 ], [ %7, %4 ]
%15 = phi i32 [ 2, %10 ], [ 1, %4 ]
store i32 %14, ptr @gbl, align 4
-; CHECK-DEC: 63 12 00 00 00 00 00 00 *(u32 *)(r2 + 0) = r1
-; CHECK-HEX: 63 12 00 00 00 00 00 00 *(u32 *)(r2 + 0x0) = r1
+; CHECK-DEC: 63 12 00 00 00 00 00 00 *(u32 *)(r2 + 0) = w1
+; CHECK-HEX: 63 12 00 00 00 00 00 00 *(u32 *)(r2 + 0x0) = w1
br label %16
; <label>:16: ; preds = %13, %8
diff --git a/llvm/test/CodeGen/BPF/objdump_static_var.ll b/llvm/test/CodeGen/BPF/objdump_static_var.ll
index a91074ebddd46..b743d82fe5e3d 100644
--- a/llvm/test/CodeGen/BPF/objdump_static_var.ll
+++ b/llvm/test/CodeGen/BPF/objdump_static_var.ll
@@ -1,5 +1,5 @@
-; RUN: llc -mtriple=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex -d - | FileCheck --check-prefix=CHECK %s
-; RUN: llc -mtriple=bpfeb -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex -d - | FileCheck --check-prefix=CHECK %s
+; RUN: llc -mtriple=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex --mcpu=v1 -d - | FileCheck --check-prefix=CHECK %s
+; RUN: llc -mtriple=bpfeb -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex --mcpu=v1 -d - | FileCheck --check-prefix=CHECK %s
; src:
; static volatile long a = 2;
diff --git a/llvm/test/MC/BPF/insn-unit.s b/llvm/test/MC/BPF/insn-unit.s
index 84735d196030d..e0a4864837798 100644
--- a/llvm/test/MC/BPF/insn-unit.s
+++ b/llvm/test/MC/BPF/insn-unit.s
@@ -34,9 +34,9 @@
r6 = *(u16 *)(r1 + 8) // BPF_LDX | BPF_H
r7 = *(u32 *)(r2 + 16) // BPF_LDX | BPF_W
r8 = *(u64 *)(r3 - 30) // BPF_LDX | BPF_DW
-// CHECK-64: 71 05 00 00 00 00 00 00 r5 = *(u8 *)(r0 + 0)
-// CHECK-64: 69 16 08 00 00 00 00 00 r6 = *(u16 *)(r1 + 8)
-// CHECK-64: 61 27 10 00 00 00 00 00 r7 = *(u32 *)(r2 + 16)
+// CHECK-64: 71 05 00 00 00 00 00 00 w5 = *(u8 *)(r0 + 0)
+// CHECK-64: 69 16 08 00 00 00 00 00 w6 = *(u16 *)(r1 + 8)
+// CHECK-64: 61 27 10 00 00 00 00 00 w7 = *(u32 *)(r2 + 16)
// CHECK-32: 71 05 00 00 00 00 00 00 w5 = *(u8 *)(r0 + 0)
// CHECK-32: 69 16 08 00 00 00 00 00 w6 = *(u16 *)(r1 + 8)
// CHECK-32: 61 27 10 00 00 00 00 00 w7 = *(u32 *)(r2 + 16)
@@ -47,9 +47,9 @@
*(u16 *)(r1 + 8) = r8 // BPF_STX | BPF_H
*(u32 *)(r2 + 16) = r9 // BPF_STX | BPF_W
*(u64 *)(r3 - 30) = r10 // BPF_STX | BPF_DW
-// CHECK-64: 73 70 00 00 00 00 00 00 *(u8 *)(r0 + 0) = r7
-// CHECK-64: 6b 81 08 00 00 00 00 00 *(u16 *)(r1 + 8) = r8
-// CHECK-64: 63 92 10 00 00 00 00 00 *(u32 *)(r2 + 16) = r9
+// CHECK-64: 73 70 00 00 00 00 00 00 *(u8 *)(r0 + 0) = w7
+// CHECK-64: 6b 81 08 00 00 00 00 00 *(u16 *)(r1 + 8) = w8
+// CHECK-64: 63 92 10 00 00 00 00 00 *(u32 *)(r2 + 16) = w9
// CHECK-32: 73 70 00 00 00 00 00 00 *(u8 *)(r0 + 0) = w7
// CHECK-32: 6b 81 08 00 00 00 00 00 *(u16 *)(r1 + 8) = w8
// CHECK-32: 63 92 10 00 00 00 00 00 *(u32 *)(r2 + 16) = w9
@@ -57,7 +57,7 @@
lock *(u32 *)(r2 + 16) += r9 // BPF_STX | BPF_W | BPF_XADD
lock *(u64 *)(r3 - 30) += r10 // BPF_STX | BPF_DW | BPF_XADD
-// CHECK-64: c3 92 10 00 00 00 00 00 lock *(u32 *)(r2 + 16) += r9
+// CHECK-64: c3 92 10 00 00 00 00 00 lock *(u32 *)(r2 + 16) += w9
// CHECK-32: c3 92 10 00 00 00 00 00 lock *(u32 *)(r2 + 16) += w9
// CHECK: db a3 e2 ff 00 00 00 00 lock *(u64 *)(r3 - 30) += r10
diff --git a/llvm/test/MC/BPF/load-store-32.s b/llvm/test/MC/BPF/load-store-32.s
index 826b13b1a48cc..996d696e91a0c 100644
--- a/llvm/test/MC/BPF/load-store-32.s
+++ b/llvm/test/MC/BPF/load-store-32.s
@@ -1,6 +1,6 @@
# RUN: llvm-mc -triple bpfel -filetype=obj -o %t %s
# RUN: llvm-objdump --no-print-imm-hex --mattr=+alu32 -d -r %t | FileCheck --check-prefix=CHECK-32 %s
-# RUN: llvm-objdump --no-print-imm-hex -d -r %t | FileCheck %s
+# RUN: llvm-objdump --no-print-imm-hex --mcpu=v1 -d -r %t | FileCheck %s
// ======== BPF_LDX Class ========
w5 = *(u8 *)(r0 + 0) // BPF_LDX | BPF_B
cc @jemarch
lgtm
@4ast PPC used a cpu 'future' so they do not need to update tryGetCPUName(). Do we need to add a 'latest' cpu flavor to avoid updating tryGetCPUName()? I am not 100% sure about this, since we may update tryGetCPUName() very infrequently as we do not bump the cpu version very often. WDYT?
// CHECK-64: 71 05 00 00 00 00 00 00 r5 = *(u8 *)(r0 + 0)
// CHECK-64: 69 16 08 00 00 00 00 00 r6 = *(u16 *)(r1 + 8)
// CHECK-64: 61 27 10 00 00 00 00 00 r7 = *(u32 *)(r2 + 16)
// CHECK-64: 71 05 00 00 00 00 00 00 w5 = *(u8 *)(r0 + 0)
Orthogonal to this change, but I find this disassembly difference between CPU versions quite annoying. It seems that it is better to avoid multiple textual representations for the same instruction encoding.
Otherwise the change looks good, but I agree that having "latest" would be a tad nicer.
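Purely to illustrate the 'latest' idea discussed above, a hypothetical shape of the change could look like the sketch below. Nothing like this exists in the patch; the function name and the alias are made up here, and a real implementation would need the BPF backend to resolve the alias the way PPC resolves 'future'.

```cpp
// Hypothetical sketch only -- no "latest" cpu alias exists in this patch.
// It mirrors the shape of the ELFObjectFileBase::tryGetCPUName() hunk above
// without using any LLVM headers.
#include <cstdint>
#include <optional>
#include <string_view>

constexpr uint16_t EM_BPF = 247; // ELF e_machine value for BPF

std::optional<std::string_view> defaultDisasmCPU(uint16_t EMachine) {
  switch (EMachine) {
  case EM_BPF:
    // Would be resolved by the BPF backend to its newest cpu definition,
    // so this function never needs another bump when a new version lands.
    return "latest";
  default:
    return std::nullopt;
  }
}
```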
As discussed in [1], introduce BPF instructions with load-acquire and store-release semantics under -mcpu=v5. A "load_acquire" is a BPF_LDX instruction with a new mode modifier, BPF_MEMACQ ("acquiring atomic load"). Similarly, a "store_release" is a BPF_STX instruction with another new mode modifier, BPF_MEMREL ("releasing atomic store"). BPF_MEMACQ and BPF_MEMREL share the same numeric value, 0x7 (or 0b111).

For example:

long foo(long *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}

foo() can be compiled to:

f9 10 00 00 00 00 00 00 r0 = load_acquire((u64 *)(r1 + 0x0))
95 00 00 00 00 00 00 00 exit

Opcode 0xf9, or 0b11111001, can be decoded as:

0b 111 11 001
   BPF_MEMACQ BPF_DW BPF_LDX

Similarly:

void bar(short *ptr, short val) {
    __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
}

bar() can be compiled to:

eb 21 00 00 00 00 00 00 store_release((u16 *)(r1 + 0x0), w2)
95 00 00 00 00 00 00 00 exit

Opcode 0xeb, or 0b11101011, can be decoded as:

0b 111 01 011
   BPF_MEMREL BPF_H BPF_STX

Inline assembly is also supported. For example:

asm volatile("%0 = load_acquire((u64 *)(%1 + 0x0))" :
             "=r"(ret) : "r"(ptr) : "memory");

Let 'llvm-objdump -d' use -mcpu=v5 by default, just like commit 0395868 ("[BPF] Make llvm-objdump disasm default cpu v4 (llvm#102166)").

Add two macros, __BPF_FEATURE_LOAD_ACQUIRE and __BPF_FEATURE_STORE_RELEASE, to let developers detect these new features in source code. They can also be disabled using two new llc options, -disable-load-acquire and -disable-store-release, respectively.

Also use ACQUIRE or RELEASE if user requested weaker memory orders (RELAXED or CONSUME) until we actually support them. Requesting a stronger memory order (i.e. SEQ_CST) will cause an error.

[1] https://lore.kernel.org/all/[email protected]/
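As a quick arithmetic check of the two opcode decompositions quoted above, the sketch below recomputes 0xf9 and 0xeb from the mode/size/class fields. It is standalone illustration code; the BPF_MEMACQ/BPF_MEMREL value (0x7 in the top three bits) is taken from the commit message, not from any LLVM header.

```cpp
// Standalone check of the opcode bytes described in the commit message.
#include <cassert>
#include <cstdint>
#include <cstdio>

int main() {
  const uint8_t BPF_LDX = 0x01, BPF_STX = 0x03; // instruction class (bits 0-2)
  const uint8_t BPF_H = 0x08, BPF_DW = 0x18;    // size field (bits 3-4)
  const uint8_t BPF_MEMACQ = 0x7 << 5;          // new mode modifier, 0xe0
  const uint8_t BPF_MEMREL = 0x7 << 5;          // same numeric value, 0xe0

  uint8_t load_acquire_dw = BPF_MEMACQ | BPF_DW | BPF_LDX;
  uint8_t store_release_h = BPF_MEMREL | BPF_H | BPF_STX;

  assert(load_acquire_dw == 0xf9); // f9 10 ... r0 = load_acquire((u64 *)(r1 + 0x0))
  assert(store_release_h == 0xeb); // eb 21 ... store_release((u16 *)(r1 + 0x0), w2)
  printf("0x%02x 0x%02x\n", load_acquire_dw, store_release_h);
  return 0;
}
```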