Merge branch 'introduce-load-acquire-and-store-release-bpf-instructions'

Peilin Ye says:

====================
Introduce load-acquire and store-release BPF instructions

This patchset adds kernel support for BPF load-acquire and store-release
instructions (for background, please see [1]), including core/verifier
and arm64/x86-64 JIT compiler changes, as well as selftests.  riscv64 is
also planned to be supported.  The corresponding LLVM changes can be
found at:

  llvm/llvm-project#108636

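To get a feel for it: with an LLVM that carries the above PR, acquire
loads and release stores written in BPF C lower to the new
instructions.  A sketch (variable name illustrative; exact codegen
depends on the -mcpu level):

  int v = __atomic_load_n(&shared, __ATOMIC_ACQUIRE);  /* load-acquire */
  __atomic_store_n(&shared, v + 1, __ATOMIC_RELEASE);  /* store-release */
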
The first 3 patches from v4 have already been applied:

  - [bpf-next,v4,01/10] bpf/verifier: Factor out atomic_ptr_type_ok()
    https://git.kernel.org/bpf/bpf-next/c/b2d9ef71d4c9
  - [bpf-next,v4,02/10] bpf/verifier: Factor out check_atomic_rmw()
    https://git.kernel.org/bpf/bpf-next/c/d430c46c7580
  - [bpf-next,v4,03/10] bpf/verifier: Factor out check_load_mem() and check_store_reg()
    https://git.kernel.org/bpf/bpf-next/c/d38ad248fb7a

Please refer to the LLVM PR and individual kernel patches for details.
Thanks!

v5: https://lore.kernel.org/all/[email protected]/
v5..v6 change:

  o (Alexei) avoid using #ifndef in verifier.c

v4: https://lore.kernel.org/bpf/[email protected]/
v4..v5 notable changes:

  o (kernel test robot) for 32-bit arches: make the verifier reject
                        64-bit load-acquires/store-releases, and fix
                        build error in interpreter changes
    * tested ARCH=arc build following instructions from kernel test
      robot
  o (Alexei) drop Documentation/ patch (v4 10/10) for now

v3: https://lore.kernel.org/bpf/[email protected]/
v3..v4 notable changes:

  o (Alexei) add x86-64 JIT support (including arena)
  o add Acked-by: tags from Xu

v2: https://lore.kernel.org/bpf/[email protected]/
v2..v3 notable changes:

  o (Alexei) change encoding to BPF_LOAD_ACQ=0x100, BPF_STORE_REL=0x110
    (sketched after this list)
  o add Acked-by: tags from Ilya and Eduard
  o make new selftests depend on:
    * __clang_major__ >= 18, and
    * ENABLE_ATOMICS_TESTS is defined (currently this means -mcpu=v3 or
      v4), and
    * JIT supports load_acq/store_rel (currently only arm64)
  o work around llvm-17 CI job failure by conditionally defining
    __arena_global variables as 64-bit if __clang_major__ < 18, to make
    sure .addr_space.1 has no holes
  o add Google copyright notice in new files

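  As a sketch of this encoding (field values per the patches; the C
  initializer itself is illustrative, not from this set):

    /* r0 = load_acquire((u32 *)(r1 + 0)) */
    struct bpf_insn insn = {
            .code    = BPF_STX | BPF_ATOMIC | BPF_W,
            .dst_reg = BPF_REG_0,  /* value destination */
            .src_reg = BPF_REG_1,  /* address base */
            .off     = 0,
            .imm     = 0x100,      /* BPF_LOAD_ACQ */
    };
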
v1: https://lore.kernel.org/all/[email protected]/
v1..v2 notable changes:

  o (Eduard) for x86 and s390, make
             bpf_jit_supports_insn(..., /*in_arena=*/true) return false
             for load_acq/store_rel
  o add Eduard's Acked-by: tag
  o (Eduard) extract LDX and non-ATOMIC STX handling into helpers, see
             PATCH v2 3/9
  o allow unpriv programs to store-release pointers to stack
  o (Alexei) make it clearer in the interpreter code (PATCH v2 4/9) that
             only W and DW are supported for atomic RMW
  o test misaligned load_acq/store_rel
  o (Eduard) other selftests/ changes:
    * test load_acq/store_rel with !atomic_ptr_type_ok() pointers:
      - PTR_TO_CTX, for is_ctx_reg()
      - PTR_TO_PACKET, for is_pkt_reg()
      - PTR_TO_FLOW_KEYS, for is_flow_key_reg()
      - PTR_TO_SOCKET, for is_sk_reg()
    * drop atomics/ tests
    * delete unnecessary 'pid' checks from arena_atomics/ tests
    * avoid depending on __BPF_FEATURE_LOAD_ACQ_STORE_REL, use
      __imm_insn() and inline asm macros instead (sketched below)
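    A sketch of that inline-asm style (modeled on the selftests; the
    exact registers and offsets here are illustrative):

      /* store-release r1 into the stack slot at r10 - 8 */
      asm volatile (
      "r1 = 8;"
      ".8byte %[store_release];"
      "r0 = 0;"
      "exit;"
      :
      : __imm_insn(store_release,
                   BPF_ATOMIC_OP(BPF_DW, BPF_STORE_REL,
                                 BPF_REG_10, BPF_REG_1, -8))
      : __clobber_all);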

RFC v1: https://lore.kernel.org/all/[email protected]
RFC v1..v1 notable changes:

  o 1-2/8: minor verifier.c refactoring patches
  o   3/8: core/verifier changes
         * (Eduard) handle load-acquire properly in backtrack_insn()
         * (Eduard) avoid skipping checks (e.g.,
                    bpf_jit_supports_insn()) for load-acquires
         * track the value stored by store-releases, just like how
           non-atomic STX instructions are handled
         * (Eduard) add missing link in commit message
         * (Eduard) always print 'r' for disasm.c changes
  o   4/8: arm64/insn: avoid treating load_acq/store_rel as
           load_ex/store_ex
  o   5/8: arm64/insn: add load_acq/store_rel
         * (Xu) include Should-Be-One (SBO) bits in "mask" and "value",
                to avoid setting fixed bits during runtime (JIT-compile
                time)
  o   6/8: arm64 JIT compiler changes
         * (Xu) use emit_a64_add_i() for "pointer + offset" to optimize
                code emission
  o   7/8: selftests
         * (Eduard) avoid adding new tests to the 'test_verifier' runner
         * add more tests, e.g., checking mark_precise logic
  o   8/8: instruction-set.rst changes

[1] https://lore.kernel.org/all/[email protected]/

Thanks,
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Alexei Starovoitov committed Mar 4, 2025
2 parents ad55432 + 953df09 commit c6287f1
Showing 19 changed files with 1,081 additions and 33 deletions.
arch/arm64/include/asm/insn.h (12 changes: 10 additions & 2 deletions)
@@ -188,8 +188,10 @@ enum aarch64_insn_ldst_type {
        AARCH64_INSN_LDST_STORE_PAIR_PRE_INDEX,
        AARCH64_INSN_LDST_LOAD_PAIR_POST_INDEX,
        AARCH64_INSN_LDST_STORE_PAIR_POST_INDEX,
        AARCH64_INSN_LDST_LOAD_ACQ,
        AARCH64_INSN_LDST_LOAD_EX,
        AARCH64_INSN_LDST_LOAD_ACQ_EX,
        AARCH64_INSN_LDST_STORE_REL,
        AARCH64_INSN_LDST_STORE_EX,
        AARCH64_INSN_LDST_STORE_REL_EX,
        AARCH64_INSN_LDST_SIGNED_LOAD_IMM_OFFSET,
@@ -351,8 +353,10 @@ __AARCH64_INSN_FUNCS(ldr_imm, 0x3FC00000, 0x39400000)
__AARCH64_INSN_FUNCS(ldr_lit, 0xBF000000, 0x18000000)
__AARCH64_INSN_FUNCS(ldrsw_lit, 0xFF000000, 0x98000000)
__AARCH64_INSN_FUNCS(exclusive, 0x3F800000, 0x08000000)
__AARCH64_INSN_FUNCS(load_ex, 0x3F400000, 0x08400000)
__AARCH64_INSN_FUNCS(store_ex, 0x3F400000, 0x08000000)
__AARCH64_INSN_FUNCS(load_acq, 0x3FDFFC00, 0x08DFFC00)
__AARCH64_INSN_FUNCS(store_rel, 0x3FDFFC00, 0x089FFC00)
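/* The load_acq/store_rel masks above include the Should-Be-One (SBO)
 * bits, so those fixed bits need not be set at JIT-compile time (see
 * cover letter, RFC v1..v1 notes).
 */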
__AARCH64_INSN_FUNCS(load_ex, 0x3FC00000, 0x08400000)
__AARCH64_INSN_FUNCS(store_ex, 0x3FC00000, 0x08000000)
__AARCH64_INSN_FUNCS(mops, 0x3B200C00, 0x19000400)
__AARCH64_INSN_FUNCS(stp, 0x7FC00000, 0x29000000)
__AARCH64_INSN_FUNCS(ldp, 0x7FC00000, 0x29400000)
@@ -602,6 +606,10 @@ u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1,
                                     int offset,
                                     enum aarch64_insn_variant variant,
                                     enum aarch64_insn_ldst_type type);
u32 aarch64_insn_gen_load_acq_store_rel(enum aarch64_insn_register reg,
                                        enum aarch64_insn_register base,
                                        enum aarch64_insn_size_type size,
                                        enum aarch64_insn_ldst_type type);
u32 aarch64_insn_gen_load_store_ex(enum aarch64_insn_register reg,
                                   enum aarch64_insn_register base,
                                   enum aarch64_insn_register state,
arch/arm64/lib/insn.c (29 changes: 29 additions & 0 deletions)
@@ -540,6 +540,35 @@ u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1,
                                             offset >> shift);
}

u32 aarch64_insn_gen_load_acq_store_rel(enum aarch64_insn_register reg,
                                        enum aarch64_insn_register base,
                                        enum aarch64_insn_size_type size,
                                        enum aarch64_insn_ldst_type type)
{
        u32 insn;

        switch (type) {
        case AARCH64_INSN_LDST_LOAD_ACQ:
                insn = aarch64_insn_get_load_acq_value();
                break;
        case AARCH64_INSN_LDST_STORE_REL:
                insn = aarch64_insn_get_store_rel_value();
                break;
        default:
                pr_err("%s: unknown load-acquire/store-release encoding %d\n",
                       __func__, type);
                return AARCH64_BREAK_FAULT;
        }

        insn = aarch64_insn_encode_ldst_size(size, insn);

        insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT, insn,
                                            reg);

        return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RN, insn,
                                            base);
}

u32 aarch64_insn_gen_load_store_ex(enum aarch64_insn_register reg,
                                   enum aarch64_insn_register base,
                                   enum aarch64_insn_register state,
arch/arm64/net/bpf_jit.h (20 changes: 20 additions & 0 deletions)
@@ -119,6 +119,26 @@
        aarch64_insn_gen_load_store_ex(Rt, Rn, Rs, A64_SIZE(sf), \
                                       AARCH64_INSN_LDST_STORE_REL_EX)

/* Load-acquire & store-release */
#define A64_LDAR(Rt, Rn, size) \
        aarch64_insn_gen_load_acq_store_rel(Rt, Rn, AARCH64_INSN_SIZE_##size, \
                                            AARCH64_INSN_LDST_LOAD_ACQ)
#define A64_STLR(Rt, Rn, size) \
        aarch64_insn_gen_load_acq_store_rel(Rt, Rn, AARCH64_INSN_SIZE_##size, \
                                            AARCH64_INSN_LDST_STORE_REL)

/* Rt = [Rn] (load acquire) */
#define A64_LDARB(Wt, Xn) A64_LDAR(Wt, Xn, 8)
#define A64_LDARH(Wt, Xn) A64_LDAR(Wt, Xn, 16)
#define A64_LDAR32(Wt, Xn) A64_LDAR(Wt, Xn, 32)
#define A64_LDAR64(Xt, Xn) A64_LDAR(Xt, Xn, 64)

/* [Rn] = Rt (store release) */
#define A64_STLRB(Wt, Xn) A64_STLR(Wt, Xn, 8)
#define A64_STLRH(Wt, Xn) A64_STLR(Wt, Xn, 16)
#define A64_STLR32(Wt, Xn) A64_STLR(Wt, Xn, 32)
#define A64_STLR64(Xt, Xn) A64_STLR(Xt, Xn, 64)

/*
* LSE atomics
*
arch/arm64/net/bpf_jit_comp.c (86 changes: 84 additions & 2 deletions)
@@ -647,6 +647,81 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
        return 0;
}

static int emit_atomic_ld_st(const struct bpf_insn *insn, struct jit_ctx *ctx)
{
        const s32 imm = insn->imm;
        const s16 off = insn->off;
        const u8 code = insn->code;
        const bool arena = BPF_MODE(code) == BPF_PROBE_ATOMIC;
        const u8 arena_vm_base = bpf2a64[ARENA_VM_START];
        const u8 dst = bpf2a64[insn->dst_reg];
        const u8 src = bpf2a64[insn->src_reg];
        const u8 tmp = bpf2a64[TMP_REG_1];
        u8 reg;

        switch (imm) {
        case BPF_LOAD_ACQ:
                reg = src;
                break;
        case BPF_STORE_REL:
                reg = dst;
                break;
        default:
                pr_err_once("unknown atomic load/store op code %02x\n", imm);
                return -EINVAL;
        }

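        /* LDAR/STLR accept only a base-register addressing mode, so
         * fold any offset (and, for arena, the arena base) into tmp
         * first.
         */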
        if (off) {
                emit_a64_add_i(1, tmp, reg, tmp, off, ctx);
                reg = tmp;
        }
        if (arena) {
                emit(A64_ADD(1, tmp, reg, arena_vm_base), ctx);
                reg = tmp;
        }

        switch (imm) {
        case BPF_LOAD_ACQ:
                switch (BPF_SIZE(code)) {
                case BPF_B:
                        emit(A64_LDARB(dst, reg), ctx);
                        break;
                case BPF_H:
                        emit(A64_LDARH(dst, reg), ctx);
                        break;
                case BPF_W:
                        emit(A64_LDAR32(dst, reg), ctx);
                        break;
                case BPF_DW:
                        emit(A64_LDAR64(dst, reg), ctx);
                        break;
                }
                break;
        case BPF_STORE_REL:
                switch (BPF_SIZE(code)) {
                case BPF_B:
                        emit(A64_STLRB(src, reg), ctx);
                        break;
                case BPF_H:
                        emit(A64_STLRH(src, reg), ctx);
                        break;
                case BPF_W:
                        emit(A64_STLR32(src, reg), ctx);
                        break;
                case BPF_DW:
                        emit(A64_STLR64(src, reg), ctx);
                        break;
                }
                break;
        default:
                pr_err_once("unexpected atomic load/store op code %02x\n",
                            imm);
                return -EINVAL;
        }

        return 0;
}

#ifdef CONFIG_ARM64_LSE_ATOMICS
static int emit_lse_atomic(const struct bpf_insn *insn, struct jit_ctx *ctx)
{
@@ -1641,11 +1716,17 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
                        return ret;
                break;

        case BPF_STX | BPF_ATOMIC | BPF_B:
        case BPF_STX | BPF_ATOMIC | BPF_H:
        case BPF_STX | BPF_ATOMIC | BPF_W:
        case BPF_STX | BPF_ATOMIC | BPF_DW:
        case BPF_STX | BPF_PROBE_ATOMIC | BPF_B:
        case BPF_STX | BPF_PROBE_ATOMIC | BPF_H:
        case BPF_STX | BPF_PROBE_ATOMIC | BPF_W:
        case BPF_STX | BPF_PROBE_ATOMIC | BPF_DW:
                if (cpus_have_cap(ARM64_HAS_LSE_ATOMICS))
                if (bpf_atomic_is_load_store(insn))
                        ret = emit_atomic_ld_st(insn, ctx);
                else if (cpus_have_cap(ARM64_HAS_LSE_ATOMICS))
                        ret = emit_lse_atomic(insn, ctx);
                else
                        ret = emit_ll_sc_atomic(insn, ctx);
@@ -2669,7 +2750,8 @@ bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
        switch (insn->code) {
        case BPF_STX | BPF_ATOMIC | BPF_W:
        case BPF_STX | BPF_ATOMIC | BPF_DW:
                if (!cpus_have_cap(ARM64_HAS_LSE_ATOMICS))
                if (!bpf_atomic_is_load_store(insn) &&
                    !cpus_have_cap(ARM64_HAS_LSE_ATOMICS))
                        return false;
        }
        return true;
arch/s390/net/bpf_jit_comp.c (14 changes: 10 additions & 4 deletions)
@@ -2919,10 +2919,16 @@ bool bpf_jit_supports_arena(void)

bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
{
        /*
         * Currently the verifier uses this function only to check which
         * atomic stores to arena are supported, and they all are.
         */
        if (!in_arena)
                return true;
        switch (insn->code) {
        case BPF_STX | BPF_ATOMIC | BPF_B:
        case BPF_STX | BPF_ATOMIC | BPF_H:
        case BPF_STX | BPF_ATOMIC | BPF_W:
        case BPF_STX | BPF_ATOMIC | BPF_DW:
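                /* Arena load-acquire/store-release are not supported
                 * by the s390 JIT, so reject them for in_arena (cf.
                 * cover letter, v1..v2 notes).
                 */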
                if (bpf_atomic_is_load_store(insn))
                        return false;
        }
        return true;
}

arch/x86/net/bpf_jit_comp.c (95 changes: 82 additions & 13 deletions)
@@ -1242,8 +1242,8 @@ static void emit_st_r12(u8 **pprog, u32 size, u32 dst_reg, int off, int imm)
        emit_st_index(pprog, size, dst_reg, X86_REG_R12, off, imm);
}

static int emit_atomic(u8 **pprog, u8 atomic_op,
                       u32 dst_reg, u32 src_reg, s16 off, u8 bpf_size)
static int emit_atomic_rmw(u8 **pprog, u32 atomic_op,
                           u32 dst_reg, u32 src_reg, s16 off, u8 bpf_size)
{
        u8 *prog = *pprog;

@@ -1283,8 +1283,9 @@ static int emit_atomic(u8 **pprog, u8 atomic_op,
        return 0;
}

static int emit_atomic_index(u8 **pprog, u8 atomic_op, u32 size,
                             u32 dst_reg, u32 src_reg, u32 index_reg, int off)
static int emit_atomic_rmw_index(u8 **pprog, u32 atomic_op, u32 size,
                                 u32 dst_reg, u32 src_reg, u32 index_reg,
                                 int off)
{
        u8 *prog = *pprog;

@@ -1297,7 +1298,7 @@ static int emit_atomic_index(u8 **pprog, u8 atomic_op, u32 size,
                EMIT1(add_3mod(0x48, dst_reg, src_reg, index_reg));
                break;
        default:
                pr_err("bpf_jit: 1 and 2 byte atomics are not supported\n");
                pr_err("bpf_jit: 1- and 2-byte RMW atomics are not supported\n");
                return -EFAULT;
        }

@@ -1331,6 +1332,49 @@ static int emit_atomic_index(u8 **pprog, u8 atomic_op, u32 size,
        return 0;
}

static int emit_atomic_ld_st(u8 **pprog, u32 atomic_op, u32 dst_reg,
                             u32 src_reg, s16 off, u8 bpf_size)
{
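        /* No barriers needed: x86 is TSO, so a plain MOV load already
         * has acquire semantics and a plain MOV store already has
         * release semantics.
         */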
        switch (atomic_op) {
        case BPF_LOAD_ACQ:
                /* dst_reg = smp_load_acquire(src_reg + off16) */
                emit_ldx(pprog, bpf_size, dst_reg, src_reg, off);
                break;
        case BPF_STORE_REL:
                /* smp_store_release(dst_reg + off16, src_reg) */
                emit_stx(pprog, bpf_size, dst_reg, src_reg, off);
                break;
        default:
                pr_err("bpf_jit: unknown atomic load/store opcode %02x\n",
                       atomic_op);
                return -EFAULT;
        }

        return 0;
}

static int emit_atomic_ld_st_index(u8 **pprog, u32 atomic_op, u32 size,
                                   u32 dst_reg, u32 src_reg, u32 index_reg,
                                   int off)
{
        switch (atomic_op) {
        case BPF_LOAD_ACQ:
                /* dst_reg = smp_load_acquire(src_reg + idx_reg + off16) */
                emit_ldx_index(pprog, size, dst_reg, src_reg, index_reg, off);
                break;
        case BPF_STORE_REL:
                /* smp_store_release(dst_reg + idx_reg + off16, src_reg) */
                emit_stx_index(pprog, size, dst_reg, src_reg, index_reg, off);
                break;
        default:
                pr_err("bpf_jit: unknown atomic load/store opcode %02x\n",
                       atomic_op);
                return -EFAULT;
        }

        return 0;
}

#define DONT_CLEAR 1

bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
@@ -2113,6 +2157,13 @@ st: if (is_imm8(insn->off))
                        }
                        break;

                case BPF_STX | BPF_ATOMIC | BPF_B:
                case BPF_STX | BPF_ATOMIC | BPF_H:
                        if (!bpf_atomic_is_load_store(insn)) {
                                pr_err("bpf_jit: 1- and 2-byte RMW atomics are not supported\n");
                                return -EFAULT;
                        }
                        fallthrough;
                case BPF_STX | BPF_ATOMIC | BPF_W:
                case BPF_STX | BPF_ATOMIC | BPF_DW:
                        if (insn->imm == (BPF_AND | BPF_FETCH) ||
@@ -2148,10 +2199,10 @@ st: if (is_imm8(insn->off))
                        EMIT2(simple_alu_opcodes[BPF_OP(insn->imm)],
                              add_2reg(0xC0, AUX_REG, real_src_reg));
                        /* Attempt to swap in new value */
                        err = emit_atomic(&prog, BPF_CMPXCHG,
                                          real_dst_reg, AUX_REG,
                                          insn->off,
                                          BPF_SIZE(insn->code));
                        err = emit_atomic_rmw(&prog, BPF_CMPXCHG,
                                              real_dst_reg, AUX_REG,
                                              insn->off,
                                              BPF_SIZE(insn->code));
                        if (WARN_ON(err))
                                return err;
                        /*
@@ -2166,17 +2217,35 @@ st: if (is_imm8(insn->off))
                        break;
                }

                        err = emit_atomic(&prog, insn->imm, dst_reg, src_reg,
                                          insn->off, BPF_SIZE(insn->code));
                        if (bpf_atomic_is_load_store(insn))
                                err = emit_atomic_ld_st(&prog, insn->imm, dst_reg, src_reg,
                                                        insn->off, BPF_SIZE(insn->code));
                        else
                                err = emit_atomic_rmw(&prog, insn->imm, dst_reg, src_reg,
                                                      insn->off, BPF_SIZE(insn->code));
                        if (err)
                                return err;
                        break;

                case BPF_STX | BPF_PROBE_ATOMIC | BPF_B:
                case BPF_STX | BPF_PROBE_ATOMIC | BPF_H:
                        if (!bpf_atomic_is_load_store(insn)) {
                                pr_err("bpf_jit: 1- and 2-byte RMW atomics are not supported\n");
                                return -EFAULT;
                        }
                        fallthrough;
                case BPF_STX | BPF_PROBE_ATOMIC | BPF_W:
                case BPF_STX | BPF_PROBE_ATOMIC | BPF_DW:
                        start_of_ldx = prog;
                        err = emit_atomic_index(&prog, insn->imm, BPF_SIZE(insn->code),
                                                dst_reg, src_reg, X86_REG_R12, insn->off);

                        if (bpf_atomic_is_load_store(insn))
                                err = emit_atomic_ld_st_index(&prog, insn->imm,
                                                              BPF_SIZE(insn->code), dst_reg,
                                                              src_reg, X86_REG_R12, insn->off);
                        else
                                err = emit_atomic_rmw_index(&prog, insn->imm, BPF_SIZE(insn->code),
                                                            dst_reg, src_reg, X86_REG_R12,
                                                            insn->off);
                        if (err)
                                return err;
                        goto populate_extable;
[diff truncated: 13 more changed files not shown]
