[BPF] Add load-acquire and store-release instructions under -mcpu=v4
As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v4.  Define 2 new flags:

  BPF_LOAD_ACQ    0x100
  BPF_STORE_REL   0x110

A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_LOAD_ACQ (0x100).

Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_STORE_REL (0x110).

Unlike existing atomic read-modify-write operations that only support
BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and
store-releases also support BPF_B (8-bit) and BPF_H (16-bit).  An 8- or
16-bit load-acquire zero-extends the value before writing it to a 32-bit
register, just like ARM64 instruction LDAPRH and friends.
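
The zero-extension described above can be observed from C. The following is a minimal sketch, assuming a compiler that provides the GCC/clang `__atomic` builtins; the helper names `load_acquire_u16` and `load_acquire_u8` are hypothetical and not part of this commit. Whether each helper compiles to exactly one load-acquire instruction under -mcpu=v4 is not verified here.

```c
#include <stdint.h>

/* Hypothetical helper: acquire-load a 16-bit value and widen it to 32 bits.
 * The source type is unsigned, so the widening is a zero-extension, matching
 * what the commit message says a 16-bit load-acquire does on its own. */
uint32_t load_acquire_u16(uint16_t *ptr)
{
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}

/* Same idea for an 8-bit location. */
uint32_t load_acquire_u8(uint8_t *ptr)
{
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}
```

With signed source types the compiler would presumably need an extra sign-extension after the load, much as the relaxed `char` example later in this message shows.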

As an example (assuming little-endian):

  long foo(long *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  db 10 00 00 00 01 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x00000100): BPF_LOAD_ACQ

Similarly:

  void bar(short *ptr, short val) {
      __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
  }

bar() can be compiled to:

  cb 21 00 00 10 01 00 00  store_release((u16 *)(r1 + 0x0), w2)
  95 00 00 00 00 00 00 00  exit

  opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
  imm (0x00000110): BPF_STORE_REL
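
As a cross-check of the two encodings above, here is a small sketch that assembles the same byte sequences by hand. The opcode constants use the values from the breakdowns above (BPF_W and BPF_B follow the standard BPF size encoding); `encode_insn`, `encode_load_acquire` and `encode_store_release` are hypothetical helper names, not kernel or LLVM APIs, and the byte layout assumes little-endian as in the examples.

```c
#include <stdint.h>

/* Opcode pieces as used in the examples above; BPF_LOAD_ACQ and
 * BPF_STORE_REL are the two flags introduced by this commit. */
enum {
    BPF_STX       = 0x03,
    BPF_W         = 0x00,
    BPF_H         = 0x08,
    BPF_B         = 0x10,
    BPF_DW        = 0x18,
    BPF_ATOMIC    = 0xc0,
    BPF_LOAD_ACQ  = 0x100,
    BPF_STORE_REL = 0x110,
};

/* Hypothetical helper: write one 8-byte instruction in little-endian layout:
 * opcode, (src_reg << 4 | dst_reg), 16-bit offset, 32-bit immediate. */
static void encode_insn(uint8_t out[8], uint8_t opcode, uint8_t dst_reg,
                        uint8_t src_reg, int16_t off, uint32_t imm)
{
    uint16_t uoff = (uint16_t)off;

    out[0] = opcode;
    out[1] = (uint8_t)(((src_reg & 0x0f) << 4) | (dst_reg & 0x0f));
    out[2] = (uint8_t)(uoff & 0xff);
    out[3] = (uint8_t)(uoff >> 8);
    out[4] = (uint8_t)(imm & 0xff);
    out[5] = (uint8_t)((imm >> 8) & 0xff);
    out[6] = (uint8_t)((imm >> 16) & 0xff);
    out[7] = (uint8_t)((imm >> 24) & 0xff);
}

/* dst = load_acquire((size *)(base + off)): dst_reg holds the result,
 * src_reg holds the base address. */
void encode_load_acquire(uint8_t out[8], uint8_t size, uint8_t dst,
                         uint8_t base, int16_t off)
{
    encode_insn(out, (uint8_t)(BPF_ATOMIC | size | BPF_STX), dst, base, off,
                BPF_LOAD_ACQ);
}

/* store_release((size *)(base + off), src): dst_reg holds the base address,
 * src_reg holds the value to store. */
void encode_store_release(uint8_t out[8], uint8_t size, uint8_t base,
                          uint8_t src, int16_t off)
{
    encode_insn(out, (uint8_t)(BPF_ATOMIC | size | BPF_STX), base, src, off,
                BPF_STORE_REL);
}
```

Calling `encode_load_acquire(buf, BPF_DW, 0, 1, 0)` and `encode_store_release(buf, BPF_H, 1, 2, 0)` should reproduce the first line of the foo() and bar() listings respectively.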

Inline assembly is also supported.

Add a pre-defined macro, __BPF_FEATURE_LOAD_ACQ_STORE_REL, to let
developers detect this new feature.  It can also be disabled using a new
llc option, -disable-load-acq-store-rel.
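
One way a program might use the macro is as a build-time guard. This is only a sketch: `struct mailbox`, `mailbox_publish` and `mailbox_try_read` are hypothetical names for illustration, and the guard relies on clang's `__bpf__` target macro so that the same file still builds on a non-BPF host.

```c
#include <stdbool.h>

/* When targeting BPF, refuse to build if the compiler does not advertise
 * load-acquire/store-release support. On other targets the check is skipped. */
#if defined(__bpf__) && !defined(__BPF_FEATURE_LOAD_ACQ_STORE_REL)
#error "load-acquire/store-release not available; try -mcpu=v4"
#endif

/* Hypothetical one-slot mailbox: the writer fills payload, then sets ready. */
struct mailbox {
    long payload;
    int ready;
};

/* Writer side: the release store orders the payload write before the flag. */
void mailbox_publish(struct mailbox *m, long value)
{
    m->payload = value;
    __atomic_store_n(&m->ready, 1, __ATOMIC_RELEASE);
}

/* Reader side: the acquire load orders the flag read before the payload read.
 * Returns false if nothing has been published yet. */
bool mailbox_try_read(struct mailbox *m, long *out)
{
    if (!__atomic_load_n(&m->ready, __ATOMIC_ACQUIRE))
        return false;
    *out = m->payload;
    return true;
}
```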

Using __ATOMIC_RELAXED for __atomic_store{,_n}() will generate a "plain"
store (BPF_MEM | BPF_STX) instruction:

  void foo(short *ptr, short val) {
      __atomic_store_n(ptr, val, __ATOMIC_RELAXED);
  }

  6b 21 00 00 00 00 00 00  *(u16 *)(r1 + 0x0) = w2
  95 00 00 00 00 00 00 00  exit

Similarly, using __ATOMIC_RELAXED for __atomic_load{,_n}() will generate
a zero-extending, "plain" load (BPF_MEM | BPF_LDX) instruction:

  int foo(char *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_RELAXED);
  }

  71 11 00 00 00 00 00 00  w1 = *(u8 *)(r1 + 0x0)
  bc 10 08 00 00 00 00 00  w0 = (s8)w1
  95 00 00 00 00 00 00 00  exit

Currently __ATOMIC_CONSUME is an alias for __ATOMIC_ACQUIRE.  Using
__ATOMIC_SEQ_CST ("sequentially consistent") is not supported yet and
will cause an error:

  $ clang --target=bpf -mcpu=v4 -c bar.c > /dev/null
  bar.c:1:5: error: sequentially consistent (seq_cst) atomic load/store is not supported
    1 | int foo(int *ptr) { return __atomic_load_n(ptr, __ATOMIC_SEQ_CST); }
      |     ^
  ...
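
The memory-order handling described above can be summarized as a lookup. This is a reader's sketch of the behavior stated in this message, not code from the patch; `bpf_atomic_load_lowering` and `bpf_atomic_store_lowering` are hypothetical names, and the `__ATOMIC_*` constants are the ones predefined by GCC and clang.

```c
/* Hypothetical summary: what an __atomic_load{,_n}() with the given memory
 * order turns into under -mcpu=v4, according to the commit message. */
const char *bpf_atomic_load_lowering(int memorder)
{
    switch (memorder) {
    case __ATOMIC_RELAXED:
        return "plain load";
    case __ATOMIC_CONSUME: /* currently an alias for acquire */
    case __ATOMIC_ACQUIRE:
        return "load_acquire";
    case __ATOMIC_SEQ_CST:
        return "error";
    default: /* release and acq_rel are not valid orders for a load */
        return "invalid";
    }
}

/* Same summary for __atomic_store{,_n}(). */
const char *bpf_atomic_store_lowering(int memorder)
{
    switch (memorder) {
    case __ATOMIC_RELAXED:
        return "plain store";
    case __ATOMIC_RELEASE:
        return "store_release";
    case __ATOMIC_SEQ_CST:
        return "error";
    default: /* consume, acquire and acq_rel are not valid for a store */
        return "invalid";
    }
}
```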

Finally, rename the isST*() and isLD*() helper functions in
BPFMISimplifyPatchable.cpp based on what the instructions actually do,
rather than their instruction class.

[1] https://lore.kernel.org/all/[email protected]/
peilin-ye committed Feb 20, 2025
1 parent 86f0e6d commit 37a4553
Showing 12 changed files with 355 additions and 17 deletions.
1 change: 1 addition & 0 deletions clang/lib/Basic/Targets/BPF.cpp
@@ -75,6 +75,7 @@ void BPFTargetInfo::getTargetDefines(const LangOptions &Opts,
Builder.defineMacro("__BPF_FEATURE_SDIV_SMOD");
Builder.defineMacro("__BPF_FEATURE_GOTOL");
Builder.defineMacro("__BPF_FEATURE_ST");
Builder.defineMacro("__BPF_FEATURE_LOAD_ACQ_STORE_REL");
}
}

5 changes: 5 additions & 0 deletions clang/test/Preprocessor/bpf-predefined-macros.c
@@ -67,6 +67,9 @@ int t;
#ifdef __BPF_FEATURE_MAY_GOTO
int u;
#endif
#ifdef __BPF_FEATURE_LOAD_ACQ_STORE_REL
int v;
#endif

// CHECK: int b;
// CHECK: int c;
@@ -106,6 +109,8 @@ int u;
// CPU_V3: int u;
// CPU_V4: int u;

// CPU_V4: int v;

// CPU_GENERIC: int g;

// CPU_PROBE: int f;
2 changes: 2 additions & 0 deletions llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
@@ -235,6 +235,7 @@ struct BPFOperand : public MCParsedAsmOperand {
.Case("exit", true)
.Case("lock", true)
.Case("ld_pseudo", true)
.Case("store_release", true)
.Default(false);
}

@@ -271,6 +272,7 @@ struct BPFOperand : public MCParsedAsmOperand {
.Case("cmpxchg_64", true)
.Case("cmpxchg32_32", true)
.Case("addr_space_cast", true)
.Case("load_acquire", true)
.Default(false);
}
};

25 changes: 25 additions & 0 deletions llvm/lib/Target/BPF/BPFISelLowering.cpp
@@ -92,6 +92,11 @@ BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, VT, Custom);
}

for (auto VT : {MVT::i32, MVT::i64}) {
setOperationAction(ISD::ATOMIC_LOAD, VT, Custom);
setOperationAction(ISD::ATOMIC_STORE, VT, Custom);
}

for (auto VT : { MVT::i32, MVT::i64 }) {
if (VT == MVT::i32 && !STI.getHasAlu32())
continue;
@@ -290,6 +295,9 @@ void BPFTargetLowering::ReplaceNodeResults(
else
Msg = "unsupported atomic operation, please use 64 bit version";
break;
case ISD::ATOMIC_LOAD:
case ISD::ATOMIC_STORE:
return;
}

SDLoc DL(N);
@@ -315,6 +323,9 @@ SDValue BPFTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
return LowerSDIVSREM(Op, DAG);
case ISD::DYNAMIC_STACKALLOC:
return LowerDYNAMIC_STACKALLOC(Op, DAG);
case ISD::ATOMIC_LOAD:
case ISD::ATOMIC_STORE:
return LowerATOMIC_LOAD_STORE(Op, DAG);
}
}

@@ -701,6 +712,20 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
return DAG.getNode(BPFISD::SELECT_CC, DL, Op.getValueType(), Ops);
}

SDValue BPFTargetLowering::LowerATOMIC_LOAD_STORE(SDValue Op,
SelectionDAG &DAG) const {
SDNode *N = Op.getNode();
SDLoc DL(N);

if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
AtomicOrdering::SequentiallyConsistent)
fail(DL, DAG,
"sequentially consistent (seq_cst) "
"atomic load/store is not supported");

return Op;
}

const char *BPFTargetLowering::getTargetNodeName(unsigned Opcode) const {
switch ((BPFISD::NodeType)Opcode) {
case BPFISD::FIRST_NUMBER:

2 changes: 1 addition & 1 deletion llvm/lib/Target/BPF/BPFISelLowering.h
@@ -77,7 +77,7 @@ class BPFTargetLowering : public TargetLowering {
SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBR_CC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerATOMIC_LOAD_STORE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerConstantPool(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;

7 changes: 7 additions & 0 deletions llvm/lib/Target/BPF/BPFInstrFormats.td
@@ -48,6 +48,13 @@ def BPF_END : BPFArithOp<0xd>;
def BPF_XCHG : BPFArithOp<0xe>;
def BPF_CMPXCHG : BPFArithOp<0xf>;

class BPFAtomicOp<bits<5> val> {
bits<5> Value = val;
}

def BPF_LOAD_ACQ : BPFAtomicOp<0x10>;
def BPF_STORE_REL : BPFAtomicOp<0x11>;

class BPFEndDir<bits<1> val> {
bits<1> Value = val;
}
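
The 5-bit values 0x10 and 0x11 relate to the 0x100 and 0x110 flags from the commit message by a shift: the STORE_RELEASE and LOAD_ACQUIRE classes in BPFInstrInfo.td place the value at Inst{8-4}. A small sketch of that arithmetic, assuming those bits fall inside the instruction's 32-bit 'imm' field; `atomic_op_to_imm` is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* 5-bit values taken from the BPFAtomicOp defs in this hunk. */
enum {
    BPF_LOAD_ACQ_OP  = 0x10,
    BPF_STORE_REL_OP = 0x11,
};

/* Hypothetical helper: place a 5-bit atomic-op value at bits 8..4 of imm. */
uint32_t atomic_op_to_imm(uint32_t op5)
{
    return (op5 & 0x1f) << 4;
}
```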

125 changes: 125 additions & 0 deletions llvm/lib/Target/BPF/BPFInstrInfo.td
@@ -60,6 +60,7 @@ def BPFHasSdivSmod : Predicate<"Subtarget->hasSdivSmod()">;
def BPFNoMovsx : Predicate<"!Subtarget->hasMovsx()">;
def BPFNoBswap : Predicate<"!Subtarget->hasBswap()">;
def BPFHasStoreImm : Predicate<"Subtarget->hasStoreImm()">;
def BPFHasLoadAcqStoreRel : Predicate<"Subtarget->hasLoadAcqStoreRel()">;

class ImmediateAsmOperand<string name> : AsmOperandClass {
let Name = name;
@@ -566,6 +567,47 @@ let Predicates = [BPFHasALU32, BPFHasStoreImm] in {
(STB_imm (imm_to_i64 imm:$src), ADDRri:$dst)>;
}

class STORE_RELEASE<BPFWidthModifer SizeOp, string OpcodeStr, RegisterClass RegTp>
: TYPE_LD_ST<BPF_ATOMIC.Value, SizeOp.Value,
(outs),
(ins RegTp:$src, MEMri:$addr),
"store_release(("#OpcodeStr#" *)($addr), $src)",
[]> {
bits<4> src;
bits<20> addr;

let Inst{51-48} = addr{19-16}; // base reg
let Inst{55-52} = src;
let Inst{47-32} = addr{15-0}; // offset
let Inst{8-4} = BPF_STORE_REL.Value;
let BPFClass = BPF_STX;
}

class STORE_RELEASEi64<BPFWidthModifer Opc, string OpcodeStr>
: STORE_RELEASE<Opc, OpcodeStr, GPR>;

class relaxed_store<PatFrag base>
: PatFrag<(ops node:$val, node:$ptr), (base node:$val, node:$ptr)> {
let IsAtomic = 1;
let IsAtomicOrderingReleaseOrStronger = 0;
}

class releasing_store<PatFrag base>
: PatFrag<(ops node:$val, node:$ptr), (base node:$val, node:$ptr)> {
let IsAtomic = 1;
let IsAtomicOrderingRelease = 1;
}

let Predicates = [BPFHasLoadAcqStoreRel] in {
def STDREL : STORE_RELEASEi64<BPF_DW, "u64">;

foreach P = [[relaxed_store<atomic_store_64>, STD],
[releasing_store<atomic_store_64>, STDREL],
] in {
def : Pat<(P[0] GPR:$val, ADDRri:$addr), (P[1] GPR:$val, ADDRri:$addr)>;
}
}

// LOAD instructions
class LOAD<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr, list<dag> Pattern>
: TYPE_LD_ST<ModOp.Value, SizeOp.Value,
@@ -622,6 +664,47 @@ let Predicates = [BPFHasLdsx] in {

def LDD : LOADi64<BPF_DW, BPF_MEM, "u64", load>;

class LOAD_ACQUIRE<BPFWidthModifer SizeOp, string OpcodeStr, RegisterClass RegTp>
: TYPE_LD_ST<BPF_ATOMIC.Value, SizeOp.Value,
(outs RegTp:$dst),
(ins MEMri:$addr),
"$dst = load_acquire(("#OpcodeStr#" *)($addr))",
[]> {
bits<4> dst;
bits<20> addr;

let Inst{51-48} = dst;
let Inst{55-52} = addr{19-16}; // base reg
let Inst{47-32} = addr{15-0}; // offset
let Inst{8-4} = BPF_LOAD_ACQ.Value;
let BPFClass = BPF_STX;
}

class LOAD_ACQUIREi64<BPFWidthModifer SizeOp, string OpcodeStr>
: LOAD_ACQUIRE<SizeOp, OpcodeStr, GPR>;

class relaxed_load<PatFrags base>
: PatFrag<(ops node:$ptr), (base node:$ptr)> {
let IsAtomic = 1;
let IsAtomicOrderingAcquireOrStronger = 0;
}

class acquiring_load<PatFrags base>
: PatFrag<(ops node:$ptr), (base node:$ptr)> {
let IsAtomic = 1;
let IsAtomicOrderingAcquire = 1;
}

let Predicates = [BPFHasLoadAcqStoreRel] in {
def LDDACQ : LOAD_ACQUIREi64<BPF_DW, "u64">;

foreach P = [[relaxed_load<atomic_load_64>, LDD],
[acquiring_load<atomic_load_64>, LDDACQ],
] in {
def : Pat<(P[0] ADDRri:$addr), (P[1] ADDRri:$addr)>;
}
}
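
The register placement differs between the two classes: STORE_RELEASE puts the base register at Inst{51-48} and the value register at Inst{55-52}, while LOAD_ACQUIRE puts the destination at Inst{51-48} and the base at Inst{55-52}. The following sketch decodes an 8-byte little-endian instruction along those lines. It is an illustration under the assumption that Inst{63-56} is the opcode byte and Inst{31-0} is 'imm'; `decode_insn` and the `decoded` struct are hypothetical and do not correspond to any LLVM or kernel function.

```c
#include <stdint.h>

enum insn_kind { KIND_OTHER, KIND_LOAD_ACQUIRE, KIND_STORE_RELEASE };

struct decoded {
    enum insn_kind kind;
    unsigned size;      /* access size in bytes */
    unsigned value_reg; /* destination for a load, source for a store */
    unsigned base_reg;  /* register holding the base address */
    int16_t off;        /* signed offset added to the base */
};

/* Hypothetical decoder for the two instructions added by this commit. */
struct decoded decode_insn(const uint8_t b[8])
{
    struct decoded d = { KIND_OTHER, 0, 0, 0, 0 };
    uint8_t opcode = b[0];
    unsigned dst_reg = b[1] & 0x0f; /* Inst{51-48} */
    unsigned src_reg = b[1] >> 4;   /* Inst{55-52} */
    uint32_t imm = (uint32_t)b[4] | ((uint32_t)b[5] << 8) |
                   ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24);

    /* Only BPF_STX (0x03) class with BPF_ATOMIC (0xc0) mode is of interest. */
    if ((opcode & 0x07) != 0x03 || (opcode & 0xe0) != 0xc0)
        return d;

    switch (opcode & 0x18) {
    case 0x00: d.size = 4; break; /* BPF_W */
    case 0x08: d.size = 2; break; /* BPF_H */
    case 0x10: d.size = 1; break; /* BPF_B */
    default:   d.size = 8; break; /* BPF_DW */
    }
    d.off = (int16_t)(uint16_t)(b[2] | (b[3] << 8));

    if (imm == 0x100) {        /* BPF_LOAD_ACQ */
        d.kind = KIND_LOAD_ACQUIRE;
        d.value_reg = dst_reg;
        d.base_reg = src_reg;
    } else if (imm == 0x110) { /* BPF_STORE_REL */
        d.kind = KIND_STORE_RELEASE;
        d.value_reg = src_reg;
        d.base_reg = dst_reg;
    }
    return d;
}
```

Fed the bytes `db 10 00 00 00 01 00 00` from the commit message, this should report an 8-byte load-acquire into r0 from base r1; `cb 21 00 00 10 01 00 00` should report a 2-byte store-release of register 2 to base r1.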

class BRANCH<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
: TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
(outs),
@@ -1181,10 +1264,19 @@ class STORE32<BPFWidthModifer SizeOp, string OpcodeStr, list<dag> Pattern>
class STOREi32<BPFWidthModifer Opc, string OpcodeStr, PatFrag OpNode>
: STORE32<Opc, OpcodeStr, [(OpNode GPR32:$src, ADDRri:$addr)]>;

class STORE_RELEASEi32<BPFWidthModifer Opc, string OpcodeStr>
: STORE_RELEASE<Opc, OpcodeStr, GPR32>;

let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in {
def STW32 : STOREi32<BPF_W, "u32", store>;
def STH32 : STOREi32<BPF_H, "u16", truncstorei16>;
def STB32 : STOREi32<BPF_B, "u8", truncstorei8>;

let Predicates = [BPFHasLoadAcqStoreRel] in {
def STWREL32 : STORE_RELEASEi32<BPF_W, "u32">;
def STHREL32 : STORE_RELEASEi32<BPF_H, "u16">;
def STBREL32 : STORE_RELEASEi32<BPF_B, "u8">;
}
}

class LOAD32<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr, list<dag> Pattern>
@@ -1205,10 +1297,19 @@ class LOAD32<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr, lis
class LOADi32<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr, PatFrag OpNode>
: LOAD32<SizeOp, ModOp, OpcodeStr, [(set i32:$dst, (OpNode ADDRri:$addr))]>;

class LOAD_ACQUIREi32<BPFWidthModifer SizeOp, string OpcodeStr>
: LOAD_ACQUIRE<SizeOp, OpcodeStr, GPR32>;

let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in {
def LDW32 : LOADi32<BPF_W, BPF_MEM, "u32", load>;
def LDH32 : LOADi32<BPF_H, BPF_MEM, "u16", zextloadi16>;
def LDB32 : LOADi32<BPF_B, BPF_MEM, "u8", zextloadi8>;

let Predicates = [BPFHasLoadAcqStoreRel] in {
def LDWACQ32 : LOAD_ACQUIREi32<BPF_W, "u32">;
def LDHACQ32 : LOAD_ACQUIREi32<BPF_H, "u16">;
def LDBACQ32 : LOAD_ACQUIREi32<BPF_B, "u8">;
}
}

let Predicates = [BPFHasALU32] in {
@@ -1238,6 +1339,30 @@ let Predicates = [BPFHasALU32] in {
(SUBREG_TO_REG (i64 0), (LDH32 ADDRri:$src), sub_32)>;
def : Pat<(i64 (extloadi32 ADDRri:$src)),
(SUBREG_TO_REG (i64 0), (LDW32 ADDRri:$src), sub_32)>;

let Predicates = [BPFHasLoadAcqStoreRel] in {
foreach P = [[relaxed_load<atomic_load_32>, LDW32],
[relaxed_load<atomic_load_az_16>, LDH32],
[relaxed_load<atomic_load_az_8>, LDB32],
[acquiring_load<atomic_load_32>, LDWACQ32],
[acquiring_load<atomic_load_az_16>, LDHACQ32],
[acquiring_load<atomic_load_az_8>, LDBACQ32],
] in {
def : Pat<(P[0] ADDRri:$addr), (P[1] ADDRri:$addr)>;
}
}

let Predicates = [BPFHasLoadAcqStoreRel] in {
foreach P = [[relaxed_store<atomic_store_32>, STW32],
[relaxed_store<atomic_store_16>, STH32],
[relaxed_store<atomic_store_8>, STB32],
[releasing_store<atomic_store_32>, STWREL32],
[releasing_store<atomic_store_16>, STHREL32],
[releasing_store<atomic_store_8>, STBREL32],
] in {
def : Pat<(P[0] GPR32:$val, ADDRri:$addr), (P[1] GPR32:$val, ADDRri:$addr)>;
}
}
}

let usesCustomInserter = 1, isCodeGenOnly = 1 in {

34 changes: 19 additions & 15 deletions llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp
@@ -94,35 +94,39 @@ void BPFMISimplifyPatchable::initialize(MachineFunction &MFParm) {
LLVM_DEBUG(dbgs() << "*** BPF simplify patchable insts pass ***\n\n");
}

-static bool isST(unsigned Opcode) {
+static bool isStoreImm(unsigned Opcode) {
return Opcode == BPF::STB_imm || Opcode == BPF::STH_imm ||
Opcode == BPF::STW_imm || Opcode == BPF::STD_imm;
}

-static bool isSTX32(unsigned Opcode) {
-return Opcode == BPF::STB32 || Opcode == BPF::STH32 || Opcode == BPF::STW32;
+static bool isStore32(unsigned Opcode) {
+return Opcode == BPF::STB32 || Opcode == BPF::STH32 || Opcode == BPF::STW32 ||
+Opcode == BPF::STBREL32 || Opcode == BPF::STHREL32 ||
+Opcode == BPF::STWREL32;
}

-static bool isSTX64(unsigned Opcode) {
+static bool isStore64(unsigned Opcode) {
return Opcode == BPF::STB || Opcode == BPF::STH || Opcode == BPF::STW ||
-Opcode == BPF::STD;
+Opcode == BPF::STD || Opcode == BPF::STDREL;
}

-static bool isLDX32(unsigned Opcode) {
-return Opcode == BPF::LDB32 || Opcode == BPF::LDH32 || Opcode == BPF::LDW32;
+static bool isLoad32(unsigned Opcode) {
+return Opcode == BPF::LDB32 || Opcode == BPF::LDH32 || Opcode == BPF::LDW32 ||
+Opcode == BPF::LDBACQ32 || Opcode == BPF::LDHACQ32 ||
+Opcode == BPF::LDWACQ32;
}

-static bool isLDX64(unsigned Opcode) {
+static bool isLoad64(unsigned Opcode) {
return Opcode == BPF::LDB || Opcode == BPF::LDH || Opcode == BPF::LDW ||
-Opcode == BPF::LDD;
+Opcode == BPF::LDD || Opcode == BPF::LDDACQ;
}

-static bool isLDSX(unsigned Opcode) {
+static bool isLoadSext(unsigned Opcode) {
return Opcode == BPF::LDBSX || Opcode == BPF::LDHSX || Opcode == BPF::LDWSX;
}

bool BPFMISimplifyPatchable::isLoadInst(unsigned Opcode) {
-return isLDX32(Opcode) || isLDX64(Opcode) || isLDSX(Opcode);
+return isLoad32(Opcode) || isLoad64(Opcode) || isLoadSext(Opcode);
}

void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI,
@@ -143,11 +147,11 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI,
MachineInstr *DefInst = MO.getParent();
unsigned Opcode = DefInst->getOpcode();
unsigned COREOp;
-if (isLDX64(Opcode) || isLDSX(Opcode))
+if (isLoad64(Opcode) || isLoadSext(Opcode))
COREOp = BPF::CORE_LD64;
-else if (isLDX32(Opcode))
+else if (isLoad32(Opcode))
COREOp = BPF::CORE_LD32;
-else if (isSTX64(Opcode) || isSTX32(Opcode) || isST(Opcode))
+else if (isStore64(Opcode) || isStore32(Opcode) || isStoreImm(Opcode))
COREOp = BPF::CORE_ST;
else
continue;
@@ -160,7 +164,7 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI,
// Reject the form:
// %1 = ADD_rr %2, %3
// *(type *)(%2 + 0) = %1
-if (isSTX64(Opcode) || isSTX32(Opcode)) {
+if (isStore64(Opcode) || isStore32(Opcode)) {
const MachineOperand &Opnd = DefInst->getOperand(0);
if (Opnd.isReg() && Opnd.getReg() == MO.getReg())
continue;

5 changes: 5 additions & 0 deletions llvm/lib/Target/BPF/BPFSubtarget.cpp
@@ -40,6 +40,9 @@ static cl::opt<bool> Disable_gotol("disable-gotol", cl::Hidden, cl::init(false),
static cl::opt<bool>
Disable_StoreImm("disable-storeimm", cl::Hidden, cl::init(false),
cl::desc("Disable BPF_ST (immediate store) insn"));
static cl::opt<bool> Disable_load_acq_store_rel(
"disable-load-acq-store-rel", cl::Hidden, cl::init(false),
cl::desc("Disable load-acquire and store-release insns"));

void BPFSubtarget::anchor() {}

@@ -62,6 +65,7 @@ void BPFSubtarget::initializeEnvironment() {
HasSdivSmod = false;
HasGotol = false;
HasStoreImm = false;
HasLoadAcqStoreRel = false;
}

void BPFSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) {
@@ -91,6 +95,7 @@ void BPFSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) {
HasSdivSmod = !Disable_sdiv_smod;
HasGotol = !Disable_gotol;
HasStoreImm = !Disable_StoreImm;
HasLoadAcqStoreRel = !Disable_load_acq_store_rel;
return;
}
}