Skip to content

Commit

Permalink
x64: Refactor Amode computation in ISLE
Browse files Browse the repository at this point in the history
This commit replaces the previous computation of `Amode` with a
different set of rules that are intended to achieve the same purpose but
are structured differently. The motivation for this commit is going to
become more relevant in the next commit where `lea` will be used for the
`iadd` instruction, possibly, on x64. When doing so it caused a stack
overflow in the test suite during the compilation phase of a wasm
module, namely as part of the `amode_add` function. This function is
recursively defined in terms of itself and recurses as deep as the
deepest `iadd`-chain in a program. A particular test in our test suite
has a 10k-long chain of `iadd` which ended up causing a stack overflow
in debug mode.

This stack overflow is caused because the `amode_add` helper in ISLE
unconditionally peels all the `iadd` nodes away and looks at all of
them, even if most end up in intermediate registers along the way. Given
that structure I couldn't find a way to easily abort the recursion. The
new `to_amode` helper is structured in a similar fashion but attempts to
instead only recurse far enough to fold items into the final `Amode`
instead of recursing through items which themselves don't end up in the
`Amode`. Put another way previously the `amode_add` helper might emit
`x64_add` instructions, but it no longer does that.

This goal of this commit is to preserve all the original `Amode`
optimizations, however. For some parts, though, it relies more on egraph
optimizations to run since if an `iadd` is 10k deep it doesn't try to
find a constant buried 9k levels inside there to fold into the `Amode`.
The hope, though, is that with egraphs having run already it's shuffled
constants to the right most of the time and already folded any possible
together.
  • Loading branch information
alexcrichton committed Mar 10, 2023
1 parent e64fb6a commit c21359f
Show file tree
Hide file tree
Showing 18 changed files with 165 additions and 294 deletions.
214 changes: 43 additions & 171 deletions cranelift/codegen/src/isa/x64/inst.isle
Original file line number Diff line number Diff line change
Expand Up @@ -941,25 +941,6 @@
;; the given MachLabel.
(RipRelative (target MachLabel))))

;; Some Amode constructor helpers.

(decl amode_with_flags (Amode MemFlags) Amode)
(extern constructor amode_with_flags amode_with_flags)

(decl amode_imm_reg (u32 Gpr) Amode)
(extern constructor amode_imm_reg amode_imm_reg)

(decl amode_imm_reg_flags (u32 Gpr MemFlags) Amode)
(rule (amode_imm_reg_flags offset base flags)
(amode_with_flags (amode_imm_reg offset base) flags))

(decl amode_imm_reg_reg_shift (u32 Gpr Gpr u8) Amode)
(extern constructor amode_imm_reg_reg_shift amode_imm_reg_reg_shift)

(decl amode_imm_reg_reg_shift_flags (u32 Gpr Gpr u8 MemFlags) Amode)
(rule (amode_imm_reg_reg_shift_flags offset base index shift flags)
(amode_with_flags (amode_imm_reg_reg_shift offset base index shift) flags))

;; A helper to both check that the `Imm64` and `Offset32` values sum to less
;; than 32-bits AND return this summed `u32` value. Also, the `Imm64` will be
;; zero-extended from `Type` up to 64 bits. This is useful for `to_amode`.
Expand All @@ -968,168 +949,59 @@

;;;; Amode lowering ;;;;

;; To generate an address for a memory access, we can pattern-match
;; various CLIF sub-trees to x64's complex addressing modes (`Amode`).
;;
;; Information about available addressing modes is available in
;; Intel's Software Developer's Manual, volume 2, section 2.1.5,
;; "Addressing-Mode Encoding of ModR/M and SIB Bytes."
;;
;; The general strategy to build an `Amode` is to traverse over the
;; input expression's addends, recursively deconstructing a tree of
;; `iadd` operators that add up parts of the address, updating the
;; `Amode` in an incremental fashion as we add in each piece.
;;
;; We start with an "immediate + register" form that encapsulates the
;; load/store's built-in `Offset32` and `invalid_reg` as the
;; register. This is given by `amode_initial`. Then we add `Value`s
;; one at a time with `amode_add`. (Why start with `invalid_reg` at
;; all? Because we don't want to special-case the first input and
;; duplicate rules; this lets us use the "add a value" logic even for
;; the first value.)
;;
;; It is always valid to use `amode_add` to add the one single
;; `address` input to the load/store (i.e., the `Value` given to
;; `to_amode`). In the fallback case, this is what we do. Then we get
;; an `Amode.ImmReg` with the `Offset32` and `Value` below and nothing
;; else; this always works and is not *that* bad.
;;
;; But we can often do better. The toplevel rule for `iadd` below will
;; turn an `(amode_add amode (iadd a b))` into two invocations of
;; `amode_add`, for each operand of the `iadd`. This is what allows us
;; to handle sums of many parts.
;;
;; Then we "just" need to work out how we can incorporate a new
;; component into an existing addressing mode:
;;
;; - Case 1: When we have an `ImmReg` and the register is
;; `invalid_reg` (the initial `Amode` above), we can put the new
;; addend into a register and insert it into the `ImmReg`.
;;
;; - Case 2: When we have an `ImmReg` with a valid register already,
;; and we have another register to add, we can transition to an
;; `ImmRegRegShift`.
;;
;; - Case 3: When we're adding an `ishl`, we can refine the above rule
;; and use the built-in multiplier of 1, 2, 4, 8 to implement a
;; left-shift by 0, 1, 2, 3.
;;
;; - Case 4: When we are adding another constant offset, we can fold
;; it into the existing offset, as long as the sum still fits into
;; the signed 32-bit field.
;;
;; - Case 5: And as a general fallback, we can generate a new `add`
;; instruction and add the new addend to an existing component of
;; the `Amode`.
;; Converts a `Value` and a static offset into an `Amode` for x64, attempting
;; to be as fancy as possible with offsets/registers/shifts/etc to make maximal
;; use of the x64 addressing modes.
(decl to_amode (MemFlags Value Offset32) Amode)

;; Initial step in amode processing: create an ImmReg with
;; (invalid_reg) and encapsulating the flags and offset from the
;; load/store.
(decl amode_initial (MemFlags Offset32) Amode)
(rule (amode_initial flags (offset32 off))
(Amode.ImmReg off (invalid_reg) flags))
;; Base case, "just put it in a register"
(rule (to_amode flags base offset)
(Amode.ImmReg offset base flags))

;; One step in amode processing: take an existing amode and add
;; another value to it.
(decl amode_add (Amode Value) Amode)
;; Slightly-more-fancy case, if the address is the addition of two things then
;; delegate to the `to_amode_add` helper.
(rule 1 (to_amode flags (iadd x y) offset)
(to_amode_add flags x y offset))

;; -- Top-level driver: pull apart the addends.
;;
;; Any amode can absorb an `iadd` by absorbing first the LHS of the
;; add, then the RHS.
;;
;; Priority 2 to take this above fallbacks and ensure we traverse the
;; `iadd` tree fully.
(rule 2 (amode_add amode (iadd x y))
(let ((amode1 Amode (amode_add amode x))
(amode2 Amode (amode_add amode1 y)))
amode2))

;; -- Case 1 (adding a register to the initial Amode with invalid_reg).
;;
;; An Amode.ImmReg with invalid_reg (initial state) can absorb a
;; register as the base register.
(rule (amode_add (Amode.ImmReg off (invalid_reg) flags) value)
(Amode.ImmReg off value flags))
;; Same as `to_amode`, except that the base address is computed via the addition
;; of the two `Value` arguments provided.
(decl to_amode_add (MemFlags Value Value Offset32) Amode)

;; -- Case 2 (adding a register to an Amode with a register already).
;;
;; An Amode.ImmReg can absorb another register as the index register.
(rule (amode_add (Amode.ImmReg off (valid_reg base) flags) value)
;; Shift of 0 --> base + 1*value.
(Amode.ImmRegRegShift off base value 0 flags))
;; Base case, "just put things in registers". Note that the shift value of 0
;; here means `x + (y << 0)` which is the same as `x + y`.
(rule (to_amode_add flags x y offset)
(Amode.ImmRegRegShift offset x y 0 flags))

;; -- Case 3 (adding a shifted value to an Amode).
;;
;; An Amode.ImmReg can absorb a shift of another register as the index register.
;;
;; Priority 2 to take these rules above generic case.
(rule 2 (amode_add (Amode.ImmReg off (valid_reg base) flags) (ishl index (iconst (uimm8 shift))))
;; If the one of the arguments being added is itself a constant shift then
;; that can be modeled directly so long as the shift is a modestly small amount.
(rule 1 (to_amode_add flags x (ishl y (iconst (uimm8 shift))) offset)
(if (u32_lteq (u8_as_u32 shift) 3))
(Amode.ImmRegRegShift offset x y shift flags))
(rule 2 (to_amode_add flags (ishl y (iconst (uimm8 shift))) x offset)
(if (u32_lteq (u8_as_u32 shift) 3))
(Amode.ImmRegRegShift off base index shift flags))
(Amode.ImmRegRegShift offset x y shift flags))

;; -- Case 4 (absorbing constant offsets).
;;
;; An Amode can absorb a constant (i64, or extended i32) as long as
;; the sum still fits in the signed-32-bit offset.
;; Constant extraction rules.
;;
;; Priority 3 in order to take this option above the fallback
;; (immediate in register). Two rules, for imm+reg and
;; imm+reg+scale*reg cases.
(rule 3 (amode_add (Amode.ImmReg off base flags)
(iconst (simm32 c)))
(if-let sum (s32_add_fallible off c))
(Amode.ImmReg sum base flags))
(rule 3 (amode_add (Amode.ImmRegRegShift off base index shift flags)
(iconst (simm32 c)))
(if-let sum (s32_add_fallible off c))
(Amode.ImmRegRegShift sum base index shift flags))

;; Likewise for a zero-extended i32 const, as long as the constant
;; wasn't negative. (Why nonnegative? Because adding a
;; non-sign-extended negative to a 64-bit address is not the same as
;; adding in simm32-space.)
(rule 3 (amode_add (Amode.ImmReg off base flags)
(uextend (iconst (simm32 (u32_nonnegative c)))))
(if-let sum (s32_add_fallible off c))
(Amode.ImmReg sum base flags))
(rule 3 (amode_add (Amode.ImmRegRegShift off base index shift flags)
(uextend (iconst (simm32 (u32_nonnegative c)))))
(if-let sum (s32_add_fallible off c))
(Amode.ImmRegRegShift sum base index shift flags))

;; Likewise for a sign-extended i32 const.
(rule 3 (amode_add (Amode.ImmReg off base flags)
(sextend (iconst (simm32 c))))
(if-let sum (s32_add_fallible off c))
(Amode.ImmReg sum base flags))
(rule 3 (amode_add (Amode.ImmRegRegShift off base index shift flags)
(sextend (iconst (simm32 c))))
(if-let sum (s32_add_fallible off c))
(Amode.ImmRegRegShift sum base index shift flags))

;; -- Case 5 (fallback to add a new value to an imm+reg+scale*reg).
;; These rules attempt to find a constant within one of `x` or `y`, or deeper
;; within them if they have their own adds. These only succeed if the constant
;; itself can be represented with 32-bits and can be infallibly added to the
;; offset that we already have.
;;
;; An Amode.ImmRegRegShift can absorb any other value by creating a
;; new add instruction and replacing the base with
;; (base+value).
(rule (amode_add (Amode.ImmRegRegShift off base index shift flags) value)
(let ((sum Gpr (x64_add $I64 base value)))
(Amode.ImmRegRegShift off sum index shift flags)))

;; Finally, define the toplevel `to_amode`.
(rule (to_amode flags base @ (value_type (ty_addr64 _)) offset)
(amode_finalize (amode_add (amode_initial flags offset) base)))

;; If an amode has no registers at all and only offsets (a constant
;; value), we need to "finalize" it by sticking in a zero'd reg in
;; place of the (invalid_reg) produced by (amode_initial).
(decl amode_finalize (Amode) Amode)
(rule 1 (amode_finalize (Amode.ImmReg off (invalid_reg) flags))
(Amode.ImmReg off (imm $I64 0) flags))
(rule 0 (amode_finalize amode)
amode)
;; Note the recursion here where this rule is defined in terms of itself to
;; "peel" layers of constants.
(rule 3 (to_amode_add flags (iadd x (iconst (simm32 c))) y offset)
(if-let sum (s32_add_fallible offset c))
(to_amode_add flags x y sum))
(rule 4 (to_amode_add flags x (iadd y (iconst (simm32 c))) offset)
(if-let sum (s32_add_fallible offset c))
(to_amode_add flags x y sum))
(rule 5 (to_amode_add flags x (iconst (simm32 c)) offset)
(if-let sum (s32_add_fallible offset c))
(to_amode flags x sum))
(rule 6 (to_amode_add flags (iconst (simm32 c)) x offset)
(if-let sum (s32_add_fallible offset c))
(to_amode flags x sum))

;; Offsetting an Amode. Used when we need to do consecutive
;; loads/stores to adjacent addresses.
Expand Down
18 changes: 10 additions & 8 deletions cranelift/codegen/src/isa/x64/lower.isle
Original file line number Diff line number Diff line change
Expand Up @@ -445,10 +445,11 @@
(base_mask_addr Gpr (x64_lea mask_table))
(mask_offset Gpr (x64_shl $I64 amt
(imm8_to_imm8_gpr 4))))
(amode_imm_reg_reg_shift 0
base_mask_addr
mask_offset
0)))
(Amode.ImmRegRegShift 0
base_mask_addr
mask_offset
0
(mem_flags_trusted))))

(rule (ishl_i8x16_mask (RegMemImm.Mem amt))
(ishl_i8x16_mask (RegMemImm.Reg (x64_load $I64 amt (ExtKind.None)))))
Expand Down Expand Up @@ -550,10 +551,11 @@
(mask_offset Gpr (x64_shl $I64
amt
(imm8_to_imm8_gpr 4))))
(amode_imm_reg_reg_shift 0
base_mask_addr
mask_offset
0)))
(Amode.ImmRegRegShift 0
base_mask_addr
mask_offset
0
(mem_flags_trusted))))

(rule (ushr_i8x16_mask (RegMemImm.Mem amt))
(ushr_i8x16_mask (RegMemImm.Reg (x64_load $I64 amt (ExtKind.None)))))
Expand Down
15 changes: 0 additions & 15 deletions cranelift/codegen/src/isa/x64/lower/isle.rs
Original file line number Diff line number Diff line change
Expand Up @@ -315,21 +315,6 @@ impl Context for IsleContext<'_, '_, MInst, X64Backend> {
RegMem::mem(addr.clone())
}

#[inline]
fn amode_imm_reg_reg_shift(&mut self, simm32: u32, base: Gpr, index: Gpr, shift: u8) -> Amode {
Amode::imm_reg_reg_shift(simm32, base, index, shift)
}

#[inline]
fn amode_imm_reg(&mut self, simm32: u32, base: Gpr) -> Amode {
Amode::imm_reg(simm32, base.to_reg())
}

#[inline]
fn amode_with_flags(&mut self, amode: &Amode, flags: MemFlags) -> Amode {
amode.with_flags(flags)
}

#[inline]
fn amode_to_synthetic_amode(&mut self, amode: &Amode) -> SyntheticAmode {
amode.clone().into()
Expand Down
5 changes: 5 additions & 0 deletions cranelift/codegen/src/isle_prelude.rs
Original file line number Diff line number Diff line change
Expand Up @@ -643,6 +643,11 @@ macro_rules! isle_common_prelude_methods {
offset as u32
}

#[inline]
fn u32_to_offset32(&mut self, offset: u32) -> Offset32 {
Offset32::new(offset as i32)
}

fn range(&mut self, start: usize, end: usize) -> Range {
(start, end)
}
Expand Down
5 changes: 5 additions & 0 deletions cranelift/codegen/src/prelude.isle
Original file line number Diff line number Diff line change
Expand Up @@ -429,6 +429,10 @@
(decl pure offset32_to_u32 (Offset32) u32)
(extern constructor offset32_to_u32 offset32_to_u32)

;; Convert a number to an `Offset32`
(decl pure u32_to_offset32 (u32) Offset32)
(extern constructor u32_to_offset32 u32_to_offset32)

;; This is a direct import of `IntCC::unsigned`.
;; Get the corresponding IntCC with the signed component removed.
;; For conditions without a signed component, this is a no-op.
Expand Down Expand Up @@ -481,3 +485,4 @@
;;;; Automatic conversions ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(convert Offset32 u32 offset32_to_u32)
(convert u32 Offset32 u32_to_offset32)
2 changes: 2 additions & 0 deletions cranelift/filetests/filetests/isa/x64/amode-opt.clif
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
test compile precise-output
set use_egraphs=true
set opt_level=speed
target x86_64

function %amode_add(i64, i64) -> i64 {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -45,17 +45,17 @@
;; movq %rsp, %rbp
;; unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 }
;; block0:
;; movl %edi, %r11d
;; movq %r11, %rdi
;; addq %rdi, const(1), %rdi
;; movl %edi, %r8d
;; movq %r8, %r11
;; addq %r11, const(0), %r11
;; jnb ; ud2 heap_oob ;
;; movq 8(%rdx), %rax
;; cmpq %rax, %rdi
;; movq 8(%rdx), %rdi
;; cmpq %rdi, %r11
;; jnbe label3; j label1
;; block1:
;; movq 0(%rdx), %rdi
;; addq %rdi, const(0), %rdi
;; movl %esi, 0(%rdi,%r11,1)
;; addq %r8, 0(%rdx), %r8
;; movl $-65536, %eax
;; movl %esi, 0(%r8,%rax,1)
;; jmp label2
;; block2:
;; movq %rbp, %rsp
Expand All @@ -70,17 +70,17 @@
;; movq %rsp, %rbp
;; unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 }
;; block0:
;; movl %edi, %r11d
;; movq %r11, %rdi
;; addq %rdi, const(1), %rdi
;; movl %edi, %r8d
;; movq %r8, %r11
;; addq %r11, const(0), %r11
;; jnb ; ud2 heap_oob ;
;; movq 8(%rsi), %rax
;; cmpq %rax, %rdi
;; movq 8(%rsi), %rdi
;; cmpq %rdi, %r11
;; jnbe label3; j label1
;; block1:
;; movq 0(%rsi), %rsi
;; addq %rsi, const(0), %rsi
;; movl 0(%rsi,%r11,1), %eax
;; addq %r8, 0(%rsi), %r8
;; movl $-65536, %edi
;; movl 0(%r8,%rdi,1), %eax
;; jmp label2
;; block2:
;; movq %rbp, %rsp
Expand Down
Loading

0 comments on commit c21359f

Please sign in to comment.