riscv64: Improve `icmp` codegen #6112

afonso360 · 2023-03-28T16:36:17Z

👋 Hey,

This PR improves our current lowerings for `icmp` and moves them to ISLE. We currently emit a four-instruction sequence that moves 1 or 0 into a register based on a branch. This PR changes the lowerings to use the dedicated comparison instructions.

These are:

| Instruction | Description |
| --- | --- |
| `slt rd, rs1, rs2` | Set Less Than |
| `sltu rd, rs1, rs2` | Set Less Than Unsigned |
| `slti rd, rs1, imm` | Set Less Than (Immediate) |
| `sltiu rd, rs1, imm` | Set Less Than Unsigned (Immediate) |

All other `IntCC`s are handled with some variation of the above. Additionally, I've added some optimizations for when the comparison is done with an immediate.
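To make "some variation of the above" concrete, here is a minimal Rust sketch of how the remaining condition codes can be derived from set-less-than by swapping operands, flipping the result bit (as `xori rd, rd, 1` would), or XOR-ing the operands first. The function names are hypothetical and model the instructions in plain Rust. These are the usual textbook sequences, not necessarily the exact rules in this PR.

```rust
/// Model of `slt rd, rs1, rs2`: rd = 1 if rs1 < rs2 as signed values, else 0.
fn slt(rs1: i64, rs2: i64) -> u64 {
    (rs1 < rs2) as u64
}

/// Model of `sltu rd, rs1, rs2`: the same comparison on unsigned values.
fn sltu(rs1: u64, rs2: u64) -> u64 {
    (rs1 < rs2) as u64
}

// Signed conditions: swap the operands and/or flip the low bit of the result.
fn icmp_sgt(x: i64, y: i64) -> u64 { slt(y, x) }
fn icmp_sge(x: i64, y: i64) -> u64 { slt(x, y) ^ 1 }
fn icmp_sle(x: i64, y: i64) -> u64 { slt(y, x) ^ 1 }

// Unsigned conditions: the same shapes built on `sltu`.
fn icmp_ugt(x: u64, y: u64) -> u64 { sltu(y, x) }
fn icmp_uge(x: u64, y: u64) -> u64 { sltu(x, y) ^ 1 }
fn icmp_ule(x: u64, y: u64) -> u64 { sltu(y, x) ^ 1 }

// Equality: XOR the operands, then test the difference against zero.
// `sltu(d, 1)` corresponds to `seqz` and `sltu(0, d)` to `snez`.
fn icmp_eq(x: u64, y: u64) -> u64 { sltu(x ^ y, 1) }
fn icmp_ne(x: u64, y: u64) -> u64 { sltu(0, x ^ y) }
```

Each derived condition costs at most one extra instruction on top of the compare, which is where the saving over the branch-based sequence comes from.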

This has been fuzzing for most of today (~8h) without any issues.

cranelift/codegen/src/isa/riscv64/inst.isle

cfallin

Thanks for going through all of the cases and working out the correct sequences -- this is fairly subtle, with the minimal set of instructions we have on this ISA.

I share your concern/discomfort/... with the handling of signed i8; I suspect maybe we can define a generic "sign-extend from this ty to Imm12" helper and avoid the special case; with that addressed, the rest LGTM to me.

cranelift/codegen/src/isa/riscv64/inst.isle

jameysharp

I like the improved output! Before we merge this though I have some suggestions and questions.

cranelift/codegen/src/isa/riscv64/inst.isle

cranelift/codegen/src/isa/riscv64/inst/emit.rs

cranelift/filetests/filetests/isa/riscv64/icmp-imm-lhs.clif

cranelift/codegen/src/isa/riscv64/inst/emit.rs

cranelift/codegen/src/isa/riscv64/inst.isle

cranelift/codegen/src/isle_prelude.rs

afonso360 · 2023-04-13T14:02:03Z

This was a fairly big rebase since most of these changes conflicted with #5888, which also added the `snez` mnemonic. I think I got it right; at least nothing seems too out of place.

jameysharp

I'm not sure all of these rules are correct. I've left notes on what I've been able to look at so far but I haven't thought through everything yet.

I think it's useful for `gen_icmp_imm` to add special cases for `Equal` and `NotEqual` to 0, using `seqz` and `snez` respectively.

One thought: all comparisons where neither operand is constant can be converted to `tmp = x - y` followed by `gen_icmp_imm cc tmp 0`, right?
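A quick way to probe that question is to model the idea in plain Rust. The sketch below assumes `sub` wraps on overflow, as the RISC-V instruction does; the function names are hypothetical. Under that assumption the rewrite looks safe for `Equal`/`NotEqual`, but an ordered comparison can flip when `x - y` wraps.

```rust
/// `icmp eq x, y` modelled as `tmp = x - y; tmp == 0`.
fn eq_via_sub(x: i64, y: i64) -> bool {
    x.wrapping_sub(y) == 0
}

/// `icmp slt x, y` modelled as `tmp = x - y; tmp < 0`.
/// The subtraction wraps, so the sign of `tmp` is not always the sign
/// of the true mathematical difference.
fn slt_via_sub(x: i64, y: i64) -> bool {
    x.wrapping_sub(y) < 0
}
```

For example, `i64::MIN < 1` is true, but `i64::MIN - 1` wraps to `i64::MAX`, so `slt_via_sub(i64::MIN, 1)` reports false.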

I think there's some way to confine much of the complexity to `gen_icmp_imm` instead of having just as many cases also appearing in `gen_icmp_inner`. I'm having trouble reasoning through the cases though. In particular you might want to use `y - x` and reverse `cc` for some cases, but maybe those cases are sufficiently covered by incrementing the constant.

There are more special cases involving constants which might be useful to write rules for. For example, sure, you can rewrite `icmp ule x, 0` to `icmp ult x, 1`, but both are equivalent to `icmp eq x, 0`, which has a dedicated instruction. That example isn't super important though if `sltiu` and `seqz` are both one instruction.
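The three-way equivalence in that example can be checked with a small Rust model. The helper names are hypothetical and each function stands in for one of the three forms mentioned above.

```rust
/// `icmp ule x, imm` on unsigned 64-bit values.
fn icmp_ule(x: u64, imm: u64) -> bool {
    x <= imm
}

/// `icmp ult x, imm`, the shape that `sltiu` computes directly.
fn icmp_ult(x: u64, imm: u64) -> bool {
    x < imm
}

/// `seqz rd, rs`, i.e. `icmp eq x, 0`.
fn seqz(x: u64) -> bool {
    x == 0
}
```

For every `x`, `icmp_ule(x, 0)`, `icmp_ult(x, 1)`, and `seqz(x)` agree, so choosing between them is purely a matter of which single instruction the backend prefers to emit.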

Sorry I don't have better organized thoughts on this. This is definitely making my brain hurt.

jameysharp · 2023-04-13T16:36:11Z

cranelift/codegen/src/isa/riscv64/inst.isle

 ;; Helper for emitting the `sltu` ("Set Less Than Unsigned") instruction.
 ;; rd ← rs1 < rs2
 (decl rv_sltu (Reg Reg) Reg)
 (rule (rv_sltu rs1 rs2)
  (alu_rrr (AluOPRRR.SltU) rs1 rs2))
-
+  


Looks like some trailing whitespace got added here. You can use `git diff --check` to check a range of commits for any whitespace issues like that.

jameysharp · 2023-04-13T19:20:34Z

cranelift/codegen/src/isa/riscv64/inst.isle

+(rule 4 (gen_icmp_imm cc x imm ty)
+  (if-let (IntCC.UnsignedGreaterThan) (intcc_unsigned cc))
+  (let ((res Reg (gen_icmp_imm (intcc_reverse cc) x (i64_add imm 1) ty)))
+    (rv_xori res (imm12_const 1))))


Adding 1 to the constant may overflow. It isn't immediately obvious to me that this rule is always equivalent in that case.

If the constant is the maximum value for the given ty and the signedness of cc, and cc is *GreaterThan, then for all x this icmp must return false.

If we add 1 modulo the width of ty, then the result wraps around to the minimum for ty/cc. Then for all x, !(x < min) will be true. Which would be wrong.

However this doesn't add modulo the width of ty. Instead, it's a signed 64-bit addition. If ty is narrower than 64-bit, then x is always less than the result (whether it's compared signed or unsigned), so the inverse of that result is always false, which is correct.

If ty is I64 and cc is SignedGreaterThan, then the signed 64-bit addition overflows. Then checking signed less-than will be false, the final result will be true, so the rule is incorrect.

If ty is I64 and cc is UnsignedGreaterThan, then the signed 64-bit addition does not overflow, but interpreting it as unsigned does overflow. Then checking unsigned less-than is false, etc.

So I think this rule is wrong in case of overflow when ty is I64, isn't it?

Yeah, this case used to be covered when we used `Imm12` instead of `i64`. Since `imm12_add` is partial, the overflow was checked for us automatically. I forgot to add that check when changing it to `i64`.
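The overflow cases discussed in this thread can be reproduced with a small Rust model of the rewrite `x > imm` to `!(x < imm + 1)`. This is only a sketch: the function names are hypothetical, and it assumes the addition wraps at 64 bits, which the thread does not confirm for `i64_add`.

```rust
/// Reference semantics for `icmp ugt x, imm` at 64 bits.
fn ugt_reference(x: u64, imm: u64) -> bool {
    x > imm
}

/// The rewrite for the unsigned case, with a wrapping `imm + 1` (assumption).
fn ugt_rewritten(x: u64, imm: u64) -> bool {
    !(x < imm.wrapping_add(1))
}

/// Reference semantics for `icmp sgt x, imm` at 64 bits.
fn sgt_reference(x: i64, imm: i64) -> bool {
    x > imm
}

/// The rewrite for the signed case, again with a wrapping add (assumption).
fn sgt_rewritten(x: i64, imm: i64) -> bool {
    !(x < imm.wrapping_add(1))
}

/// The narrow case: an `i8` comparison where the constant is bumped using
/// 64-bit arithmetic, so `imm + 1` cannot wrap.
fn sgt_i8_rewritten(x: i8, imm: i8) -> bool {
    !((x as i64) < (imm as i64) + 1)
}
```

With `imm` at the maximum for its signedness, both 64-bit rewrites return true for every `x` while the reference returns false. The `i8` variant stays correct because the bumped constant still fits in the wider type.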

jameysharp · 2023-04-13T19:21:15Z

cranelift/codegen/src/isa/riscv64/inst.isle

+;; i.e. `x <= imm` is the same as `x < imm + 1`.
+(rule 2 (gen_icmp_imm cc x imm ty)
+  (if-let (IntCC.UnsignedLessThanOrEqual) (intcc_unsigned cc))
+  (gen_icmp_imm (intcc_without_equal cc) x (i64_add imm 1) ty))


Similarly, I think this rule is wrong if adding 1 overflows and ty is I64.
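The same failure can be sketched for the `x <= imm` to `x < imm + 1` rewrite. As before, the names are hypothetical and the wrapping add is an assumption. The guarded variant illustrates one possible fix in the spirit of a partial `imm12_add`: decline the rewrite when the increment would wrap.

```rust
/// Reference semantics for `icmp ule x, imm` at 64 bits.
fn ule_reference(x: u64, imm: u64) -> bool {
    x <= imm
}

/// The rewrite `x < imm + 1` with a wrapping add (assumption).
fn ule_rewritten(x: u64, imm: u64) -> bool {
    x < imm.wrapping_add(1)
}

/// Hypothetical guarded variant: returns `None` when `imm + 1` would wrap,
/// so a caller could fall back to a different lowering.
fn ule_guarded(x: u64, imm: u64) -> Option<bool> {
    imm.checked_add(1).map(|bumped| x < bumped)
}
```

`x <= u64::MAX` is true for every `x`, but the wrapped rewrite asks whether `x < 0`, which is never true for an unsigned value.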

jameysharp · 2023-04-13T19:24:33Z

cranelift/codegen/src/isa/riscv64/inst.isle

+  (let ((extend_op ExtendOp (icmp_intcc_extend cc ty))
+        (x_ext Reg (extend x extend_op ty $I64))
+        (y_ext Reg (extend y extend_op ty $I64)))
+      (gen_icmp_inner cc x_ext y_ext ty)))


Given that the operands were just extended to `I64`, should `gen_icmp_inner` be called with `$I64` instead of `ty`?
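As background for this question, the following Rust sketch models why the extension has to match the signedness of the condition code, and why the comparison is effectively a full 64-bit one once both operands are extended. It uses `i8` as the narrow type and hypothetical helper names. It does not settle which type the rule should pass on.

```rust
/// Sign-extend an 8-bit value to 64 bits.
fn sext8(v: u8) -> i64 {
    v as i8 as i64
}

/// Zero-extend an 8-bit value to 64 bits.
fn zext8(v: u8) -> u64 {
    v as u64
}

/// Signed `icmp slt` on i8 operands, done as a 64-bit signed compare
/// of the sign-extended values.
fn icmp_slt_i8(a: u8, b: u8) -> bool {
    sext8(a) < sext8(b)
}

/// Unsigned `icmp ult` on i8 operands, done as a 64-bit unsigned compare
/// of the zero-extended values.
fn icmp_ult_i8(a: u8, b: u8) -> bool {
    zext8(a) < zext8(b)
}
```

Mixing the two gives wrong answers: `0x80` is -128 as a signed byte and so less than 1, but its zero-extended form is 128, which compares greater than 1 under a signed 64-bit compare.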

jameysharp · 2023-04-13T19:29:12Z

cranelift/codegen/src/isa/riscv64/inst.isle

+  (if-let imm (imm12_sextend_i64 ty n))
+  (rv_slti (sext x ty $I64) imm))
+(rule 1 (gen_icmp_imm (IntCC.UnsignedLessThan) x n (fits_in_64 ty)) 
+  (if-let imm (imm12_sextend_i64 ty n))


Why is sign-extending the constant the right thing to do for an unsigned comparison?
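One piece of context for this question is that `sltiu` sign-extends its 12-bit immediate to 64 bits and only then compares the two values as unsigned. The Rust sketch below models that behaviour for `i8` operands; the helper names are hypothetical, and how the quoted rule extends `x` is not visible in the excerpt. In this model, a sign-extended constant gives the right answer when `x` is sign-extended too, but not when `x` is zero-extended.

```rust
/// Model of `sltiu rd, rs1, imm`: the 12-bit immediate is sign-extended
/// to 64 bits and then compared against rs1 as an unsigned value.
fn sltiu(rs1: u64, imm12: i16) -> bool {
    debug_assert!((-2048..=2047).contains(&imm12));
    rs1 < (imm12 as i64 as u64)
}

/// Sign-extend an 8-bit value to 64 bits.
fn sext8(v: u8) -> i64 {
    v as i8 as i64
}

/// Unsigned i8 compare `x < c` where both the register operand and the
/// constant are sign-extended from 8 bits before the `sltiu`.
fn ult_i8_const(x: u8, c: u8) -> bool {
    sltiu(sext8(x) as u64, c as i8 as i16)
}
```

Sign extension maps 0..=127 to themselves and 128..=255 to the top of the 64-bit unsigned range, so unsigned ordering is preserved as long as both sides are extended the same way. With a zero-extended `x = 0xFF` and the constant `0xFF` sign-extended to -1, the model reports `x < c` even though the two are equal.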

github-actions · 2023-05-09T23:44:59Z

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "isle"

Thus the following users have been cc'd because of the following labels:

cfallin: isle
fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

afonso360 · 2023-09-29T21:52:06Z

I'm closing this because I'm not going to look into it right now, and it looks like @alexcrichton is looking into something similar in #7113.

Nevertheless thanks @jameysharp for all of the feedback and helping me along with this ❤️!

* riscv64: Constants are always sign-extended
  Skip generating extra sign-extension instructions in this case because the materialization of a constant will implicitly sign-extend into the entire register.
* riscv64: Rename `lower_int_compare` helper
  Try to reserve `lower_*` as taking a thing and producing a `*Reg`. Rename this helper to `is_nonzero_cmp` to represent how it's testing for nonzero and producing a comparison.
* riscv64: Rename some float comparison helpers
  * `FCmp` -> `FloatCompare`
  * `emit_fcmp` -> `fcmp_to_float_compare`
  * `lower_fcmp` -> `lower_float_compare`

  Make some room for upcoming integer comparison functions.
* riscv64: Remove `ordered` helper
  This is only used by one lowering so inline its definition directly.
* riscv64: Remove the `Icmp` pseudo-instruction
  This commit is the first in a few steps to reimplement #6112 and #7113. The `Icmp` pseudo-instruction is removed here and necessary functionality is all pushed into ISLE lowerings. This enables deleting the `lower_br_icmp` helper in `emit.rs` as it's no longer necessary, meaning that all conditional branches should ideally be generated in lowering rather than pseudo-instructions to benefit from various optimizations. Currently the lowering is the bare-bones minimum to get things working. This involved creating a helper to lower an `IntegerCompare` into an `XReg` via negation/swapping args/etc. In generated code this removes branches and makes `icmp` a straight-line instruction for non-128-bit arguments.
* riscv64: Remove an unused helper
* riscv64: Optimize comparisons with 0
  Use the `x0` register which is always zero where possible and avoid unnecessary `xor`s against this register.
* riscv64: Specialize `a < $imm`
* riscv64: Optimize Equal/NotEqual against constants
* riscv64: Optimize LessThan with constant argument
* Optimize `a <= $imm`
* riscv64: Optimize `a >= $imm`
* riscv64: Add comment for new helper
* Use i64 in icmp optimizations
  Matches the sign-extension that happens at the hardware layer.
* Correct some sign extensions
* riscv64: Don't assume immediates are extended
* riscv64: Fix encoding for `c.addi4spn`
* riscv64: Remove `icmp` lowerings which modify constants
  These aren't correct and will need updating
* Add regression test
* riscv64: Fix handling unsigned comparisons with constants

---------

Co-authored-by: Afonso Bordado <[email protected]>

github-actions bot added the cranelift Issues related to the Cranelift code generator label Mar 28, 2023

cfallin reviewed Mar 29, 2023

jameysharp reviewed Mar 29, 2023

afonso360 requested a review from a team as a code owner April 7, 2023 12:37

afonso360 requested review from fitzgen and removed request for a team April 7, 2023 12:37

afonso360 force-pushed the riscv-icmp branch from c179c79 to 9d080a8 Compare April 7, 2023 12:59

afonso360 added 22 commits April 13, 2023 14:30

riscv64: Add icmp tests

47b2c59

riscv64: Move lower_icmp to inst.isle

a083500

riscv64: Optimize icmp codegen for <=i64 types

fda1f38

riscv64: Improve icmp.i128 codegen

472d593

riscv64: Add icmp_imm fallback case

3211c73

riscv64: Add sltz/sgtz mnemonics

6a0662e

riscv64: Add optimizations for icmp.slt with 0

a99597a

riscv64: Add optimizations for icmp.sgt with 0

90fc5e4

riscv64: Add icmp.slt with imm12 const optimizations

eecd74d

riscv64: Add icmp.ult with imm12 const optimizations

172709b

riscv64: Use better extractors for icmp

dbf51a1

riscv64: Reorder icmp rules

b83324f

riscv64: Improve Icmp Imm codegen

e3bcf75

cranelift: Delete u64_gt

e822ca5

riscv64: Add Signedness struct

0298962

riscv64: Clarify comment

1050eac

Revert "riscv64: Add Signedness struct"

297e44d

This reverts commit 4bd46a6.

riscv64: Use select_reg for i128 rules

2484e7f

riscv64: Handle icmp+imm when the imm is on the LHS

2e729fb

riscv64: Improve handling of i8 icmp_imm's

efd86ed

riscv64: Remove unused code

17ff0e6

riscv64: Fixup tests

9ec9ad7

riscv64: Cleanup icmp rules

0e419c4

afonso360 force-pushed the riscv-icmp branch from 6e8aad6 to 0e419c4 Compare April 13, 2023 13:52

jameysharp reviewed Apr 13, 2023

github-actions bot added the isle Related to the ISLE domain-specific language label May 9, 2023

afonso360 closed this Sep 29, 2023

alexcrichton mentioned this pull request Oct 10, 2023

riscv64: Improve codegen for icmp #7203

Merged
