Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv64: Improve icmp codegen #6112

Closed
wants to merge 23 commits into from

Conversation

afonso360
Copy link
Contributor

👋 Hey,

This PR improves our current lowerings for icmp and moves them to ISLE. We currently emit a 4 instruction sequence that moves 1 or 0 to a register based on a branch. This PR changes the lowerings to use the dedicated icmp instructions.

These are:

Instruction Description
slt rd, rs1, rs2 Set Less Than
sltu rd, rs1, rs2 Set Less Than Unsigned
slti rd, rs1, imm Set Less Than (Immediate)
sltu rd, rs1, imm Set Less Than Unsigned (Immediate)

All other IntCC's are handled with some variation of the above. Additionally I've also added some optimizations when the comparision is done with an immediate.

This has been fuzzing for most of today (~8h) without any issues.

@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Mar 28, 2023
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for going through all of the cases and working out the correct sequences -- this is fairly subtle, with the minimal set of instructions we have on this ISA.

I share your concern/discomfort/... with the handling of signed i8; I suspect maybe we can define a generic "sign-extend from this ty to Imm12" helper and avoid the special case; with that addressed, the rest LGTM to me.

cranelift/codegen/src/isa/riscv64/inst.isle Outdated Show resolved Hide resolved
Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the improved output! Before we merge this though I have some suggestions and questions.

cranelift/codegen/src/isa/riscv64/inst.isle Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/riscv64/inst/emit.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/riscv64/inst/emit.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/riscv64/inst.isle Outdated Show resolved Hide resolved
cranelift/codegen/src/isle_prelude.rs Outdated Show resolved Hide resolved
@afonso360 afonso360 requested a review from a team as a code owner April 7, 2023 12:37
@afonso360 afonso360 requested review from fitzgen and removed request for a team April 7, 2023 12:37
@afonso360
Copy link
Contributor Author

This was a fairly big rebase since most of these changes conflicted with #5888 which also added the snez mnemonic. I think I got it right, at least nothing seems too out of place.

Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure all of these rules are correct. I've left notes on what I've been able to look at so far but I haven't thought through everything yet.

I think it's useful for gen_icmp_imm to add special cases for Equal and NotEqual to 0, using seqz and snez respectively.

One thought: all comparisons where neither operand is constant can be converted to tmp = x - y followed by gen_icmp_imm cc tmp 0, right?

I think there's some way to confine much of the complexity to gen_icmp_imm instead of having just as many cases also appearing in gen_icmp_inner. I'm having trouble reasoning through the cases though. In particular you might want to use y - x and reverse cc for some cases, but maybe those cases are sufficiently covered by incrementing the constant.

There are more special cases involving constants which might be useful to write rules for. For example, sure, you can rewrite icmp ule x, 0 to icmp ult x, 1, but both are equivalent to icmp eq x, 0 which has a dedicated instruction. That example isn't super important though if sltiu and seqz are both one instruction.

Sorry I don't have better organized thoughts on this. This is definitely making my brain hurt.

;; Helper for emitting the `sltu` ("Set Less Than Unsigned") instruction.
;; rd ← rs1 < rs2
(decl rv_sltu (Reg Reg) Reg)
(rule (rv_sltu rs1 rs2)
(alu_rrr (AluOPRRR.SltU) rs1 rs2))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like some trailing whitespace got added here. You can use git diff --check to check a range of commits for any whitespace issues like that.

(rule 4 (gen_icmp_imm cc x imm ty)
(if-let (IntCC.UnsignedGreaterThan) (intcc_unsigned cc))
(let ((res Reg (gen_icmp_imm (intcc_reverse cc) x (i64_add imm 1) ty)))
(rv_xori res (imm12_const 1))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding 1 to the constant may overflow. It isn't immediately obvious to me that this rule is always equivalent in that case.

If the constant is the maximum value for the given ty and the signedness of cc, and cc is *GreaterThan, then for all x this icmp must return false.

If we add 1 modulo the width of ty, then the result wraps around to the minimum for ty/cc. Then for all x, !(x < min) will be true. Which would be wrong.

However this doesn't add modulo the width of ty. Instead, it's a signed 64-bit addition. If ty is narrower than 64-bit, then x is always less than the result (whether it's compared signed or unsigned), so the inverse of that result is always false, which is correct.

If ty is I64 and cc is SignedGreaterThan, then the signed 64-bit addition overflows. Then checking signed less-than will be false, the final result will be true, so the rule is incorrect.

If ty is I64 and cc is UnsignedGreaterThan, then the signed 64-bit addition does not overflow, but interpreting it as unsigned does overflow. Then checking unsigned less-than is false, etc.

So I think this rule is wrong in case of overflow when ty is I64, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this case used to be covered when we used Imm12 instead of i64. Since imm12_add is partial we would have that automatically checked for us. I forgot to check for that when changing it to i64.

;; i.e. `x <= imm` is the same as `x < imm + 1`.
(rule 2 (gen_icmp_imm cc x imm ty)
(if-let (IntCC.UnsignedLessThanOrEqual) (intcc_unsigned cc))
(gen_icmp_imm (intcc_without_equal cc) x (i64_add imm 1) ty))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I think this rule is wrong if adding 1 overflows and ty is I64.

(let ((extend_op ExtendOp (icmp_intcc_extend cc ty))
(x_ext Reg (extend x extend_op ty $I64))
(y_ext Reg (extend y extend_op ty $I64)))
(gen_icmp_inner cc x_ext y_ext ty)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the operands were just extended to I64, should gen_icmp_inner be called with $I64 instead of ty?

(if-let imm (imm12_sextend_i64 ty n))
(rv_slti (sext x ty $I64) imm))
(rule 1 (gen_icmp_imm (IntCC.UnsignedLessThan) x n (fits_in_64 ty))
(if-let imm (imm12_sextend_i64 ty n))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is sign-extending the constant the right thing to do for an unsigned comparison?

@github-actions github-actions bot added the isle Related to the ISLE domain-specific language label May 9, 2023
@github-actions
Copy link

github-actions bot commented May 9, 2023

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@afonso360
Copy link
Contributor Author

I'm closing this because I'm not going to look into it right now, and it looks like @alexcrichton is looking into something similar in #7113.

Nevertheless thanks @jameysharp for all of the feedback and helping me along with this ❤️!

@afonso360 afonso360 closed this Sep 29, 2023
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Oct 10, 2023
This commit is the first in a few steps to reimplement bytecodealliance#6112 and bytecodealliance#7113.
The `Icmp` pseudo-instruction is removed here and necessary
functionality is all pushed into ISLE lowerings. This enables deleting
the `lower_br_icmp` helper in `emit.rs` as it's no longer necessary,
meaning that all conditional branches should ideally be generated in
lowering rather than pseudo-instructions to benefit from various
optimizations.

Currently the lowering is the bare-bones minimum to get things working.
This involved creating a helper to lower an `IntegerCompare` into an
`XReg` via negation/swapping args/etc. In generated code this removes
branches and makes `icmp` a straight-line instruction for non-128-bit
arguments.
github-merge-queue bot pushed a commit that referenced this pull request Oct 16, 2023
* riscv64: Constants are always sign-extended

Skip generating extra sign-extension instructions in this case because
the materialization of a constant will implicitly sign-extend into the
entire register.

* riscv64: Rename `lower_int_compare` helper

Try to reserve `lower_*` as taking a thing and producing a `*Reg`.
Rename this helper to `is_nonzero_cmp` to represent how it's testing for
nonzero and producing a comparison.

* riscv64: Rename some float comparison helpers

* `FCmp` -> `FloatCompare`
* `emit_fcmp` -> `fcmp_to_float_compare`
* `lower_fcmp` -> `lower_float_compare`

Make some room for upcoming integer comparison functions.

* riscv64: Remove `ordered` helper

This is only used by one lowering so inline its definition directly.

* riscv64: Remove the `Icmp` pseudo-instruction

This commit is the first in a few steps to reimplement #6112 and #7113.
The `Icmp` pseudo-instruction is removed here and necessary
functionality is all pushed into ISLE lowerings. This enables deleting
the `lower_br_icmp` helper in `emit.rs` as it's no longer necessary,
meaning that all conditional branches should ideally be generated in
lowering rather than pseudo-instructions to benefit from various
optimizations.

Currently the lowering is the bare-bones minimum to get things working.
This involved creating a helper to lower an `IntegerCompare` into an
`XReg` via negation/swapping args/etc. In generated code this removes
branches and makes `icmp` a straight-line instruction for non-128-bit
arguments.

* riscv64: Remove an unused helper

* riscv64: Optimize comparisons with 0

Use the `x0` register which is always zero where possible and avoid
unnecessary `xor`s against this register.

* riscv64: Specialize `a < $imm`

* riscv64: Optimize Equal/NotEqual against constants

* riscv64: Optimize LessThan with constant argument

* Optimize `a <= $imm`

* riscv64: Optimize `a >= $imm`

* riscv64: Add comment for new helper

* Use i64 in icmp optimizations

Matches the sign-extension that happens at the hardware layer.

* Correct some sign extensions

* riscv64: Don't assume immediates are extended

* riscv64: Fix encoding for `c.addi4spn`

* riscv64: Remove `icmp` lowerings which modify constants

These aren't correct and will need updating

* Add regression test

* riscv64: Fix handling unsigned comparisons with constants

---------

Co-authored-by: Afonso Bordado <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants