-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
riscv64: Improve icmp
codegen
#6112
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for going through all of the cases and working out the correct sequences -- this is fairly subtle, with the minimal set of instructions we have on this ISA.
I share your concern/discomfort/... with the handling of signed i8
; I suspect maybe we can define a generic "sign-extend from this ty to Imm12" helper and avoid the special case; with that addressed, the rest LGTM to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the improved output! Before we merge this though I have some suggestions and questions.
This reverts commit 4bd46a6.
This was a fairly big rebase since most of these changes conflicted with #5888 which also added the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure all of these rules are correct. I've left notes on what I've been able to look at so far but I haven't thought through everything yet.
I think it's useful for gen_icmp_imm
to add special cases for Equal
and NotEqual
to 0, using seqz
and snez
respectively.
One thought: all comparisons where neither operand is constant can be converted to tmp = x - y
followed by gen_icmp_imm cc tmp 0
, right?
I think there's some way to confine much of the complexity to gen_icmp_imm
instead of having just as many cases also appearing in gen_icmp_inner
. I'm having trouble reasoning through the cases though. In particular you might want to use y - x
and reverse cc
for some cases, but maybe those cases are sufficiently covered by incrementing the constant.
There are more special cases involving constants which might be useful to write rules for. For example, sure, you can rewrite icmp ule x, 0
to icmp ult x, 1
, but both are equivalent to icmp eq x, 0
which has a dedicated instruction. That example isn't super important though if sltiu
and seqz
are both one instruction.
Sorry I don't have better organized thoughts on this. This is definitely making my brain hurt.
;; Helper for emitting the `sltu` ("Set Less Than Unsigned") instruction. | ||
;; rd ← rs1 < rs2 | ||
(decl rv_sltu (Reg Reg) Reg) | ||
(rule (rv_sltu rs1 rs2) | ||
(alu_rrr (AluOPRRR.SltU) rs1 rs2)) | ||
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like some trailing whitespace got added here. You can use git diff --check
to check a range of commits for any whitespace issues like that.
(rule 4 (gen_icmp_imm cc x imm ty) | ||
(if-let (IntCC.UnsignedGreaterThan) (intcc_unsigned cc)) | ||
(let ((res Reg (gen_icmp_imm (intcc_reverse cc) x (i64_add imm 1) ty))) | ||
(rv_xori res (imm12_const 1)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding 1 to the constant may overflow. It isn't immediately obvious to me that this rule is always equivalent in that case.
If the constant is the maximum value for the given ty
and the signedness of cc
, and cc
is *GreaterThan
, then for all x
this icmp
must return false.
If we add 1 modulo the width of ty
, then the result wraps around to the minimum for ty
/cc
. Then for all x
, !(x < min)
will be true. Which would be wrong.
However this doesn't add modulo the width of ty
. Instead, it's a signed 64-bit addition. If ty
is narrower than 64-bit, then x
is always less than the result (whether it's compared signed or unsigned), so the inverse of that result is always false, which is correct.
If ty
is I64
and cc
is SignedGreaterThan
, then the signed 64-bit addition overflows. Then checking signed less-than will be false, the final result will be true, so the rule is incorrect.
If ty
is I64
and cc
is UnsignedGreaterThan
, then the signed 64-bit addition does not overflow, but interpreting it as unsigned does overflow. Then checking unsigned less-than is false, etc.
So I think this rule is wrong in case of overflow when ty
is I64
, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this case used to be covered when we used Imm12 instead of i64. Since imm12_add
is partial we would have that automatically checked for us. I forgot to check for that when changing it to i64.
;; i.e. `x <= imm` is the same as `x < imm + 1`. | ||
(rule 2 (gen_icmp_imm cc x imm ty) | ||
(if-let (IntCC.UnsignedLessThanOrEqual) (intcc_unsigned cc)) | ||
(gen_icmp_imm (intcc_without_equal cc) x (i64_add imm 1) ty)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, I think this rule is wrong if adding 1 overflows and ty
is I64
.
(let ((extend_op ExtendOp (icmp_intcc_extend cc ty)) | ||
(x_ext Reg (extend x extend_op ty $I64)) | ||
(y_ext Reg (extend y extend_op ty $I64))) | ||
(gen_icmp_inner cc x_ext y_ext ty))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the operands were just extended to I64, should gen_icmp_inner
be called with $I64
instead of ty
?
(if-let imm (imm12_sextend_i64 ty n)) | ||
(rv_slti (sext x ty $I64) imm)) | ||
(rule 1 (gen_icmp_imm (IntCC.UnsignedLessThan) x n (fits_in_64 ty)) | ||
(if-let imm (imm12_sextend_i64 ty n)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is sign-extending the constant the right thing to do for an unsigned comparison?
Subscribe to Label Action
This issue or pull request has been labeled: "cranelift", "isle"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
I'm closing this because I'm not going to look into it right now, and it looks like @alexcrichton is looking into something similar in #7113. Nevertheless thanks @jameysharp for all of the feedback and helping me along with this ❤️! |
This commit is the first in a few steps to reimplement bytecodealliance#6112 and bytecodealliance#7113. The `Icmp` pseudo-instruction is removed here and necessary functionality is all pushed into ISLE lowerings. This enables deleting the `lower_br_icmp` helper in `emit.rs` as it's no longer necessary, meaning that all conditional branches should ideally be generated in lowering rather than pseudo-instructions to benefit from various optimizations. Currently the lowering is the bare-bones minimum to get things working. This involved creating a helper to lower an `IntegerCompare` into an `XReg` via negation/swapping args/etc. In generated code this removes branches and makes `icmp` a straight-line instruction for non-128-bit arguments.
* riscv64: Constants are always sign-extended Skip generating extra sign-extension instructions in this case because the materialization of a constant will implicitly sign-extend into the entire register. * riscv64: Rename `lower_int_compare` helper Try to reserve `lower_*` as taking a thing and producing a `*Reg`. Rename this helper to `is_nonzero_cmp` to represent how it's testing for nonzero and producing a comparison. * riscv64: Rename some float comparison helpers * `FCmp` -> `FloatCompare` * `emit_fcmp` -> `fcmp_to_float_compare` * `lower_fcmp` -> `lower_float_compare` Make some room for upcoming integer comparison functions. * riscv64: Remove `ordered` helper This is only used by one lowering so inline its definition directly. * riscv64: Remove the `Icmp` pseudo-instruction This commit is the first in a few steps to reimplement #6112 and #7113. The `Icmp` pseudo-instruction is removed here and necessary functionality is all pushed into ISLE lowerings. This enables deleting the `lower_br_icmp` helper in `emit.rs` as it's no longer necessary, meaning that all conditional branches should ideally be generated in lowering rather than pseudo-instructions to benefit from various optimizations. Currently the lowering is the bare-bones minimum to get things working. This involved creating a helper to lower an `IntegerCompare` into an `XReg` via negation/swapping args/etc. In generated code this removes branches and makes `icmp` a straight-line instruction for non-128-bit arguments. * riscv64: Remove an unused helper * riscv64: Optimize comparisons with 0 Use the `x0` register which is always zero where possible and avoid unnecessary `xor`s against this register. * riscv64: Specialize `a < $imm` * riscv64: Optimize Equal/NotEqual against constants * riscv64: Optimize LessThan with constant argument * Optimize `a <= $imm` * riscv64: Optimize `a >= $imm` * riscv64: Add comment for new helper * Use i64 in icmp optimizations Matches the sign-extension that happens at the hardware layer. * Correct some sign extensions * riscv64: Don't assume immediates are extended * riscv64: Fix encoding for `c.addi4spn` * riscv64: Remove `icmp` lowerings which modify constants These aren't correct and will need updating * Add regression test * riscv64: Fix handling unsigned comparisons with constants --------- Co-authored-by: Afonso Bordado <[email protected]>
👋 Hey,
This PR improves our current lowerings for
icmp
and moves them to ISLE. We currently emit a 4 instruction sequence that moves 1 or 0 to a register based on a branch. This PR changes the lowerings to use the dedicatedicmp
instructions.These are:
slt rd, rs1, rs2
sltu rd, rs1, rs2
slti rd, rs1, imm
sltu rd, rs1, imm
All other
IntCC
's are handled with some variation of the above. Additionally I've also added some optimizations when the comparision is done with an immediate.This has been fuzzing for most of today (~8h) without any issues.