-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
riscv64: Improve ctz
/clz
/cls
codegen
#5854
riscv64: Improve ctz
/clz
/cls
codegen
#5854
Conversation
I've tried to review this a couple of times and have not succeeded yet. I think it would help me if you can split this into smaller PRs. It looks like you already structured individual commits carefully, but I think the main thing I want in this case is to keep separate the movement of definitions from the actual changes in lowering. I'm open to alternative suggestions, but would you mind opening a new PR that moves existing definitions without changing them first? I'm hoping that makes the overall diff for this PR easier to review as it should only have the actual logic changes. If that's too much trouble, I'm happy to discuss other ways to split this, or I can try to review it as-is, or we can see if @elliottt or @cfallin is up for reviewing it. My trouble is that since I'm not familiar with risc-v I have to figure that out at the same time as following the motion of existing code and finding the real changes, and together that's apparently a bit too much for my brain. |
No worries, and thanks for looking into it (and all the other PR's)! ❤️ I'll split this into general RISC-V rules cleanups and move the codegen changes into separate PR's. On that note, I've been thinking about doing a pass on the entire RISCV ruleset to add helpers for generating instructions. We have a lot of:
Which I would really like for it to become something along these lines:
I think this would improve readability somwhat, the rules on RISCV are really verbose and I think this is part of it. What do you guys think about that? |
I've heard other people say they consider that kind of helper really important in the x86 rules, so I imagine it's a good idea for all backends. I haven't authored enough backend rules to have an opinion myself though. My first reaction had originally been that it seemed like a lot of work with an unclear benefit, but I'm coming around to the conclusion that it's worth doing after all. |
acc8700
to
b75d36b
Compare
Rebased this on top of #5984, I've also separated the commits a bit, so it should be easier to review commit-by-commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was much easier to review, thank you! I have some questions but I think this is in good shape overall.
@@ -315,14 +315,14 @@ | |||
|
|||
|
|||
;;;; Rules for `ctz` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; | |||
(rule (lower (has_type ty (ctz x))) | |||
(rule (lower (has_type (fits_in_64 ty) (ctz x))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to note that the uses of fits_in_64
in the "Restrict lowering rules for ctz/clz" commit should be harmless but shouldn't be necessary. Since the rule for $I128
has higher priority, and since ctz/clz/cls accept only integer types, the only types which could reach the general rule are I8-I64.
It's definitely good to be consistent across rules for whether we're explicit about checking fits_in_64
or not. And it's probably good to be explicit since it serves as a kind of documentation. Currently, doing this causes ISLE to generate slightly worse code, but it's probably not measurably worse, and I hope to fix it eventually anyway.
So you don't need to do anything about this comment. I just wanted to mention that this change shouldn't change any lowering behavior.
|
||
; VCode: | ||
; block0: | ||
; lui t2,16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The immediate here being 16 when I knew the constant needed to be 2^16 confused me so much. But of course 16 is 2^4, and lui
shifts its immediate operand left by 12 bits, and 4+12 is 16, so this is correct.
@@ -1031,6 +1031,37 @@ | |||
|
|||
|
|||
|
|||
;; TODO: LLVM lowers this as not+clz for i32/i64. We should do the same. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this TODO comment. As far as I understand we can't use not
on the input without checking if it was negative. But isn't that exactly what the rule for lower_cls
already does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I think I messed up slightly here. I looked at rust's output for u32::leading_ones
, but that has slightly different semantics. LLVM's output for the equivalent operation is still better than our lowering.
example::cls_u64:
bgez a0, .LBB4_2
not a0, a0
.LBB4_2:
clz a0, a0
addi a0, a0, -1
ret
We are doing essentially the same but replacing the branch with select_reg
. I'm not sure this can easly be improved in ISLE without being able to emit branches.
Thanks for catching this! I'm going to remove the TODO, we have a bunch of places where replacing select_reg
with custom branches would improve codegen, I guess this is just one more of them, so no need for a special mention.
(low Reg (gen_select_reg (IntCC.SignedLessThan) high (zero_reg) (gen_bit_not low) low)) | ||
(high Reg (gen_select_reg (IntCC.SignedLessThan) high (zero_reg) (gen_bit_not high) high)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible future work: Maybe make MInst::Select
accept an IntCC
etc, so it can replace MInst::SelectReg
, and so this lowering doesn't have to generate two separate branches for the same condition.
b75d36b
to
41773c1
Compare
41773c1
to
c1d1020
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love it. 👍
Instead of checking if we have 64 zeros in the top half. Check if it *is* 0, that way we avoid loading the `64` constant.
Instead of checking if we have 64 zeros in the bottom half. Check if it *is* 0, that way we avoid loading the `64` constant.
c1d1020
to
bcd5533
Compare
👋 Hey,
This PR started by fixing the lowering rules for
ctz
. The base rule had(if-let $false (has_zbb))
, but the rule fori8
/i16
depended on bothZbb
andZbs
. Which meant that if we only hadZbb
we had no valid rule for loweringi8
/i16
. The fuzzer found this on #5844 (comment).I also ended up improving codegen for these instructions in a number of cases. We try to avoid the base case as much as possible since it involves a loop.
We can now avoid requiring
Zbs
in all cases by falling back to anori
oror
instruction instead to set a single bit.It also moves some constructors around and other misc stuff since we have a bunch of rules all over the files. I'm trying to collect the ones I find at the top of the file.
It's been fuzzing for about an hour, I'll update this if it finds anything.