Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv64: Improve ctz/clz/cls codegen #5854

Merged
merged 10 commits into from
Mar 21, 2023

Conversation

afonso360
Copy link
Contributor

@afonso360 afonso360 commented Feb 22, 2023

👋 Hey,

This PR started by fixing the lowering rules for ctz. The base rule had (if-let $false (has_zbb)), but the rule for i8/i16 depended on both Zbb and Zbs. Which meant that if we only had Zbb we had no valid rule for lowering i8/i16. The fuzzer found this on #5844 (comment).

I also ended up improving codegen for these instructions in a number of cases. We try to avoid the base case as much as possible since it involves a loop.

We can now avoid requiring Zbs in all cases by falling back to an ori or or instruction instead to set a single bit.

It also moves some constructors around and other misc stuff since we have a bunch of rules all over the files. I'm trying to collect the ones I find at the top of the file.

It's been fuzzing for about an hour, I'll update this if it finds anything.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:meta Everything related to the meta-language. labels Feb 22, 2023
@jameysharp
Copy link
Contributor

I've tried to review this a couple of times and have not succeeded yet. I think it would help me if you can split this into smaller PRs. It looks like you already structured individual commits carefully, but I think the main thing I want in this case is to keep separate the movement of definitions from the actual changes in lowering.

I'm open to alternative suggestions, but would you mind opening a new PR that moves existing definitions without changing them first? I'm hoping that makes the overall diff for this PR easier to review as it should only have the actual logic changes.

If that's too much trouble, I'm happy to discuss other ways to split this, or I can try to review it as-is, or we can see if @elliottt or @cfallin is up for reviewing it. My trouble is that since I'm not familiar with risc-v I have to figure that out at the same time as following the motion of existing code and finding the real changes, and together that's apparently a bit too much for my brain.

@afonso360
Copy link
Contributor Author

No worries, and thanks for looking into it (and all the other PR's)! ❤️

I'll split this into general RISC-V rules cleanups and move the codegen changes into separate PR's.


On that note, I've been thinking about doing a pass on the entire RISCV ruleset to add helpers for generating instructions.

We have a lot of:

(alu_rr_imm12 (AluOPRRI.Andi) val (imm12_const 255))

Which I would really like for it to become something along these lines:

(rv_andi val (imm12 255))

I think this would improve readability somwhat, the rules on RISCV are really verbose and I think this is part of it. What do you guys think about that?

@afonso360 afonso360 marked this pull request as draft February 28, 2023 11:39
@jameysharp
Copy link
Contributor

I've heard other people say they consider that kind of helper really important in the x86 rules, so I imagine it's a good idea for all backends. I haven't authored enough backend rules to have an opinion myself though. My first reaction had originally been that it seemed like a lot of work with an unclear benefit, but I'm coming around to the conclusion that it's worth doing after all.

@afonso360
Copy link
Contributor Author

Rebased this on top of #5984, I've also separated the commits a bit, so it should be easier to review commit-by-commit.

@afonso360 afonso360 marked this pull request as ready for review March 13, 2023 12:52
Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was much easier to review, thank you! I have some questions but I think this is in good shape overall.

@@ -315,14 +315,14 @@


;;;; Rules for `ctz` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(rule (lower (has_type ty (ctz x)))
(rule (lower (has_type (fits_in_64 ty) (ctz x)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to note that the uses of fits_in_64 in the "Restrict lowering rules for ctz/clz" commit should be harmless but shouldn't be necessary. Since the rule for $I128 has higher priority, and since ctz/clz/cls accept only integer types, the only types which could reach the general rule are I8-I64.

It's definitely good to be consistent across rules for whether we're explicit about checking fits_in_64 or not. And it's probably good to be explicit since it serves as a kind of documentation. Currently, doing this causes ISLE to generate slightly worse code, but it's probably not measurably worse, and I hope to fix it eventually anyway.

So you don't need to do anything about this comment. I just wanted to mention that this change shouldn't change any lowering behavior.

cranelift/codegen/src/isa/riscv64/inst.isle Outdated Show resolved Hide resolved

; VCode:
; block0:
; lui t2,16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The immediate here being 16 when I knew the constant needed to be 2^16 confused me so much. But of course 16 is 2^4, and lui shifts its immediate operand left by 12 bits, and 4+12 is 16, so this is correct.

cranelift/codegen/src/isa/riscv64/inst.isle Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/riscv64/inst.isle Show resolved Hide resolved
cranelift/codegen/src/isa/riscv64/inst.isle Outdated Show resolved Hide resolved
@@ -1031,6 +1031,37 @@



;; TODO: LLVM lowers this as not+clz for i32/i64. We should do the same.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this TODO comment. As far as I understand we can't use not on the input without checking if it was negative. But isn't that exactly what the rule for lower_cls already does?

Copy link
Contributor Author

@afonso360 afonso360 Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I think I messed up slightly here. I looked at rust's output for u32::leading_ones, but that has slightly different semantics. LLVM's output for the equivalent operation is still better than our lowering.

example::cls_u64:
        bgez    a0, .LBB4_2
        not     a0, a0
.LBB4_2:
        clz     a0, a0
        addi    a0, a0, -1
        ret

We are doing essentially the same but replacing the branch with select_reg. I'm not sure this can easly be improved in ISLE without being able to emit branches.

Thanks for catching this! I'm going to remove the TODO, we have a bunch of places where replacing select_reg with custom branches would improve codegen, I guess this is just one more of them, so no need for a special mention.

Comment on lines +1049 to +1055
(low Reg (gen_select_reg (IntCC.SignedLessThan) high (zero_reg) (gen_bit_not low) low))
(high Reg (gen_select_reg (IntCC.SignedLessThan) high (zero_reg) (gen_bit_not high) high))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible future work: Maybe make MInst::Select accept an IntCC etc, so it can replace MInst::SelectReg, and so this lowering doesn't have to generate two separate branches for the same condition.

cranelift/codegen/src/isa/riscv64/inst.isle Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/riscv64/inst.isle Outdated Show resolved Hide resolved
@afonso360 afonso360 force-pushed the riscv64-ctlz-improvments branch from b75d36b to 41773c1 Compare March 17, 2023 15:12
@afonso360 afonso360 marked this pull request as draft March 17, 2023 18:18
@afonso360 afonso360 force-pushed the riscv64-ctlz-improvments branch from 41773c1 to c1d1020 Compare March 21, 2023 14:45
@afonso360 afonso360 marked this pull request as ready for review March 21, 2023 14:45
Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love it. 👍

@afonso360 afonso360 added this pull request to the merge queue Mar 21, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Mar 21, 2023
@afonso360 afonso360 force-pushed the riscv64-ctlz-improvments branch from c1d1020 to bcd5533 Compare March 21, 2023 19:54
@afonso360 afonso360 enabled auto-merge March 21, 2023 19:55
@afonso360 afonso360 added this pull request to the merge queue Mar 21, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 21, 2023
@afonso360 afonso360 added this pull request to the merge queue Mar 21, 2023
@afonso360 afonso360 merged commit 7a3df7d into bytecodealliance:main Mar 21, 2023
@afonso360 afonso360 deleted the riscv64-ctlz-improvments branch March 21, 2023 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:meta Everything related to the meta-language. cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants