Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track zero and sign extension throughout the code generator. #1584

Merged
merged 1 commit into from
Feb 5, 2025

Conversation

ltratt
Copy link
Contributor

@ltratt ltratt commented Feb 5, 2025

Previously we manually, and eagerly, sign and zero extended registers as we needed them. This was easy to get wrong, because the absence of sign/zero extension did not mean that it wasn't needed.

This commit forces register allocation to always consider sign / zero extension. What before was, for example:

let [lhs_reg] = self.ra.assign_gp_regs(
    &mut self.asm,
    iidx,
    [RegConstraint::InputOutput(lhs)],
);

now becomes:

let [lhs_reg] = self.ra.assign_gp_regs(
    &mut self.asm,
    iidx,
    [GPConstraint::InputOutput {
        op: lhs,
        in_ext: RegExtension::Undefined,
        out_ext: RegExtension::ZeroExtended,
        force_reg: None,
    }],
);

in_ext specifies "what extension must be the input value satisfy?" and out_ext specifies "what extension will the output value have?"

The major parts of this commit are thus:

  1. Audit the x64 code generator to specify sign / zero extension everywhere.

This turned up various potentially dodgy things, including some parts where we had more complexity than I wanted to deal with. I have not hesitated to simplify (which mostly mean deoptimise!) some parts of code generation where it makes reasoning about the changes in this commit easier. In particular, I've less often special-cased the 32-bit cases. We can always make this more complex again later.

  1. Change the register allocation to deal with sign / zero extension. We track the current state of registers, so if you e.g. ask to zero extend a register we already know is zero extended, we don't emit code.

This showed up a number of ways we could go wrong: I ended up having to fundamentally rewrite the core part of register allocation because of this. I have also simplified this and we thus generate less optimal code in several ways (notably we spill more due to this commit), though note that we do a lot less sign / zero extension. I have also made more parts of the register allocator amenable in the future to being broken up into testable units, though this commit does not test those bits in isolation.

Note this commit only updates general purpose register allocation. Floating point register allocation is left untouched for now, because (1) this commit is already sufficiently unwieldy (2) sign/zero extension in FP isn't something we need to worry about.

  1. Updating tests.

I have tried to strengthen tests when possible, but some have had to be weakened slightly because of our (currently) poorer code generation.

I am sure this commit introduces some new bugs, but I suspect it fixes more bugs than it introduces. I have somewhat more confidence in it than I did before, and I think it provides a more solid base for future changes which will strengthen our confidence further.

Previously we manually, and eagerly, sign and zero extended registers as
we needed them. This was easy to get wrong, because the absence of
sign/zero extension did not mean that it wasn't needed.

This commit forces register allocation to always consider sign / zero
extension. What before was, for example:

```rust
let [lhs_reg] = self.ra.assign_gp_regs(
    &mut self.asm,
    iidx,
    [RegConstraint::InputOutput(lhs)],
);
```

now becomes:

```rust
let [lhs_reg] = self.ra.assign_gp_regs(
    &mut self.asm,
    iidx,
    [GPConstraint::InputOutput {
        op: lhs,
        in_ext: RegExtension::Undefined,
        out_ext: RegExtension::ZeroExtended,
        force_reg: None,
    }],
);
```

`in_ext` specifies "what extension must be the input value satisfy?" and
`out_ext` specifies "what extension will the output value have?"

The major parts of this commit are thus:

  1. Audit the x64 code generator to specify sign / zero extension
     everywhere.

This turned up various potentially dodgy things, including some parts
where we had more complexity than I wanted to deal with. I have not
hesitated to simplify (which mostly mean deoptimise!) some parts of code
generation where it makes reasoning about the changes in this commit
easier. In particular, I've less often special-cased the 32-bit cases.
We can always make this more complex again later.

  2. Change the register allocation to deal with sign / zero extension.
  We track the current state of registers, so if you e.g. ask to zero
  extend a register we already know is zero extended, we don't emit
  code.

This showed up a number of ways we could go wrong: I ended up having to
fundamentally rewrite the core part of register allocation because of
this. I have also simplified this and we thus generate less optimal
code in several ways (notably we spill more due to this commit), though
note that we do a lot less sign / zero extension. I have also made more
parts of the register allocator amenable in the future to being broken
up into testable units, though this commit does not test those bits in
isolation.

Note this commit *only* updates general purpose register allocation.
Floating point register allocation is left untouched for now, because
(1) this commit is already sufficiently unwieldy (2) sign/zero extension
in FP isn't something we need to worry about.

  3. Updating tests.

I have tried to strengthen tests when possible, but some have had to be
weakened slightly because of our (currently) poorer code generation.

I am sure this commit introduces some new bugs, but I suspect it fixes
more bugs than it introduces. I have somewhat more confidence in it than
I did before, and I think it provides a more solid base for future
changes which will strengthen our confidence further.
@vext01
Copy link
Contributor

vext01 commented Feb 5, 2025

There's so much here that I'll have to take you at your word. LGTM.

@ptersilie
Copy link
Contributor

Uff, only skimmed it, as I don't believe I'll have much to add anyway.

@ptersilie ptersilie added this pull request to the merge queue Feb 5, 2025
Merged via the queue into ykjit:master with commit 480b38b Feb 5, 2025
2 checks passed
@ltratt ltratt deleted the new_regalloc branch February 6, 2025 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants