Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x64: emit_cmp: use x64_test for comparisons with 0 #6086

Merged
merged 2 commits into from
Mar 27, 2023

Conversation

meithecatte
Copy link
Contributor

This implements the first part of the improvement described in #5869.

The PR template suggests I assign someone for review, but I have no idea who'd be best suited for this.

cc @elliottt

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen labels Mar 22, 2023
@jameysharp
Copy link
Contributor

Neat, thanks for picking this up! I think this is correct, but I'd like @elliottt or @abrown to take a look.

Copy link
Contributor

@abrown abrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good to me! It is a bit of a micro-optimization, but every bit helps (or "byte" in this case). I verified in Intel's optimization manual that this change will help both with macrofusion (merge two instructions into a single micro-op) and code size:

Assembly/Compiler Coding Rule 17. (M impact, ML generality) Employ macrofusion where
possible using instruction pairs that support macrofusion. Prefer TEST over CMP if possible. Use unsigned variables and unsigned jumps when possible. Try to logically verify that a variable is nonnegative at the time of comparison. Avoid CMP or TEST of MEM-IMM flavor when possible. However, do not add other instructions to avoid using the MEM-IMM flavor.

Beyond the MEM-IMM issue, what is going on here is that CMP will not macrofuse with a subset of conditional jumps but TEST will macrofuse with all of them (see section 3.4.2.2, Optimizing for Macrofusion).

Assembly/Compiler Coding Rule 36. (ML impact, M generality) Use the TEST instruction instead of AND when the result of the logical AND is not used. This saves µops in execution. Use a TEST of a register with itself instead of a CMP of the register to zero, this saves the need to encode the zero and saves encoding space. Avoid comparing a constant to a memory operand. It is preferable to load the memory operand and compare the constant to a register.

Thanks!


(rule 3 (emit_cmp cc (u64_from_iconst 0) b @ (value_type ty))
(let ((size OperandSize (raw_operand_size_of_type ty)))
(icmp_cond_result (x64_test size b b) (intcc_reverse cc))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a note for others: I was a bit nervous about reverse here and ended up looking up whether inverse is actually the right thing — we can see from rule 1, though, that reverse is used there as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm frequently confused between inverse and reverse too. I think both names are correct descriptions of what they do, but maybe later we can come up with alternate names that are less confusing. Maybe "negate" instead of "inverse", and "swap" instead of "reverse", for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jameysharp Good idea! I think these are much better names.

@jameysharp jameysharp added this pull request to the merge queue Mar 23, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 23, 2023
@meithecatte
Copy link
Contributor Author

Hmm, any idea what could be happening with that test failure? Is there some special-case around sinking for cmp that needs to be extended to test?

@meithecatte
Copy link
Contributor Author

Hmm, this optimization can turn one use at the IR level into two uses at the same instruction, at the opcode level. Perhaps this confuses the sinking logic?

@cfallin
Copy link
Member

cfallin commented Mar 23, 2023

Hmm, this optimization can turn one use at the IR level into two uses at the same instruction, at the opcode level. Perhaps this confuses the sinking logic?

Yes, we have some subtle invariants here where lowering icmp/fcmp is special: we lower at every use (because otherwise we'd have to spill/restore flags), so we go to special lengths to not allow load sinking. See e.g. #3934. We'd like to come up with a more principled way to enforce this invariant, maybe a "special" lowering context state that knows we're lowering something that can be lowered multiple times; but right now we don't have that. Unfortunately I think this means that this change can't work as-is.

Comment on lines 4303 to 4304
(let ((size OperandSize (raw_operand_size_of_type ty)))
(icmp_cond_result (x64_test size a a) cc)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix the issue on CI, I think this should work?

(let (
    (size OperandSize (raw_operand_size_of_type ty))
    (a Gpr (put_in_gpr a))
  )
  (icmp_cond_result (x64_test size a a) cc))

Basically by forcing a into a register earlier it's not separately converted to both GprMem and Gpr

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's the right fix!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that works. I added a filetest that triggers this assertion, too.

Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thank you, and an extra thank you for adding the file test.

@jameysharp jameysharp added this pull request to the merge queue Mar 27, 2023
Merged via the queue into bytecodealliance:main with commit db07988 Mar 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants