-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x64: emit_cmp: use x64_test for comparisons with 0 #6086
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks good to me! It is a bit of a micro-optimization, but every bit helps (or "byte" in this case). I verified in Intel's optimization manual that this change will help both with macrofusion (merge two instructions into a single micro-op) and code size:
Assembly/Compiler Coding Rule 17. (M impact, ML generality) Employ macrofusion where
possible using instruction pairs that support macrofusion. Prefer TEST over CMP if possible. Use unsigned variables and unsigned jumps when possible. Try to logically verify that a variable is nonnegative at the time of comparison. Avoid CMP or TEST of MEM-IMM flavor when possible. However, do not add other instructions to avoid using the MEM-IMM flavor.
Beyond the MEM-IMM
issue, what is going on here is that CMP
will not macrofuse with a subset of conditional jumps but TEST
will macrofuse with all of them (see section 3.4.2.2, Optimizing for Macrofusion).
Assembly/Compiler Coding Rule 36. (ML impact, M generality) Use the TEST instruction instead of AND when the result of the logical AND is not used. This saves µops in execution. Use a TEST of a register with itself instead of a CMP of the register to zero, this saves the need to encode the zero and saves encoding space. Avoid comparing a constant to a memory operand. It is preferable to load the memory operand and compare the constant to a register.
Thanks!
|
||
(rule 3 (emit_cmp cc (u64_from_iconst 0) b @ (value_type ty)) | ||
(let ((size OperandSize (raw_operand_size_of_type ty))) | ||
(icmp_cond_result (x64_test size b b) (intcc_reverse cc)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a note for others: I was a bit nervous about reverse
here and ended up looking up whether inverse
is actually the right thing — we can see from rule 1
, though, that reverse
is used there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm frequently confused between inverse
and reverse
too. I think both names are correct descriptions of what they do, but maybe later we can come up with alternate names that are less confusing. Maybe "negate" instead of "inverse", and "swap" instead of "reverse", for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameysharp Good idea! I think these are much better names.
Hmm, any idea what could be happening with that test failure? Is there some special-case around sinking for |
Hmm, this optimization can turn one use at the IR level into two uses at the same instruction, at the opcode level. Perhaps this confuses the sinking logic? |
Yes, we have some subtle invariants here where lowering |
(let ((size OperandSize (raw_operand_size_of_type ty))) | ||
(icmp_cond_result (x64_test size a a) cc))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To fix the issue on CI, I think this should work?
(let (
(size OperandSize (raw_operand_size_of_type ty))
(a Gpr (put_in_gpr a))
)
(icmp_cond_result (x64_test size a a) cc))
Basically by forcing a
into a register earlier it's not separately converted to both GprMem
and Gpr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that's the right fix!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, that works. I added a filetest that triggers this assertion, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thank you, and an extra thank you for adding the file test.
This implements the first part of the improvement described in #5869.
The PR template suggests I assign someone for review, but I have no idea who'd be best suited for this.
cc @elliottt