x64: Lower fcvt_to_{u,s}int{,_sat} in ISLE #4704

Merged (6 commits) on Aug 16, 2022

elliottt
Member

@elliottt elliottt commented Aug 12, 2022

Migrate the lowering for the four instructions fcvt_to_{u,s}int{,_sat} to ISLE.

While porting this lowering, I realized that we don't currently support the non-saturating versions of both instructions for vector types, and that the only vector conversion we support is from F32X4 to I32X4. I didn't want to tackle implementing those here, so I have preserved the current behavior. That bug is tracked in #4693.
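
For readers unfamiliar with the difference between the two families, here is a rough scalar model in Rust. It is only a sketch of the semantics, not the lowering in this PR: `to_uint_sat` and `to_uint_trapping` are hypothetical names, `None` stands in for a trap, and the model relies on Rust's float-to-integer `as` cast being saturating (NaN becomes 0).

```rust
/// Hypothetical model of the saturating conversion: NaN and negative
/// inputs become 0, and too-large inputs clamp to `u32::MAX`.
/// Rust's `as` cast from float to integer saturates in the same way.
fn to_uint_sat(x: f32) -> u32 {
    x as u32
}

/// Hypothetical model of the non-saturating conversion.
/// `None` stands in for a trap on NaN or an out-of-range input.
fn to_uint_trapping(x: f32) -> Option<u32> {
    if x.is_nan() {
        return None;
    }
    let t = x.trunc(); // round toward zero
    if t < 0.0 || t >= 4_294_967_296.0 {
        return None; // result would not fit in a u32
    }
    Some(t as u32)
}

fn main() {
    for x in [3.7_f32, -1.0, f32::NAN, 1.0e10] {
        println!(
            "{x}: sat = {}, trapping = {:?}",
            to_uint_sat(x),
            to_uint_trapping(x)
        );
    }
}
```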

@elliottt elliottt force-pushed the trevor/x64-fcvt-to-uint branch 2 times, most recently from c12866f to 1f60df4 Compare August 12, 2022 18:13
@elliottt elliottt changed the title x64: Lower fcvt_to_uint in ISLE x64: Lower fcvt_to_{u,s}int{,_sat} in ISLE Aug 12, 2022
@elliottt elliottt force-pushed the trevor/x64-fcvt-to-uint branch from 1f60df4 to e462c1d Compare August 12, 2022 18:17
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen labels Aug 12, 2022

;; Converting to unsigned int so if float src is negative or NaN
;; will first set to zero.
(tmp2 Xmm (x64_pxor src src)) ;; TODO: unnecessary dependency on src
Member Author

@elliottt elliottt Aug 12, 2022

I was having trouble figuring out how to make a zero in a register, and would like to avoid the unnecessary dependency on the input. I tried allocating a temp and using it as input to pxor, but the following snippet caused a panic in regalloc2:

(tmp2 WritableXmm (temp_writable_xmm))
(tmp2 Xmm (x64_pxor tmp2, tmp2))

It seems that using a temp register without first having assigned to it is not expected. Is there an easy way to make a zero that I'm missing?

Member

The pxor instruction takes both args as sources, so it expects them to be defined; regalloc2 is correctly panicking here because there is a use of an undefined value.

The issue is that we haven't special-cased the "xor of any value with itself gives zero" semantics of the instruction: RA2 doesn't know (and can't know) that this particular instruction's result is independent of its input, and therefore that using an undefined value is fine.

I think we do special-case this for at least one other xor variant (XmmUninitializedConst enum arm, IIRC?) -- we'd want to do the same for pxor here.
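
To make the "invariant to its input" point concrete, here is a small Rust sketch. It is an illustrative model only: the `pxor` function below is a hypothetical stand-in that treats a 128-bit XMM register as a `u128`, not the Cranelift helper of the same name.

```rust
/// Hypothetical model: a 128-bit XMM register as a `u128`,
/// and `pxor` as plain bitwise XOR.
fn pxor(a: u128, b: u128) -> u128 {
    a ^ b
}

fn main() {
    // Stand-ins for whatever an uninitialized register might hold.
    for garbage in [0u128, 1, 0xDEAD_BEEF, u128::MAX] {
        // XOR of a value with itself is zero regardless of the value.
        assert_eq!(pxor(garbage, garbage), 0);
    }
    println!("x ^ x == 0 for every sampled x");
}
```

The register allocator only sees two uses of the same virtual register, so without a special case it has no way to tell that the prior contents are irrelevant.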

Member Author

I rebased on #4709 to pull in the changes to produces_const, and it removed some unnecessary movs as a side-benefit :D

Member Author

@elliottt elliottt Aug 15, 2022

I've also rebased this on #4714, as it turns out that treating the cmpps with the same source registers as a constant is not correct. I couldn't reliably add a movdqa instruction in the translation to ISLE, which meant that we couldn't control the argument to the cmpps instruction. As a result, we would get spurious errors, as it would compare random values against themselves, and sometimes those random values would be NaN.
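
The failure mode can be sketched in Rust. This is an assumed lane-wise model of `cmpps` with the equality predicate, not Cranelift code, and `cmpeqps` is a hypothetical helper name. It shows why comparing a register against itself is not a constant "all ones" the way xor-with-self is a constant zero: a NaN lane never compares equal to itself.

```rust
/// Hypothetical model of `cmpps` with the EQ predicate on four f32 lanes:
/// a lane becomes all ones if the inputs compare equal, else all zeros.
fn cmpeqps(a: [f32; 4], b: [f32; 4]) -> [u32; 4] {
    let mut mask = [0u32; 4];
    for (m, (x, y)) in mask.iter_mut().zip(a.iter().zip(b.iter())) {
        *m = if x == y { u32::MAX } else { 0 };
    }
    mask
}

fn main() {
    let ordinary = [1.0_f32, -2.5, 0.0, 1.0e10];
    let with_nan = [1.0_f32, f32::NAN, 0.0, f32::NAN];

    // Every lane is equal to itself, so the mask is all ones.
    println!("{:x?}", cmpeqps(ordinary, ordinary));
    // NaN != NaN, so the NaN lanes produce zeros instead.
    println!("{:x?}", cmpeqps(with_nan, with_nan));
}
```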

@elliottt elliottt marked this pull request as ready for review August 12, 2022 18:47
@elliottt elliottt added the isle Related to the ISLE domain-specific language label Aug 12, 2022
@github-actions

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.


@elliottt elliottt force-pushed the trevor/x64-fcvt-to-uint branch from e462c1d to d22f51c Compare August 15, 2022 18:40
Member

@cfallin cfallin left a comment

Looks good, thanks!

(dst Xmm (x64_cvttps2dq $F32X4 dst))

;; TODO: regalloc2 introduces a useless move here, is that
;; acceptable?
Member

Is this TODO still active?

In general it's good to understand why spurious moves occur, but in some cases rearranging the ops shifts things somewhat and an extra move appears, especially when the old handwritten code made multiple defs on one temp, which forced values into the same location (a sort of accidental constraint). If we get one extra move in a long conversion sequence it's not the end of the world, IMHO.

Member Author

If it's not bad to include that move there then I'll remove the TODO 👍

@elliottt elliottt force-pushed the trevor/x64-fcvt-to-uint branch 2 times, most recently from 98c359c to b7c1225 Compare August 15, 2022 21:59
Labels: cranelift, cranelift:area:x64, isle
3 participants