-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x64: Lower fcvt_to_{u,s}int{,_sat} in ISLE #4704
x64: Lower fcvt_to_{u,s}int{,_sat} in ISLE #4704
Conversation
c12866f
to
1f60df4
Compare
1f60df4
to
e462c1d
Compare
|
||
;; Converting to unsigned int so if float src is negative or NaN | ||
;; will first set to zero. | ||
(tmp2 Xmm (x64_pxor src src)) ;; TODO: unnecessary dependency on src |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was having trouble figuring out how to make a zero in a register, and would like to avoid the unnecessary dependency on the input. I tried allocating a temp and using it as input to pxor
, but the following snippet caused a panic in regalloc2:
(tmp2 WritableXmm (temp_writable_xmm))
(tmp2 Xmm (x64_pxor tmp2, tmp2))
It seems that using a temp register without first having assigned to it is not expected. Is there an easy way to make a zero that I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pxor
instruction takes both args as sources, so expects them to be defined; so regalloc2 is correctly panic'ing here that there is a use of an undefined value.
The issue is that we haven't special-cased the "xor of any value with itself gives zero" semantics of the instruction: RA2 doesn't know (can't know) that this particular instruction is invariant to its input, so it's fine to use an undefined value.
I think we do special-case this for at least one other xor variant (XmmUninitializedConst enum arm, IIRC?) -- we'd want to do the same for pxor here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rebased on #4709 to pull in the changes to produces_const
, and it removed some unnecessary mov
s as a side-benefit :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've also rebased this on #4714, as it turns out that treating the cmpps
with the same source registers as a constant is not correct. I couldn't reliably add a movdqa
instruction in the translation to ISLE, which meant that we couldn't control the argument to the cmpps
instruction. As a result, we would get spurious errors, as it would compare random values against themselves, and sometimes those random values would be NaN
.
Subscribe to Label Action
This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "isle"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
e462c1d
to
d22f51c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
(dst Xmm (x64_cvttps2dq $F32X4 dst)) | ||
|
||
;; TODO: regalloc2 introduces a useless move here, is that | ||
;; acceptable? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO still active?
In general it's good to understand why spurious moves occur, but in some cases rearranging the ops causes things to shift somewhat and it happens; especially when the old handwritten code was making multiple defs on one temp which forced values into the same location (a sort of accidental constraint). If we get one extra move in a long conversion sequence it's not the end of the world, IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not bad to include that move there then I'll remove the TODO 👍
98c359c
to
b7c1225
Compare
b7c1225
to
f451260
Compare
Migrate the lowering for the four instructions fcvt_to_{u,s}int{,_sat} to ISLE.
I realized while porting this lowering that we currently don't support the unsaturating versions of both instructions for vector types, and that we currently only support vector conversions from
F32X4
toI32X4
. I didn't want to tackle implementing those here, and have preserved the current behavior instead. That bug is tracked in #4693.