-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT ARM64-SVE: Add simple bitwise ops #101762
Conversation
And,AndAcross,Or,OrAcross,Xor,XorAcross
Note regarding the
|
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics |
@@ -4802,11 +4802,11 @@ void CodeGen::genArm64EmitterUnitTestsSve() | |||
INS_OPTS_SCALABLE_D); // CLASTB <Zdn>.<T>, <Pg>, <Zdn>.<T>, <Zm>.<T> | |||
|
|||
// IF_SVE_CN_3A | |||
theEmitter->emitIns_R_R_R(INS_sve_clasta, EA_2BYTE, REG_V12, REG_P1, REG_V15, INS_OPTS_SCALABLE_H, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same changes as for AddAcross
in previous PR - the size
arg is not used, as the sizes are dependant on opts
.
@@ -2919,7 +2919,10 @@ void emitter::emitInsSve_R_R_R(instruction ins, | |||
|
|||
if (sopt == INS_SCALABLE_OPTS_UNPREDICATED) | |||
{ | |||
assert(opt == INS_OPTS_SCALABLE_D); | |||
// The instruction only has a .D variant. However, this doesn't matter as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing this prevents adding special cases in hwinstrinccodegen.
@dotnet/arm64-contrib @kunalspathak |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Some nit comments
src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs
Show resolved
Hide resolved
/// MOVPRFX Zresult, Zop1; AND Zresult.B, Pg/M, Zresult.B, Zop2.B | ||
/// svuint8_t svand[_u8]_x(svbool_t pg, svuint8_t op1, svuint8_t op2) | ||
/// AND Ztied1.B, Pg/M, Ztied1.B, Zop2.B | ||
/// AND Ztied2.B, Pg/M, Ztied2.B, Zop1.B |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// AND Ztied2.B, Pg/M, Ztied2.B, Zop1.B |
Why do we have 2 entries of the predicated version? Here and elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 250 is saying a = AND(a, b)
, whereas line 251 is showing b = AND(b, a)
It's a little awkward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to list every possible variant here nor the mov
instructions required to handle RMW cases, we definitely don't do that for any other intrinsics across Arm64, x64, or WASM.
The main intent is really just to give a brief overview of the C/C++ intrinsic and the primary hardware instruction emitted so that users can map things more easily and know the primary location to lookup to understand the instruction (kind of like a see-also
).
Ideally we'd be able to basically quote the Arm64 architecture manual and give a better description (with the notes we currently have as actual see-also
), but said manuals come with an explicit copyright/proprietary notice and so cannot be reproduced without express written permission (which means getting legal of both companies involved and getting the relevant agreement put together). So this is the next best thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which is to say, I think we can just do what we do for other ISAs and simplify it down to a few lines:
/// svuint8_t svand[_u8]_m(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// svuint8_t svand[_u8]_x(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// svuint8_t svand[_u8]_z(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// AND <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>
/// AND <Zd>.D, <Zn>.D, <Zm>.D
/// AND <Zdn>.<T>, <Zdn>.<T>, #<const>
/// svbool_t svand[_b]_z(svbool_t pg, svbool_t op1, svbool_t op2)
/// AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
Which covers the 4x C/C++ intrinsics that map to this API and the 4x instruction entries that map, without getting into the implementation details of exactly how operands map to registers, how boilerplate instructions to handle RMW considerations are emitted (like mov
, movprfx
, etc), and without getting into how predication maps to the instructions (which is something to handle in a general conceptual doc that intrinsics can link to, not repeated per doc page).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I was allowing movprfx
because that is something special in SVE land, but again I think it is an implementation RMW detail which we do not need in the summary docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Annoyingly I don't think that can be scripted. But, agreed with the approach, we'll have to simplify manually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, fixed, but it's in the same style as existing:
/// <summary>
/// svuint8_t svand[_u8]_m(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// svuint8_t svand[_u8]_x(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// svuint8_t svand[_u8]_z(svbool_t pg, svuint8_t op1, svuint8_t op2)
/// AND Ztied1.B, Pg/M, Ztied1.B, Zop2.B
/// AND Zresult.D, Zop1.D, Zop2.D
/// svbool_t svand[_b]_z(svbool_t pg, svbool_t op1, svbool_t op2)
/// AND Presult.B, Pg/Z, Pop1.B, Pop2.B
/// </summary>
Easier to update from the existing autogenerated and the tied
is useful information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the autogenerated, you can skip outputting the ones that has movprfx
. Can you refresh my memory of what is tied
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the autogenerated, you can skip outputting the ones that has
movprfx
.
I'll do that
Can you refresh my memory of what is
tied
?
Both args marked as tied
are the same register. RW semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* JIT ARM64-SVE: Add simple bitwise ops And,AndAcross,Or,OrAcross,Xor,XorAcross * Fix fadda * Fix unpkh/fexpa/frecpe * Reorder System.Runtime.Intrinsics.cs * Fix API head comments
* JIT ARM64-SVE: Add simple bitwise ops And,AndAcross,Or,OrAcross,Xor,XorAcross * Fix fadda * Fix unpkh/fexpa/frecpe * Reorder System.Runtime.Intrinsics.cs * Fix API head comments
And, AndAcross, Or, OrAcross, Xor, XorAcross
Test results: