-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xplat: use xmm for float args #2716
Conversation
@dotnet-bot test Windows ci_slow_x64_debug please |
191e9c3
to
08a8e9a
Compare
lib/Backend/amd64/LowererMDArch.cpp
Outdated
@@ -523,7 +561,9 @@ LowererMDArch::LowerCallArgs(IR::Instr *callInstr, ushort callFlags, Js::ArgSlot | |||
callInstr->InsertBefore(argInstr); | |||
argCount++; | |||
} | |||
|
|||
#ifndef _WIN32 | |||
this->xplatCallArgs.StopRecording(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty neat idea 👍 I'm worried about the locations of Start/Stop
recording though. It seems to me maybe every GetArgSlotOpnd
call needs to be recorded. We use a mixture of LowerCallArgs and several other random code paths to load different args, which may all need to be homed. E.g., even the helperCallArgs
loaded in LowerCall
may need to be homed. Stopping here may miss some args. Please survey all calls to GetArgSlotOpnd
to see if this is a potential issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we count them all, we would also count the ones we endup pushing to stack. We are only interested with the ones explicitly located on registers (float and non-float) . The rest, we consider them as non-float.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even the helperCallArgs loaded in LowerCall may need to be homed.
Indeed they are but we don't need to check/store their type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we only record register args, which is what you currently doing inside GetArgSlotOpnd
. You are not recording stack args.
Recording all might be more clear, otherwise you have to assume this is the only place xmm args could be loaded, all others in int registers, other registers may have no recordings, making the homing code complicated... (If recording all is hard to do right, this is fine with me)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you follow the usages of GetArgSlotOpnd
you will see that there are 2 places setting argSym so the register can be anything but TyMachReg
. One of them we record. Other, we don't have to since shouldHomeParams = argCount > 0
and !shouldHomeParams
... So we record them all right.
lib/Backend/amd64/LowererMDArch.cpp
Outdated
int floatCount = this->xplatCallArgs.floatCount; | ||
int argRes = min(callArgCount, static_cast<int>(XmmArgRegsCount)); | ||
int intCount = max(0, min(argRes - floatCount, static_cast<int>(IntArgRegsCount))); | ||
int argRegs = intCount + floatCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by the logic here -- could it be due to the possibility that you were not recording all args loading (my other comment)? If you recorded every arg loading, the logic here could be very simple. Every arg from position 0 to xplatCallArgs.topPosition must be in register, type recorded, and must be homed. You can then figure out the register from position and your recorded type and home it. Seems no need to count int/float registers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to limit recording to actual registers. This way we know / store only the ones make sense for our homing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you can guarantee that the partial recording successfully records every float register arg, the logic here seems quite complex and could contain bugs. E.g., given 8 args: int0, int1, int2, int3, int4, int5, int_stack, xmm7
.
callArgCount == 8
xpTopPos == 8
floatCount == 1
argRes == 8
intCount == 6
argRegs == 7
Then your following for
loop will start from i = 8
and end at i == 2
(loop 7 iterations because of argRegs). You'll miss homing the 1st register int0
.
Seems it could be simpler:
for (int i = min(callArgCount, XmmArgRegsCount); i > 0; i--) {
if (had recording for i and it is float) {
prep homing float reg
} else if (i <= IntArgRegsCount) {
prep homing int reg
}
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 ) callArgCount will be at least 9 -> you are missing the +call_info.
2 ) in your case i
won't represent the the arg slot index..
3 ) good test case idea, will update the one on this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
callArgCount will be at least 9 -> you are missing the +call_info.
callArgCount is total native arg count, not original runtime arg count -- in my example "int1" is call_info.
in your case i won't represent the the arg slot index..
Why? i
does represent the arg slot index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (had recording for i and it is float)
Why? i does represent the arg slot index.
As long as we loop reset the whole array each time and empty loop till first int register...
Anyways, updated the PR.
lib/Backend/amd64/LowererMDArch.cpp
Outdated
else if (isFloatArg && argPosition <= XmmArgRegsCount) | ||
RegNum reg = GetRegFromArgPosition(isFloatArg, argPosition); | ||
#ifndef _WIN32 | ||
if (reg != RegNOREG && this->xplatCallArgs.IsRecording()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems all calls to GetArgSlotOpnd
is to load an argument, and if we are to record every one, IsRecording()
should be true.
lib/Backend/amd64/LowererMDArch.h
Outdated
|
||
inline int GetArgsCount() { return intCount + floatCount; } | ||
|
||
inline int GetTopPosition() { return lastPosition; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit: inline
is assumed when implementation is in class declaration. We usually don't write inline
here explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline is assumed when implementation is in class declaration.
Sorry, how this implementation is different ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just proposing removing inline
keyword here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got that, could you please clarify why remove inline
keyword here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personal preference, as I think it is redundant and gets in the way of code reading.
lib/Backend/amd64/LowererMDArch.h
Outdated
} | ||
} | ||
|
||
IRType args [XmmArgRegsCount + 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Here it assumes XmmArgRegsCount >= IntArgRegsCount
. Could use some code to guard that.
test/AsmJs/rlexe.xml
Outdated
</test> | ||
<test> | ||
<default> | ||
<files>argTestMixed.js</files> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new tests are fine, I'm just thinking maybe existing tests already covering this? The reason we didn't hit this was just because we were not running all existing tests (-simdjs
issue). If you change runtests.py
, remove that flag, maybe you can see the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe but we don't have simd is enabled. These tests are now useful for xplat and will be useful for other archs.
lib/Backend/amd64/LowererMDArch.h
Outdated
{ | ||
Assert(isRecording); | ||
isRecording = false; | ||
for (int i = lastPosition - GetArgsCount(); i >= 0; i--) args[i] = TyMachPtr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some assumptions hidden in this code. Looks this code assumes the recording pattern must be we always record every register in the back range but none in the front range if any is missing recording. That's why you clear range [0, start). Is it right? Is there a way to ensure this assumption holds? At least can you document it?
da9b793
to
dffd23d
Compare
lib/Backend/amd64/LowererMDArch.h
Outdated
{ | ||
for (int i = 0; i <= XmmArgRegsCount; i++) args[i] = TyMachPtr; | ||
|
||
floatCount = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: floatCount tracked but not being used.
dffd23d
to
028861d
Compare
Previously xplat LowerCall args were limited to REG_INT_ARG... Adding XMM* float reg. support
028861d
to
2c749f5
Compare
@jianchun Thanks for the review. Made some additional improvements. |
Merge pull request #2716 from obastemur:fix_xplat_args Previously xplat LowerCall args were limited to REG_INT_ARG... Adding XMM* support and a test case to make sure we don't break. This PR also improves performance on xplat jetstream Disabling couple of additional wasm tests on xplat with a note. edit: added another test case
Merge pull request #2716 from obastemur:fix_xplat_args Previously xplat LowerCall args were limited to REG_INT_ARG... Adding XMM* support and a test case to make sure we don't break. This PR also improves performance on xplat jetstream Disabling couple of additional wasm tests on xplat with a note. edit: added another test case
@obastemur fixed the issues with them in chakra-core#2716, so I've gone and enabled them here.
@obastemur fixed the issues with them in chakra-core#2716, so I've gone and enabled them here.
Remove the flag-based test ignoring of -simdjs as it has caused confusion on whether or not tests were running. Add test tagging for simd, asmjs, and wasm functionality. Add the ability to read --sanitize flags and exclude known failing tests. Edit: Allow all of the asmjs tests, since they work now. @obastemur fixed the issues with them in chakra-core#2716, so I've gone and enabled them here. Edit: Add --static to test flags. Not using it currently to mask out any tests; this is just to make the test and build flags line up better. Edit: Add a -j flag to runtests.py for num processes This makes it line up better with the build flags for CI
Remove the flag-based test ignoring of -simdjs as it has caused confusion on whether or not tests were running. Add test tagging for simd, asmjs, and wasm functionality. Add the ability to read --sanitize flags and exclude known failing tests. Edit: Allow all of the asmjs tests, since they work now. @obastemur fixed the issues with them in chakra-core#2716, so I've gone and enabled them here. Edit: Add --static to test flags. Not using it currently to mask out any tests; this is just to make the test and build flags line up better. Edit: Add a -j flag to runtests.py for num processes This makes it line up better with the build flags for CI Edit: Add a flag to warn instead of fail on test timeout This is in an attempt to reduce noise a bit; a small number of tests fail to run quickly sometimes enough on one channel of our test infra.
Remove the flag-based test ignoring of -simdjs as it has caused confusion on whether or not tests were running. Add test tagging for simd, asmjs, and wasm functionality. Add the ability to read --sanitize flags and exclude known failing tests. Edit: Allow all of the asmjs tests, since they work now. @obastemur fixed the issues with them in chakra-core#2716, so I've gone and enabled them here. Edit: Add --static to test flags. Not using it currently to mask out any tests; this is just to make the test and build flags line up better. Edit: Add a -j flag to runtests.py for num processes This makes it line up better with the build flags for CI Edit: Add a flag to warn instead of fail on test timeout This is in an attempt to reduce noise a bit; a small number of tests fail to run quickly sometimes enough on one channel of our test infra. Edit: Specify type for -j test flag.
Remove the flag-based test ignoring of -simdjs as it has caused confusion on whether or not tests were running. Add test tagging for simd, asmjs, and wasm functionality. Add the ability to read --sanitize flags and exclude known failing tests. Edit: Allow all of the asmjs tests, since they work now. @obastemur fixed the issues with them in chakra-core#2716, so I've gone and enabled them here. Edit: Add --static to test flags. Not using it currently to mask out any tests; this is just to make the test and build flags line up better. Edit: Add a -j flag to runtests.py for num processes This makes it line up better with the build flags for CI Edit: Add a flag to warn instead of fail on test timeout This is in an attempt to reduce noise a bit; a small number of tests fail to run quickly sometimes enough on one channel of our test infra. Edit: Specify type for -j test flag. Edit: Disable asan for asmjs for the moment.
Remove the flag-based test ignoring of -simdjs as it has caused confusion on whether or not tests were running. Add test tagging for simd, asmjs, and wasm functionality. Add the ability to read --sanitize flags and exclude known failing tests. Edit: Allow all of the asmjs tests, since they work now. @obastemur fixed the issues with them in chakra-core#2716, so I've gone and enabled them here. Edit: Add --static to test flags. Not using it currently to mask out any tests; this is just to make the test and build flags line up better. Edit: Add a -j flag to runtests.py for num processes This makes it line up better with the build flags for CI Edit: Add a flag to warn instead of fail on test timeout This is in an attempt to reduce noise a bit; a small number of tests fail to run quickly sometimes enough on one channel of our test infra. Edit: Specify type for -j test flag. Edit: Disable tests that do not currently work with ASAN
Remove the flag-based test ignoring of -simdjs as it has caused confusion on whether or not tests were running. Add test tagging for simd, asmjs, and wasm functionality. Add the ability to read --sanitize flags and exclude known failing tests. Edit: Allow all of the asmjs tests, since they work now. @obastemur fixed the issues with them in chakra-core#2716, so I've gone and enabled them here. Edit: Add --static to test flags. Not using it currently to mask out any tests; this is just to make the test and build flags line up better. Edit: Add a -j flag to runtests.py for num processes This makes it line up better with the build flags for CI Edit: Add a flag to warn instead of fail on test timeout This is in an attempt to reduce noise a bit; a small number of tests fail to run quickly sometimes enough on one channel of our test infra. Edit: Specify type for -j test flag. Edit: Disable tests that do not currently work with ASAN
Remove the flag-based test ignoring of -simdjs as it has caused confusion on whether or not tests were running. Add test tagging for simd, asmjs, and wasm functionality. Add the ability to read --sanitize flags and exclude known failing tests. Edit: Allow all of the asmjs tests, since they work now. @obastemur fixed the issues with them in chakra-core#2716, so I've gone and enabled them here. Edit: Add --static to test flags. Not using it currently to mask out any tests; this is just to make the test and build flags line up better. Edit: Add a -j flag to runtests.py for num processes This makes it line up better with the build flags for CI Edit: Add a flag to warn instead of fail on test timeout This is in an attempt to reduce noise a bit; a small number of tests fail to run quickly sometimes enough on one channel of our test infra. Edit: Specify type for -j test flag. Edit: Disable tests that do not currently work with ASAN
Remove the flag-based test ignoring of -simdjs as it has caused confusion on whether or not tests were running. Add test tagging for simd, asmjs, and wasm functionality. Add the ability to read --sanitize flags and exclude known failing tests. Edit: Allow all of the asmjs tests, since they work now. @obastemur fixed the issues with them in chakra-core#2716, so I've gone and enabled them here. Edit: Add --static to test flags. Not using it currently to mask out any tests; this is just to make the test and build flags line up better. Edit: Add a -j flag to runtests.py for num processes This makes it line up better with the build flags for CI Edit: Add a flag to warn instead of fail on test timeout This is in an attempt to reduce noise a bit; a small number of tests fail to run quickly sometimes enough on one channel of our test infra. Edit: Specify type for -j test flag. Edit: Disable tests that do not currently work with ASAN Edit: Make the verbose flag do something, add a way to run tests outside of the repo if needed.
Previously xplat LowerCall args were limited to REG_INT_ARG... Adding XMM* support and a test case to make sure we don't break.
This PR also improves performance on xplat jetstream
Disabling couple of additional wasm tests on xplat with a note.
edit: added another test case