-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Conversation
Eliminate unnecessary struct copies, especially on Linux, and reduce code duplication. Across all targets, use GT_FIELD_LIST to pass promoted structs on stack, and avoid requiring a copy and/or marking `lvDoNotEnregister` for those cases. Unify the specification of multi-reg args: - numRegs now indicates the actual number of reg args (not the size in pointer-size units) - regNums contains all the arg register numbers
This change has zero diffs for x86, x64/windows, arm altjit and arm64 altjit. A private perf run showed the following improvements on the Devirtualization benchmark (i.e. a factor of 2 for 3 of the 4 sub-benchmarks): For the remaining benchmarks, there were many numbers in the roughly .9 to 1.1 range that were reported as signficant, but neither crossgen asm-diffs nor superpmi run against both ZapDisable and regular benchmarks on x64/ux showed any significant negative diffs. For the regular run, the overall was:
|
@dotnet/jit-contrib PTAL |
src/jit/codegencommon.cpp
Outdated
baseVarNum = compiler->lvaFirstStackIncomingArgNum; | ||
|
||
if (compiler->lvaFirstStackIncomingArgNum != BAD_VAR_NUM) | ||
// Iterate over all the local variables in the Lcl var table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it me or this comment is wrong? The for loop below seems to iterate only over the args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct - the comment is wrong. In my defense, it isn't new (though the diffs seem to think so), it was just moved from below.
I'll correct it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (Looked only at the general idea and impact to arm64)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm have some nits
src/jit/compiler.cpp
Outdated
{ | ||
// If this is passed as a floating type, use that. | ||
// Otherwise, we'll use the general case - we don't want to use the "EightByteType" | ||
// directly, because it won't preserve small types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems to imply that the classification returned by Compiler::EightByteType is incorrect, or at least being deprecated. If this is the case, I think it is worth adding a comment to Compiler::EightByteType explaining its limitations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I should just reword the comment. It's not really a limitation of the "EightByteType", it's that it returns TYP_INT
for any integer type <= 4 bytes. I could change that method, but other callers depend on that behavior.
src/jit/compiler.cpp
Outdated
// Set 'useType' to the type of the first eightbyte item | ||
// We can't pass this as a primitive type. | ||
} | ||
else if (structDesc.passedInRegisters && varTypeIsFloating(GetEightByteType(structDesc, 0))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems easier to directly use structDesc.eightByteClassifications[0] == SystemVClassificationTypeSSE even if it is ugly it saves a few calls, one that is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'm not totally comfortable with that, because it assumes that the structDesc is initialized (I realize it is, but it's not obvious from this context).
@@ -834,7 +837,7 @@ var_types Compiler::getArgTypeForStruct(CORINFO_CLASS_HANDLE clsHnd, | |||
#ifdef UNIX_AMD64_ABI | |||
|
|||
// The case of (structDesc.eightByteCount == 1) should have already been handled | |||
if (structDesc.eightByteCount > 1) | |||
if ((structDesc.eightByteCount > 1) || !structDesc.passedInRegisters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What case does adding || !structDesc.passedInRegisters? It seems like the comment is correct, by this point any Arg that is <= 8bytes should have been handled and any arg that is > 8 bytes but <= 16 bytes will be correctly addressed with structDesc.eightByteCount > 1 regardless of whether structDesc.passedInRegisters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't think so. I believe that if passedInRegisters
is false
, the eightByteCount
will always be 0 - in which case we still need to set this as being passed by value. It's only the pass by reference case that should not fall into this then clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense thank you for clarifying.
src/jit/compiler.h
Outdated
private: | ||
regNumberSmall regNums[MAX_ARG_REG_COUNT]; // The registers to use when passing this argument, set to REG_STK for | ||
// arguments passed on | ||
// the stack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: condense to one line:
// arguments passed on the stack
// In such case the fgArgTabEntry keeps track of whether the original node (before morphing) | ||
// was a struct and the struct classification. | ||
isStructArg = fgEntryPtr->isStruct; | ||
isStructArg = argEntry->isStruct; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: It may be preferable to declare isStructArg twice. To avoid any potential problems with bool isStructArg being uninitialized if someone changes this in the future.
bool isStructArg = argEntry->isStruct;
and in !remorphing
bool isStructArg = varTypeIsStruct(argx);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't work, because we need the value below this if-then-else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that is unfortunate, ok
src/jit/morph.cpp
Outdated
if ((howToPassStruct == SPK_PrimitiveType) && // Passed in a single register | ||
!isPow2(originalSize)) // size is 3,5,6 or 7 bytes | ||
{ | ||
if (argObj->gtObj.gtOp1->IsVarAddr()) // Is the source a LclVar? | ||
// if (argObj->gtObj.gtOp1->IsVarAddr()) // Is the source a LclVar? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this commented out? Is it necessary? If not can it just be deleted? As is it is a little hard to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops - I had convinced myself that this didn't need to be limited to that case, so I commented it out to test my hypothesis. Then I failed to remove it.
src/jit/morph.cpp
Outdated
{ | ||
// For ARM64 we pass structs that are 3,5,6,7 bytes in size | ||
// we can read 4 or 8 bytes from the LclVar to pass this arg | ||
// For ARM64 we pass structs that are 3,5,6,7 bytes in size in registers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: update comment, this now is for arm64 or unix amd64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted the comment, as it is redundant
src/jit/morph.cpp
Outdated
@@ -3916,6 +3857,7 @@ GenTreeCall* Compiler::fgMorphArgs(GenTreeCall* call) | |||
size = roundupSize / TARGET_POINTER_SIZE; // Normalize size to number of pointer sized items | |||
} | |||
} | |||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Add comment to endif
#endif // !UNIX_AMD64_ABI
src/jit/morph.cpp
Outdated
#ifdef FEATURE_HFA | ||
if (!passUsingFloatRegs) | ||
{ | ||
// Note on Arm32 a HFA is passed in int regs for varargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This currently is currently correct for Arm32 and Arm64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that this is Windows specific
src/jit/codegencommon.cpp
Outdated
// We are iterating over the arguments only. | ||
assert(varDsc->lvIsParam); | ||
// We are iterating over the arguments only. | ||
assert(varDsc->lvIsParam); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With /GSChecks we can make a "shadow" copy of an incoming struct argument.
Basically we are copying it from the incoming Arg Space into a new struct local that is placed so that
it will catch any buffer overruns into its storage. Something to think about with this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Brian. Just to make sure I understand - this would be a "new" local though, right? So that the original varDsc would still have the original offset. If that's the case, then I don't think that would impact this. Otherwise, this would have a problem because it assumes that they are laid out in order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is a new local, I'm not sure about all the details here, but the IL references to the arg are rewrtten to refer to the new local.
There was a recent bug where we struct promotion interacted badly and it and end up with reference to both the original copy and to the individual promoted fields.
A fix and test case for the issue is here: #17329
} | ||
} | ||
#endif // !_TARGET_X86_ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice refactoring here.
The arm failures in the previous round were due to an overly aggressive assert in |
@dotnet-bot test Ubuntu arm Cross Checked Innerloop Build and Test |
@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test |
@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test |
@dotnet-bot test Windows_NT arm64 Cross Checked normal Build and Test |
@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test |
* Unify struct arg handling Eliminate unnecessary struct copies, especially on Linux, and reduce code duplication. Across all targets, use GT_FIELD_LIST to pass promoted structs on stack, and avoid requiring a copy and/or marking `lvDoNotEnregister` for those cases. Unify the specification of multi-reg args: - numRegs now indicates the actual number of reg args (not the size in pointer-size units) - regNums contains all the arg register numbers Commit migrated from dotnet/coreclr@d28957d
Eliminate unnecessary struct copies, especially on Linux, and reduce code duplication.
Across all targets, use GT_FIELD_LIST to pass promoted structs on stack, and avoid
requiring a copy and/or marking
lvDoNotEnregister
for those cases.Unify the specification of multi-reg args: