Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Unify struct arg handling #18358

Merged
merged 4 commits into from
Jun 12, 2018
Merged

Unify struct arg handling #18358

merged 4 commits into from
Jun 12, 2018

Conversation

CarolEidt
Copy link

Eliminate unnecessary struct copies, especially on Linux, and reduce code duplication.
Across all targets, use GT_FIELD_LIST to pass promoted structs on stack, and avoid
requiring a copy and/or marking lvDoNotEnregister for those cases.

Unify the specification of multi-reg args:

  • numRegs now indicates the actual number of reg args (not the size in pointer-size units)
  • regNums contains all the arg register numbers

Eliminate unnecessary struct copies, especially on Linux, and reduce code duplication.
Across all targets, use GT_FIELD_LIST to pass promoted structs on stack, and avoid
requiring a copy and/or marking `lvDoNotEnregister` for those cases.

Unify the specification of multi-reg args:
- numRegs now indicates the actual number of reg args (not the size in pointer-size units)
- regNums contains all the arg register numbers
@CarolEidt
Copy link
Author

This change has zero diffs for x86, x64/windows, arm altjit and arm64 altjit.
For x64/ux, the diffs are:
-7442766 fewer bytes (-3.52% of base), (19596 methods improved, 39 regressed)

A private perf run showed the following improvements on the Devirtualization benchmark (i.e. a factor of 2 for 3 of the 4 sub-benchmarks):
Devirtualization / EqualityComparer, +1.719 (overall)
ValueTupleCompare, +1.023
ValueTupleCompareCached, +2.030
ValueTupleCompareNoOpt, +2.083
ValueTupleCompareWrapped, +2.018

For the remaining benchmarks, there were many numbers in the roughly .9 to 1.1 range that were reported as signficant, but neither crossgen asm-diffs nor superpmi run against both ZapDisable and regular benchmarks on x64/ux showed any significant negative diffs. For the regular run, the overall was:
-36623 fewer bytes (-1.759% of base), 810 methods improved, 7 regressed
The ZapDisable run was similar.
The 7 regressions are:

  • Microsoft.CodeAnalysis.CSharp.CodeGen.StackOptimizerPass1:VisitArguments (432 to 777 bytes) - A loop was cloned, and the "good" loop is actually larger, due to edge splitting and spilling
  • Microsoft.CodeAnalysis.PEAssembly:.ctor (342 to 412 bytes) - A loop is cloned so the code is larger, but the loop is 65 bytes, where it was 86 before.
  • Burgers:GetCalculated3 (1050 to 1052 bytes) - 2 additional spills
  • System.Linq.EnumerableSorter[KeyValuePair,Int32][System.Collections.Generic.KeyValuePair[System.Int64,System.__Canon],System.Int32]:ComputeKeys (220 to 232 bytes) - An arg is copied differently (not because it's a struct, but because of side-effects, this causes different ordering, different register allocation and slightly larger code)
  • Microsoft.CodeAnalysis.MetadataReaderExtensions:ReadAssemblyIdentityOrThrow (757 to 768 bytes) - Smaller code before LSRA, but more spill & resolution moves
  • System.Collections.Generic.ArraySortHelper1[CustomAttributeRow][Microsoft.Cci.MetadataWriter+CustomAttributeRow]:SwapIfGreater(ref,ref,int,int) ` (177 to 185 bytes) - Because arg1 doesn't get copied, it has a longer lifetime, and winds up getting spilled
  • Microsoft.CodeAnalysis.CSharp.Symbols.AnonymousTypeManager:AssignTemplatesNamesAndCompile (1533 to 1536 bytes) - Similar to above - one less var, different live ranges, 1 more resolution move and 1 more split edge

@CarolEidt
Copy link
Author

@dotnet/jit-contrib PTAL
This eliminate some of the x64/ux-specific code, but makes only marginal progress toward the design described in https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/struct-abi.md.

@jashook jashook self-requested a review June 8, 2018 05:05
baseVarNum = compiler->lvaFirstStackIncomingArgNum;

if (compiler->lvaFirstStackIncomingArgNum != BAD_VAR_NUM)
// Iterate over all the local variables in the Lcl var table.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it me or this comment is wrong? The for loop below seems to iterate only over the args.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct - the comment is wrong. In my defense, it isn't new (though the diffs seem to think so), it was just moved from below.
I'll correct it.

Copy link

@sdmaclea sdmaclea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (Looked only at the general idea and impact to arm64)

Copy link

@jashook jashook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm have some nits

{
// If this is passed as a floating type, use that.
// Otherwise, we'll use the general case - we don't want to use the "EightByteType"
// directly, because it won't preserve small types.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems to imply that the classification returned by Compiler::EightByteType is incorrect, or at least being deprecated. If this is the case, I think it is worth adding a comment to Compiler::EightByteType explaining its limitations.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I should just reword the comment. It's not really a limitation of the "EightByteType", it's that it returns TYP_INT for any integer type <= 4 bytes. I could change that method, but other callers depend on that behavior.

// Set 'useType' to the type of the first eightbyte item
// We can't pass this as a primitive type.
}
else if (structDesc.passedInRegisters && varTypeIsFloating(GetEightByteType(structDesc, 0)))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems easier to directly use structDesc.eightByteClassifications[0] == SystemVClassificationTypeSSE even if it is ugly it saves a few calls, one that is redundant.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'm not totally comfortable with that, because it assumes that the structDesc is initialized (I realize it is, but it's not obvious from this context).

@@ -834,7 +837,7 @@ var_types Compiler::getArgTypeForStruct(CORINFO_CLASS_HANDLE clsHnd,
#ifdef UNIX_AMD64_ABI

// The case of (structDesc.eightByteCount == 1) should have already been handled
if (structDesc.eightByteCount > 1)
if ((structDesc.eightByteCount > 1) || !structDesc.passedInRegisters)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What case does adding || !structDesc.passedInRegisters? It seems like the comment is correct, by this point any Arg that is <= 8bytes should have been handled and any arg that is > 8 bytes but <= 16 bytes will be correctly addressed with structDesc.eightByteCount > 1 regardless of whether structDesc.passedInRegisters.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think so. I believe that if passedInRegisters is false, the eightByteCount will always be 0 - in which case we still need to set this as being passed by value. It's only the pass by reference case that should not fall into this then clause.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense thank you for clarifying.

private:
regNumberSmall regNums[MAX_ARG_REG_COUNT]; // The registers to use when passing this argument, set to REG_STK for
// arguments passed on
// the stack
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: condense to one line:

// arguments passed on the stack

// In such case the fgArgTabEntry keeps track of whether the original node (before morphing)
// was a struct and the struct classification.
isStructArg = fgEntryPtr->isStruct;
isStructArg = argEntry->isStruct;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It may be preferable to declare isStructArg twice. To avoid any potential problems with bool isStructArg being uninitialized if someone changes this in the future.

bool isStructArg = argEntry->isStruct;
and in !remorphing
bool isStructArg = varTypeIsStruct(argx);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That won't work, because we need the value below this if-then-else.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that is unfortunate, ok

if ((howToPassStruct == SPK_PrimitiveType) && // Passed in a single register
!isPow2(originalSize)) // size is 3,5,6 or 7 bytes
{
if (argObj->gtObj.gtOp1->IsVarAddr()) // Is the source a LclVar?
// if (argObj->gtObj.gtOp1->IsVarAddr()) // Is the source a LclVar?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this commented out? Is it necessary? If not can it just be deleted? As is it is a little hard to follow.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops - I had convinced myself that this didn't need to be limited to that case, so I commented it out to test my hypothesis. Then I failed to remove it.

{
// For ARM64 we pass structs that are 3,5,6,7 bytes in size
// we can read 4 or 8 bytes from the LclVar to pass this arg
// For ARM64 we pass structs that are 3,5,6,7 bytes in size in registers.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: update comment, this now is for arm64 or unix amd64.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted the comment, as it is redundant

@@ -3916,6 +3857,7 @@ GenTreeCall* Compiler::fgMorphArgs(GenTreeCall* call)
size = roundupSize / TARGET_POINTER_SIZE; // Normalize size to number of pointer sized items
}
}
#endif
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Add comment to endif

#endif // !UNIX_AMD64_ABI

#ifdef FEATURE_HFA
if (!passUsingFloatRegs)
{
// Note on Arm32 a HFA is passed in int regs for varargs
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently is currently correct for Arm32 and Arm64

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that this is Windows specific

// We are iterating over the arguments only.
assert(varDsc->lvIsParam);
// We are iterating over the arguments only.
assert(varDsc->lvIsParam);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With /GSChecks we can make a "shadow" copy of an incoming struct argument.
Basically we are copying it from the incoming Arg Space into a new struct local that is placed so that
it will catch any buffer overruns into its storage. Something to think about with this change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Brian. Just to make sure I understand - this would be a "new" local though, right? So that the original varDsc would still have the original offset. If that's the case, then I don't think that would impact this. Otherwise, this would have a problem because it assumes that they are laid out in order.

Copy link

@briansull briansull Jun 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is a new local, I'm not sure about all the details here, but the IL references to the arg are rewrtten to refer to the new local.

There was a recent bug where we struct promotion interacted badly and it and end up with reference to both the original copy and to the individual promoted fields.

A fix and test case for the issue is here: #17329

}
}
#endif // !_TARGET_X86_

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactoring here.

@CarolEidt
Copy link
Author

The arm failures in the previous round were due to an overly aggressive assert in checkIsStruct() for ARM where a stack-passed struct is passed as TYP_DOUBLE.
I believe I've addressed all the feedback.
Once the CI tests pass, I'll run pri1 tests.

@jashook
Copy link

jashook commented Jun 11, 2018

@dotnet-bot test Ubuntu arm Cross Checked Innerloop Build and Test
@dotnet-bot test Windows_NT arm64 Cross Checked normal Build and Test
@dotnet-bot test Windows_NT arm Cross Checked normal Build and Test
@dotnet-bot test Windows_NT x64 Build and Test
@dotnet-bot test Windows_NT x86 Checked Build and Test
@dotnet-bot test Ubuntu x64 Build and Test

@jashook
Copy link

jashook commented Jun 11, 2018

@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test

@CarolEidt
Copy link
Author

@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test

@jashook
Copy link

jashook commented Jun 12, 2018

@dotnet-bot test Windows_NT arm64 Cross Checked normal Build and Test
@dotnet-bot test Windows_NT arm Cross Checked normal Build and Test
@dotnet-bot test Windows_NT x64 Build and Test
@dotnet-bot test Windows_NT x86 Checked Build and Test
@dotnet-bot test Ubuntu x64 Build and Test

@CarolEidt
Copy link
Author

@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test

@CarolEidt CarolEidt merged commit d28957d into dotnet:master Jun 12, 2018
@CarolEidt CarolEidt deleted the fgMorphArgs1 branch August 31, 2018 17:49
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Unify struct arg handling

Eliminate unnecessary struct copies, especially on Linux, and reduce code duplication.
Across all targets, use GT_FIELD_LIST to pass promoted structs on stack, and avoid
requiring a copy and/or marking `lvDoNotEnregister` for those cases.

Unify the specification of multi-reg args:
- numRegs now indicates the actual number of reg args (not the size in pointer-size units)
- regNums contains all the arg register numbers



Commit migrated from dotnet/coreclr@d28957d
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants