Remove extra UnmanagedCallersOnly overhead on x86 (dotnet#46238)
* Implement emitting an unmanaged calling convention entry point with the correct argument order and register usage on x86.

* Move Unix x86 to the UnmanagedCallersOnly plan now that we don't need to do argument shuffling.

* Add SEH hookup and profiler/debugger hooks to Reverse P/Invoke entry helper to match custom x86 thunk.

Fixes dotnet#46177

* Remove Windows x86 assembly stub for individual reverse P/Invokes. Remove the extra overhead from Windows x86 UnmanagedCallersOnly methods and put reverse P/Invoke stubs for Windows x86 on the UnmanagedCallersOnly plan.

* Further cleanup

* Remove extraneous UnmanagedCallersOnly block now that x86 UnmanagedCallersOnly has been simplified.

* Undo ArgOrder size specifier since it isn't needed and it doesn't work.

* Fix copy constructor reverse marshalling. Now that we don't have the emitted unmanaged thunk stub, we need to handle the x86 differences for copy-constructed parameters in the IL stub.

* Fix version guid syntax.

* Remove FastNExportHandler.

* Revert "Remove FastNExportHandler."

This reverts commit 423f70e.

* Fix setting up entry frame for new thread.

* Allow the NExportSEH record to live below ESP so we don't need to create a new stack frame.

* Fix formatting.

* Assign an offset for the return buffer on x86 since it might come in on the stack.

* Make sure we use the TC block we just put in on x86 as well.

* Shrink the ReversePInvokeFrame on non-x86 back to master's size.

* Fix arch-specific R2R constant.

* Pass the return address of the ReversePInvokeEnter helper to TraceCall instead of the entry point and call TraceCall from all JIT_ReversePInvokeEnter* helpers.

* Fix ILVerification and ILVerify

* Fix R2R constants for crossgen1

* Don't assert ReversePInvokeFrame size for cross-bitness scenarios.
jkoritzinsky authored Jan 20, 2021
1 parent 5aef85a commit 5ec0c7a
Showing 40 changed files with 326 additions and 1,491 deletions.
8 changes: 6 additions & 2 deletions src/coreclr/inc/corinfo.h
@@ -700,12 +700,16 @@ enum class CorInfoCallConvExtension
     // New calling conventions supported with the extensible calling convention encoding go here.
 };
 
-#ifdef UNIX_X86_ABI
+#ifdef TARGET_X86
 inline bool IsCallerPop(CorInfoCallConvExtension callConv)
 {
+#ifdef UNIX_X86_ABI
     return callConv == CorInfoCallConvExtension::Managed || callConv == CorInfoCallConvExtension::C;
-}
+#else
+    return callConv == CorInfoCallConvExtension::C;
 #endif // UNIX_X86_ABI
+}
+#endif
 
 // Determines whether or not this calling convention is an instance method calling convention.
 inline bool callConvIsInstanceMethodCallConv(CorInfoCallConvExtension callConv)
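The merged shape of IsCallerPop can be tried outside the runtime. What follows is a minimal sketch, not the header itself: it assumes a stand-in enum that lists only the enumerators named in this diff (the real definition lives in corinfo.h, and its numeric values are not reproduced here), and it drops the outer TARGET_X86 guard so that it compiles on any host. Without UNIX_X86_ABI defined, only the C (cdecl) convention is reported as caller-pop.

```cpp
// Stand-in for the enumerators that appear in this diff. The real enum is
// declared in corinfo.h; the order and values here are assumptions.
enum class CorInfoCallConvExtension
{
    Managed,
    C,
    Stdcall,
    Thiscall,
    Fastcall
};

// Same logic as the merged hunk above. Define UNIX_X86_ABI before this point
// to model the Unix x86 branch, where managed code is caller-pop as well.
constexpr bool IsCallerPop(CorInfoCallConvExtension callConv)
{
#ifdef UNIX_X86_ABI
    return callConv == CorInfoCallConvExtension::Managed || callConv == CorInfoCallConvExtension::C;
#else
    return callConv == CorInfoCallConvExtension::C;
#endif // UNIX_X86_ABI
}

// cdecl: the caller cleans up the stack in both branches.
static_assert(IsCallerPop(CorInfoCallConvExtension::C), "cdecl is caller-pop");
// stdcall: the callee pops its own arguments in both branches.
static_assert(!IsCallerPop(CorInfoCallConvExtension::Stdcall), "stdcall is callee-pop");
```

This is the predicate that genFnEpilog in codegencommon.cpp now consults on every x86 target, not only under UNIX_X86_ABI, to decide whether to clear fCalleePop.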
13 changes: 7 additions & 6 deletions src/coreclr/inc/jiteeversionguid.h
@@ -12,7 +12,7 @@
 // be changed. This is the identifier verified by ICorJitCompiler::getVersionIdentifier().
 //
 // You can use "uuidgen.exe -s" to generate this value.
-//
+//
 // Note that this file is parsed by some tools, namely superpmi.py, so make sure the first line is exactly
 // of the form:
 //
@@ -30,12 +30,13 @@
 // NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE
 //
 //////////////////////////////////////////////////////////////////////////////////////////////////////////
+//
 
-constexpr GUID JITEEVersionIdentifier = { /* f556df6c-b9c7-479c-b895-8e1f1959fe59 */
-    0xf556df6c,
-    0xb9c7,
-    0x479c,
-    {0xb8, 0x95, 0x8e, 0x1f, 0x19, 0x59, 0xfe, 0x59}
+constexpr GUID JITEEVersionIdentifier = { /* 768493d2-21cb-41e6-b06d-e62131fd0fc2 */
+    0x768493d2,
+    0x21cb,
+    0x41e6,
+    {0xb0, 0x6d, 0xe6, 0x21, 0x31, 0xfd, 0x0f, 0xc2}
 };
 
 //////////////////////////////////////////////////////////////////////////////////////////////////////////
6 changes: 5 additions & 1 deletion src/coreclr/inc/readytorun.h
@@ -397,7 +397,11 @@ struct READYTORUN_EXCEPTION_CLAUSE
 enum ReadyToRunRuntimeConstants : DWORD
 {
     READYTORUN_PInvokeTransitionFrameSizeInPointerUnits = 11,
-    READYTORUN_ReversePInvokeTransitionFrameSizeInPointerUnits = 2
+#ifdef TARGET_X86
+    READYTORUN_ReversePInvokeTransitionFrameSizeInPointerUnits = 5,
+#else
+    READYTORUN_ReversePInvokeTransitionFrameSizeInPointerUnits = 2,
+#endif
 };
 
 enum ReadyToRunHFAElemType : DWORD
2 changes: 0 additions & 2 deletions src/coreclr/jit/codegencommon.cpp
@@ -8896,10 +8896,8 @@ void CodeGen::genFnEpilog(BasicBlock* block)
     if (compiler->info.compIsVarArgs)
         fCalleePop = false;
 
-#ifdef UNIX_X86_ABI
     if (IsCallerPop(compiler->info.compCallConv))
         fCalleePop = false;
-#endif // UNIX_X86_ABI
 
     if (fCalleePop)
     {
2 changes: 2 additions & 0 deletions src/coreclr/jit/compiler.cpp
@@ -6184,10 +6184,12 @@ int Compiler::compCompileHelper(CORINFO_MODULE_HANDLE classPtr,
     {
         bool unused;
         info.compCallConv = info.compCompHnd->getUnmanagedCallConv(methodInfo->ftn, nullptr, &unused);
+        info.compArgOrder = Target::g_tgtUnmanagedArgOrder;
     }
     else
     {
         info.compCallConv = CorInfoCallConvExtension::Managed;
+        info.compArgOrder = Target::g_tgtArgOrder;
     }
 
     info.compIsVarArgs = false;
10 changes: 10 additions & 0 deletions src/coreclr/jit/compiler.h
@@ -9379,6 +9379,8 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     // current number of EH clauses (after additions like synchronized
     // methods and funclets, and removals like unreachable code deletion).
 
+    Target::ArgOrder compArgOrder;
+
     bool compMatchedVM; // true if the VM is "matched": either the JIT is a cross-compiler
     // and the VM expects that, or the JIT is a "self-host" compiler
     // (e.g., x86 hosted targeting x86) and the VM expects that.
@@ -9458,6 +9460,14 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
         return (info.compRetBuffArg != BAD_VAR_NUM);
     }
 #endif // TARGET_WINDOWS && TARGET_ARM64
+    // 4. x86 unmanaged calling conventions require the address of RetBuff to be returned in eax.
+    CLANG_FORMAT_COMMENT_ANCHOR;
+#if defined(TARGET_X86)
+    if (info.compCallConv != CorInfoCallConvExtension::Managed)
+    {
+        return (info.compRetBuffArg != BAD_VAR_NUM);
+    }
+#endif
 
     return false;
 #endif // TARGET_AMD64
24 changes: 20 additions & 4 deletions src/coreclr/jit/flowgraph.cpp
@@ -8683,13 +8683,29 @@ void Compiler::fgAddReversePInvokeEnterExit()
     varDsc->lvType = TYP_BLK;
     varDsc->lvExactSize = eeGetEEInfo()->sizeOfReversePInvokeFrame;
 
-    GenTree* tree;
-
     // Add enter pinvoke exit callout at the start of prolog
 
-    tree = gtNewOperNode(GT_ADDR, TYP_I_IMPL, gtNewLclvNode(lvaReversePInvokeFrameVar, TYP_BLK));
+    GenTree* pInvokeFrameVar = gtNewOperNode(GT_ADDR, TYP_I_IMPL, gtNewLclvNode(lvaReversePInvokeFrameVar, TYP_BLK));
+
+    GenTree* stubArgument;
+
+    if (info.compPublishStubParam)
+    {
+        // If we have a secret param for a Reverse P/Invoke, that means that we are in an IL stub.
+        // In this case, the method handle we pass down to the Reverse P/Invoke helper should be
+        // the target method, which is passed in the secret parameter.
+        stubArgument = gtNewLclvNode(lvaStubArgumentVar, TYP_I_IMPL);
+    }
+    else
+    {
+        stubArgument = gtNewIconNode(0, TYP_I_IMPL);
+    }
+
+    GenTree* tree;
+
+    GenTreeCall::Use* args = gtNewCallArgs(pInvokeFrameVar, gtNewIconEmbMethHndNode(info.compMethodHnd), stubArgument);
 
-    tree = gtNewHelperCallNode(CORINFO_HELP_JIT_REVERSE_PINVOKE_ENTER, TYP_VOID, gtNewCallArgs(tree));
+    tree = gtNewHelperCallNode(CORINFO_HELP_JIT_REVERSE_PINVOKE_ENTER, TYP_VOID, args);
 
     fgEnsureFirstBBisScratch();
 
6 changes: 6 additions & 0 deletions src/coreclr/jit/importer.cpp
@@ -17334,6 +17334,12 @@ bool Compiler::impReturnInstruction(int prefixFlags, OPCODE& opcode)
     {
         op1 = gtNewOperNode(GT_RETURN, TYP_BYREF, gtNewLclvNode(info.compRetBuffArg, TYP_BYREF));
     }
 #endif
+#if defined(TARGET_X86)
+    else if (info.compCallConv != CorInfoCallConvExtension::Managed)
+    {
+        op1 = gtNewOperNode(GT_RETURN, TYP_BYREF, gtNewLclvNode(info.compRetBuffArg, TYP_BYREF));
+    }
+#endif
     else
     {
59 changes: 42 additions & 17 deletions src/coreclr/jit/lclvars.cpp
@@ -235,7 +235,29 @@ void Compiler::lvaInitTypeRef()
     //-------------------------------------------------------------------------
 
     InitVarDscInfo varDscInfo;
-    varDscInfo.Init(lvaTable, hasRetBuffArg);
+#ifdef TARGET_X86
+    // x86 unmanaged calling conventions limit the number of registers supported
+    // for accepting arguments. As a result, we need to modify the number of registers
+    // when we emit a method with an unmanaged calling convention.
+    switch (info.compCallConv)
+    {
+        case CorInfoCallConvExtension::Thiscall:
+            // In thiscall the this parameter goes into a register.
+            varDscInfo.Init(lvaTable, hasRetBuffArg, 1, 0);
+            break;
+        case CorInfoCallConvExtension::C:
+        case CorInfoCallConvExtension::Stdcall:
+            varDscInfo.Init(lvaTable, hasRetBuffArg, 0, 0);
+            break;
+        case CorInfoCallConvExtension::Managed:
+        case CorInfoCallConvExtension::Fastcall:
+        default:
+            varDscInfo.Init(lvaTable, hasRetBuffArg, MAX_REG_ARG, MAX_FLOAT_REG_ARG);
+            break;
+    }
+#else
+    varDscInfo.Init(lvaTable, hasRetBuffArg, MAX_REG_ARG, MAX_FLOAT_REG_ARG);
+#endif
 
     lvaInitArgs(&varDscInfo);
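The switch in this hunk amounts to a small table from calling convention to integer-register budget. A minimal sketch of that table follows, assuming a stand-in enum and a hypothetical helper named MaxIntRegArgsOnX86 (neither is part of the JIT); the constant MAX_REG_ARG mirrors the x86 value shown in target.h in this diff.

```cpp
// Stand-in for the enumerators named in the switch; order and values are assumptions.
enum class CorInfoCallConvExtension
{
    Managed,
    C,
    Stdcall,
    Thiscall,
    Fastcall
};

// x86 value from target.h in this diff: arguments may use ECX and EDX.
const unsigned MAX_REG_ARG = 2;

// Hypothetical helper: the value the switch passes to InitVarDscInfo::Init
// as _maxIntRegArgNum when compiling for x86.
inline unsigned MaxIntRegArgsOnX86(CorInfoCallConvExtension callConv)
{
    switch (callConv)
    {
        case CorInfoCallConvExtension::Thiscall:
            // Only the this parameter travels in a register.
            return 1;
        case CorInfoCallConvExtension::C:
        case CorInfoCallConvExtension::Stdcall:
            // cdecl and stdcall pass every argument on the stack.
            return 0;
        case CorInfoCallConvExtension::Managed:
        case CorInfoCallConvExtension::Fastcall:
        default:
            return MAX_REG_ARG;
    }
}
```

With a budget of zero, no argument can be enregistered, which is consistent with the later lvaInitRetBuffArg hunk no longer assuming that the return buffer arrives in a register.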

@@ -513,14 +535,16 @@ void Compiler::lvaInitRetBuffArg(InitVarDscInfo* varDscInfo, bool useFixedRetBuf
         info.compRetBuffArg = varDscInfo->varNum;
         varDsc->lvType = TYP_BYREF;
         varDsc->lvIsParam = 1;
-        varDsc->lvIsRegArg = 1;
+        varDsc->lvIsRegArg = 0;
 
         if (useFixedRetBufReg && hasFixedRetBuffReg())
         {
+            varDsc->lvIsRegArg = 1;
             varDsc->SetArgReg(theFixedRetBuffReg());
         }
-        else
+        else if (varDscInfo->canEnreg(TYP_INT))
         {
+            varDsc->lvIsRegArg = 1;
             unsigned retBuffArgNum = varDscInfo->allocRegArg(TYP_INT);
             varDsc->SetArgReg(genMapIntRegArgNumToRegNum(retBuffArgNum));
         }
@@ -557,10 +581,10 @@ void Compiler::lvaInitRetBuffArg(InitVarDscInfo* varDscInfo, bool useFixedRetBuf
         }
 #endif // FEATURE_SIMD
 
-        assert(isValidIntArgReg(varDsc->GetArgReg()));
+        assert(!varDsc->lvIsRegArg || isValidIntArgReg(varDsc->GetArgReg()));
 
 #ifdef DEBUG
-        if (verbose)
+        if (varDsc->lvIsRegArg && verbose)
         {
             printf("'__retBuf' passed in register %s\n", getRegName(varDsc->GetArgReg()));
         }
@@ -591,7 +615,10 @@ void Compiler::lvaInitUserArgs(InitVarDscInfo* varDscInfo, unsigned skipArgs, un
 
 #if defined(TARGET_X86)
     // Only (some of) the implicit args are enregistered for varargs
-    varDscInfo->maxIntRegArgNum = info.compIsVarArgs ? varDscInfo->intRegArgNum : MAX_REG_ARG;
+    if (info.compIsVarArgs)
+    {
+        varDscInfo->maxIntRegArgNum = varDscInfo->intRegArgNum;
+    }
 #elif defined(TARGET_AMD64) && !defined(UNIX_AMD64_ABI)
     // On System V type environment the float registers are not indexed together with the int ones.
     varDscInfo->floatRegArgNum = varDscInfo->intRegArgNum;
@@ -5345,7 +5372,7 @@ void Compiler::lvaAssignVirtualFrameOffsetsToArgs()
        This is all relative to our Virtual '0'
      */
 
-    if (Target::g_tgtArgOrder == Target::ARG_ORDER_L2R)
+    if (info.compArgOrder == Target::ARG_ORDER_L2R)
     {
         argOffs = compArgSize;
     }
@@ -5357,9 +5384,10 @@ void Compiler::lvaAssignVirtualFrameOffsetsToArgs()
     noway_assert(compArgSize >= codeGen->intRegState.rsCalleeRegArgCount * REGSIZE_BYTES);
 #endif
 
-#ifdef TARGET_X86
-    argOffs -= codeGen->intRegState.rsCalleeRegArgCount * REGSIZE_BYTES;
-#endif
+    if (info.compArgOrder == Target::ARG_ORDER_L2R)
+    {
+        argOffs -= codeGen->intRegState.rsCalleeRegArgCount * REGSIZE_BYTES;
+    }
 
     // Update the arg initial register locations.
     lvaUpdateArgsWithInitialReg();
@@ -5398,11 +5426,8 @@ void Compiler::lvaAssignVirtualFrameOffsetsToArgs()
     if (info.compRetBuffArg != BAD_VAR_NUM)
     {
         noway_assert(lclNum == info.compRetBuffArg);
-        noway_assert(lvaTable[lclNum].lvIsRegArg);
-#ifndef TARGET_X86
         argOffs =
             lvaAssignVirtualFrameOffsetToArg(lclNum, REGSIZE_BYTES, argOffs UNIX_AMD64_ABI_ONLY_ARG(&callerArgOffset));
-#endif // TARGET_X86
         lclNum++;
     }
 
@@ -5553,7 +5578,7 @@ int Compiler::lvaAssignVirtualFrameOffsetToArg(unsigned lclNum,
     noway_assert(lclNum < info.compArgsCount);
     noway_assert(argSize);
 
-    if (Target::g_tgtArgOrder == Target::ARG_ORDER_L2R)
+    if (info.compArgOrder == Target::ARG_ORDER_L2R)
     {
         argOffs -= argSize;
     }
@@ -5621,7 +5646,7 @@ int Compiler::lvaAssignVirtualFrameOffsetToArg(unsigned lclNum,
         }
     }
 
-    if (Target::g_tgtArgOrder == Target::ARG_ORDER_R2L && !varDsc->lvIsRegArg)
+    if (info.compArgOrder == Target::ARG_ORDER_R2L && !varDsc->lvIsRegArg)
     {
         argOffs += argSize;
     }
@@ -5646,7 +5671,7 @@ int Compiler::lvaAssignVirtualFrameOffsetToArg(unsigned lclNum,
     noway_assert(lclNum < info.compArgsCount);
     noway_assert(argSize);
 
-    if (Target::g_tgtArgOrder == Target::ARG_ORDER_L2R)
+    if (info.compArgOrder == Target::ARG_ORDER_L2R)
     {
         argOffs -= argSize;
     }
@@ -5925,7 +5950,7 @@ int Compiler::lvaAssignVirtualFrameOffsetToArg(unsigned lclNum,
         }
     }
 
-    if (Target::g_tgtArgOrder == Target::ARG_ORDER_R2L && !varDsc->lvIsRegArg)
+    if (info.compArgOrder == Target::ARG_ORDER_R2L && !varDsc->lvIsRegArg)
     {
         argOffs += argSize;
     }
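Switching these checks from Target::g_tgtArgOrder to info.compArgOrder matters because the two orders walk the argument area in opposite directions. The sketch below is a simplified model under stated assumptions, not the JIT's code: the names ArgOrder and AssignArgOffsets are hypothetical, every argument is assumed to live on the stack (no register arguments, so the rsCalleeRegArgCount adjustment is left out), and offsets are relative to the start of the argument area.

```cpp
#include <vector>

// Hypothetical stand-in for Target::ArgOrder.
enum class ArgOrder
{
    R2L,
    L2R
};

// Returns one offset per argument, in declaration order.
// L2R: the first argument is pushed first, so it sits at the highest offset.
//      Start at the total size and subtract before assigning.
// R2L: the last argument is pushed first, so the first argument sits at the
//      lowest offset. Start at zero and add after assigning.
inline std::vector<int> AssignArgOffsets(const std::vector<int>& argSizes, ArgOrder order)
{
    int total = 0;
    for (int size : argSizes)
    {
        total += size;
    }

    int              argOffs = (order == ArgOrder::L2R) ? total : 0;
    std::vector<int> offsets;
    offsets.reserve(argSizes.size());

    for (int size : argSizes)
    {
        if (order == ArgOrder::L2R)
        {
            argOffs -= size;
        }
        offsets.push_back(argOffs);
        if (order == ArgOrder::R2L)
        {
            argOffs += size;
        }
    }
    return offsets;
}
```

For argument sizes {4, 4, 8}, this model yields offsets {12, 8, 0} under L2R and {0, 4, 8} under R2L. On x86 the managed convention is L2R while g_tgtUnmanagedArgOrder is R2L, so a method compiled with an unmanaged calling convention needs the second layout.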
6 changes: 3 additions & 3 deletions src/coreclr/jit/register_arg_convention.h
@@ -33,15 +33,15 @@ struct InitVarDscInfo
 
 public:
     // set to initial values
-    void Init(LclVarDsc* lvaTable, bool _hasRetBufArg)
+    void Init(LclVarDsc* lvaTable, bool _hasRetBufArg, unsigned _maxIntRegArgNum, unsigned _maxFloatRegArgNum)
     {
         hasRetBufArg = _hasRetBufArg;
         varDsc = &lvaTable[0]; // the first argument LclVar 0
         varNum = 0; // the first argument varNum 0
         intRegArgNum = 0;
         floatRegArgNum = 0;
-        maxIntRegArgNum = MAX_REG_ARG;
-        maxFloatRegArgNum = MAX_FLOAT_REG_ARG;
+        maxIntRegArgNum = _maxIntRegArgNum;
+        maxFloatRegArgNum = _maxFloatRegArgNum;
 
 #ifdef TARGET_ARM
         fltArgSkippedRegMask = RBM_NONE;
3 changes: 2 additions & 1 deletion src/coreclr/jit/target.h
@@ -436,7 +436,7 @@ typedef unsigned char regNumberSmall;
 #define FIRST_ARG_STACK_OFFS (2*REGSIZE_BYTES) // Caller's saved EBP and return address
 
 #define MAX_REG_ARG 2
 
 #define MAX_FLOAT_REG_ARG 0
 #define REG_ARG_FIRST REG_ECX
 #define REG_ARG_LAST REG_EDX
@@ -1620,6 +1620,7 @@ class Target
         ARG_ORDER_L2R
     };
     static const enum ArgOrder g_tgtArgOrder;
+    static const enum ArgOrder g_tgtUnmanagedArgOrder;
 };
 
 #if defined(DEBUG) || defined(LATE_DISASM) || DUMP_GC_TABLES
5 changes: 3 additions & 2 deletions src/coreclr/jit/targetamd64.cpp
@@ -12,8 +12,9 @@
 
 #include "target.h"
 
-const char*            Target::g_tgtCPUName  = "x64";
-const Target::ArgOrder Target::g_tgtArgOrder = ARG_ORDER_R2L;
+const char*            Target::g_tgtCPUName           = "x64";
+const Target::ArgOrder Target::g_tgtArgOrder          = ARG_ORDER_R2L;
+const Target::ArgOrder Target::g_tgtUnmanagedArgOrder = ARG_ORDER_R2L;
 
 // clang-format off
 #ifdef UNIX_AMD64_ABI
5 changes: 3 additions & 2 deletions src/coreclr/jit/targetarm.cpp
@@ -12,8 +12,9 @@
 
 #include "target.h"
 
-const char*            Target::g_tgtCPUName  = "arm";
-const Target::ArgOrder Target::g_tgtArgOrder = ARG_ORDER_R2L;
+const char*            Target::g_tgtCPUName           = "arm";
+const Target::ArgOrder Target::g_tgtArgOrder          = ARG_ORDER_R2L;
+const Target::ArgOrder Target::g_tgtUnmanagedArgOrder = ARG_ORDER_R2L;
 
 // clang-format off
 const regNumber intArgRegs [] = {REG_R0, REG_R1, REG_R2, REG_R3};
5 changes: 3 additions & 2 deletions src/coreclr/jit/targetarm64.cpp
@@ -12,8 +12,9 @@
 
 #include "target.h"
 
-const char*            Target::g_tgtCPUName  = "arm64";
-const Target::ArgOrder Target::g_tgtArgOrder = ARG_ORDER_R2L;
+const char*            Target::g_tgtCPUName           = "arm64";
+const Target::ArgOrder Target::g_tgtArgOrder          = ARG_ORDER_R2L;
+const Target::ArgOrder Target::g_tgtUnmanagedArgOrder = ARG_ORDER_R2L;
 
 // clang-format off
 const regNumber intArgRegs [] = {REG_R0, REG_R1, REG_R2, REG_R3, REG_R4, REG_R5, REG_R6, REG_R7};
5 changes: 3 additions & 2 deletions src/coreclr/jit/targetx86.cpp
@@ -12,8 +12,9 @@
 
 #include "target.h"
 
-const char*            Target::g_tgtCPUName  = "x86";
-const Target::ArgOrder Target::g_tgtArgOrder = ARG_ORDER_L2R;
+const char*            Target::g_tgtCPUName           = "x86";
+const Target::ArgOrder Target::g_tgtArgOrder          = ARG_ORDER_L2R;
+const Target::ArgOrder Target::g_tgtUnmanagedArgOrder = ARG_ORDER_R2L;
 
 // clang-format off
 const regNumber intArgRegs [] = {REG_ECX, REG_EDX};