-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RISC-V] ELT Profiler Bring-Up #91313
Conversation
Initial implementation based on ARM64 code.
* Fix argument registers according to RISC-V calling convention * Fix field offsets for PROFILE_PLATFORM_SPECIFIC_DATA * Make sure field offsets for PROFILE_PLATFORM_SPECIFIC_DATA stay fixed by static asserting the offsets in asmconstants.h
…in t0 and t1 because t2 is used to store the call address of the stub
…n see whole struct arguments laid out in memory
Since the RISC-V ABI says values are returned like the first named argument, re-use the struct copying routine from argument parsing as much as possible.
…filer::Shutdown()
* Remove unused t0 field * Remove 'unused' field and widen 'flags' to 64 bits to maintain alignment and shave off one sw instruction
Tagging subscribers to this area: @tommcdon Issue DetailsInitial implementation of ELT profiler for RISC-V. Fixes tests:
Part of #84834
|
@jkotas @jakobbotsch PR ready for review. |
// Profiling | ||
//********************************************************************** | ||
|
||
#ifdef PROFILING_SUPPORTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, it is better to put to original vm/riscv64/profiler.cpp
back for consistency with other platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting it back in profiler.cpp
also means we're back to hardcoded field offsets in asm stubs, which are easy to get wrong. Using constants pinned to the actual C struct by static asserts in asmconstants.h
is a net improvement, even though the code differs somewhat from other platforms.
return (LPVOID)pData->argumentRegisters.a[0]; | ||
} | ||
// On RISC-V the method is not required to preserve the return buffer address passed in a0. | ||
// However, JIT does that anyway if leave hook needs to be generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put the assert _ASSERTE((pData->flags & PROFILE_LEAVE) != 0);
if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
src/coreclr/jit/codegenriscv64.cpp
Outdated
if (compiler->compProfilerMethHndIndirected) | ||
{ | ||
instGen_Set_Reg_To_Imm(EA_PTR_DSP_RELOC, REG_PROFILER_ENTER_ARG_FUNC_ID, methHnd); | ||
GetEmitter()->emitIns_R_R(INS_ld, EA_PTRSIZE, REG_PROFILER_ENTER_ARG_FUNC_ID, REG_PROFILER_ENTER_ARG_FUNC_ID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In RISC-V, emitIns_R_R_I
is used for INS_ld
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will do.
src/coreclr/jit/codegenriscv64.cpp
Outdated
if (compiler->compProfilerMethHndIndirected) | ||
{ | ||
instGen_Set_Reg_To_Imm(EA_PTR_DSP_RELOC, REG_PROFILER_LEAVE_ARG_FUNC_ID, methHnd); | ||
GetEmitter()->emitIns_R_R(INS_ld, EA_PTRSIZE, REG_PROFILER_LEAVE_ARG_FUNC_ID, REG_PROFILER_LEAVE_ARG_FUNC_ID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace to emitIns_R_R_I
#undef ASMCONSTANTS_C_ASSERT_OFFSET | ||
|
||
#endif // PROFILING_SUPPORTED | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not sync with other plaforms. IMO, I think it is better to update with all other platforms in this case. @jkotas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like in the other comment on cgencpu.h
, IMO having the field offsets pinned to the C struct with static asserts is a net improvement over hardcoded offsets in asm.
I wanted to limit the impact of the change to RISC-V since this PR is large enough. But I can do the same for other platforms if need be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I understand your concerns. However, I prefer that the same structures and functions should be in the same file for all platforms as much as possible. So I want you to refactoring all the other platforms in another PR. (or you can focus bug-fix only in this PR. Then updates all (including RISC-V) in another PR.) Thank you.
@jkotas Could you give any comment about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it would be nice to use named constants that are validated at compile time, across all platforms. This cleanup can be done in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll do an analogous cleanup for PROFILE_PLATFORM_SPECIFIC_DATA
in a separate PR.
PROFILE_PLATFORM_SPECIFIC_DATA* pData = reinterpret_cast<PROFILE_PLATFORM_SPECIFIC_DATA*>(m_handle); | ||
|
||
struct { bool isFloat, is8; } fields[] = { | ||
{ sir->m_structFields & (STRUCT_FLOAT_FIELD_FIRST | STRUCT_FLOAT_FIELD_ONLY_TWO | STRUCT_FLOAT_FIELD_ONLY_ONE), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my guess, fields[0]
is for the first one in the field. If then, I think STRUCT_FLOAT_FIELD_ONLY_ONE
can be set when only second field is FLOAT
in GetRiscv64PassStructInRegisterFlags
. (I don't know it can actually happen. I just searched.) Could you check again?
runtime/src/coreclr/vm/methodtable.cpp
Lines 3783 to 3794 in 6d3be9e
int size2 = GetRiscv64PassStructInRegisterFlags((CORINFO_CLASS_HANDLE)pMethodTable); | |
if ((size2 & STRUCT_FLOAT_FIELD_ONLY_ONE) != 0) | |
{ | |
if (pFieldSecond[0].GetSize() == 8) | |
{ | |
size = size & STRUCT_FLOAT_FIELD_FIRST ? (size ^ STRUCT_MERGE_FIRST_SECOND_8) : (size | STRUCT_SECOND_FIELD_DOUBLE); | |
} | |
else | |
{ | |
size = size & STRUCT_FLOAT_FIELD_FIRST ? (size ^ STRUCT_MERGE_FIRST_SECOND) : (size | STRUCT_FLOAT_FIELD_SECOND); | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It passes the MixedStructFunc
test case from slowpathcommon.cs
and it takes a struct with second float field only. But I'll check if it ever happens.
m_bufferPos = alignedTo8; | ||
const INT64* src = | ||
inFloatReg ? (const INT64*)fReg++ : | ||
inGenReg ? aReg++ : (const INT64*)Func::postIncrement(stack, 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When field[i].isFloat
is true and fReg < fRegEnd
is false, src
can be a aReg
which is an integer register. Please check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's intentional to cover this case from RISC-V ABI, 2.2 Hardware Floating-point Calling Convention:
A real floating-point argument is passed in a floating-point argument register if it is no more than
ABI_FLEN bits wide and at least one floating-point argument register is available. Otherwise, it is
passed according to the integer calling convention.
sir.m_byteStackIndex = 0; | ||
sir.m_byteStackSize = -1; | ||
sir.m_structFields = returnFlags; | ||
return CopyStructFromRegisters(&sir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in most cases except VALUE_TYPE
, it can avoid CopyStructFromRegisters
like you did in GetNextArgAddr
. AndfpReturnSize
looks better than returnFlags
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll avoid CopyStructFromRegisters
if a pointer to argument registers can be returned.
void* hiddenArg; | ||
UINT64 flags; | ||
// Scratch space to reconstruct struct passed in registers | ||
BYTE buffer[sizeof(ArgumentRegisters) + sizeof(FloatArgumentRegisters)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it sufficient for all arguments and return? CopyStructFromRegisters
can copy argument registers, return registers and stacks. Could you please check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, it will suffice. For return values we have a separate structure allocated by the ProfileLeaveNaked
stub. So worst case scenario would be a function with 8 or more arguments of type struct LongDouble { long; double; }
to use all argument registers for arguments needed to be reconstructed in scratch space.
I'll check the edge-cases where the last struct in registers needs to be partially on the stack, e.g. func(int i, LongDouble ld1 ... ld8)
but in that case IMO we still fit within the limit.
|
||
sd zero, 240(sp) // Clear hiddenArg. | ||
SAVE_ARGUMENT_REGISTERS sp, PROFILE_PLATFORM_SPECIFIC_DATA__argumentRegisters | ||
sd zero, PROFILE_PLATFORM_SPECIFIC_DATA__functionId(sp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SIZEOF__PROFILE_PLATFORM_SPECIFIC_DATA
is 80
but was 88
, why it was changed? And in common why it's so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean PROFILE_PLATFORM_SPECIFIC_DATA__functionId
? The hitherto PROFILE_PLATFORM_SPECIFIC_DATA
had a x8
field in front of it, which probably was a copy-paste from ARM64. We don't need it in RISC-V so I removed it and all the following fields moved up by 8 bytes.
Now all the offset constants are static asserted with the original C struct so it won't compile if there's a mismatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry, I mean PROFILE_PLATFORM_SPECIFIC_DATA__functionId
.
What the purpose of x8
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to ARM64 calling convention x8 stores the return buffer address. On RISC-V it's passed as an implicit first parameter, i.e. a0.
addi t6, zero, \flags | ||
sw t6, 248(sp) // Save flags. | ||
sw zero, 252(sp) // clear unused field. | ||
sd t6, PROFILE_PLATFORM_SPECIFIC_DATA__flags(sp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And other offsets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto my answer on PROFILE_PLATFORM_SPECIFIC_DATA__functionId
.
// t1 = functionIDOrClientID | ||
// t2 = profiledSp | ||
// t0 = functionIDOrClientID | ||
// t1 = profiledSp | ||
// t6 = throwable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't find who set arguments. You've changed register, should it be updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arguments are set in CodeGen::genProfiling(Enter|Leave)Callback
in codegenriscv64.cpp
. Yes, they should and are updated.
The reason I changed the registers is because t2 is used on RISC-V to store the call address of the stub by genEmitHelperCall
.
src/coreclr/vm/riscv64/profiler.cpp
Outdated
const double *fRegBegin = &pData->floatArgumentRegisters.f[sir->m_idxFloatReg], *fReg = fRegBegin; | ||
const double *fRegEnd = fReg + sizeof(pData->floatArgumentRegisters.f)/sizeof(pData->floatArgumentRegisters.f[0]); | ||
const INT64 *aRegBegin = &pData->argumentRegisters.a[sir->m_idxGenReg], *aReg = aRegBegin; | ||
const INT64 *aRegEnd = aReg + sizeof(pData->argumentRegisters.a)/sizeof(pData->argumentRegisters.a[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use NUM_ARGUMENT_REGISTERS
and NUM_FLOAT_ARGUMENT_REGISTERS
.
And updates ArgumentRegisters
and FloatArgumentRegisters
structures to use the definitions instead of number 8
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
src/coreclr/vm/riscv64/profiler.cpp
Outdated
const double *fRegBegin = &pData->floatArgumentRegisters.f[sir->m_idxFloatReg], *fReg = fRegBegin; | ||
const double *fRegEnd = fReg + NUM_FLOAT_ARGUMENT_REGISTERS; | ||
const INT64 *aRegBegin = &pData->argumentRegisters.a[sir->m_idxGenReg], *aReg = aRegBegin; | ||
const INT64 *aRegEnd = aReg + NUM_ARGUMENT_REGISTERS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like aRegEnd = &pData->argumentRegisters.a[0] + NUM_ARGUMENT_REGISTERS
and fRegEnd = &pData->floatArgumentRegisters.f[0] + NUM_FLOAT_ARGUMENT_REGISTERS
are correct. Could you check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, thanks!
8a229bf
to
1f46d7c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much!!!
@davmason @jakobbotsch could you please review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JIT changes look ok to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The profiler specific changes look good to me, I don't know enough about risc-v or the jit to comment on those changes.
Thank you all |
Initial implementation of ELT profiler for RISC-V. Fixes tests:
Part of #84834
cc @wscho77 @HJLeee @JongHeonChoi @t-mustafin @alpencolt @gbalykov @clamp03