Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISC-V] ELT Profiler Bring-Up #91313

Merged
merged 12 commits into from
Sep 5, 2023
Merged

[RISC-V] ELT Profiler Bring-Up #91313

merged 12 commits into from
Sep 5, 2023

Conversation

tomeksowi
Copy link
Contributor

Initial implementation of ELT profiler for RISC-V. Fixes tests:

profiler/elt/slowpatheltenter/slowpatheltenter.sh
profiler/elt/slowpatheltleave/slowpatheltleave.sh
profiler/unittest/inlining/inlining.sh

Part of #84834
cc @wscho77 @HJLeee @JongHeonChoi @t-mustafin @alpencolt @gbalykov @clamp03

Initial implementation based on ARM64 code.
* Fix argument registers according to RISC-V calling convention
* Fix field offsets for PROFILE_PLATFORM_SPECIFIC_DATA
* Make sure field offsets for PROFILE_PLATFORM_SPECIFIC_DATA stay fixed by static asserting the offsets in asmconstants.h
…in t0 and t1 because t2 is used to store the call address of the stub
…n see whole struct arguments laid out in memory
Since the RISC-V ABI says values are returned like the first named argument, re-use the struct copying routine from argument parsing as much as possible.
* Remove unused t0 field
* Remove 'unused' field and widen 'flags' to 64 bits to maintain alignment and shave off one sw instruction
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Aug 30, 2023
@ghost
Copy link

ghost commented Aug 30, 2023

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Issue Details

Initial implementation of ELT profiler for RISC-V. Fixes tests:

profiler/elt/slowpatheltenter/slowpatheltenter.sh
profiler/elt/slowpatheltleave/slowpatheltleave.sh
profiler/unittest/inlining/inlining.sh

Part of #84834
cc @wscho77 @HJLeee @JongHeonChoi @t-mustafin @alpencolt @gbalykov @clamp03

Author: tomeksowi
Assignees: -
Labels:

area-Diagnostics-coreclr, community-contribution

Milestone: -

@clamp03 clamp03 added the arch-riscv Related to the RISC-V architecture label Aug 30, 2023
@tomeksowi
Copy link
Contributor Author

@jkotas @jakobbotsch PR ready for review.

// Profiling
//**********************************************************************

#ifdef PROFILING_SUPPORTED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it is better to put to original vm/riscv64/profiler.cpp back for consistency with other platforms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting it back in profiler.cpp also means we're back to hardcoded field offsets in asm stubs, which are easy to get wrong. Using constants pinned to the actual C struct by static asserts in asmconstants.h is a net improvement, even though the code differs somewhat from other platforms.

return (LPVOID)pData->argumentRegisters.a[0];
}
// On RISC-V the method is not required to preserve the return buffer address passed in a0.
// However, JIT does that anyway if leave hook needs to be generated.
Copy link
Member

@clamp03 clamp03 Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put the assert _ASSERTE((pData->flags & PROFILE_LEAVE) != 0); if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

if (compiler->compProfilerMethHndIndirected)
{
instGen_Set_Reg_To_Imm(EA_PTR_DSP_RELOC, REG_PROFILER_ENTER_ARG_FUNC_ID, methHnd);
GetEmitter()->emitIns_R_R(INS_ld, EA_PTRSIZE, REG_PROFILER_ENTER_ARG_FUNC_ID, REG_PROFILER_ENTER_ARG_FUNC_ID);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In RISC-V, emitIns_R_R_I is used for INS_ld

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will do.

if (compiler->compProfilerMethHndIndirected)
{
instGen_Set_Reg_To_Imm(EA_PTR_DSP_RELOC, REG_PROFILER_LEAVE_ARG_FUNC_ID, methHnd);
GetEmitter()->emitIns_R_R(INS_ld, EA_PTRSIZE, REG_PROFILER_LEAVE_ARG_FUNC_ID, REG_PROFILER_LEAVE_ARG_FUNC_ID);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace to emitIns_R_R_I

#undef ASMCONSTANTS_C_ASSERT_OFFSET

#endif // PROFILING_SUPPORTED

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not sync with other plaforms. IMO, I think it is better to update with all other platforms in this case. @jkotas

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like in the other comment on cgencpu.h, IMO having the field offsets pinned to the C struct with static asserts is a net improvement over hardcoded offsets in asm.

I wanted to limit the impact of the change to RISC-V since this PR is large enough. But I can do the same for other platforms if need be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I understand your concerns. However, I prefer that the same structures and functions should be in the same file for all platforms as much as possible. So I want you to refactoring all the other platforms in another PR. (or you can focus bug-fix only in this PR. Then updates all (including RISC-V) in another PR.) Thank you.
@jkotas Could you give any comment about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it would be nice to use named constants that are validated at compile time, across all platforms. This cleanup can be done in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll do an analogous cleanup for PROFILE_PLATFORM_SPECIFIC_DATA in a separate PR.

PROFILE_PLATFORM_SPECIFIC_DATA* pData = reinterpret_cast<PROFILE_PLATFORM_SPECIFIC_DATA*>(m_handle);

struct { bool isFloat, is8; } fields[] = {
{ sir->m_structFields & (STRUCT_FLOAT_FIELD_FIRST | STRUCT_FLOAT_FIELD_ONLY_TWO | STRUCT_FLOAT_FIELD_ONLY_ONE),
Copy link
Member

@clamp03 clamp03 Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my guess, fields[0] is for the first one in the field. If then, I think STRUCT_FLOAT_FIELD_ONLY_ONE can be set when only second field is FLOAT in GetRiscv64PassStructInRegisterFlags. (I don't know it can actually happen. I just searched.) Could you check again?

int size2 = GetRiscv64PassStructInRegisterFlags((CORINFO_CLASS_HANDLE)pMethodTable);
if ((size2 & STRUCT_FLOAT_FIELD_ONLY_ONE) != 0)
{
if (pFieldSecond[0].GetSize() == 8)
{
size = size & STRUCT_FLOAT_FIELD_FIRST ? (size ^ STRUCT_MERGE_FIRST_SECOND_8) : (size | STRUCT_SECOND_FIELD_DOUBLE);
}
else
{
size = size & STRUCT_FLOAT_FIELD_FIRST ? (size ^ STRUCT_MERGE_FIRST_SECOND) : (size | STRUCT_FLOAT_FIELD_SECOND);
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It passes the MixedStructFunc test case from slowpathcommon.cs and it takes a struct with second float field only. But I'll check if it ever happens.

m_bufferPos = alignedTo8;
const INT64* src =
inFloatReg ? (const INT64*)fReg++ :
inGenReg ? aReg++ : (const INT64*)Func::postIncrement(stack, 8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When field[i].isFloat is true and fReg < fRegEnd is false, src can be a aReg which is an integer register. Please check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's intentional to cover this case from RISC-V ABI, 2.2 Hardware Floating-point Calling Convention:

A real floating-point argument is passed in a floating-point argument register if it is no more than
ABI_FLEN bits wide and at least one floating-point argument register is available. Otherwise, it is
passed according to the integer calling convention.

sir.m_byteStackIndex = 0;
sir.m_byteStackSize = -1;
sir.m_structFields = returnFlags;
return CopyStructFromRegisters(&sir);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in most cases except VALUE_TYPE, it can avoid CopyStructFromRegisters like you did in GetNextArgAddr. AndfpReturnSize looks better than returnFlags.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll avoid CopyStructFromRegisters if a pointer to argument registers can be returned.

void* hiddenArg;
UINT64 flags;
// Scratch space to reconstruct struct passed in registers
BYTE buffer[sizeof(ArgumentRegisters) + sizeof(FloatArgumentRegisters)];
Copy link
Member

@clamp03 clamp03 Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it sufficient for all arguments and return? CopyStructFromRegisters can copy argument registers, return registers and stacks. Could you please check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it will suffice. For return values we have a separate structure allocated by the ProfileLeaveNaked stub. So worst case scenario would be a function with 8 or more arguments of type struct LongDouble { long; double; } to use all argument registers for arguments needed to be reconstructed in scratch space.

I'll check the edge-cases where the last struct in registers needs to be partially on the stack, e.g. func(int i, LongDouble ld1 ... ld8) but in that case IMO we still fit within the limit.


sd zero, 240(sp) // Clear hiddenArg.
SAVE_ARGUMENT_REGISTERS sp, PROFILE_PLATFORM_SPECIFIC_DATA__argumentRegisters
sd zero, PROFILE_PLATFORM_SPECIFIC_DATA__functionId(sp)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIZEOF__PROFILE_PLATFORM_SPECIFIC_DATA is 80 but was 88, why it was changed? And in common why it's so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean PROFILE_PLATFORM_SPECIFIC_DATA__functionId? The hitherto PROFILE_PLATFORM_SPECIFIC_DATA had a x8 field in front of it, which probably was a copy-paste from ARM64. We don't need it in RISC-V so I removed it and all the following fields moved up by 8 bytes.

Now all the offset constants are static asserted with the original C struct so it won't compile if there's a mismatch.

Copy link

@alpencolt alpencolt Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, I mean PROFILE_PLATFORM_SPECIFIC_DATA__functionId.
What the purpose of x8?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to ARM64 calling convention x8 stores the return buffer address. On RISC-V it's passed as an implicit first parameter, i.e. a0.

addi t6, zero, \flags
sw t6, 248(sp) // Save flags.
sw zero, 252(sp) // clear unused field.
sd t6, PROFILE_PLATFORM_SPECIFIC_DATA__flags(sp)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And other offsets

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto my answer on PROFILE_PLATFORM_SPECIFIC_DATA__functionId.

// t1 = functionIDOrClientID
// t2 = profiledSp
// t0 = functionIDOrClientID
// t1 = profiledSp
// t6 = throwable

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't find who set arguments. You've changed register, should it be updated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The arguments are set in CodeGen::genProfiling(Enter|Leave)Callback in codegenriscv64.cpp. Yes, they should and are updated.

The reason I changed the registers is because t2 is used on RISC-V to store the call address of the stub by genEmitHelperCall.

const double *fRegBegin = &pData->floatArgumentRegisters.f[sir->m_idxFloatReg], *fReg = fRegBegin;
const double *fRegEnd = fReg + sizeof(pData->floatArgumentRegisters.f)/sizeof(pData->floatArgumentRegisters.f[0]);
const INT64 *aRegBegin = &pData->argumentRegisters.a[sir->m_idxGenReg], *aReg = aRegBegin;
const INT64 *aRegEnd = aReg + sizeof(pData->argumentRegisters.a)/sizeof(pData->argumentRegisters.a[0]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use NUM_ARGUMENT_REGISTERS and NUM_FLOAT_ARGUMENT_REGISTERS.
And updates ArgumentRegisters and FloatArgumentRegisters structures to use the definitions instead of number 8.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@tomeksowi tomeksowi requested a review from clamp03 September 1, 2023 08:39
const double *fRegBegin = &pData->floatArgumentRegisters.f[sir->m_idxFloatReg], *fReg = fRegBegin;
const double *fRegEnd = fReg + NUM_FLOAT_ARGUMENT_REGISTERS;
const INT64 *aRegBegin = &pData->argumentRegisters.a[sir->m_idxGenReg], *aReg = aRegBegin;
const INT64 *aRegEnd = aReg + NUM_ARGUMENT_REGISTERS;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like aRegEnd = &pData->argumentRegisters.a[0] + NUM_ARGUMENT_REGISTERS and fRegEnd = &pData->floatArgumentRegisters.f[0] + NUM_FLOAT_ARGUMENT_REGISTERS are correct. Could you check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, thanks!

Copy link
Member

@clamp03 clamp03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much!!!

@tomeksowi
Copy link
Contributor Author

@davmason @jakobbotsch could you please review?

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JIT changes look ok to me.

Copy link
Member

@davmason davmason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The profiler specific changes look good to me, I don't know enough about risc-v or the jit to comment on those changes.

@jkotas jkotas merged commit 913a844 into dotnet:main Sep 5, 2023
@jkotas
Copy link
Member

jkotas commented Sep 5, 2023

Thank you all

@ghost ghost locked as resolved and limited conversation to collaborators Oct 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-riscv Related to the RISC-V architecture area-Diagnostics-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants