
[CIR][CUDA] Skeleton of NVPTX target lowering info #1358

Merged 1 commit into llvm:main on Feb 22, 2025

Conversation

AdUhTkJm (Contributor)

Added a skeleton of NVPTX target lowering info.

This enables lowering of simple.cu (as it hardly tests device-side functionality), so an LLVM IR test has also been added for it.

@AdUhTkJm (Contributor, Author)

Just noticed that it should also emit the registration function when emitting LLVM IR, so it's probably not ready to be merged yet; I'll update it later.

@bcardosolopes (Member)

no problem @AdUhTkJm, let me know when it's ready!

@AdUhTkJm (Contributor, Author) commented Feb 19, 2025

> no problem @AdUhTkJm, let me know when it's ready!

It seems generating that registration function is much harder than I had thought. So I plan to block LLVM IR generation for this PR, and gradually set up a few more PRs to do lowering -- otherwise this one might become really huge.

For more details:

I'm planning to generate the registration in LoweringPrepare. The first argument of __cudaRegisterFunction is a pointer to the fat binary handle. We need to obtain it from an option in CodeGenOpts, which we don't have access to at that stage. So I think I'll add an attribute to the ModuleOp, recording the value of that option.
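
A minimal sketch of that idea, using hypothetical names (the attribute key "cir.cuda_gpu_binary", the helper functions, and the exact CodeGenOpts field are placeholders, not what the PR actually uses): record the value as a module attribute while CIRGen still has access to CodeGenOpts, then read it back in LoweringPrepare.

#include "llvm/ADT/StringRef.h"
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/BuiltinOps.h"

// CIRGen side (CodeGenOpts is in scope here): stash the option's value on the
// ModuleOp under a placeholder attribute name.
void recordCUDAGpuBinaryName(mlir::ModuleOp module,
                             llvm::StringRef gpuBinaryName) {
  module->setAttr("cir.cuda_gpu_binary",
                  mlir::StringAttr::get(module.getContext(), gpuBinaryName));
}

// LoweringPrepare side (no CodeGenOpts here): read the recorded value back.
llvm::StringRef getCUDAGpuBinaryName(mlir::ModuleOp module) {
  if (auto attr =
          module->getAttrOfType<mlir::StringAttr>("cir.cuda_gpu_binary"))
    return attr.getValue();
  return {};
}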

Another problem is that we'll need to create a runtime function, which is hard without knowing which type size_t is (that information also lives in codegen). The same issue shows up in other parts of this pass as well; there is a comment in lowerArrayDtorCtorIntoLoop saying:

// TODO: instead of fixed integer size, create alias for PtrDiffTy and unify
// with CIRGen stuff.
auto ptrDiffTy =
      cir::IntType::get(builder.getContext(), 64, /*signed=*/false);

Perhaps just create an unsigned 64-bit integer for size_t for now?

@bcardosolopes (Member)

> and gradually set up a few more PRs to do lowering -- otherwise this one might become really huge.

Sounds great to me!

> I'm planning to generate the registration in LoweringPrepare. The first argument of __cudaRegisterFunction is a pointer to the fat binary handle. We need to obtain it from an option in CodeGenOpts, which we don't have access to at that stage. So I think I'll add an attribute to the ModuleOp, recording the value of that option.

Sure! Since you are touching this, would you mind migrating the content of CIRDataLayout to the module as well in a follow-up PR? See below.

> Perhaps just create an unsigned 64-bit integer for size_t for now?

Ideally we want to share the content of CIRGenTypeCache with LoweringPrepare; it should live in ModuleOp, like the one above. For this PR: you can create a new method in CIRDataLayout named getPtrDiffTy() and make it return the code you pasted above. While at it, update the existing example above to use getPtrDiffTy() as well.
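
A sketch of what that helper could look like, assuming it receives the MLIRContext as a parameter (the real signature and its exact placement inside CIRDataLayout may differ); the body is just the snippet quoted above, centralized in one place:

// In CIRDataLayout (placement and signature are assumptions):
cir::IntType getPtrDiffTy(mlir::MLIRContext *ctx) const {
  // TODO: derive width/signedness from the data layout instead of
  // hard-coding an unsigned 64-bit integer, and unify with CIRGen.
  return cir::IntType::get(ctx, 64, /*signed=*/false);
}

Call sites in LoweringPrepare would then do something like auto ptrDiffTy = dataLayout.getPtrDiffTy(builder.getContext()); instead of constructing the type inline.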

@AdUhTkJm (Contributor, Author)

> Ideally we want to share the content of CIRGenTypeCache with LoweringPrepare; it should live in ModuleOp, like the one above. For this PR: you can create a new method in CIRDataLayout named getPtrDiffTy() and make it return the code you pasted above. While at it, update the existing example above to use getPtrDiffTy() as well.

Yeah, I'll do that. By the way, this PR is ready for review now.

@@ -260,6 +260,11 @@ class CIRGenConsumer : public clang::ASTConsumer {
}
}

if (action != CIRGenAction::OutputType::EmitCIR) {
  if (C.getLangOpts().CUDA && !C.getLangOpts().CUDAIsDevice)
@bcardosolopes (Member)

I'd rather not have target-specific restrictions in this file. What are you hitting that calls for this? If it's because something cannot lower to LLVM, that's fine; that is already the status quo for CUDA.

@AdUhTkJm (Contributor, Author)

I want to prevent CIR from lowering into LLVM. Currently the lowering pass just goes smoothly all the way to LLVM, but the result is wrong: it lacks the ptx_kernel calling convention and the registration function.

I tried to put that in LoweringPrepare, but it prevents -emit-cir from generating CIR. Where else should I put it?

@bcardosolopes (Member) commented Feb 20, 2025

Because the CUDA support didn't have LLVM lowering right from the beginning, it's fine that we are in this intermediate state, which will soon be fixed by your incremental work. (For future PRs: whenever you add something in CIR, you should also add LLVM support in the same PR; this will prevent these issues.)

I'd say don't put this anywhere; instead, add LLVM output to the tests where it makes sense and annotate them with the COM directive. That will indicate the current status and give you an easy diff to patch once the callconv work lands.
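
For context, COM: is FileCheck's comment directive, so annotated lines are ignored by FileCheck while still documenting intent. A hypothetical illustration of such an annotation (the LLVM prefix, kernel pattern, and IR lines are placeholders, not taken from simple.cu):

// COM: The LLVM lines below record what we expect once the calling-convention
// COM: and registration-function work lands; FileCheck treats COM: lines as
// COM: comments, so nothing is asserted yet and the diff to enable them later
// COM: stays small.
// COM: LLVM: define ptx_kernel void @{{.*}}
// COM: LLVM: call {{.*}} @__cudaRegisterFunction(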

bcardosolopes merged commit cc67bf7 into llvm:main on Feb 22, 2025
6 checks passed