Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets #76955

Merged
merged 7 commits into from
Feb 12, 2024

Conversation

Pierre-vh
Copy link
Contributor

@Pierre-vh Pierre-vh commented Jan 4, 2024

NOTE: This PR is part of a stack, please check #76954 to review the first commit!

These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities.

Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them.

This contains the documentation changes for both this change and #76954 as well.

@llvmbot llvmbot added clang Clang issues not falling into any other category lld backend:AMDGPU clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen mc Machine (object) code lld:ELF llvm:globalisel flang:driver flang Flang issues not falling into any other category llvm:support flang:fir-hlfir objectyaml llvm:binary-utilities labels Jan 4, 2024
@llvmbot
Copy link
Member

llvmbot commented Jan 4, 2024

@llvm/pr-subscribers-mc
@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-flang-driver
@llvm/pr-subscribers-flang-fir-hlfir
@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-llvm-support
@llvm/pr-subscribers-lld

@llvm/pr-subscribers-clang

Author: Pierre van Houtryve (Pierre-vh)

Changes

NOTE: This PR is part of a stack, please check #76954 to review the first commit!

These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities.

Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them.

I will add docs in a follow-up patch once review is more advanced, as I assume details will change over time.


Patch is 134.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/76955.diff

72 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+2-2)
  • (modified) clang/lib/Basic/Targets/AMDGPU.cpp (+16-4)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+3-3)
  • (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1-1)
  • (modified) clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu (+37)
  • (modified) clang/test/CodeGenCUDA/amdgpu-code-object-version.cu (+4)
  • (modified) clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu (+4)
  • (added) clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc ()
  • (modified) clang/test/Driver/amdgpu-macros.cl (+5)
  • (modified) clang/test/Driver/amdgpu-mcpu.cl (+10)
  • (modified) clang/test/Driver/hip-code-object-version.hip (+12)
  • (modified) clang/test/Driver/hip-device-libs.hip (+17-1)
  • (modified) flang/lib/Frontend/CompilerInvocation.cpp (+2)
  • (modified) flang/test/Lower/AMD/code-object-version.f90 (+2-1)
  • (modified) lld/ELF/Arch/AMDGPU.cpp (+22)
  • (modified) lld/test/ELF/amdgpu-tid.s (+16)
  • (modified) llvm/include/llvm/BinaryFormat/ELF.h (+22-1)
  • (modified) llvm/include/llvm/Object/ELFObjectFile.h (+4-2)
  • (modified) llvm/include/llvm/Support/AMDGPUMetadata.h (+5)
  • (modified) llvm/include/llvm/Support/ScopedPrinter.h (+3-1)
  • (modified) llvm/include/llvm/Target/TargetOptions.h (+1)
  • (modified) llvm/include/llvm/TargetParser/TargetParser.h (+10)
  • (modified) llvm/lib/Object/ELFObjectFile.cpp (+10)
  • (modified) llvm/lib/ObjectYAML/ELFYAML.cpp (+10)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+43-38)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+3)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp (+10)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h (+10-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPURemoveIncompatibleFunctions.cpp (+3-3)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp (+7)
  • (modified) llvm/lib/Target/AMDGPU/GCNProcessors.td (+20)
  • (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+2)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp (+53)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h (+1)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+13)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h (+15-1)
  • (modified) llvm/lib/TargetParser/TargetParser.cpp (+34)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/implicit-kernarg-backend-usage-global-isel.ll (+2)
  • (modified) llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+2)
  • (modified) llvm/test/CodeGen/AMDGPU/directive-amdgcn-target.ll (+10)
  • (modified) llvm/test/CodeGen/AMDGPU/elf-header-flags-mach.ll (+10)
  • (modified) llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll (+4)
  • (modified) llvm/test/CodeGen/AMDGPU/enable-scratch-only-dynamic-stack.ll (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/gds-allocation.ll (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/gds-atomic.ll (+1)
  • (added) llvm/test/CodeGen/AMDGPU/generic-targets-require-v6.ll (+18)
  • (modified) llvm/test/CodeGen/AMDGPU/implicit-kernarg-backend-usage.ll (+2)
  • (modified) llvm/test/CodeGen/AMDGPU/implicitarg-offset-attributes.ll (+46)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.d16.dim.ll (+3)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.dim.ll (+3)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.implicitarg.ptr.ll (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/recursion.ll (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/resource-usage-dead-function.ll (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-any.ll (+6)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-not-supported.ll (+6)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-off.ll (+6)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-on.ll (+6)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-1.ll (+6)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-2.ll (+6)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-1.ll (+6)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-2.ll (+6)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-not-supported.ll (+6)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-off.ll (+6)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-on.ll (+6)
  • (modified) llvm/test/CodeGen/AMDGPU/unsupported-image-sample.ll (+5-7)
  • (modified) llvm/test/MC/AMDGPU/hsa-v5-uses-dynamic-stack.s (+5)
  • (modified) llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml (+29)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/subtarget.ll (+20)
  • (modified) llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test (+12)
  • (modified) llvm/tools/llvm-readobj/ELFDumper.cpp (+103-123)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 2b93ddf033499c..0bfe0e7739960e 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4753,9 +4753,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group<m_Group>,
   HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
-  Values<"none,4,5">,
+  Values<"none,4,5,6">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
-  NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
+  NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>,
   MarshallingInfoEnum<TargetOpts<"CodeObjectVersion">, "COV_4">;
 
 defm cumode : SimpleMFlag<"cumode",
diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index 6f3a4908623da7..ca935174c05cec 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -275,13 +275,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts,
     Builder.defineMacro("__R600__");
 
   if (GPUKind != llvm::AMDGPU::GK_NONE) {
-    StringRef CanonName = isAMDGCN(getTriple()) ?
-      getArchNameAMDGCN(GPUKind) : getArchNameR600(GPUKind);
+    std::string CanonName = (isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind)
+                                                   : getArchNameR600(GPUKind))
+                                .str();
+
+    // Sanitize the name of generic targets.
+    // e.g. gfx10.1-generic -> gfx10_1_generic
+    if (GPUKind >= llvm::AMDGPU::GK_AMDGCN_GENERIC_FIRST &&
+        GPUKind <= llvm::AMDGPU::GK_AMDGCN_GENERIC_LAST) {
+      std::replace(CanonName.begin(), CanonName.end(), '.', '_');
+      std::replace(CanonName.begin(), CanonName.end(), '-', '_');
+    }
+
     Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
     // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
     if (isAMDGCN(getTriple())) {
-      assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name");
-      Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) +
+      assert(StringRef(CanonName).starts_with("gfx") &&
+             "Invalid amdgcn canonical name");
+      StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind);
+      Builder.defineMacro(Twine("__") + Twine(CanonFamilyName.upper()) +
                           Twine("__"));
     }
     if (isAMDGCN(getTriple())) {
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index f71dbf1729a1d6..be86731ed912ea 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -17481,9 +17481,9 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) {
 // \p Index is 0, 1, and 2 for x, y, and z dimension, respectively.
 /// Emit code based on Code Object ABI version.
 /// COV_4    : Emit code to use dispatch ptr
-/// COV_5    : Emit code to use implicitarg ptr
+/// COV_5+   : Emit code to use implicitarg ptr
 /// COV_NONE : Emit code to load a global variable "__oclc_ABI_version"
-///            and use its value for COV_4 or COV_5 approach. It is used for
+///            and use its value for COV_4 or COV_5+ approach. It is used for
 ///            compiling device libraries in an ABI-agnostic way.
 ///
 /// Note: "__oclc_ABI_version" is supposed to be emitted and intialized by
@@ -17526,7 +17526,7 @@ Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {
         Address(Result, CGF.Int16Ty, CharUnits::fromQuantity(2)));
   } else {
     Value *GEP = nullptr;
-    if (Cov == CodeObjectVersionKind::COV_5) {
+    if (Cov >= CodeObjectVersionKind::COV_5) {
       // Indexing the implicit kernarg segment.
       GEP = CGF.Builder.CreateConstGEP1_32(
           CGF.Int8Ty, EmitAMDGPUImplicitArgPtr(CGF), 12 + Index * 2);
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 2340191ca97d98..75582f6b5669d5 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -2585,7 +2585,7 @@ getAMDGPUCodeObjectArgument(const Driver &D, const llvm::opt::ArgList &Args) {
 void tools::checkAMDGPUCodeObjectVersion(const Driver &D,
                                          const llvm::opt::ArgList &Args) {
   const unsigned MinCodeObjVer = 4;
-  const unsigned MaxCodeObjVer = 5;
+  const unsigned MaxCodeObjVer = 6;
 
   if (auto *CodeObjArg = getAMDGPUCodeObjectArgument(D, Args)) {
     if (CodeObjArg->getOption().getID() ==
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu
index 663687ae227f23..d33acdf7eb8bed 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu
@@ -4,6 +4,9 @@
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
 // RUN:   -mcode-object-version=5 -DUSER -x hip -o %t_5.bc %s
 
+// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
+// RUN:   -mcode-object-version=6 -DUSER -x hip -o %t_6.bc %s
+
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
 // RUN:   -mcode-object-version=none -DDEVICELIB -x hip -o %t_0.bc %s
 
@@ -15,6 +18,10 @@
 // RUN:   %t_5.bc -mlink-builtin-bitcode %t_0.bc -o - |\
 // RUN:   FileCheck -check-prefix=LINKED5 %s
 
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -O3 \
+// RUN:   %t_6.bc -mlink-builtin-bitcode %t_0.bc -o - |\
+// RUN:   FileCheck -check-prefix=LINKED6 %s
+
 #include "Inputs/cuda.h"
 
 // LINKED4: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 400
@@ -77,6 +84,36 @@
 // LINKED5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
 // LINKED5: "amdgpu_code_object_version", i32 500
 
+// LINKED6: __oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
+// LINKED6-LABEL: bar
+// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
+// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
+// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+// LINKED6: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
+// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
+// LINKED6: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
+// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
+// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
+
+// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
+// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
+// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+// LINKED6: [[GEP_5_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 14
+// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
+// LINKED6: [[GEP_4_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 6
+// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_Y]], ptr addrspace(4) [[GEP_4_Y]]
+// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
+
+// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
+// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
+// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+// LINKED6: [[GEP_5_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
+// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
+// LINKED6: [[GEP_4_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 8
+// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_Z]], ptr addrspace(4) [[GEP_4_Z]]
+// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
+// LINKED6: "amdgpu_code_object_version", i32 600
+
 #ifdef DEVICELIB
 __device__ void bar(int *x, int *y, int *z)
 {
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index ff5deaf9ab850d..59636e622731b8 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -9,6 +9,9 @@
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -mcode-object-version=5 -o - %s | FileCheck -check-prefix=V5 %s
 
+// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
+// RUN:   -mcode-object-version=6 -o - %s | FileCheck -check-prefix=V6 %s
+
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -mcode-object-version=none -o - %s | FileCheck %s -check-prefix=NONE
 
@@ -17,5 +20,6 @@
 
 // V4: !{{.*}} = !{i32 1, !"amdgpu_code_object_version", i32 400}
 // V5: !{{.*}} = !{i32 1, !"amdgpu_code_object_version", i32 500}
+// V6: !{{.*}} = !{i32 1, !"amdgpu_code_object_version", i32 600}
 // NONE-NOT: !{{.*}} = !{i32 1, !"amdgpu_code_object_version",
 // INV: error: invalid value '4.1' in '-mcode-object-version=4.1'
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index 282e0a49b9aa10..7f56fe91704870 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -7,6 +7,10 @@
 // RUN:     -fcuda-is-device -mcode-object-version=5 -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=COV5 %s
 
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
+// RUN:     -fcuda-is-device -mcode-object-version=6 -emit-llvm -o - -x hip %s \
+// RUN:     | FileCheck -check-prefix=COV5 %s
+
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
 // RUN:     -fcuda-is-device -mcode-object-version=none -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=COVNONE %s
diff --git a/clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc b/clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc
new file mode 100644
index 00000000000000..e69de29bb2d1d6
diff --git a/clang/test/Driver/amdgpu-macros.cl b/clang/test/Driver/amdgpu-macros.cl
index 81c22af460d12d..3b10444ef71d36 100644
--- a/clang/test/Driver/amdgpu-macros.cl
+++ b/clang/test/Driver/amdgpu-macros.cl
@@ -131,6 +131,11 @@
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1200 -DFAMILY=GFX12
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1201 -DFAMILY=GFX12
 
+// RUN: %clang -E -dM -target amdgcn -mcpu=gfx9-generic %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=64 -DCPU=gfx9_generic -DFAMILY=GFX9
+// RUN: %clang -E -dM -target amdgcn -mcpu=gfx10.1-generic %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx10_1_generic -DFAMILY=GFX10
+// RUN: %clang -E -dM -target amdgcn -mcpu=gfx10.3-generic %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx10_3_generic -DFAMILY=GFX10
+// RUN: %clang -E -dM -target amdgcn -mcpu=gfx11-generic %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx11_generic -DFAMILY=GFX11
+
 // ARCH-GCN-DAG: #define FP_FAST_FMA 1
 
 // FAST_FMAF-DAG: #define FP_FAST_FMAF 1
diff --git a/clang/test/Driver/amdgpu-mcpu.cl b/clang/test/Driver/amdgpu-mcpu.cl
index eeb16ae98ebad7..6f18ea0615cb69 100644
--- a/clang/test/Driver/amdgpu-mcpu.cl
+++ b/clang/test/Driver/amdgpu-mcpu.cl
@@ -115,6 +115,11 @@
 // RUN: %clang -### -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck --check-prefix=GFX1200 %s
 // RUN: %clang -### -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck --check-prefix=GFX1201 %s
 
+// RUN: %clang -### -target amdgcn -mcpu=gfx9-generic %s 2>&1 | FileCheck --check-prefix=GFX9_GENERIC %s
+// RUN: %clang -### -target amdgcn -mcpu=gfx10.1-generic %s 2>&1 | FileCheck --check-prefix=GFX10_1_GENERIC %s
+// RUN: %clang -### -target amdgcn -mcpu=gfx10.3-generic %s 2>&1 | FileCheck --check-prefix=GFX10_3_GENERIC %s
+// RUN: %clang -### -target amdgcn -mcpu=gfx11-generic %s 2>&1 | FileCheck --check-prefix=GFX11_GENERIC %s
+
 // GCNDEFAULT-NOT: -target-cpu
 // GFX600:    "-target-cpu" "gfx600"
 // GFX601:    "-target-cpu" "gfx601"
@@ -160,3 +165,8 @@
 // GFX1151:   "-target-cpu" "gfx1151"
 // GFX1200:   "-target-cpu" "gfx1200"
 // GFX1201:   "-target-cpu" "gfx1201"
+
+// GFX9_GENERIC:      "-target-cpu" "gfx9-generic"
+// GFX10_1_GENERIC:   "-target-cpu" "gfx10.1-generic"
+// GFX10_3_GENERIC:   "-target-cpu" "gfx10.3-generic"
+// GFX11_GENERIC:     "-target-cpu" "gfx11-generic"
diff --git a/clang/test/Driver/hip-code-object-version.hip b/clang/test/Driver/hip-code-object-version.hip
index af5f9a3da21dfd..d63130115588e0 100644
--- a/clang/test/Driver/hip-code-object-version.hip
+++ b/clang/test/Driver/hip-code-object-version.hip
@@ -23,6 +23,18 @@
 // V5: "-mllvm" "--amdhsa-code-object-version=5"
 // V5: "-targets=host-x86_64-unknown-linux,hipv4-amdgcn-amd-amdhsa--gfx906"
 
+// Check bundle ID for code object version 6.
+
+// RUN: not %clang -### --target=x86_64-linux-gnu \
+// RUN:   -mcode-object-version=6 \
+// RUN:   --offload-arch=gfx906 --rocm-path=%S/Inputs/rocm \
+// RUN:   %s 2>&1 | FileCheck -check-prefix=V6 %s
+
+// V6: "-mcode-object-version=6"
+// V6: "-mllvm" "--amdhsa-code-object-version=6"
+// V6: "-targets=host-x86_64-unknown-linux,hipv4-amdgcn-amd-amdhsa--gfx906"
+
+
 // Check bundle ID for code object version default
 
 // RUN: %clang -### --target=x86_64-linux-gnu \
diff --git a/clang/test/Driver/hip-device-libs.hip b/clang/test/Driver/hip-device-libs.hip
index 6ac5778721ba5b..a998db531d6683 100644
--- a/clang/test/Driver/hip-device-libs.hip
+++ b/clang/test/Driver/hip-device-libs.hip
@@ -187,13 +187,26 @@
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
 // RUN: 2>&1 | FileCheck %s --check-prefixes=ABI5
 
-// Test -mcode-object-version=5 with old device library without abi_version_400.bc
+// Test -mcode-object-version=5 with old device library without abi_version_500.bc
 // RUN: not %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
 // RUN:   -mcode-object-version=5 \
 // RUN:   --hip-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode-no-abi-ver   \
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
 // RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI5
 
+// Test -mcode-object-version=6
+// RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
+// RUN:   -mcode-object-version=6 \
+// RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
+// RUN: 2>&1 | FileCheck %s --check-prefixes=ABI6
+
+// Test -mcode-object-version=6 with old device library without abi_version_600.bc
+// RUN: not %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
+// RUN:   -mcode-object-version=6 \
+// RUN:   --hip-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode-no-abi-ver   \
+// RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
+// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI6
+
 // ALL-NOT: error:
 // ALL: {{"[^"]*clang[^"]*"}}
 
@@ -237,7 +250,10 @@
 // ABI4: "-mlink-builtin-bitcode" "{{.*}}oclc_abi_version_400.bc"
 // ABI5-NOT: error:
 // ABI5: "-mlink-builtin-bitcode" "{{.*}}oclc_abi_version_500.bc"
+// ABI6-NOT: error:
+// ABI6: "-mlink-builtin-bitcode" "{{.*}}oclc_abi_version_600.bc"
 // NOABI4-NOT: error:
 // NOABI4-NOT: "-mlink-builtin-bitcode" "{{.*}}oclc_abi_version_400.bc"
 // NOABI4-NOT: "-mlink-builtin-bitcode" "{{.*}}oclc_abi_version_500.bc"
 // NOABI5: error: cannot find ROCm device libraryfor ABI version 5; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
+// NOABI6: error: cannot find ROCm device libraryfor ABI version 6; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index b65b6e31bea821..cf4b2a38bff7a8 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -284,6 +284,8 @@ static void parseCodeGenArgs(Fortran::frontend::CodeGenOptions &opts,
   if (const llvm::opt::Arg *a = args.getLastArg(
           clang::driver::options::OPT_mcode_object_version_EQ)) {
     llvm::StringRef s = a->getValue();
+    if (s == "6")
+      opts.CodeObjectVersion = llvm::CodeObjectVersionKind::COV_6;
     if (s == "5")
       opts.CodeObjectVersion = llvm::CodeObjectVersionKind::COV_5;
     if (s == "4")
diff --git a/flang/test/Lower/AMD/code-object-version.f90 b/flang/test/Lower/AMD/code-object-version.f90
index 7cb9dc079724e7..455f4547252829 100644
--- a/flang/test/Lower/AMD/code-object-version.f90
+++ b/flang/test/Lower/AMD/code-object-version.f90
@@ -3,11 +3,12 @@
 !RUN: %flang_fc1 -emit-hlfir -triple amdgcn-amd-amdhsa -target-cpu gfx908 -mcode-object-version=none %s -o - | FileCheck  --check-prefix=COV_NONE %s
 !RUN: %flang_fc1 -emit-hlfir -triple amdgcn-amd-amdhsa -target-cpu gfx908 -mcode-object-version=4 %s -o - | FileCheck  --check-prefix=COV_4 %s
 !RUN: %flang_fc1 -emit-hlfir -triple amdgcn-amd-amdhsa -target-cpu gfx908 -mcode-object-version=5 %s -o - | FileCheck  --check-prefix=COV_5 %s
+!RUN: %flang_fc1 -emit-hlfir -triple amdgcn-amd-amdhsa -target-cpu gfx908 -mcode-object-version=6 %s -o - | FileCheck  --check-prefix=COV_6 %s
 
 !COV_DEFAULT: llvm.mlir.global weak_odr hidden local_unnamed_addr constant @__oclc_ABI_version(400 : i32) {addr_space = 4 : i32} : i32
 !COV_NONE-NOT: llvm.mlir.global weak_odr hidden local_unnamed_addr constant @__oclc_ABI_version(400 : i32) {addr_space = 4 : i32} : i32
 !COV_4: llvm.mlir.global weak_odr hidden local_unnamed_addr constant @__oclc_ABI_version(400 : i32) {addr_space = 4 : i32} : i32
 !COV_5: llvm.mlir.global weak_odr hidden local_unnamed_addr constant @__oclc_ABI_version(500 : i32) {addr_space = 4 : i32} : i32
+!COV_6: llvm.mlir.global weak_odr hidden local_unnamed_addr constant @__oclc_ABI_version(600 : i32) {addr_space = 4 : i32} : i32
 subroutine target_simple
 end subroutine target_simple
-
diff --git a/lld/ELF/Arch/AMDGPU.cpp b/lld/ELF/Arch/AMDGPU.cpp
index 650744db7dee32..bc1e78cfcc963d 100644
--- a/lld/ELF/Arch/AMDGPU.cpp
+++ b/lld/ELF/Arch/AMDGPU.cpp
@@ -25,6 +25,7 @@ class AMDGPU final : public TargetInfo {
 private:
   uint32_t calcEFlagsV3() const;
   uint32_t calcEFlagsV4() const;
+  uint32_t calcEFlagsV6() const;
 
 public:
   AMDGPU();
@@ -106,6 +107,25 @@ uint32_t AMDGPU::calcEFlagsV4() const {
   return retMach | retXnack | retSramEcc;
 }
 
+uint32_t AMDGPU::calcEFlagsV6() const {
+  uint32_t flags = calcEFlagsV4();
+
+  uint32_t genericVersion =
+      getEFlags(ctx.objectFiles[0]) & EF_AMDGPU_GENERIC_VERSION;
+
+  // Verify that all input files have compatible generic version.
+  for (InputFile *f : ArrayRef(ctx.objectFiles).slice(1)) {
+    if (genericVersion != (getEFlags(f) & EF_AMDGPU_GENERIC_VERSION)) {
+      // TODO: test
+      error("incompatible generic version: " + toString(f));
+      return 0;
+    }
+  }
+
+  flags |= genericVersion;
+  return flags;
+}
+
 uint32_t AMDGPU::calcEFlags() const {
   if (ctx.objectFiles.empty())
     return 0;
@@ -121,6 +141,8 @@ uint32_t AMDGPU::calcEFlags() const {
   case ELFABIVERSION_AMDGPU_HSA_V4:
   case ELFABIVERSION_AMDGPU_HSA_V5:
     return calcEFlagsV4();
+  case ELFABIVERSION_AMDGPU_HSA_V6:
+    return calcEFlagsV6();
   default:
     error("unknown abi version: " + Twine(abiVersion));
     return 0;
diff --git a/lld/test/ELF/amdgpu-tid.s b/lld/test/ELF/amdgpu-tid.s
index 6623443a4541d7..ee0062eb750c86 100644
--- a/lld/test/ELF/amdgpu-tid.s
+++ b/lld/test/ELF/amdgpu-tid.s
@@ -43,3 +43,19 @@
 # SRAMECC-OFF:          EF_AMDGPU_FEATURE_SRAMECC_OFF_V4 (0x800)
 # SRAMECC-ON:           EF_AMDGPU_FEATURE_SRAME...
[truncated]

@Pierre-vh
Copy link
Contributor Author

Note: testing is a bit light for now, I'd like to add more tests, but I'm not sure what kind of tests are worth adding.
I could just add a generic target run line wherever gfx9/10/11 run lines are present, but that seems a bit overkill? I'd need to change half the tests we have or more.

Copy link
Contributor Author

@Pierre-vh Pierre-vh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing components:

  • Need a way for external tools to inquire about the specifics of generic targets (without depending on llvm)
    • map a specific gfx target to its generic family
    • given a specific gfx version, what's the minimum generic version it needs (?)
    • tools need to tell generic MACHs from specific ones (currently they can do that by just doing EFLAGS >> 24 and checking if there is any value other than 0 there, but it needs to be documented)
  • Need to double check generic version system

llvm/include/llvm/BinaryFormat/ELF.h Outdated Show resolved Hide resolved
llvm/include/llvm/BinaryFormat/ELF.h Outdated Show resolved Hide resolved
FeatureImageGather4D16Bug])>;
!listconcat(FeatureISAVersion9_0_Consumer_Common.Features,
[FeatureMadMixInsts,
FeatureImageInsts])>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: remove ImageInsts, it's already included

llvm/docs/AMDGPUUsage.rst Show resolved Hide resolved
llvm/docs/AMDGPUUsage.rst Outdated Show resolved Hide resolved
@Pierre-vh
Copy link
Contributor Author

@arsenm Hi, can you take a look - especially on the testing? I don't know if this is tested well enough

clang/lib/Basic/Targets/AMDGPU.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp Outdated Show resolved Hide resolved
StringRef GPUName = ST->getCPU();
if (GPUName.empty() || GPUName.contains("generic"))
if (GPUName.empty() || GPUName.starts_with("generic"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably start refusing to codegen "generic" or whatever none ends up getting called

llvm/docs/AMDGPUUsage.rst Outdated Show resolved Hide resolved
Copy link

github-actions bot commented Feb 6, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm/docs/AMDGPUUsage.rst Outdated Show resolved Hide resolved
================= ============== ========= =======================================
String Key Value Type Required? Description
================= ============== ========= =======================================
"amdhsa.version" sequence of Required - The first integer is the major
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what metadata changes would be needed to support generic code objects. I would not add this section.

llvm/docs/AMDGPUUsage.rst Outdated Show resolved Hide resolved
Generic processors are only available on code object V6 and above (see :ref:`amdgpu-elf-code-object`).

Generic processor code objects are versioned (see :ref:`amdgpu-elf-header-e_flags-table-v6-onwards`).
The version number is used by runtimes to determine if a code object can be run on a specific agent.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not really explain how version is used. What about something like:

The version of non-generic code objects is always set to 0.

For a generic code object, adding a new generic member may require the code generated for the generic target to be changed so it can continue to execute on the previous members as well as on the new member. When this happens the generic code object version number is incremented. Each member of the generic target has a version when it was introduced. A generic code object can execute on a specific member if the version of the code object being loaded is >= the version at which the member was introduced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rephrased it a bit (e.g. member -> supported processor) but I mostly followed your suggestion

@Pierre-vh
Copy link
Contributor Author

For the MD changes, it's just to describe the version increment, nothing else. I think describing is important as the V6 diff already updated the amdhsa.version.
If amdhsa.version didn't need to change then i need to fix that first, and then we can remove the V6 MD section

@Pierre-vh Pierre-vh requested review from t-tye and arsenm February 7, 2024 08:26
Copy link
Collaborator

@t-tye t-tye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation LGTM. Thanks.

@Pierre-vh
Copy link
Contributor Author

@t-tye Can you please approve then? Otherwise the diff still shows a red "Changes requested" warning :) Thanks
@arsenm Please also approve if there are no more comments

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs codegen tests for the gfx900 vs. gfx906 mad_mix/fma_fix issue

These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities.
Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them.

No docs in this patch either as I plan to do it all in a follow-up patch.
@Pierre-vh
Copy link
Contributor Author

mad_mix

I added run lines to mad-mix.ll and it behaves as expected: no fma/mad_mix emitted

@Pierre-vh Pierre-vh requested a review from arsenm February 9, 2024 09:46
@Pierre-vh Pierre-vh merged commit f93aa51 into llvm:main Feb 12, 2024
5 checks passed
@Pierre-vh Pierre-vh deleted the generictargets branch February 12, 2024 09:18
epilk added a commit that referenced this pull request Apr 25, 2024
It seems like this happened because #79460 moved this from
`FeatureISAVersion11_Common` to `FeatureISAVersion11_0_Common` while
#76955 was being reviewed.
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Jul 30, 2024
Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same
as V5 except a new "generic version" flag can be present in EFLAGS. This
is related to new generic targets that'll be added in a follow-up patch.
It's also likely V6 will have new changes (possibly new metadata
entries) added later.

Docs change are part of the follow-up patch llvm#76955

Change-Id: I6c4311bf124de1455eb3e5eecb2df7f98deb71bb
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Jul 30, 2024
These generic targets include multiple GPUs and will, in the future,
provide a way to build once and run on multiple GPU, at the cost of less
optimization opportunities.

Note that this is just doing the compiler side of things, device libs an
runtimes/loader/etc. don't know about these targets yet, so none of them
actually work in practice right now. This is just the initial commit to
make LLVM aware of them.

This contains the documentation changes for both this change and llvm#76954
as well.

Change-Id: I0c99e616adddaae5e8ce4ad7283eabdbf64250c3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang:codegen clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category flang:driver flang:fir-hlfir flang Flang issues not falling into any other category lld:ELF lld llvm:binary-utilities llvm:globalisel llvm:support mc Machine (object) code objectyaml
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants