[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets #76955

Pierre-vh · 2024-01-04T14:01:13Z

NOTE: This PR is part of a stack, please check #76954 to review the first commit!

These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities.

Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them.

This contains the documentation changes for both this change and #76954 as well.

llvmbot · 2024-01-04T14:01:42Z

@llvm/pr-subscribers-mc
@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-flang-driver
@llvm/pr-subscribers-flang-fir-hlfir
@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-llvm-support
@llvm/pr-subscribers-lld

@llvm/pr-subscribers-clang

Author: Pierre van Houtryve (Pierre-vh)

Changes

NOTE: This PR is part of a stack, please check #76954 to review the first commit!

These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities.

Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them.

I will add docs in a follow-up patch once review is more advanced, as I assume details will change over time.

Patch is 134.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/76955.diff

72 Files Affected:

(modified) clang/include/clang/Driver/Options.td (+2-2)
(modified) clang/lib/Basic/Targets/AMDGPU.cpp (+16-4)
(modified) clang/lib/CodeGen/CGBuiltin.cpp (+3-3)
(modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1-1)
(modified) clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu (+37)
(modified) clang/test/CodeGenCUDA/amdgpu-code-object-version.cu (+4)
(modified) clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu (+4)
(added) clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc ()
(modified) clang/test/Driver/amdgpu-macros.cl (+5)
(modified) clang/test/Driver/amdgpu-mcpu.cl (+10)
(modified) clang/test/Driver/hip-code-object-version.hip (+12)
(modified) clang/test/Driver/hip-device-libs.hip (+17-1)
(modified) flang/lib/Frontend/CompilerInvocation.cpp (+2)
(modified) flang/test/Lower/AMD/code-object-version.f90 (+2-1)
(modified) lld/ELF/Arch/AMDGPU.cpp (+22)
(modified) lld/test/ELF/amdgpu-tid.s (+16)
(modified) llvm/include/llvm/BinaryFormat/ELF.h (+22-1)
(modified) llvm/include/llvm/Object/ELFObjectFile.h (+4-2)
(modified) llvm/include/llvm/Support/AMDGPUMetadata.h (+5)
(modified) llvm/include/llvm/Support/ScopedPrinter.h (+3-1)
(modified) llvm/include/llvm/Target/TargetOptions.h (+1)
(modified) llvm/include/llvm/TargetParser/TargetParser.h (+10)
(modified) llvm/lib/Object/ELFObjectFile.cpp (+10)
(modified) llvm/lib/ObjectYAML/ELFYAML.cpp (+10)
(modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+43-38)
(modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+3)
(modified) llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp (+10)
(modified) llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h (+10-1)
(modified) llvm/lib/Target/AMDGPU/AMDGPURemoveIncompatibleFunctions.cpp (+3-3)
(modified) llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp (+7)
(modified) llvm/lib/Target/AMDGPU/GCNProcessors.td (+20)
(modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+2)
(modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp (+53)
(modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h (+1)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+13)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h (+15-1)
(modified) llvm/lib/TargetParser/TargetParser.cpp (+34)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/implicit-kernarg-backend-usage-global-isel.ll (+2)
(modified) llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll (+1)
(modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+2)
(modified) llvm/test/CodeGen/AMDGPU/directive-amdgcn-target.ll (+10)
(modified) llvm/test/CodeGen/AMDGPU/elf-header-flags-mach.ll (+10)
(modified) llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll (+4)
(modified) llvm/test/CodeGen/AMDGPU/enable-scratch-only-dynamic-stack.ll (+1)
(modified) llvm/test/CodeGen/AMDGPU/gds-allocation.ll (+1)
(modified) llvm/test/CodeGen/AMDGPU/gds-atomic.ll (+1)
(added) llvm/test/CodeGen/AMDGPU/generic-targets-require-v6.ll (+18)
(modified) llvm/test/CodeGen/AMDGPU/implicit-kernarg-backend-usage.ll (+2)
(modified) llvm/test/CodeGen/AMDGPU/implicitarg-offset-attributes.ll (+46)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.d16.dim.ll (+3)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.dim.ll (+3)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.implicitarg.ptr.ll (+1)
(modified) llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll (+1)
(modified) llvm/test/CodeGen/AMDGPU/recursion.ll (+1)
(modified) llvm/test/CodeGen/AMDGPU/resource-usage-dead-function.ll (+1)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-any.ll (+6)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-not-supported.ll (+6)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-off.ll (+6)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-on.ll (+6)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-1.ll (+6)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-2.ll (+6)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-1.ll (+6)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-2.ll (+6)
(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-not-supported.ll (+6)
(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-off.ll (+6)
(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-on.ll (+6)
(modified) llvm/test/CodeGen/AMDGPU/unsupported-image-sample.ll (+5-7)
(modified) llvm/test/MC/AMDGPU/hsa-v5-uses-dynamic-stack.s (+5)
(modified) llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml (+29)
(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/subtarget.ll (+20)
(modified) llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test (+12)
(modified) llvm/tools/llvm-readobj/ELFDumper.cpp (+103-123)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 2b93ddf033499c..0bfe0e7739960e 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4753,9 +4753,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group<m_Group>,
   HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
-  Values<"none,4,5">,
+  Values<"none,4,5,6">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
-  NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
+  NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>,
   MarshallingInfoEnum<TargetOpts<"CodeObjectVersion">, "COV_4">;
 
 defm cumode : SimpleMFlag<"cumode",
diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index 6f3a4908623da7..ca935174c05cec 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -275,13 +275,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts,
     Builder.defineMacro("__R600__");
 
   if (GPUKind != llvm::AMDGPU::GK_NONE) {
-    StringRef CanonName = isAMDGCN(getTriple()) ?
-      getArchNameAMDGCN(GPUKind) : getArchNameR600(GPUKind);
+    std::string CanonName = (isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind)
+                                                   : getArchNameR600(GPUKind))
+                                .str();
+
+    // Sanitize the name of generic targets.
+    // e.g. gfx10.1-generic -> gfx10_1_generic
+    if (GPUKind >= llvm::AMDGPU::GK_AMDGCN_GENERIC_FIRST &&
+        GPUKind <= llvm::AMDGPU::GK_AMDGCN_GENERIC_LAST) {
+      std::replace(CanonName.begin(), CanonName.end(), '.', '_');
+      std::replace(CanonName.begin(), CanonName.end(), '-', '_');
+    }
+
     Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
     // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
     if (isAMDGCN(getTriple())) {
-      assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name");
-      Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) +
+      assert(StringRef(CanonName).starts_with("gfx") &&
+             "Invalid amdgcn canonical name");
+      StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind);
+      Builder.defineMacro(Twine("__") + Twine(CanonFamilyName.upper()) +
                           Twine("__"));
     }
     if (isAMDGCN(getTriple())) {
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index f71dbf1729a1d6..be86731ed912ea 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -17481,9 +17481,9 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) {
 // \p Index is 0, 1, and 2 for x, y, and z dimension, respectively.
 /// Emit code based on Code Object ABI version.
 /// COV_4    : Emit code to use dispatch ptr
-/// COV_5    : Emit code to use implicitarg ptr
+/// COV_5+   : Emit code to use implicitarg ptr
 /// COV_NONE : Emit code to load a global variable "__oclc_ABI_version"
-///            and use its value for COV_4 or COV_5 approach. It is used for
+///            and use its value for COV_4 or COV_5+ approach. It is used for
 ///            compiling device libraries in an ABI-agnostic way.
 ///
 /// Note: "__oclc_ABI_version" is supposed to be emitted and intialized by
@@ -17526,7 +17526,7 @@ Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {
         Address(Result, CGF.Int16Ty, CharUnits::fromQuantity(2)));
   } else {
     Value *GEP = nullptr;
-    if (Cov == CodeObjectVersionKind::COV_5) {
+    if (Cov >= CodeObjectVersionKind::COV_5) {
       // Indexing the implicit kernarg segment.
       GEP = CGF.Builder.CreateConstGEP1_32(
           CGF.Int8Ty, EmitAMDGPUImplicitArgPtr(CGF), 12 + Index * 2);
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 2340191ca97d98..75582f6b5669d5 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -2585,7 +2585,7 @@ getAMDGPUCodeObjectArgument(const Driver &D, const llvm::opt::ArgList &Args) {
 void tools::checkAMDGPUCodeObjectVersion(const Driver &D,
                                          const llvm::opt::ArgList &Args) {
   const unsigned MinCodeObjVer = 4;
-  const unsigned MaxCodeObjVer = 5;
+  const unsigned MaxCodeObjVer = 6;
 
   if (auto *CodeObjArg = getAMDGPUCodeObjectArgument(D, Args)) {
     if (CodeObjArg->getOption().getID() ==
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu
index 663687ae227f23..d33acdf7eb8bed 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu
@@ -4,6 +4,9 @@
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
 // RUN:   -mcode-object-version=5 -DUSER -x hip -o %t_5.bc %s
 
+// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
+// RUN:   -mcode-object-version=6 -DUSER -x hip -o %t_6.bc %s
+
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
 // RUN:   -mcode-object-version=none -DDEVICELIB -x hip -o %t_0.bc %s
 
@@ -15,6 +18,10 @@
 // RUN:   %t_5.bc -mlink-builtin-bitcode %t_0.bc -o - |\
 // RUN:   FileCheck -check-prefix=LINKED5 %s
 
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -O3 \
+// RUN:   %t_6.bc -mlink-builtin-bitcode %t_0.bc -o - |\
+// RUN:   FileCheck -check-prefix=LINKED6 %s
+
 #include "Inputs/cuda.h"
 
 // LINKED4: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 400
@@ -77,6 +84,36 @@
 // LINKED5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
 // LINKED5: "amdgpu_code_object_version", i32 500
 
+// LINKED6: __oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
+// LINKED6-LABEL: bar
+// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
+// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
+// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+// LINKED6: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
+// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
+// LINKED6: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
+// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
+// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
+
+// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
+// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
+// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+// LINKED6: [[GEP_5_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 14
+// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
+// LINKED6: [[GEP_4_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 6
+// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_Y]], ptr addrspace(4) [[GEP_4_Y]]
+// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
+
+// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
+// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
+// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+// LINKED6: [[GEP_5_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
+// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
+// LINKED6: [[GEP_4_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 8
+// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_Z]], ptr addrspace(4) [[GEP_4_Z]]
+// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
+// LINKED6: "amdgpu_code_object_version", i32 600
+
 #ifdef DEVICELIB
 __device__ void bar(int *x, int *y, int *z)
 {
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index ff5deaf9ab850d..59636e622731b8 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -9,6 +9,9 @@
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -mcode-object-version=5 -o - %s | FileCheck -check-prefix=V5 %s
 
+// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
+// RUN:   -mcode-object-version=6 -o - %s | FileCheck -check-prefix=V6 %s
+
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -mcode-object-version=none -o - %s | FileCheck %s -check-prefix=NONE
 
@@ -17,5 +20,6 @@
 
 // V4: !{{.*}} = !{i32 1, !"amdgpu_code_object_version", i32 400}
 // V5: !{{.*}} = !{i32 1, !"amdgpu_code_object_version", i32 500}
+// V6: !{{.*}} = !{i32 1, !"amdgpu_code_object_version", i32 600}
 // NONE-NOT: !{{.*}} = !{i32 1, !"amdgpu_code_object_version",
 // INV: error: invalid value '4.1' in '-mcode-object-version=4.1'
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index 282e0a49b9aa10..7f56fe91704870 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -7,6 +7,10 @@
 // RUN:     -fcuda-is-device -mcode-object-version=5 -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=COV5 %s
 
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
+// RUN:     -fcuda-is-device -mcode-object-version=6 -emit-llvm -o - -x hip %s \
+// RUN:     | FileCheck -check-prefix=COV5 %s
+
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
 // RUN:     -fcuda-is-device -mcode-object-version=none -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=COVNONE %s
diff --git a/clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc b/clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc
new file mode 100644
index 00000000000000..e69de29bb2d1d6
diff --git a/clang/test/Driver/amdgpu-macros.cl b/clang/test/Driver/amdgpu-macros.cl
index 81c22af460d12d..3b10444ef71d36 100644
--- a/clang/test/Driver/amdgpu-macros.cl
+++ b/clang/test/Driver/amdgpu-macros.cl
@@ -131,6 +131,11 @@
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1200 -DFAMILY=GFX12
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1201 -DFAMILY=GFX12
 
+// RUN: %clang -E -dM -target amdgcn -mcpu=gfx9-generic %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=64 -DCPU=gfx9_generic -DFAMILY=GFX9
+// RUN: %clang -E -dM -target amdgcn -mcpu=gfx10.1-generic %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx10_1_generic -DFAMILY=GFX10
+// RUN: %clang -E -dM -target amdgcn -mcpu=gfx10.3-generic %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx10_3_generic -DFAMILY=GFX10
+// RUN: %clang -E -dM -target amdgcn -mcpu=gfx11-generic %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx11_generic -DFAMILY=GFX11
+
 // ARCH-GCN-DAG: #define FP_FAST_FMA 1
 
 // FAST_FMAF-DAG: #define FP_FAST_FMAF 1
diff --git a/clang/test/Driver/amdgpu-mcpu.cl b/clang/test/Driver/amdgpu-mcpu.cl
index eeb16ae98ebad7..6f18ea0615cb69 100644
--- a/clang/test/Driver/amdgpu-mcpu.cl
+++ b/clang/test/Driver/amdgpu-mcpu.cl
@@ -115,6 +115,11 @@
 // RUN: %clang -### -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck --check-prefix=GFX1200 %s
 // RUN: %clang -### -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck --check-prefix=GFX1201 %s
 
+// RUN: %clang -### -target amdgcn -mcpu=gfx9-generic %s 2>&1 | FileCheck --check-prefix=GFX9_GENERIC %s
+// RUN: %clang -### -target amdgcn -mcpu=gfx10.1-generic %s 2>&1 | FileCheck --check-prefix=GFX10_1_GENERIC %s
+// RUN: %clang -### -target amdgcn -mcpu=gfx10.3-generic %s 2>&1 | FileCheck --check-prefix=GFX10_3_GENERIC %s
+// RUN: %clang -### -target amdgcn -mcpu=gfx11-generic %s 2>&1 | FileCheck --check-prefix=GFX11_GENERIC %s
+
 // GCNDEFAULT-NOT: -target-cpu
 // GFX600:    "-target-cpu" "gfx600"
 // GFX601:    "-target-cpu" "gfx601"
@@ -160,3 +165,8 @@
 // GFX1151:   "-target-cpu" "gfx1151"
 // GFX1200:   "-target-cpu" "gfx1200"
 // GFX1201:   "-target-cpu" "gfx1201"
+
+// GFX9_GENERIC:      "-target-cpu" "gfx9-generic"
+// GFX10_1_GENERIC:   "-target-cpu" "gfx10.1-generic"
+// GFX10_3_GENERIC:   "-target-cpu" "gfx10.3-generic"
+// GFX11_GENERIC:     "-target-cpu" "gfx11-generic"
diff --git a/clang/test/Driver/hip-code-object-version.hip b/clang/test/Driver/hip-code-object-version.hip
index af5f9a3da21dfd..d63130115588e0 100644
--- a/clang/test/Driver/hip-code-object-version.hip
+++ b/clang/test/Driver/hip-code-object-version.hip
@@ -23,6 +23,18 @@
 // V5: "-mllvm" "--amdhsa-code-object-version=5"
 // V5: "-targets=host-x86_64-unknown-linux,hipv4-amdgcn-amd-amdhsa--gfx906"
 
+// Check bundle ID for code object version 6.
+
+// RUN: not %clang -### --target=x86_64-linux-gnu \
+// RUN:   -mcode-object-version=6 \
+// RUN:   --offload-arch=gfx906 --rocm-path=%S/Inputs/rocm \
+// RUN:   %s 2>&1 | FileCheck -check-prefix=V6 %s
+
+// V6: "-mcode-object-version=6"
+// V6: "-mllvm" "--amdhsa-code-object-version=6"
+// V6: "-targets=host-x86_64-unknown-linux,hipv4-amdgcn-amd-amdhsa--gfx906"
+
+
 // Check bundle ID for code object version default
 
 // RUN: %clang -### --target=x86_64-linux-gnu \
diff --git a/clang/test/Driver/hip-device-libs.hip b/clang/test/Driver/hip-device-libs.hip
index 6ac5778721ba5b..a998db531d6683 100644
--- a/clang/test/Driver/hip-device-libs.hip
+++ b/clang/test/Driver/hip-device-libs.hip
@@ -187,13 +187,26 @@
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
 // RUN: 2>&1 | FileCheck %s --check-prefixes=ABI5
 
-// Test -mcode-object-version=5 with old device library without abi_version_400.bc
+// Test -mcode-object-version=5 with old device library without abi_version_500.bc
 // RUN: not %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
 // RUN:   -mcode-object-version=5 \
 // RUN:   --hip-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode-no-abi-ver   \
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
 // RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI5
 
+// Test -mcode-object-version=6
+// RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
+// RUN:   -mcode-object-version=6 \
+// RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
+// RUN: 2>&1 | FileCheck %s --check-prefixes=ABI6
+
+// Test -mcode-object-version=6 with old device library without abi_version_600.bc
+// RUN: not %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
+// RUN:   -mcode-object-version=6 \
+// RUN:   --hip-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode-no-abi-ver   \
+// RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
+// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI6
+
 // ALL-NOT: error:
 // ALL: {{"[^"]*clang[^"]*"}}
 
@@ -237,7 +250,10 @@
 // ABI4: "-mlink-builtin-bitcode" "{{.*}}oclc_abi_version_400.bc"
 // ABI5-NOT: error:
 // ABI5: "-mlink-builtin-bitcode" "{{.*}}oclc_abi_version_500.bc"
+// ABI6-NOT: error:
+// ABI6: "-mlink-builtin-bitcode" "{{.*}}oclc_abi_version_600.bc"
 // NOABI4-NOT: error:
 // NOABI4-NOT: "-mlink-builtin-bitcode" "{{.*}}oclc_abi_version_400.bc"
 // NOABI4-NOT: "-mlink-builtin-bitcode" "{{.*}}oclc_abi_version_500.bc"
 // NOABI5: error: cannot find ROCm device libraryfor ABI version 5; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
+// NOABI6: error: cannot find ROCm device libraryfor ABI version 6; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index b65b6e31bea821..cf4b2a38bff7a8 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -284,6 +284,8 @@ static void parseCodeGenArgs(Fortran::frontend::CodeGenOptions &opts,
   if (const llvm::opt::Arg *a = args.getLastArg(
           clang::driver::options::OPT_mcode_object_version_EQ)) {
     llvm::StringRef s = a->getValue();
+    if (s == "6")
+      opts.CodeObjectVersion = llvm::CodeObjectVersionKind::COV_6;
     if (s == "5")
       opts.CodeObjectVersion = llvm::CodeObjectVersionKind::COV_5;
     if (s == "4")
diff --git a/flang/test/Lower/AMD/code-object-version.f90 b/flang/test/Lower/AMD/code-object-version.f90
index 7cb9dc079724e7..455f4547252829 100644
--- a/flang/test/Lower/AMD/code-object-version.f90
+++ b/flang/test/Lower/AMD/code-object-version.f90
@@ -3,11 +3,12 @@
 !RUN: %flang_fc1 -emit-hlfir -triple amdgcn-amd-amdhsa -target-cpu gfx908 -mcode-object-version=none %s -o - | FileCheck  --check-prefix=COV_NONE %s
 !RUN: %flang_fc1 -emit-hlfir -triple amdgcn-amd-amdhsa -target-cpu gfx908 -mcode-object-version=4 %s -o - | FileCheck  --check-prefix=COV_4 %s
 !RUN: %flang_fc1 -emit-hlfir -triple amdgcn-amd-amdhsa -target-cpu gfx908 -mcode-object-version=5 %s -o - | FileCheck  --check-prefix=COV_5 %s
+!RUN: %flang_fc1 -emit-hlfir -triple amdgcn-amd-amdhsa -target-cpu gfx908 -mcode-object-version=6 %s -o - | FileCheck  --check-prefix=COV_6 %s
 
 !COV_DEFAULT: llvm.mlir.global weak_odr hidden local_unnamed_addr constant @__oclc_ABI_version(400 : i32) {addr_space = 4 : i32} : i32
 !COV_NONE-NOT: llvm.mlir.global weak_odr hidden local_unnamed_addr constant @__oclc_ABI_version(400 : i32) {addr_space = 4 : i32} : i32
 !COV_4: llvm.mlir.global weak_odr hidden local_unnamed_addr constant @__oclc_ABI_version(400 : i32) {addr_space = 4 : i32} : i32
 !COV_5: llvm.mlir.global weak_odr hidden local_unnamed_addr constant @__oclc_ABI_version(500 : i32) {addr_space = 4 : i32} : i32
+!COV_6: llvm.mlir.global weak_odr hidden local_unnamed_addr constant @__oclc_ABI_version(600 : i32) {addr_space = 4 : i32} : i32
 subroutine target_simple
 end subroutine target_simple
-
diff --git a/lld/ELF/Arch/AMDGPU.cpp b/lld/ELF/Arch/AMDGPU.cpp
index 650744db7dee32..bc1e78cfcc963d 100644
--- a/lld/ELF/Arch/AMDGPU.cpp
+++ b/lld/ELF/Arch/AMDGPU.cpp
@@ -25,6 +25,7 @@ class AMDGPU final : public TargetInfo {
 private:
   uint32_t calcEFlagsV3() const;
   uint32_t calcEFlagsV4() const;
+  uint32_t calcEFlagsV6() const;
 
 public:
   AMDGPU();
@@ -106,6 +107,25 @@ uint32_t AMDGPU::calcEFlagsV4() const {
   return retMach | retXnack | retSramEcc;
 }
 
+uint32_t AMDGPU::calcEFlagsV6() const {
+  uint32_t flags = calcEFlagsV4();
+
+  uint32_t genericVersion =
+      getEFlags(ctx.objectFiles[0]) & EF_AMDGPU_GENERIC_VERSION;
+
+  // Verify that all input files have compatible generic version.
+  for (InputFile *f : ArrayRef(ctx.objectFiles).slice(1)) {
+    if (genericVersion != (getEFlags(f) & EF_AMDGPU_GENERIC_VERSION)) {
+      // TODO: test
+      error("incompatible generic version: " + toString(f));
+      return 0;
+    }
+  }
+
+  flags |= genericVersion;
+  return flags;
+}
+
 uint32_t AMDGPU::calcEFlags() const {
   if (ctx.objectFiles.empty())
     return 0;
@@ -121,6 +141,8 @@ uint32_t AMDGPU::calcEFlags() const {
   case ELFABIVERSION_AMDGPU_HSA_V4:
   case ELFABIVERSION_AMDGPU_HSA_V5:
     return calcEFlagsV4();
+  case ELFABIVERSION_AMDGPU_HSA_V6:
+    return calcEFlagsV6();
   default:
     error("unknown abi version: " + Twine(abiVersion));
     return 0;
diff --git a/lld/test/ELF/amdgpu-tid.s b/lld/test/ELF/amdgpu-tid.s
index 6623443a4541d7..ee0062eb750c86 100644
--- a/lld/test/ELF/amdgpu-tid.s
+++ b/lld/test/ELF/amdgpu-tid.s
@@ -43,3 +43,19 @@
 # SRAMECC-OFF:          EF_AMDGPU_FEATURE_SRAMECC_OFF_V4 (0x800)
 # SRAMECC-ON:           EF_AMDGPU_FEATURE_SRAME...
[truncated]

Pierre-vh · 2024-01-04T14:02:42Z

Note: testing is a bit light for now, I'd like to add more tests, but I'm not sure what kind of tests are worth adding.
I could just add a generic target run line wherever gfx9/10/11 run lines are present, but that seems a bit overkill? I'd need to change half the tests we have or more.

Pierre-vh

Missing components:

Need a way for external tools to inquire about the specifics of generic targets (without depending on llvm)
- map a specific gfx target to its generic family
- given a specific gfx version, what's the minimum generic version it needs (?)
- tools need to tell generic MACHs from specific ones (currently they can do that by just doing EFLAGS >> 24 and checking if there is any value other than 0 there, but it needs to be documented)
Need to double check generic version system

llvm/include/llvm/BinaryFormat/ELF.h

Pierre-vh · 2024-01-10T13:27:37Z

llvm/lib/Target/AMDGPU/AMDGPU.td

-     FeatureImageGather4D16Bug])>;
+  !listconcat(FeatureISAVersion9_0_Consumer_Common.Features,
+    [FeatureMadMixInsts,
+     FeatureImageInsts])>;


todo: remove ImageInsts, it's already included

llvm/docs/AMDGPUUsage.rst

Pierre-vh · 2024-01-17T14:18:43Z

@arsenm Hi, can you take a look - especially on the testing? I don't know if this is tested well enough

clang/lib/Basic/Targets/AMDGPU.cpp

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

arsenm · 2024-02-06T11:32:12Z

llvm/lib/Target/AMDGPU/AMDGPURemoveIncompatibleFunctions.cpp

  StringRef GPUName = ST->getCPU();
-  if (GPUName.empty() || GPUName.contains("generic"))
+  if (GPUName.empty() || GPUName.starts_with("generic"))


we should probably start refusing to codegen "generic" or whatever none ends up getting called

llvm/docs/AMDGPUUsage.rst

github-actions · 2024-02-06T13:47:14Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm/docs/AMDGPUUsage.rst

t-tye · 2024-02-06T22:50:16Z

llvm/docs/AMDGPUUsage.rst

+     ================= ============== ========= =======================================
+     String Key        Value Type     Required? Description
+     ================= ============== ========= =======================================
+     "amdhsa.version"  sequence of    Required  - The first integer is the major


I am not sure what metadata changes would be needed to support generic code objects. I would not add this section.

llvm/docs/AMDGPUUsage.rst

t-tye · 2024-02-06T23:07:30Z

llvm/docs/AMDGPUUsage.rst

+Generic processors are only available on code object V6 and above (see :ref:`amdgpu-elf-code-object`).
+
+Generic processor code objects are versioned (see :ref:`amdgpu-elf-header-e_flags-table-v6-onwards`).
+The version number is used by runtimes to determine if a code object can be run on a specific agent.


This does not really explain how version is used. What about something like:

The version of non-generic code objects is always set to 0.

For a generic code object, adding a new generic member may require the code generated for the generic target to be changed so it can continue to execute on the previous members as well as on the new member. When this happens the generic code object version number is incremented. Each member of the generic target has a version when it was introduced. A generic code object can execute on a specific member if the version of the code object being loaded is >= the version at which the member was introduced.

I rephrased it a bit (e.g. member -> supported processor) but I mostly followed your suggestion

Pierre-vh · 2024-02-07T08:07:29Z

For the MD changes, it's just to describe the version increment, nothing else. I think describing is important as the V6 diff already updated the amdhsa.version.
If amdhsa.version didn't need to change then i need to fix that first, and then we can remove the V6 MD section

t-tye

Documentation LGTM. Thanks.

Pierre-vh · 2024-02-08T07:32:54Z

@t-tye Can you please approve then? Otherwise the diff still shows a red "Changes requested" warning :) Thanks
@arsenm Please also approve if there are no more comments

arsenm

I think this needs codegen tests for the gfx900 vs. gfx906 mad_mix/fma_fix issue

These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities. Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. No docs in this patch either as I plan to do it all in a follow-up patch.

Pierre-vh · 2024-02-09T09:18:28Z

mad_mix

I added run lines to mad-mix.ll and it behaves as expected: no fma/mad_mix emitted

It seems like this happened because #79460 moved this from `FeatureISAVersion11_Common` to `FeatureISAVersion11_0_Common` while #76955 was being reviewed.

Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs change are part of the follow-up patch llvm#76955 Change-Id: I6c4311bf124de1455eb3e5eecb2df7f98deb71bb

These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities. Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. This contains the documentation changes for both this change and llvm#76954 as well. Change-Id: I0c99e616adddaae5e8ce4ad7283eabdbf64250c3

Pierre-vh requested review from jayfoad, arsenm, t-tye, b-sumner and AlexVlx January 4, 2024 14:01

Pierre-vh commented Jan 9, 2024

View reviewed changes

llvm/include/llvm/BinaryFormat/ELF.h Outdated Show resolved Hide resolved

llvm/include/llvm/BinaryFormat/ELF.h Outdated Show resolved Hide resolved

Pierre-vh force-pushed the generictargets branch from 2fdf9a2 to 76cd2ea Compare January 10, 2024 10:34

Pierre-vh commented Jan 10, 2024

View reviewed changes

Pierre-vh mentioned this pull request Jan 10, 2024

[AMDGPU] Introduce Code Object V6 #76954

Merged

cgmb reviewed Jan 11, 2024

View reviewed changes

llvm/docs/AMDGPUUsage.rst Show resolved Hide resolved

llvm/docs/AMDGPUUsage.rst Outdated Show resolved Hide resolved

This was referenced Jan 11, 2024

gfx1031: initial import ROCm/rocBLAS#1251

Closed

AMD gfx1103 laptop GPU returning HIPBLAS_STATUS_UNKNOWN Mozilla-Ocho/llamafile#188

Closed

arsenm reviewed Feb 6, 2024

View reviewed changes

t-tye reviewed Feb 6, 2024

View reviewed changes

Pierre-vh requested review from t-tye and arsenm February 7, 2024 08:26

t-tye reviewed Feb 8, 2024

View reviewed changes

arsenm reviewed Feb 8, 2024

View reviewed changes

t-tye approved these changes Feb 8, 2024

View reviewed changes

Pierre-vh added 7 commits February 9, 2024 09:57

fix test

ef109b2

nits

8d76247

clang-format

d546dae

doc

d0969e5

remove md change

f2ebf69

add test for mad-mix + fix other test

e9253a7

Pierre-vh force-pushed the generictargets branch from 635eb5a to e9253a7 Compare February 9, 2024 09:18

Pierre-vh requested a review from arsenm February 9, 2024 09:46

arsenm approved these changes Feb 9, 2024

View reviewed changes

Pierre-vh merged commit f93aa51 into llvm:main Feb 12, 2024
5 checks passed

Pierre-vh deleted the generictargets branch February 12, 2024 09:18

GZGavinZhao mentioned this pull request Mar 31, 2024

rocmPackages: extend ISA compatibility NixOS/nixpkgs#298388

Merged

13 tasks

epilk mentioned this pull request Apr 24, 2024

[AMDGPU] Include missing FeatureMADIntraFwdBug in gfx11-generic #89936

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets #76955

[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets #76955

Pierre-vh commented Jan 4, 2024 •

edited

Loading

llvmbot commented Jan 4, 2024 •

edited

Loading

Pierre-vh commented Jan 4, 2024

Pierre-vh left a comment •

edited

Loading

Pierre-vh Jan 10, 2024

Pierre-vh commented Jan 17, 2024

arsenm Feb 6, 2024

github-actions bot commented Feb 6, 2024 •

edited

Loading

t-tye Feb 6, 2024

t-tye Feb 6, 2024

Pierre-vh Feb 7, 2024

Pierre-vh commented Feb 7, 2024

t-tye left a comment

Pierre-vh commented Feb 8, 2024

arsenm left a comment

Pierre-vh commented Feb 9, 2024

[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets #76955

[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets #76955

Conversation

Pierre-vh commented Jan 4, 2024 • edited Loading

llvmbot commented Jan 4, 2024 • edited Loading

Pierre-vh commented Jan 4, 2024

Pierre-vh left a comment • edited Loading

Choose a reason for hiding this comment

Pierre-vh Jan 10, 2024

Choose a reason for hiding this comment

Pierre-vh commented Jan 17, 2024

arsenm Feb 6, 2024

Choose a reason for hiding this comment

github-actions bot commented Feb 6, 2024 • edited Loading

t-tye Feb 6, 2024

Choose a reason for hiding this comment

t-tye Feb 6, 2024

Choose a reason for hiding this comment

Pierre-vh Feb 7, 2024

Choose a reason for hiding this comment

Pierre-vh commented Feb 7, 2024

t-tye left a comment

Choose a reason for hiding this comment

Pierre-vh commented Feb 8, 2024

arsenm left a comment

Choose a reason for hiding this comment

Pierre-vh commented Feb 9, 2024

Pierre-vh commented Jan 4, 2024 •

edited

Loading

llvmbot commented Jan 4, 2024 •

edited

Loading

Pierre-vh left a comment •

edited

Loading

github-actions bot commented Feb 6, 2024 •

edited

Loading