[NVPTX] Auto-Upgrade some nvvm.annotations to attributes #119261

Merged

AlexMaclean
Member

@AlexMaclean AlexMaclean commented Dec 9, 2024

Add a new AutoUpgrade function to convert some legacy nvvm.annotations metadata to function-level attributes. These attributes are quicker to look up, which improves compile time, and they are more idiomatic than metadata, which should not carry required information that changes the meaning of the program.

Currently supported annotations are:

  • !"kernel" -> ptx_kernel calling convention
  • !"align" -> alignstack parameter attributes (return not yet supported)
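A minimal sketch of the `!"align"` encoding, assuming the packing used by `autoUpgradeAnnotation` in the diff: the upper 16 bits of each value select the position (0 for the return value, N for the N-th parameter) and the lower 16 bits hold the alignment in bytes. `AlignAnnotation`, `decodeAlign`, and `upgradableAlign` are hypothetical names used only for this standalone C++17 illustration; they are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical types/helpers for illustration only (not LLVM APIs).
// Index 0 denotes the return value; 1..N are the (1-based) parameters.
struct AlignAnnotation {
  unsigned Index;
  uint64_t AlignBytes;
};

// Split a packed !"align" value the way the diff's autoUpgradeAnnotation does:
// position index in the upper 16 bits, alignment in bytes in the lower 16.
AlignAnnotation decodeAlign(uint64_t AlignBits) {
  return {static_cast<unsigned>(AlignBits >> 16), AlignBits & 0xFFFF};
}

// Return the decoded annotation if it would become an alignstack parameter
// attribute, or std::nullopt for return-value alignments (index 0), which the
// patch leaves in nvvm.annotations for now.
std::optional<AlignAnnotation> upgradableAlign(uint64_t AlignBits) {
  const AlignAnnotation A = decodeAlign(AlignBits);
  if (A.Index == 0)
    return std::nullopt;
  // llvm::Align requires a non-zero power of two; check the same invariant.
  assert(A.AlignBytes != 0 && (A.AlignBytes & (A.AlignBytes - 1)) == 0);
  return A;
}
```

Applied to the values in `upgrade-nvvm-annotations.ll`, `0x00010008` decodes to parameter 1 with 8-byte alignment and `0x00020010` to parameter 2 with 16-byte alignment, matching the expected `alignstack(8)` and `alignstack(16)`, while `0x00000008` targets the return value and stays in the metadata.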

@llvmbot
Member

llvmbot commented Dec 9, 2024

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-backend-nvptx

Author: Alex MacLean (AlexMaclean)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/119261.diff

6 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/CMakeLists.txt (+1)
  • (modified) llvm/lib/Target/NVPTX/NVPTX.h (+5)
  • (modified) llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp (+4)
  • (modified) llvm/lib/Target/NVPTX/NVPTXUtilities.cpp (+7-2)
  • (added) llvm/lib/Target/NVPTX/NVVMUpgradeAnnotations.cpp (+130)
  • (added) llvm/test/CodeGen/NVPTX/upgrade-nvvm-annotations.ll (+30)
diff --git a/llvm/lib/Target/NVPTX/CMakeLists.txt b/llvm/lib/Target/NVPTX/CMakeLists.txt
index 693365161330f5..bb2e4ad48b51d8 100644
--- a/llvm/lib/Target/NVPTX/CMakeLists.txt
+++ b/llvm/lib/Target/NVPTX/CMakeLists.txt
@@ -39,6 +39,7 @@ set(NVPTXCodeGen_sources
   NVVMReflect.cpp
   NVPTXProxyRegErasure.cpp
   NVPTXCtorDtorLowering.cpp
+  NVVMUpgradeAnnotations.cpp
   )
 
 add_llvm_target(NVPTXCodeGen
diff --git a/llvm/lib/Target/NVPTX/NVPTX.h b/llvm/lib/Target/NVPTX/NVPTX.h
index ca915cd3f3732f..53418148be3615 100644
--- a/llvm/lib/Target/NVPTX/NVPTX.h
+++ b/llvm/lib/Target/NVPTX/NVPTX.h
@@ -52,6 +52,7 @@ FunctionPass *createNVPTXLowerUnreachablePass(bool TrapUnreachable,
                                               bool NoTrapAfterNoreturn);
 MachineFunctionPass *createNVPTXPeephole();
 MachineFunctionPass *createNVPTXProxyRegErasurePass();
+ModulePass *createNVVMUpgradeAnnotationsPass();
 
 struct NVVMIntrRangePass : PassInfoMixin<NVVMIntrRangePass> {
   PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
@@ -74,6 +75,10 @@ struct NVPTXCopyByValArgsPass : PassInfoMixin<NVPTXCopyByValArgsPass> {
   PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
 };
 
+struct NVVMUpgradeAnnotationsPass : PassInfoMixin<NVVMUpgradeAnnotationsPass> {
+  PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
+};
+
 namespace NVPTX {
 enum DrvInterface {
   NVCL,
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
index a5c5e9420ee737..b4fd36625adc9c 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
@@ -254,6 +254,8 @@ void NVPTXTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
 
   PB.registerPipelineStartEPCallback(
       [this](ModulePassManager &PM, OptimizationLevel Level) {
+        PM.addPass(NVVMUpgradeAnnotationsPass());
+
         FunctionPassManager FPM;
         FPM.addPass(NVVMReflectPass(Subtarget.getSmVersion()));
         // Note: NVVMIntrRangePass was causing numerical discrepancies at one
@@ -349,6 +351,8 @@ void NVPTXPassConfig::addIRPasses() {
       AAR.addAAResult(WrapperPass->getResult());
   }));
 
+  addPass(createNVVMUpgradeAnnotationsPass());
+
   // NVVMReflectPass is added in addEarlyAsPossiblePasses, so hopefully running
   // it here does nothing.  But since we need it for correctness when lowering
   // to NVPTX, run it here too, in case whoever built our pass pipeline didn't
diff --git a/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp b/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp
index 98bffd92a087b6..04e83576cbf958 100644
--- a/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp
@@ -311,11 +311,16 @@ std::optional<unsigned> getMaxNReg(const Function &F) {
 }
 
 bool isKernelFunction(const Function &F) {
+  if (F.getCallingConv() == CallingConv::PTX_Kernel)
+    return true;
+
+  if (F.hasFnAttribute("nvvm.kernel"))
+    return true;
+
   if (const auto X = findOneNVVMAnnotation(&F, "kernel"))
     return (*X == 1);
 
-  // There is no NVVM metadata, check the calling convention
-  return F.getCallingConv() == CallingConv::PTX_Kernel;
+  return false;
 }
 
 MaybeAlign getAlign(const Function &F, unsigned Index) {
diff --git a/llvm/lib/Target/NVPTX/NVVMUpgradeAnnotations.cpp b/llvm/lib/Target/NVPTX/NVVMUpgradeAnnotations.cpp
new file mode 100644
index 00000000000000..ca550434835a2c
--- /dev/null
+++ b/llvm/lib/Target/NVPTX/NVVMUpgradeAnnotations.cpp
@@ -0,0 +1,130 @@
+//===- NVVMUpgradeAnnotations.cpp - Upgrade NVVM Annotations --------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass replaces deprecated metadata in nvvm.annotations with a more modern
+// IR representation.
+//
+//===----------------------------------------------------------------------===//
+
+#include "NVPTX.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Attributes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Metadata.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/Pass.h"
+#include <cstdint>
+
+#define DEBUG_TYPE "nvvm-upgrade-annotations"
+
+using namespace llvm;
+
+namespace llvm {
+void initializeNVVMUpgradeAnnotationsLegacyPassPass(PassRegistry &);
+} // namespace llvm
+
+namespace {
+
+class NVVMUpgradeAnnotationsLegacyPass : public ModulePass {
+public:
+  static char ID;
+  NVVMUpgradeAnnotationsLegacyPass() : ModulePass(ID) {
+    initializeNVVMUpgradeAnnotationsLegacyPassPass(
+        *PassRegistry::getPassRegistry());
+  }
+  bool runOnModule(Module &M) override;
+};
+} // namespace
+
+char NVVMUpgradeAnnotationsLegacyPass::ID = 0;
+
+bool static autoUpgradeAnnotation(Function *F, StringRef K, const Metadata *V) {
+  if (K == "kernel") {
+    assert(mdconst::extract<ConstantInt>(V)->getZExtValue() == 1);
+    F->addFnAttr("nvvm.kernel");
+    return true;
+  }
+  if (K == "align") {
+    const uint64_t AlignBits = mdconst::extract<ConstantInt>(V)->getZExtValue();
+    const unsigned Idx = (AlignBits >> 16);
+    const Align StackAlign = Align(AlignBits & 0xFFFF);
+    // TODO: Skip adding the stackalign attribute for returns, for now.
+    if (!Idx)
+      return false;
+    F->addAttributeAtIndex(
+        Idx, Attribute::getWithStackAlignment(F->getContext(), StackAlign));
+    return true;
+  }
+
+  return false;
+}
+
+// Iterate over nvvm.annotations, rewriting them as appropriate.
+void static upgradeNVAnnotations(Module &M) {
+  NamedMDNode *NamedMD = M.getNamedMetadata("nvvm.annotations");
+  if (!NamedMD)
+    return;
+
+  SmallVector<MDNode *, 8> NewNodes;
+  SmallSet<const MDNode *, 8> SeenNodes;
+  for (MDNode *MD : NamedMD->operands()) {
+    if (SeenNodes.contains(MD))
+      continue;
+    SeenNodes.insert(MD);
+
+    Function *F = mdconst::dyn_extract_or_null<Function>(MD->getOperand(0));
+    if (!F)
+      continue;
+
+    assert(MD && "Invalid MDNode for annotation");
+    assert((MD->getNumOperands() % 2) == 1 && "Invalid number of operands");
+
+    SmallVector<Metadata *, 8> NewOperands;
+    // start index = 1, to skip the global variable key
+    // increment = 2, to skip the value for each property-value pairs
+    for (unsigned j = 1, je = MD->getNumOperands(); j < je; j += 2) {
+      MDString *K = cast<MDString>(MD->getOperand(j));
+      const MDOperand &V = MD->getOperand(j + 1);
+      bool Upgraded = autoUpgradeAnnotation(F, K->getString(), V);
+      if (!Upgraded)
+        NewOperands.append({K, V});
+    }
+
+    if (!NewOperands.empty()) {
+      NewOperands.insert(NewOperands.begin(), MD->getOperand(0));
+      NewNodes.push_back(MDNode::get(M.getContext(), NewOperands));
+    }
+  }
+
+  NamedMD->clearOperands();
+  for (MDNode *N : NewNodes)
+    NamedMD->addOperand(N);
+}
+
+PreservedAnalyses NVVMUpgradeAnnotationsPass::run(Module &M,
+                                                  ModuleAnalysisManager &AM) {
+  upgradeNVAnnotations(M);
+  return PreservedAnalyses::all();
+}
+
+bool NVVMUpgradeAnnotationsLegacyPass::runOnModule(Module &M) {
+  upgradeNVAnnotations(M);
+  return false;
+}
+
+INITIALIZE_PASS(NVVMUpgradeAnnotationsLegacyPass, DEBUG_TYPE,
+                "NVVMUpgradeAnnotations", false, false)
+
+ModulePass *llvm::createNVVMUpgradeAnnotationsPass() {
+  return new NVVMUpgradeAnnotationsLegacyPass();
+}
diff --git a/llvm/test/CodeGen/NVPTX/upgrade-nvvm-annotations.ll b/llvm/test/CodeGen/NVPTX/upgrade-nvvm-annotations.ll
new file mode 100644
index 00000000000000..68dc2353858cb3
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/upgrade-nvvm-annotations.ll
@@ -0,0 +1,30 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes --check-globals all --version 5
+; RUN: opt < %s -mtriple=nvptx64-unknown-unknown -O0 -S | FileCheck %s
+
+define i32 @foo(i32 %a, i32 %b) {
+; CHECK-LABEL: define i32 @foo(
+; CHECK-SAME: i32 alignstack(8) [[A:%.*]], i32 alignstack(16) [[B:%.*]]) {
+; CHECK-NEXT:    ret i32 0
+;
+  ret i32 0
+}
+
+define i32 @bar(i32 %a, i32 %b) {
+; CHECK-LABEL: define i32 @bar(
+; CHECK-SAME: i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    ret i32 0
+;
+  ret i32 0
+}
+
+!nvvm.annotations = !{!0, !1, !2}
+
+!0 = !{ptr @foo, !"align", i32 u0x00000008, !"align", i32 u0x00010008, !"align", i32 u0x00020010}
+!1 = !{null, !"align", i32 u0x00000008, !"align", i32 u0x00010008, !"align", i32 u0x00020008}
+!2 = !{ptr @bar, !"kernel", i32 1}
+
+;.
+; CHECK: attributes #[[ATTR0]] = { "nvvm.kernel" }
+;.
+; CHECK: [[META0:![0-9]+]] = !{ptr @foo, !"align", i32 8}
+;.

@AlexMaclean
Member Author

@Artem-B, this functionality is currently implemented as a pass that we add at the beginning of llc and opt. Do you think it would make more sense to put it in AutoUpgrade?

@Artem-B
Member

Artem-B commented Dec 9, 2024

@Artem-B, this functionality is currently implemented as a pass that we add at the beginning of llc and opt. Do you think it would make more sense to put it in AutoUpgrade?

Maybe. I've mostly used autoupgrade for intrinsics, but this kind of change may fit there, too. Give it a try. Upgrading on load would probably be better than relying on a bespoke pass -- then we can rely on function attributes as the ground truth, instead of having to look at both, because the special pass may not have run yet.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen LTO Link time optimization (regular/full LTO or ThinLTO) llvm:ir labels Dec 9, 2024
@llvmbot llvmbot added llvm:transforms clang:openmp OpenMP related changes to Clang labels Dec 10, 2024
Member

@Artem-B Artem-B left a comment

Few nits. LGTM overall, except for the "kernel/nvvm.kernel" distinction question.

;.
; CGSCC: attributes #[[ATTR0]] = { "llvm.assume"="ompx_aligned_barrier" }
; CGSCC: attributes #[[ATTR1:[0-9]+]] = { convergent nocallback nounwind }
; CGSCC: attributes #[[ATTR2:[0-9]+]] = { convergent nocallback nofree nounwind willreturn }
; CGSCC: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) }
; CGSCC: attributes #[[ATTR4]] = { "kernel" }
; CGSCC: attributes #[[ATTR5]] = { nosync memory(none) }
; CGSCC: attributes #[[ATTR4]] = { "kernel" "nvvm.kernel" }
Member

Do we need a distinction between "kernel" and "nvvm.kernel"?

Member Author

@AlexMaclean AlexMaclean Dec 10, 2024

Unfortunately, I think we do. "kernel" is really more like "OpenMP kernel", and its semantics do not seem to be a perfect match for "nvvm.kernel". For example, @multiple_blocks_functions_non_kernel_effects_2 in this test has "kernel" but is not an nvvm kernel. I'm very unfamiliar with the OpenMP semantics, so I thought keeping them separate would be the safest approach. It may also be clearest to have a common "nvvm.*" prefix for all attributes currently represented as nvvm.annotations.

Contributor

I thought OpenMP just used nvvm.kernel as well? As far as I was aware it just controlled whether or not the function got .entry in the PTX.

Member Author

The problem is that OpenMP seems to need to be able to draw a distinction between OpenMP kernels and nvvm kernels. For example here it seems like OpenMP only wants to look at "kernel" not "nvvm.kernel". As a result it seems like these attributes cannot be easily unified.

// We are only interested in OpenMP target regions. Others, such as kernels
// generated by CUDA but linked together, are not interesting to this pass.
if (isOpenMPKernel(*KernelFn)) {
  ++NumOpenMPTargetRegionKernels;
  Kernels.insert(KernelFn);
} else
  ++NumNonOpenMPTargetRegionKernels;

Contributor

@shiltian shiltian Dec 10, 2024

Yes, as the comment pointed out. Only OpenMP target regions are marked as kernel. Nothing else, though I'm not really sure if that is a good name. Lol.

Contributor

@shiltian shiltian Dec 13, 2024

Honestly, us putting nvvm.kernel on everything should be fixed.

+1

It is likely because the OpenMPGPUCodeGen for the AMDGPU part shares the same code as NVPTX and doesn't even bother to emit the attribute based on the actual target. 😮‍💨

¯\_(ツ)_/¯

Is there any way around this?

That should not be worked around. Having both is wrong in the first place.

Contributor

@arsenm arsenm Dec 16, 2024

Does PTX support calling a kernel entry point as an ordinary function from another function? I've never understood how PTX doesn't set an IR calling convention for the entries.

AMDGPU does also have multiple types of entry points for graphics, but even if we didn't, the kernels would have a calling convention separate from callable functions.

Contributor

It's an error at the PTX level so the backend just lets it happen and waits until ptxas blows up AFAICT.

Contributor

Yeah that really should just be a separate calling convention

Member Author

Okay, fair enough. I'll start switching us over to a calling convention in #120806.

@AlexMaclean AlexMaclean force-pushed the dev/amaclean/upstream-nv-ann-upgrade branch from b57629a to 416d908 Compare December 10, 2024 20:28
// not support stackalign attribute for this.
if (Index == 0) {
  std::vector<unsigned> Vs;
  if (findAllNVVMAnnotation(&F, "align", Vs))
Member

@Artem-B Artem-B Dec 10, 2024

Off-topic: I think if findAllNVVMAnnotation() returned an ArrayRef, it would work much more nicely than copying data into a temporary array. Bonus points for making its name plural.

if (Index == 0) {
  for (unsigned V : findAllNVVMAnnotation()) {
    // do stuff with V
  }
}

Member Author

Yeah, I agree the NVVM annotation APIs could be cleaned up significantly. Hopefully, though, this work will remove the need for them altogether.

@AlexMaclean AlexMaclean changed the title [NVPTX] Add NVVMUpgradeAnnotations pass to cleanup legacy annotations [NVPTX] Auto-Upgrade some nvvm.annotations to attributes Dec 12, 2024
@AlexMaclean AlexMaclean force-pushed the dev/amaclean/upstream-nv-ann-upgrade branch from 416d908 to 6e1b7af Compare January 28, 2025 00:30
@AlexMaclean
Member Author

Now that #122320 has landed, all in-tree frontends and tests use the ptx_kernel calling convention instead of metadata. I think it makes sense to revisit this change now, as it is much simpler. I've switched from the initially proposed nvvm.kernel attribute to using the calling convention.

@Artem-B please take another look when you have a moment.

@AlexMaclean AlexMaclean force-pushed the dev/amaclean/upstream-nv-ann-upgrade branch from 885d162 to 7f126e1 Compare January 29, 2025 20:35
@AlexMaclean AlexMaclean merged commit de7438e into llvm:main Jan 30, 2025
8 checks passed
@llvm-ci
Collaborator

llvm-ci commented Jan 30, 2025

LLVM Buildbot has detected a new failure on builder ppc64le-flang-rhel-clang running on ppc64le-flang-rhel-test while building llvm at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/157/builds/18648

Here is the relevant piece of the build log for the reference
Step 5 (build-unified-tree) failure: build (failure)
ninja: error: build.ninja:4378: multiple outputs aren't (yet?) supported by depslog; bring this up on the mailing list if it affects you


@llvm-ci
Collaborator

llvm-ci commented Jan 30, 2025

LLVM Buildbot has detected a new failure on builder ppc64le-lld-multistage-test running on ppc64le-lld-multistage-test while building llvm at step 6 "build-stage1-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/168/builds/8137

Here is the relevant piece of the build log for the reference
Step 6 (build-stage1-unified-tree) failure: build (failure)
ninja: error: build.ninja:3959: multiple outputs aren't (yet?) supported by depslog; bring this up on the mailing list if it affects you


@llvm-ci
Collaborator

llvm-ci commented Jan 30, 2025

LLVM Buildbot has detected a new failure on builder ppc64le-mlir-rhel-clang running on ppc64le-mlir-rhel-test while building llvm at step 5 "build-check-mlir-build-only".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/129/builds/13869

Here is the relevant piece of the build log for the reference
Step 5 (build-check-mlir-build-only) failure: build (failure)
ninja: error: build.ninja:4378: multiple outputs aren't (yet?) supported by depslog; bring this up on the mailing list if it affects you


@llvm-ci
Collaborator

llvm-ci commented Jan 30, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-aarch64-darwin running on doug-worker-5 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/13758

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: Analysis/live-stmts.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/clang -cc1 -internal-isystem /Users/buildbot/buildbot-root/aarch64-darwin/build/lib/clang/21/include -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer -w -analyzer-checker=debug.DumpLiveExprs /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp 2>&1   | /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp
+ /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp
+ /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/clang -cc1 -internal-isystem /Users/buildbot/buildbot-root/aarch64-darwin/build/lib/clang/21/include -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer -w -analyzer-checker=debug.DumpLiveExprs /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp
/Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp:239:16: error: CHECK-EMPTY: is not on the line after the previous match
// CHECK-EMPTY:
               ^
<stdin>:180:1: note: 'next' match was here

^
<stdin>:177:1: note: previous match ended here

^
<stdin>:178:1: note: non-matching line after previous match is here
ImplicitCastExpr 0x13e127578 '_Bool' <LValueToRValue>
^

Input file: <stdin>
Check file: /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1:
           2: [ B0 (live expressions at block exit) ]
check:21      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           3:
empty:22      ^
           4:
empty:23      ^
           5: [ B1 (live expressions at block exit) ]
check:24      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           6:
empty:25      ^
           7:
empty:26      ^
           8: [ B2 (live expressions at block exit) ]
check:27      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           9:
empty:28      ^
          10: DeclRefExpr 0x13e1242e0 'int' lvalue ParmVar 0x13e107670 'y' 'int'
next:29       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          11:
empty:30      ^
          12: DeclRefExpr 0x13e124300 'int' lvalue ParmVar 0x13e1076f0 'z' 'int'
...

@llvm-ci
Collaborator

llvm-ci commented Jan 30, 2025

LLVM Buildbot has detected a new failure on builder arc-builder running on arc-worker while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/3/builds/10992

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'lit :: googletest-timeout.py' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 9
not env -u FILECHECK_OPTS "/usr/local/bin/python3.9" /buildbot/worker/arc-folder/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical -v Inputs/googletest-timeout    --param gtest_filter=InfiniteLoopSubTest --timeout=1 > /buildbot/worker/arc-folder/build/utils/lit/tests/Output/googletest-timeout.py.tmp.cmd.out
# executed command: not env -u FILECHECK_OPTS /usr/local/bin/python3.9 /buildbot/worker/arc-folder/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical -v Inputs/googletest-timeout --param gtest_filter=InfiniteLoopSubTest --timeout=1
# .---command stderr------------
# | lit.py: /buildbot/worker/arc-folder/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 1 seconds was requested on the command line. Forcing timeout to be 1 seconds.
# | Traceback (most recent call last):
# |   File "/buildbot/worker/arc-folder/llvm-project/llvm/utils/lit/lit/formats/googletest.py", line 304, in post_process_shard_results
# |     testsuites = json.load(f)["testsuites"]
# |   File "/usr/local/lib/python3.9/json/__init__.py", line 293, in load
# |     return loads(fp.read(),
# |   File "/usr/local/lib/python3.9/json/__init__.py", line 346, in loads
# |     return _default_decoder.decode(s)
# |   File "/usr/local/lib/python3.9/json/decoder.py", line 337, in decode
# |     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
# |   File "/usr/local/lib/python3.9/json/decoder.py", line 355, in raw_decode
# |     raise JSONDecodeError("Expecting value", s, err.value) from None
# | json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
# | 
# | During handling of the above exception, another exception occurred:
# | 
# | Traceback (most recent call last):
# |   File "/buildbot/worker/arc-folder/llvm-project/llvm/utils/lit/lit.py", line 6, in <module>
# |     main()
# |   File "/buildbot/worker/arc-folder/llvm-project/llvm/utils/lit/lit/main.py", line 130, in main
# |     selected_tests, discovered_tests = GoogleTest.post_process_shard_results(
# |   File "/buildbot/worker/arc-folder/llvm-project/llvm/utils/lit/lit/formats/googletest.py", line 306, in post_process_shard_results
# |     raise RuntimeError(
# | RuntimeError: Failed to parse json file: /buildbot/worker/arc-folder/build/utils/lit/tests/Inputs/googletest-timeout/DummySubDir/OneTest.py-googletest-timeout-12152-1-2.json
# | 
# `-----------------------------
# RUN: at line 11
FileCheck --check-prefix=CHECK-INF < /buildbot/worker/arc-folder/build/utils/lit/tests/Output/googletest-timeout.py.tmp.cmd.out /buildbot/worker/arc-folder/build/utils/lit/tests/googletest-timeout.py
# executed command: FileCheck --check-prefix=CHECK-INF /buildbot/worker/arc-folder/build/utils/lit/tests/googletest-timeout.py
# .---command stderr------------
# | /buildbot/worker/arc-folder/build/utils/lit/tests/googletest-timeout.py:34:14: error: CHECK-INF: expected string not found in input
# | # CHECK-INF: Timed Out: 1
# |              ^
# | <stdin>:13:29: note: scanning from here
# | Reached timeout of 1 seconds
# |                             ^
# | <stdin>:15:21: note: possible intended match here
# | TIMEOUT: googletest-timeout :: DummySubDir/OneTest.py/1/2 (2 of 2)
# |                     ^
# | 
# | Input file: <stdin>
...

@akuegel
Member

akuegel commented Jan 31, 2025

With this change, we run into a failure with this CUDA call:

https://github.com/openxla/xla/blob/main/xla/stream_executor/cuda/cuda_executor.cc#L222

Apparently we no longer find the function. I checked the emitted PTX, and the difference is here:

before:

.visible .entry loop_multiply_fusion(

after:

.visible .func loop_multiply_fusion(

Also we are missing the alignment information, for example .align 128

@Artem-B maybe you have an idea what we may need to do for XLA to make it work again?

@akuegel
Member

akuegel commented Jan 31, 2025

OK, I found a fix with the help of @d0k: it seems we now also need to set the calling convention of the function to llvm::CallingConv::PTX_Kernel, as the legacy annotations are no longer accepted (see the change to isKernelFunction in NVPTXUtilities.h).

@ronlieb ronlieb self-requested a review January 31, 2025 21:31
@AlexMaclean
Member Author

OK, I found a fix with the help of @d0k: it seems we now also need to set the calling convention of the function to llvm::CallingConv::PTX_Kernel, as the legacy annotations are no longer accepted (see the change to isKernelFunction in NVPTXUtilities.h).

Yep, this seems like the appropriate solution to me. You should also be able to remove any code adding the !"kernel" metadata.

As a heads up, I hope to deprecate and remove nearly all nvvm.annotations, replacing them with attributes or other more idiomatic mechanisms. I'll update in-tree frontends and add auto-upgrade rules to ensure backwards compatibility with older IR, but any out-of-tree frontends where the IR doesn't go through Auto-upgrade will need to replace uses of nvvm.annotations as they are removed.

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Feb 3, 2025
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Feb 5, 2025
Add a new AutoUpgrade function to convert some legacy nvvm.annotations
metadata to function level attributes. These attributes are quicker to
look-up so improve compile time and are more idiomatic than using
metadata which should not include required information that changes the
meaning of the program.

Currently supported annotations are:

- !"kernel" -> ptx_kernel calling convention
- !"align" -> alignstack parameter attributes (return not yet supported)
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Feb 5, 2025
…erm/restore-nvmm-annotations

[NVPTX] Auto-Upgrade some nvvm.annotations to attributes (llvm#119261)
hanhanW added a commit to iree-org/llvm-project that referenced this pull request Feb 5, 2025
@hanhanW
Contributor

hanhanW commented Feb 5, 2025

Hi, we ran into a failure in a downstream project (i.e., IREE) with this change. I'm not an expert in this area, but I'd like to understand how to fix our problem properly. Without the patch, we generate something like:

.visible .entry add_dispatch(
	.param .u64 .ptr .global .align 16 add_dispatch_param_0,
	.param .u64 .ptr .global .align 16 add_dispatch_param_1,
	.param .u64 .ptr .global .align 16 add_dispatch_param_2
)
.maxntid 32, 1, 1
{
	.reg .pred 	%p<2>;
	.reg .b32 	%r<2>;
	.reg .f32 	%f<4>;
	.reg .b64 	%rd<8>;

	mov.u32 	%r1, %tid.x;
	bar.sync 	0;
	setp.gt.u32 	%p1, %r1, 15;
	@%p1 bra 	$L__BB0_2;
	ld.param.u64 	%rd4, [add_dispatch_param_0];
	ld.param.u64 	%rd5, [add_dispatch_param_1];
	ld.param.u64 	%rd6, [add_dispatch_param_2];
	mul.wide.u32 	%rd7, %r1, 4;
	add.s64 	%rd1, %rd4, %rd7;
	add.s64 	%rd2, %rd5, %rd7;
	add.s64 	%rd3, %rd6, %rd7;
	ld.global.f32 	%f1, [%rd1];
	ld.global.f32 	%f2, [%rd2];
	add.rn.f32 	%f3, %f1, %f2;
	st.global.f32 	[%rd3], %f3;
$L__BB0_2:
	bar.sync 	0;
	ret;

}

With the patch, the .entry becomes a .func, which is probably fine. However, the .maxntid directive disappears. According to the NVIDIA docs, it specifies the maximum number of threads that a thread block can have. Is it valid to drop such information with the patch? Below is the generated PTX with the patch.

.visible .func add_dispatch(
	.param .b64 add_dispatch_param_0,
	.param .b64 add_dispatch_param_1,
	.param .b64 add_dispatch_param_2
)
{
	.reg .pred 	%p<2>;
	.reg .b32 	%r<2>;
	.reg .f32 	%f<4>;
	.reg .b64 	%rd<8>;

	mov.u32 	%r1, %tid.x;
	bar.sync 	0;
	setp.gt.u32 	%p1, %r1, 15;
	@%p1 bra 	$L__BB0_2;
	ld.param.u64 	%rd4, [add_dispatch_param_0];
	ld.param.u64 	%rd5, [add_dispatch_param_1];
	ld.param.u64 	%rd6, [add_dispatch_param_2];
	mul.wide.u32 	%rd7, %r1, 4;
	add.s64 	%rd1, %rd4, %rd7;
	add.s64 	%rd2, %rd5, %rd7;
	add.s64 	%rd3, %rd6, %rd7;
	ld.global.f32 	%f1, [%rd1];
	ld.global.f32 	%f2, [%rd2];
	add.rn.f32 	%f3, %f1, %f2;
	st.global.f32 	[%rd3], %f3;
$L__BB0_2:
	bar.sync 	0;
	ret;

}

I can provide more artifacts if it helps, thanks in advance!

@AlexMaclean
Member Author

Hi, we ran into a failure in a downstream project (i.e., IREE) with this change. I'm not an expert in this area, but I'd like to understand how to fix our problem properly. Without the patch, we generate something like:

Hi @hanhanW, my guess here is that your downstream project contains a front-end that is still generating IR that designates kernels via the nvvm.annotations metadata. If this IR is lowered directly by the backend (without being serialized or otherwise going through a path where auto-upgrade runs), then this metadata will no longer be recognized.

If my assumption is correct, to fix this you should update your frontend to mark kernels via the calling convention. (Similar to this change for clang: https://github.com/llvm/llvm-project/pull/120806/files#diff-148cd424d4a2056106448d04727450e281ca1bf03ac68d5db2e8dff59d397090R261)

F->setCallingConv(llvm::CallingConv::PTX_Kernel);

@hanhanW
Contributor

hanhanW commented Feb 6, 2025

Thank you, this is very helpful! I have a local fix, which generates the right code! E.g., replacing the below line with your suggestion. 🙏🏻

https://github.com/iree-org/iree/blob/86b845bcea4f60536a88850d71a1d0661437b3e8/compiler/plugins/target/CUDA/CUDATarget.cpp#L548

Labels
backend:NVPTX clang:codegen clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category llvm:ir llvm:transforms LTO Link time optimization (regular/full LTO or ThinLTO)
9 participants