Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VPlan] Compute induction end values in VPlan. #112145

Merged
merged 36 commits into from
Dec 29, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Oct 13, 2024

Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on #110004 and
#109975.

Copy link

github-actions bot commented Oct 13, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 13, 2024
Model updating IV users directly in VPlan, replace fixupIVUsers.

Depends on llvm#110004,
llvm#109975 and
llvm#112145.
@fhahn fhahn force-pushed the vplan-compute-iv-end-values branch 2 times, most recently from 3b414ba to 90c7f53 Compare October 24, 2024 18:56
@fhahn fhahn force-pushed the vplan-compute-iv-end-values branch 3 times, most recently from 944283e to 8001f75 Compare November 10, 2024 10:32
@fhahn fhahn force-pushed the vplan-compute-iv-end-values branch from 8001f75 to 087668b Compare November 24, 2024 19:54
fhahn added a commit that referenced this pull request Nov 24, 2024
Allow setting the name to use for the generated IR value of the derived
IV in preparations for #112145.

This is analogous to VPInstruction::Name.
@fhahn fhahn force-pushed the vplan-compute-iv-end-values branch from 087668b to e9ad7c2 Compare December 7, 2024 12:21
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
@llvmbot
Copy link
Member

llvmbot commented Dec 7, 2024

@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-systemz

Author: Florian Hahn (fhahn)

Changes

Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on #110004 and
#109975.


Patch is 203.94 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/112145.diff

54 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h (+2-2)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+130-104)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/fixed-order-recurrence.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/low_trip_count_predicates.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/mul-simplification.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/reduction-recurrence-costs-sve.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-live-out-pointer-induction.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/PowerPC/exit-branch-cost.ll (+31-29)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/blend-any-of-reduction-cost.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/blocks-with-dead-instructions.ll (+7-7)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/dead-ops-cost.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/induction-costs.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/masked_gather_scatter.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll (+7-9)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reverse-load-store.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-uniform-store.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/SystemZ/predicated-first-order-recurrence.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/optsize.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr109581-unused-blend.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr72969.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/scatter_crash.ll (+22-22)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+29-29)
  • (modified) llvm/test/Transforms/LoopVectorize/branch-weights.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/epilog-vectorization-any-of-reductions.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll (+17-17)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll (+36-36)
  • (modified) llvm/test/Transforms/LoopVectorize/induction.ll (+26-26)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/invariant-store-vectorization.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/pr59319-loop-access-info-invalidation.ll (+17-17)
  • (modified) llvm/test/Transforms/LoopVectorize/pr66616.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-align.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/scalable-inductions.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/select-reduction.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/store-reduction-results-in-tail-folded-loop.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-printing.ll (+32-15)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index fbcf181a45a664..0298ab523307db 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -233,8 +233,8 @@ class VPBuilder {
 
   VPDerivedIVRecipe *createDerivedIV(InductionDescriptor::InductionKind Kind,
                                      FPMathOperator *FPBinOp, VPValue *Start,
-                                     VPCanonicalIVPHIRecipe *CanonicalIV,
-                                     VPValue *Step, const Twine &Name = "") {
+                                     VPValue *CanonicalIV, VPValue *Step,
+                                     const Twine &Name = "") {
     return tryInsertInstruction(
         new VPDerivedIVRecipe(Kind, FPBinOp, Start, CanonicalIV, Step, Name));
   }
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 4a4553e4a8db8d..1ee596502f9d44 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -513,21 +513,14 @@ class InnerLoopVectorizer {
   /// Fix the non-induction PHIs in \p Plan.
   void fixNonInductionPHIs(VPTransformState &State);
 
-  /// Create a ResumePHI VPInstruction for the induction \p InductionPhiIRI to
-  /// resume iteration count in the scalar epilogue from where the vectorized
-  /// loop left off, and add it to the scalar preheader of VPlan. Also creates
-  /// the induction resume value, and the value for the bypass block, if needed.
-  /// \p Step is the SCEV-expanded induction step to use. In cases where the
-  /// loop skeleton is more complicated (i.e., epilogue vectorization) and the
-  /// resume values can come from an additional bypass block,
-  /// \p MainVectorTripCount provides the trip count of the main vector loop,
-  /// used to compute the resume value reaching the scalar loop preheader
-  /// directly from this additional bypass block.
-  void createInductionResumeVPValue(VPIRInstruction *InductionPhiIRI,
-                                    const InductionDescriptor &ID, Value *Step,
-                                    ArrayRef<BasicBlock *> BypassBlocks,
-                                    VPBuilder &ScalarPHBuilder,
-                                    Value *MainVectorTripCount = nullptr);
+  /// Create the bypass resume value coming from the additional bypass block. \p
+  /// Step is the SCEV-expanded induction step to use. \p MainVectorTripCount
+  /// provides the trip count of the main vector loop, used to compute the
+  /// resume value reaching the scalar loop preheader directly from this
+  /// additional bypass block.
+  void createInductionBypassValue(PHINode *OrigPhi,
+                                  const InductionDescriptor &ID, Value *Step,
+                                  Value *MainVectorTripCount);
 
   /// Returns the original loop trip count.
   Value *getTripCount() const { return TripCount; }
@@ -584,17 +577,10 @@ class InnerLoopVectorizer {
   /// vector loop preheader, middle block and scalar preheader.
   void createVectorLoopSkeleton(StringRef Prefix);
 
-  /// Create new phi nodes for the induction variables to resume iteration count
-  /// in the scalar epilogue, from where the vectorized loop left off.
-  /// In cases where the loop skeleton is more complicated (i.e. epilogue
-  /// vectorization), \p MainVectorTripCount provides the trip count of the main
-  /// loop, used to compute these resume values. If \p IVSubset is provided, it
-  /// contains the phi nodes for which resume values are needed, because they
-  /// will generate wide induction phis in the epilogue loop.
-  void
-  createInductionResumeVPValues(const SCEV2ValueTy &ExpandedSCEVs,
-                                Value *MainVectorTripCount = nullptr,
-                                SmallPtrSetImpl<PHINode *> *IVSubset = nullptr);
+  /// Create values for the induction variables to resume iteration count
+  /// in the bypass block.
+  void createInductionBypassValues(const SCEV2ValueTy &ExpandedSCEVs,
+                                   Value *MainVectorTripCount);
 
   /// Allow subclasses to override and print debug traces before/after vplan
   /// execution, when trace information is requested.
@@ -2613,21 +2599,11 @@ void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
                  nullptr, Twine(Prefix) + "scalar.ph");
 }
 
-void InnerLoopVectorizer::createInductionResumeVPValue(
-    VPIRInstruction *InductionPhiRI, const InductionDescriptor &II, Value *Step,
-    ArrayRef<BasicBlock *> BypassBlocks, VPBuilder &ScalarPHBuilder,
+void InnerLoopVectorizer::createInductionBypassValue(
+    PHINode *OrigPhi, const InductionDescriptor &II, Value *Step,
     Value *MainVectorTripCount) {
-  // TODO: Move to LVP or general VPlan construction, once no IR values are
-  // generated.
-  auto *OrigPhi = cast<PHINode>(&InductionPhiRI->getInstruction());
-  Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
-  assert(VectorTripCount && "Expected valid arguments");
-
   Instruction *OldInduction = Legal->getPrimaryInduction();
-  // For the primary induction the end values are known.
-  Value *EndValue = VectorTripCount;
   Value *EndValueFromAdditionalBypass = MainVectorTripCount;
-  // Otherwise compute them accordingly.
   if (OrigPhi != OldInduction) {
     IRBuilder<> B(LoopVectorPreHeader->getTerminator());
 
@@ -2635,10 +2611,6 @@ void InnerLoopVectorizer::createInductionResumeVPValue(
     if (isa_and_nonnull<FPMathOperator>(II.getInductionBinOp()))
       B.setFastMathFlags(II.getInductionBinOp()->getFastMathFlags());
 
-    EndValue = emitTransformedIndex(B, VectorTripCount, II.getStartValue(),
-                                    Step, II.getKind(), II.getInductionBinOp());
-    EndValue->setName("ind.end");
-
     // Compute the end value for the additional bypass (if applicable).
     if (MainVectorTripCount) {
       B.SetInsertPoint(getAdditionalBypassBlock(),
@@ -2650,22 +2622,12 @@ void InnerLoopVectorizer::createInductionResumeVPValue(
     }
   }
 
-  auto *ResumePhiRecipe = ScalarPHBuilder.createNaryOp(
-      VPInstruction::ResumePhi,
-      {Plan.getOrAddLiveIn(EndValue), Plan.getOrAddLiveIn(II.getStartValue())},
-      OrigPhi->getDebugLoc(), "bc.resume.val");
-  assert(InductionPhiRI->getNumOperands() == 0 &&
-         "InductionPhiRI should not have any operands");
-  InductionPhiRI->addOperand(ResumePhiRecipe);
-
-  if (EndValueFromAdditionalBypass) {
-    // Store the bypass value here, as it needs to be added as operand to its
-    // scalar preheader phi node after the epilogue skeleton has been created.
-    // TODO: Directly add as extra operand to the VPResumePHI recipe.
-    assert(!Induction2AdditionalBypassValue.contains(OrigPhi) &&
-           "entry for OrigPhi already exits");
-    Induction2AdditionalBypassValue[OrigPhi] = EndValueFromAdditionalBypass;
-  }
+  // Store the bypass value here, as it needs to be added as operand to its
+  // scalar preheader phi node after the epilogue skeleton has been created.
+  // TODO: Directly add as extra operand to the VPResumePHI recipe.
+  assert(!Induction2AdditionalBypassValue.contains(OrigPhi) &&
+         "entry for OrigPhi already exits");
+  Induction2AdditionalBypassValue[OrigPhi] = EndValueFromAdditionalBypass;
 }
 
 /// Return the expanded step for \p ID using \p ExpandedSCEVs to look up SCEV
@@ -2682,29 +2644,15 @@ static Value *getExpandedStep(const InductionDescriptor &ID,
   return I->second;
 }
 
-void InnerLoopVectorizer::createInductionResumeVPValues(
-    const SCEV2ValueTy &ExpandedSCEVs, Value *MainVectorTripCount,
-    SmallPtrSetImpl<PHINode *> *IVSubset) {
-  // We are going to resume the execution of the scalar loop.
-  // Go over all of the induction variable PHIs of the scalar loop header and
-  // fix their starting values, which depend on the counter of the last
-  // iteration of the vectorized loop. If we come from one of the
-  // LoopBypassBlocks then we need to start from the original start value.
-  // Otherwise we provide the trip count from the main vector loop.
-  VPBasicBlock *ScalarPHVPBB = Plan.getScalarPreheader();
-  VPBuilder ScalarPHBuilder(ScalarPHVPBB, ScalarPHVPBB->begin());
-  for (VPRecipeBase &R : *Plan.getScalarHeader()) {
-    auto *PhiR = cast<VPIRInstruction>(&R);
-    auto *Phi = dyn_cast<PHINode>(&PhiR->getInstruction());
-    if (!Phi)
-      break;
-    if (!Legal->getInductionVars().contains(Phi) ||
-        (IVSubset && !IVSubset->contains(Phi)))
-      continue;
-    const InductionDescriptor &II = Legal->getInductionVars().find(Phi)->second;
-    createInductionResumeVPValue(PhiR, II, getExpandedStep(II, ExpandedSCEVs),
-                                 LoopBypassBlocks, ScalarPHBuilder,
-                                 MainVectorTripCount);
+void InnerLoopVectorizer::createInductionBypassValues(
+    const SCEV2ValueTy &ExpandedSCEVs, Value *MainVectorTripCount) {
+  assert(MainVectorTripCount && "Must have bypass information");
+
+  for (const auto &InductionEntry : Legal->getInductionVars()) {
+    PHINode *OrigPhi = InductionEntry.first;
+    const InductionDescriptor &II = InductionEntry.second;
+    createInductionBypassValue(OrigPhi, II, getExpandedStep(II, ExpandedSCEVs),
+                               MainVectorTripCount);
   }
 }
 
@@ -2766,8 +2714,8 @@ InnerLoopVectorizer::createVectorizedLoopSkeleton(
   // faster.
   emitMemRuntimeChecks(LoopScalarPreHeader);
 
-  // Emit phis for the new starting index of the scalar loop.
-  createInductionResumeVPValues(ExpandedSCEVs);
+  Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
+  assert(VectorTripCount && "Expected valid arguments");
 
   return {LoopVectorPreHeader, nullptr};
 }
@@ -7848,19 +7796,6 @@ EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton(
   // Generate the induction variable.
   EPI.VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
 
-  // Generate VPValues and ResumePhi recipes for wide inductions in the epilogue
-  // plan only. Other inductions only need a resume value for the canonical
-  // induction, which will get created during epilogue skeleton construction.
-  SmallPtrSet<PHINode *, 4> WideIVs;
-  for (VPRecipeBase &H :
-       EPI.EpiloguePlan.getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
-    if (auto *WideIV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&H))
-      WideIVs.insert(WideIV->getPHINode());
-    else if (auto *PtrIV = dyn_cast<VPWidenPointerInductionRecipe>(&H))
-      WideIVs.insert(cast<PHINode>(PtrIV->getUnderlyingValue()));
-  }
-  createInductionResumeVPValues(ExpandedSCEVs, nullptr, &WideIVs);
-
   return {LoopVectorPreHeader, nullptr};
 }
 
@@ -8049,13 +7984,14 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton(
     EPResumeVal->addIncoming(Init, EPI.MainLoopIterationCountCheck);
   }
 
-  // Generate induction resume values. These variables save the new starting
-  // indexes for the scalar loop. They are used to test if there are any tail
-  // iterations left once the vector loop has completed.
-  // Note that when the vectorized epilogue is skipped due to iteration count
-  // check, then the resume value for the induction variable comes from
-  // the trip count of the main vector loop, passed as the second argument.
-  createInductionResumeVPValues(ExpandedSCEVs, EPI.VectorTripCount);
+  Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
+  assert(VectorTripCount && "Expected valid arguments");
+
+  // Generate bypass values from the bypass blocks. Note that when the
+  // vectorized epilogue is skipped due to iteration count check, then the
+  // resume value for the induction variable comes from the trip count of the
+  // main vector loop, passed as the second argument.
+  createInductionBypassValues(ExpandedSCEVs, EPI.VectorTripCount);
 
   return {LoopVectorPreHeader, EPResumeVal};
 }
@@ -8858,13 +8794,64 @@ static void addCanonicalIVRecipes(VPlan &Plan, Type *IdxTy, bool HasNUW,
                        {CanonicalIVIncrement, &Plan.getVectorTripCount()}, DL);
 }
 
+static VPValue *addResumeValuesForInduction(VPHeaderPHIRecipe *PhiR,
+                                            VPBuilder &Builder,
+                                            VPBuilder &ScalarPHBuilder,
+                                            VPTypeAnalysis &TypeInfo,
+                                            VPValue *VectorTC) {
+  PHINode *OrigPhi;
+  const InductionDescriptor *ID;
+  VPValue *Start;
+  VPValue *Step;
+  Type *ScalarTy;
+  bool IsCanonical = false;
+  if (auto *WideIV = dyn_cast<VPWidenIntOrFpInductionRecipe>(PhiR)) {
+    if (WideIV->getTruncInst())
+      return nullptr;
+    OrigPhi = cast<PHINode>(WideIV->getUnderlyingValue());
+    ID = &WideIV->getInductionDescriptor();
+    Start = WideIV->getStartValue();
+    Step = WideIV->getStepValue();
+    ScalarTy = WideIV->getScalarType();
+    IsCanonical = WideIV->isCanonical();
+  } else if (auto *WideIV = dyn_cast<VPWidenPointerInductionRecipe>(PhiR)) {
+    OrigPhi = cast<PHINode>(WideIV->getUnderlyingValue());
+    ID = &WideIV->getInductionDescriptor();
+    Start = WideIV->getStartValue();
+    Step = WideIV->getOperand(1);
+    ScalarTy = Start->getLiveInIRValue()->getType();
+  } else {
+    return nullptr;
+  }
+
+  VPValue *EndValue = VectorTC;
+  if (!IsCanonical) {
+    EndValue = Builder.createDerivedIV(
+        ID->getKind(),
+        dyn_cast_or_null<FPMathOperator>(ID->getInductionBinOp()), Start,
+        VectorTC, Step);
+  }
+
+  if (ScalarTy != TypeInfo.inferScalarType(EndValue)) {
+    EndValue = Builder.createScalarCast(Instruction::Trunc, EndValue, ScalarTy);
+  }
+
+  auto *ResumePhiRecipe =
+      ScalarPHBuilder.createNaryOp(VPInstruction::ResumePhi, {EndValue, Start},
+                                   OrigPhi->getDebugLoc(), "bc.resume.val");
+  return ResumePhiRecipe;
+}
+
 /// Create resume phis in the scalar preheader for first-order recurrences and
 /// reductions and update the VPIRInstructions wrapping the original phis in the
 /// scalar header.
 static void addScalarResumePhis(VPRecipeBuilder &Builder, VPlan &Plan) {
+  VPTypeAnalysis TypeInfo(Plan.getCanonicalIV()->getScalarType());
   auto *ScalarPH = Plan.getScalarPreheader();
   auto *MiddleVPBB = cast<VPBasicBlock>(ScalarPH->getSinglePredecessor());
   VPBuilder ScalarPHBuilder(ScalarPH);
+  VPBuilder VectorPHBuilder(
+      cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor()));
   VPBuilder MiddleBuilder(MiddleVPBB, MiddleVPBB->getFirstNonPhi());
   VPValue *OneVPV = Plan.getOrAddLiveIn(
       ConstantInt::get(Plan.getCanonicalIV()->getScalarType(), 1));
@@ -8874,6 +8861,13 @@ static void addScalarResumePhis(VPRecipeBuilder &Builder, VPlan &Plan) {
     if (!ScalarPhiI)
       break;
     auto *VectorPhiR = cast<VPHeaderPHIRecipe>(Builder.getRecipe(ScalarPhiI));
+
+    if (VPValue *ResumePhi = addResumeValuesForInduction(
+            VectorPhiR, VectorPHBuilder, ScalarPHBuilder, TypeInfo,
+            &Plan.getVectorTripCount())) {
+      ScalarPhiIRI->addOperand(ResumePhi);
+      continue;
+    }
     if (!isa<VPFirstOrderRecurrencePHIRecipe, VPReductionPHIRecipe>(VectorPhiR))
       continue;
     // The backedge value provides the value to resume coming out of a loop,
@@ -9635,7 +9629,6 @@ void VPDerivedIVRecipe::execute(VPTransformState &State) {
       State.Builder, CanonicalIV, getStartValue()->getLiveInIRValue(), Step,
       Kind, cast_if_present<BinaryOperator>(FPBinOp));
   DerivedIV->setName(Name);
-  assert(DerivedIV != CanonicalIV && "IV didn't need transforming?");
 
   State.set(this, DerivedIV, VPLane(0));
 }
@@ -10299,6 +10292,39 @@ bool LoopVectorizePass::processLoop(Loop *L) {
                                            EPI, &LVL, &CM, BFI, PSI, Checks,
                                            *BestMainPlan);
 
+        // Collect PHI nodes of wide inductions in the VPlan for the epilogue.
+        // Those will need their resume-values computed from the main vector
+        // loop. Others can be removed in the main VPlan.
+        SmallPtrSet<PHINode *, 2> WidenedPhis;
+        for (VPRecipeBase &R :
+             BestEpiPlan.getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
+          if (!isa<VPWidenIntOrFpInductionRecipe,
+                   VPWidenPointerInductionRecipe>(&R))
+            continue;
+          if (isa<VPWidenIntOrFpInductionRecipe>(&R))
+            WidenedPhis.insert(
+                cast<VPWidenIntOrFpInductionRecipe>(&R)->getPHINode());
+          else
+            WidenedPhis.insert(
+                cast<PHINode>(R.getVPSingleValue()->getUnderlyingValue()));
+        }
+        for (VPRecipeBase &R :
+             *cast<VPIRBasicBlock>(BestMainPlan->getScalarHeader())) {
+          auto *VPIRInst = cast<VPIRInstruction>(&R);
+          auto *IRI = dyn_cast<PHINode>(&VPIRInst->getInstruction());
+          if (!IRI)
+            break;
+          if (WidenedPhis.contains(IRI) ||
+              !LVL.getInductionVars().contains(IRI))
+            continue;
+          VPRecipeBase *ResumePhi =
+              VPIRInst->getOperand(0)->getDefiningRecipe();
+          VPIRInst->setOperand(0, BestMainPlan->getOrAddLiveIn(
+                                      Constant::getNullValue(IRI->getType())));
+          ResumePhi->eraseFromParent();
+        }
+        VPlanTransforms::removeDeadRecipes(*BestMainPlan);
+
         auto ExpandedSCEVs = LVP.executePlan(EPI.MainLoopVF, EPI.MainLoopUF,
                                              *BestMainPlan, MainILV, DT, false);
         ++LoopsVectorized;
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index b438700a3fe2ce..9e3567d04d2332 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -64,6 +64,7 @@ bool VPRecipeBase::mayWriteToMemory() const {
     case VPInstruction::FirstOrderRecurrenceSplice:
     case VPInstruction::LogicalAnd:
     case VPInstruction::PtrAdd:
+    case VPInstruction::ResumePhi:
       return false;
     default:
       return true;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
index 1948211858d446..f1191fab8350c0 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
@@ -13,9 +13,9 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
 ; CHECK-NEXT:    [[N_RND_UP:%.*]] = add i64 8, [[TMP4]]
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; CHECK-NEXT:    [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]
 ; CHECK-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP6:%.*]] = mul i64 [[TMP5]], 8
+; CHECK-NEXT:    [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 8)
 ; CHECK-NEXT:    [[TMP8:%.*]] = call <vscale x 8 x i64> @llvm.stepvector.nxv8i64()
 ; CHECK-NEXT:    [[TMP9:%.*]] = add <vscale x 8 x i64> [[TMP8]], zeroinitializer
@@ -100,9 +100,9 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
 ; CHECK-NEXT:    [[N_RND_UP:%.*]] = add i64 [[WIDE_TRIP_COUNT]], [[TMP4]]
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; CHECK-NEXT:    [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]
 ; CHECK-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP6:%.*]] = mul i64 [[TMP5]], 8
+; CHECK-NEXT:    [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 [[WIDE_TRIP_COUNT]])
 ; CHECK-NEXT:    [[TMP8:%.*]] = call <vscale x 8 x i64> @llvm.stepvector.nxv8i64()
 ; CHECK-NEXT:    [[TMP9:%.*]] = add <vscale x 8 x i64> [[TMP8]], zeroinitializer
diff...
[truncated]

@fhahn
Copy link
Contributor Author

fhahn commented Dec 7, 2024

Now that #110577 landed, this should be ready for review.

Some VPlan tests still need updating, but the relevant IR tests have been updated already. It still needs some updates for outer loop vectorization.

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice step leveraging ResumePhi and DerivedIV recipes!
Patch should be marked NFC?

@@ -64,6 +64,7 @@ bool VPRecipeBase::mayWriteToMemory() const {
case VPInstruction::FirstOrderRecurrenceSplice:
case VPInstruction::LogicalAnd:
case VPInstruction::PtrAdd:
case VPInstruction::ResumePhi:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent fix, but testable only now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually had to undo this change; we have to create a ResumePhi for the canonical IV in the main plan, which won't have any users but cannot be cleaned up

Instruction *OldInduction = Legal->getPrimaryInduction();
// For the primary induction the end values are known.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment still applies, to additional end value only:

// For the primary induction the additional bypass end value is known. Otherwise it is computed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added back, thanks

Comment on lines 580 to 581
/// Create values for the induction variables to resume iteration count
/// in the bypass block.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Create values for the induction variables to resume iteration count
/// in the bypass block.
/// Create and record the values for induction variables to resume coming from the additional bypass block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

/// provides the trip count of the main vector loop, used to compute the
/// resume value reaching the scalar loop preheader directly from this
/// additional bypass block.
void createInductionBypassValue(PHINode *OrigPhi,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void createInductionBypassValue(PHINode *OrigPhi,
void createInductionAdditionalBypassValue(PHINode *OrigPhi,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

ArrayRef<BasicBlock *> BypassBlocks,
VPBuilder &ScalarPHBuilder,
Value *MainVectorTripCount = nullptr);
/// Create the bypass resume value coming from the additional bypass block. \p
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Create the bypass resume value coming from the additional bypass block. \p
/// Create and record the bypass resume value for an induction Phi coming from the additional bypass block. \p

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

@@ -9635,7 +9629,6 @@ void VPDerivedIVRecipe::execute(VPTransformState &State) {
State.Builder, CanonicalIV, getStartValue()->getLiveInIRValue(), Step,
Kind, cast_if_present<BinaryOperator>(FPBinOp));
DerivedIV->setName(Name);
assert(DerivedIV != CanonicalIV && "IV didn't need transforming?");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such redundant recipes should be cleaned up, if generated at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned up in the latest version, except for the case where the vector trip count is constant null, which will only be set in prepareToExecute.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and when Index=VTC=0 it may result in DerivedIV==Index?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

Comment on lines 10295 to 10327
// Collect PHI nodes of wide inductions in the VPlan for the epilogue.
// Those will need their resume-values computed from the main vector
// loop. Others can be removed in the main VPlan.
SmallPtrSet<PHINode *, 2> WidenedPhis;
for (VPRecipeBase &R :
BestEpiPlan.getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
if (!isa<VPWidenIntOrFpInductionRecipe,
VPWidenPointerInductionRecipe>(&R))
continue;
if (isa<VPWidenIntOrFpInductionRecipe>(&R))
WidenedPhis.insert(
cast<VPWidenIntOrFpInductionRecipe>(&R)->getPHINode());
else
WidenedPhis.insert(
cast<PHINode>(R.getVPSingleValue()->getUnderlyingValue()));
}
for (VPRecipeBase &R :
*cast<VPIRBasicBlock>(BestMainPlan->getScalarHeader())) {
auto *VPIRInst = cast<VPIRInstruction>(&R);
auto *IRI = dyn_cast<PHINode>(&VPIRInst->getInstruction());
if (!IRI)
break;
if (WidenedPhis.contains(IRI) ||
!LVL.getInductionVars().contains(IRI))
continue;
VPRecipeBase *ResumePhi =
VPIRInst->getOperand(0)->getDefiningRecipe();
VPIRInst->setOperand(0, BestMainPlan->getOrAddLiveIn(
Constant::getNullValue(IRI->getType())));
ResumePhi->eraseFromParent();
}
VPlanTransforms::removeDeadRecipes(*BestMainPlan);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be outlined and/or placed elsewhere, too detailed for processLoop() which is already too long and doing too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outlined, thanks!

@@ -10299,6 +10292,39 @@ bool LoopVectorizePass::processLoop(Loop *L) {
EPI, &LVL, &CM, BFI, PSI, Checks,
*BestMainPlan);

// Collect PHI nodes of wide inductions in the VPlan for the epilogue.
// Those will need their resume-values computed from the main vector
// loop. Others can be removed in the main VPlan.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we only need resume values from the main plan, if there is a corresponding wide induction in the epilogue plan.

Comment on lines 10320 to 10324
VPRecipeBase *ResumePhi =
VPIRInst->getOperand(0)->getDefiningRecipe();
VPIRInst->setOperand(0, BestMainPlan->getOrAddLiveIn(
Constant::getNullValue(IRI->getType())));
ResumePhi->eraseFromParent();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean up redundant resume Phi recipes created for inductions that were not widened? VPlan's DCE cannot collect them because of cross-VPlan def-use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, unfortunately the latest version cannot use VPlan DCE, because we also need to add a ResumePHI for the canonical IV in the main vector loop which. won't have any uses in the main plan. Hence manual removal here.

Comment on lines 10322 to 10323
VPIRInst->setOperand(0, BestMainPlan->getOrAddLiveIn(
Constant::getNullValue(IRI->getType())));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deserves some explanation/comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added thanks!

fhahn added a commit that referenced this pull request Dec 22, 2024
Split DerivedIV simplification off from
#112145 and use to remove the
need for extra checks in createScalarIVSteps. Required an extra
simplification run after IV transforms.
Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split off the changes to simplify DerivedIV to e1833e3 and use it to simplify createScalarIVSteps

Comment on lines 2674 to 2676
if (isa_and_nonnull<FPMathOperator>(II.getInductionBinOp()))
BypassBuilder.setFastMathFlags(
II.getInductionBinOp()->getFastMathFlags());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
if (isa_and_nonnull<FPMathOperator>(II.getInductionBinOp()))
BypassBuilder.setFastMathFlags(
II.getInductionBinOp()->getFastMathFlags());
auto *BinOp = II.getInductionBinOp();
if (isa_and_nonnull<FPMathOperator>(BinOp))
BypassBuilder.setFastMathFlags(BinOp->getFastMathFlags());

plus additional use below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

Comment on lines 8919 to 8921
if (isa<VPWidenInductionRecipe>(VectorPhiR)) {
if (VPValue *ResumePhi = addResumePhiRecipeForInduction(
VectorPhiR, VectorPHBuilder, ScalarPHBuilder, TypeInfo,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (isa<VPWidenInductionRecipe>(VectorPhiR)) {
if (VPValue *ResumePhi = addResumePhiRecipeForInduction(
VectorPhiR, VectorPHBuilder, ScalarPHBuilder, TypeInfo,
if (auto *WideIVR = dyn_cast<VPWidenInductionRecipe>(VectorPhiR)) {
if (VPValue *ResumePhi = addResumePhiRecipeForInduction(
WideIVR, VectorPHBuilder, ScalarPHBuilder, TypeInfo,

instead of the dyn_cast in addResumePhiRecipeForInduction().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks


// If index is the vector trip count, the concrete value will only be set in
// prepareToExecute, leading to missed simplifications, e.g. if it is 0.
// TODO: Remove the special case for the vector trip count once it is computed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is clear.

Knowing VTC is zero at compile-time suggests more simplification than eliminating redundant DeriveIV's ... ideally VF's (and UF's) should be restricted so as to prevent this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is slightly counter-intuitive, VTC == 0 can only happen when the original BTC == -1, but should be handled correctly with checking the canonical IV increment

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So prepareToExecute() sets the value of VTC too late for any folding to take place. Treating overflow of VTC = max_uint BTC + 1 = 0 by skipping the vector loop, is clear. Question is if we can avoid vectorizing altogether, knowing this is the case at compile-time, considering the effort of computeMaxVF()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will look into that separately, thanks

VPRecipeBase *ResumePhi = VPIRInst->getOperand(0)->getDefiningRecipe();
VPIRInst->setOperand(
0, MainPlan.getOrAddLiveIn(Constant::getNullValue(IRI->getType())));
ResumePhi->eraseFromParent();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps worth noting here (also) that this erasure should be followed by dce.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a run here; this cleans up the redundant IV computations discussed in one of the tests

@@ -11,6 +11,7 @@ target triple = "aarch64-unknown-linux-gnu"
; VPLANS-LABEL: Checking a loop in 'simple_memset'
; VPLANS: VPlan 'Initial VPlan for VF={vscale x 1,vscale x 2,vscale x 4},UF>=1' {
; VPLANS-NEXT: Live-in vp<[[VFxUF:%.+]]> = VF * UF
; VPLANS-NEXT: Live-in vp<[[VTC:%.+]]> = vector-trip-count
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless considered irrelevant for the test, in which case this check should (continue to) be dropped?
Checking only the def w/o any use is somewhat inconsistent, but there are many test changes...

Comment on lines 10083 to 10084
VPIRInst->setOperand(
0, MainPlan.getOrAddLiveIn(Constant::getNullValue(IRI->getType())));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VPIRinst wraps the header phi in the scalar loop, so it is probaby better to keep it here.

Is this operand needed, set artificially to zero (can convey a type... ;-), or should it best be removed, keeping VPIRInst only as a wrapper of the header phi?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment it is needed, otherwise it would complicate VPIRInstruction::execute; but in the future we can disconnect the scalar header for the main vector loop, thus removing the need to update them there. (There is code elsewhere that use the scalar phi to get the resume value from the main vector loop)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... but in the future we can disconnect the scalar header for the main vector loop ...

When can this scalar header be disconnected? Presumably it is needed for phi's that do have wide users in EpiPlan, so could be dropped altogether (or not constructed at all) if and only if all phi's are free of such users?

Can VPIRInst be erased as well (perhaps with make_early_inc_range)?

(thought I responded but cannot see the response here): VPIRinst wraps the header phi in the scalar loop, so it is probably better to keep it here.

A VPIRInst wraps an IRInst in order to (a) add operands to phi's and/or (b) support placing non-VPIRInst recipes between IRInst's, by bumping the builder. If neither is needed, VPIRInst can be dropped?
If an artificial operand is needed to appease VPIRInstruction::execute(), could undef/poison be used instead of zero?

(There is code elsewhere that use the scalar phi to get the resume value from the main vector loop)

So dropping it or using undef would require updating said code in VPIRInstruction::execute() and/or elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both can be removed in the latest version, updated, thanks!

Disconnection should be able after #120918

Comment on lines 2676 to 2677
BypassBuilder.setFastMathFlags(
II.getInductionBinOp()->getFastMathFlags());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BypassBuilder.setFastMathFlags(
II.getInductionBinOp()->getFastMathFlags());
BypassBuilder.setFastMathFlags(BinOp->getFastMathFlags());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DOne thanks

/// Create resume phis in the scalar preheader for first-order recurrences and
/// reductions and update the VPIRInstructions wrapping the original phis in the
/// scalar header.
/// Create and return a ResumePhi for \p PhiR, if it is wide induction recipe.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Create and return a ResumePhi for \p PhiR, if it is wide induction recipe.
/// Create and return a ResumePhi for \p WideIV, unless it is truncated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated thanks!

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, added a couple of final thoughts.

/// Create resume phis in the scalar preheader for first-order recurrences and
/// reductions and update the VPIRInstructions wrapping the original phis in the
/// scalar header.
/// Create and return a ResumePhi for \p WideIF, unless it is truncated. If the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Create and return a ResumePhi for \p WideIF, unless it is truncated. If the
/// Create and return a ResumePhi for \p WideIV, unless it is truncated. If the

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks!


// If index is the vector trip count, the concrete value will only be set in
// prepareToExecute, leading to missed simplifications, e.g. if it is 0.
// TODO: Remove the special case for the vector trip count once it is computed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So prepareToExecute() sets the value of VTC too late for any folding to take place. Treating overflow of VTC = max_uint BTC + 1 = 0 by skipping the vector loop, is clear. Question is if we can avoid vectorizing altogether, knowing this is the case at compile-time, considering the effort of computeMaxVF()?

Comment on lines 10083 to 10084
VPIRInst->setOperand(
0, MainPlan.getOrAddLiveIn(Constant::getNullValue(IRI->getType())));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... but in the future we can disconnect the scalar header for the main vector loop ...

When can this scalar header be disconnected? Presumably it is needed for phi's that do have wide users in EpiPlan, so could be dropped altogether (or not constructed at all) if and only if all phi's are free of such users?

Can VPIRInst be erased as well (perhaps with make_early_inc_range)?

(thought I responded but cannot see the response here): VPIRinst wraps the header phi in the scalar loop, so it is probably better to keep it here.

A VPIRInst wraps an IRInst in order to (a) add operands to phi's and/or (b) support placing non-VPIRInst recipes between IRInst's, by bumping the builder. If neither is needed, VPIRInst can be dropped?
If an artificial operand is needed to appease VPIRInstruction::execute(), could undef/poison be used instead of zero?

(There is code elsewhere that use the scalar phi to get the resume value from the main vector loop)

So dropping it or using undef would require updating said code in VPIRInstruction::execute() and/or elsewhere?

// There is no corresponding wide induction in the epilogue plan that would
// need a resume value. Set the operand in VPIRInst to zero, so ResumePhi
// can be removed. The resume values for the scalar loop will be created
// during execution of EpiPlan.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resume values for the scalar loop will be created
// during execution of EpiPlan.

All resume values, or only those free of wide users in EpiPlan?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be all resume values for the scalar loop, those free of wide users are not need to resume in the epilogue vector loop

if (EpiWidenedPhis.contains(IRI))
continue;
// There is no corresponding wide induction in the epilogue plan that would
// need a resume value. Set the operand in VPIRInst to zero, so ResumePhi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emphasize that this zero is artificial, along with a FIXME to remove it (one way or another)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both can be removed in the latest version, updated, thanks!

@fhahn fhahn merged commit 7f3428d into llvm:main Dec 29, 2024
8 checks passed
@fhahn fhahn deleted the vplan-compute-iv-end-values branch December 29, 2024 19:05
fhahn added a commit to fhahn/llvm-project that referenced this pull request Dec 29, 2024
Model updating IV users directly in VPlan, replace fixupIVUsers.

Depends on llvm#110004,
llvm#109975 and
llvm#112145.
@llvm-ci
Copy link
Collaborator

llvm-ci commented Dec 29, 2024

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve2-vla-2stage running on linaro-g4-01 while building llvm at step 12 "ninja check 2".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/199/builds/506

Here is the relevant piece of the build log for the reference
Step 12 (ninja check 2) failure: stage 2 checked (failure)
******************** TEST 'HWAddressSanitizer-aarch64 :: TestCases/Linux/fixed-shadow.c' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
1 0x10000000000
2 0x20000000000
3 0x30000000000
4 0x40000000000
5 0x50000000000
6 0x60000000000
7 0x70000000000
8 0x80000000000
9 0x90000000000
10 0xa0000000000
11 0xb0000000000
12 0xc0000000000
13 0xd0000000000
14 0xe0000000000
15 0xf0000000000
16 0x100000000000
17 0x110000000000
18 0x120000000000
19 0x130000000000
20 0x140000000000
21 0x150000000000
22 0x160000000000
23 0x170000000000
24 0x180000000000
25 0x190000000000
26 0x1a0000000000
27 0x1b0000000000
28 0x1c0000000000
29 0x1d0000000000
30 0x1e0000000000
31 0x1f0000000000
32 0x200000000000
33 0x210000000000
34 0x220000000000
35 0x230000000000
36 0x240000000000
37 0x250000000000
38 0x260000000000
39 0x270000000000
40 0x280000000000
41 0x290000000000
42 0x2a0000000000
43 0x2b0000000000
44 0x2c0000000000
45 0x2d0000000000
...

fhahn added a commit to fhahn/llvm-project that referenced this pull request Jan 5, 2025
Model updating IV users directly in VPlan, replace fixupIVUsers.

Depends on llvm#110004,
llvm#109975 and
llvm#112145.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 10, 2025
Split DerivedIV simplification off from
llvm/llvm-project#112145 and use to remove the
need for extra checks in createScalarIVSteps. Required an extra
simplification run after IV transforms.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 10, 2025
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm/llvm-project#110004 and
llvm/llvm-project#109975.

PR: llvm/llvm-project#112145
fhahn added a commit that referenced this pull request Jan 18, 2025
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform 
(optimizeInductionExitUsers) replaces extracts of inductions with 
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them 
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.


Depends on #110004,
#109975 and
#112145.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 18, 2025
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform
(optimizeInductionExitUsers) replaces extracts of inductions with
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.

Depends on llvm/llvm-project#110004,
llvm/llvm-project#109975 and
llvm/llvm-project#112145.
fhahn added a commit that referenced this pull request Jan 19, 2025
This reverts the revert commit 58326f1.

The build failure in sanitizer stage2 builds has been fixed with
0d39fe6.

Original commit message:
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform
(optimizeInductionExitUsers) replaces extracts of inductions with
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.

Depends on #110004,
#109975 and
#112145.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 19, 2025
…12147)"

This reverts the revert commit 58326f1.

The build failure in sanitizer stage2 builds has been fixed with
0d39fe6.

Original commit message:
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform
(optimizeInductionExitUsers) replaces extracts of inductions with
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.

Depends on llvm/llvm-project#110004,
llvm/llvm-project#109975 and
llvm/llvm-project#112145.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants