-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VPlan] Compute induction end values in VPlan. #112145
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
Model updating IV users directly in VPlan, replace fixupIVUsers. Depends on llvm#110004, llvm#109975 and llvm#112145.
3b414ba
to
90c7f53
Compare
944283e
to
8001f75
Compare
8001f75
to
087668b
Compare
Allow setting the name to use for the generated IR value of the derived IV in preparations for #112145. This is analogous to VPInstruction::Name.
087668b
to
e9ad7c2
Compare
Use createDerivedIV to compute IV end values directly in VPlan, instead of creating them up-front. This allows updating IV users outside the loop as follow-up. Depends on llvm#110004 and llvm#109975.
e9ad7c2
to
e58cb96
Compare
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-backend-systemz Author: Florian Hahn (fhahn) ChangesUse createDerivedIV to compute IV end values directly in VPlan, instead This allows updating IV users outside the loop as follow-up. Depends on #110004 and Patch is 203.94 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/112145.diff 54 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index fbcf181a45a664..0298ab523307db 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -233,8 +233,8 @@ class VPBuilder {
VPDerivedIVRecipe *createDerivedIV(InductionDescriptor::InductionKind Kind,
FPMathOperator *FPBinOp, VPValue *Start,
- VPCanonicalIVPHIRecipe *CanonicalIV,
- VPValue *Step, const Twine &Name = "") {
+ VPValue *CanonicalIV, VPValue *Step,
+ const Twine &Name = "") {
return tryInsertInstruction(
new VPDerivedIVRecipe(Kind, FPBinOp, Start, CanonicalIV, Step, Name));
}
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 4a4553e4a8db8d..1ee596502f9d44 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -513,21 +513,14 @@ class InnerLoopVectorizer {
/// Fix the non-induction PHIs in \p Plan.
void fixNonInductionPHIs(VPTransformState &State);
- /// Create a ResumePHI VPInstruction for the induction \p InductionPhiIRI to
- /// resume iteration count in the scalar epilogue from where the vectorized
- /// loop left off, and add it to the scalar preheader of VPlan. Also creates
- /// the induction resume value, and the value for the bypass block, if needed.
- /// \p Step is the SCEV-expanded induction step to use. In cases where the
- /// loop skeleton is more complicated (i.e., epilogue vectorization) and the
- /// resume values can come from an additional bypass block,
- /// \p MainVectorTripCount provides the trip count of the main vector loop,
- /// used to compute the resume value reaching the scalar loop preheader
- /// directly from this additional bypass block.
- void createInductionResumeVPValue(VPIRInstruction *InductionPhiIRI,
- const InductionDescriptor &ID, Value *Step,
- ArrayRef<BasicBlock *> BypassBlocks,
- VPBuilder &ScalarPHBuilder,
- Value *MainVectorTripCount = nullptr);
+ /// Create the bypass resume value coming from the additional bypass block. \p
+ /// Step is the SCEV-expanded induction step to use. \p MainVectorTripCount
+ /// provides the trip count of the main vector loop, used to compute the
+ /// resume value reaching the scalar loop preheader directly from this
+ /// additional bypass block.
+ void createInductionBypassValue(PHINode *OrigPhi,
+ const InductionDescriptor &ID, Value *Step,
+ Value *MainVectorTripCount);
/// Returns the original loop trip count.
Value *getTripCount() const { return TripCount; }
@@ -584,17 +577,10 @@ class InnerLoopVectorizer {
/// vector loop preheader, middle block and scalar preheader.
void createVectorLoopSkeleton(StringRef Prefix);
- /// Create new phi nodes for the induction variables to resume iteration count
- /// in the scalar epilogue, from where the vectorized loop left off.
- /// In cases where the loop skeleton is more complicated (i.e. epilogue
- /// vectorization), \p MainVectorTripCount provides the trip count of the main
- /// loop, used to compute these resume values. If \p IVSubset is provided, it
- /// contains the phi nodes for which resume values are needed, because they
- /// will generate wide induction phis in the epilogue loop.
- void
- createInductionResumeVPValues(const SCEV2ValueTy &ExpandedSCEVs,
- Value *MainVectorTripCount = nullptr,
- SmallPtrSetImpl<PHINode *> *IVSubset = nullptr);
+ /// Create values for the induction variables to resume iteration count
+ /// in the bypass block.
+ void createInductionBypassValues(const SCEV2ValueTy &ExpandedSCEVs,
+ Value *MainVectorTripCount);
/// Allow subclasses to override and print debug traces before/after vplan
/// execution, when trace information is requested.
@@ -2613,21 +2599,11 @@ void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
nullptr, Twine(Prefix) + "scalar.ph");
}
-void InnerLoopVectorizer::createInductionResumeVPValue(
- VPIRInstruction *InductionPhiRI, const InductionDescriptor &II, Value *Step,
- ArrayRef<BasicBlock *> BypassBlocks, VPBuilder &ScalarPHBuilder,
+void InnerLoopVectorizer::createInductionBypassValue(
+ PHINode *OrigPhi, const InductionDescriptor &II, Value *Step,
Value *MainVectorTripCount) {
- // TODO: Move to LVP or general VPlan construction, once no IR values are
- // generated.
- auto *OrigPhi = cast<PHINode>(&InductionPhiRI->getInstruction());
- Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
- assert(VectorTripCount && "Expected valid arguments");
-
Instruction *OldInduction = Legal->getPrimaryInduction();
- // For the primary induction the end values are known.
- Value *EndValue = VectorTripCount;
Value *EndValueFromAdditionalBypass = MainVectorTripCount;
- // Otherwise compute them accordingly.
if (OrigPhi != OldInduction) {
IRBuilder<> B(LoopVectorPreHeader->getTerminator());
@@ -2635,10 +2611,6 @@ void InnerLoopVectorizer::createInductionResumeVPValue(
if (isa_and_nonnull<FPMathOperator>(II.getInductionBinOp()))
B.setFastMathFlags(II.getInductionBinOp()->getFastMathFlags());
- EndValue = emitTransformedIndex(B, VectorTripCount, II.getStartValue(),
- Step, II.getKind(), II.getInductionBinOp());
- EndValue->setName("ind.end");
-
// Compute the end value for the additional bypass (if applicable).
if (MainVectorTripCount) {
B.SetInsertPoint(getAdditionalBypassBlock(),
@@ -2650,22 +2622,12 @@ void InnerLoopVectorizer::createInductionResumeVPValue(
}
}
- auto *ResumePhiRecipe = ScalarPHBuilder.createNaryOp(
- VPInstruction::ResumePhi,
- {Plan.getOrAddLiveIn(EndValue), Plan.getOrAddLiveIn(II.getStartValue())},
- OrigPhi->getDebugLoc(), "bc.resume.val");
- assert(InductionPhiRI->getNumOperands() == 0 &&
- "InductionPhiRI should not have any operands");
- InductionPhiRI->addOperand(ResumePhiRecipe);
-
- if (EndValueFromAdditionalBypass) {
- // Store the bypass value here, as it needs to be added as operand to its
- // scalar preheader phi node after the epilogue skeleton has been created.
- // TODO: Directly add as extra operand to the VPResumePHI recipe.
- assert(!Induction2AdditionalBypassValue.contains(OrigPhi) &&
- "entry for OrigPhi already exits");
- Induction2AdditionalBypassValue[OrigPhi] = EndValueFromAdditionalBypass;
- }
+ // Store the bypass value here, as it needs to be added as operand to its
+ // scalar preheader phi node after the epilogue skeleton has been created.
+ // TODO: Directly add as extra operand to the VPResumePHI recipe.
+ assert(!Induction2AdditionalBypassValue.contains(OrigPhi) &&
+ "entry for OrigPhi already exits");
+ Induction2AdditionalBypassValue[OrigPhi] = EndValueFromAdditionalBypass;
}
/// Return the expanded step for \p ID using \p ExpandedSCEVs to look up SCEV
@@ -2682,29 +2644,15 @@ static Value *getExpandedStep(const InductionDescriptor &ID,
return I->second;
}
-void InnerLoopVectorizer::createInductionResumeVPValues(
- const SCEV2ValueTy &ExpandedSCEVs, Value *MainVectorTripCount,
- SmallPtrSetImpl<PHINode *> *IVSubset) {
- // We are going to resume the execution of the scalar loop.
- // Go over all of the induction variable PHIs of the scalar loop header and
- // fix their starting values, which depend on the counter of the last
- // iteration of the vectorized loop. If we come from one of the
- // LoopBypassBlocks then we need to start from the original start value.
- // Otherwise we provide the trip count from the main vector loop.
- VPBasicBlock *ScalarPHVPBB = Plan.getScalarPreheader();
- VPBuilder ScalarPHBuilder(ScalarPHVPBB, ScalarPHVPBB->begin());
- for (VPRecipeBase &R : *Plan.getScalarHeader()) {
- auto *PhiR = cast<VPIRInstruction>(&R);
- auto *Phi = dyn_cast<PHINode>(&PhiR->getInstruction());
- if (!Phi)
- break;
- if (!Legal->getInductionVars().contains(Phi) ||
- (IVSubset && !IVSubset->contains(Phi)))
- continue;
- const InductionDescriptor &II = Legal->getInductionVars().find(Phi)->second;
- createInductionResumeVPValue(PhiR, II, getExpandedStep(II, ExpandedSCEVs),
- LoopBypassBlocks, ScalarPHBuilder,
- MainVectorTripCount);
+void InnerLoopVectorizer::createInductionBypassValues(
+ const SCEV2ValueTy &ExpandedSCEVs, Value *MainVectorTripCount) {
+ assert(MainVectorTripCount && "Must have bypass information");
+
+ for (const auto &InductionEntry : Legal->getInductionVars()) {
+ PHINode *OrigPhi = InductionEntry.first;
+ const InductionDescriptor &II = InductionEntry.second;
+ createInductionBypassValue(OrigPhi, II, getExpandedStep(II, ExpandedSCEVs),
+ MainVectorTripCount);
}
}
@@ -2766,8 +2714,8 @@ InnerLoopVectorizer::createVectorizedLoopSkeleton(
// faster.
emitMemRuntimeChecks(LoopScalarPreHeader);
- // Emit phis for the new starting index of the scalar loop.
- createInductionResumeVPValues(ExpandedSCEVs);
+ Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
+ assert(VectorTripCount && "Expected valid arguments");
return {LoopVectorPreHeader, nullptr};
}
@@ -7848,19 +7796,6 @@ EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton(
// Generate the induction variable.
EPI.VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
- // Generate VPValues and ResumePhi recipes for wide inductions in the epilogue
- // plan only. Other inductions only need a resume value for the canonical
- // induction, which will get created during epilogue skeleton construction.
- SmallPtrSet<PHINode *, 4> WideIVs;
- for (VPRecipeBase &H :
- EPI.EpiloguePlan.getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
- if (auto *WideIV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&H))
- WideIVs.insert(WideIV->getPHINode());
- else if (auto *PtrIV = dyn_cast<VPWidenPointerInductionRecipe>(&H))
- WideIVs.insert(cast<PHINode>(PtrIV->getUnderlyingValue()));
- }
- createInductionResumeVPValues(ExpandedSCEVs, nullptr, &WideIVs);
-
return {LoopVectorPreHeader, nullptr};
}
@@ -8049,13 +7984,14 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton(
EPResumeVal->addIncoming(Init, EPI.MainLoopIterationCountCheck);
}
- // Generate induction resume values. These variables save the new starting
- // indexes for the scalar loop. They are used to test if there are any tail
- // iterations left once the vector loop has completed.
- // Note that when the vectorized epilogue is skipped due to iteration count
- // check, then the resume value for the induction variable comes from
- // the trip count of the main vector loop, passed as the second argument.
- createInductionResumeVPValues(ExpandedSCEVs, EPI.VectorTripCount);
+ Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
+ assert(VectorTripCount && "Expected valid arguments");
+
+ // Generate bypass values from the bypass blocks. Note that when the
+ // vectorized epilogue is skipped due to iteration count check, then the
+ // resume value for the induction variable comes from the trip count of the
+ // main vector loop, passed as the second argument.
+ createInductionBypassValues(ExpandedSCEVs, EPI.VectorTripCount);
return {LoopVectorPreHeader, EPResumeVal};
}
@@ -8858,13 +8794,64 @@ static void addCanonicalIVRecipes(VPlan &Plan, Type *IdxTy, bool HasNUW,
{CanonicalIVIncrement, &Plan.getVectorTripCount()}, DL);
}
+static VPValue *addResumeValuesForInduction(VPHeaderPHIRecipe *PhiR,
+ VPBuilder &Builder,
+ VPBuilder &ScalarPHBuilder,
+ VPTypeAnalysis &TypeInfo,
+ VPValue *VectorTC) {
+ PHINode *OrigPhi;
+ const InductionDescriptor *ID;
+ VPValue *Start;
+ VPValue *Step;
+ Type *ScalarTy;
+ bool IsCanonical = false;
+ if (auto *WideIV = dyn_cast<VPWidenIntOrFpInductionRecipe>(PhiR)) {
+ if (WideIV->getTruncInst())
+ return nullptr;
+ OrigPhi = cast<PHINode>(WideIV->getUnderlyingValue());
+ ID = &WideIV->getInductionDescriptor();
+ Start = WideIV->getStartValue();
+ Step = WideIV->getStepValue();
+ ScalarTy = WideIV->getScalarType();
+ IsCanonical = WideIV->isCanonical();
+ } else if (auto *WideIV = dyn_cast<VPWidenPointerInductionRecipe>(PhiR)) {
+ OrigPhi = cast<PHINode>(WideIV->getUnderlyingValue());
+ ID = &WideIV->getInductionDescriptor();
+ Start = WideIV->getStartValue();
+ Step = WideIV->getOperand(1);
+ ScalarTy = Start->getLiveInIRValue()->getType();
+ } else {
+ return nullptr;
+ }
+
+ VPValue *EndValue = VectorTC;
+ if (!IsCanonical) {
+ EndValue = Builder.createDerivedIV(
+ ID->getKind(),
+ dyn_cast_or_null<FPMathOperator>(ID->getInductionBinOp()), Start,
+ VectorTC, Step);
+ }
+
+ if (ScalarTy != TypeInfo.inferScalarType(EndValue)) {
+ EndValue = Builder.createScalarCast(Instruction::Trunc, EndValue, ScalarTy);
+ }
+
+ auto *ResumePhiRecipe =
+ ScalarPHBuilder.createNaryOp(VPInstruction::ResumePhi, {EndValue, Start},
+ OrigPhi->getDebugLoc(), "bc.resume.val");
+ return ResumePhiRecipe;
+}
+
/// Create resume phis in the scalar preheader for first-order recurrences and
/// reductions and update the VPIRInstructions wrapping the original phis in the
/// scalar header.
static void addScalarResumePhis(VPRecipeBuilder &Builder, VPlan &Plan) {
+ VPTypeAnalysis TypeInfo(Plan.getCanonicalIV()->getScalarType());
auto *ScalarPH = Plan.getScalarPreheader();
auto *MiddleVPBB = cast<VPBasicBlock>(ScalarPH->getSinglePredecessor());
VPBuilder ScalarPHBuilder(ScalarPH);
+ VPBuilder VectorPHBuilder(
+ cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor()));
VPBuilder MiddleBuilder(MiddleVPBB, MiddleVPBB->getFirstNonPhi());
VPValue *OneVPV = Plan.getOrAddLiveIn(
ConstantInt::get(Plan.getCanonicalIV()->getScalarType(), 1));
@@ -8874,6 +8861,13 @@ static void addScalarResumePhis(VPRecipeBuilder &Builder, VPlan &Plan) {
if (!ScalarPhiI)
break;
auto *VectorPhiR = cast<VPHeaderPHIRecipe>(Builder.getRecipe(ScalarPhiI));
+
+ if (VPValue *ResumePhi = addResumeValuesForInduction(
+ VectorPhiR, VectorPHBuilder, ScalarPHBuilder, TypeInfo,
+ &Plan.getVectorTripCount())) {
+ ScalarPhiIRI->addOperand(ResumePhi);
+ continue;
+ }
if (!isa<VPFirstOrderRecurrencePHIRecipe, VPReductionPHIRecipe>(VectorPhiR))
continue;
// The backedge value provides the value to resume coming out of a loop,
@@ -9635,7 +9629,6 @@ void VPDerivedIVRecipe::execute(VPTransformState &State) {
State.Builder, CanonicalIV, getStartValue()->getLiveInIRValue(), Step,
Kind, cast_if_present<BinaryOperator>(FPBinOp));
DerivedIV->setName(Name);
- assert(DerivedIV != CanonicalIV && "IV didn't need transforming?");
State.set(this, DerivedIV, VPLane(0));
}
@@ -10299,6 +10292,39 @@ bool LoopVectorizePass::processLoop(Loop *L) {
EPI, &LVL, &CM, BFI, PSI, Checks,
*BestMainPlan);
+ // Collect PHI nodes of wide inductions in the VPlan for the epilogue.
+ // Those will need their resume-values computed from the main vector
+ // loop. Others can be removed in the main VPlan.
+ SmallPtrSet<PHINode *, 2> WidenedPhis;
+ for (VPRecipeBase &R :
+ BestEpiPlan.getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
+ if (!isa<VPWidenIntOrFpInductionRecipe,
+ VPWidenPointerInductionRecipe>(&R))
+ continue;
+ if (isa<VPWidenIntOrFpInductionRecipe>(&R))
+ WidenedPhis.insert(
+ cast<VPWidenIntOrFpInductionRecipe>(&R)->getPHINode());
+ else
+ WidenedPhis.insert(
+ cast<PHINode>(R.getVPSingleValue()->getUnderlyingValue()));
+ }
+ for (VPRecipeBase &R :
+ *cast<VPIRBasicBlock>(BestMainPlan->getScalarHeader())) {
+ auto *VPIRInst = cast<VPIRInstruction>(&R);
+ auto *IRI = dyn_cast<PHINode>(&VPIRInst->getInstruction());
+ if (!IRI)
+ break;
+ if (WidenedPhis.contains(IRI) ||
+ !LVL.getInductionVars().contains(IRI))
+ continue;
+ VPRecipeBase *ResumePhi =
+ VPIRInst->getOperand(0)->getDefiningRecipe();
+ VPIRInst->setOperand(0, BestMainPlan->getOrAddLiveIn(
+ Constant::getNullValue(IRI->getType())));
+ ResumePhi->eraseFromParent();
+ }
+ VPlanTransforms::removeDeadRecipes(*BestMainPlan);
+
auto ExpandedSCEVs = LVP.executePlan(EPI.MainLoopVF, EPI.MainLoopUF,
*BestMainPlan, MainILV, DT, false);
++LoopsVectorized;
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index b438700a3fe2ce..9e3567d04d2332 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -64,6 +64,7 @@ bool VPRecipeBase::mayWriteToMemory() const {
case VPInstruction::FirstOrderRecurrenceSplice:
case VPInstruction::LogicalAnd:
case VPInstruction::PtrAdd:
+ case VPInstruction::ResumePhi:
return false;
default:
return true;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
index 1948211858d446..f1191fab8350c0 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
@@ -13,9 +13,9 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 8, [[TMP4]]
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]
; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP6:%.*]] = mul i64 [[TMP5]], 8
+; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]
; CHECK-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 8)
; CHECK-NEXT: [[TMP8:%.*]] = call <vscale x 8 x i64> @llvm.stepvector.nxv8i64()
; CHECK-NEXT: [[TMP9:%.*]] = add <vscale x 8 x i64> [[TMP8]], zeroinitializer
@@ -100,9 +100,9 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[WIDE_TRIP_COUNT]], [[TMP4]]
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]
; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP6:%.*]] = mul i64 [[TMP5]], 8
+; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]
; CHECK-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 [[WIDE_TRIP_COUNT]])
; CHECK-NEXT: [[TMP8:%.*]] = call <vscale x 8 x i64> @llvm.stepvector.nxv8i64()
; CHECK-NEXT: [[TMP9:%.*]] = add <vscale x 8 x i64> [[TMP8]], zeroinitializer
diff...
[truncated]
|
Now that #110577 landed, this should be ready for review. Some VPlan tests still need updating, but the relevant IR tests have been updated already. It still needs some updates for outer loop vectorization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice step leveraging ResumePhi and DerivedIV recipes!
Patch should be marked NFC?
@@ -64,6 +64,7 @@ bool VPRecipeBase::mayWriteToMemory() const { | |||
case VPInstruction::FirstOrderRecurrenceSplice: | |||
case VPInstruction::LogicalAnd: | |||
case VPInstruction::PtrAdd: | |||
case VPInstruction::ResumePhi: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Independent fix, but testable only now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually had to undo this change; we have to create a ResumePhi for the canonical IV in the main plan, which won't have any users but cannot be cleaned up
Instruction *OldInduction = Legal->getPrimaryInduction(); | ||
// For the primary induction the end values are known. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment still applies, to additional end value only:
// For the primary induction the additional bypass end value is known. Otherwise it is computed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added back, thanks
/// Create values for the induction variables to resume iteration count | ||
/// in the bypass block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Create values for the induction variables to resume iteration count | |
/// in the bypass block. | |
/// Create and record the values for induction variables to resume coming from the additional bypass block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
/// provides the trip count of the main vector loop, used to compute the | ||
/// resume value reaching the scalar loop preheader directly from this | ||
/// additional bypass block. | ||
void createInductionBypassValue(PHINode *OrigPhi, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void createInductionBypassValue(PHINode *OrigPhi, | |
void createInductionAdditionalBypassValue(PHINode *OrigPhi, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks
ArrayRef<BasicBlock *> BypassBlocks, | ||
VPBuilder &ScalarPHBuilder, | ||
Value *MainVectorTripCount = nullptr); | ||
/// Create the bypass resume value coming from the additional bypass block. \p |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Create the bypass resume value coming from the additional bypass block. \p | |
/// Create and record the bypass resume value for an induction Phi coming from the additional bypass block. \p |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks
@@ -9635,7 +9629,6 @@ void VPDerivedIVRecipe::execute(VPTransformState &State) { | |||
State.Builder, CanonicalIV, getStartValue()->getLiveInIRValue(), Step, | |||
Kind, cast_if_present<BinaryOperator>(FPBinOp)); | |||
DerivedIV->setName(Name); | |||
assert(DerivedIV != CanonicalIV && "IV didn't need transforming?"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such redundant recipes should be cleaned up, if generated at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaned up in the latest version, except for the case where the vector trip count is constant null, which will only be set in prepareToExecute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and when Index=VTC=0 it may result in DerivedIV==Index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep
// Collect PHI nodes of wide inductions in the VPlan for the epilogue. | ||
// Those will need their resume-values computed from the main vector | ||
// loop. Others can be removed in the main VPlan. | ||
SmallPtrSet<PHINode *, 2> WidenedPhis; | ||
for (VPRecipeBase &R : | ||
BestEpiPlan.getVectorLoopRegion()->getEntryBasicBlock()->phis()) { | ||
if (!isa<VPWidenIntOrFpInductionRecipe, | ||
VPWidenPointerInductionRecipe>(&R)) | ||
continue; | ||
if (isa<VPWidenIntOrFpInductionRecipe>(&R)) | ||
WidenedPhis.insert( | ||
cast<VPWidenIntOrFpInductionRecipe>(&R)->getPHINode()); | ||
else | ||
WidenedPhis.insert( | ||
cast<PHINode>(R.getVPSingleValue()->getUnderlyingValue())); | ||
} | ||
for (VPRecipeBase &R : | ||
*cast<VPIRBasicBlock>(BestMainPlan->getScalarHeader())) { | ||
auto *VPIRInst = cast<VPIRInstruction>(&R); | ||
auto *IRI = dyn_cast<PHINode>(&VPIRInst->getInstruction()); | ||
if (!IRI) | ||
break; | ||
if (WidenedPhis.contains(IRI) || | ||
!LVL.getInductionVars().contains(IRI)) | ||
continue; | ||
VPRecipeBase *ResumePhi = | ||
VPIRInst->getOperand(0)->getDefiningRecipe(); | ||
VPIRInst->setOperand(0, BestMainPlan->getOrAddLiveIn( | ||
Constant::getNullValue(IRI->getType()))); | ||
ResumePhi->eraseFromParent(); | ||
} | ||
VPlanTransforms::removeDeadRecipes(*BestMainPlan); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be outlined and/or placed elsewhere, too detailed for processLoop() which is already too long and doing too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outlined, thanks!
@@ -10299,6 +10292,39 @@ bool LoopVectorizePass::processLoop(Loop *L) { | |||
EPI, &LVL, &CM, BFI, PSI, Checks, | |||
*BestMainPlan); | |||
|
|||
// Collect PHI nodes of wide inductions in the VPlan for the epilogue. | |||
// Those will need their resume-values computed from the main vector | |||
// loop. Others can be removed in the main VPlan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we only need resume values from the main plan, if there is a corresponding wide induction in the epilogue plan.
VPRecipeBase *ResumePhi = | ||
VPIRInst->getOperand(0)->getDefiningRecipe(); | ||
VPIRInst->setOperand(0, BestMainPlan->getOrAddLiveIn( | ||
Constant::getNullValue(IRI->getType()))); | ||
ResumePhi->eraseFromParent(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clean up redundant resume Phi recipes created for inductions that were not widened? VPlan's DCE cannot collect them because of cross-VPlan def-use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, unfortunately the latest version cannot use VPlan DCE, because we also need to add a ResumePHI for the canonical IV in the main vector loop which. won't have any uses in the main plan. Hence manual removal here.
VPIRInst->setOperand(0, BestMainPlan->getOrAddLiveIn( | ||
Constant::getNullValue(IRI->getType()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deserves some explanation/comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added thanks!
Split DerivedIV simplification off from #112145 and use to remove the need for extra checks in createScalarIVSteps. Required an extra simplification run after IV transforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split off the changes to simplify DerivedIV to e1833e3 and use it to simplify createScalarIVSteps
if (isa_and_nonnull<FPMathOperator>(II.getInductionBinOp())) | ||
BypassBuilder.setFastMathFlags( | ||
II.getInductionBinOp()->getFastMathFlags()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
if (isa_and_nonnull<FPMathOperator>(II.getInductionBinOp())) | |
BypassBuilder.setFastMathFlags( | |
II.getInductionBinOp()->getFastMathFlags()); | |
auto *BinOp = II.getInductionBinOp(); | |
if (isa_and_nonnull<FPMathOperator>(BinOp)) | |
BypassBuilder.setFastMathFlags(BinOp->getFastMathFlags()); |
plus additional use below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
if (isa<VPWidenInductionRecipe>(VectorPhiR)) { | ||
if (VPValue *ResumePhi = addResumePhiRecipeForInduction( | ||
VectorPhiR, VectorPHBuilder, ScalarPHBuilder, TypeInfo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (isa<VPWidenInductionRecipe>(VectorPhiR)) { | |
if (VPValue *ResumePhi = addResumePhiRecipeForInduction( | |
VectorPhiR, VectorPHBuilder, ScalarPHBuilder, TypeInfo, | |
if (auto *WideIVR = dyn_cast<VPWidenInductionRecipe>(VectorPhiR)) { | |
if (VPValue *ResumePhi = addResumePhiRecipeForInduction( | |
WideIVR, VectorPHBuilder, ScalarPHBuilder, TypeInfo, |
instead of the dyn_cast in addResumePhiRecipeForInduction().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
|
||
// If index is the vector trip count, the concrete value will only be set in | ||
// prepareToExecute, leading to missed simplifications, e.g. if it is 0. | ||
// TODO: Remove the special case for the vector trip count once it is computed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is clear.
Knowing VTC is zero at compile-time suggests more simplification than eliminating redundant DeriveIV's ... ideally VF's (and UF's) should be restricted so as to prevent this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case is slightly counter-intuitive, VTC == 0 can only happen when the original BTC == -1, but should be handled correctly with checking the canonical IV increment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So prepareToExecute() sets the value of VTC too late for any folding to take place. Treating overflow of VTC = max_uint BTC + 1 = 0 by skipping the vector loop, is clear. Question is if we can avoid vectorizing altogether, knowing this is the case at compile-time, considering the effort of computeMaxVF()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, will look into that separately, thanks
VPRecipeBase *ResumePhi = VPIRInst->getOperand(0)->getDefiningRecipe(); | ||
VPIRInst->setOperand( | ||
0, MainPlan.getOrAddLiveIn(Constant::getNullValue(IRI->getType()))); | ||
ResumePhi->eraseFromParent(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps worth noting here (also) that this erasure should be followed by dce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a run here; this cleans up the redundant IV computations discussed in one of the tests
@@ -11,6 +11,7 @@ target triple = "aarch64-unknown-linux-gnu" | |||
; VPLANS-LABEL: Checking a loop in 'simple_memset' | |||
; VPLANS: VPlan 'Initial VPlan for VF={vscale x 1,vscale x 2,vscale x 4},UF>=1' { | |||
; VPLANS-NEXT: Live-in vp<[[VFxUF:%.+]]> = VF * UF | |||
; VPLANS-NEXT: Live-in vp<[[VTC:%.+]]> = vector-trip-count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless considered irrelevant for the test, in which case this check should (continue to) be dropped?
Checking only the def w/o any use is somewhat inconsistent, but there are many test changes...
VPIRInst->setOperand( | ||
0, MainPlan.getOrAddLiveIn(Constant::getNullValue(IRI->getType()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VPIRinst wraps the header phi in the scalar loop, so it is probaby better to keep it here.
Is this operand needed, set artificially to zero (can convey a type... ;-), or should it best be removed, keeping VPIRInst only as a wrapper of the header phi?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment it is needed, otherwise it would complicate VPIRInstruction::execute; but in the future we can disconnect the scalar header for the main vector loop, thus removing the need to update them there. (There is code elsewhere that use the scalar phi to get the resume value from the main vector loop)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... but in the future we can disconnect the scalar header for the main vector loop ...
When can this scalar header be disconnected? Presumably it is needed for phi's that do have wide users in EpiPlan, so could be dropped altogether (or not constructed at all) if and only if all phi's are free of such users?
Can VPIRInst be erased as well (perhaps with make_early_inc_range)?
(thought I responded but cannot see the response here): VPIRinst wraps the header phi in the scalar loop, so it is probably better to keep it here.
A VPIRInst wraps an IRInst in order to (a) add operands to phi's and/or (b) support placing non-VPIRInst recipes between IRInst's, by bumping the builder. If neither is needed, VPIRInst can be dropped?
If an artificial operand is needed to appease VPIRInstruction::execute(), could undef/poison be used instead of zero?
(There is code elsewhere that use the scalar phi to get the resume value from the main vector loop)
So dropping it or using undef would require updating said code in VPIRInstruction::execute() and/or elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both can be removed in the latest version, updated, thanks!
Disconnection should be able after #120918
BypassBuilder.setFastMathFlags( | ||
II.getInductionBinOp()->getFastMathFlags()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BypassBuilder.setFastMathFlags( | |
II.getInductionBinOp()->getFastMathFlags()); | |
BypassBuilder.setFastMathFlags(BinOp->getFastMathFlags()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DOne thanks
/// Create resume phis in the scalar preheader for first-order recurrences and | ||
/// reductions and update the VPIRInstructions wrapping the original phis in the | ||
/// scalar header. | ||
/// Create and return a ResumePhi for \p PhiR, if it is wide induction recipe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Create and return a ResumePhi for \p PhiR, if it is wide induction recipe. | |
/// Create and return a ResumePhi for \p WideIV, unless it is truncated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, added a couple of final thoughts.
/// Create resume phis in the scalar preheader for first-order recurrences and | ||
/// reductions and update the VPIRInstructions wrapping the original phis in the | ||
/// scalar header. | ||
/// Create and return a ResumePhi for \p WideIF, unless it is truncated. If the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Create and return a ResumePhi for \p WideIF, unless it is truncated. If the | |
/// Create and return a ResumePhi for \p WideIV, unless it is truncated. If the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, thanks!
|
||
// If index is the vector trip count, the concrete value will only be set in | ||
// prepareToExecute, leading to missed simplifications, e.g. if it is 0. | ||
// TODO: Remove the special case for the vector trip count once it is computed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So prepareToExecute() sets the value of VTC too late for any folding to take place. Treating overflow of VTC = max_uint BTC + 1 = 0 by skipping the vector loop, is clear. Question is if we can avoid vectorizing altogether, knowing this is the case at compile-time, considering the effort of computeMaxVF()
?
VPIRInst->setOperand( | ||
0, MainPlan.getOrAddLiveIn(Constant::getNullValue(IRI->getType()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... but in the future we can disconnect the scalar header for the main vector loop ...
When can this scalar header be disconnected? Presumably it is needed for phi's that do have wide users in EpiPlan, so could be dropped altogether (or not constructed at all) if and only if all phi's are free of such users?
Can VPIRInst be erased as well (perhaps with make_early_inc_range)?
(thought I responded but cannot see the response here): VPIRinst wraps the header phi in the scalar loop, so it is probably better to keep it here.
A VPIRInst wraps an IRInst in order to (a) add operands to phi's and/or (b) support placing non-VPIRInst recipes between IRInst's, by bumping the builder. If neither is needed, VPIRInst can be dropped?
If an artificial operand is needed to appease VPIRInstruction::execute(), could undef/poison be used instead of zero?
(There is code elsewhere that use the scalar phi to get the resume value from the main vector loop)
So dropping it or using undef would require updating said code in VPIRInstruction::execute() and/or elsewhere?
// There is no corresponding wide induction in the epilogue plan that would | ||
// need a resume value. Set the operand in VPIRInst to zero, so ResumePhi | ||
// can be removed. The resume values for the scalar loop will be created | ||
// during execution of EpiPlan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The resume values for the scalar loop will be created
// during execution of EpiPlan.
All resume values, or only those free of wide users in EpiPlan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be all resume values for the scalar loop, those free of wide users are not need to resume in the epilogue vector loop
if (EpiWidenedPhis.contains(IRI)) | ||
continue; | ||
// There is no corresponding wide induction in the epilogue plan that would | ||
// need a resume value. Set the operand in VPIRInst to zero, so ResumePhi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emphasize that this zero is artificial, along with a FIXME to remove it (one way or another)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both can be removed in the latest version, updated, thanks!
Model updating IV users directly in VPlan, replace fixupIVUsers. Depends on llvm#110004, llvm#109975 and llvm#112145.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/199/builds/506 Here is the relevant piece of the build log for the reference
|
Model updating IV users directly in VPlan, replace fixupIVUsers. Depends on llvm#110004, llvm#109975 and llvm#112145.
Split DerivedIV simplification off from llvm/llvm-project#112145 and use to remove the need for extra checks in createScalarIVSteps. Required an extra simplification run after IV transforms.
Use createDerivedIV to compute IV end values directly in VPlan, instead of creating them up-front. This allows updating IV users outside the loop as follow-up. Depends on llvm/llvm-project#110004 and llvm/llvm-project#109975. PR: llvm/llvm-project#112145
Model updating IV users directly in VPlan, replace fixupIVUsers. Now simple extracts are created for all phis in the exit block during initial VPlan construction. A later VPlan transform (optimizeInductionExitUsers) replaces extracts of inductions with their pre-computed values if possible. This completes the transition towards modeling all live-outs directly in VPlan. There are a few follow-ups: * emit extracts initially also for resume phis, and optimize them tougher with IV exit users * support for VPlans with multiple exits in optimizeInductionExitUsers. Depends on #110004, #109975 and #112145.
Model updating IV users directly in VPlan, replace fixupIVUsers. Now simple extracts are created for all phis in the exit block during initial VPlan construction. A later VPlan transform (optimizeInductionExitUsers) replaces extracts of inductions with their pre-computed values if possible. This completes the transition towards modeling all live-outs directly in VPlan. There are a few follow-ups: * emit extracts initially also for resume phis, and optimize them tougher with IV exit users * support for VPlans with multiple exits in optimizeInductionExitUsers. Depends on llvm/llvm-project#110004, llvm/llvm-project#109975 and llvm/llvm-project#112145.
This reverts the revert commit 58326f1. The build failure in sanitizer stage2 builds has been fixed with 0d39fe6. Original commit message: Model updating IV users directly in VPlan, replace fixupIVUsers. Now simple extracts are created for all phis in the exit block during initial VPlan construction. A later VPlan transform (optimizeInductionExitUsers) replaces extracts of inductions with their pre-computed values if possible. This completes the transition towards modeling all live-outs directly in VPlan. There are a few follow-ups: * emit extracts initially also for resume phis, and optimize them tougher with IV exit users * support for VPlans with multiple exits in optimizeInductionExitUsers. Depends on #110004, #109975 and #112145.
…12147)" This reverts the revert commit 58326f1. The build failure in sanitizer stage2 builds has been fixed with 0d39fe6. Original commit message: Model updating IV users directly in VPlan, replace fixupIVUsers. Now simple extracts are created for all phis in the exit block during initial VPlan construction. A later VPlan transform (optimizeInductionExitUsers) replaces extracts of inductions with their pre-computed values if possible. This completes the transition towards modeling all live-outs directly in VPlan. There are a few follow-ups: * emit extracts initially also for resume phis, and optimize them tougher with IV exit users * support for VPlans with multiple exits in optimizeInductionExitUsers. Depends on llvm/llvm-project#110004, llvm/llvm-project#109975 and llvm/llvm-project#112145.
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.
This allows updating IV users outside the loop as follow-up.
Depends on #110004 and
#109975.