[VectorCombine] Fold vector.interleave2 with two constant splats #125144

Merged: 5 commits from patch/veccombine-interleave-splat into llvm:main on Feb 4, 2025

Conversation

mshockwave (Member)

If we're interleaving 2 constant splats, for instance <vscale x 8 x i32> <splat of 666> and <vscale x 8 x i32> <splat of 777>, we can create a larger splat <vscale x 8 x i64> <splat of ((777 << 32) | 666)> first and then bitcast it back into <vscale x 16 x i32>.
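
As a concrete picture of the fold, here is a minimal sketch adapted from the test added in this PR; the function name is made up, the folded form in the comment is quoted from the test's CHECK line, and a RISC-V target such as -mtriple=riscv64 -mattr=+v is assumed for the cost model:

```llvm
; Input: an interleave2 of two constant i32 splats.
define <vscale x 16 x i32> @interleave2_splats_example() {
  %v = call <vscale x 16 x i32> @llvm.vector.interleave2.nxv16i32(<vscale x 8 x i32> splat (i32 666), <vscale x 8 x i32> splat (i32 777))
  ret <vscale x 16 x i32> %v
}
; After `opt -passes=vector-combine`, the call becomes a constant:
;   <vscale x 16 x i32> bitcast (<vscale x 8 x i64> splat (i64 3337189589658) to <vscale x 16 x i32>)
; where 3337189589658 == (777 << 32) | 666, i.e. the two 32-bit splat values
; packed into one 64-bit element.
```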


This is split out from #120490

@llvmbot (Member) commented Jan 31, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Min-Yih Hsu (mshockwave)

Changes

If we're interleaving 2 constant splats, for instance <vscale x 8 x i32> <splat of 666> and <vscale x 8 x i32> <splat of 777>, we can create a larger splat <vscale x 8 x i64> <splat of ((777 << 32) | 666)> first before casting it back into <vscale x 16 x i32>.


This is split out from #120490


Full diff: https://github.com/llvm/llvm-project/pull/125144.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VectorCombine.cpp (+41)
  • (added) llvm/test/Transforms/VectorCombine/RISCV/vector-interleave2-splat.ll (+14)
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 59920b5a4dd20a..fd49620b5e3ac3 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -125,6 +125,7 @@ class VectorCombine {
   bool foldShuffleFromReductions(Instruction &I);
   bool foldCastFromReductions(Instruction &I);
   bool foldSelectShuffle(Instruction &I, bool FromReduction = false);
+  bool foldInterleaveIntrinsics(Instruction &I);
   bool shrinkType(Instruction &I);
 
   void replaceValue(Value &Old, Value &New) {
@@ -3145,6 +3146,45 @@ bool VectorCombine::foldInsExtVectorToShuffle(Instruction &I) {
   return true;
 }
 
+bool VectorCombine::foldInterleaveIntrinsics(Instruction &I) {
+  // If we're interleaving 2 constant splats, for instance `<vscale x 8 x i32>
+  // <splat of 666>` and `<vscale x 8 x i32> <splat of 777>`, we can create a
+  // larger splat
+  // `<vscale x 8 x i64> <splat of ((777 << 32) | 666)>` first before casting it
+  // back into `<vscale x 16 x i32>`.
+  using namespace PatternMatch;
+  const APInt *SplatVal0, *SplatVal1;
+  if (!match(&I, m_Intrinsic<Intrinsic::vector_interleave2>(
+                     m_APInt(SplatVal0), m_APInt(SplatVal1))))
+    return false;
+
+  LLVM_DEBUG(dbgs() << "VC: Folding interleave2 with two splats: " << I
+                    << "\n");
+
+  auto *VTy =
+      cast<VectorType>(cast<IntrinsicInst>(I).getArgOperand(0)->getType());
+  auto *ExtVTy = VectorType::getExtendedElementVectorType(VTy);
+  unsigned Width = VTy->getElementType()->getIntegerBitWidth();
+
+  if (TTI.getInstructionCost(&I, CostKind) <
+      TTI.getCastInstrCost(Instruction::BitCast, I.getType(), ExtVTy,
+                           TTI::CastContextHint::None, CostKind)) {
+    LLVM_DEBUG(dbgs() << "VC: The cost to cast from " << *ExtVTy << " to "
+                      << *I.getType() << " is too high.\n");
+    return false;
+  }
+
+  APInt NewSplatVal = SplatVal1->zext(Width * 2);
+  NewSplatVal <<= Width;
+  NewSplatVal |= SplatVal0->zext(Width * 2);
+  auto *NewSplat = ConstantVector::getSplat(
+      ExtVTy->getElementCount(), ConstantInt::get(F.getContext(), NewSplatVal));
+
+  IRBuilder<> Builder(&I);
+  replaceValue(I, *Builder.CreateBitCast(NewSplat, I.getType()));
+  return true;
+}
+
 /// This is the entry point for all transforms. Pass manager differences are
 /// handled in the callers of this function.
 bool VectorCombine::run() {
@@ -3189,6 +3229,7 @@ bool VectorCombine::run() {
       MadeChange |= scalarizeBinopOrCmp(I);
       MadeChange |= scalarizeLoadExtract(I);
       MadeChange |= scalarizeVPIntrinsic(I);
+      MadeChange |= foldInterleaveIntrinsics(I);
     }
 
     if (Opcode == Instruction::Store)
diff --git a/llvm/test/Transforms/VectorCombine/RISCV/vector-interleave2-splat.ll b/llvm/test/Transforms/VectorCombine/RISCV/vector-interleave2-splat.ll
new file mode 100644
index 00000000000000..f2eb4e4e2dbc85
--- /dev/null
+++ b/llvm/test/Transforms/VectorCombine/RISCV/vector-interleave2-splat.ll
@@ -0,0 +1,14 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -mtriple=riscv64 -mattr=+v,+m,+zvfh %s -passes=vector-combine | FileCheck %s
+; RUN: opt -S -mtriple=riscv32 -mattr=+v,+m,+zvfh %s -passes=vector-combine | FileCheck %s
+
+define void @store_factor2_const_splat(ptr %dst) {
+; CHECK-LABEL: define void @store_factor2_const_splat(
+; CHECK-SAME: ptr [[DST:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    call void @llvm.vp.store.nxv16i32.p0(<vscale x 16 x i32> bitcast (<vscale x 8 x i64> splat (i64 3337189589658) to <vscale x 16 x i32>), ptr [[DST]], <vscale x 16 x i1> splat (i1 true), i32 88)
+; CHECK-NEXT:    ret void
+;
+  %interleave2 = call <vscale x 16 x i32> @llvm.vector.interleave2.nxv16i32(<vscale x 8 x i32> splat (i32 666), <vscale x 8 x i32> splat (i32 777))
+  call void @llvm.vp.store.nxv16i32.p0(<vscale x 16 x i32> %interleave2, ptr %dst, <vscale x 16 x i1> splat (i1 true), i32 88)
+  ret void
+}

@llvmbot (Member) commented Jan 31, 2025

@llvm/pr-subscribers-vectorizers

(Same notification as the @llvm/pr-subscribers-llvm-transforms comment above, with identical contents.)

unsigned Width = VTy->getElementType()->getIntegerBitWidth();

if (TTI.getInstructionCost(&I, CostKind) <
    TTI.getCastInstrCost(Instruction::BitCast, I.getType(), ExtVTy,
mshockwave (Member, Author):

We're really just worrying about the legalization cost here, should ExtVTy be an illegal type.

@@ -0,0 +1,14 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt -S -mtriple=riscv64 -mattr=+v,+m,+zvfh %s -passes=vector-combine | FileCheck %s
; RUN: opt -S -mtriple=riscv32 -mattr=+v,+m,+zvfh %s -passes=vector-combine | FileCheck %s
topperc (Collaborator) commented Jan 31, 2025:

Add a test that uses Zve32x instead of V. We shouldn't form an i64 vector type in that case. I think it would crash the backend.
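
A rough sketch of what such a negative test could look like (the RUN line, function name, and CHECK lines here are assumptions rather than the exact test that landed). With only Zve32x, 64-bit vector elements are not available, so the expectation is that the interleave2 call is left alone:

```llvm
; RUN: opt -S -mtriple=riscv64 -mattr=+zve32x %s -passes=vector-combine | FileCheck %s

define <vscale x 16 x i32> @interleave2_const_splat_zve32x() {
; CHECK-LABEL: @interleave2_const_splat_zve32x(
; CHECK: call <vscale x 16 x i32> @llvm.vector.interleave2.nxv16i32(
  %v = call <vscale x 16 x i32> @llvm.vector.interleave2.nxv16i32(<vscale x 8 x i32> splat (i32 666), <vscale x 8 x i32> splat (i32 777))
  ret <vscale x 16 x i32> %v
}
```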

Collaborator:

Maybe test to make sure we don't do i64->i128 too.
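
Likewise, a sketch of the i64 case (names and CHECK lines are assumptions; the test that was actually added, @interleave2_const_splat_nxv8i64, is quoted further down). Folding here would need an <vscale x 8 x i128> splat, which should be rejected by the cost check, so the interleave2 is expected to survive:

```llvm
; RUN: opt -S -mtriple=riscv64 -mattr=+v %s -passes=vector-combine | FileCheck %s

define <vscale x 16 x i64> @interleave2_const_splat_nxv8i64() {
; CHECK-LABEL: @interleave2_const_splat_nxv8i64(
; CHECK: call <vscale x 16 x i64> @llvm.vector.interleave2.nxv16i64(
  %v = call <vscale x 16 x i64> @llvm.vector.interleave2.nxv16i64(<vscale x 8 x i64> splat (i64 666), <vscale x 8 x i64> splat (i64 777))
  ret <vscale x 16 x i64> %v
}
```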

mshockwave (Member, Author):

Both are done.

lukel97 (Contributor) left a comment:

LGTM w/ the zve32x test

// larger splat
// `<vscale x 8 x i64> <splat of ((777 << 32) | 666)>` first before casting it
// back into `<vscale x 16 x i32>`.
using namespace PatternMatch;
Contributor:

I think you can remove this since PatternMatch is already included at the top of VectorCombine.cpp

mshockwave (Member, Author):

Fixed.

Comment on lines 3150 to 3154
// If we're interleaving 2 constant splats, for instance `<vscale x 8 x i32>
// <splat of 666>` and `<vscale x 8 x i32> <splat of 777>`, we can create a
// larger splat
// `<vscale x 8 x i64> <splat of ((777 << 32) | 666)>` first before casting it
// back into `<vscale x 16 x i32>`.
Contributor:

Nit, make this a doc comment by moving above the signature + use ///

mshockwave (Member, Author):

Done.

@@ -0,0 +1,21 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt -S -mtriple=riscv64 -mattr=+v,+m,+zvfh %s -passes=vector-combine | FileCheck %s
; RUN: opt -S -mtriple=riscv32 -mattr=+v,+m,+zvfh %s -passes=vector-combine | FileCheck %s
Contributor:

Nit, do we need +zvfh here and in vector-interleave2-splat-e64.ll?

mshockwave (Member, Author):

No we don't need it. It's fixed now.


; We should not form a i128 vector.

define void @interleave2_const_splat_nxv8i64(ptr %dst) {
topperc (Collaborator) commented Feb 1, 2025:

Does this have some bad interaction with zve32x that required a separate test file?

mshockwave (Member, Author):

> Does this have some bad interaction with zve32x that required a separate test file?

Correct, from what I'd tried, zve32x doesn't like SEW=64 in general.

topperc (Collaborator) left a comment:

LGTM

mshockwave merged commit 635ab51 into llvm:main on Feb 4, 2025. 8 checks passed.
mshockwave deleted the patch/veccombine-interleave-splat branch on February 4, 2025 at 03:05.
llvm-ci (Collaborator) commented Feb 4, 2025

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime running on omp-vega20-0 while building llvm at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/15181

Here is the relevant piece of the build log for reference:
Step 7 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: api/omp_host_call.c' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/api/omp_host_call.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/omp_host_call.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a && /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/omp_host_call.c.tmp | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/api/omp_host_call.c
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/api/omp_host_call.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/omp_host_call.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/omp_host_call.c.tmp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -11
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/api/omp_host_call.c
# .---command stderr------------
# | FileCheck error: '<stdin>' is empty.
# | FileCheck command line:  /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/api/omp_host_call.c
# `-----------------------------
# error: command failed with exit status: 2

--

********************


Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
[VectorCombine] Fold vector.interleave2 with two constant splats (llvm#125144)

If we're interleaving 2 constant splats, for instance `<vscale x 8 x
i32> <splat of 666>` and `<vscale x 8 x i32> <splat of 777>`, we can
create a larger splat `<vscale x 8 x i64> <splat of ((777 << 32) |
666)>` first before casting it back into `<vscale x 16 x i32>`.