DXC common factor optimization #6595

lizhengxing · 2024-05-07T17:44:18Z

This PR pulls the upstream change, Reassociate: add global reassociation algorithm (llvm/llvm-project@b8a330c), into DXC.

For the code below:
foo = (a * b) * c
bar = (a * d) * c

As the upstream change states, it can recognize the a*c is a common factor and redundant.

However, this upstream change might overlook some obvious common factors. One case has been observed is:
%2 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)
%3 = extractvalue %dx.types.CBufRet.f32 %2, 3
%4 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)
%5 = extractvalue %dx.types.CBufRet.f32 %4, 1
%6 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)
%7 = extractvalue %dx.types.CBufRet.f32 %6, 3
%8 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)
%9 = extractvalue %dx.types.CBufRet.f32 %8, 1

%11 = fmul fast float %3, %10
%12 = fmul fast float %11, %5

%14 = fmul fast float %7, %13
%15 = fmul fast float %14, %9 ---> %3*%5 == %7*%9 --> they should be reassociated to a common factor

The upstream change can't identify this common factor because DXC don't know (%3, %7) and (%7, %9) are redundant when running Reassociate pass. The redundancy of (%3, %7) and (%7, %9) will be eliminated after GVN pass.

For DXC can identify more common factors, this PR will aggressively run Reassociate pass again after GVN pass and it will run GVN pass after it to remove the redundancy generared in the second run of Reassociate pass.

One issue of this PR is changing the order of floating point operations will cause the precision issue and very different result of some edge cases.

So this PR adds the disable-aggressive-reassociation flag (it's false by default) to the non-upstream change. It can be used to mitigate and debug the issue caused by this PR.

This PR pulls the upstream change, Reassociate: add global reassociation algorithm (llvm/llvm-project@b8a330c), into DXC. For the code below: foo = (a * b) * c bar = (a * d) * c As the upstream change states, it can recognize the a*c is a common factor and redundant. However, this upstream change might overlook some obvious common factors. One case has been observed is: %2 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1) %3 = extractvalue %dx.types.CBufRet.f32 %2, 3 %4 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0) %5 = extractvalue %dx.types.CBufRet.f32 %4, 1 %6 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1) %7 = extractvalue %dx.types.CBufRet.f32 %6, 3 %8 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0) %9 = extractvalue %dx.types.CBufRet.f32 %8, 1 %11 = fmul fast float %3, %10 %12 = fmul fast float %11, %5 %14 = fmul fast float %7, %13 %15 = fmul fast float %14, %9 ---> %3*%5 == %7*%9 --> they should be reassociated to a common factor The upstream change can't identify this common factor because DXC don't know (%3, %7) and (%7, %9) are redundant when running Reassociate pass. The redundancy of (%3, %7) and (%7, %9) will be eliminated after GVN pass. For DXC can identify more common factors, this PR will aggressively run Reassociate pass again after GVN pass and it will run GVN pass after it to remove the redundancy generared in the second run of Reassociate pass. One issue of this PR is changing the order of floating point operations will cause the precision issue and very different result of some edge cases. So this PR adds the disable-aggressive-reassociation flag (it's false by default) to the non-upstream change. It can be used to mitigate and debug the issue caused by this PR. Fixes #6593.

github-actions · 2024-05-07T17:45:10Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff 14c440712d7411f184bd29a70718e3b8568dcfe7 6e0baa18fcad6ede2e2d2cda775fbb51137d0e69 -- include/dxc/Support/HLSLOptions.h include/llvm/Transforms/IPO/PassManagerBuilder.h lib/DxcSupport/HLSLOptions.cpp lib/Transforms/IPO/PassManagerBuilder.cpp lib/Transforms/Scalar/Reassociate.cpp tools/clang/include/clang/Frontend/CodeGenOptions.h tools/clang/lib/CodeGen/BackendUtil.cpp tools/clang/tools/dxcompiler/dxcompilerobj.cpp

View the diff from clang-format here.

diff --git a/include/dxc/Support/HLSLOptions.h b/include/dxc/Support/HLSLOptions.h
index 75d096d3..1690986e 100644
--- a/include/dxc/Support/HLSLOptions.h
+++ b/include/dxc/Support/HLSLOptions.h
@@ -236,7 +236,8 @@ public:
   std::string TimeTrace = "";           // OPT_ftime_trace[EQ]
   unsigned TimeTraceGranularity = 500;  // OPT_ftime_trace_granularity_EQ
   bool VerifyDiagnostics = false;       // OPT_verify
-  bool EnableAggressiveReassociation = true; // OPT_disable_aggressive_reassociation
+  bool EnableAggressiveReassociation =
+      true; // OPT_disable_aggressive_reassociation
 
   // Optimization pass enables, disables and selects
   OptimizationToggles
diff --git a/include/llvm/Transforms/IPO/PassManagerBuilder.h b/include/llvm/Transforms/IPO/PassManagerBuilder.h
index 0b3837a5..8e813702 100644
--- a/include/llvm/Transforms/IPO/PassManagerBuilder.h
+++ b/include/llvm/Transforms/IPO/PassManagerBuilder.h
@@ -139,7 +139,7 @@ public:
   bool HLSLEnableDebugNops = false; // HLSL Change
   bool HLSLEarlyInlining = true; // HLSL Change
   bool HLSLNoSink = false; // HLSL Change
-  bool HLSLEnableAggressiveReassociation = true; // HLSL Change
+  bool HLSLEnableAggressiveReassociation = true;    // HLSL Change
   void addHLSLPasses(legacy::PassManagerBase &MPM); // HLSL Change
 
 private:
diff --git a/lib/Transforms/IPO/PassManagerBuilder.cpp b/lib/Transforms/IPO/PassManagerBuilder.cpp
index fa3178eb..e64d0c49 100644
--- a/lib/Transforms/IPO/PassManagerBuilder.cpp
+++ b/lib/Transforms/IPO/PassManagerBuilder.cpp
@@ -504,8 +504,8 @@ void PassManagerBuilder::populateModulePassManager(
 
   // HLSL Change Begins.
   {
-    // Run reassociate pass again after GVN since GVN will expose more opportunities
-    // for reassociation.
+    // Run reassociate pass again after GVN since GVN will expose more
+    // opportunities for reassociation.
     if (HLSLEnableAggressiveReassociation) {
       MPM.add(createReassociatePass()); // Reassociate expressions
       if (EnableGVN) {
diff --git a/lib/Transforms/Scalar/Reassociate.cpp b/lib/Transforms/Scalar/Reassociate.cpp
index cb67beef..b5b0f7fa 100644
--- a/lib/Transforms/Scalar/Reassociate.cpp
+++ b/lib/Transforms/Scalar/Reassociate.cpp
@@ -20,7 +20,6 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "llvm/Transforms/Scalar.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/STLExtras.h"
@@ -38,6 +37,7 @@
 #include "llvm/Pass.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
+#include "llvm/Transforms/Scalar.h"
 #include "llvm/Transforms/Utils/Local.h"
 #include <algorithm>
 using namespace llvm;
@@ -2286,8 +2286,7 @@ void Reassociate::ReassociateExpression(BinaryOperator *I) {
   RewriteExprTree(I, Ops);
 }
 
-void Reassociate::BuildPairMap(
-    ReversePostOrderTraversal<Function *> &RPOT) {
+void Reassociate::BuildPairMap(ReversePostOrderTraversal<Function *> &RPOT) {
   // Make a "pairmap" of how often each operand pair occurs.
   for (BasicBlock *BI : RPOT) {
     for (Instruction &I : *BI) {
@@ -2357,11 +2356,11 @@ bool Reassociate::runOnFunction(Function &F) {
   // pass pipeline for further potential gains.
   // It might also be possible to update the pair map during runtime, but the
   // overhead of that may be large if there's many reassociable chains.
-// TODO: RPOT
-// Get the functions basic blocks in Reverse Post Order. This order is used by
-// BuildRankMap to pre calculate ranks correctly. It also excludes dead basic
-// blocks (it has been seen that the analysis in this pass could hang when
-// analysing dead basic blocks).
+  // TODO: RPOT
+  // Get the functions basic blocks in Reverse Post Order. This order is used by
+  // BuildRankMap to pre calculate ranks correctly. It also excludes dead basic
+  // blocks (it has been seen that the analysis in this pass could hang when
+  // analysing dead basic blocks).
   ReversePostOrderTraversal<Function *> RPOT(&F);
   BuildPairMap(RPOT);

Check this box to apply formatting changes to this branch.

dmpots

I think it would be good to split this into two changes since we are really doing two separate things here.

Port the upstream fix back to the reassociation pass.
Add the extra reassocation+gvn run with the new -aggressive-reassociate flag.

Both of these would be useful bisect points if we need to chase down a regression.

lizhengxing · 2024-05-07T18:32:02Z

I think it would be good to split this into two changes since we are really doing two separate things here.

Port the upstream fix back to the reassociation pass.

Add the extra reassocation+gvn run with the new -aggressive-reassociate flag.

Both of these would be useful bisect points if we need to chase down a regression.

Yeah, I thought about it after I found it's hard to put the upstream change under the new -aggressive-reassociate flag. Let me refactor this PR after I got more reviews.

damyanp · 2024-05-07T21:15:13Z

Yeah, I thought about it after I found it's hard to put the upstream change under the new -aggressive-reassociate flag. Let me refactor this PR after I got more reviews.

Probably best to get the refactored PR out so people can review the thing that you're planning to commit.

lizhengxing · 2024-05-08T01:19:18Z

I will split this PR into 3 separate PRs based on the reviews. This PR is no longer needed. So close and delete it.

lizhengxing requested a review from a team as a code owner May 7, 2024 17:44

lizhengxing requested a review from tex3d May 7, 2024 17:45

lizhengxing requested review from pow2clk, bogner, llvm-beanz and dmpots May 7, 2024 17:45

chore: autopublish 2024-05-07T18:05:22Z

e293aa2

dmpots suggested changes May 7, 2024

View reviewed changes

lizhengxing closed this May 8, 2024

lizhengxing deleted the zhengxingli/common-factor-optimization branch May 16, 2024 16:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DXC common factor optimization #6595

DXC common factor optimization #6595

lizhengxing commented May 7, 2024 •

edited

Loading

github-actions bot commented May 7, 2024 •

edited by lizhengxing

Loading

dmpots left a comment •

edited

Loading

lizhengxing commented May 7, 2024

damyanp commented May 7, 2024

lizhengxing commented May 8, 2024 •

edited

Loading

DXC common factor optimization #6595

DXC common factor optimization #6595

Conversation

lizhengxing commented May 7, 2024 • edited Loading

github-actions bot commented May 7, 2024 • edited by lizhengxing Loading

dmpots left a comment • edited Loading

Choose a reason for hiding this comment

lizhengxing commented May 7, 2024

damyanp commented May 7, 2024

lizhengxing commented May 8, 2024 • edited Loading

lizhengxing commented May 7, 2024 •

edited

Loading

github-actions bot commented May 7, 2024 •

edited by lizhengxing

Loading

dmpots left a comment •

edited

Loading

lizhengxing commented May 8, 2024 •

edited

Loading