-
Notifications
You must be signed in to change notification settings - Fork 722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DXC common factor optimization #6595
Conversation
This PR pulls the upstream change, Reassociate: add global reassociation algorithm (llvm/llvm-project@b8a330c), into DXC. For the code below: foo = (a * b) * c bar = (a * d) * c As the upstream change states, it can recognize the a*c is a common factor and redundant. However, this upstream change might overlook some obvious common factors. One case has been observed is: %2 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1) %3 = extractvalue %dx.types.CBufRet.f32 %2, 3 %4 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0) %5 = extractvalue %dx.types.CBufRet.f32 %4, 1 %6 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1) %7 = extractvalue %dx.types.CBufRet.f32 %6, 3 %8 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0) %9 = extractvalue %dx.types.CBufRet.f32 %8, 1 %11 = fmul fast float %3, %10 %12 = fmul fast float %11, %5 %14 = fmul fast float %7, %13 %15 = fmul fast float %14, %9 ---> %3*%5 == %7*%9 --> they should be reassociated to a common factor The upstream change can't identify this common factor because DXC don't know (%3, %7) and (%7, %9) are redundant when running Reassociate pass. The redundancy of (%3, %7) and (%7, %9) will be eliminated after GVN pass. For DXC can identify more common factors, this PR will aggressively run Reassociate pass again after GVN pass and it will run GVN pass after it to remove the redundancy generared in the second run of Reassociate pass. One issue of this PR is changing the order of floating point operations will cause the precision issue and very different result of some edge cases. So this PR adds the disable-aggressive-reassociation flag (it's false by default) to the non-upstream change. It can be used to mitigate and debug the issue caused by this PR. Fixes #6593.
You can test this locally with the following command:git-clang-format --diff 14c440712d7411f184bd29a70718e3b8568dcfe7 6e0baa18fcad6ede2e2d2cda775fbb51137d0e69 -- include/dxc/Support/HLSLOptions.h include/llvm/Transforms/IPO/PassManagerBuilder.h lib/DxcSupport/HLSLOptions.cpp lib/Transforms/IPO/PassManagerBuilder.cpp lib/Transforms/Scalar/Reassociate.cpp tools/clang/include/clang/Frontend/CodeGenOptions.h tools/clang/lib/CodeGen/BackendUtil.cpp tools/clang/tools/dxcompiler/dxcompilerobj.cpp View the diff from clang-format here.diff --git a/include/dxc/Support/HLSLOptions.h b/include/dxc/Support/HLSLOptions.h
index 75d096d3..1690986e 100644
--- a/include/dxc/Support/HLSLOptions.h
+++ b/include/dxc/Support/HLSLOptions.h
@@ -236,7 +236,8 @@ public:
std::string TimeTrace = ""; // OPT_ftime_trace[EQ]
unsigned TimeTraceGranularity = 500; // OPT_ftime_trace_granularity_EQ
bool VerifyDiagnostics = false; // OPT_verify
- bool EnableAggressiveReassociation = true; // OPT_disable_aggressive_reassociation
+ bool EnableAggressiveReassociation =
+ true; // OPT_disable_aggressive_reassociation
// Optimization pass enables, disables and selects
OptimizationToggles
diff --git a/include/llvm/Transforms/IPO/PassManagerBuilder.h b/include/llvm/Transforms/IPO/PassManagerBuilder.h
index 0b3837a5..8e813702 100644
--- a/include/llvm/Transforms/IPO/PassManagerBuilder.h
+++ b/include/llvm/Transforms/IPO/PassManagerBuilder.h
@@ -139,7 +139,7 @@ public:
bool HLSLEnableDebugNops = false; // HLSL Change
bool HLSLEarlyInlining = true; // HLSL Change
bool HLSLNoSink = false; // HLSL Change
- bool HLSLEnableAggressiveReassociation = true; // HLSL Change
+ bool HLSLEnableAggressiveReassociation = true; // HLSL Change
void addHLSLPasses(legacy::PassManagerBase &MPM); // HLSL Change
private:
diff --git a/lib/Transforms/IPO/PassManagerBuilder.cpp b/lib/Transforms/IPO/PassManagerBuilder.cpp
index fa3178eb..e64d0c49 100644
--- a/lib/Transforms/IPO/PassManagerBuilder.cpp
+++ b/lib/Transforms/IPO/PassManagerBuilder.cpp
@@ -504,8 +504,8 @@ void PassManagerBuilder::populateModulePassManager(
// HLSL Change Begins.
{
- // Run reassociate pass again after GVN since GVN will expose more opportunities
- // for reassociation.
+ // Run reassociate pass again after GVN since GVN will expose more
+ // opportunities for reassociation.
if (HLSLEnableAggressiveReassociation) {
MPM.add(createReassociatePass()); // Reassociate expressions
if (EnableGVN) {
diff --git a/lib/Transforms/Scalar/Reassociate.cpp b/lib/Transforms/Scalar/Reassociate.cpp
index cb67beef..b5b0f7fa 100644
--- a/lib/Transforms/Scalar/Reassociate.cpp
+++ b/lib/Transforms/Scalar/Reassociate.cpp
@@ -20,7 +20,6 @@
//
//===----------------------------------------------------------------------===//
-#include "llvm/Transforms/Scalar.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/STLExtras.h"
@@ -38,6 +37,7 @@
#include "llvm/Pass.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"
+#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/Local.h"
#include <algorithm>
using namespace llvm;
@@ -2286,8 +2286,7 @@ void Reassociate::ReassociateExpression(BinaryOperator *I) {
RewriteExprTree(I, Ops);
}
-void Reassociate::BuildPairMap(
- ReversePostOrderTraversal<Function *> &RPOT) {
+void Reassociate::BuildPairMap(ReversePostOrderTraversal<Function *> &RPOT) {
// Make a "pairmap" of how often each operand pair occurs.
for (BasicBlock *BI : RPOT) {
for (Instruction &I : *BI) {
@@ -2357,11 +2356,11 @@ bool Reassociate::runOnFunction(Function &F) {
// pass pipeline for further potential gains.
// It might also be possible to update the pair map during runtime, but the
// overhead of that may be large if there's many reassociable chains.
-// TODO: RPOT
-// Get the functions basic blocks in Reverse Post Order. This order is used by
-// BuildRankMap to pre calculate ranks correctly. It also excludes dead basic
-// blocks (it has been seen that the analysis in this pass could hang when
-// analysing dead basic blocks).
+ // TODO: RPOT
+ // Get the functions basic blocks in Reverse Post Order. This order is used by
+ // BuildRankMap to pre calculate ranks correctly. It also excludes dead basic
+ // blocks (it has been seen that the analysis in this pass could hang when
+ // analysing dead basic blocks).
ReversePostOrderTraversal<Function *> RPOT(&F);
BuildPairMap(RPOT);
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to split this into two changes since we are really doing two separate things here.
- Port the upstream fix back to the reassociation pass.
- Add the extra reassocation+gvn run with the new -aggressive-reassociate flag.
Both of these would be useful bisect points if we need to chase down a regression.
Yeah, I thought about it after I found it's hard to put the upstream change under the new -aggressive-reassociate flag. Let me refactor this PR after I got more reviews. |
Probably best to get the refactored PR out so people can review the thing that you're planning to commit. |
I will split this PR into 3 separate PRs based on the reviews. This PR is no longer needed. So close and delete it. |
This PR pulls the upstream change, Reassociate: add global reassociation algorithm (llvm/llvm-project@b8a330c), into DXC.
For the code below:
foo = (a * b) * c
bar = (a * d) * c
As the upstream change states, it can recognize the a*c is a common factor and redundant.
However, this upstream change might overlook some obvious common factors. One case has been observed is:
%2 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)
%3 = extractvalue %dx.types.CBufRet.f32 %2, 3
%4 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)
%5 = extractvalue %dx.types.CBufRet.f32 %4, 1
%6 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)
%7 = extractvalue %dx.types.CBufRet.f32 %6, 3
%8 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)
%9 = extractvalue %dx.types.CBufRet.f32 %8, 1
%11 = fmul fast float %3, %10
%12 = fmul fast float %11, %5
%14 = fmul fast float %7, %13
%15 = fmul fast float %14, %9 ---> %3*%5 == %7*%9 --> they should be reassociated to a common factor
The upstream change can't identify this common factor because DXC don't know (%3, %7) and (%7, %9) are redundant when running Reassociate pass. The redundancy of (%3, %7) and (%7, %9) will be eliminated after GVN pass.
For DXC can identify more common factors, this PR will aggressively run Reassociate pass again after GVN pass and it will run GVN pass after it to remove the redundancy generared in the second run of Reassociate pass.
One issue of this PR is changing the order of floating point operations will cause the precision issue and very different result of some edge cases.
So this PR adds the disable-aggressive-reassociation flag (it's false by default) to the non-upstream change. It can be used to mitigate and debug the issue caused by this PR.