-
Notifications
You must be signed in to change notification settings - Fork 722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More aggressive reassociations #6626
More aggressive reassociations #6626
Conversation
In your message you say
How are %7 and %9 redundant here? Should it be %5 and %9? It would be easier to read this if you gave them names I think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should look at the performance impact of this change before committing it.
Yes, %5 and %9 are redundant. I simplified the dxils and renamed the virtual register number. I'll update the description. |
83b7b96
to
6623591
Compare
c72d018
to
8af0fbe
Compare
Done. |
aa61d01
to
d4efcf2
Compare
8af0fbe
to
06c4b11
Compare
Overall, compared to the upstream change (#6598), this change can reduce ALU in ~37% of shaders. The occupancy of a small number of shaders are improved (~1%). There were an equal number of regressions and improvements. Overall, this change looks to be positive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
d4efcf2
to
4a2d290
Compare
Although DXC applied the upstream change, Reassociate: add global reassociation algorithm (llvm/llvm-project@b8a330c) in this PR (#6598), it still might overlook some obvious common factors. One case has been observed is: %Float4_0 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1) %Float4_0.w = extractvalue %dx.types.CBufRet.f32 %Float4_0, 3 %Float2_0 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0) %Float2_0.y = extractvalue %dx.types.CBufRet.f32 %Float2_0, 1 /* %Float4_1 is redundant with %Float4_0 since they invokes cbufferLoadLegacy with same parameters */ %Float4_1 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1) /* %Float4_1.w is redundant with %Float4_0.w */ %Float4_1.w = extractvalue %dx.types.CBufRet.f32 %Float4_1, 3 /* %Float2_1 is redundant with %Float2_0 since they invokes cbufferLoadLegacy with same parameters */ %Float2_1 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0) /* %Float2_1.y is redundant with %Float2_0.y */ %Float2_1.y = extractvalue %dx.types.CBufRet.f32 %Float2_1, 1 .... %11 = fmul fast float %Float4_0.w, %10 %12 = fmul fast float %11, %Float2_0.y .... %14 = fmul fast float %Float4_1.w, %13 %15 = fmul fast float %14, %Float2_1.y (%Float4_0.w * %Float2_0.y) equals to (%Float4_1.w * %Float2_1.y), they should be reassociated to a common factor The upstream change can't identify this common factor because DXC doesn't know (%Float4_0.w, %Float4_1.w) and (%Float2_0.y, %Float2_1.y) are redundant when running Reassociate pass. Those redundancies will be eliminated in GVN pass. For DXC can identify more common factors, this PR will aggressively run Reassociate pass again after GVN pass and then run GVN pass again to remove the redundancies generared in this run of Reassociate pass. Changing the order of floating point operations causes the precision issue. In case some shaders get unexpected results due to this PR, use "-opt-disable aggressive-reassociation" to disable this PR and roll back. This is part 3 of the fix for #6593.
06c4b11
to
b052e1b
Compare
Although DXC applied the upstream change, Reassociate: add global reassociation algorithm (llvm/llvm-project@b8a330c) in this PR (#6598), it still might overlook some obvious common factors.
One case has been observed is:
The upstream change can't identify this common factor because DXC doesn't know (%Float4_0.w, %Float4_1.w) and (%Float2_0.y, %Float2_1.y) are redundant when running Reassociate pass. Those redundancies will be eliminated in GVN pass.
For DXC can identify more common factors, this PR will aggressively run Reassociate pass again after GVN pass and then run GVN pass again to remove the redundancies generared in this run of Reassociate pass.
Changing the order of floating point operations causes the precision issue. In case some shaders get unexpected results due to this PR, use "-opt-disable aggressive-reassociation" to disable this PR and roll back.
This is part 3 of the fix for #6593.