Figure out which LLVM optimisation passes are worth enabling #595
From jinyus/related_post_gen#440 (comment): perhaps as a starting point we can just set that option when using `--opt=aggressive`.
When using `inko build --opt=aggressive`, we now set LLVM's optimization level to "aggressive", which is the equivalent of -O3 for clang. This gives users the ability to have their code optimized at least somewhat, provided they're willing to deal with the significant increase in compile times. For example, Inko's test suite takes about 3 seconds to compile without optimizations, while taking just under 10 seconds when using --opt=aggressive. The option --opt=balanced still doesn't apply optimizations, as we've yet to figure out which ones we want to explicitly opt in to. See #595 for more details. Changelog: performance
Commit 1a30de9 makes this change.
At least the following passes are worth looking into more, based on playing around with them to see what effect they have: …
Worth adding: even with …
On my laptop this takes 24 seconds to run, with about 80% of the time being spent in the code of … I'm not sure how on earth this code is that slow, given that Rust does it in about 2.5 seconds.
Curiously, the above program finishes in only 3.68 seconds on my desktop. Perhaps the Intel CPU in my laptop is just really terrible at this code for some reason?
Depending on how LLVM decides to optimize things, these attributes may help improve code generation, though it's difficult to say for certain how much at this stage. See #595 for more details. Changelog: performance
The passes used by LLVM when using O2 (somewhat cleaned up):
The command I used for this: opt-17 -passes='default<O2>' -print-pipeline-passes < /dev/null 2>/dev/null. For O1 it's the same command with 'default<O1>' instead.
The diff: diff --git a/tmp/o1.txt b/tmp/o2.txt
index a81189c..ba359e6 100644
--- a/tmp/o1.txt
+++ b/tmp/o2.txt
@@ -27,12 +27,19 @@ cgscc(
inline<only-mandatory>
inline
function-attrs<skip-non-recursive>
+ openmp-opt-cgscc
function<eager-inv;no-rerun>(
sroa<modify-cfg>
early-cse<memssa>
+ speculative-execution
+ jump-threading
+ correlated-propagation
simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
instcombine<max-iterations=1000;no-use-loop-info>
+ aggressive-instcombine
+ constraint-elimination
libcalls-shrinkwrap
+ tailcallelim
simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
reassociate
loop-mssa(
@@ -52,13 +59,23 @@ cgscc(
loop-unroll-full
)
sroa<modify-cfg>
- memcpyopt
+ vector-combine
+ mldst-motion<no-split-footer-bb>
+ gvn<>
sccp
bdce
instcombine<max-iterations=1000;no-use-loop-info>
- coro-elide
+ jump-threading
+ correlated-propagation
adce
- simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
+ memcpyopt
+ dse
+ move-auto-init
+ loop-mssa(
+ licm<allowspeculation>
+ )
+ coro-elide
+ simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;hoist-common-insts;sink-common-insts;speculate-blocks;simplify-cond-branch>
instcombine<max-iterations=1000;no-use-loop-info>
)
function-attrs
@@ -84,13 +101,14 @@ function<eager-inv>(
)
loop-distribute
inject-tli-mappings
- loop-vectorize<no-interleave-forced-only;vectorize-forced-only;>
+ loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>
loop-load-elim
instcombine<max-iterations=1000;no-use-loop-info>
simplifycfg<bonus-inst-threshold=1;forward-switch-cond;switch-range-to-icmp;switch-to-lookup;no-keep-loops;hoist-common-insts;sink-common-insts;speculate-blocks;simplify-cond-branch>
+ slp-vectorizer
vector-combine
instcombine<max-iterations=1000;no-use-loop-info>
- loop-unroll<O1>
+ loop-unroll<O2>
transform-warning
sroa<preserve-cfg>
instcombine<max-iterations=1000;no-use-loop-info>
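To see at a glance what O2 adds over O1, the two pipeline dumps can also be compared programmatically instead of eyeballing the diff. The sketch below is illustrative tooling, not part of Inko: it splits a `-print-pipeline-passes` string on top-level commas (ignoring commas nested inside `<...>` or `(...)`) and lists the passes present in O2 but not O1. The two pipeline strings are hand-abbreviated excerpts of the dumps above, not the full output.

```python
def split_passes(pipeline: str) -> list[str]:
    """Split a pipeline string on commas that aren't nested in <...> or (...)."""
    out, depth, cur = [], 0, []
    for ch in pipeline:
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        if ch == "," and depth == 0:
            out.append("".join(cur).strip())
            cur = []
        else:
            cur.append(ch)
    if cur:
        out.append("".join(cur).strip())
    return [p for p in out if p]

# Hand-abbreviated excerpts of the function pipelines shown in the diff above.
o1 = (
    "sroa<modify-cfg>,early-cse<memssa>,simplifycfg<bonus-inst-threshold=1>,"
    "instcombine<max-iterations=1000;no-use-loop-info>,libcalls-shrinkwrap"
)
o2 = (
    "sroa<modify-cfg>,early-cse<memssa>,speculative-execution,jump-threading,"
    "correlated-propagation,simplifycfg<bonus-inst-threshold=1>,"
    "instcombine<max-iterations=1000;no-use-loop-info>,aggressive-instcombine,"
    "constraint-elimination,libcalls-shrinkwrap,tailcallelim"
)

# Passes O2 runs that O1 doesn't, in pipeline order.
added = [p for p in split_passes(o2) if p not in split_passes(o1)]
print(added)
```

Running this on the full dumps (rather than the excerpts) should reproduce the `+` lines of the diff above, modulo the passes whose options merely changed.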
Code for the transformation passes: https://github.com/llvm/llvm-project/tree/release/17.x/llvm/lib/Transforms
Some of the passes I looked at:
LLVM uses a … Based on this blog post I'm guessing this is geared towards C++, though it's not entirely clear if the blog post talks about the same code specifically. Given that we aggressively split code into separate modules to allow parallel compilation, and that Inko's use of dynamic dispatch is a little different from C++'s, I suspect this pass isn't actually useful/able to do much. I also tried to look at the disassembly of OpenFlow to see what impact this pass (or the lack of it) has, but the resulting assembly and LLVM IR is just too noisy to compare in a meaningful way. Based on this, I think we should skip this pass for …
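For context on the dynamic-dispatch discussion above, the optimization in question conceptually replaces an indirect (virtual) call with a direct one when the compiler can prove the receiver's concrete type. The Python below is a hypothetical, language-agnostic illustration of that idea only; the class names are invented and this is not how LLVM or Inko implements it.

```python
class Shape:
    """Base class: calls to area() on a Shape are dynamically dispatched."""
    def area(self):
        raise NotImplementedError

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side

def total_area_dynamic(shapes):
    # Dynamic dispatch: the call target depends on each element's runtime type.
    return sum(s.area() for s in shapes)

def total_area_devirtualized(squares):
    # "Devirtualized" form: if analysis proves every element is a Square, the
    # call target is fixed, so it can be called directly (and then inlined).
    return sum(Square.area(s) for s in squares)

squares = [Square(n) for n in (1, 2, 3)]
# Both forms compute the same result; only the dispatch mechanism differs.
assert total_area_dynamic(squares) == total_area_devirtualized(squares) == 14
```

Whether this pays off depends on how much type information survives into each module, which is why the aggressive module splitting mentioned above may limit what the pass can prove.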
libcalls-shrinkwrap seems to be very specific to C, and even then I'm not sure what benefit it actually brings. We should just ditch this one for …
transform-warning seems useless, as it reports warnings related to C pragmas, which aren't relevant to Inko.
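The conclusion of the last two comments (drop libcalls-shrinkwrap and transform-warning as irrelevant to Inko) amounts to a simple filter over the pass list. This is an illustrative Python sketch, not the compiler's actual (Rust) implementation; the pass names come from the O2 dump above.

```python
# Passes the comments above deem irrelevant to Inko: libcalls-shrinkwrap is
# C-specific, and transform-warning only reports warnings about C pragmas.
IRRELEVANT = {"libcalls-shrinkwrap", "transform-warning"}

def strip_irrelevant(passes):
    """Return the pass list with the irrelevant passes removed."""
    return [p for p in passes if p not in IRRELEVANT]

# A hand-abbreviated excerpt of the O2 function pipeline shown earlier.
pipeline = [
    "sroa<modify-cfg>",
    "early-cse<memssa>",
    "libcalls-shrinkwrap",
    "instcombine<max-iterations=1000;no-use-loop-info>",
    "transform-warning",
]

print(",".join(strip_irrelevant(pipeline)))
```

The same filtering could be applied to the full `default<O2>` string to derive a starting pipeline for Inko's profiles.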
It seems the impact of passes on timings isn't as concentrated as I thought. That is, I thought maybe a few passes were responsible for most of the compile time spent in LLVM, but this isn't the case. Instead, it comes down to roughly the following:
What this means is that we can't just disable a bunch of redundant passes in order to improve compile times, as the passes that dominate the time also happen to be important ones.
For future reference, here are the timings for Inko's standard library tests: …
This enables a set of LLVM optimization passes for the "balanced" and "aggressive" profiles. Both are based on the "default<O2>" pipeline, with the removal of some irrelevant passes. In the case of the "balanced" profile we also remove some additional passes that aren't likely to be useful in most cases. This fixes #595. Changelog: added
Some additional reading about the …
Commit 84830ab implements a list of passes to run when applying optimizations. This list is roughly similar to … While this will increase compile times a fair bit, even for …
Right now the only optimisation pass we enable is the mem2reg pass, because that's pretty much a requirement for non-insane machine code. We deliberately don't use the O2/O3 options as they enable far too many optimisation passes, and don't give you the ability to opt-out of some of them (Swift takes a similar approach).
We should start collecting a list of what passes are worth enabling, and ideally what the compile time cost is versus the runtime improvement. The end goal is to basically enable the passes that give a decent amount of runtime performance improvements, but without slowing down compile times too much.