Figure out which LLVM optimisation passes are worth enabling #595

Closed
yorickpeterse opened this issue Jul 19, 2023 · 15 comments
Labels
accepting contributions Issues that are suitable to be worked on by anybody, not just maintainers compiler Changes related to the compiler

@yorickpeterse
Collaborator

Right now the only optimisation pass we enable is the mem2reg pass, because that's pretty much a requirement for non-insane machine code. We deliberately don't use the O2/O3 options as they enable far too many optimisation passes and don't give you the ability to opt out of some of them (Swift takes a similar approach).

We should start collecting a list of what passes are worth enabling, and ideally what the compile time cost is versus the runtime improvement. The end goal is to basically enable the passes that give a decent amount of runtime performance improvements, but without slowing down compile times too much.
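One way to make "compile time cost versus runtime improvement" concrete is to score each candidate pass by the runtime it saves per second of extra compile time, relative to a baseline that only runs mem2reg. The sketch below is hypothetical: the `Measurement` type, its fields and the ratio used for ranking are assumptions for illustration, and the numbers would have to come from real benchmarks.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Hypothetical timings for one pipeline configuration, in seconds."""

    name: str
    compile_time: float  # total compile time with the pass enabled
    run_time: float  # benchmark run time with the pass enabled


def rank_passes(baseline: Measurement, candidates: list[Measurement]) -> list[tuple[str, float]]:
    """Rank passes by runtime saved per second of extra compile time.

    A pass that saves run time without adding compile time scores infinity;
    a pass that adds no compile time and saves nothing scores <= 0.
    """
    scores = []
    for m in candidates:
        saved = baseline.run_time - m.run_time
        cost = m.compile_time - baseline.compile_time
        if cost <= 0:
            score = float("inf") if saved > 0 else saved
        else:
            score = saved / cost
        scores.append((m.name, score))
    # Highest score first: the most runtime gained per second of compiling.
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Passes interact (for example, one pass can expose opportunities for another), so measuring them one at a time like this is only a first approximation.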

@yorickpeterse yorickpeterse added accepting contributions Issues that are suitable to be worked on by anybody, not just maintainers compiler Changes related to the compiler labels Jul 19, 2023
@yorickpeterse
Collaborator Author

From jinyus/related_post_gen#440 (comment): using OptimizationLevel::Aggressive can have a big impact on performance compared to None. In itself this isn't surprising, because of course optimizations are beneficial. However, I would like to know (somehow) which optimizations are worth enabling, rather than just enabling something as opaque as -O3.

Perhaps as a starting point we can just set that option when using inko build --aggressive, then figure out which ones to explicitly enable for regular builds.

yorickpeterse added a commit that referenced this issue Nov 17, 2023
When using `inko build --opt=aggressive`, we now set LLVM's optimization
level to "aggressive", which is the equivalent of -O3 for clang. This
gives users the ability to have their code optimized at least somewhat,
provided they're willing to deal with the significant increase in
compile times. For example, Inko's test suite takes about 3 seconds to
compile without optimizations, while taking just under 10 seconds when
using --opt=aggressive.

The option --opt=balanced still doesn't apply optimizations as we've yet
to figure out which ones we want to explicitly opt-in to.

See #595 for more details.

Changelog: performance
@yorickpeterse
Collaborator Author

1a30de9 changes inko build such that --opt=aggressive applies the equivalent of clang's -O3. This significantly increases compile times, but it's better than nothing until we come up with our own list of passes to enable.
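Expressed in LLVM's textual pass pipeline syntax (the syntax `opt -passes=...` accepts), the split described above looks roughly like this. The function name is hypothetical, and the mapping only mirrors what this thread states, assuming mem2reg is still the only pass run outside of `--opt=aggressive`:

```python
def pipeline_for(opt_level: str) -> str:
    """Return an LLVM pass pipeline string for an `inko build --opt=...` value.

    Hypothetical mapping that mirrors the state described in this issue.
    """
    pipelines = {
        # No curated list of passes yet, so only promote allocas to registers.
        "balanced": "mem2reg",
        # The same pipeline clang uses for -O3.
        "aggressive": "default<O3>",
    }
    try:
        return pipelines[opt_level]
    except KeyError:
        raise ValueError(f"unsupported optimization level: {opt_level!r}") from None
```

Passing `default<O3>` to `opt -passes=` also reproduces the aggressive pipeline outside the compiler, which is handy for inspecting it with `-print-pipeline-passes`.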

@yorickpeterse yorickpeterse modified the milestones: 0.18.0, 0.19.0 Oct 22, 2024
@yorickpeterse
Collaborator Author

At least the following passes are worth looking into more, based on playing around with them to see what effect they have:

  • instcombine
  • gvn
  • sroa (gets rid of redundant alloca instructions and their loads/stores)
  • simplifycfg (simplifies the CFG, mostly useful for debugging I think)
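To see what each of these passes does in isolation, one could run them individually and together over the IR the compiler emits, using `opt` with `-time-passes` for per-pass timings. The sketch only builds the command lines; `opt-17` matches the binary used later in this thread, while the input file name is a placeholder.

```python
CANDIDATES = ["sroa", "instcombine", "gvn", "simplifycfg"]


def opt_commands(ir_file: str, opt: str = "opt-17") -> list[list[str]]:
    """Build opt invocations that try each candidate alone and all together.

    mem2reg always runs first, since the compiler already enables it.
    -disable-output skips writing the optimized bitcode, as only the
    -time-passes report is of interest here.
    """
    pipelines = [f"mem2reg,{name}" for name in CANDIDATES]
    pipelines.append(",".join(["mem2reg", *CANDIDATES]))
    return [
        [opt, f"-passes={pipeline}", "-time-passes", "-disable-output", ir_file]
        for pipeline in pipelines
    ]
```

Each command can be handed to `subprocess.run(cmd, check=True, capture_output=True)`; the timing report is printed to standard error.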

@yorickpeterse
Collaborator Author

Worth adding: even with --opt=aggressive, certain methods such as Int.% aren't performing very well by the looks of it. For example, take this snippet (based on https://github.com/bddicken/languages):

import std.env (arguments)
import std.int (Format)
import std.rand (Random)
import std.stdio (Stdout)

class async Main {
  fn async main {
    let out = Stdout.new
    let rand = Random.new
    let n = Int.parse(arguments.get(0), Format.Decimal).get
    let r = rand.int_between(0, 10_000)
    let a = Array.filled(with: 0, times: 10_000)
    let mut i = 0

    while i < 10_000 {
      let mut j = 0

      while j < 100_000 {
        a.set(i, a.get(i) + (j % n))
        j += 1
      }

      a.set(i, a.get(i) + r)
      i += 1
    }

    let _ = out.print(a.get(r).to_string)
  }
}

On my laptop this takes 24 seconds to run, with about 80% of the time being spent in the code of Int.%. Oddly enough, even if I just reduce that to _INKO.int_rem() it still takes more or less the same amount of time.

I'm not sure how on earth this code is that slow, given that Rust does it in about 2.5 seconds.
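For scale: the outer loop runs 10,000 times and the inner loop 100,000 times, so `j % n` is evaluated 10^9 times. Dividing the wall-clock times by that count gives the average cost of one inner-loop iteration:

```python
ITERATIONS = 10_000 * 100_000  # outer loop count times inner loop count


def ns_per_iteration(seconds: float, iterations: int = ITERATIONS) -> float:
    """Average time of one inner-loop iteration, in nanoseconds."""
    return seconds / iterations * 1e9


for label, seconds in [("Inko (laptop)", 24.0), ("Rust", 2.5)]:
    print(f"{label}: {ns_per_iteration(seconds):.1f} ns per iteration")
```

That is roughly 24 ns versus 2.5 ns per iteration; with about 80% of the Inko time spent in Int.%, each remainder costs around 19 ns on average.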

@yorickpeterse
Collaborator Author

Curiously, the above program finishes in only 3.68 seconds on my desktop. Perhaps the Intel CPU on my laptop is just really terrible at this code for some reason?

yorickpeterse added a commit that referenced this issue Nov 29, 2024
Depending on how LLVM decides to optimize things, these attributes may
help improve code generation, though it's difficult to say for certain
how much at this stage.

See #595 for more details.

Changelog: performance
@yorickpeterse
Collaborator Author

yorickpeterse commented Jan 13, 2025

The passes used by LLVM when using O2 (somewhat cleaned up):

annotation2metadata
forceattrs
inferattrs
coro-early
function<eager-inv>(
  lower-expect
  simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
  sroa<modify-cfg>
  early-cse<>
)
openmp-opt
ipsccp
called-value-propagation
globalopt
function<eager-inv>(
  mem2reg
  instcombine<max-iterations=1000;no-use-loop-info>
  simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
)
require<globals-aa>
function(
  invalidate<aa>
)
require<profile-summary>
cgscc(
  devirt<4>(
    inline<only-mandatory>
    inline
    function-attrs<skip-non-recursive>
    openmp-opt-cgscc
    function<eager-inv;no-rerun>(
      sroa<modify-cfg>
      early-cse<memssa>
      speculative-execution
      jump-threading
      correlated-propagation
      simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
      instcombine<max-iterations=1000;no-use-loop-info>
      aggressive-instcombine
      constraint-elimination
      libcalls-shrinkwrap
      tailcallelim
      simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
      reassociate
      loop-mssa(
        loop-instsimplify
        loop-simplifycfg
        licm<no-allowspeculation>
        loop-rotate<header-duplication;no-prepare-for-lto>
        licm<allowspeculation>
        simple-loop-unswitch<no-nontrivial;trivial>
      )
      simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
      instcombine<max-iterations=1000;no-use-loop-info>
      loop(
        loop-idiom
        indvars
        loop-deletion
        loop-unroll-full
      )
      sroa<modify-cfg>
      vector-combine
      mldst-motion<no-split-footer-bb>
      gvn<>
      sccp
      bdce
      instcombine<max-iterations=1000;no-use-loop-info>
      jump-threading
      correlated-propagation
      adce
      memcpyopt
      dse
      move-auto-init
      loop-mssa(
        licm<allowspeculation>
      )
      coro-elide
      simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;hoist-common-insts;sink-common-insts;speculate-blocks;simplify-cond-branch>
      instcombine<max-iterations=1000;no-use-loop-info>
    )
    function-attrs
    function(
      require<should-not-run-function-passes>
    )
    coro-split
  )
)
deadargelim
coro-cleanup
globalopt
globaldce
elim-avail-extern
rpo-function-attrs
recompute-globalsaa
function<eager-inv>(
  float2int
  lower-constant-intrinsics
  loop(
    loop-rotate<header-duplication;no-prepare-for-lto>
    loop-deletion
  )
  loop-distribute
  inject-tli-mappings
  loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>
  loop-load-elim
  instcombine<max-iterations=1000;no-use-loop-info>
  simplifycfg<bonus-inst-threshold=1;forward-switch-cond;switch-range-to-icmp;switch-to-lookup;no-keep-loops;hoist-common-insts;sink-common-insts;speculate-blocks;simplify-cond-branch>
  slp-vectorizer
  vector-combine
  instcombine<max-iterations=1000;no-use-loop-info>
  loop-unroll<O2>
  transform-warning
  sroa<preserve-cfg>
  instcombine<max-iterations=1000;no-use-loop-info>
  loop-mssa(
    licm<allowspeculation>
  )
  alignment-from-assumptions
  loop-sink
  instsimplify
  div-rem-pairs
  tailcallelim
  simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
)
globaldce
constmerge
cg-profile
rel-lookup-table-converter
function(
  annotation-remarks
)
verify
BitcodeWriterPass

The command I used for this:

opt-17 -passes='default<O2>' -print-pipeline-passes < /dev/null 2>/dev/null
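The raw output of `-print-pipeline-passes` is a single comma-separated line. Pass parameters appear in `<...>`, and nested pipelines are wrapped in adaptors such as `function(...)`, `cgscc(...)` or `loop-mssa(...)`. To compare pipelines, it helps to flatten that into a plain list of pass names. The sketch below is minimal: its adaptor set only covers the adaptors used above, and parameters are discarded, so `require<globals-aa>` becomes just `require`.

```python
# Adaptors that wrap nested pipelines in the O1/O2 output above.
ADAPTORS = {"function", "cgscc", "devirt", "loop", "loop-mssa"}


def flatten_pipeline(text: str) -> list[str]:
    """Return leaf pass names from a textual LLVM pass pipeline.

    Parameters in <...> are dropped, and adaptors such as function(...)
    are not reported themselves, only the passes nested inside them.
    """
    passes: list[str] = []
    name: list[str] = []
    depth = 0  # nesting depth inside a <...> parameter list
    for ch in text:
        if depth > 0:
            if ch == "<":
                depth += 1
            elif ch == ">":
                depth -= 1
            continue
        if ch == "<":
            depth = 1
        elif ch in ",()" or ch.isspace():
            word = "".join(name)
            # A name directly followed by "(" is an adaptor, not a pass.
            if word and not (ch == "(" and word in ADAPTORS):
                passes.append(word)
            name = []
        else:
            name.append(ch)
    if name:
        passes.append("".join(name))
    return passes
```

Because whitespace also separates names, this works on the raw single-line output as well as on the indented form shown above.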

And for O1:

annotation2metadata
forceattrs
inferattrs
coro-early
function<eager-inv>(
  lower-expect
  simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
  sroa<modify-cfg>
  early-cse<>
)
openmp-opt
ipsccp
called-value-propagation
globalopt
function<eager-inv>(
  mem2reg
  instcombine<max-iterations=1000;no-use-loop-info>
  simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
)
require<globals-aa>
function(
  invalidate<aa>
)
require<profile-summary>
cgscc(
  devirt<4>(
    inline<only-mandatory>
    inline
    function-attrs<skip-non-recursive>
    function<eager-inv;no-rerun>(
      sroa<modify-cfg>
      early-cse<memssa>
      simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
      instcombine<max-iterations=1000;no-use-loop-info>
      libcalls-shrinkwrap
      simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
      reassociate
      loop-mssa(
        loop-instsimplify
        loop-simplifycfg
        licm<no-allowspeculation>
        loop-rotate<header-duplication;no-prepare-for-lto>
        licm<allowspeculation>
        simple-loop-unswitch<no-nontrivial;trivial>
      )
      simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
      instcombine<max-iterations=1000;no-use-loop-info>
      loop(
        loop-idiom
        indvars
        loop-deletion
        loop-unroll-full
      )
      sroa<modify-cfg>
      memcpyopt
      sccp
      bdce
      instcombine<max-iterations=1000;no-use-loop-info>
      coro-elide
      adce
      simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
      instcombine<max-iterations=1000;no-use-loop-info>
    )
    function-attrs
    function(
      require<should-not-run-function-passes>
    )
    coro-split
  )
)
deadargelim
coro-cleanup
globalopt
globaldce
elim-avail-extern
rpo-function-attrs
recompute-globalsaa
function<eager-inv>(
  float2int
  lower-constant-intrinsics
  loop(
    loop-rotate<header-duplication;no-prepare-for-lto>
    loop-deletion
  )
  loop-distribute
  inject-tli-mappings
  loop-vectorize<no-interleave-forced-only;vectorize-forced-only;>
  loop-load-elim
  instcombine<max-iterations=1000;no-use-loop-info>
  simplifycfg<bonus-inst-threshold=1;forward-switch-cond;switch-range-to-icmp;switch-to-lookup;no-keep-loops;hoist-common-insts;sink-common-insts;speculate-blocks;simplify-cond-branch>
  vector-combine
  instcombine<max-iterations=1000;no-use-loop-info>
  loop-unroll<O1>
  transform-warning
  sroa<preserve-cfg>
  instcombine<max-iterations=1000;no-use-loop-info>
  loop-mssa(
    licm<allowspeculation>
  )
  alignment-from-assumptions
  loop-sink
  instsimplify
  div-rem-pairs
  tailcallelim
  simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
)
globaldce
constmerge
cg-profile
rel-lookup-table-converter
function(
  annotation-remarks
)
verify
BitcodeWriterPass

The diff:

diff --git a/tmp/o1.txt b/tmp/o2.txt
index a81189c..ba359e6 100644
--- a/tmp/o1.txt
+++ b/tmp/o2.txt
@@ -27,12 +27,19 @@ cgscc(
     inline<only-mandatory>
     inline
     function-attrs<skip-non-recursive>
+    openmp-opt-cgscc
     function<eager-inv;no-rerun>(
       sroa<modify-cfg>
       early-cse<memssa>
+      speculative-execution
+      jump-threading
+      correlated-propagation
       simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
       instcombine<max-iterations=1000;no-use-loop-info>
+      aggressive-instcombine
+      constraint-elimination
       libcalls-shrinkwrap
+      tailcallelim
       simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
       reassociate
       loop-mssa(
@@ -52,13 +59,23 @@ cgscc(
         loop-unroll-full
       )
       sroa<modify-cfg>
-      memcpyopt
+      vector-combine
+      mldst-motion<no-split-footer-bb>
+      gvn<>
       sccp
       bdce
       instcombine<max-iterations=1000;no-use-loop-info>
-      coro-elide
+      jump-threading
+      correlated-propagation
       adce
-      simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>
+      memcpyopt
+      dse
+      move-auto-init
+      loop-mssa(
+        licm<allowspeculation>
+      )
+      coro-elide
+      simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;hoist-common-insts;sink-common-insts;speculate-blocks;simplify-cond-branch>
       instcombine<max-iterations=1000;no-use-loop-info>
     )
     function-attrs
@@ -84,13 +101,14 @@ function<eager-inv>(
   )
   loop-distribute
   inject-tli-mappings
-  loop-vectorize<no-interleave-forced-only;vectorize-forced-only;>
+  loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>
   loop-load-elim
   instcombine<max-iterations=1000;no-use-loop-info>
   simplifycfg<bonus-inst-threshold=1;forward-switch-cond;switch-range-to-icmp;switch-to-lookup;no-keep-loops;hoist-common-insts;sink-common-insts;speculate-blocks;simplify-cond-branch>
+  slp-vectorizer
   vector-combine
   instcombine<max-iterations=1000;no-use-loop-info>
-  loop-unroll<O1>
+  loop-unroll<O2>
   transform-warning
   sroa<preserve-cfg>
   instcombine<max-iterations=1000;no-use-loop-info>
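The same comparison can be made on pass names alone, by counting how often each pass occurs in the flattened O1 and O2 pipelines. This sketch uses multiset difference (`collections.Counter` subtraction keeps only positive counts), so passes that merely moved, such as memcpyopt and coro-elide here, cancel out:

```python
from collections import Counter


def pass_delta(before: list[str], after: list[str]) -> tuple[Counter, Counter]:
    """Return (added, removed) pass-name counts going from `before` to `after`.

    Counter subtraction drops zero and negative counts, so a pass occurring
    the same number of times in both pipelines (even at a different
    position) shows up in neither result.
    """
    old, new = Counter(before), Counter(after)
    return new - old, old - new
```

Parameter-only changes, such as `vectorize-forced-only` on loop-vectorize or `loop-unroll<O1>` versus `loop-unroll<O2>`, are invisible to a name-based comparison; the textual diff catches those.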

@yorickpeterse
Collaborator Author

yorickpeterse commented Jan 13, 2025

@yorickpeterse yorickpeterse modified the milestones: 0.19.0, 0.18.0 Jan 13, 2025
@yorickpeterse yorickpeterse self-assigned this Jan 13, 2025
@yorickpeterse
Collaborator Author

Some of the passes I looked at:

  • annotation2metadata: not relevant
  • forceattrs: not relevant
  • inferattrs: seems geared towards C based on this file.
    It does also handle some C functions that we use, such as memcpy and
    realloc. I'm not able to measure any compile time impact of using this pass,
    so we should just enable it.
  • coro-early: not relevant as we don't have coroutines
  • lower-expect: output might be used by other passes, enable it
  • simplifycfg: applies a bunch of optimizations, seems important so we should enable this
    • This pass runs multiple times when using O2, using different arguments
      (e.g. turning switches into lookup tables isn't done until the last run).
      Perhaps we can just run it once towards the end?
  • sroa: important, enable
  • early-cse: enable
  • openmp-opt: not relevant
  • ipsccp: enable
  • called-value-propagation: not sure how relevant this is. Also enabled for O1, so it can't be too bad.
  • globalopt: probably useful since we don't allow pointers to globals
  • instcombine: enable, increases compile times by about 20% (at least if no other passes are also enabled)
  • require<globals-aa>: enable, doesn't seem to add overhead
  • invalidate<aa>: no idea
  • profile-summary: used for PGO, which we don't support at this time and probably won't for quite a while
  • cgscc: seems important, enable based on what O2 uses
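A hypothetical way to turn the "enable" decisions above into a pipeline string: module-level passes are listed as-is, and consecutive function-level passes are grouped into one `function(...)` adaptor. The `build_pipeline` helper and the module/function classification fed to it are assumptions for illustration, not the compiler's code, and the cgscc part is left out.

```python
def build_pipeline(stages: list[tuple[str, str]]) -> str:
    """Join (pass, level) pairs into an LLVM textual pass pipeline.

    `level` is either "module" or "function"; runs of function passes are
    wrapped in a single function(...) adaptor, keeping the given order.
    """
    parts: list[str] = []
    group: list[str] = []
    for name, level in stages:
        if level == "function":
            group.append(name)
            continue
        if level != "module":
            raise ValueError(f"unknown level {level!r} for pass {name!r}")
        if group:
            parts.append(f"function({','.join(group)})")
            group = []
        parts.append(name)
    if group:
        parts.append(f"function({','.join(group)})")
    return ",".join(parts)


# Passes marked "enable" (plus mem2reg) in the order O2 first runs them.
balanced = build_pipeline([
    ("inferattrs", "module"),
    ("lower-expect", "function"),
    ("simplifycfg", "function"),
    ("sroa", "function"),
    ("early-cse", "function"),
    ("ipsccp", "module"),
    ("globalopt", "module"),
    ("mem2reg", "function"),
    ("instcombine", "function"),
    ("require<globals-aa>", "module"),
])
```

Here `balanced` becomes `inferattrs,function(lower-expect,simplifycfg,sroa,early-cse),ipsccp,globalopt,function(mem2reg,instcombine),require<globals-aa>`, which can be passed to `opt -passes=` as-is.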

@yorickpeterse
Collaborator Author

LLVM uses a devirt pass manager/thing that basically runs a set of passes N times as part of the cgscc pass manager/thing. The idea is to run a bunch of passes each time an indirect call is turned into a direct call, with an upper bound.

Based on this blog post I'm guessing this is geared towards C++, though it's not entirely clear if the blog post talks about the same code specifically.

Given that we aggressively split code into separate modules to allow parallel compilation, and that Inko's use of dynamic dispatch is a little different from C++, I suspect this pass isn't actually useful/able to do much.

I also tried to look at the disassembly of OpenFlow to see what impact this pass (or the lack of it) has, but the resulting assembly and LLVM IR is just too noisy to compare in a meaningful way.

Based on this, I think we should skip this pass for --opt=balanced.
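The repetition that `devirt<4>` adds can be modelled roughly as follows: run the CGSCC pipeline once, then rerun it as long as the previous run turned an indirect call into a direct one, up to the given limit. This is a toy model, with an integer standing in for the SCC's indirect calls and a plain function standing in for the pipeline; it is not LLVM's implementation, and LLVM's exact iteration accounting may differ.

```python
from typing import Callable


def run_devirt_repeated(
    indirect_calls: int,
    pipeline: Callable[[int], int],
    max_iterations: int,
) -> tuple[int, int]:
    """Toy model of a bounded devirtualization loop.

    `pipeline` takes the number of indirect calls and returns how many remain.
    Returns (number of pipeline runs, remaining indirect calls).
    """
    runs = 0
    while True:
        before = indirect_calls
        indirect_calls = pipeline(indirect_calls)
        runs += 1
        # Stop once nothing was devirtualized or the rerun budget is used up.
        if indirect_calls >= before or runs > max_iterations:
            return runs, indirect_calls
```

In this model, when the pipeline devirtualizes nothing, the wrapper stops after the first run, so its extra cost only shows up when calls actually become direct.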

@yorickpeterse
Collaborator Author

libcalls-shrinkwrap seems to be very specific to C, and even then I'm not sure what benefit it actually brings. We should just ditch this one for --opt=balanced. Some more details on what "shrink wrapping" is can be found here.

@yorickpeterse
Collaborator Author

transform-warning seems useless as it reports warnings related to C pragmas, which isn't relevant to Inko.

@yorickpeterse
Collaborator Author

It seems the impact of passes on timings isn't as concentrated as I thought. That is, I thought maybe a few passes were responsible for most of the compile time spent in LLVM, but this isn't the case. Instead, it comes down to roughly the following:

  • The more passes we run, the less machine code needs to be generated and thus less time is spent in the code generator
  • The fewer passes we run, the more time is spent in generating machine code (i.e. the inverse)
  • For Inko's standard library, about 35% of the time is spent in the InstCombinePass pass, followed by 10% in the SROAPass pass, with the remainder being everything else (i.e. every other pass takes up 1-5%)
  • When generating machine code, about 45% of the time is spent in X86 DAG->DAG Instruction Selection

What this means is that we can't just disable a bunch of redundant passes in order to improve compile times, as the passes that dominate the time also happen to be important ones.

@yorickpeterse
Collaborator Author

For future reference, here are the timings for Inko's standard library tests:

Timings
===-------------------------------------------------------------------------===
                          Pass execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 32.3530 seconds (32.4957 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  10.1155 ( 33.0%)   0.3896 ( 22.6%)  10.5050 ( 32.5%)  10.5517 ( 32.5%)  InstCombinePass
   3.2377 ( 10.6%)   0.1953 ( 11.3%)   3.4330 ( 10.6%)   3.4495 ( 10.6%)  SROAPass
   1.9165 (  6.3%)   0.1683 (  9.7%)   2.0847 (  6.4%)   2.0933 (  6.4%)  IPSCCPPass
   1.8345 (  6.0%)   0.1166 (  6.8%)   1.9511 (  6.0%)   1.9605 (  6.0%)  SLPVectorizerPass
   1.5277 (  5.0%)   0.0848 (  4.9%)   1.6125 (  5.0%)   1.6198 (  5.0%)  EarlyCSEPass
   1.5254 (  5.0%)   0.0549 (  3.2%)   1.5803 (  4.9%)   1.5871 (  4.9%)  DSEPass
   1.2514 (  4.1%)   0.0436 (  2.5%)   1.2950 (  4.0%)   1.3010 (  4.0%)  GVNPass
   1.1480 (  3.7%)   0.0987 (  5.7%)   1.2466 (  3.9%)   1.2516 (  3.9%)  CalledValuePropagationPass
   1.0705 (  3.5%)   0.0647 (  3.7%)   1.1353 (  3.5%)   1.1392 (  3.5%)  SimplifyCFGPass
   0.9770 (  3.2%)   0.1169 (  6.8%)   1.0938 (  3.4%)   1.0993 (  3.4%)  InlinerPass
   0.6547 (  2.1%)   0.0000 (  0.0%)   0.6547 (  2.0%)   0.6572 (  2.0%)  GlobalOptPass
   0.4710 (  1.5%)   0.0255 (  1.5%)   0.4965 (  1.5%)   0.4983 (  1.5%)  JumpThreadingPass
   0.4318 (  1.4%)   0.0265 (  1.5%)   0.4584 (  1.4%)   0.4599 (  1.4%)  CorrelatedValuePropagationPass
   0.2944 (  1.0%)   0.0337 (  2.0%)   0.3281 (  1.0%)   0.3298 (  1.0%)  PostOrderFunctionAttrsPass
   0.2842 (  0.9%)   0.0106 (  0.6%)   0.2949 (  0.9%)   0.2960 (  0.9%)  SCCPPass
   0.2321 (  0.8%)   0.0093 (  0.5%)   0.2413 (  0.7%)   0.2424 (  0.7%)  MemCpyOptPass
   0.2286 (  0.7%)   0.0117 (  0.7%)   0.2402 (  0.7%)   0.2411 (  0.7%)  ReassociatePass
   0.2240 (  0.7%)   0.0138 (  0.8%)   0.2377 (  0.7%)   0.2388 (  0.7%)  ADCEPass
   0.2234 (  0.7%)   0.0078 (  0.5%)   0.2312 (  0.7%)   0.2322 (  0.7%)  LICMPass
   0.2156 (  0.7%)   0.0099 (  0.6%)   0.2255 (  0.7%)   0.2266 (  0.7%)  BDCEPass
   0.1704 (  0.6%)   0.0356 (  2.1%)   0.2060 (  0.6%)   0.2073 (  0.6%)  LoopDistributePass
   0.1801 (  0.6%)   0.0170 (  1.0%)   0.1970 (  0.6%)   0.1978 (  0.6%)  ConstraintEliminationPass
   0.1668 (  0.5%)   0.0108 (  0.6%)   0.1776 (  0.5%)   0.1785 (  0.5%)  LCSSAPass
   0.1444 (  0.5%)   0.0214 (  1.2%)   0.1659 (  0.5%)   0.1670 (  0.5%)  Float2IntPass
   0.1349 (  0.4%)   0.0210 (  1.2%)   0.1559 (  0.5%)   0.1573 (  0.5%)  LoopSimplifyPass
   0.1246 (  0.4%)   0.0143 (  0.8%)   0.1389 (  0.4%)   0.1391 (  0.4%)  InferAlignmentPass
   0.1222 (  0.4%)   0.0090 (  0.5%)   0.1312 (  0.4%)   0.1313 (  0.4%)  TailCallElimPass
   0.1231 (  0.4%)   0.0067 (  0.4%)   0.1297 (  0.4%)   0.1299 (  0.4%)  InstSimplifyPass
   0.1144 (  0.4%)   0.0149 (  0.9%)   0.1293 (  0.4%)   0.1299 (  0.4%)  ReversePostOrderFunctionAttrsPass
   0.1247 (  0.4%)   0.0000 (  0.0%)   0.1247 (  0.4%)   0.1254 (  0.4%)  RecomputeGlobalsAAPass
   0.1213 (  0.4%)   0.0000 (  0.0%)   0.1213 (  0.4%)   0.1217 (  0.4%)  RequireAnalysisPass<llvm::GlobalsAA, llvm::Module, llvm::AnalysisManager<Module>>
   0.1148 (  0.4%)   0.0000 (  0.0%)   0.1148 (  0.4%)   0.1153 (  0.4%)  DeadArgumentEliminationPass
   0.1067 (  0.3%)   0.0056 (  0.3%)   0.1123 (  0.3%)   0.1128 (  0.3%)  AggressiveInstCombinePass
   0.1028 (  0.3%)   0.0057 (  0.3%)   0.1086 (  0.3%)   0.1090 (  0.3%)  LoopUnrollPass
   0.0918 (  0.3%)   0.0000 (  0.0%)   0.0918 (  0.3%)   0.0922 (  0.3%)  GlobalDCEPass
   0.0819 (  0.3%)   0.0000 (  0.0%)   0.0819 (  0.3%)   0.0822 (  0.3%)  AlwaysInlinerPass
   0.0679 (  0.2%)   0.0025 (  0.1%)   0.0704 (  0.2%)   0.0707 (  0.2%)  IndVarSimplifyPass
   0.0644 (  0.2%)   0.0059 (  0.3%)   0.0703 (  0.2%)   0.0705 (  0.2%)  VectorCombinePass
   0.0623 (  0.2%)   0.0033 (  0.2%)   0.0656 (  0.2%)   0.0658 (  0.2%)  LoopRotatePass
   0.0570 (  0.2%)   0.0051 (  0.3%)   0.0622 (  0.2%)   0.0623 (  0.2%)  LoopDeletionPass
   0.0411 (  0.1%)   0.0030 (  0.2%)   0.0441 (  0.1%)   0.0443 (  0.1%)  LowerExpectIntrinsicPass
   0.0345 (  0.1%)   0.0014 (  0.1%)   0.0359 (  0.1%)   0.0360 (  0.1%)  LoopIdiomRecognizePass
   0.0286 (  0.1%)   0.0042 (  0.2%)   0.0328 (  0.1%)   0.0328 (  0.1%)  LowerConstantIntrinsicsPass
   0.0286 (  0.1%)   0.0013 (  0.1%)   0.0299 (  0.1%)   0.0299 (  0.1%)  LoopFullUnrollPass
   0.0270 (  0.1%)   0.0018 (  0.1%)   0.0287 (  0.1%)   0.0291 (  0.1%)  PromotePass
   0.0208 (  0.1%)   0.0061 (  0.4%)   0.0269 (  0.1%)   0.0269 (  0.1%)  InvalidateAnalysisPass<llvm::ShouldNotRunFunctionPassesAnalysis>
   0.0116 (  0.0%)   0.0138 (  0.8%)   0.0253 (  0.1%)   0.0254 (  0.1%)  AnnotationRemarksPass
   0.0243 (  0.1%)   0.0000 (  0.0%)   0.0243 (  0.1%)   0.0244 (  0.1%)  ConstantMergePass
   0.0212 (  0.1%)   0.0030 (  0.2%)   0.0241 (  0.1%)   0.0244 (  0.1%)  AlignmentFromAssumptionsPass
   0.0231 (  0.1%)   0.0000 (  0.0%)   0.0231 (  0.1%)   0.0232 (  0.1%)  CGProfilePass
   0.0193 (  0.1%)   0.0034 (  0.2%)   0.0227 (  0.1%)   0.0230 (  0.1%)  RequireAnalysisPass<llvm::ShouldNotRunFunctionPassesAnalysis, llvm::Function, llvm::AnalysisManager<Function>>
   0.0187 (  0.1%)   0.0029 (  0.2%)   0.0216 (  0.1%)   0.0217 (  0.1%)  LoopVectorizePass
   0.0186 (  0.1%)   0.0029 (  0.2%)   0.0215 (  0.1%)   0.0215 (  0.1%)  InjectTLIMappings
   0.0188 (  0.1%)   0.0009 (  0.1%)   0.0198 (  0.1%)   0.0198 (  0.1%)  LoopInstSimplifyPass
   0.0164 (  0.1%)   0.0016 (  0.1%)   0.0180 (  0.1%)   0.0180 (  0.1%)  LibCallsShrinkWrapPass
   0.0165 (  0.1%)   0.0000 (  0.0%)   0.0165 (  0.1%)   0.0166 (  0.1%)  AssignmentTrackingPass
   0.0114 (  0.0%)   0.0048 (  0.3%)   0.0162 (  0.0%)   0.0161 (  0.0%)  InvalidateAnalysisPass<llvm::AAManager>
   0.0139 (  0.0%)   0.0015 (  0.1%)   0.0154 (  0.0%)   0.0155 (  0.0%)  DivRemPairsPass
   0.0103 (  0.0%)   0.0013 (  0.1%)   0.0116 (  0.0%)   0.0118 (  0.0%)  MoveAutoInitPass
   0.0091 (  0.0%)   0.0017 (  0.1%)   0.0108 (  0.0%)   0.0108 (  0.0%)  LoopLoadEliminationPass
   0.0090 (  0.0%)   0.0013 (  0.1%)   0.0103 (  0.0%)   0.0104 (  0.0%)  MergedLoadStoreMotionPass
   0.0080 (  0.0%)   0.0013 (  0.1%)   0.0093 (  0.0%)   0.0094 (  0.0%)  CoroSplitPass
   0.0076 (  0.0%)   0.0014 (  0.1%)   0.0089 (  0.0%)   0.0091 (  0.0%)  SpeculativeExecutionPass
   0.0074 (  0.0%)   0.0013 (  0.1%)   0.0088 (  0.0%)   0.0088 (  0.0%)  OpenMPOptCGSCCPass
   0.0074 (  0.0%)   0.0014 (  0.1%)   0.0087 (  0.0%)   0.0088 (  0.0%)  WarnMissedTransformationsPass
   0.0071 (  0.0%)   0.0012 (  0.1%)   0.0084 (  0.0%)   0.0085 (  0.0%)  CoroElidePass
   0.0073 (  0.0%)   0.0005 (  0.0%)   0.0078 (  0.0%)   0.0082 (  0.0%)  EntryExitInstrumenterPass
   0.0067 (  0.0%)   0.0013 (  0.1%)   0.0080 (  0.0%)   0.0080 (  0.0%)  LoopSinkPass
   0.0066 (  0.0%)   0.0004 (  0.0%)   0.0069 (  0.0%)   0.0070 (  0.0%)  LoopSimplifyCFGPass
   0.0043 (  0.0%)   0.0003 (  0.0%)   0.0045 (  0.0%)   0.0046 (  0.0%)  SimpleLoopUnswitchPass
   0.0006 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  InferFunctionAttrsPass
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  EliminateAvailableExternallyPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  OpenMPOptPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  CoroEarlyPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  RelLookupTableConverterPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Annotation2MetadataPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  CoroCleanupPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  RequireAnalysisPass<llvm::ProfileSummaryAnalysis, llvm::Module, llvm::AnalysisManager<Module>>
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  ForceFunctionAttrsPass
  30.6262 (100.0%)   1.7268 (100.0%)  32.3530 (100.0%)  32.4957 (100.0%)  Total

===-------------------------------------------------------------------------===
                        Analysis execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 1.8587 seconds (1.8610 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.4144 ( 24.6%)   0.0267 ( 15.3%)   0.4410 ( 23.7%)   0.4426 ( 23.8%)  MemorySSAAnalysis
   0.3073 ( 18.2%)   0.0311 ( 17.9%)   0.3384 ( 18.2%)   0.3390 ( 18.2%)  DominatorTreeAnalysis
   0.2154 ( 12.8%)   0.0000 (  0.0%)   0.2154 ( 11.6%)   0.2164 ( 11.6%)  CallGraphAnalysis
   0.1220 (  7.2%)   0.0178 ( 10.2%)   0.1398 (  7.5%)   0.1394 (  7.5%)  AAManager
   0.0994 (  5.9%)   0.0118 (  6.8%)   0.1112 (  6.0%)   0.1109 (  6.0%)  LoopAnalysis
   0.0855 (  5.1%)   0.0092 (  5.3%)   0.0947 (  5.1%)   0.0948 (  5.1%)  PostDominatorTreeAnalysis
   0.0699 (  4.2%)   0.0101 (  5.8%)   0.0800 (  4.3%)   0.0796 (  4.3%)  ScalarEvolutionAnalysis
   0.0659 (  3.9%)   0.0101 (  5.8%)   0.0760 (  4.1%)   0.0761 (  4.1%)  BranchProbabilityAnalysis
   0.0435 (  2.6%)   0.0065 (  3.7%)   0.0500 (  2.7%)   0.0503 (  2.7%)  BlockFrequencyAnalysis
   0.0341 (  2.0%)   0.0055 (  3.1%)   0.0395 (  2.1%)   0.0394 (  2.1%)  BasicAA
   0.0174 (  1.0%)   0.0094 (  5.4%)   0.0268 (  1.4%)   0.0269 (  1.4%)  TargetLibraryAnalysis
   0.0214 (  1.3%)   0.0041 (  2.4%)   0.0255 (  1.4%)   0.0255 (  1.4%)  FunctionAnalysisManagerCGSCCProxy
   0.0200 (  1.2%)   0.0032 (  1.8%)   0.0232 (  1.2%)   0.0232 (  1.2%)  AssumptionAnalysis
   0.0169 (  1.0%)   0.0029 (  1.7%)   0.0199 (  1.1%)   0.0192 (  1.0%)  TypeBasedAA
   0.0154 (  0.9%)   0.0032 (  1.8%)   0.0186 (  1.0%)   0.0186 (  1.0%)  LoopAccessAnalysis
   0.0163 (  1.0%)   0.0021 (  1.2%)   0.0184 (  1.0%)   0.0184 (  1.0%)  TargetIRAnalysis
   0.0156 (  0.9%)   0.0024 (  1.4%)   0.0179 (  1.0%)   0.0181 (  1.0%)  OptimizationRemarkEmitterAnalysis
   0.0143 (  0.8%)   0.0026 (  1.5%)   0.0169 (  0.9%)   0.0171 (  0.9%)  DemandedBitsAnalysis
   0.0135 (  0.8%)   0.0024 (  1.4%)   0.0159 (  0.9%)   0.0161 (  0.9%)  LazyValueAnalysis
   0.0142 (  0.8%)   0.0020 (  1.2%)   0.0162 (  0.9%)   0.0160 (  0.9%)  OuterAnalysisManagerProxy<ModuleAnalysisManager, Function>
   0.0151 (  0.9%)   0.0000 (  0.0%)   0.0151 (  0.8%)   0.0152 (  0.8%)  GlobalsAA
   0.0110 (  0.7%)   0.0016 (  0.9%)   0.0127 (  0.7%)   0.0126 (  0.7%)  ScopedNoAliasAA
   0.0079 (  0.5%)   0.0047 (  2.7%)   0.0125 (  0.7%)   0.0124 (  0.7%)  LazyCallGraphAnalysis
   0.0096 (  0.6%)   0.0016 (  0.9%)   0.0112 (  0.6%)   0.0112 (  0.6%)  MemoryDependenceAnalysis
   0.0090 (  0.5%)   0.0015 (  0.8%)   0.0105 (  0.6%)   0.0104 (  0.6%)  OuterAnalysisManagerProxy<ModuleAnalysisManager, LazyCallGraph::SCC, LazyCallGraph &>
   0.0057 (  0.3%)   0.0010 (  0.6%)   0.0067 (  0.4%)   0.0067 (  0.4%)  ShouldNotRunFunctionPassesAnalysis
   0.0033 (  0.2%)   0.0003 (  0.2%)   0.0035 (  0.2%)   0.0036 (  0.2%)  InnerAnalysisManagerProxy<LoopAnalysisManager, Function>
   0.0012 (  0.1%)   0.0001 (  0.0%)   0.0012 (  0.1%)   0.0012 (  0.1%)  OuterAnalysisManagerProxy<FunctionAnalysisManager, Loop, LoopStandardAnalysisResults &>
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  ShouldRunExtraSimpleLoopUnswitch
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  InnerAnalysisManagerProxy<FunctionAnalysisManager, Module>
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  InnerAnalysisManagerProxy<CGSCCAnalysisManager, Module>
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  ProfileSummaryAnalysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  InlineAdvisorAnalysis
   1.6850 (100.0%)   0.1738 (100.0%)   1.8587 (100.0%)   1.8610 (100.0%)  Total

===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  56.7659 (100.0%)   6.8341 (100.0%)  63.6001 (100.0%)  63.9357 (100.0%)  Code Generation Time
  56.7659 (100.0%)   6.8341 (100.0%)  63.6001 (100.0%)  63.9357 (100.0%)  Total

===-------------------------------------------------------------------------===
                              Register Allocation
===-------------------------------------------------------------------------===
  Total Execution Time: 1.1642 seconds (1.1694 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.6182 ( 56.7%)   0.0417 ( 57.0%)   0.6599 ( 56.7%)   0.6628 ( 56.7%)  Global Splitting
   0.1644 ( 15.1%)   0.0114 ( 15.6%)   0.1758 ( 15.1%)   0.1777 ( 15.2%)  Evict
   0.1460 ( 13.4%)   0.0044 (  6.0%)   0.1504 ( 12.9%)   0.1508 ( 12.9%)  Local Splitting
   0.1368 ( 12.5%)   0.0111 ( 15.1%)   0.1479 ( 12.7%)   0.1474 ( 12.6%)  Spiller
   0.0256 (  2.4%)   0.0046 (  6.2%)   0.0302 (  2.6%)   0.0306 (  2.6%)  Seed Live Regs
   1.0911 (100.0%)   0.0731 (100.0%)   1.1642 (100.0%)   1.1694 (100.0%)  Total

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 7.8835 seconds (7.9081 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   1.4450 ( 21.4%)   0.2327 ( 20.6%)   1.6777 ( 21.3%)   1.6937 ( 21.4%)  DAG Combining 1
   1.2619 ( 18.7%)   0.2108 ( 18.6%)   1.4727 ( 18.7%)   1.4765 ( 18.7%)  Instruction Selection
   0.8683 ( 12.9%)   0.1277 ( 11.3%)   0.9960 ( 12.6%)   0.9973 ( 12.6%)  DAG Combining 2
   0.8247 ( 12.2%)   0.1445 ( 12.8%)   0.9691 ( 12.3%)   0.9709 ( 12.3%)  Instruction Scheduling
   0.6457 (  9.6%)   0.1189 ( 10.5%)   0.7646 (  9.7%)   0.7640 (  9.7%)  Instruction Creation
   0.5889 (  8.7%)   0.0878 (  7.8%)   0.6767 (  8.6%)   0.6777 (  8.6%)  DAG Combining after legalize types
   0.4701 (  7.0%)   0.0847 (  7.5%)   0.5549 (  7.0%)   0.5537 (  7.0%)  DAG Legalization
   0.4244 (  6.3%)   0.0831 (  7.4%)   0.5076 (  6.4%)   0.5080 (  6.4%)  Type Legalization
   0.1275 (  1.9%)   0.0205 (  1.8%)   0.1480 (  1.9%)   0.1492 (  1.9%)  Vector Legalization
   0.0949 (  1.4%)   0.0194 (  1.7%)   0.1143 (  1.5%)   0.1153 (  1.5%)  Instruction Scheduling Cleanup
   0.0014 (  0.0%)   0.0003 (  0.0%)   0.0018 (  0.0%)   0.0018 (  0.0%)  DAG Combining after legalize vectors
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Type Legalization 2
   6.7530 (100.0%)   1.1306 (100.0%)   7.8835 (100.0%)   7.9081 (100.0%)  Total

===-------------------------------------------------------------------------===
                          Pass execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 26.2099 seconds (26.3309 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   9.5242 ( 42.9%)   1.6887 ( 42.2%)  11.2129 ( 42.8%)  11.2731 ( 42.8%)  X86 DAG->DAG Instruction Selection
   1.9193 (  8.6%)   0.1586 (  4.0%)   2.0779 (  7.9%)   2.0883 (  7.9%)  Greedy Register Allocator #2
   1.1489 (  5.2%)   0.1237 (  3.1%)   1.2726 (  4.9%)   1.2782 (  4.9%)  Live DEBUG_VALUE analysis
   0.9053 (  4.1%)   0.1012 (  2.5%)   1.0064 (  3.8%)   1.0111 (  3.8%)  Machine Instruction Scheduler
   0.5543 (  2.5%)   0.1039 (  2.6%)   0.6581 (  2.5%)   0.6602 (  2.5%)  X86 Assembly Printer
   0.4869 (  2.2%)   0.1382 (  3.5%)   0.6251 (  2.4%)   0.6277 (  2.4%)  CodeGen Prepare
   0.5113 (  2.3%)   0.0795 (  2.0%)   0.5908 (  2.3%)   0.5936 (  2.3%)  Live Variable Analysis
   0.4442 (  2.0%)   0.0451 (  1.1%)   0.4892 (  1.9%)   0.4916 (  1.9%)  Register Coalescer
   0.3864 (  1.7%)   0.0492 (  1.2%)   0.4356 (  1.7%)   0.4372 (  1.7%)  ReachingDefAnalysis
   0.3605 (  1.6%)   0.0279 (  0.7%)   0.3884 (  1.5%)   0.3902 (  1.5%)  Control Flow Optimizer
   0.3297 (  1.5%)   0.0372 (  0.9%)   0.3669 (  1.4%)   0.3682 (  1.4%)  Live Interval Analysis
   0.2821 (  1.3%)   0.0457 (  1.1%)   0.3278 (  1.3%)   0.3293 (  1.3%)  Machine Common Subexpression Elimination
   0.1847 (  0.8%)   0.0221 (  0.6%)   0.2068 (  0.8%)   0.2076 (  0.8%)  Machine code sinking
   0.1681 (  0.8%)   0.0245 (  0.6%)   0.1926 (  0.7%)   0.1933 (  0.7%)  Machine Copy Propagation Pass
   0.1549 (  0.7%)   0.0239 (  0.6%)   0.1788 (  0.7%)   0.1796 (  0.7%)  Two-Address instruction pass
   0.1502 (  0.7%)   0.0198 (  0.5%)   0.1700 (  0.6%)   0.1707 (  0.6%)  Virtual Register Rewriter
   0.1527 (  0.7%)   0.0172 (  0.4%)   0.1699 (  0.6%)   0.1706 (  0.6%)  Branch Probability Basic Block Placement
   0.1376 (  0.6%)   0.0273 (  0.7%)   0.1649 (  0.6%)   0.1652 (  0.6%)  Peephole Optimizations
   0.1363 (  0.6%)   0.0270 (  0.7%)   0.1634 (  0.6%)   0.1640 (  0.6%)  Remove dead machine instructions
   0.1200 (  0.5%)   0.0177 (  0.4%)   0.1377 (  0.5%)   0.1382 (  0.5%)  Machine Copy Propagation Pass #2
   0.1092 (  0.5%)   0.0252 (  0.6%)   0.1344 (  0.5%)   0.1346 (  0.5%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0975 (  0.4%)   0.0179 (  0.4%)   0.1154 (  0.4%)   0.1158 (  0.4%)  Live Range Shrink
   0.1008 (  0.5%)   0.0134 (  0.3%)   0.1142 (  0.4%)   0.1148 (  0.4%)  Eliminate PHI nodes for register allocation
   0.0774 (  0.3%)   0.0252 (  0.6%)   0.1026 (  0.4%)   0.1028 (  0.4%)  Constant Hoisting
   0.0860 (  0.4%)   0.0150 (  0.4%)   0.1010 (  0.4%)   0.1014 (  0.4%)  X86 Byte/Word Instruction Fixup
   0.0782 (  0.4%)   0.0163 (  0.4%)   0.0945 (  0.4%)   0.0949 (  0.4%)  Machine InstCombiner
   0.0780 (  0.4%)   0.0151 (  0.4%)   0.0931 (  0.4%)   0.0935 (  0.4%)  Remove dead machine instructions #2
   0.0656 (  0.3%)   0.0270 (  0.7%)   0.0926 (  0.4%)   0.0929 (  0.4%)  Branch Probability Analysis
   0.0778 (  0.4%)   0.0104 (  0.3%)   0.0882 (  0.3%)   0.0886 (  0.3%)  Debug Variable Analysis
   0.0709 (  0.3%)   0.0141 (  0.4%)   0.0849 (  0.3%)   0.0853 (  0.3%)  Machine Late Instructions Cleanup Pass
   0.0539 (  0.2%)   0.0225 (  0.6%)   0.0764 (  0.3%)   0.0766 (  0.3%)  Post-Dominator Tree Construction
   0.0558 (  0.3%)   0.0190 (  0.5%)   0.0747 (  0.3%)   0.0751 (  0.3%)  ObjC ARC contraction
   0.0637 (  0.3%)   0.0097 (  0.2%)   0.0734 (  0.3%)   0.0733 (  0.3%)  Slot index numbering #2
   0.0509 (  0.2%)   0.0215 (  0.5%)   0.0725 (  0.3%)   0.0729 (  0.3%)  Expand large div/rem
   0.0602 (  0.3%)   0.0108 (  0.3%)   0.0710 (  0.3%)   0.0713 (  0.3%)  X86 Execution Dependency Fix
   0.0604 (  0.3%)   0.0102 (  0.3%)   0.0706 (  0.3%)   0.0709 (  0.3%)  Merge disjoint stack slots
   0.0538 (  0.2%)   0.0150 (  0.4%)   0.0688 (  0.3%)   0.0690 (  0.3%)  Branch Probability Analysis #2
   0.0553 (  0.2%)   0.0127 (  0.3%)   0.0681 (  0.3%)   0.0680 (  0.3%)  MachinePostDominator Tree Construction
   0.0480 (  0.2%)   0.0197 (  0.5%)   0.0678 (  0.3%)   0.0680 (  0.3%)  Block Frequency Analysis
   0.0569 (  0.3%)   0.0106 (  0.3%)   0.0675 (  0.3%)   0.0675 (  0.3%)  MachinePostDominator Tree Construction #2
   0.0476 (  0.2%)   0.0186 (  0.5%)   0.0662 (  0.3%)   0.0662 (  0.3%)  Dominator Tree Construction #2
   0.0558 (  0.3%)   0.0097 (  0.2%)   0.0655 (  0.2%)   0.0655 (  0.2%)  MachineDominator Tree Construction #6
   0.0546 (  0.2%)   0.0063 (  0.2%)   0.0609 (  0.2%)   0.0610 (  0.2%)  Stack Slot Coloring
   0.0468 (  0.2%)   0.0134 (  0.3%)   0.0602 (  0.2%)   0.0603 (  0.2%)  Post-Dominator Tree Construction #2
   0.0436 (  0.2%)   0.0163 (  0.4%)   0.0599 (  0.2%)   0.0598 (  0.2%)  Dominator Tree Construction
   0.0512 (  0.2%)   0.0086 (  0.2%)   0.0597 (  0.2%)   0.0597 (  0.2%)  Machine Dominance Frontier Construction
   0.0466 (  0.2%)   0.0119 (  0.3%)   0.0585 (  0.2%)   0.0587 (  0.2%)  MachineDominator Tree Construction
   0.0479 (  0.2%)   0.0101 (  0.3%)   0.0580 (  0.2%)   0.0581 (  0.2%)  Slot index numbering
   0.0454 (  0.2%)   0.0091 (  0.2%)   0.0545 (  0.2%)   0.0545 (  0.2%)  MachinePostDominator Tree Construction #3
   0.0434 (  0.2%)   0.0085 (  0.2%)   0.0519 (  0.2%)   0.0520 (  0.2%)  Machine Block Frequency Analysis #3
   0.0396 (  0.2%)   0.0115 (  0.3%)   0.0511 (  0.2%)   0.0511 (  0.2%)  Machine Block Frequency Analysis
   0.0404 (  0.2%)   0.0106 (  0.3%)   0.0510 (  0.2%)   0.0510 (  0.2%)  Dominator Tree Construction #3
   0.0391 (  0.2%)   0.0083 (  0.2%)   0.0474 (  0.2%)   0.0476 (  0.2%)  Early Machine Loop Invariant Code Motion
   0.0392 (  0.2%)   0.0079 (  0.2%)   0.0472 (  0.2%)   0.0471 (  0.2%)  MachineDominator Tree Construction #5
   0.0351 (  0.2%)   0.0107 (  0.3%)   0.0458 (  0.2%)   0.0458 (  0.2%)  Shrink Wrapping analysis
   0.0380 (  0.2%)   0.0077 (  0.2%)   0.0456 (  0.2%)   0.0457 (  0.2%)  X86 LEA Optimize
   0.0366 (  0.2%)   0.0087 (  0.2%)   0.0453 (  0.2%)   0.0454 (  0.2%)  BreakFalseDeps
   0.0378 (  0.2%)   0.0076 (  0.2%)   0.0453 (  0.2%)   0.0453 (  0.2%)  MachineDominator Tree Construction #7
   0.0363 (  0.2%)   0.0086 (  0.2%)   0.0449 (  0.2%)   0.0449 (  0.2%)  MachineDominator Tree Construction #2
   0.0361 (  0.2%)   0.0076 (  0.2%)   0.0437 (  0.2%)   0.0438 (  0.2%)  Machine Block Frequency Analysis #4
   0.0343 (  0.2%)   0.0092 (  0.2%)   0.0435 (  0.2%)   0.0438 (  0.2%)  Free MachineFunction
   0.0333 (  0.1%)   0.0078 (  0.2%)   0.0411 (  0.2%)   0.0410 (  0.2%)  MachineDominator Tree Construction #3
   0.0334 (  0.2%)   0.0074 (  0.2%)   0.0408 (  0.2%)   0.0408 (  0.2%)  Machine Block Frequency Analysis #5
   0.0321 (  0.1%)   0.0081 (  0.2%)   0.0402 (  0.2%)   0.0402 (  0.2%)  Check CFA info and insert CFI instructions if needed
   0.0282 (  0.1%)   0.0121 (  0.3%)   0.0403 (  0.2%)   0.0401 (  0.2%)  Scalar Evolution Analysis
   0.0276 (  0.1%)   0.0121 (  0.3%)   0.0397 (  0.2%)   0.0399 (  0.2%)  Natural Loop Information
   0.0326 (  0.1%)   0.0075 (  0.2%)   0.0400 (  0.2%)   0.0399 (  0.2%)  MachineDominator Tree Construction #4
   0.0306 (  0.1%)   0.0091 (  0.2%)   0.0396 (  0.2%)   0.0398 (  0.2%)  Loop Strength Reduction
   0.0274 (  0.1%)   0.0115 (  0.3%)   0.0390 (  0.1%)   0.0390 (  0.1%)  Expand Atomic instructions
   0.0300 (  0.1%)   0.0068 (  0.2%)   0.0369 (  0.1%)   0.0370 (  0.1%)  Post-RA pseudo instruction expansion pass
   0.0281 (  0.1%)   0.0075 (  0.2%)   0.0356 (  0.1%)   0.0356 (  0.1%)  Machine Block Frequency Analysis #2
   0.0292 (  0.1%)   0.0060 (  0.1%)   0.0351 (  0.1%)   0.0351 (  0.1%)  PostRA Machine Sink
   0.0257 (  0.1%)   0.0078 (  0.2%)   0.0336 (  0.1%)   0.0337 (  0.1%)  Machine Natural Loop Construction
   0.0235 (  0.1%)   0.0098 (  0.2%)   0.0333 (  0.1%)   0.0334 (  0.1%)  Lower constant intrinsics
   0.0227 (  0.1%)   0.0096 (  0.2%)   0.0323 (  0.1%)   0.0324 (  0.1%)  Expand memcmp() to load/stores
   0.0258 (  0.1%)   0.0063 (  0.2%)   0.0321 (  0.1%)   0.0322 (  0.1%)  X86 Optimize Call Frame
   0.0245 (  0.1%)   0.0066 (  0.2%)   0.0311 (  0.1%)   0.0312 (  0.1%)  Machine Natural Loop Construction #4
   0.0230 (  0.1%)   0.0071 (  0.2%)   0.0301 (  0.1%)   0.0301 (  0.1%)  Finalize ISel and expand pseudo-instructions
   0.0199 (  0.1%)   0.0085 (  0.2%)   0.0284 (  0.1%)   0.0284 (  0.1%)  Natural Loop Information #2
   0.0210 (  0.1%)   0.0068 (  0.2%)   0.0278 (  0.1%)   0.0279 (  0.1%)  Canonicalize Freeze Instructions in Loops
   0.0206 (  0.1%)   0.0072 (  0.2%)   0.0278 (  0.1%)   0.0278 (  0.1%)  Greedy Register Allocator
   0.0218 (  0.1%)   0.0058 (  0.1%)   0.0276 (  0.1%)   0.0277 (  0.1%)  Machine Natural Loop Construction #3
   0.0208 (  0.1%)   0.0060 (  0.1%)   0.0267 (  0.1%)   0.0268 (  0.1%)  X86 cmov Conversion
   0.0207 (  0.1%)   0.0059 (  0.1%)   0.0267 (  0.1%)   0.0267 (  0.1%)  Machine Natural Loop Construction #2
   0.0208 (  0.1%)   0.0059 (  0.1%)   0.0267 (  0.1%)   0.0267 (  0.1%)  Machine Cycle Info Analysis
   0.0197 (  0.1%)   0.0060 (  0.1%)   0.0257 (  0.1%)   0.0257 (  0.1%)  Early Tail Duplication
   0.0204 (  0.1%)   0.0053 (  0.1%)   0.0256 (  0.1%)   0.0256 (  0.1%)  Machine Loop Invariant Code Motion
   0.0201 (  0.1%)   0.0054 (  0.1%)   0.0255 (  0.1%)   0.0256 (  0.1%)  X86 Fixup SetCC
   0.0205 (  0.1%)   0.0047 (  0.1%)   0.0252 (  0.1%)   0.0252 (  0.1%)  Remove Redundant DEBUG_VALUE analysis
   0.0198 (  0.1%)   0.0055 (  0.1%)   0.0253 (  0.1%)   0.0252 (  0.1%)  X86 pseudo instruction expansion pass
   0.0180 (  0.1%)   0.0072 (  0.2%)   0.0251 (  0.1%)   0.0251 (  0.1%)  Interleaved Access Pass
   0.0187 (  0.1%)   0.0061 (  0.2%)   0.0248 (  0.1%)   0.0248 (  0.1%)  Natural Loop Information #6
   0.0177 (  0.1%)   0.0070 (  0.2%)   0.0247 (  0.1%)   0.0246 (  0.1%)  Natural Loop Information #4
   0.0196 (  0.1%)   0.0049 (  0.1%)   0.0244 (  0.1%)   0.0245 (  0.1%)  Tail Duplication
   0.0170 (  0.1%)   0.0069 (  0.2%)   0.0239 (  0.1%)   0.0236 (  0.1%)  Partially inline calls to library functions
   0.0181 (  0.1%)   0.0056 (  0.1%)   0.0236 (  0.1%)   0.0236 (  0.1%)  Remove unreachable machine basic blocks
   0.0155 (  0.1%)   0.0073 (  0.2%)   0.0228 (  0.1%)   0.0229 (  0.1%)  Function Alias Analysis Results #2
   0.0175 (  0.1%)   0.0053 (  0.1%)   0.0228 (  0.1%)   0.0228 (  0.1%)  X86 LEA Fixup
   0.0151 (  0.1%)   0.0070 (  0.2%)   0.0222 (  0.1%)   0.0222 (  0.1%)  Merge contiguous icmps into a memcmp
   0.0154 (  0.1%)   0.0066 (  0.2%)   0.0220 (  0.1%)   0.0220 (  0.1%)  Natural Loop Information #3
   0.0166 (  0.1%)   0.0050 (  0.1%)   0.0217 (  0.1%)   0.0216 (  0.1%)  X86 Fixup Vector Constants
   0.0152 (  0.1%)   0.0060 (  0.1%)   0.0212 (  0.1%)   0.0214 (  0.1%)  Replace intrinsics with calls to vector library
   0.0150 (  0.1%)   0.0062 (  0.2%)   0.0213 (  0.1%)   0.0213 (  0.1%)  Natural Loop Information #5
   0.0153 (  0.1%)   0.0056 (  0.1%)   0.0209 (  0.1%)   0.0210 (  0.1%)  Expand large fp convert
   0.0144 (  0.1%)   0.0065 (  0.2%)   0.0209 (  0.1%)   0.0209 (  0.1%)  Remove unreachable blocks from the CFG
   0.0147 (  0.1%)   0.0060 (  0.1%)   0.0207 (  0.1%)   0.0208 (  0.1%)  Scalarize Masked Memory Intrinsics
   0.0141 (  0.1%)   0.0064 (  0.2%)   0.0205 (  0.1%)   0.0206 (  0.1%)  Canonicalize natural loops
   0.0156 (  0.1%)   0.0046 (  0.1%)   0.0201 (  0.1%)   0.0201 (  0.1%)  Process Implicit Definitions
   0.0152 (  0.1%)   0.0043 (  0.1%)   0.0195 (  0.1%)   0.0197 (  0.1%)  X86 Avoid Store Forwarding Blocks
   0.0139 (  0.1%)   0.0056 (  0.1%)   0.0195 (  0.1%)   0.0197 (  0.1%)  Expand vector predication intrinsics
   0.0145 (  0.1%)   0.0047 (  0.1%)   0.0193 (  0.1%)   0.0195 (  0.1%)  Live Register Matrix
   0.0131 (  0.1%)   0.0060 (  0.2%)   0.0191 (  0.1%)   0.0193 (  0.1%)  Function Alias Analysis Results #3
   0.0136 (  0.1%)   0.0057 (  0.1%)   0.0194 (  0.1%)   0.0191 (  0.1%)  Expand reduction intrinsics
   0.0125 (  0.1%)   0.0057 (  0.1%)   0.0182 (  0.1%)   0.0184 (  0.1%)  Basic Alias Analysis (stateless AA impl) #2
   0.0136 (  0.1%)   0.0043 (  0.1%)   0.0180 (  0.1%)   0.0180 (  0.1%)  Spill Code Placement Analysis
   0.0122 (  0.1%)   0.0055 (  0.1%)   0.0177 (  0.1%)   0.0178 (  0.1%)  Exception handling preparation
   0.0127 (  0.1%)   0.0052 (  0.1%)   0.0178 (  0.1%)   0.0177 (  0.1%)  X86 Partial Reduction
   0.0134 (  0.1%)   0.0043 (  0.1%)   0.0177 (  0.1%)   0.0177 (  0.1%)  X86 Fixup Inst Tuning
   0.0122 (  0.1%)   0.0051 (  0.1%)   0.0173 (  0.1%)   0.0172 (  0.1%)  Post RA top-down list latency scheduler
   0.0120 (  0.1%)   0.0046 (  0.1%)   0.0167 (  0.1%)   0.0167 (  0.1%)  Machine Trace Metrics
   0.0117 (  0.1%)   0.0049 (  0.1%)   0.0166 (  0.1%)   0.0166 (  0.1%)  Lower AMX type for load/store
   0.0123 (  0.1%)   0.0041 (  0.1%)   0.0164 (  0.1%)   0.0164 (  0.1%)  Bundle Machine CFG Edges
   0.0108 (  0.0%)   0.0054 (  0.1%)   0.0161 (  0.1%)   0.0162 (  0.1%)  Basic Alias Analysis (stateless AA impl)
   0.0112 (  0.1%)   0.0043 (  0.1%)   0.0155 (  0.1%)   0.0156 (  0.1%)  Induction Variable Users
   0.0116 (  0.1%)   0.0037 (  0.1%)   0.0152 (  0.1%)   0.0153 (  0.1%)  Bundle Machine CFG Edges #2
   0.0110 (  0.0%)   0.0038 (  0.1%)   0.0149 (  0.1%)   0.0149 (  0.1%)  Live Stack Slot Analysis
   0.0103 (  0.0%)   0.0043 (  0.1%)   0.0146 (  0.1%)   0.0148 (  0.1%)  Insert KCFI indirect call checks
   0.0098 (  0.0%)   0.0042 (  0.1%)   0.0140 (  0.1%)   0.0139 (  0.1%)  Insert stack protectors
   0.0101 (  0.0%)   0.0036 (  0.1%)   0.0137 (  0.1%)   0.0138 (  0.1%)  Optimize machine instruction PHIs
   0.0101 (  0.0%)   0.0035 (  0.1%)   0.0136 (  0.1%)   0.0137 (  0.1%)  X86 EFLAGS copy lowering
   0.0092 (  0.0%)   0.0044 (  0.1%)   0.0136 (  0.1%)   0.0137 (  0.1%)  Lazy Branch Probability Analysis
   0.0094 (  0.0%)   0.0040 (  0.1%)   0.0134 (  0.1%)   0.0135 (  0.1%)  Assignment Tracking Analysis
   0.0093 (  0.0%)   0.0040 (  0.1%)   0.0133 (  0.1%)   0.0134 (  0.1%)  Lazy Branch Probability Analysis #2
   0.0087 (  0.0%)   0.0045 (  0.1%)   0.0132 (  0.1%)   0.0131 (  0.0%)  Function Alias Analysis Results
   0.0090 (  0.0%)   0.0039 (  0.1%)   0.0128 (  0.0%)   0.0130 (  0.0%)  X86 Indirect Branch Tracking
   0.0088 (  0.0%)   0.0041 (  0.1%)   0.0129 (  0.0%)   0.0129 (  0.0%)  Basic Alias Analysis (stateless AA impl) #4
   0.0091 (  0.0%)   0.0035 (  0.1%)   0.0126 (  0.0%)   0.0128 (  0.0%)  Local Dynamic TLS Access Clean-up
   0.0085 (  0.0%)   0.0040 (  0.1%)   0.0126 (  0.0%)   0.0126 (  0.0%)  Unpack machine instruction bundles
   0.0089 (  0.0%)   0.0034 (  0.1%)   0.0123 (  0.0%)   0.0125 (  0.0%)  Virtual Register Map
   0.0083 (  0.0%)   0.0041 (  0.1%)   0.0124 (  0.0%)   0.0125 (  0.0%)  Expand indirectbr instructions
   0.0085 (  0.0%)   0.0041 (  0.1%)   0.0126 (  0.0%)   0.0124 (  0.0%)  Basic Alias Analysis (stateless AA impl) #3
   0.0088 (  0.0%)   0.0035 (  0.1%)   0.0123 (  0.0%)   0.0124 (  0.0%)  Local Stack Slot Allocation
   0.0086 (  0.0%)   0.0037 (  0.1%)   0.0123 (  0.0%)   0.0123 (  0.0%)  Tile Register Pre-configure
   0.0084 (  0.0%)   0.0034 (  0.1%)   0.0118 (  0.0%)   0.0122 (  0.0%)  Rename Disconnected Subregister Components
   0.0085 (  0.0%)   0.0034 (  0.1%)   0.0119 (  0.0%)   0.0121 (  0.0%)  Argument Stack Rebase
   0.0082 (  0.0%)   0.0035 (  0.1%)   0.0116 (  0.0%)   0.0120 (  0.0%)  Machine Optimization Remark Emitter #2
   0.0084 (  0.0%)   0.0033 (  0.1%)   0.0117 (  0.0%)   0.0120 (  0.0%)  X86 PIC Global Base Reg Initialization
   0.0084 (  0.0%)   0.0034 (  0.1%)   0.0118 (  0.0%)   0.0120 (  0.0%)  Early If-Conversion
   0.0079 (  0.0%)   0.0038 (  0.1%)   0.0118 (  0.0%)   0.0119 (  0.0%)  Shadow Stack GC Lowering
   0.0082 (  0.0%)   0.0035 (  0.1%)   0.0117 (  0.0%)   0.0119 (  0.0%)  Machine Sanitizer Binary Metadata
   0.0080 (  0.0%)   0.0038 (  0.1%)   0.0117 (  0.0%)   0.0118 (  0.0%)  Prepare callbr
   0.0080 (  0.0%)   0.0036 (  0.1%)   0.0116 (  0.0%)   0.0118 (  0.0%)  Machine Optimization Remark Emitter #3
   0.0079 (  0.0%)   0.0036 (  0.1%)   0.0116 (  0.0%)   0.0118 (  0.0%)  Stack Frame Layout Analysis
   0.0079 (  0.0%)   0.0037 (  0.1%)   0.0115 (  0.0%)   0.0118 (  0.0%)  Machine Optimization Remark Emitter #4
   0.0082 (  0.0%)   0.0033 (  0.1%)   0.0115 (  0.0%)   0.0118 (  0.0%)  Machine Optimization Remark Emitter
   0.0083 (  0.0%)   0.0033 (  0.1%)   0.0116 (  0.0%)   0.0117 (  0.0%)  X86 FP Stackifier
   0.0081 (  0.0%)   0.0033 (  0.1%)   0.0115 (  0.0%)   0.0117 (  0.0%)  Lazy Machine Block Frequency Analysis #5
   0.0079 (  0.0%)   0.0036 (  0.1%)   0.0115 (  0.0%)   0.0117 (  0.0%)  X86 Atom pad short functions
   0.0076 (  0.0%)   0.0037 (  0.1%)   0.0113 (  0.0%)   0.0117 (  0.0%)  TLS Variable Hoist
   0.0081 (  0.0%)   0.0036 (  0.1%)   0.0117 (  0.0%)   0.0117 (  0.0%)  Insert XRay ops
   0.0081 (  0.0%)   0.0033 (  0.1%)   0.0113 (  0.0%)   0.0116 (  0.0%)  X86 Load Value Injection (LVI) Load Hardening
   0.0081 (  0.0%)   0.0034 (  0.1%)   0.0115 (  0.0%)   0.0116 (  0.0%)  X86 Lower Tile Copy
   0.0076 (  0.0%)   0.0036 (  0.1%)   0.0112 (  0.0%)   0.0116 (  0.0%)  X86 Indirect Thunks
   0.0076 (  0.0%)   0.0037 (  0.1%)   0.0113 (  0.0%)   0.0116 (  0.0%)  Lazy Block Frequency Analysis
   0.0077 (  0.0%)   0.0037 (  0.1%)   0.0114 (  0.0%)   0.0115 (  0.0%)  Contiguously Lay Out Funclets
   0.0079 (  0.0%)   0.0035 (  0.1%)   0.0114 (  0.0%)   0.0115 (  0.0%)  X86 vzeroupper inserter
   0.0080 (  0.0%)   0.0032 (  0.1%)   0.0112 (  0.0%)   0.0115 (  0.0%)  Register Allocation Pass Scoring
   0.0080 (  0.0%)   0.0034 (  0.1%)   0.0114 (  0.0%)   0.0115 (  0.0%)  Analyze Machine Code For Garbage Collection
   0.0080 (  0.0%)   0.0034 (  0.1%)   0.0114 (  0.0%)   0.0114 (  0.0%)  Init Undef Pass
   0.0076 (  0.0%)   0.0037 (  0.1%)   0.0113 (  0.0%)   0.0114 (  0.0%)  StackMap Liveness Analysis
   0.0077 (  0.0%)   0.0035 (  0.1%)   0.0112 (  0.0%)   0.0114 (  0.0%)  Lazy Machine Block Frequency Analysis #7
   0.0081 (  0.0%)   0.0033 (  0.1%)   0.0114 (  0.0%)   0.0114 (  0.0%)  X86 Windows Fixup Buffer Security Check
   0.0077 (  0.0%)   0.0036 (  0.1%)   0.0113 (  0.0%)   0.0114 (  0.0%)  X86 Insert Cache Prefetches
   0.0081 (  0.0%)   0.0032 (  0.1%)   0.0113 (  0.0%)   0.0114 (  0.0%)  Lazy Machine Block Frequency Analysis
   0.0075 (  0.0%)   0.0036 (  0.1%)   0.0111 (  0.0%)   0.0113 (  0.0%)  Instrument function entry/exit with calls to e.g. mcount() (post inlining)
   0.0078 (  0.0%)   0.0033 (  0.1%)   0.0112 (  0.0%)   0.0113 (  0.0%)  X86 DynAlloca Expander
   0.0079 (  0.0%)   0.0033 (  0.1%)   0.0112 (  0.0%)   0.0113 (  0.0%)  X86 speculative load hardening
   0.0080 (  0.0%)   0.0032 (  0.1%)   0.0111 (  0.0%)   0.0113 (  0.0%)  Lazy Machine Block Frequency Analysis #4
   0.0080 (  0.0%)   0.0031 (  0.1%)   0.0112 (  0.0%)   0.0113 (  0.0%)  X86 Domain Reassignment Pass
   0.0077 (  0.0%)   0.0035 (  0.1%)   0.0112 (  0.0%)   0.0112 (  0.0%)  Lazy Machine Block Frequency Analysis #8
   0.0078 (  0.0%)   0.0034 (  0.1%)   0.0112 (  0.0%)   0.0112 (  0.0%)  Insert fentry calls
   0.0075 (  0.0%)   0.0036 (  0.1%)   0.0111 (  0.0%)   0.0112 (  0.0%)  X86 insert wait instruction
   0.0080 (  0.0%)   0.0032 (  0.1%)   0.0112 (  0.0%)   0.0112 (  0.0%)  Tile Register Configure
   0.0078 (  0.0%)   0.0033 (  0.1%)   0.0111 (  0.0%)   0.0112 (  0.0%)  Fixup Statepoint Caller Saved
   0.0078 (  0.0%)   0.0032 (  0.1%)   0.0110 (  0.0%)   0.0112 (  0.0%)  Lazy Machine Block Frequency Analysis #6
   0.0075 (  0.0%)   0.0035 (  0.1%)   0.0110 (  0.0%)   0.0112 (  0.0%)  X86 Speculative Execution Side Effect Suppression
   0.0076 (  0.0%)   0.0034 (  0.1%)   0.0110 (  0.0%)   0.0111 (  0.0%)  Implement the 'patchable-function' attribute
   0.0077 (  0.0%)   0.0033 (  0.1%)   0.0111 (  0.0%)   0.0111 (  0.0%)  Detect Dead Lanes
   0.0077 (  0.0%)   0.0032 (  0.1%)   0.0109 (  0.0%)   0.0111 (  0.0%)  Lazy Machine Block Frequency Analysis #3
   0.0076 (  0.0%)   0.0033 (  0.1%)   0.0109 (  0.0%)   0.0111 (  0.0%)  Lazy Block Frequency Analysis #2
   0.0072 (  0.0%)   0.0036 (  0.1%)   0.0108 (  0.0%)   0.0111 (  0.0%)  Pseudo Probe Inserter
   0.0075 (  0.0%)   0.0035 (  0.1%)   0.0110 (  0.0%)   0.0110 (  0.0%)  X86 Discriminate Memory Operands
   0.0074 (  0.0%)   0.0035 (  0.1%)   0.0108 (  0.0%)   0.0110 (  0.0%)  Lower AMX intrinsics
   0.0075 (  0.0%)   0.0035 (  0.1%)   0.0110 (  0.0%)   0.0110 (  0.0%)  Compressing EVEX instrs when possible
   0.0073 (  0.0%)   0.0035 (  0.1%)   0.0108 (  0.0%)   0.0110 (  0.0%)  X86 Load Value Injection (LVI) Ret-Hardening
   0.0075 (  0.0%)   0.0034 (  0.1%)   0.0108 (  0.0%)   0.0109 (  0.0%)  Lazy Machine Block Frequency Analysis #9
   0.0073 (  0.0%)   0.0034 (  0.1%)   0.0107 (  0.0%)   0.0109 (  0.0%)  Lazy Machine Block Frequency Analysis #10
   0.0072 (  0.0%)   0.0034 (  0.1%)   0.0106 (  0.0%)   0.0109 (  0.0%)  X86 Return Thunks
   0.0073 (  0.0%)   0.0034 (  0.1%)   0.0108 (  0.0%)   0.0107 (  0.0%)  Safe Stack instrumentation pass
   0.0076 (  0.0%)   0.0030 (  0.1%)   0.0105 (  0.0%)   0.0107 (  0.0%)  Lazy Machine Block Frequency Analysis #2
   0.0070 (  0.0%)   0.0034 (  0.1%)   0.0104 (  0.0%)   0.0107 (  0.0%)  Lower Garbage Collection Instructions
   0.0019 (  0.0%)   0.0009 (  0.0%)   0.0028 (  0.0%)   0.0028 (  0.0%)  Assumption Cache Tracker
   0.0006 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Pre-ISel Intrinsic Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scoped NoAlias Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Default Regalloc Eviction Advisor
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Type-Based Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Branch Probability Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Profile summary info
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Default Regalloc Priority Advisor
  22.2084 (100.0%)   4.0015 (100.0%)  26.2099 (100.0%)  26.3309 (100.0%)  Total

===-------------------------------------------------------------------------===
                          Clang front-end time report
===-------------------------------------------------------------------------===
  Total Execution Time: 64.8567 seconds (65.1986 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  57.9002 (100.0%)   6.9564 (100.0%)  64.8567 (100.0%)  65.1986 (100.0%)  Clang front-end timer
  57.9002 (100.0%)   6.9564 (100.0%)  64.8567 (100.0%)  65.1986 (100.0%)  Total
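
A timing report like the one above can be turned into a ranked list of costly passes by pulling the wall-time column out of each row. Below is a minimal sketch using a hypothetical helper, not part of Inko. It assumes the four-column layout shown above (user, system, user+system, wall, then the name). It should also be fed one table at a time, because the report mixes per-pass rows with analysis and register-allocation sub-timers that overlap in time.

```rust
/// One row of an LLVM timing-report table: the pass name and its wall time.
#[derive(Debug, Clone, PartialEq)]
struct PassTime {
    name: String,
    wall_secs: f64,
}

/// Parses a row such as
/// "   0.4144 ( 24.6%)   0.0267 ( 15.3%)   0.4410 ( 23.7%)   0.4426 ( 23.8%)  MemorySSAAnalysis".
/// Returns None for titles, headers, separators, blank lines and "Total" rows.
fn parse_row(line: &str) -> Option<PassTime> {
    // The pass name starts after the last "%)" (the wall-time percentage).
    let name_start = line.rfind("%)")? + 2;
    let name = line[name_start..].trim();
    if name.is_empty() || name == "Total" {
        return None;
    }
    // Keep only tokens that are plain numbers; "(" and "42.8%)" are skipped.
    let times: Vec<f64> = line[..name_start]
        .split_whitespace()
        .filter_map(|tok| tok.parse::<f64>().ok())
        .collect();
    // Expected columns: user, system, user+system, wall.
    if times.len() != 4 {
        return None;
    }
    Some(PassTime { name: name.to_string(), wall_secs: times[3] })
}

/// Returns the `n` rows with the highest wall time, most expensive first.
fn top_passes(report: &str, n: usize) -> Vec<PassTime> {
    let mut rows: Vec<PassTime> = report.lines().filter_map(parse_row).collect();
    rows.sort_by(|a, b| b.wall_secs.total_cmp(&a.wall_secs));
    rows.truncate(n);
    rows
}

fn main() {
    // A few rows copied from the "Pass execution timing report" above.
    let report = "\
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   9.5242 ( 42.9%)   1.6887 ( 42.2%)  11.2129 ( 42.8%)  11.2731 ( 42.8%)  X86 DAG->DAG Instruction Selection
   1.9193 (  8.6%)   0.1586 (  4.0%)   2.0779 (  7.9%)   2.0883 (  7.9%)  Greedy Register Allocator #2
   0.9053 (  4.1%)   0.1012 (  2.5%)   1.0064 (  3.8%)   1.0111 (  3.8%)  Machine Instruction Scheduler
  22.2084 (100.0%)   4.0015 (100.0%)  26.2099 (100.0%)  26.3309 (100.0%)  Total
";
    for pass in top_passes(report, 2) {
        println!("{:>8.4}s  {}", pass.wall_secs, pass.name);
    }
}
```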

yorickpeterse added a commit that referenced this issue Jan 14, 2025
This enables a set of LLVM optimization passes for the "balanced" and
"aggressive" profiles. Both are based on the "default<O2>" pipeline,
with the removal of some irrelevant passes. In the case of the
"balanced" profile we also remove some additional passes that aren't
likely to be useful in most cases.

This fixes #595.

Changelog: added
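
The approach described in this commit message (start from an O2-style pipeline, drop irrelevant passes, and drop a few more for the balanced profile) can be sketched as a per-profile filter that produces a textual new-pass-manager pipeline. This is a minimal illustration only. The pass subset and the aggressive-only flags are hypothetical and are not the lists the commit actually uses. The assumption that the none profile still runs only mem2reg comes from the start of this issue.

```rust
/// Optimization profiles; the names follow the commit message, the
/// pass lists below are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Opt {
    None,
    Balanced,
    Aggressive,
}

/// Hypothetical subset of function passes taken from LLVM's O2 pipeline.
/// `true` marks passes that only the aggressive profile keeps.
const O2_SUBSET: &[(&str, bool)] = &[
    ("sroa", false),
    ("early-cse", false),
    ("simplifycfg", false),
    ("instcombine", false),
    ("reassociate", false),
    ("jump-threading", true),
    ("correlated-propagation", true),
    ("loop-unroll", true),
    ("gvn", false),
    ("sccp", false),
    ("dse", false),
];

/// Builds a textual new-pass-manager pipeline for the given profile.
fn pipeline(opt: Opt) -> String {
    let passes: Vec<&str> = match opt {
        // Assumed: without optimizations only mem2reg runs.
        Opt::None => vec!["mem2reg"],
        // Balanced drops the passes flagged as aggressive-only.
        Opt::Balanced => O2_SUBSET
            .iter()
            .filter(|&&(_, aggressive_only)| !aggressive_only)
            .map(|&(name, _)| name)
            .collect(),
        Opt::Aggressive => O2_SUBSET.iter().map(|&(name, _)| name).collect(),
    };
    // Wrap in function(...) so every entry is parsed as a function pass.
    format!("function({})", passes.join(","))
}

fn main() {
    for opt in [Opt::None, Opt::Balanced, Opt::Aggressive] {
        println!("{:?}: {}", opt, pipeline(opt));
    }
}
```

The resulting strings use the textual pipeline syntax accepted by `opt -passes=...` and by LLVM's C API function `LLVMRunPasses`. Keeping the pass list as data makes it easy to time each profile and then add or remove individual passes.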
@yorickpeterse
Collaborator Author

84830ab implements a list of passes to run when applying optimizations. This list is roughly similar to clang -O2, minus some passes that aren't relevant, and with the number of instcombine iterations adjusted as per #595 (comment).
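
Adjusting the instcombine iteration count comes down to emitting a parameterised pass name in the textual pipeline. The sketch below assumes an LLVM release whose pipeline parser accepts `instcombine<max-iterations=N>`, so check the version you link against. The helper names are hypothetical, and the iteration count in `main` is a placeholder rather than the value chosen in this issue.

```rust
/// Textual form of LLVM's instcombine pass. When `max_iterations` is set,
/// the `max-iterations` parameter caps how often it re-runs to a fixpoint
/// (assumes the linked LLVM's pipeline parser accepts this parameter).
fn instcombine(max_iterations: Option<u32>) -> String {
    match max_iterations {
        Some(n) => format!("instcombine<max-iterations={n}>"),
        None => "instcombine".to_string(),
    }
}

/// Rewrites every plain "instcombine" entry of a pass list into its
/// parameterised form, leaving other passes untouched.
fn limit_instcombine(passes: &[&str], max_iterations: Option<u32>) -> Vec<String> {
    passes
        .iter()
        .map(|&pass| {
            if pass == "instcombine" {
                instcombine(max_iterations)
            } else {
                pass.to_string()
            }
        })
        .collect()
}

fn main() {
    // Placeholder iteration count, not the value used by the compiler.
    let passes = limit_instcombine(&["sroa", "instcombine", "gvn", "instcombine"], Some(1));
    println!("function({})", passes.join(","));
}
```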

While this will increase compile times a fair bit, even for --opt=balanced, it's better than no optimizations at all. My long-term plan is to try to reduce the amount of IR we feed to LLVM, for example by eliminating closures where possible, in order to keep compile times under control.
