Find successful passes with genetic algorithm #900

Closed
wants to merge 1 commit into from

nmdis1999
Contributor

@nmdis1999 nmdis1999 commented Nov 9, 2023

This PR aims to find successful passes (i.e. those which pass all YK and YKLUA tests) with the help of a genetic algorithm. The fitness function currently prioritises execution time (lower is better) when picking the most likely parents.
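One way to read "priority to execution time" is fitness-proportional (roulette-wheel) selection, where a candidate's weight is the inverse of its runtime. The following is a minimal sketch under that assumption; `fitness` and `select_parents` are hypothetical names and are not taken from try_passes.py.

```python
import random


def fitness(runtime_secs):
    """Hypothetical fitness: failed candidates (None) score 0, faster runs score higher."""
    if runtime_secs is None or runtime_secs <= 0:
        return 0.0
    return 1.0 / runtime_secs


def select_parents(population, runtimes, k=2, rng=random):
    """Pick k parents with probability proportional to their fitness."""
    weights = [fitness(t) for t in runtimes]
    if sum(weights) == 0:
        # No candidate passed the tests: fall back to a uniform choice.
        return rng.choices(population, k=k)
    return rng.choices(population, weights=weights, k=k)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which may help when comparing generations.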

@nmdis1999 nmdis1999 assigned nmdis1999, ltratt and vext01 and unassigned nmdis1999 Nov 9, 2023
@ltratt
Contributor

ltratt commented Nov 9, 2023

@nmdis1999 Does this carry over the comments I made on the other PR?

@nmdis1999
Contributor Author

@ltratt All but one: I still have to change the while loop (with i) to a for loop. I will push this in the next commit, after cleaning up some lines.

@nmdis1999
Contributor Author

@ltratt I am capturing timing information in cargo_run.py (688283f) and then feeding it into the genetic algorithm in try_passes.py. I am wondering whether this is the right approach, or whether there is a way to capture just the runtime of cargo test, without the compilation step. I tried running cargo build before cargo test, but it did not seem to make much difference.
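One possible approach, sketched here rather than taken from cargo_run.py: `cargo test --no-run` compiles the test binaries without executing them, so a following `cargo test` mostly measures the test run itself (plus cargo's freshness check). The helper names `time_command` and `time_cargo_test` are hypothetical, and any compilation the tests themselves perform at run time would still be included in the measurement.

```python
import subprocess
import time


def time_command(cmd, cwd=None, env=None, timeout=None):
    """Run cmd and return (returncode, wall-clock seconds)."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, cwd=cwd, env=env, timeout=timeout)
    return proc.returncode, time.perf_counter() - start


def time_cargo_test(yk_path, env=None, timeout=60):
    """Build the test binaries first, then time only the test run."""
    # `cargo test --no-run` compiles the tests but does not execute them.
    build = subprocess.run(["cargo", "test", "--no-run"], cwd=yk_path, env=env)
    if build.returncode != 0:
        return None  # build failure: no timing available
    try:
        rc, secs = time_command(["cargo", "test"], cwd=yk_path, env=env, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None  # treat a timeout like a failed candidate
    return secs if rc == 0 else None
```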

@nmdis1999
Contributor Author

A bit of explanation of how the genetic algorithm assigns weights to pass lists: it first generates a list of passes, which is then used to recompile/build YK and run the YKLUA test via cargo_run.py. Only the runtime of the YKLUA test is considered when assigning the weight (or fitness score).

Since cargo_run.py calls run.sh from the YKLUA repository, that script is not visible in this PR. Here is the content of run.sh:

#!/bin/sh

set -e

# Check if YKLUA_PATH is set and is a valid directory.
if [ -z "$YKLUA_PATH" ] || [ ! -d "$YKLUA_PATH" ]; then
    echo "YKLUA_PATH directory does not exist."
    exit 1
fi

cd "$YKLUA_PATH"
MODE=debug

# Quote the variables: a pass list containing a space would otherwise break the test.
if [ -n "$PRELINK_PASSES" ]; then
  echo "yk-here"
  yk-config ${MODE} --prelink-pipeline "${PRELINK_PASSES}" --cflags
else
  yk-config ${MODE} --cflags
fi

if [ -n "$LINKTIME_PASSES" ]; then
  yk-config ${MODE} --postlink-pipeline "${LINKTIME_PASSES}" --ldflags
else
  yk-config ${MODE} --ldflags
fi

make clean && make YK_BUILD_TYPE=debug

cd benchmark
SECS=120
LUA=../src/lua

OK=""
FAIL=""

exstatus=0
YKD_SERIALISE_COMPILATION=0 timeout -s9 ${SECS}s ${LUA} fannkuchredux.lua 3 \
    >/dev/null 2>&1 || exstatus=$?

if [ ${exstatus} -eq 0 ]; then
    echo "OK"
    OK="${OK}"
else
    if [ ${exstatus} -eq 137 ]; then
        echo "TIMEOUT"
    else
        echo "FAIL"
    fi
    FAIL="${FAIL}"
fi

printf '\n---\n'
echo "OK: ${OK}"
echo "FAIL: ${FAIL}"
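Because the script records the benchmark's exit status in `exstatus` instead of exiting with it, a caller has to read the verdict from the printed output. A minimal sketch of how a driver might do that follows; `classify_run` and `run_benchmark` are hypothetical names, and this is not necessarily how cargo_run.py works. Only the `PRELINK_PASSES` and `LINKTIME_PASSES` variable names come from the script above.

```python
import os
import subprocess


def classify_run(stdout):
    """Map the script's output to 'OK', 'TIMEOUT' or 'FAIL' (None if no verdict line)."""
    for line in stdout.splitlines():
        # The summary lines ("OK: ...", "FAIL: ...") end in a colon and do not match.
        if line.strip() in ("OK", "TIMEOUT", "FAIL"):
            return line.strip()
    return None


def run_benchmark(script, prelink=None, linktime=None):
    """Run the script with the given pass lists exported and return its verdict."""
    env = dict(os.environ)
    if prelink:
        env["PRELINK_PASSES"] = prelink
    if linktime:
        env["LINKTIME_PASSES"] = linktime
    proc = subprocess.run(["sh", script], env=env, capture_output=True, text=True)
    if proc.returncode != 0:
        return "FAIL"  # e.g. the build step failed under `set -e`
    return classify_run(proc.stdout) or "FAIL"
```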

@ltratt
Contributor

ltratt commented Dec 4, 2023

@nmdis1999 If/when this has successfully found some passes, please let us know.

@nmdis1999
Contributor Author

@ltratt I ran the algorithm for 100 generations (and population size * 2); this took 4-5 days for both the PRELINK and POSTLINK pass lists.
Here are the best pass lists the algorithm dumped, based on the execution time of the benchmark run:

PRELINK

forceattrs,openmp-opt,called-value-propagation,globalopt,function(invalidate<aa>),deadargelim,globaldce,verify

POSTLINK

annotation2metadata,forceattrs,function<eager-inv>(lower-expect,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>,sroa<modify-cfg>,early-cse<>),openmp-opt,ipsccp,called-value-propagation,function(mem2reg), function<eager-inv>(instcombine,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>),require<globals-aa>,require<profile-summary>,cgscc(devirt<4>(inline<only-mandatory>,inline,function-attrs,openmp-opt-cgscc,function<eager-inv>(sroa<modify-cfg>,early-cse<memssa>,speculative-execution,jump-threading,correlated-propagation,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>,instcombine,libcalls-shrinkwrap,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>,reassociate,require<opt-remark-emit>,loop-mssa(loop-instsimplify,loop-simplifycfg,licm<no-allowspeculation>,loop-rotate,licm<allowspeculation>,simple-loop-unswitch<no-nontrivial;trivial>),simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>,instcombine,loop(loop-idiom,indvars,loop-deletion,loop-unroll-full),sroa<modify-cfg>,vector-combine,mldst-motion<no-split-footer-bb>,gvn<>,sccp,bdce,instcombine,jump-threading,correlated-propagation,adce,memcpyopt,dse,loop-mssa(licm<allowspeculation>),coro-elide,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;hoist-common-insts;sink-common-insts>,instcombine),coro-split)),name-anon-globals
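Pass lists like these nest with parentheses and angle brackets, so any crossover or mutation step has to split on top-level commas only, otherwise it would cut a `function(...)` or `cgscc(...)` group in half. The sketch below shows one way to do that; `split_top_level` is a hypothetical helper, and how try_passes.py actually represents pass lists is not shown in this thread.

```python
def split_top_level(pipeline):
    """Split a pass pipeline string on commas that sit outside any () or <> group."""
    parts, depth, current = [], 0, []
    for ch in pipeline:
        if ch in "(<":
            depth += 1
        elif ch in ")>":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current).strip())
    return parts
```

Applied to the PRELINK list above, this yields eight top-level entries, with `function(invalidate<aa>)` kept intact as a single unit.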

@ltratt
Contributor

ltratt commented Dec 6, 2023

Wow that's a lot of postlink passes! Does it make much difference to interpreter performance?

@nmdis1999 nmdis1999 changed the title Find successful passes with genetic algorithm [DRAFT] Find successful passes with genetic algorithm Jan 8, 2024
@nmdis1999
Contributor Author

@ltratt @vext01 this PR is ready for review now.

@vext01
Contributor

vext01 commented Jan 9, 2024

This is looking good. All of my comments are minor.

@nmdis1999
Contributor Author

@ltratt @vext01 addressed all the comments. Ready for review :)

@ltratt
Contributor

ltratt commented Jan 9, 2024

We're getting there! @nmdis1999 Please have a quick look over the open chats (including one or two from last year). With luck we can get this moving soon (once we have some rough numbers from your benchmarking run).

@nmdis1999
Contributor Author

@ltratt addressed your comments. Ready for review.

@ltratt
Contributor

ltratt commented Jan 12, 2024

OK, so code review looks reasonable -- all we need to know now is whether it works! So we'll need performance numbers in here as and when you have them.

@nmdis1999 nmdis1999 force-pushed the genetic-algorithm2 branch 4 times, most recently from 6acd1bb to 3e9e7cb Compare January 29, 2024 11:01
@ltratt
Contributor

ltratt commented Jan 29, 2024

@nmdis1999 Please undo the force push (ironically you'll need to force push in order to do that!).

@nmdis1999 nmdis1999 force-pushed the genetic-algorithm2 branch 3 times, most recently from 2152eb3 to 0edbc0d Compare January 29, 2024 12:02
@nmdis1999
Contributor Author

nmdis1999 commented Jan 29, 2024

PR is ready for review @ltratt @vext01

Benchmark numbers (for 30 runs with 95% confidence interval):

| db.lua | Mean | 95% CI |
|---|---|---|
| WITHOUT YKLLVM (O2) | 0.034 | 0.004 |
| WITHOUT YKLLVM (O3) | 0.030 | 0.003 |
| WITH YKLLVM | 0.140 | 0.002 |
| PRELINK | 0.107 | 0.005 |
| POSTLINK | 0.105 | 0.002 |
| PRELINK + POSTLINK | 0.101 | 0.002 |

| all.lua | Mean | 95% CI |
|---|---|---|
| WITHOUT YKLLVM (O2) | 0.465 | 0.004 |
| WITHOUT YKLLVM (O3) | 0.469 | 0.005 |
| WITH YKLLVM | 2.186 | 0.02 |
| PRELINK | 1.230 | 0.008 |
| POSTLINK | 1.294 | 0.006 |
| PRELINK + POSTLINK | 1.212 | 0.01 |

| nbody.lua | Mean | 95% CI |
|---|---|---|
| WITHOUT YKLLVM (O2) | 0.007 | 0.003 |
| WITHOUT YKLLVM (O3) | 0.008 | 0.007 |
| WITH YKLLVM | 0.074 | 0.001 |
| PRELINK | 0.040 | 0.001 |
| POSTLINK | 0.041 | 0.003 |
| PRELINK + POSTLINK | 0.042 | 0.003 |
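For reference, a mean and 95% confidence interval over repeated runs can be computed along the following lines. This is a sketch under stated assumptions, not the script used for the numbers above: `mean_ci95` is a hypothetical helper, and it uses the normal approximation (z of about 1.96). With 30 runs, a Student's t interval would use a factor of about 2.045 and so be slightly wider.

```python
import math
import statistics


def mean_ci95(samples):
    """Return (mean, half-width of an approximate 95% confidence interval)."""
    n = len(samples)
    mean = statistics.fmean(samples)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96; normal approximation
    return mean, z * sem
```

Reporting the half-width next to the mean gives the same two columns the tables above use.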

@ltratt
Contributor

ltratt commented Jan 29, 2024

@nmdis1999 Quick questions:

  1. Why sometimes CI and sometimes std dev? Let's pick one (ideally CI) and stick with it.
  2. Can you define what "WITHOUT YKLLVM" (etc) mean? I can guess but I'm not 100% sure, so best that you tell me!
  3. What is the best prelink and/or postlink set(s) of passes you've found so far?

@nmdis1999
Contributor Author

@nmdis1999 Quick questions:

  1. Why sometimes CI and sometimes std dev? Let's pick one (ideally CI) and stick with it.

Sorry, I forgot to change the headers. They are all CI.

  2. Can you define what "WITHOUT YKLLVM" (etc) mean? I can guess but I'm not 100% sure, so best that you tell me!

WITHOUT YKLLVM means the lua binary is compiled with O2/O3, without any YK-related flags.

  3. What is the best prelink and/or postlink set(s) of passes you've found so far?

Best Prelink set of passes:

annotation2metadata,forceattrs,inferattrs,coro-early,called-value-propagation,require<globals-aa>,cgscc(devirt<4>(inline<only-mandatory>,inline,function-attrs<skip-non-recursive-function-attrs>,openmp-opt-cgscc,function<eager-inv;no-rerun>(sroa<modify-cfg>,early-cse<memssa>,speculative-execution,jump-threading,correlated-propagation,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>,instcombine<max-iterations=1;no-use-loop-info;no-verify-fixpoint>,aggressive-instcombine,libcalls-shrinkwrap,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>,reassociate,constraint-elimination,loop-mssa(loop-instsimplify,loop-simplifycfg,licm<no-allowspeculation>,loop-rotate<header-duplication;prepare-for-lto>,licm<allowspeculation>,simple-loop-unswitch<no-nontrivial;trivial>),simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>,instcombine<max-iterations=1;no-use-loop-info;no-verify-fixpoint>,loop(loop-idiom,indvars,loop-deletion,loop-unroll-full),sroa<modify-cfg>,vector-combine,mldst-motion<no-split-footer-bb>,gvn<>,sccp,bdce,instcombine<max-iterations=1;no-use-loop-info;no-verify-fixpoint>,jump-threading,correlated-propagation,adce,memcpyopt,dse,move-auto-init,loop-mssa(licm<allowspeculation>),coro-elide,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;hoist-common-insts;sink-common-insts;speculate-blocks;simplify-cond-branch>,instcombine<max-iterations=1;no-use-loop-info;no-verify-fixpoint>),function-attrs,function(require<should-not-run-function-passes>),coro-split)),coro-cleanup,globaldce,function(annotation-remarks),name-anon-globals

Best Postlink set of passes:

cross-dso-cfi,openmp-opt,globaldce<vfe-linkage-unit-visibility>,function<eager-inv>(callsite-splitting),cgscc(function-attrs),globalsplit,globalopt,function(mem2reg),deadargelim,function(loop-sink,div-rem-pairs,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>)

tests/Cargo.toml Outdated
@@ -36,7 +36,7 @@ yktracec = { path = "../yktracec", features = ["yk_testing"] }

 [dev-dependencies]
 criterion = { version = "0.5.1", features = ["html_reports"] }
-lang_tester = "0.7.4"
+lang_tester = { version = "0.7.4", path = "../../lang_tester" }
Contributor

You'll need to make this point to the branch in Edd's github fork. Cargo has the settings you need to do this.

Contributor

Hold on, we don't want to merge that into yk/master. It fudges lang_tester to not match stdin/stdout.

What's the plan here?

 for _ in range(n):
     subprocess.run(["make clean"], shell=True, env=env)
     c = subprocess.run(["make && timeout 30 sh test.sh"], shell=True, env=env or os.environ)
     os.chdir(yk_path)
     if c.returncode == 0:
-        r = subprocess.run(["timeout 60 cargo test"], shell=True, env=env or os.environ)
+        r = subprocess.run(f"timeout 60 cargo test", shell=True, env=env or os.environ)
Contributor

The extra space after = is unnecessary.

def setup(curr_dir, base_temp_dir, yk_path, yklua):
    num_cores = multiprocessing.cpu_count() - 1
    directories = []
    # for i in range(num_cores):
Contributor

Commented code.

@ltratt
Contributor

ltratt commented Jan 29, 2024

f42221a is going to be a problem: we can't commit the PR with those changes, because they break "normal" yk. @vext01 do you have any thoughts on how we could handle this in the master branch? It might be that we end up having to do this stuff in a branch/fork for a while? [I'd prefer to merge into master, but aware that it might be tricky.]

@vext01
Contributor

vext01 commented Jan 29, 2024

I suppose we could use a cargo feature for the kludged lang_tester?

@ltratt
Contributor

ltratt commented Jan 29, 2024

It's not just lang_tester though: notice that this branch also disables a test and disables code in ykcapi.

@vext01
Contributor

vext01 commented Jan 29, 2024

I suppose that could be feature gated too... I'm not really sure what the best way forward here is. gate or branch...

@ltratt
Contributor

ltratt commented May 28, 2024

Please force push a rebase + squash (doesn't have to be to 1 commit though) against master.

@nmdis1999 nmdis1999 force-pushed the genetic-algorithm2 branch from 06f62b4 to c3bf5ac Compare May 28, 2024 14:41
@nmdis1999 nmdis1999 force-pushed the genetic-algorithm2 branch from c3bf5ac to c99cd35 Compare May 28, 2024 14:46
@nmdis1999
Contributor Author

@ltratt rebased and pushed

@ltratt ltratt closed this Sep 18, 2024