This repository was archived by the owner on Dec 19, 2023. It is now read-only.

Slow builds introduced by menhir 20211230 or 20220210 #512

Open
mjambon opened this issue Apr 1, 2022 · 6 comments

mjambon commented Apr 1, 2022

We ran into out-of-memory errors when building pfff with menhir 20220210. This report compares the fast build times we were used to with the new, much longer ones.

Note that the atd package, on which we depend for semgrep, doesn't build with menhir 20211230 (some type error), so I didn't measure performance for that version. Based on the menhir changelog, though, I suspect this is where the slowdown started:

The code back-end has been rewritten from the ground up by Émile Trotignon and François Pottier, and now produces efficient and well-typed OCaml code. The infamous Obj.magic is not used any more.

The times shown are user CPU times (the %U format of /bin/time) for rebuilding the whole pfff project after deleting one parsing build folder (under _build/default/) at a time. The command I used is:

for x in lang_*/parsing; do echo $x; rm -rf _build/default/$x ; /bin/time -f "$x: %U s" -o "$(dirname $x).time" make; done

Results

The longest build time is now for lang_js/parsing: it went from 23.67 s to 154.29 s. I didn't run into OOM errors during this benchmarking but I did previously when rebuilding the whole project both in CI and locally (starting my build with about 11 GB available on my machine).

Before (menhir 20211128):

lang_cpp/parsing: 25.15 s
lang_csharp/parsing: 2.67 s
lang_css/parsing: 1.74 s
lang_c/parsing: 3.43 s
lang_erlang/parsing: 2.47 s
lang_FUZZY/parsing: 1.99 s
lang_GENERIC/parsing: 1.64 s
lang_go/parsing: 9.37 s
lang_haskell/parsing: 2.31 s
lang_html/parsing: 3.51 s
lang_java/parsing: 13.27 s
lang_json/parsing: 2.18 s
lang_js/parsing: 23.67 s
lang_lisp/parsing: 2.42 s
lang_ml/parsing: 14.92 s
lang_nw/parsing: 2.28 s
lang_php/parsing: 26.39 s
lang_python/parsing: 10.37 s
lang_regexp/parsing: 3.12 s
lang_ruby/parsing: 14.77 s
lang_rust/parsing: 2.54 s
lang_scala/parsing: 5.94 s
lang_skip/parsing: 2.85 s
lang_sql/parsing: 1.73 s
lang_web/parsing: 1.70 s

After (menhir 20220210):

lang_cpp/parsing: 92.68 s
lang_csharp/parsing: 2.74 s
lang_css/parsing: 1.81 s
lang_c/parsing: 36.69 s
lang_erlang/parsing: 2.54 s
lang_FUZZY/parsing: 2.20 s
lang_GENERIC/parsing: 1.84 s
lang_go/parsing: 20.57 s
lang_haskell/parsing: 2.70 s
lang_html/parsing: 3.52 s
lang_java/parsing: 53.64 s
lang_json/parsing: 2.41 s
lang_js/parsing: 154.29 s
lang_lisp/parsing: 2.55 s
lang_ml/parsing: 32.30 s
lang_nw/parsing: 2.46 s
lang_php/parsing: 67.24 s
lang_python/parsing: 20.36 s
lang_regexp/parsing: 3.46 s
lang_ruby/parsing: 15.20 s
lang_rust/parsing: 2.74 s
lang_scala/parsing: 6.05 s
lang_skip/parsing: 3.16 s
lang_sql/parsing: 1.97 s
lang_web/parsing: 1.87 s
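
To make the size of the regression easier to read, the two lists can be reduced to after/before ratios. The following is only a sketch: the numbers are copied from the lists above for a handful of directories, while the `timings` variable and the `slowdowns` function are names invented here for illustration.

```shell
#!/bin/sh
# Timings copied from the two lists above (user CPU seconds):
# directory, before (menhir 20211128), after (menhir 20220210).
# Only a subset of the directories is included.
timings='lang_cpp 25.15 92.68
lang_c 3.43 36.69
lang_java 13.27 53.64
lang_js 23.67 154.29
lang_php 26.39 67.24
lang_ruby 14.77 15.20'

# Hypothetical helper: print the after/before ratio for each directory,
# largest slowdown first.
slowdowns() {
  printf '%s\n' "$timings" |
    awk '{ printf "%s %.1fx\n", $1, $3 / $2 }' |
    sort -t' ' -k2,2 -rn
}

slowdowns
```

On this subset, lang_c shows the largest relative slowdown even though lang_js has the longest absolute build time, and lang_ruby is essentially unchanged.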

mseri commented Apr 2, 2022

Ping @fpottier

fpottier commented Apr 2, 2022

Could you try menhir -O 0 or menhir -O 1 and let me know if the build times are better?

fpottier commented Apr 2, 2022

(The default level is -O 2, which is sometimes costly.)
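
For a dune-based build, one way to try a lower level is the `flags` field of the `menhir` stanza. This is a sketch only: the module name `parser_cpp` is an assumption made for illustration, not taken from the pfff build files, and the stanza would have to be adapted to each `lang_*/parsing/dune` file.

```dune
; Hypothetical stanza: parser_cpp is an assumed module name.
; -O 1 is the level suggested above; the default is -O 2.
(menhir
 (modules parser_cpp)
 (flags -O 1))
```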

mjambon commented Apr 2, 2022

@fpottier I get these build times for the lang_cpp folder, using menhir 20220210:

-O 0:

lang_cpp/parsing: 33.72 s

-O 1:

lang_cpp/parsing: 33.30 s

-O 2 (passed explicitly to be sure):

lang_cpp/parsing: 102.66 s

This is great. I'm not sure if we have a benchmark suite for parser performance. (cc @aryx)
Thanks @fpottier!

aryx commented Apr 2, 2022

We don't have one.

fpottier commented Apr 3, 2022

Thanks. I will probably have to make -O 1 the default.
