Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

profiling and optimizations #189

Closed
vogler opened this issue Apr 14, 2021 · 6 comments
Closed

profiling and optimizations #189

vogler opened this issue Apr 14, 2021 · 6 comments
Labels
performance Analysis time, memory usage practical-course Practical Course at TUM

Comments

@vogler
Copy link
Collaborator

vogler commented Apr 14, 2021

Originally posted by @vogler in #188 (comment)

This likely needs some more in-depth profiling (like most of Goblint really...) because it's not obvious at all. For example, if using just DefExc in a 4-component IntDomTuple, most operations would also be doing lots of BatOption.map calls on None.

The assumption was that for n activated out of m total domains you have for a binop

  • n * (1+2) (List.map2 + both args) match for IntDomList
  • m match for IntDomTuple.

I think I ran some test where it was faster, but would be better to have some inline benchmarks [1] in bench for questions like this.
The main motivation was to reduce boilerplate and make adding domains easier.

Or for example, create/create2 in IntDomTuple make repeated get_bool calls each time, which involves a lot more than looking up a locally cached bool variable (some places do this with options).

create/create2 shouldn't be called often compared to map*.
But good point in general. Maybe it's a good idea to include some memoization for GobConfig.get_* instead letting consumers implement it repeatedly.
Edit: done: 5c73008

My general suspicion and experience with #164 is that we have subtle but massive performance bottlenecks in all sorts of subtle places, but nobody has ever profiled Goblint properly. Which isn't surprising because OCaml profiling tooling is almost non-existent. I tried and still ended up just guessing and using Stats.time to pin the problem.

True, perf report [2] works but it's hard to find bottlenecks, for example it does not list things like get_bool. Haven't tried if enabling frame pointers helps.
The equality stuff in bench/hashcons also looks interesting. What speaks against let (=) a b = a == b || a = b?

[1] https://github.com/Chris00/ocaml-benchmark
[2] https://ocaml.org/learn/tutorials/performance_and_profiling.html#Using-perf-on-Linux

@vogler
Copy link
Collaborator Author

vogler commented Apr 14, 2021

@sim642 sim642 added the performance Analysis time, memory usage label Apr 14, 2021
@vogler
Copy link
Collaborator Author

vogler commented Jul 27, 2021

Example of profiling with perf that shows how name mangling and inlining make it hard to use: #265 (comment)

@sim642
Copy link
Member

sim642 commented Jul 27, 2021

Regarding name mangling I found this but didn't try it: https://discuss.ocaml.org/t/ann-perf-demangling-of-ocaml-symbols-a-short-introduction-to-perf/7143. Supposedly those improvements are also going upstream into perf.

@michael-schwarz michael-schwarz added the practical-course Practical Course at TUM label Nov 12, 2022
@michael-schwarz
Copy link
Member

We currently have a practical course at TUM investigating this and related issues.

@jerhard
Copy link
Member

jerhard commented Nov 14, 2023

I would suggest to close this issue as we already had a look at optimizing by introducing short-circuiting, caching of options, and using flambda. The remaining issue of profiling and optimization then seems overly broad.

@michael-schwarz
Copy link
Member

Close as overly broad for now. We should open dedicated issues if we have new ideas.

@michael-schwarz michael-schwarz closed this as not planned Won't fix, can't repro, duplicate, stale Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Analysis time, memory usage practical-course Practical Course at TUM
Projects
None yet
Development

No branches or pull requests

4 participants