bench: add -quiet and -iters=<n> benchmark config args #22999

Closed

Conversation

jonatack
Member

Running a benchmark 10 times, twice (before and after), for #22974, and then editing the output by hand to remove the warnings and recommendations, brought home the point that it would be nice to be able to do that automatically. This PR adds a -quiet arg to silence warnings and recommendations, and an -iters=<n> arg to run each benchmark for the number of iterations passed.

$ src/bench/bench_bitcoin -?
Options:

  -?
       Print this help message and exit

  -asymptote=<n1,n2,n3,...>
       Test asymptotic growth of the runtime of an algorithm, if supported by
       the benchmark

  -filter=<regex>
       Regular expression filter to select benchmark by name (default: .*)

  -iters=<n>
       Iterations of each benchmark to run (default: 1)

  -list
       List benchmarks without executing them

  -output_csv=<output.csv>
       Generate CSV file with the most important benchmark results

  -output_json=<output.json>
       Generate JSON file with all benchmark results

  -quiet
       Silence warnings and recommendations in benchmark results

Examples

$ ./src/bench/bench_bitcoin -filter=AddrManGood -iters=5 -quiet

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|    2,538,968,665.00 |                0.39 |   15.9% |     12.12 | `AddrManGood`
|    2,536,901,200.00 |                0.39 |   13.0% |     13.73 | `AddrManGood`
|    2,337,840,590.00 |                0.43 |    3.9% |     12.07 | `AddrManGood`
|    1,997,515,936.00 |                0.50 |    2.6% |     10.09 | `AddrManGood`
|    2,217,950,210.00 |                0.45 |    1.3% |     11.30 | `AddrManGood`
$ ./src/bench/bench_bitcoin -filter=PrevectorDes*.* -iters=2 -quiet=1

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|            8,062.56 |          124,030.15 |    5.7% |      0.09 | `PrevectorDeserializeNontrivial`
|            7,784.81 |          128,455.29 |    1.5% |      0.09 | `PrevectorDeserializeNontrivial`

|              356.44 |        2,805,497.65 |    1.5% |      0.00 | `PrevectorDeserializeTrivial`
|              354.52 |        2,820,715.33 |    0.9% |      0.00 | `PrevectorDeserializeTrivial`

|              241.27 |        4,144,791.38 |    0.9% |      0.00 | `PrevectorDestructorNontrivial`
|              241.45 |        4,141,658.77 |    0.9% |      0.00 | `PrevectorDestructorNontrivial`

|              146.64 |        6,819,400.81 |    0.9% |      0.00 | `PrevectorDestructorTrivial`
|              147.98 |        6,757,806.43 |    0.6% |      0.00 | `PrevectorDestructorTrivial`
$ ./src/bench/bench_bitcoin -filter=PrevectorDes*.* -iters=-1 -quiet=0
$ ./src/bench/bench_bitcoin -filter=PrevectorDes*.* -iters=0 -quiet=0
$ ./src/bench/bench_bitcoin -filter=PrevectorDes*.* -iters=1 -quiet=0
Warning, results might be unstable:
* DEBUG defined
* CPU frequency scaling enabled: CPU 0 between 400.0 and 3,100.0 MHz
* Turbo is enabled, CPU frequency will fluctuate

Recommendations
* Make sure you compile for Release
* Use 'pyperf system tune' before benchmarking. See https://github.com/psf/pyperf

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|            6,204.87 |          161,163.71 |   15.2% |      0.07 | :wavy_dash: `PrevectorDeserializeNontrivial` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10)
|              214.33 |        4,665,680.65 |    0.1% |      0.00 | `PrevectorDeserializeTrivial`
|              257.23 |        3,887,584.03 |    8.6% |      0.00 | :wavy_dash: `PrevectorDestructorNontrivial` (Unstable with ~43.5 iters. Increase `minEpochIterations` to e.g. 435)
|              151.34 |        6,607,846.82 |    1.9% |      0.00 | `PrevectorDestructorTrivial`

@jonatack
Member Author

jonatack commented Sep 16, 2021

@martinus please let me know if this is ok, as the -quiet arg makes some minor changes to nanobench.h to allow the equivalent of NANOBENCH_SUPPRESS_WARNINGS=1 as an arg listed in the help.

@laanwj laanwj added the Tests label Sep 16, 2021
@theStack
Contributor

Strong Concept ACK on introducing an -iters parameter, after also having reviewed #22974 :)
As for -quiet, I'm less convinced of the usefulness: I think the warnings and recommendations almost always make sense, and I personally wouldn't see a need to silence them -- if a result is marked as "Unstable", I usually throw it away, assuming it has only limited value. (Note though that I'm not a heavy user of the benchmarks in general; maybe there is a need for silencing that I'm not aware of.)

@jonatack
Member Author

Thanks! The -quiet arg was actually my first motivation: to make it less painful to compare or share results. Most of mine show the verbose >5% err% unstable warnings even when all the recommendations are followed, and it takes an annoying amount of time to edit out all the warnings by hand. People can see the err% themselves, so for sharing results the noisy warnings are quite pollutive, and it's handy to be able to silence them.

@fanquake
Member

I don't think adding more code, and modifying a dependency, just to replicate functionality that already exists is a good idea. What's the problem with using NANOBENCH_SUPPRESS_WARNINGS? In almost all cases I'm sure we'd rather people heed the warnings, and produce more useful benchmarks, than just ignore them.

> and it takes an annoying amount of time to edit out all the warnings by hand.

Couldn't you write a bash one-liner to post-process your benchmark output, that throws away lines that don't start with a |?
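
For illustration, one sketch of such a post-processing step (any line-based filter would do), piping the bench output through grep to keep only the table rows, which all start with |:

$ ./src/bench/bench_bitcoin -filter=AddrManGood | grep '^|'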

@@ -2278,7 +2279,7 @@ struct IterationLogic::Impl {
             os << col.value();
         }
         os << "| ";
-        auto showUnstable = isWarningsEnabled() && rErrorMedian >= 0.05;
+        auto showUnstable = isWarningsEnabled(mBench.m_quiet) && rErrorMedian >= 0.05;
Contributor

I don't think that warning should ever be hidden. It shows that the value you are seeing is not reliable because the error is too high. If you regularly see that warning in a benchmark, the benchmark should be improved or the machine should be somehow stabilized.

Member Author

Yes, unfortunately many, if not most, of the benchmarks show the warning for me, despite taking the steps to stabilize the machine. I encountered this extensively while working on #22284, and every time I run benchmarks to test pulls. I now use NANOBENCH_SUPPRESS_WARNINGS to be able to share my results without running a bench dozens of times to get a few without the warnings.
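
For example, setting the environment variable for a single run:

$ NANOBENCH_SUPPRESS_WARNINGS=1 ./src/bench/bench_bitcoin -filter=AddrManGood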

@@ -624,6 +624,9 @@ class Bench {
     Bench& operator=(Bench const& other);
     ~Bench() noexcept;

+    //! Whether to suppress warnings and recommendations. Equivalent to NANOBENCH_SUPPRESS_WARNINGS.
+    bool m_quiet{false};
Contributor

I'm not sure about adding support for this, though, because this can already be done with the environment variable NANOBENCH_SUPPRESS_WARNINGS, and it is not obvious which of these settings should override the other. Maybe it would be better to just add documentation for NANOBENCH_SUPPRESS_WARNINGS, and also NANOBENCH_ENDLESS, to the usage documentation?

If we really want this, then in nanobench all the configuration should live inside the Config class; that way a configuration can be reused across benchmarks. Bench should then have a well-documented getter/setter for it. I'd prefer to keep https://github.com/martinus/nanobench and the copy of the code here in sync, though.
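
A hypothetical sketch of that shape, in nanobench's fluent-setter style -- the names and the Config member are illustrative, not upstream API:

    // Illustrative only: Config owns the setting so a configuration can be
    // copied and reused; Bench exposes a chainable setter plus a getter,
    // mirroring how its other options work.
    struct Config {
        bool mQuiet = false; // hypothetical member
        // ... existing nanobench settings ...
    };

    class Bench {
    public:
        Bench& quiet(bool b) noexcept { mConfig.mQuiet = b; return *this; }
        bool quiet() const noexcept { return mConfig.mQuiet; }
    private:
        Config mConfig{};
    };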

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

for (auto n : args.asymptote) {
    bench.complexityN(n);
    for (int i = 0; i < args.iters; ++i) {
        if (i == 0 && args.iters > 1) {
Contributor

I do not think adding an iterations argument that works like that is a good idea. nanobench itself already has the ability to perform multiple measurements (called "epochs" in nanobench); in fact, each benchmark is already measured 11 times. That way it is able to show a measurement error. The number of iterations in each of the measurements is determined automatically, based on the computer's clock accuracy.

If you get unstable benchmark results, the first thing to do should be to make sure the computer is really stable: no frequency scaling, no turbo, no other interfering programs. nanobench shows the warnings for good reason 🙂

If that doesn't help, make sure the actual benchmark itself is stable and always does the same work (little randomness in it, and preferably not many allocations, no threading, locks, etc.).

If that too doesn't help, you can e.g. increase the number of iterations with minEpochIterations. That's a bit problematic, though, because some benchmarks need a huge setting here and others a very low one. So it is generally better to use minEpochTime and expose that setting in the arguments (probably as a double value in seconds, e.g. -minEpochTime=0.5).
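
To make the distinction concrete, a minimal standalone sketch against the nanobench Bench API (the benchmark body is just a placeholder):

    #include <nanobench.h>

    #include <chrono>
    #include <cstdlib>

    int main()
    {
        ankerl::nanobench::Bench bench;

        // Option 1: per-epoch iteration floor. Works, but a good value is
        // very benchmark-specific (huge for fast operations, tiny for slow ones).
        bench.minEpochIterations(100);

        // Option 2: per-epoch time floor. Each of the (by default 11) epochs
        // runs for at least this long, adapting to the cost of one iteration.
        bench.minEpochTime(std::chrono::milliseconds{500}); // ~ -minEpochTime=0.5

        bench.run("placeholder", [] {
            ankerl::nanobench::doNotOptimizeAway(std::rand());
        });
    }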

Member Author

Yes, I've gone through those things. In practice, what I'm seeing people do to share and compare results in PR reviews is run a benchmark repeatedly, for which the -iters=<n> proposal here is a handy convenience.

@jonatack
Member Author

I'm going to close this and use these features for my own Bitcoin Core benchmarking. If anyone else would like to use them, they can cherry-pick this branch.

Thanks for your replies, @martinus. Feel free to add bench documentation as you suggest in #22999 (comment) and I'll be happy to review it.

@jonatack jonatack closed this Sep 17, 2021
laanwj added a commit that referenced this pull request Sep 24, 2021
e148a52 bench: fixed ubsan implicit conversion (Martin Ankerl)
da4e2f1 bench: various args improvements (Jon Atack)
d312fd9 bench: clean up includes (Jon Atack)
1f10f16 bench: add usage description and documentation (Martin Ankerl)
d3c6f8b bench: introduce -min_time argument (Martin Ankerl)
9fef832 bench: make EvictionProtection.* work with any number of iterations (Martin Ankerl)
153e686 bench: change AddrManGood to AddrManAddThenGood (Martin Ankerl)
468b232 bench: remove unnecessary & incorrect  multiplication in MuHashDiv (Martin Ankerl)
eed99cf bench: update nanobench from 4.3.4 to 4.3.6 (Martin Ankerl)

Pull request description:

  This PR updates nanobench to the latest release from upstream, v4.3.6. It fixes the missing performance counters.

  Due to discussions in #22999 I have done some work that should make the benchmark results more reliable. It introduces a new flag `-min_time` that allows running a benchmark for much longer than the default. When results are unreliable, choosing a large timeframe here should usually get repeatable results, even when frequency scaling cannot be disabled. The default is now 10ms. For this to work, I have changed the `AddrManGood` and `EvictionProtection` benchmarks so they work with any number of iterations.
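  For example, to give each benchmark at least five seconds (assuming the value is in milliseconds, consistent with the 10ms default above):

  $ ./src/bench/bench_bitcoin -filter=AddrManAddThenGood -min_time=5000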

  Also, this adds more usage documentation to `bench_bitcoin -h`, and I've cherry-picked two changes from #22999 authored by Jon Atack.

ACKs for top commit:
  jonatack:
    re-ACK e148a52
  laanwj:
    Code review ACK e148a52

Tree-SHA512: 2da6de19a5c85ac234b190025e195c727546166dbb75e3f9267e667a73677ba1e29b7765877418a42b1407b65df901e0130763936525e6f1450f18f08837c40c
@bitcoin bitcoin locked and limited conversation to collaborators Oct 30, 2022