bench: add -quiet and -iters=<n> benchmark config args #22999

Closed

Conversation

jonatack
Member

Running a benchmark 10 times, twice (before and after), for #22974, and then editing the output by hand to remove the warnings and recommendations, brought home the point that it would be nice to be able to do that automatically. This PR adds a -quiet arg to silence warnings and recommendations, and an -iters=<n> arg to run each benchmark for the number of iterations passed.

$ src/bench/bench_bitcoin -?
Options:

  -?
       Print this help message and exit

  -asymptote=<n1,n2,n3,...>
       Test asymptotic growth of the runtime of an algorithm, if supported by
       the benchmark

  -filter=<regex>
       Regular expression filter to select benchmark by name (default: .*)

  -iters=<n>
       Iterations of each benchmark to run (default: 1)

  -list
       List benchmarks without executing them

  -output_csv=<output.csv>
       Generate CSV file with the most important benchmark results

  -output_json=<output.json>
       Generate JSON file with all benchmark results

  -quiet
       Silence warnings and recommendations in benchmark results

Examples

$ ./src/bench/bench_bitcoin -filter=AddrManGood -iters=5 -quiet

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|    2,538,968,665.00 |                0.39 |   15.9% |     12.12 | `AddrManGood`
|    2,536,901,200.00 |                0.39 |   13.0% |     13.73 | `AddrManGood`
|    2,337,840,590.00 |                0.43 |    3.9% |     12.07 | `AddrManGood`
|    1,997,515,936.00 |                0.50 |    2.6% |     10.09 | `AddrManGood`
|    2,217,950,210.00 |                0.45 |    1.3% |     11.30 | `AddrManGood`
$ ./src/bench/bench_bitcoin -filter=PrevectorDes*.* -iters=2 -quiet=1

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|            8,062.56 |          124,030.15 |    5.7% |      0.09 | `PrevectorDeserializeNontrivial`
|            7,784.81 |          128,455.29 |    1.5% |      0.09 | `PrevectorDeserializeNontrivial`

|              356.44 |        2,805,497.65 |    1.5% |      0.00 | `PrevectorDeserializeTrivial`
|              354.52 |        2,820,715.33 |    0.9% |      0.00 | `PrevectorDeserializeTrivial`

|              241.27 |        4,144,791.38 |    0.9% |      0.00 | `PrevectorDestructorNontrivial`
|              241.45 |        4,141,658.77 |    0.9% |      0.00 | `PrevectorDestructorNontrivial`

|              146.64 |        6,819,400.81 |    0.9% |      0.00 | `PrevectorDestructorTrivial`
|              147.98 |        6,757,806.43 |    0.6% |      0.00 | `PrevectorDestructorTrivial`
$ ./src/bench/bench_bitcoin -filter=PrevectorDes*.* -iters=-1 -quiet=0
$ ./src/bench/bench_bitcoin -filter=PrevectorDes*.* -iters=0 -quiet=0
$ ./src/bench/bench_bitcoin -filter=PrevectorDes*.* -iters=1 -quiet=0
Warning, results might be unstable:
* DEBUG defined
* CPU frequency scaling enabled: CPU 0 between 400.0 and 3,100.0 MHz
* Turbo is enabled, CPU frequency will fluctuate

Recommendations
* Make sure you compile for Release
* Use 'pyperf system tune' before benchmarking. See https://github.com/psf/pyperf

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|            6,204.87 |          161,163.71 |   15.2% |      0.07 | :wavy_dash: `PrevectorDeserializeNontrivial` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10)
|              214.33 |        4,665,680.65 |    0.1% |      0.00 | `PrevectorDeserializeTrivial`
|              257.23 |        3,887,584.03 |    8.6% |      0.00 | :wavy_dash: `PrevectorDestructorNontrivial` (Unstable with ~43.5 iters. Increase `minEpochIterations` to e.g. 435)
|              151.34 |        6,607,846.82 |    1.9% |      0.00 | `PrevectorDestructorTrivial`

@jonatack
Member Author

jonatack commented Sep 16, 2021

@martinus please let me know if this is ok, as the -quiet arg makes some minor changes to nanobench.h to allow the equivalent of NANOBENCH_SUPPRESS_WARNINGS=1 as an arg listed in the help.

@laanwj laanwj added the Tests label Sep 16, 2021
@theStack
Contributor

Strong Concept ACK on introducing an -iters parameter, after also having reviewed #22974 :)
As for -quiet, I'm less convinced of the usefulness: I think the warnings and recommendations almost always make sense, and I personally wouldn't see a need to silence them -- if a result is marked as "Unstable", I usually throw it away, assuming it has only limited value. (Note though that I'm not a heavy user of the benchmarks in general; maybe there is a need for silencing that I'm not aware of.)

@jonatack
Member Author

Thanks! The -quiet arg was actually my first motivation: to make it less painful to compare or share results. Most of mine show the verbose >5% err% unstable warnings even when all the recommendations are followed, and it takes an annoying amount of time to edit out all the warnings by hand. People can see the err% themselves, so for sharing results the noisy warnings are quite pollutive, and it's handy to be able to silence them.

@fanquake
Member

I don't think adding more code, and modifying a dependency, just to replicate functionality that already exists is a good idea. What's the problem with using NANOBENCH_SUPPRESS_WARNINGS? In almost all cases I'm sure we'd rather people heed the warnings, and produce more useful benchmarks, than just ignore them.

> and it takes an annoying amount of time to edit out all the warnings by hand.

Couldn't you write a bash one-liner to post-process your benchmark output, that throws away lines that don't start with a |?
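
For illustration, one sketch of such a post-processing step (any line-based filter would do), piping the bench output through grep to keep only the table rows, which all start with |:

$ ./src/bench/bench_bitcoin -filter=AddrManGood | grep '^|'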

@@ -2278,7 +2279,7 @@ struct IterationLogic::Impl {
             os << col.value();
         }
         os << "| ";
-        auto showUnstable = isWarningsEnabled() && rErrorMedian >= 0.05;
+        auto showUnstable = isWarningsEnabled(mBench.m_quiet) && rErrorMedian >= 0.05;
Contributor

I don't think that warning should ever be hidden. It shows that the value you are seeing is not reliable because the error is too high. If you regularly see that warning in a benchmark, the benchmark should be improved or the machine should be somehow stabilized.

Member Author

Yes, unfortunately many, if not most, of the benchmarks show the warning for me, despite taking the steps to stabilize the machine. I encountered this extensively while working on #22284, and every time I run benchmarks to test pulls. I now use NANOBENCH_SUPPRESS_WARNINGS to be able to share my results without running a bench dozens of times to get a few without the warnings.
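
For example, setting the environment variable for a single run:

$ NANOBENCH_SUPPRESS_WARNINGS=1 ./src/bench/bench_bitcoin -filter=AddrManGood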

@@ -624,6 +624,9 @@ class Bench {
     Bench& operator=(Bench const& other);
     ~Bench() noexcept;

+    //! Whether to suppress warnings and recommendations. Equivalent to NANOBENCH_SUPPRESS_WARNINGS.
+    bool m_quiet{false};
Contributor

I'm not sure about adding support for this, though, because this can already be done with the environment variable NANOBENCH_SUPPRESS_WARNINGS, and it is not obvious which of these settings should override the other. Maybe it would be better to just add documentation for NANOBENCH_SUPPRESS_WARNINGS, and also NANOBENCH_ENDLESS, to the usage documentation?

If we really want this, then in nanobench all the configuration should live inside the Config class; that way a configuration can be reused across benchmarks. Bench should then have a well-documented getter/setter for it. I'd prefer to keep https://github.com/martinus/nanobench and the copy of the code here in sync, though.
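
A hypothetical sketch of that shape, in nanobench's fluent-setter style -- the names and the Config member are illustrative, not upstream API:

    // Illustrative only: Config owns the setting so a configuration can be
    // copied and reused; Bench exposes a chainable setter plus a getter,
    // mirroring how its other options work.
    struct Config {
        bool mQuiet = false; // hypothetical member
        // ... existing nanobench settings ...
    };

    class Bench {
    public:
        Bench& quiet(bool b) noexcept { mConfig.mQuiet = b; return *this; }
        bool quiet() const noexcept { return mConfig.mQuiet; }
    private:
        Config mConfig{};
    };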

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

for (auto n : args.asymptote) {
    bench.complexityN(n);
    for (int i = 0; i < args.iters; ++i) {
        if (i == 0 && args.iters > 1) {
Contributor

I do not think adding an iterations argument that works like that is a good idea. nanobench itself already has the ability to perform multiple measurements (called "epochs" in nanobench); in fact, each benchmark is already measured 11 times. That way it is able to show a measurement error. The number of iterations in each of the measurements is determined automatically, based on the computer's clock accuracy.

If you get unstable benchmark results, the first thing to do should be to make sure the computer is really stable: no frequency scaling, no turbo, no other interfering programs. nanobench shows the warnings for good reason 🙂

If that doesn't help, make sure the actual benchmark itself is stable and always does the same work (little randomness in it, and preferably not many allocations, no threading, locks, etc.).

If that too doesn't help, you can e.g. increase the number of iterations with minEpochIterations. That's a bit problematic, though, because some benchmarks need a huge setting here and others a very low one. So it is generally better to use minEpochTime and expose that setting in the arguments (probably as a double value in seconds, e.g. -minEpochTime=0.5).
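
To make the distinction concrete, a minimal standalone sketch against the nanobench Bench API (the benchmark body is just a placeholder):

    #include <nanobench.h>

    #include <chrono>
    #include <cstdlib>

    int main()
    {
        ankerl::nanobench::Bench bench;

        // Option 1: per-epoch iteration floor. Works, but a good value is
        // very benchmark-specific (huge for fast operations, tiny for slow ones).
        bench.minEpochIterations(100);

        // Option 2: per-epoch time floor. Each of the (by default 11) epochs
        // runs for at least this long, adapting to the cost of one iteration.
        bench.minEpochTime(std::chrono::milliseconds{500}); // ~ -minEpochTime=0.5

        bench.run("placeholder", [] {
            ankerl::nanobench::doNotOptimizeAway(std::rand());
        });
    }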

Member Author

Yes, I've gone through those things. In practice, what I'm seeing people do to share and compare results in PR reviews is run a benchmark repeatedly, for which the -iters=<n> proposal here is a handy convenience.

@jonatack
Member Author

I'm going to close this and use these features for my own Bitcoin Core benchmarking. If anyone else would like to use them, they can cherry-pick this branch.

Thanks for your replies, @martinus. Feel free to add bench documentation as you suggest in #22999 (comment) and I'll be happy to review it.

@jonatack jonatack closed this Sep 17, 2021
laanwj added a commit that referenced this pull request Sep 24, 2021
e148a52 bench: fixed ubsan implicit conversion (Martin Ankerl)
da4e2f1 bench: various args improvements (Jon Atack)
d312fd9 bench: clean up includes (Jon Atack)
1f10f16 bench: add usage description and documentation (Martin Ankerl)
d3c6f8b bench: introduce -min_time argument (Martin Ankerl)
9fef832 bench: make EvictionProtection.* work with any number of iterations (Martin Ankerl)
153e686 bench: change AddrManGood to AddrManAddThenGood (Martin Ankerl)
468b232 bench: remove unnecessary & incorrect  multiplication in MuHashDiv (Martin Ankerl)
eed99cf bench: update nanobench from 4.3.4 to 4.3.6 (Martin Ankerl)

Pull request description:

  This PR updates nanobench to the latest release from upstream, v4.3.6. It fixes the missing performance counters.

  Due to discussions in #22999 I have done some work that should make the benchmark results more reliable. It introduces a new flag `-min_time` that allows running a benchmark for much longer than the default. When results are unreliable, choosing a large timeframe here should usually get repeatable results, even when frequency scaling cannot be disabled. The default is now 10ms. For this to work, I have changed the `AddrManGood` and `EvictionProtection` benchmarks so they work with any number of iterations.
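  For example, to give each benchmark at least five seconds (assuming the value is in milliseconds, consistent with the 10ms default above):

  $ ./src/bench/bench_bitcoin -filter=AddrManAddThenGood -min_time=5000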

  Also, this adds more usage documentation to `bench_bitcoin -h`, and I've cherry-picked two changes from #22999 authored by Jon Atack.

ACKs for top commit:
  jonatack:
    re-ACK e148a52
  laanwj:
    Code review ACK e148a52

Tree-SHA512: 2da6de19a5c85ac234b190025e195c727546166dbb75e3f9267e667a73677ba1e29b7765877418a42b1407b65df901e0130763936525e6f1450f18f08837c40c
@bitcoin bitcoin locked and limited conversation to collaborators Oct 30, 2022