`<random>`: `normal_distribution` is slower than Boost #1003

statementreply · 2020-07-06T03:48:18Z

Describe the bug

A benchmark by Alexander Neumann (the original issue reporter) showed that `std::normal_distribution` with `std::mt19937_64` was four times slower than `boost::normal_distribution` with `std::mt19937_64`.

Additional context

Part of the performance deficiency is due to #1000.

Also tracked by DevCom-86909 and Microsoft-internal VSO-486661 / AB#486661.


StephanTLavavej · 2024-10-04T22:26:18Z

I overhauled the benchmark (dropping unnecessary code and especially the non-deterministic seeding) and got fresh numbers now that we've merged #4740.

New benchmark:

```diff
diff --git a/benchmarks/CMakeLists.txt b/benchmarks/CMakeLists.txt
index 31572a96..c082ae50 100644
--- a/benchmarks/CMakeLists.txt
+++ b/benchmarks/CMakeLists.txt
@@ -114,6 +114,7 @@ add_benchmark(iota src/iota.cpp)
 add_benchmark(locale_classic src/locale_classic.cpp)
 add_benchmark(minmax_element src/minmax_element.cpp)
 add_benchmark(mismatch src/mismatch.cpp)
+add_benchmark(normal_distribution src/normal_distribution.cpp)
 add_benchmark(path_lexically_normal src/path_lexically_normal.cpp)
 add_benchmark(priority_queue_push_range src/priority_queue_push_range.cpp)
 add_benchmark(random_integer_generation src/random_integer_generation.cpp)
```

benchmarks/src/normal_distribution.cpp:

```cpp
#include <benchmark/benchmark.h>
#include <random>

#pragma warning(push)
#pragma warning(disable : 4244) // conversion from 'meow' to 'woof', possible loss of data
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/normal_distribution.hpp>
#include <boost/random/uniform_real_distribution.hpp>
#pragma warning(pop)

template <class RandomGenerator>
void BM_Generator(benchmark::State& state) {
    RandomGenerator gen;
    gen.discard(1'000'000);
    while (state.KeepRunning()) {
        benchmark::DoNotOptimize(gen());
    }
}

template <class RandomGenerator, class Distribution>
void BM_Distribution(benchmark::State& state) {
    RandomGenerator gen;
    gen.discard(1'000'000);
    Distribution dist(0.0, 1.0);

    while (state.KeepRunning()) {
        benchmark::DoNotOptimize(dist(gen));
    }
}

namespace b_r = boost::random;

BENCHMARK(BM_Generator<std::mt19937_64>);
BENCHMARK(BM_Generator<b_r::mt19937_64>);

BENCHMARK(BM_Distribution<std::mt19937_64, std::normal_distribution<double>>);
BENCHMARK(BM_Distribution<std::mt19937_64, b_r::normal_distribution<double>>);
BENCHMARK(BM_Distribution<std::mt19937_64, std::uniform_real_distribution<double>>);
BENCHMARK(BM_Distribution<std::mt19937_64, b_r::uniform_real_distribution<double>>);

BENCHMARK(BM_Distribution<b_r::mt19937_64, std::normal_distribution<double>>);
BENCHMARK(BM_Distribution<b_r::mt19937_64, b_r::normal_distribution<double>>);
BENCHMARK(BM_Distribution<b_r::mt19937_64, std::uniform_real_distribution<double>>);
BENCHMARK(BM_Distribution<b_r::mt19937_64, b_r::uniform_real_distribution<double>>);

BENCHMARK_MAIN();
```

Build/run incantations:

```text
D:\GitHub\STL>set _CL_=/I C:\Users\stl\Downloads\boost_1_86_0

D:\GitHub\STL>"C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Auxiliary\Build\vcvarsall.bat" x64
**********************************************************************
** Visual Studio 2022 Developer Command Prompt v17.12.0-pre.2.1
** Copyright (c) 2022 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'

D:\GitHub\STL>cmake --preset x64 && cmake --build --preset x64
[...]
[1021/1021] Linking CXX static library out\lib\amd64\libcpmtd0.lib

D:\GitHub\STL>cmake -B out\bench -S benchmarks -G Ninja -DSTL_BINARY_DIR=out\x64 && cmake --build out\bench
[...]
[100/100] Linking CXX executable benchmark-priority_queue_push_range.exe

D:\GitHub\STL>out\bench\benchmark-normal_distribution.exe
2024-10-04T15:11:50-07:00
Running out\bench\benchmark-normal_distribution.exe
Run on (32 X 3394 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
-------------------------------------------------------------------------------------------------------------------
Benchmark                                                                         Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------------
BM_Generator<std::mt19937_64>                                                  4.08 ns         4.05 ns    165925926
BM_Generator<b_r::mt19937_64>                                                  2.65 ns         2.62 ns    280000000
BM_Distribution<std::mt19937_64, std::normal_distribution<double>>             13.1 ns         12.8 ns     56000000
BM_Distribution<std::mt19937_64, b_r::normal_distribution<double>>             9.19 ns         9.21 ns     74666667
BM_Distribution<std::mt19937_64, std::uniform_real_distribution<double>>       5.81 ns         5.94 ns    100000000
BM_Distribution<std::mt19937_64, b_r::uniform_real_distribution<double>>       9.53 ns         9.52 ns     64000000
BM_Distribution<b_r::mt19937_64, std::normal_distribution<double>>             12.6 ns         12.8 ns     56000000
BM_Distribution<b_r::mt19937_64, b_r::normal_distribution<double>>             8.41 ns         8.37 ns     89600000
BM_Distribution<b_r::mt19937_64, std::uniform_real_distribution<double>>       5.23 ns         5.16 ns    112000000
BM_Distribution<b_r::mt19937_64, b_r::uniform_real_distribution<double>>       8.79 ns         8.89 ns     89600000
```

I used VS 2022 17.12 Preview 2 on my 5950X. Table:

| Benchmark | Time |
|---|---|
| `BM_Generator<std::mt19937_64>` | 4.08 ns |
| `BM_Generator<b_r::mt19937_64>` | 2.65 ns |
| `BM_Distribution<std::mt19937_64, std::normal_distribution<double>>` | 13.1 ns |
| `BM_Distribution<std::mt19937_64, b_r::normal_distribution<double>>` | 9.19 ns |
| `BM_Distribution<std::mt19937_64, std::uniform_real_distribution<double>>` | 5.81 ns |
| `BM_Distribution<std::mt19937_64, b_r::uniform_real_distribution<double>>` | 9.53 ns |
| `BM_Distribution<b_r::mt19937_64, std::normal_distribution<double>>` | 12.6 ns |
| `BM_Distribution<b_r::mt19937_64, b_r::normal_distribution<double>>` | 8.41 ns |
| `BM_Distribution<b_r::mt19937_64, std::uniform_real_distribution<double>>` | 5.23 ns |
| `BM_Distribution<b_r::mt19937_64, b_r::uniform_real_distribution<double>>` | 8.79 ns |

With std::mt19937_64 as the generator, Boost's normal_distribution is only 1.43x faster than ours.

And now our uniform_real_distribution is 1.64x faster than Boost's, so the new generate_canonical is indeed awesome.

I conclude that our underlying algorithm for normal_distribution is still suboptimal, but the generate_canonical improvement has substantially narrowed the overall perf gap. If we improved normal_distribution, we would likely outperform Boost, as uniform_real_distribution already does.

statementreply mentioned this issue Jul 6, 2020 in Port DevCom bugs to GitHub #939 (closed, 58 tasks)

StephanTLavavej added the performance Must go faster label Jul 6, 2020

StephanTLavavej added the info needed We need more info before working on this label Jul 22, 2024


StephanTLavavej removed the info needed We need more info before working on this label Oct 4, 2024

StephanTLavavej changed the title ~~<random>: std::normal_distribution four times slower than the corresponding boost version~~ <random>: normal_distribution is slower than Boost Oct 4, 2024
