Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<random>: normal_distribution is slower than Boost #1003

Open
statementreply opened this issue Jul 6, 2020 · 2 comments
Open

<random>: normal_distribution is slower than Boost #1003

statementreply opened this issue Jul 6, 2020 · 2 comments
Labels
performance Must go faster

Comments

@statementreply
Copy link
Contributor

statementreply commented Jul 6, 2020

Describe the bug

A benchmark by Alexander Neumann (original issue reporter) showed that std::normal_distribution with std::mt19937_64 was 4 times slower than boost::normal_distribution with std::mt19937_64.

Additional context

Part of the performance deficiency is due to #1000.

Also tracked by DevCom-86909 and Microsoft-internal VSO-486661 / AB#486661.

@StephanTLavavej StephanTLavavej added the performance Must go faster label Jul 6, 2020
@StephanTLavavej StephanTLavavej added the info needed We need more info before working on this label Jul 22, 2024
@StephanTLavavej

This comment was marked as resolved.

@StephanTLavavej StephanTLavavej removed the info needed We need more info before working on this label Oct 4, 2024
@StephanTLavavej
Copy link
Member

I overhauled the benchmark (dropping unnecessary code and especially the non-deterministic seeding) and got fresh numbers now that we've merged #4740.

Click to expand new benchmark:
diff --git a/benchmarks/CMakeLists.txt b/benchmarks/CMakeLists.txt
index 31572a96..c082ae50 100644
--- a/benchmarks/CMakeLists.txt
+++ b/benchmarks/CMakeLists.txt
@@ -114,6 +114,7 @@ add_benchmark(iota src/iota.cpp)
 add_benchmark(locale_classic src/locale_classic.cpp)
 add_benchmark(minmax_element src/minmax_element.cpp)
 add_benchmark(mismatch src/mismatch.cpp)
+add_benchmark(normal_distribution src/normal_distribution.cpp)
 add_benchmark(path_lexically_normal src/path_lexically_normal.cpp)
 add_benchmark(priority_queue_push_range src/priority_queue_push_range.cpp)
 add_benchmark(random_integer_generation src/random_integer_generation.cpp)

benchmarks/src/normal_distribution.cpp:

#include <benchmark/benchmark.h>
#include <random>

#pragma warning(push)
#pragma warning(disable : 4244) // conversion from 'meow' to 'woof', possible loss of data
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/normal_distribution.hpp>
#include <boost/random/uniform_real_distribution.hpp>
#pragma warning(pop)

template <class RandomGenerator>
void BM_Generator(benchmark::State& state) {
    RandomGenerator gen;
    gen.discard(1'000'000);
    while (state.KeepRunning()) {
        benchmark::DoNotOptimize(gen());
    }
}

template <class RandomGenerator, class Distribution>
void BM_Distribution(benchmark::State& state) {
    RandomGenerator gen;
    gen.discard(1'000'000);
    Distribution dist(0.0, 1.0);

    while (state.KeepRunning()) {
        benchmark::DoNotOptimize(dist(gen));
    }
}

namespace b_r = boost::random;

BENCHMARK(BM_Generator<std::mt19937_64>);
BENCHMARK(BM_Generator<b_r::mt19937_64>);

BENCHMARK(BM_Distribution<std::mt19937_64, std::normal_distribution<double>>);
BENCHMARK(BM_Distribution<std::mt19937_64, b_r::normal_distribution<double>>);
BENCHMARK(BM_Distribution<std::mt19937_64, std::uniform_real_distribution<double>>);
BENCHMARK(BM_Distribution<std::mt19937_64, b_r::uniform_real_distribution<double>>);

BENCHMARK(BM_Distribution<b_r::mt19937_64, std::normal_distribution<double>>);
BENCHMARK(BM_Distribution<b_r::mt19937_64, b_r::normal_distribution<double>>);
BENCHMARK(BM_Distribution<b_r::mt19937_64, std::uniform_real_distribution<double>>);
BENCHMARK(BM_Distribution<b_r::mt19937_64, b_r::uniform_real_distribution<double>>);

BENCHMARK_MAIN();
Click to expand build/run incantations:
D:\GitHub\STL>set _CL_=/I C:\Users\stl\Downloads\boost_1_86_0

D:\GitHub\STL>"C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Auxiliary\Build\vcvarsall.bat" x64
**********************************************************************
** Visual Studio 2022 Developer Command Prompt v17.12.0-pre.2.1
** Copyright (c) 2022 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'

D:\GitHub\STL>cmake --preset x64 && cmake --build --preset x64
[...]
[1021/1021] Linking CXX static library out\lib\amd64\libcpmtd0.lib

D:\GitHub\STL>cmake -B out\bench -S benchmarks -G Ninja -DSTL_BINARY_DIR=out\x64 && cmake --build out\bench
[...]
[100/100] Linking CXX executable benchmark-priority_queue_push_range.exe

D:\GitHub\STL>out\bench\benchmark-normal_distribution.exe
2024-10-04T15:11:50-07:00
Running out\bench\benchmark-normal_distribution.exe
Run on (32 X 3394 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
-------------------------------------------------------------------------------------------------------------------
Benchmark                                                                         Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------------
BM_Generator<std::mt19937_64>                                                  4.08 ns         4.05 ns    165925926
BM_Generator<b_r::mt19937_64>                                                  2.65 ns         2.62 ns    280000000
BM_Distribution<std::mt19937_64, std::normal_distribution<double>>             13.1 ns         12.8 ns     56000000
BM_Distribution<std::mt19937_64, b_r::normal_distribution<double>>             9.19 ns         9.21 ns     74666667
BM_Distribution<std::mt19937_64, std::uniform_real_distribution<double>>       5.81 ns         5.94 ns    100000000
BM_Distribution<std::mt19937_64, b_r::uniform_real_distribution<double>>       9.53 ns         9.52 ns     64000000
BM_Distribution<b_r::mt19937_64, std::normal_distribution<double>>             12.6 ns         12.8 ns     56000000
BM_Distribution<b_r::mt19937_64, b_r::normal_distribution<double>>             8.41 ns         8.37 ns     89600000
BM_Distribution<b_r::mt19937_64, std::uniform_real_distribution<double>>       5.23 ns         5.16 ns    112000000
BM_Distribution<b_r::mt19937_64, b_r::uniform_real_distribution<double>>       8.79 ns         8.89 ns     89600000

I used VS 2022 17.12 Preview 2 on my 5950X. Table:

Benchmark Time
BM_Generator<std::mt19937_64> 4.08 ns
BM_Generator<b_r::mt19937_64> 2.65 ns
BM_Distribution<std::mt19937_64, std::normal_distribution<double>> 13.1 ns
BM_Distribution<std::mt19937_64, b_r::normal_distribution<double>> 9.19 ns
BM_Distribution<std::mt19937_64, std::uniform_real_distribution<double>> 5.81 ns
BM_Distribution<std::mt19937_64, b_r::uniform_real_distribution<double>> 9.53 ns
BM_Distribution<b_r::mt19937_64, std::normal_distribution<double>> 12.6 ns
BM_Distribution<b_r::mt19937_64, b_r::normal_distribution<double>> 8.41 ns
BM_Distribution<b_r::mt19937_64, std::uniform_real_distribution<double>> 5.23 ns
BM_Distribution<b_r::mt19937_64, b_r::uniform_real_distribution<double>> 8.79 ns

With std::mt19937_64 as the generator, Boost's normal_distribution is only 1.43x faster than ours.

And now our uniform_real_distribution is 1.64x faster than Boost's, so the new generate_canonical is indeed awesome.

I conclude that our underlying algorithm for normal_distribution is still suboptimal, but the generate_canonical improvement has substantially narrowed the overall perf gap. If we improved normal_distribution, we would likely outperform Boost, as uniform_real_distribution already does.

@StephanTLavavej StephanTLavavej changed the title <random>: std::normal_distribution four times slower than the corresponding boost version <random>: normal_distribution is slower than Boost Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

2 participants