Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modernize (use of) Random Number Generators #40636

Open
VinInn opened this issue Jan 29, 2023 · 12 comments
Open

Modernize (use of) Random Number Generators #40636

VinInn opened this issue Jan 29, 2023 · 12 comments

Comments

@VinInn
Copy link
Contributor

VinInn commented Jan 29, 2023

CMS currently uses CLHEP:
The CLHEP project was proposed by Leif Lönnblad at CHEP 92
It has same age of CMS...

The number of issues are pretty large and the adaptation of the library to new "user cases" has made the situation just worse...

Let's take the Poisson generator heavily used in the Digitization of HCAL:
https://cmssdt.cern.ch/dxr/CMSSW/source/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc#160
it calls RandPoissonQ

This is a recent igprof
http://innocent.web.cern.ch/innocent/perfResults/igprof-navigator/digiDebug1T_hcal4_CMSSW_13_0_X_2023-01-26-2300_slc7_amd64_gcc11/55

one can use the "old" RandPoisson that would most probably be faster if thread local was NOT used
http://innocent.web.cern.ch/innocent/perfResults/igprof-navigator/digiDebug1T_oldPoisson_CMSSW_13_0_X_2023-01-26-2300_slc7_amd64_gcc11/53

one can try to use std::poisson_distribution [1]
http://innocent.web.cern.ch/innocent/perfResults/igprof-navigator/digiDebug1T_stdPois_CMSSW_13_0_X_2023-01-28-1100_slc7_amd64_gcc11/116
BUT CLHEP engine returns only double, float or unsigned int while std engine returns uint32 or uint64 and
generate_canonical will require two invocation to clhep unsigned int to generate a double so is twice too slow

I have then made my own implementation (from old clhep RandPoisson small average case) [2]
http://innocent.web.cern.ch/innocent/perfResults/igprof-navigator/digiDebug1T_trivialPoisson_CMSSW_13_0_X_2023-01-28-1100_slc7_amd64_gcc11/226
and it is as fast as we expect... (no wasted call to exp, no thread local, full use of all bits of MixMax)
4x speedup!

So either

  1. CLHEP implementation is changed
  2. CLHEP is made consistent with std::random
  3. CMS implements her own random library picking the best of all

[1]

[diff --git a/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc b/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc
index 492821da6d4..0603922bf80 100644
--- a/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc
+++ b/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc
@@ -12,6 +12,26 @@
 #include "CalibCalorimetry/HcalAlgos/interface/HcalPulseShapes.h"

 #include "CLHEP/Random/RandPoissonQ.h"
+#include <random>
+#include <limits>
+
+namespace {
+
+  class RandomGeneratorWrapper {
+  public:
+    using result_type = unsigned int;
+
+    RandomGeneratorWrapper(CLHEP::HepRandomEngine* engine) : m_engine(engine) {}
+
+    result_type operator()() { return (unsigned int)(*m_engine); }
+
+    static result_type min() { return std::numeric_limits<result_type>::min(); }
+    static result_type max() { return std::numeric_limits<result_type>::max(); }
+
+    CLHEP::HepRandomEngine* m_engine;
+  };
+
+}  // namespace

 #include <cmath>
 #include <list>
@@ -156,8 +176,13 @@ void HcalSiPMHitResponse::addPEnoise(CLHEP::HepRandomEngine* engine) {
     int nPreciseBins = nbins * getReadoutFrameSize(id);

     unsigned int sumnoisePE(0);
+
+    RandomGeneratorWrapper eng(engine);
+    std::poisson_distribution<> poisson(dc_pe_avg);
+
     for (int tprecise(0); tprecise < nPreciseBins; ++tprecise) {
-      int noisepe = CLHEP::RandPoissonQ::shoot(engine, dc_pe_avg);  // add dark current noise
+      int noisepe = poisson(eng);  // add dark current noise
+      //      int noisepe = CLHEP::RandPoissonQ::shoot(engine, dc_pe_avg);  // add dark current noise

       if (noisepe > 0) {
         if (precisionTimedPhotons.find(id) == precisionTimedPhotons.end()) {

[2]

diff --git a/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc b/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc
index 492821da6d4..b815a19ab81 100644
--- a/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc
+++ b/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc
@@ -11,7 +11,37 @@
 #include "FWCore/Utilities/interface/isFinite.h"
 #include "CalibCalorimetry/HcalAlgos/interface/HcalPulseShapes.h"

-#include "CLHEP/Random/RandPoissonQ.h"
+#include "CLHEP/Random/RandPoisson.h"
+
+
+namespace {
+
+class TrivialPoisson {
+public:
+
+  TrivialPoisson(CLHEP::HepRandomEngine* engine, double av) : m_engine(engine), m_expAvm(std::exp(-av)){}
+
+  int operator()() {
+    int em = -1;
+    double t = 1.0;
+    do {
+      em += 1;
+      t *= m_engine->flat();
+    } while( t > m_expAvm);
+    return em;
+  }
+
+
+  private:
+  CLHEP::HepRandomEngine* m_engine;
+  const double m_expAvm;
+
+};
+
+
+}
+
+

 #include <cmath>
 #include <list>
@@ -155,9 +185,11 @@ void HcalSiPMHitResponse::addPEnoise(CLHEP::HepRandomEngine* engine) {

     int nPreciseBins = nbins * getReadoutFrameSize(id);

+    TrivialPoisson poisson(engine, dc_pe_avg);
     unsigned int sumnoisePE(0);
     for (int tprecise(0); tprecise < nPreciseBins; ++tprecise) {
-      int noisepe = CLHEP::RandPoissonQ::shoot(engine, dc_pe_avg);  // add dark current noise
+      //int noisepe = CLHEP::RandPoissonQ::shoot(engine, dc_pe_avg);  // add dark current noise
+      int noisepe = poisson();  // add dark current noise

       if (noisepe > 0) {
         if (precisionTimedPhotons.find(id) == precisionTimedPhotons.end()) {
[innocent@patatrack02 src]$
@cmsbuild
Copy link
Contributor

A new Issue was created by @VinInn Vincenzo Innocente.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@Dr15Jones
Copy link
Contributor

assign simulation, generators

@cmsbuild
Copy link
Contributor

New categories assigned: generators,simulation

@mdhildreth,@mkirsano,@menglu21,@alberto-sanchez,@SiewYan,@GurpreetSinghChahal,@Saptaparna,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks

@civanch
Copy link
Contributor

civanch commented Jan 29, 2023

@VinInn , I am not sure if G. Cosmo and J. Apostolakis can access this issue in the web. It somehow should be sent to them. CMSSW may migrate to the new Poisson with new CLHEP version. If the new implementation is really fast and thread safe, then I do not think there will be any opposition. It is the best solution, because CMSSW will migrate to it only changing external package. If you agree, I will send them your proposal or you send yourself.

Saying that CLHEP is 30 years old is not fully true: the interface is old, the code for most parts was updated many times to be modernized, random generators are very new and are nearly advanced, which was confirmed by random number benchmarks - there is a long story behind. Why not simply to update CLHEP Poisson?

Of course, it is possible implementing CMS util for Poisson, which will use internally CLHEP generator. This may be applied in several sensitive DIGI classes until RandPoissonQ will be modernized but this will require change CMSSW classes introducing a bit of the mess.

Concerning proposal 3) - I would avoid creation of a new CMS library for random generators - there are so many configuration and other issues, for example, significant work will be needed to be sure that the new generator is really random and all different CMS job start from correct random seed.

@VinInn
Copy link
Contributor Author

VinInn commented Jan 30, 2023

@civanch : the real problem IS the CLHEP Interface.

  1. the engine does not export the type, min,max, and a method that returns the real bytes generated
    1a) the virtual interface adds some complication as well but can be solved if one stick to a fixed number of generated bytes.... (of course not the minimal common denominator!)
  2. The cdf generator (gauss, poisson etc) interface is "static" while in many cases the implementation requires a state (possibly computed in advance)
    2a) this led to suboptimal use and internally use of thread_local variables.

btw I do not intend to make a new "engine" just reuse those available (std, clhep) and provide a more modern interface (and implementation)

For what concern Poisson specifically: in case mu is known in advance (essentially at configuration time) a precomputed table approach is most probably the fastest. I will work on this for HCAL (the average is 5.5e-05 : the poisson result is essentially always 0)

@civanch
Copy link
Contributor

civanch commented Jan 30, 2023

@VinInn , I agree that concret Poisson generator will be always faster than general. If it would be possible to use clhep for flat random inside this CMS class it would be a clean solution.

Concerning interface change in CLHEP: at least, this can be proposed.

Poisson is also used in SIM step for hit production, as well as Gauss. The MixMax generator dominates and it takes ~1.3% of ttbar simulation. Gauss takes ~0.3% , Poisson is not seen in profiler dump. The relative role of these generators should be more significant at DIGI but for today we do not have a profile. We have to do.

@VinInn
Copy link
Contributor Author

VinInn commented Jan 30, 2023

@civanch

The relative role of these generators should be more significant at DIGI but for today we do not have a profile. We have to do.

This is what I'm doing at the moment!
Have already submitted 3 PRs (one merged) and the one fixing poisson will be the fourth...

@makortel
Copy link
Contributor

I am not sure if G. Cosmo and J. Apostolakis can access this issue in the web. It somehow should be sent to them.

Our GitHub is public so anyone can read and comment.

@civanch
Copy link
Contributor

civanch commented Jan 31, 2023

I had a chat with colleagues. Comments from them:

in CLHEP there're three Poisson implementations:

  • RandPoisson
    the original (1995), algorithm derived from Numerical Recipes in C
  • RandPoissonQ
    by FNAL (2000), using a table-driven algorithm
  • RandPoissonT
    by FNAL (2000), supposed to be the most accurate version

In old measurements RandPoissonQ was faster than RandPoisson.

Current Geant4 default (not CLHEP default but following its interface) is G4Poisson, which is less accurate then RandPoissonQ.

The recommendation is to try G4Poisson or implement the most effective algorithm within CMSSW.

@VinInn
Copy link
Contributor Author

VinInn commented Jan 31, 2023

IMHO in digitization "good enough" suffice. I do not think that a detailed simulation of tails is of the outmost relevance
(Poisson and gaussian tails are actually unphysical do to time and space limits)

@VinInn
Copy link
Contributor Author

VinInn commented Feb 1, 2023

RandPoissonT is just a wrapper around RandPoissonQ
The problem is that even if one would instantiate an object and use the fire method the class caches the mean not exp(-x).
A lost opportunity....
ditto for G4Poisson that is just an inlined function (stateless)

@VinInn
Copy link
Contributor Author

VinInn commented Feb 1, 2023

let me share here another opportunity to speedup Normal (aka Gaussian) Random number generation.
in [1] a prototype

I use the classical Box-Muller algorithm in single precision float. vdt is used as math library.
The speed up w/r/t CLHEP comes from two components:

  1. I use one 53bit random number to generate two floats
  2. the Box-Muller transformation is fully vectorized

igprof (on a AMD zen3) results
current release (sse)
http://innocent.web.cern.ch/innocent/perfResults/igprof-navigator/digiDebug1T_ori_CMSSW_13_0_SKYLAKEAVX512_X_2023-01-29-2300_el8_amd64_gcc11/61
new code sse: speed up 2.9x
http://innocent.web.cern.ch/innocent/perfResults/igprof-navigator/digiDebug1T_fast_CMSSW_13_0_SKYLAKEAVX512_X_2023-01-29-2300_el8_amd64_gcc11/161
new code avx2: speed up 3.6
http://innocent.web.cern.ch/innocent/perfResults/igprof-navigator/digiDebug1T_fastAVX2_CMSSW_13_0_SKYLAKEAVX512_X_2023-01-29-2300_el8_amd64_gcc11/215

Of course one has to accept that the number of different generated numbers is "limited" (few 10^7) and the max sigma is about 6.6. Given that the starting point is 53bits (actually more if one uses the original uint64 generated) one can extend both a bit using some well established "tricks".

It should be noted that the routine in question SiGaussianTailNoiseAdder::addNoiseVR computes in float so in any case it does not exploit the full dynamic range of double precision....

[1]
https://github.com/VinInn/cmssw/blob/a14f9ed707d6b93a79ee67f11c71fcdf9e81925e/SimTracker/SiStripDigitizer/src/SiGaussianTailNoiseAdder.cc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants