Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression (slowdown) in ALICE event generation #15579

Closed
1 task done
sawenzel opened this issue May 21, 2024 · 10 comments · Fixed by #15816
Closed
1 task done

Performance regression (slowdown) in ALICE event generation #15579

sawenzel opened this issue May 21, 2024 · 10 comments · Fixed by #15816
Assignees
Labels
bug experiment Affects an experiment / reported by its software & computimng experts priority:high

Comments

@sawenzel
Copy link
Contributor

sawenzel commented May 21, 2024

Check duplicate issues.

  • Checked for duplicates

Description

10-20fold slowdown of running a particular ALICE event generation process, where the generator is compiled just-in-time from a ROOT macro (see reproducer). Slowdown observed when switching from ROOT tag 6.28.04 to 6.28.08.

In particular, bisecting this slowdown shows that it is caused by this commit 4e4545f (from #11631). Reverting this single commit brings back the original performance.

Reproducer

On lxplus7.cern.ch, run:

#!/usr/bin/env bash

# good software version
/cvmfs/alice.cern.ch/bin/alienv printenv O2sim::v20231207-1 > env.txt

# bad software version
#/cvmfs/alice.cern.ch/bin/alienv printenv O2sim::v20231208-1 > env.txt

source env.txt

time o2-sim-dpl-eventgen --nEvents 5 --generator external --configFile ${O2DPG_ROOT}/MC/config/PWGEM/ini/GeneratorEMCocktail.ini --seed 11

ROOT version

Everything having 4e4545f included.

Installation method

build from source

Operating system

Linux

Additional context

No response

@sawenzel sawenzel added the bug label May 21, 2024
@sawenzel
Copy link
Contributor Author

callgrind_generator.zip

Attaching callgrind output files showing that the slowdown comes from:

(a) ROOT spends much more time/instructions (roughly a factor 9x) in libCore with ROOT6.30.01.
Notably the function TClassEdit::CleanType is now called much more often than before.

(b) We also spend more time in libc/libstdc++, but that is probably a consequence of more libCore activity.

(c) libCling increased by a factor of 2.

@dpiparo dpiparo self-assigned this May 27, 2024
@dpiparo dpiparo added priority:high experiment Affects an experiment / reported by its software & computimng experts labels Jun 8, 2024
@dpiparo
Copy link
Member

dpiparo commented Jun 9, 2024

Hi @sawenzel . I encountered a problem reproducing:

$ lsb_release --all
LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	CentOS
Description:	CentOS Linux release 7.9.2009 (Core) 
Release:	7.9.2009
Codename:	Core

$ /cvmfs/alice.cern.ch/bin/alienv printenv O2sim::v20231207-1 
libgcc_s.so.1 must be installed for pthread_cancel to work
/cvmfs/alice.cern.ch/bin/alienv: line 291:  9641 Aborted                 (core dumped) tclsh  > /dev/null 2>&1 <<EOF
EOF

init.c(556):ERROR:161: Cannot initialize TCL
 test 0 = 1

$ /cvmfs/alice.cern.ch/bin/alienv printenv O2sim::v20231208-1
libgcc_s.so.1 must be installed for pthread_cancel to work
/cvmfs/alice.cern.ch/bin/alienv: line 291:  9795 Aborted                 (core dumped) tclsh  > /dev/null 2>&1 <<EOF
EOF

init.c(556):ERROR:161: Cannot initialize TCL

@dpiparo
Copy link
Member

dpiparo commented Jun 9, 2024

Ok, using something like v20240108-1 works and uses a recent 6.30. I am starting with that one.

@dpiparo
Copy link
Member

dpiparo commented Jun 9, 2024

I spoke too soon. The workflow crashes.

@sawenzel
Copy link
Contributor Author

Hi @dpiparo. There was a copy-paste error for the command to run which is now corrected. But I am puzzled why you see these Tcl errors (maybe @ktf could comment). I am literally only doing ssh lxplus7.cern.ch + executing the above lines inside /tmp/swenzel. There is also nothing peculiar in my .bashrc and I checked that it even works inside env -i bash --noprofile --norc ... so that remains puzzling. For the record, I was on lxplus713.cern.ch for these checks.

@dpiparo
Copy link
Member

dpiparo commented Jun 11, 2024

I confirm that with the new command the processing starts. Debugging for real now.

dpiparo added a commit to dpiparo/root that referenced this issue Jun 11, 2024
plugin, otherwise heavy checks in the PluginHandler take place that
perform interpreter lookups.

Fixes root-project#15579
dpiparo added a commit that referenced this issue Jun 11, 2024
plugin, otherwise heavy checks in the PluginHandler take place that
perform interpreter lookups.

Fixes #15579
@dpiparo dpiparo reopened this Jun 11, 2024
@dpiparo
Copy link
Member

dpiparo commented Jun 11, 2024

The PR speeds up by factors (5x-7x) the simple reproducer of the issue we now have thanks to @sawenzel :

// Inspired by Alice production code, extracted from the surrounding context
#include "TF1.h"

void Foo(int N) {
   double am  = 90;
   double awidth = 1;
   for (int i = 0; i < N; ++i) {
      static TF1 rbw("rbw", "pow([1],2)*pow([0],2)/(pow(x*x-[0]*[0],2)+pow(x*x*[1]/[0],2))", -10, 10);
      rbw.SetRange(am - 5 * awidth, am + 5 * awidth);
      rbw.SetParameter(0, am);
      rbw.SetParameter(1, awidth);
      am = rbw.GetRandom();
      if (am <-100000000) std::cout << am << std::endl; // to avoid compiler optimisations
   }
}

int main()
{
   Foo(1000);
   return 0;
}

The code is faster because the fast code path is taken in the PluginHandler because the right type is used in the GSLIntegratorPlugin (unsigned long, via the size_t typedef), and not just a compatible one (unsigned int).

@dpiparo
Copy link
Member

dpiparo commented Jun 12, 2024

Profiles

@sawenzel
Copy link
Contributor Author

I confirm that the issue is fixed via #15816. Thank you!

@dpiparo
Copy link
Member

dpiparo commented Jun 12, 2024

great. I am backporting the change and closing the issue. Thanks a lot for your help in debugging this problem!

dpiparo added a commit to dpiparo/root that referenced this issue Jun 12, 2024
plugin, otherwise heavy checks in the PluginHandler take place that
perform interpreter lookups.

Fixes root-project#15579
dpiparo added a commit to dpiparo/root that referenced this issue Jun 12, 2024
plugin, otherwise heavy checks in the PluginHandler take place that
perform interpreter lookups.

Fixes root-project#15579
dpiparo added a commit that referenced this issue Jun 12, 2024
plugin, otherwise heavy checks in the PluginHandler take place that
perform interpreter lookups.

Fixes #15579
dpiparo added a commit that referenced this issue Jun 12, 2024
plugin, otherwise heavy checks in the PluginHandler take place that
perform interpreter lookups.

Fixes #15579
ktf pushed a commit to alisw/root that referenced this issue Jun 13, 2024
plugin, otherwise heavy checks in the PluginHandler take place that
perform interpreter lookups.

Fixes root-project#15579
sawenzel added a commit to sawenzel/AliceO2 that referenced this issue Jun 20, 2024
* easier debugging / valgrinding
* mainly to debug the ROOT performance issue root-project/root#15579
silverweed pushed a commit to silverweed/root that referenced this issue Aug 19, 2024
plugin, otherwise heavy checks in the PluginHandler take place that
perform interpreter lookups.

Fixes root-project#15579
sawenzel added a commit to sawenzel/AliceO2 that referenced this issue Nov 6, 2024
* easier debugging / valgrinding
* mainly to debug the ROOT performance issue root-project/root#15579
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug experiment Affects an experiment / reported by its software & computimng experts priority:high
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants