Use ofast-flag rather than fast-math for mkFit #41374

Merged: 3 commits, May 4, 2023

Conversation

mmasciov
Contributor

PR description:

As the title says, this PR switches the mkFit build from fast-math to ofast-flag, as this was shown to improve stability across different architectures.
An analogous change is applied to the standalone mkFit Makefile.config.

Additional notes on the mkFit Makefile.config:
ofast-flag in CMSSW consists of -Ofast -fno-reciprocal-math -mrecip=none (see cms-sw/cmsdist#8280).
This is reflected in the mkFit Makefile.config, with intentional redundancy:
-Ofast turns on -ffast-math, which in turn activates -fno-math-errno (see reference). The redundant flags are kept on purpose: to remove the effects of ofast-flag in the standalone mkFit application, it is enough to comment out the line added to Makefile.config by this PR, and vectorization stays unchanged because the other pre-existing flags remain in place.
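As an illustration of why -fno-reciprocal-math matters for reproducibility: -ffast-math includes -freciprocal-math, which lets the compiler replace a division x/y with the multiplication x*(1/y). The two are not bit-identical in floating point. The sketch below only mimics that rewrite by hand in Python (Python itself applies no such optimization), and the function names are made up for this example; it is not taken from mkFit.

```python
# Illustration only: emulate by hand the rewrite that -freciprocal-math
# permits (x / y  ->  x * (1 / y)). Function names are hypothetical.

def divide(x, y):
    """Plain IEEE-754 division, as the compiler emits without reciprocal math."""
    return x / y


def divide_via_reciprocal(x, y):
    """Division rewritten as multiplication by the reciprocal."""
    return x * (1.0 / y)


def count_mismatches(limit):
    """Count integer pairs (x, y) in [1, limit] where the two forms differ."""
    mismatches = 0
    for x in range(1, limit + 1):
        for y in range(1, limit + 1):
            if divide(float(x), float(y)) != divide_via_reciprocal(float(x), float(y)):
                mismatches += 1
    return mismatches


# A single concrete pair where the results differ in the last bit.
print(divide(3.0, 10.0), divide_via_reciprocal(3.0, 10.0))
print(count_mismatches(50), "of", 50 * 50, "pairs differ")
```

Differences of this kind are tiny per operation, but they can depend on which instructions a given compiler and architecture choose, which is one plausible way build flags end up affecting stability across machines.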

PR validation:

This PR is not meant to affect physics performance, except for compilation-driven effects. Technical performance is not affected either.
This is validated in: http://uaf-10.t2.ucsd.edu/~mmasciov/MIC/testFastMath/MTV_ttbarPU_cgpu-1_fastmath_vs_ofast/ (red/black vs. blue)

FYI, @mmusich, @slava77

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41374/35230

  • This PR adds an extra 16KB to repository

@cmsbuild
Contributor

A new Pull Request was created by @mmasciov (Mario Masciovecchio) for master.

It involves the following packages:

  • RecoTracker/MkFitCore (reconstruction)

@cmsbuild, @mandrenguyen, @clacaputo can you please review it and eventually sign? Thanks.
@VourMa, @makortel, @felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @missirol, @gpetruc, @mmusich, @mtosi, @dgulhan this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@slava77
Contributor

slava77 commented Apr 19, 2023

@cmsbuild please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b29b55/32045/summary.html
COMMIT: cb2e7d6
CMSSW: CMSSW_13_1_X_2023-04-19-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/41374/32045/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 47 lines from the logs
  • Reco comparison results: 5210 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3459877
  • DQMHistoTests: Total failures: 42823
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3417032
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@clacaputo
Contributor

enable profiling

@clacaputo
Contributor

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b29b55/32086/summary.html
COMMIT: cb2e7d6
CMSSW: CMSSW_13_1_X_2023-04-21-1100/el8_amd64_gcc11
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/41374/32086/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 25 lines from the logs
  • Reco comparison results: 5216 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3460915
  • DQMHistoTests: Total failures: 42826
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3418067
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@mmasciov
Contributor Author

@clacaputo, thanks for enabling and triggering tests with profiling.
However, all igprofCPU* entries seem to be empty: does this indicate that the profiling tests did not complete properly for some reason?
(BTW, just to clarify: we don't expect any significant difference to be found in profiling tests either)

@gartung
Member

gartung commented Apr 24, 2023

I had to resubmit some of the profiling jobs which might have caused the missing igprofCPU entries.

@gartung
Member

gartung commented Apr 24, 2023

I noticed the empty igprofCPU entries in the IB build profiling jobs as well and I am trying to debug the problem.

@mmasciov
Contributor Author

> I noticed the empty igprofCPU entries in the IB build profiling jobs as well and I am trying to debug the problem.

@gartung, @clacaputo, is there any news about this? To my understanding, it is the only thing holding up this PR.

@gartung
Member

gartung commented Apr 28, 2023

You can try re-running the PR tests. It might work this time.

@gartung
Member

gartung commented Apr 28, 2023

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b29b55/32234/summary.html
COMMIT: cb2e7d6
CMSSW: CMSSW_13_1_X_2023-04-28-1100/el8_amd64_gcc11
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/41374/32234/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 43 lines from the logs
  • Reco comparison results: 5215 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3460877
  • DQMHistoTests: Total failures: 42829
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3418026
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@gartung
Member

gartung commented Apr 29, 2023

@clacaputo I haven't figured out why the IgProf CPU profiles are empty. For now I would use the Resources Pie Charts. They are the JSON files linked in the profiling summary for each workflow.
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b29b55/32086/summary.html

@clacaputo
Contributor

> @clacaputo I haven't figured out why the IgProf CPU profiles are empty. For now I would use the Resources Pie Charts. They are the JSON files linked in the profiling summary for each workflow. https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b29b55/32086/summary.html

Hi @gartung, thanks for checking. Unfortunately, after clicking on the JSON, the Circles page is empty.

@clacaputo
Contributor

I've tested the PR on vocms011, producing 400 events for workflow 11834.21 in CMSSW_13_1_X_2023-04-30-2300.

The TimeDiff CMSSW_13_1_X_2023-04-30-2300 VS CMSSW_13_1_X_2023-04-30-2300+PR is this:

The same excluding the first 1 events
  delta/mean delta/orJob           original                    new module name
  ---------- ------------          --------                   ---- ------------
   -1.322640      -0.00%         0.10 ms/ev ->         0.02 ms/ev ootPhotonCore
   +0.069762      +0.01%        23.65 ms/ev ->        25.36 ms/ev detachedQuadStepTracks
   -0.064655      -0.01%        11.89 ms/ev ->        11.14 ms/ev mkFitEventOfHits
   -0.063905      -0.01%        12.03 ms/ev ->        11.28 ms/ev mkFitEventOfHitsPreSplitting
   -0.062740      -0.03%        57.06 ms/ev ->        53.58 ms/ev ecalMultiFitUncalibRecHit@cpu
   +0.059093      +0.00%         6.35 ms/ev ->         6.73 ms/ev tobTecStep
   +0.057614      +0.01%        14.94 ms/ev ->        15.82 ms/ev initialStep
   +0.056911      +0.00%         5.00 ms/ev ->         5.30 ms/ev mixedTripletStep
   +0.055222      +0.01%        18.14 ms/ev ->        19.17 ms/ev detachedTripletStep
   +0.055166      +0.09%       197.39 ms/ev ->       208.59 ms/ev detachedTripletStepTracks
   +0.054596      +0.01%        23.33 ms/ev ->        24.64 ms/ev highPtTripletStepTrackCandidates
   +0.053510      +0.08%       181.53 ms/ev ->       191.51 ms/ev initialStepTracksPreSplitting
   +0.052727      +0.08%       182.90 ms/ev ->       192.81 ms/ev initialStepTracks
   +0.052273      +0.00%         6.77 ms/ev ->         7.13 ms/ev highPtTripletStep
   +0.052233      +0.01%        19.16 ms/ev ->        20.19 ms/ev pixelLessStep
   +0.052184      +0.04%       105.32 ms/ev ->       110.96 ms/ev lowPtTripletStepTracks
   +0.051537      +0.03%        68.10 ms/ev ->        71.71 ms/ev highPtTripletStepTracks
   +0.051366      +0.01%        19.29 ms/ev ->        20.31 ms/ev lowPtTripletStep
   -0.051177      -0.04%       103.15 ms/ev ->        98.00 ms/ev detachedTripletStepSeeds
   +0.051065      +0.01%        16.32 ms/ev ->        17.18 ms/ev lowPtQuadStep
  ---------- ------------          --------                   ---- ------------
Job total:  13.0736 s/ev ==> 13.135 s/ev

There is a slight increase in reco timing, which could be a fluctuation.
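To put the numbers above in perspective, here is a minimal sketch of the arithmetic. The column definitions are an assumption, inferred from the values in the table rather than from the TimeDiff tool's documentation: delta/mean is taken as the change divided by the average of the two timings, and delta/orJob as the change relative to the original job total. The helper names are hypothetical.

```python
# Sketch of the timing arithmetic; column definitions are inferred from the
# table values (an assumption), and the helper names are illustrative only.

def delta_over_mean(original, new):
    """Change divided by the mean of the original and new timings."""
    return (new - original) / ((original + new) / 2.0)


def delta_over_job_pct(original_ms, new_ms, job_total_ms):
    """Change of one module as a percentage of the original job total."""
    return 100.0 * (new_ms - original_ms) / job_total_ms


# Job totals quoted in the comparison, in seconds per event.
job_original_s = 13.0736
job_new_s = 13.135
job_change_pct = 100.0 * (job_new_s - job_original_s) / job_original_s

print(f"job total change: {job_change_pct:+.2f}%")
print(f"detachedQuadStepTracks delta/mean: {delta_over_mean(23.65, 25.36):+.4f}")
print(
    "detachedTripletStepTracks delta/orJob: "
    f"{delta_over_job_pct(197.39, 208.59, job_original_s * 1000.0):+.2f}%"
)
```

Under these assumptions the job total moves by about half a percent, with no single module contributing more than about 0.1% of the job time.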

@mmasciov
Contributor Author

mmasciov commented May 3, 2023

@clacaputo, indeed we do not expect any change beyond fluctuations, as also confirmed in our validation (see, e.g., http://uaf-10.t2.ucsd.edu/~mmasciov/MIC/testFastMath/MTV_ttbarPU_cgpu-1_fastmath_vs_ofast/plots_timing/summaryReal.pdf).

@smuzaffar modified the milestones: CMSSW_13_1_X, CMSSW_13_2_X May 4, 2023

@clacaputo
Contributor

+reconstruction

@cmsbuild
Contributor

cmsbuild commented May 4, 2023

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

perrotta commented May 4, 2023

+1

@cmsbuild merged commit 34c4538 into cms-sw:master May 4, 2023
@srlantz deleted the ofast-flag_mkFit branch October 30, 2023 20:50