Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[14.0.X] instrument fitVertices to output more information when failing assert (issue cms-sw/cmssw#44923) #45631

Merged
merged 1 commit into from
Aug 6, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 13 additions & 1 deletion RecoTracker/PixelVertexFinding/plugins/alpaka/fitVertices.h
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,19 @@ namespace ALPAKA_ACCELERATOR_NAMESPACE::vertexFinder {
alpaka::syncBlockThreads(acc);
// reuse nn
for (auto i : cms::alpakatools::uniform_elements(acc, foundClusters)) {
ALPAKA_ASSERT_ACC(wv[i] > 0.f);
bool const wv_cond = (wv[i] > 0.f);
if (not wv_cond) {
printf("ERROR: wv[%d] (%f) > 0.f failed\n", i, wv[i]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen that addition of a printf requires a lot of registers.
How much slower is the code after this change, is it significant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it relevant? The printout should occur every other ten of millions events?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I recall, if it is compiled, that's enough

Copy link
Contributor Author

@mmusich mmusich Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any feeling (nor a measurement). If @cms-sw/heterogeneous-l2 have a feeling about it please chime in, otherwise it will have to be measured.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's relevant, because the increase in register use is there even if the branch is never taken at runtime.

In general I would prefer not to add any printf - but since we have been unable to reproduce the assertion offline, I think in this case the extra resource usage is warranted.

Assuming this helps figuring out the source of the problem and fixing it, afterwards we can move the printf inside some if constexpr (debug) { ... } clause.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much slower is the code after this change, is it significant?

Hard to guess, it would need to be measured 🤷🏻‍♂️

Copy link
Contributor

@missirol missirol Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We checked the HLT throughput (details in [*]).

  • Pre-PR: 618.8 +/- 1.3 events/sec (json).
  • Post-PR: 616.1 +/- 1.0 events/sec (json).

The impact of this PR on the HLT throughput is well below 1%, and somewhat within the uncertainties of these estimates.

[*]

  • Input data: run-383631, LSs 476-479, ~40k events (PU ~64).
  • HLT menu: /cdaq/physics/Run2024/2e34/v1.4.3/HLT/V2 (current pp online menu).
  • Release: CMSSW_14_0_13_patch1_MULTIARCHS (with and without this PR).
  • Node: hilton-c2b01-44-01 (same hardware as a 2022/23 HLT node) (CPUs: 2 AMD EPYC 7763 64-Core; GPUs: 2 NVIDIA Tesla T4).
  • 8 jobs, 32 threads and 24 streams per job.
  • NVIDIA MPS enabled.
  • Each of the two measurements reported above is the average of 4 repetitions.

// printing info on tracks associated to this vertex
for (auto trk_i = 0u; trk_i < nt; ++trk_i) {
if (iv[trk_i] != int(i)) {
continue;
}
printf(" iv[%d]=%d zt[%d]=%f ezt2[%d]=%f\n", trk_i, iv[trk_i], trk_i, zt[trk_i], trk_i, ezt2[trk_i]);
}
ALPAKA_ASSERT_ACC(false);
}

zv[i] /= wv[i];
nn[i] = -1; // ndof
}
Expand Down