Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the caching memory allocator lock-free #46658

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

fwyzard
Copy link
Contributor

@fwyzard fwyzard commented Nov 11, 2024

PR description:

Move the live block descriptors to the alpaka buffers: instead of tracking the descriptors for the live blocks in a global map, store each block's descriptor within the deleter of the block itself.

Reimplement the free list as an std::vector of tbb::concurrent_queue objects, one per
bin of the caching allocator.
Since a block may be in the free list but not actually be available for reuse, blocks are popped from the queue until an available block is found and reused, or the queue is empty and new block is requested. Then all popped blocks are pushed back to the queue.

Use atomic operations for the individual statistics.
Access to the whole set of statistics may not be fully consistent, but they should only be used for debugging or monitoring.

PR validation:

Validated on top of CMSSW 14.0.x with the 2024 HLT menu.

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 11, 2024

cms-bot internal usage

@fwyzard
Copy link
Contributor Author

fwyzard commented Nov 11, 2024

With CMSSW_14_0_15_patch1, running the HLT with 128 threads we see:
vtune_threading-notaus_hlt-reference-x86-64-v3-1x128t128s

With these changes, the contention moves to the CUDA mutex (to be addressed by further improvements):
vtune_threading-notaus_hlt-lockfree-x86-64-v3-1x128t128s

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46658/42578

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @fwyzard for master.

It involves the following packages:

  • HeterogeneousCore/AlpakaInterface (heterogeneous)

@cmsbuild, @fwyzard, @makortel can you please review it and eventually sign? Thanks.
@makortel, @missirol, @rovere this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@fwyzard
Copy link
Contributor Author

fwyzard commented Nov 11, 2024

@makortel, I split the implementation of the CachingAllocator into header and source file to ease the development. In principle I can move back to a header-only implementation, if that's better ?

@fwyzard
Copy link
Contributor Author

fwyzard commented Nov 11, 2024

@makortel I would like to make some other clean up in the code, but first I wanted to see if you have any comments on this implementation.

Then I can push the follow up changes here, or in a separate PR.

@fwyzard
Copy link
Contributor Author

fwyzard commented Nov 11, 2024

enable gpu

@fwyzard
Copy link
Contributor Author

fwyzard commented Nov 11, 2024

please test

@fwyzard
Copy link
Contributor Author

fwyzard commented Nov 11, 2024

@Dr15Jones I would appreciate any feedback from you as well :-)

@cmsbuild
Copy link
Contributor

-1

Failed Tests: Build
Size: This PR adds an extra 28KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49dcac/42726/summary.html
COMMIT: 3dd76b4
CMSSW: CMSSW_14_2_X_2024-11-11-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46658/42726/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found compilation error when building:

In file included from src/HeterogeneousCore/AlpakaInterface/interface/CachedBufAlloc.h:6,
                 from src/HeterogeneousCore/AlpakaInterface/interface/memory.h:9,
                 from src/DataFormats/Portable/interface/PortableHostCollection.h:11,
                 from src/DataFormats/Portable/interface/PortableCollection.h:6,
                 from src/DataFormats/Portable/test/test_catch2_portableCollectionOnHost.cc:3:
src/HeterogeneousCore/AlpakaInterface/interface/CachingAllocator.h:11:10: fatal error: tbb/concurrent_queue.h: No such file or directory
   11 | #include 
      |          ^~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
In file included from src/HeterogeneousCore/AlpakaInterface/interface/CachedBufAlloc.h:6,
                 from src/HeterogeneousCore/AlpakaInterface/interface/memory.h:9,


@fwyzard
Copy link
Contributor Author

fwyzard commented Nov 11, 2024

please test with #46657

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46658/43135

ERROR: Build errors found during clang-tidy run.

src/HeterogeneousCore/AlpakaInterface/src/alpaka/CachingAllocator.cc:183:25: error: no member named 'bytes' in 'std::unique_ptr<cms::alpakatools::CachingAllocator<alpaka::DevCpu, alpaka::QueueGenericThreadsBlocking<alpaka::DevCpu>>::BlockDescriptor>' [clang-diagnostic-error]
  183 |     totalLive_ += block.bytes;
      |                   ~~~~~ ^
src/HeterogeneousCore/AlpakaInterface/src/alpaka/CachingAllocator.cc:510:18: note: in instantiation of member function 'cms::alpakatools::CachingAllocator<alpaka::DevCpu, alpaka::QueueGenericThreadsBlocking<alpaka::DevCpu>>::allocate' requested here
--
src/HeterogeneousCore/AlpakaInterface/src/alpaka/CachingAllocator.cc:184:30: error: no member named 'requested' in 'std::unique_ptr<cms::alpakatools::CachingAllocator<alpaka::DevCpu, alpaka::QueueGenericThreadsBlocking<alpaka::DevCpu>>::BlockDescriptor>' [clang-diagnostic-error]
  184 |     totalRequested_ += block.requested;
      |                        ~~~~~ ^
Suppressed 662 warnings (662 in non-user code).
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:129: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

@fwyzard fwyzard force-pushed the lock_free_allocator_142x branch from 001180f to 7b571e9 Compare December 21, 2024 11:00
@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 21, 2024

please test

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 21, 2024

@makortel @Dr15Jones thank you for all the comments and suggestions, I definitely like the updated version better.

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Pull request #46658 was updated. @Dr15Jones, @fwyzard, @makortel, @smuzaffar can you please check and sign again.

@cmsbuild
Copy link
Contributor

+1

Size: This PR adds an extra 28KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49dcac/43596/summary.html
COMMIT: 7b571e9
CMSSW: CMSSW_15_0_X_2024-12-21-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/46658/43596/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 12 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3818730
  • DQMHistoTests: Total failures: 570
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3818140
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 214 log files, 184 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 24 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 53071
  • DQMHistoTests: Total failures: 872
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 52199
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 24 log files, 30 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 7, 2025

Milestone for this pull request has been moved to CMSSW_15_1_X. Please open a backport if it should also go in to CMSSW_15_0_X.

@cmsbuild cmsbuild modified the milestones: CMSSW_15_0_X, CMSSW_15_1_X Feb 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants