Releases: UO-OACISS/apex
Patch release v2.4.1
Emergency patch to accommodate the HPX collectives API change in the next HPX release.
APEX Release v2.4.0
This is an update to APEX, with several new features including:
- New simulated annealing search for policies
- New Kokkos kernel autotuning support
- Memory leak detection (experimental)
- Updated scatterplot support, including counters and updated Python scripts to use python3
- HIP/ROCm Roctracer support
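As a rough illustration of the simulated annealing search listed above, the sketch below minimizes a cost over a small discrete parameter grid. It is a generic textbook formulation, not APEX's implementation; the function name `anneal`, the geometric cooling schedule, and the single-step neighbor move are all assumptions made for illustration.

```python
import math
import random


def anneal(cost, space, steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Generic simulated annealing over a discrete grid (illustrative only).

    cost  -- callable taking a tuple of parameter values, returning a float
    space -- list of lists; space[d] holds the allowed values for dimension d
    """
    rng = random.Random(seed)

    def decode(indices):
        # Translate per-dimension indices into actual parameter values.
        return tuple(space[d][i] for d, i in enumerate(indices))

    # Start from a random point, stored as one index per dimension.
    current = [rng.randrange(len(dim)) for dim in space]
    current_cost = cost(decode(current))
    best, best_cost = list(current), current_cost

    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose a neighbor: move one dimension by one grid position (clamped).
        candidate = list(current)
        d = rng.randrange(len(space))
        moved = candidate[d] + rng.choice((-1, 1))
        candidate[d] = min(max(moved, 0), len(space[d]) - 1)
        candidate_cost = cost(decode(candidate))
        delta = candidate_cost - current_cost
        # Always accept improvements; accept regressions with
        # probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return decode(best), best_cost
```

A caller would pass a function that measures one configuration (for example, a kernel's run time for a given pair of tile sizes) together with the allowed values for each parameter. Accepting occasional regressions at high temperature is what lets the search leave a local minimum.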
Full list of commits:
view commit • Don't enable examples by default
view commit • Kokkos doesn't like it if you replace the OpenMP library at runtime. So OMPT support now has to be explicitly enabled by --apex:ompt to preload the OpenMP runtime library (if desired).
view commit • Adding kokkos tuning support. Needs work.
view commit • Kokkos tuning working, but AH not getting right answer.
view commit • Working, but AH still stuck in local minima.
view commit • Adding/fixing PBS and SLURM variables
view commit • Fixing build error without kokkos autotuning
view commit • Trying to improve convergence for kokkos autotuning
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Debugging Kokkos tuning issues
view commit • Adding Kokkos tooling header, eliminates need to require Kokkos as a dependency
view commit • Adding quotes around path to harmony home
view commit • Working Kokkos autotuner. This uses a Nelder Mead search, with an initial radius of 0.5 centered on the initial point requested by Kokkos (if specified). Future work includes caching results and trying other search strategies like simulated annealing.
view commit • Refactoring kokkos tuning away from profiling, making it possible to disable it
view commit • Updating to python3
view commit • Writing a memory wrapper report. There's a huge amount of CUPTI memory leaks, and they happen when the first real call to CUDA happens. I can't force that call, or ignore memory during the first "real" call, yet.
view commit • Cleaner way of preventing "false"(?) CUPTI memory leaks.
view commit • Fixing memory leaks and instability during shutdown. When using the memory tracker, make sure that the reporting is done before the BFD address resolution infrastructure is destroyed.
view commit • Adding task tree ASCII output, for issue #150
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Adding "Remainder" to tree ASCII output.
view commit • Adding support for ratio and ordinal values
view commit • Fixing tree ASCII output and memory leak reporting.
view commit • Tasktree human readable is now in a file, and hierarchically sorted by time.
view commit • Making --apex:quiet truly quiet
view commit • Adding direct multidimensional simulated annealing search.
view commit • GCC 9.3.0 has an internal pedantic compiler error. So turning off pedantic.
view commit • Updating subproject build of LLVM OpenMP runtime for GCC
view commit • Fixing race condition in startup of memory wrapper, I hope...
view commit • Updating scripts to python3
view commit • Adding counter scatterplot support, too
view commit • Allowing for custom scatterplot fractions. To change from the default of 1% (0.01), set APEX_SCATTERPLOT_FRACTION equal to some value between 0.0 and 1.0.
view commit • Adding counter scatterplot script
view commit • Updating scatterplot scripts to handle larger scales
view commit • Do lazy opening of sample files so that the correct Node ID is used
view commit • improving colors
view commit • Merge branch 'develop' of github.com:khuck/xpress-apex into develop
view commit • More scatterplot cleanup
view commit • Updating escape sequence for new python
view commit • Fixing x axis to make all subgraphs uniform
view commit • Fixing dlsym() wrapper function to use templates for the function types, it's better than just blindly casting. Better to let the type system help us.
view commit • Added HIP to the configure and added a test case. It seems to work. Now have to add the actual roctracer support.
view commit • ROCTX support added.
view commit • Working callback support for HIP. Next step is to add activity support, and link the correlation IDs. That should be modeled after the CUPTI support.
view commit • Updating scatterplot scripts to add mean values
view commit • Working HIP with actions
view commit • Merge branch 'develop' into hip
view commit • Testing HIP code with CUDA config
view commit • Working HIP memory tracki...
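The scatterplot sampling fraction mentioned in the commit list can be pictured with the sketch below. Only the `APEX_SCATTERPLOT_FRACTION` variable name, its 0.01 default, and its 0.0 to 1.0 range come from the notes above; the helper names, the fallback behavior for bad values, and the uniform random sampling rule are assumptions, not APEX's actual code.

```python
import os
import random


def scatterplot_fraction(env=None, default=0.01):
    """Read APEX_SCATTERPLOT_FRACTION, falling back to the 1% default.

    Out-of-range or unparsable values fall back to the default (an assumption
    made for this sketch; APEX may handle them differently).
    """
    env = os.environ if env is None else env
    raw = env.get("APEX_SCATTERPLOT_FRACTION")
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default
    return value if 0.0 <= value <= 1.0 else default


def sample_events(events, fraction, seed=None):
    """Keep each event independently with probability `fraction`."""
    rng = random.Random(seed)
    return [event for event in events if rng.random() < fraction]
```

A larger fraction gives denser scatterplots at the cost of larger sample files, which is presumably why the default is kept small.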
Patch release v2.3.2
Patch release for bug fixes.
Commits in this release:
view commit • Updating documentation
view commit • Merge branch 'develop' of github.com:khuck/xpress-apex into develop
view commit • Checking for nvcc 10 and gcc 8 and setting flags accordingly
view commit • Adding periodic plugin example, enabling static global constructors and destructors
view commit • Adding pthread wrapper and screen_output to policy plugin example
view commit • Update README.md
view commit • Re-enabling ability to get vector of available profiles, updated periodic example
view commit • Don't pin threads by default, it's kind of broken on summit
view commit • Fixing HPX build due to static global constructor
view commit • Fixing bug #134. Changing from pthread_setaffinity_np() to sched_get/setaffinity()
view commit • Fixing issue #135. When tracking CPU/GPU activity, the memory allocation counters should be associated with the thread making the call, when writing to OTF2 traces. This change allows for an optional argument to the apex::sample_value call that indicates whether the counter is associated with the specific thread or the process as a whole (the default).
view commit • Fixing #137. Now explicitly tracking all memory allocations and frees on both the host and the device.
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Re-enable pinning by default
view commit • Fixing #136. Now have the ability to capture task tree, not just graph. No more cycles!
view commit • Adding dependency_tree class
view commit • Fixing build errors for -std=c++11 compliance
view commit • Initial memory wrapper, bugs everywhere
view commit • Adding additional MPI rank detection support
view commit • Fixing build issue with HPX due to modified sample_value function
view commit • Fixing cuda 10.1 build errors.
view commit • Fixing gperftool config by finding correct include location
view commit • Fixing gperftool config by finding correct include location
view commit • Removing some high-overhead and useless counters
view commit • Working memory wrapper for malloc/free, removing pointers from name demangling due to instability
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Adding support for calloc and realloc
view commit • Fixing comment
view commit • Adding memory wrapper code for HPX configurations
view commit • Updating copyright to 2021
view commit • Fixing measurement output when dump is called multiple times.
view commit • Fixing tasktree processing for non-timers, adding to apex_exec script
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Fixing elapsed time in graphs and shortening timer names by not including full file name and path by default
view commit • Fixing concurrency handler static global variable
view commit • Fix HPX barriers in OTF2 output
view commit • Merge pull request #143 from severinstrobl/otf2_hpx_barriers
view commit • Enabling LLVM 11 to build cuda examples
view commit • Forgot to set profiler to "stopped" when adding async activity.
view commit • Removing APEX counters (llvm won't link them?)
view commit • Cleaning up timers. We had been using a custom clock in order to use rdtsc on Intel platforms, but that's kind of pointless. It becomes a nightmare when trying to convert for OTF2 traces, and CUDA (and other GPUs) only provide timestamps in nanoseconds. Therefore, all timing is assumed to be done in nanoseconds now.
view commit • Flush CUPTI before dumping.
view commit • Need to move forward declaration.
view commit • Only override the rank if suspect it's wrong
view commit • Updating version number
view commit • Updating version number.
view commit • Merge branch 'develop'
Version 2.3.1 - Patch Release
This release patches CMake variables to make them consistent across the project, and includes bug fixes for OpenMP initial/implicit tasks and support for CUDA 11.
Commits in this release:
- view commit • Beginning the process of cleaning up the CMAKE config
- view commit • Testing the CMake cleanup before merge with develop
- view commit • Merge branch 'cmake_cleanup' into develop
- view commit • Removing explicit check for cori or edison and replacing it with a check for Cray KNL
- view commit • Fixing anonymous OpenMP regions, where appropriate. For implicit tasks, barriers and barrier wait events, either there is no codeptr associated with it, or there is. For implicit tasks, use the codeptr of the parent. For barriers, don't do anything. An anonymous barrier usually means that the thread is idle between parallel regions.
- view commit • Correcting support for CUDA 11 in NVML interface.
- view commit • Changing USE_LM_SENSORS to APEX_WITH_LM_SENSORS
- view commit • Fixing paths for lm sensors in HPX configs, and correcting cmake warnings.
- view commit • Updating APEX version
- view commit • Fixing demangle config for HPX
2.3.0 Release
This release contains many bug fixes, and some new features. New features include:
- Kokkos support
- OpenACC profiling support
- NVIDIA CUDA/CUPTI support
- NVIDIA NVML support
- RAJA support
- Compiler-based instrumentation support
- Additional /proc/self data
- Disable RDTSC timer on `x86_64` architectures
- Minimal MPI profiling support
- HPX reduction for OTF2 event unification (when HPX networking enabled)
- Ported to PGI, Intel compilers
- Updated `apex_exec` script for parsing command line arguments
- Event filtering
- Documentation updates
All of the commits for this release:
view commit • Adding new "pre-shutdown" event for listeners. The profiler_listener, otf2_listener and trace_event_listener all need to take a timestamp when the program is finished, but when CUPTI asynchronous processing has to happen, that can dilate the trace because the final timestamp doesn't get taken until long after the buffers are processed. Now, the timestamp is taken before the buffers are processed. All asynchronous background processing also needs to be disabled, so that there aren't new events in the trace after the last timestamp.
view commit • Adding kokkos support.
view commit • Porting to PGI on Summit
view commit • Adding kokkos support.
view commit • Fixing bug in memcpy activity. The stream ID wasn't getting captured, causing overlapping timers in the OTF2 trace.
view commit • Add MPI_Finalize wrapper. When configuring APEX with MPI support, wrap the MPI_Finalize function so that we can use MPI functions during OTF2 event unification instead of the filesystem.
view commit • Unify the final timestamp. At the end of execution, exchange final timestamps so that the OTF2 trace has an accurate final timestamp.
view commit • Don't finalize profiles if background stats not computed
view commit • Adding MPI to some CUDA examples to test the event unification support.
view commit • Debugging kokkos support on summit
view commit • Merge branch 'kokkos' of github.com:khuck/xpress-apex into kokkos
view commit • Merge branch 'develop' of github.com:khuck/xpress-apex into develop
view commit • Adding kokkos support.
view commit • Debugging kokkos support on summit
view commit • Allow HPX configs to disable RDTSC
view commit • Updating to renamed perfstubs API calls
view commit • Merge branch 'kokkos' of github.com:khuck/xpress-apex into kokkos
view commit • Fixing race conditions between processes when doing OTF2 event unification
view commit • Merge branch 'kokkos' into develop
view commit • Two changes: making Jupyter support a runtime option and updating some OMPT initialization. This is the beginning of the process of updating OMPT support to fully support OpenMP 5.0 including target directives.
view commit • Fixing task dependencies in OpenMP/OMPT
view commit • Check if OMPT was initialized before forcing shutdown
view commit • Merge branch 'master' into develop
view commit • Adding simple OpenMP test
view commit • Debugging with Intel 20 compiler. Still lots of shutdown problems.
view commit • Updates for OpenMP and OpenACC support. Target offloading with OpenACC to CUDA is now supported.
view commit • Adding NVML support
view commit • Working NVML support for utilization
view commit • Adding NVML find support for HPX configs
view commit • Fixing cmake error with nvml
view commit • Adding lots more NVML data. Clock, power, temp, PCIe throughput
view commit • Enabling /proc/self/status by default
view commit • Be smarter about which devices to monitor with NVML
view commit • Adding driver support for when changing devices, to make sure NVML is capturing the right device
view commit • Adding NVML nvlink statistics
view commit • Don't build openmp examples with compiler without openmp support
view commit • Merge branch 'master' into develop
view commit • Silly CMake bug
view commit • Removing contention in google event tracer, but still have to flush buffers occasionally.
view commit • Adding command line argument processing for apex_exec script
view commit • Fixing doxygen warning
view commit • Updating documentation to v2.2.0
view commit • Fixing label for host-allocated memory in Cuda
view commit • Always delete OTF2 archive if exists at startup
view commit • Updating copyright and documentation
view commit • Fixing support for std::unique_ptr with clang
view commit • Updating readthedocs documentation
view commit • removing debug...
Version 2.2.0
This release contains many updates and fixes. Of note is new support for CUDA/CUPTI events, and the ability to detect MPI applications even when neither HPX nor APEX is configured with MPI support.
Changes:
- view commit • Change to personal fork of concurrentqueue for stability
- view commit • Cleaning up clang pedantic errors
- view commit • Tweaking build system to support Windows
- view commit • Merge pull request #122 from STEllAR-GROUP/fixing_windows_support
- view commit • Adding annotation for process_profiles task
- view commit • Cleaning up the dot/graphviz output
- view commit • Adding "untied timers" option. With this option enabled, a profiler can be started on one OS thread and stopped on another. APEX won't keep track of the profiler stack.
- view commit • Fixing unit conversion when writing out TAU profiles
- view commit • Add capture of /proc/self/status Threads value
- view commit • Capture the number of OS context switches
- view commit • Cleaning up thread swap test
- view commit • Adding additional error messages to PAPI component support
- view commit • Debugging PAPI error checking
- view commit • Updating to support binutils 2.34 API changes, adding pthread.h include header where needed
- view commit • Updating deprecated HPX headers
- view commit • First step in adding CUDA support. Adding a CUDA example and adding CUDA/CUPTI headers through CMake.
- view commit • Adding another cuda example
- view commit • Working kernel measurement
- view commit • Basic callback and activity support enabled
- view commit • Done with initial implementation
- view commit • Disable thread affinity for HPX configurations
- view commit • Minor change to support running in MPI environment when MPI is not used by HPX or the APEX configuration. This happens when HPX is configured without a parcel port, and APEX thinks all ranks are 0. This change adds a check for MPI environment variables to validate the MPI rank that was passed in.
- view commit • Adding MPI rank/size detection support for MPICH ...which also covers MVAPICH, Intel, Cray, etc. Also added some PBS/torque support, but unfortunately they don't provide an environment variable that specifies the total number of ranks. Maybe in the future we could have that be a special APEX environment variable that specifies the total number of ranks, if needed.
- view commit • First step in adding CUDA support. Adding a CUDA example and adding CUDA/CUPTI headers through CMake.
- view commit • Adding another cuda example
- view commit • Working kernel measurement
- view commit • Basic callback and activity support enabled
- view commit • Done with initial implementation
- view commit • Merge branch 'cuda_support' of github.com:khuck/xpress-apex into cuda_support
- view commit • Adding CUDA task dependency support
- view commit • Task dependency working! When GPU callbacks are made, we map the correlation ID to the task_wrapper associated with the parent. Then the GPU activity can be linked to the parent that launched it. Also added two more examples.
- view commit • Working CUDA support with task graphs and correct annotations. This commit contains a nasty bug in task_identifier, where any identifier string gets modified "in place" when demangled. That can cause problems later if a map of said task_identifiers is modified. This will be merged to develop when the full support with tracing is merged.
- view commit • Adding basic CUDA counters to the support for kernels and memory transfers.
- view commit • Adding HPX config support for CUDA/CUPTI
- view commit • Minor typo in HPX configuration
- view commit • More changes for HPX support
- view commit • Testing with cuda 10.1 and fixing config. Testing with older cuda revealed that some installations are different.
- view commit • Fixing bugs in shutdown. During shutdown, the asynchronous buffers were processed but the static strings that some labels depended on went out of scope. So the strings got corrupted. This is fixed by using const char * strings instead of const std::string&. Also, the counters are way too much overhead, so they are now optional.
- view commit • Adding Google Chrome trace event support
- view commit • Working (rudimentary) Google Trace Event support. This support only handles timers, no counters (yet).
- view commit • Merge branch 'chrome_trace_event' into develop
- view commit • Fixing implementation of public profile processing function to work with gcc 8
- view commit • Minor change to add cudart to the link
- view commit • Merge branch 'cuda_support' of https://github.com/khuck/xpress-apex into cuda_support
- view commit • Minor changes to CUDA support and Google trace. The Google trace support needs to be refactored, but otherwise this seems to be working.
- view commit • Merge...
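The environment-based rank detection described in the commits above can be sketched roughly as follows. The variable names checked here (`OMPI_COMM_WORLD_RANK`, `PMI_RANK`, `SLURM_PROCID`, and their size counterparts) are ones commonly set by Open MPI, MPICH-derived launchers, and Slurm; which variables APEX actually inspects, and in what order, is an assumption, as are the helper names.

```python
import os

# (rank variable, size variable) pairs, checked in order. The list and its
# ordering are assumptions for illustration, not taken from APEX.
RANK_SIZE_VARS = [
    ("OMPI_COMM_WORLD_RANK", "OMPI_COMM_WORLD_SIZE"),  # Open MPI
    ("PMI_RANK", "PMI_SIZE"),                          # MPICH and derivatives
    ("SLURM_PROCID", "SLURM_NTASKS"),                  # Slurm launcher
]


def detect_rank(env=None):
    """Return (rank, size) from launcher variables, or None if none are set."""
    env = os.environ if env is None else env
    for rank_var, size_var in RANK_SIZE_VARS:
        if rank_var in env:
            try:
                rank = int(env[rank_var])
                # The size may be absent; report None rather than guess.
                size = int(env[size_var]) if size_var in env else None
            except ValueError:
                continue
            return rank, size
    return None


def validated_rank(passed_rank, env=None):
    """Override a suspicious rank of 0 when the environment says otherwise."""
    detected = detect_rank(env)
    if passed_rank == 0 and detected is not None and detected[0] != 0:
        return detected[0]
    return passed_rank
```

Only a passed-in rank of 0 is second-guessed here, matching the situation described above in which every process believes it is rank 0 because no parcel port was configured.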
Version 2.1.9
Bug fixes and updates to support changes in HPX.
Bug fix and maintenance release, version 2.1.9
view commit • Adding spack and cmake to buildbot build process
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Initializing reset counter in profile constructor
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • should have tested before commit.
view commit • Fixing deadlock in policy shutdown, and segfault when profiles aren't processed before exit
view commit • Cleaning up the pedantic compiler flags and resolving pedantic compiler warnings.
view commit • Cleanup changes introduced a bug; this fixes it. The cleanup changes caused APEX to request HPX to schedule profile processing during shutdown, but unfortunately HPX has already stopped by then. Instead, force synchronous processing of remaining profile data from the on_dump() event.
view commit • Fixing parallel buildbot for HPX builds
view commit • Still can't build more than 4 wide on ktau
view commit • Changing HPX tasks from actions to regular hpx::async calls
view commit • should have used hpx::apply()
view commit • Use moodycamel queue from hpx::concurrency namespace
view commit • Merge pull request #121 from msimberg/moodycamel
view commit • Merge pull request #120 from khuck/master
Version v2.1.8
Bug fixes and updates to support changes in HPX.
view commit • Fixing CSV output bug. Only node 0 was getting written.
view commit • Adding MPI OTF2 test to make sure event unification works correctly
view commit • Expanding the C++ demo to make it more useful
view commit • Fixing performance bug with a lock being held too long when processing profile objects from the queues.
view commit • Cleaning up apex::reset behavior
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Updating to latest perfstubs API
view commit • Fixing mismatched apex_init() declaration
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Adding counters to CSV output.
view commit • Do write out the APEX_MAIN timer at exit.
view commit • Fixing location of perfstubs git repo so that checkouts happen correctly
view commit • Fixing location of perfstubs headers
view commit • Fixing git checkout for good this time
view commit • Ignoring the perfstubs directory. git clean will wipe out the perfstubs directory without this change to .gitignore
view commit • Changing location of papi on test build system
view commit • Merge branch 'develop'
view commit • Updates to support fixes for HPX issue #4438
view commit • Merge pull request #119 from khuck/fixes_for_hpx_4441
view commit • Merge branch 'develop'
v2.1.7 Release to sync up with HPX v1.4.0
Bug fixes and refactoring to support new HPX modularization effort. APEX is no longer called from anywhere in HPX, but APEX does still make HPX calls. The previous circular dependency has been refactored out. HPX now has an external_timer class that provides a plugin API that APEX registers at program load. When HPX runs, the external_timer class will make callbacks to the registered library (APEX).
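The callback arrangement described above can be illustrated with a small sketch. It is written in Python purely for illustration (HPX and APEX are C++), and every name in it, including `ExternalTimer`, `ToyProfiler`, `register`, `start`, and `stop`, is hypothetical rather than the actual `external_timer` interface.

```python
import time


class ExternalTimer:
    """Runtime-side hook: holds optional callbacks and never imports the tool."""

    def __init__(self):
        self._on_start = None
        self._on_stop = None

    def register(self, on_start, on_stop):
        # Called by the measurement library when it is loaded.
        self._on_start = on_start
        self._on_stop = on_stop

    def start(self, name):
        # No-op when no library has registered, so the runtime works alone.
        return self._on_start(name) if self._on_start else None

    def stop(self, handle):
        if self._on_stop and handle is not None:
            self._on_stop(handle)


class ToyProfiler:
    """Tool-side library: depends on the runtime hook, not the other way round."""

    def __init__(self):
        self.totals = {}

    def attach(self, timer):
        timer.register(self._start, self._stop)

    def _start(self, name):
        # The handle carries the timer name and its start time in nanoseconds.
        return (name, time.perf_counter_ns())

    def _stop(self, handle):
        name, began = handle
        elapsed = time.perf_counter_ns() - began
        self.totals[name] = self.totals.get(name, 0) + elapsed
```

Because the runtime only stores function references, it builds and runs without the tool present, which is the property that removes the circular dependency.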
List of commits:
v2.1.6
Refactoring to remove circular dependency between HPX and APEX. libhpx no longer calls APEX directly; calls are handled through a callback API.
view commit • Initial refactor to eliminate circular build dependency between APEX and HPX
view commit • Changing to explicit callback registrations for all events
view commit • Handle PAPI component read failures gracefully.
view commit • untangling circular dependency between APEX and HPX in cmake
view commit • Merge remote-tracking branch 'github/develop' into apex_callback_refactoring
view commit • fixing debug message
view commit • Restoring nested timers after yield/resume
view commit • Fixing scoped_thread and return from failed new_task
view commit • Merge branch 'apex_callback_refactoring' into develop
view commit • Splitting screen_output into verbose for environment variables