Releases: UO-OACISS/apex

Patch release v2.4.1

15 Jun 22:53
64c4f4c

Emergency patch to handle an HPX collectives API change in the next HPX release.

APEX Release v2.4.0

28 May 22:53

This is an update to APEX, with several new features including:

  • New simulated annealing search for policies
  • New Kokkos kernel autotuning support
  • Memory leak detection (experimental)
  • Updated scatterplot support, including counters, and Python scripts updated to use python3
  • HIP/ROCm Roctracer support

Full list of commits:

  • Don't enable examples by default
  • Kokkos doesn't like it if you replace the OpenMP library at runtime, so OMPT support (which preloads the OpenMP runtime library) now has to be explicitly enabled with --apex:ompt, if desired.
  • Adding kokkos tuning support. Needs work.
  • Kokkos tuning working, but AH not getting the right answer.
  • Working, but AH still stuck in local minima.
  • Adding/fixing PBS and SLURM variables
  • Fixing build error without kokkos autotuning
  • Trying to improve convergence for kokkos autotuning
  • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • Debugging Kokkos tuning issues
  • Adding Kokkos tooling header, which eliminates the need to require Kokkos as a dependency
  • Adding quotes around path to harmony home
  • Working Kokkos autotuner. This uses a Nelder-Mead search, with an initial radius of 0.5 centered on the initial point requested by Kokkos (if specified). Future work includes caching results and trying other search strategies like simulated annealing.
  • Refactoring kokkos tuning away from profiling, making it possible to disable it
  • Updating to python3
  • Writing a memory wrapper report. There are a huge number of CUPTI memory leaks, and they happen when the first real call to CUDA happens. I can't force that call, or ignore memory during the first "real" call, yet.
  • Cleaner way of preventing "false"(?) CUPTI memory leaks.
  • Fixing memory leaks and instability during shutdown. When using the memory tracker, make sure that the reporting is done before the BFD address resolution infrastructure is destroyed.
  • Adding task tree ASCII output, for issue #150
  • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • Adding "Remainder" to tree ASCII output.
  • Adding support for ratio and ordinal values
  • Fixing tree ASCII output and memory leak reporting.
  • Human-readable task tree output is now written to a file, sorted hierarchically by time.
  • Making --apex:quiet truly quiet
  • Adding direct multidimensional simulated annealing search.
  • GCC 9.3.0 has an internal pedantic compiler error, so turning off pedantic.
  • Updating subproject build of LLVM OpenMP runtime for GCC
  • Fixing race condition in startup of memory wrapper, I hope...
  • Updating scripts to python3
  • Adding counter scatterplot support, too
  • Allowing for custom scatterplot fractions. To change from the default of 1% (0.01), set APEX_SCATTERPLOT_FRACTION to a value between 0.0 and 1.0.
  • Adding counter scatterplot script
  • Updating scatterplot scripts to handle larger scales
  • Do lazy opening of sample files so that the correct Node ID is used
  • Improving colors
  • Merge branch 'develop' of github.com:khuck/xpress-apex into develop
  • More scatterplot cleanup
  • Updating escape sequence for new python
  • Fixing x axis to make all subgraphs uniform
  • Fixing dlsym() wrapper function to use templates for the function types, which is better than blindly casting. Better to let the type system help us.
  • Added HIP to the configure and added a test case. It seems to work. Now have to add the actual roctracer support.
  • ROCTX support added.
  • Working callback support for HIP. Next step is to add activity support, and link the correlation IDs. That should be modeled after the CUPTI support.
  • Updating scatterplot scripts to add mean values
  • Working HIP with actions
  • Merge branch 'develop' into hip
  • Testing HIP code with CUDA config
  • Working HIP memory tracki...
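The APEX_SCATTERPLOT_FRACTION commit above sets what share of measurements end up in the scatterplot sample files (our reading of the commit message). The C++ sketch below shows one way such a setting could be read and applied; it is not APEX's code. The helper names (parse_scatterplot_fraction, scatterplot_fraction_from_env, keep_sample), the fallback to the default for missing, unparsable, or out-of-range values, and the per-event Bernoulli draw are all assumptions.

```cpp
#include <cstdlib>
#include <random>

// Default stated in the release notes: 1% (0.01).
constexpr double kDefaultScatterplotFraction = 0.01;

// Hypothetical helper: parse a fraction string. Assumption: fall back to the
// default when the value is missing, not a number, or outside [0.0, 1.0].
double parse_scatterplot_fraction(const char* value) {
    if (value == nullptr) {
        return kDefaultScatterplotFraction;
    }
    char* end = nullptr;
    const double fraction = std::strtod(value, &end);
    // Reject empty input, trailing garbage, NaN, and out-of-range values.
    if (end == value || *end != '\0' || !(fraction >= 0.0 && fraction <= 1.0)) {
        return kDefaultScatterplotFraction;
    }
    return fraction;
}

// Reads the environment variable named in the release notes.
double scatterplot_fraction_from_env() {
    return parse_scatterplot_fraction(std::getenv("APEX_SCATTERPLOT_FRACTION"));
}

// Hypothetical per-event decision: keep each measurement with probability `fraction`.
bool keep_sample(double fraction, std::mt19937& rng) {
    std::bernoulli_distribution coin(fraction);
    return coin(rng);
}
```

A random draw per event keeps the expected share at the configured fraction without tracking a counter per timer; a deterministic alternative would keep every N-th event.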

Patch release v2.3.2

13 Apr 18:07

Patch release for bug fixes.

Commits in this release:

  • Updating documentation
  • Merge branch 'develop' of github.com:khuck/xpress-apex into develop
  • Checking for nvcc 10 and gcc 8 and setting flags accordingly
  • Adding periodic plugin example, enabling static global constructors and destructors
  • Adding pthread wrapper and screen_output to policy plugin example
  • Update README.md
  • Re-enabling ability to get vector of available profiles, updated periodic example
  • Don't pin threads by default; it's kind of broken on summit
  • Fixing HPX build due to static global constructor
  • Fixing bug #134. Changing from pthread_setaffinity_np() to sched_getaffinity()/sched_setaffinity()
  • Fixing issue #135. When tracking CPU/GPU activity, the memory allocation counters should be associated with the thread making the call when writing to OTF2 traces. This change allows for an optional argument to the apex::sample_value call that indicates whether the counter is associated with the specific thread or with the process as a whole (the default).
  • Fixing #137. Now explicitly tracking all memory allocations and frees on both the host and the device.
  • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • Re-enable pinning by default
  • Fixing #136. Now have the ability to capture task tree, not just graph. No more cycles!
  • Adding dependency_tree class
  • Fixing build errors for -std=c++11 compliance
  • Initial memory wrapper, bugs everywhere
  • Adding additional MPI rank detection support
  • Fixing build issue with HPX due to modified sample_value function
  • Fixing cuda 10.1 build errors.
  • Fixing gperftool config by finding correct include location
  • Fixing gperftool config by finding correct include location
  • Removing some high-overhead and useless counters
  • Working memory wrapper for malloc/free, removing pointers from name demangling due to instability
  • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • Adding support for calloc and realloc
  • Fixing comment
  • Adding memory wrapper code for HPX configurations
  • Updating copyright to 2021
  • Fixing measurement output when dump is called multiple times.
  • Fixing tasktree processing for non-timers, adding to apex_exec script
  • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • Fixing elapsed time in graphs and shortening timer names by not including full file name and path by default
  • Fixing concurrency handler static global variable
  • Fix HPX barriers in OTF2 output
  • Merge pull request #143 from severinstrobl/otf2_hpx_barriers
  • Enabling LLVM 11 to build cuda examples
  • Forgot to set profiler to "stopped" when adding async activity.
  • Removing APEX counters (llvm won't link them?)
  • Cleaning up timers. We had been using a custom clock in order to use rdtsc on Intel platforms, but that's kind of pointless. It becomes a nightmare when trying to convert for OTF2 traces, and CUDA (and other GPUs) only provide timestamps in nanoseconds. Therefore, all timing is assumed to be done in nanoseconds now.
  • Flush CUPTI before dumping.
  • Need to move forward declaration.
  • Only override the rank if we suspect it's wrong
  • Updating version number
  • Updating version number.
  • Merge branch 'develop'
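The fix for bug #134 above moved thread pinning from pthread_setaffinity_np() to sched_getaffinity()/sched_setaffinity(). As a rough, Linux-only illustration of those two calls (not APEX's pinning logic), the sketch below pins the calling thread to the n-th CPU in its current affinity mask. The helper name pin_to_nth_allowed_cpu and the policy of choosing only from the already-allowed CPUs are assumptions; passing pid 0 to these calls refers to the calling thread.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the CPU_* macros and sched_*affinity()
#endif
#include <sched.h>

// Hypothetical helper: restrict the calling thread to the n-th CPU (0-based)
// among the CPUs it is currently allowed to run on. Returns false on failure.
bool pin_to_nth_allowed_cpu(int n) {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // pid 0 means "the calling thread" for these Linux calls.
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
        return false;
    }
    int seen = 0;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, &allowed)) {
            continue;
        }
        if (seen == n) {
            cpu_set_t target;
            CPU_ZERO(&target);
            CPU_SET(cpu, &target);
            return sched_setaffinity(0, sizeof(target), &target) == 0;
        }
        ++seen;
    }
    return false;  // fewer than n + 1 CPUs are available to this thread
}
```

Choosing from the existing mask, rather than from every CPU on the node, respects any restriction already imposed by the job launcher or a cgroup.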

Version 2.3.1 - Patch Release

28 Jan 17:16

This release makes CMake variables consistent across the project, and includes some bug fixes for OpenMP initial/implicit tasks and for CUDA 11 support.

Commits in this release:

  • Beginning the process of cleaning up the CMAKE config
  • Testing the CMake cleanup before merge with develop
  • Merge branch 'cmake_cleanup' into develop
  • Removing explicit check for cori or edison and replacing it with a check for Cray KNL
  • Fixing anonymous OpenMP regions, where appropriate. For implicit tasks, barriers and barrier wait events, a codeptr may or may not be associated with the event. For implicit tasks, use the codeptr of the parent. For barriers, don't do anything. An anonymous barrier usually means that the thread is idle between parallel regions.
  • Correcting support for CUDA 11 in NVML interface.
  • Changing USE_LM_SENSORS to APEX_WITH_LM_SENSORS
  • Fixing paths for lm sensors in HPX configs, and correcting cmake warnings.
  • Updating APEX version
  • Fixing demangle config for HPX
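The anonymous-OpenMP-region fix above boils down to a small decision rule. Here is a hedged C++ sketch of that rule as the commit message describes it. The enum, the function name, and the convention that a null result means "don't start a timer" are hypothetical and come neither from the OMPT interface nor from APEX; treating barrier wait events like barriers, and skipping other anonymous events, are also assumptions.

```cpp
// Hypothetical event categories; not OMPT's own enums.
enum class region_kind { implicit_task, barrier, barrier_wait, other };

// Returns the code pointer to name the timer after, or nullptr when no timer
// should be started for this event.
const void* resolve_codeptr(region_kind kind, const void* own_codeptr,
                            const void* parent_codeptr) {
    if (own_codeptr != nullptr) {
        return own_codeptr;  // the runtime supplied a location: use it
    }
    switch (kind) {
        case region_kind::implicit_task:
            return parent_codeptr;  // anonymous implicit task: borrow the parent's
        case region_kind::barrier:
        case region_kind::barrier_wait:
            return nullptr;  // anonymous barrier: thread is likely idle, skip it
        default:
            return nullptr;  // assumption: other anonymous events are skipped too
    }
}
```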

2.3.0 Release

08 Jan 17:15

This release contains many bug fixes and some new features. New features include:

  • Kokkos support
  • OpenACC profiling support
  • NVIDIA CUDA/CUPTI support
  • NVIDIA NVML support
  • RAJA support
  • Compiler-based instrumentation support
  • Additional /proc/self data
  • Disable RDTSC timer on `x86_64` architectures
  • Minimal MPI profiling support
  • HPX reduction for OTF2 event unification (when HPX networking enabled)
  • Ported to PGI, Intel compilers
  • Updated `apex_exec` script for parsing command line arguments
  • Event filtering
  • Documentation updates

All of the commits for this release:

  • Adding new "pre-shutdown" event for listeners. The profiler_listener, otf2_listener and trace_event_listener all need to take a timestamp when the program is finished, but when CUPTI asynchronous processing has to happen, that can dilate the trace because the final timestamp doesn't get taken until long after the buffers are processed. Now, the timestamp is taken before the buffers are processed. All asynchronous background processing also needs to be disabled, so that there aren't new events in the trace after the last timestamp.
  • Adding kokkos support.
  • Porting to PGI on Summit
  • Adding kokkos support.
  • Fixing bug in memcpy activity. The stream ID wasn't getting captured, causing overlapping timers in the OTF2 trace.
  • Add MPI_Finalize wrapper. When configuring APEX with MPI support, wrap the MPI_Finalize function so that we can use MPI functions during OTF2 event unification instead of the filesystem.
  • Unify the final timestamp. At the end of execution, exchange final timestamps so that the OTF2 trace has an accurate final timestamp.
  • Don't finalize profiles if background stats not computed
  • Adding MPI to some CUDA examples to test the event unification support.
  • Debugging kokkos support on summit
  • Merge branch 'kokkos' of github.com:khuck/xpress-apex into kokkos
  • Merge branch 'develop' of github.com:khuck/xpress-apex into develop
  • Adding kokkos support.
  • Debugging kokkos support on summit
  • Allow HPX configs to disable RDTSC
  • Updating to renamed perfstubs API calls
  • Merge branch 'kokkos' of github.com:khuck/xpress-apex into kokkos
  • Fixing race conditions between processes when doing OTF2 event unification
  • Merge branch 'kokkos' into develop
  • Two changes: making Jupyter support a runtime option and updating some OMPT initialization. This is the beginning of the process of updating OMPT support to fully support OpenMP 5.0, including target directives.
  • Fixing task dependencies in OpenMP/OMPT
  • Check if OMPT was initialized before forcing shutdown
  • Merge branch 'master' into develop
  • Adding simple OpenMP test
  • Debugging with Intel 20 compiler. Still lots of shutdown problems.
  • Updates for OpenMP and OpenACC support. Target offloading with OpenACC to CUDA is now supported.
  • Adding NVML support
  • Working NVML support for utilization
  • Adding NVML find support for HPX configs
  • Fixing cmake error with nvml
  • Adding lots more NVML data: clock, power, temp, PCIe throughput
  • Enabling /proc/self/status by default
  • Be smarter about which devices to monitor with NVML
  • Adding driver support for when changing devices, to make sure NVML is capturing the right device
  • Adding NVML nvlink statistics
  • Don't build openmp examples with compilers that lack openmp support
  • Merge branch 'master' into develop
  • Silly CMake bug
  • Removing contention in google event tracer, but still have to flush buffers occasionally.
  • Adding command line argument processing for apex_exec script
  • Fixing doxygen warning
  • Updating documentation to v2.2.0
  • Fixing label for host-allocated memory in Cuda
  • Always delete OTF2 archive if it exists at startup
  • Updating copyright and documentation
  • Fixing support for std::unique_ptr with clang
  • Updating readthedocs documentation
  • removing debug...
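The "pre-shutdown" commit at the top of this list is about ordering: take the final timestamp and stop accepting new events before the (possibly slow) asynchronous buffers are drained. A minimal C++ sketch of that ordering follows, assuming a single listener with a mutex-protected queue of event timestamps. The class and method names are hypothetical, not the real profiler_listener, otf2_listener, or trace_event_listener interfaces.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical listener state illustrating the pre-shutdown ordering.
class shutdown_aware_listener {
public:
    // Nanosecond timestamps from a monotonic clock.
    static std::uint64_t now_ns() {
        return static_cast<std::uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(
                std::chrono::steady_clock::now().time_since_epoch()).count());
    }

    // Queue an event; rejected once pre-shutdown has happened.
    bool record(std::uint64_t timestamp_ns) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (shutting_down_) {
            return false;
        }
        pending_.push_back(timestamp_ns);
        return true;
    }

    // Step 1: stop accepting events and fix the final timestamp before draining.
    void on_pre_shutdown() {
        std::lock_guard<std::mutex> lock(mutex_);
        shutting_down_ = true;
        final_timestamp_ns_ = now_ns();
    }

    // Step 2: drain queued events; however long this takes, it no longer
    // stretches the end of the trace.
    std::size_t drain(const std::function<void(std::uint64_t)>& process) {
        std::vector<std::uint64_t> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        for (std::uint64_t ts : batch) {
            process(ts);
        }
        return batch.size();
    }

    std::uint64_t final_timestamp_ns() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return final_timestamp_ns_;
    }

private:
    mutable std::mutex mutex_;
    bool shutting_down_ = false;
    std::uint64_t final_timestamp_ns_ = 0;
    std::vector<std::uint64_t> pending_;
};
```

Checking the flag and taking the timestamp under the same mutex means no event can slip into the queue after the final timestamp has been fixed.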

Version 2.2.0

05 Aug 16:38

This release contains many updates and fixes. Of note is new support for CUDA/CUPTI events, and the ability to detect MPI applications even when neither HPX nor APEX is configured with MPI support.

Changes:

  • Change to personal fork of concurrentqueue for stability
  • Cleaning up clang pedantic errors
  • Tweaking build system to support Windows
  • Merge pull request #122 from STEllAR-GROUP/fixing_windows_support
  • Adding annotation for process_profiles task
  • Cleaning up the dot/graphviz output
  • Adding "untied timers" option. With this option enabled, a profiler can be started on one OS thread and stopped on another. APEX won't keep track of the profiler stack.
  • Fixing unit conversion when writing out TAU profiles
  • Add capture of /proc/self/status Threads value
  • Capture the number of OS context switches
  • Cleaning up thread swap test
  • Adding additional error messages to PAPI component support
  • Debugging PAPI error checking
  • Updating to support binutils 2.34 API changes, adding pthread.h include header where needed
  • Updating deprecated HPX headers
  • First step in adding CUDA support. Adding a CUDA example and adding CUDA/CUPTI headers through CMake.
  • Adding another cuda example
  • Working kernel measurement
  • Basic callback and activity support enabled
  • Done with initial implementation
  • Disable thread affinity for HPX configurations
  • Minor change to support running in an MPI environment when MPI is not used by HPX or the APEX configuration. This happens when HPX is configured without a parcel port, and APEX thinks all ranks are 0. This change adds a check for MPI environment variables to validate the MPI rank that was passed in.
  • Adding MPI rank/size detection support for MPICH, which also covers MVAPICH, Intel, Cray, etc. Also added some PBS/torque support, but unfortunately they don't provide an environment variable that specifies the total number of ranks. Maybe in the future we could have that be a special APEX environment variable that specifies the total number of ranks, if needed.
  • First step in adding CUDA support. Adding a CUDA example and adding CUDA/CUPTI headers through CMake.
  • Adding another cuda example
  • Working kernel measurement
  • Basic callback and activity support enabled
  • Done with initial implementation
  • Merge branch 'cuda_support' of github.com:khuck/xpress-apex into cuda_support
  • Adding CUDA task dependency support
  • Task dependency working! When GPU callbacks are made, we map the correlation ID to the task_wrapper associated with the parent. Then the GPU activity can be linked to the parent that launched it. Also added two more examples.
  • Working CUDA support with task graphs and correct annotations. This commit contains a nasty bug in task_identifier, where any identifier string gets modified "in place" when demangled. That can cause problems later if a map of said task_identifiers is modified. This will be merged to develop when the full support with tracing is merged.
  • Adding basic CUDA counters to the support for kernels and memory transfers.
  • Adding HPX config support for CUDA/CUPTI
  • Minor typo in HPX configuration
  • More changes for HPX support
  • Testing with cuda 10.1 and fixing config. Testing with older cuda revealed that some installations are different.
  • Fixing bugs in shutdown. During shutdown, the asynchronous buffers were processed but the static strings that some labels depended on went out of scope, so the strings got corrupted. This is fixed by using const char * strings instead of const std::string&. Also, the counters are way too much overhead, so they are now optional.
  • Adding Google Chrome trace event support
  • Working (rudimentary) Google Trace Event support. This support only handles timers, no counters (yet).
  • Merge branch 'chrome_trace_event' into develop
  • Fixing implementation of public profile processing function to work with gcc 8
  • Minor change to add cudart to the link
  • Merge branch 'cuda_support' of https://github.com/khuck/xpress-apex into cuda_support
  • Minor changes to CUDA support and Google trace. The Google trace support needs to be refactored, but otherwise this seems to be working.
  • Merge...
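The task-dependency commit above links GPU activity back to the task that launched it by remembering, at API-callback time, which parent task goes with each correlation ID. A simplified, thread-safe sketch of that bookkeeping follows. The task_info type and the correlation_map class are hypothetical stand-ins for APEX's task_wrapper machinery; 32-bit correlation IDs are assumed (CUPTI reports them as uint32_t), as is the rule that each ID is consumed by exactly one activity record.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the parent task record (APEX uses task_wrapper).
struct task_info {
    std::string name;
};

class correlation_map {
public:
    // Called from the API callback on the launching thread.
    void remember(std::uint32_t correlation_id, std::shared_ptr<task_info> parent) {
        std::lock_guard<std::mutex> lock(mutex_);
        parents_[correlation_id] = std::move(parent);
    }

    // Called when the asynchronous activity record arrives. Assumption: each
    // ID is consumed once, so the entry is erased. Returns nullptr if unknown.
    std::shared_ptr<task_info> take(std::uint32_t correlation_id) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = parents_.find(correlation_id);
        if (it == parents_.end()) {
            return nullptr;
        }
        std::shared_ptr<task_info> parent = std::move(it->second);
        parents_.erase(it);
        return parent;
    }

private:
    std::mutex mutex_;
    std::unordered_map<std::uint32_t, std::shared_ptr<task_info>> parents_;
};
```

Holding the parent through a shared_ptr keeps it alive until the (possibly much later) activity record is processed, even if the launching task has already finished.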

Version 2.1.9

22 Apr 14:35
7402e50

Bug fixes and updates to support changes in HPX.

Bug fix and maintenance release, version 2.1.9

  • Adding spack and cmake to buildbot build process
  • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • Initializing reset counter in profile constructor
  • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • should have tested before commit.
  • Fixing deadlock in policy shutdown, and segfault when profiles aren't processed before exit
  • Cleaning up the pedantic compiler flags and resolving pedantic compiler warnings.
  • Cleanup changes introduced a bug; this fixes it. The cleanup changes caused APEX to request HPX to schedule profile processing during shutdown, but unfortunately HPX has already stopped by then. Instead, force synchronous processing of remaining profile data from the on_dump() event.
  • Fixing parallel buildbot for HPX builds
  • Still can't build more than 4 wide on ktau
  • Changing HPX tasks from actions to regular hpx::async calls
  • should have used hpx::apply()
  • Use moodycamel queue from hpx::concurrency namespace
  • Merge pull request #121 from msimberg/moodycamel
  • Merge pull request #120 from khuck/master

Version v2.1.8

25 Mar 00:46

Bug fixes and updates to support changes in HPX.

  • Fixing CSV output bug. Only node 0 was getting written.
  • Adding MPI OTF2 test to make sure event unification works correctly
  • Expanding the C++ demo to make it more useful
  • Fixing performance bug with a lock being held too long when processing profile objects from the queues.
  • Cleaning up apex::reset behavior
  • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • Updating to latest perfstubs API
  • Fixing mismatched apex_init() declaration
  • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • Adding counters to CSV output.
  • Do write out the APEX_MAIN timer at exit.
  • Fixing location of perfstubs git repo so that checkouts happen correctly
  • Fixing location of perfstubs headers
  • Fixing git checkout for good this time
  • Ignoring the perfstubs directory. git clean will wipe out the perfstubs directory without this change to .gitignore
  • Changing location of papi on test build system
  • Merge branch 'develop'
  • Updates to support fixes for HPX issue #4438
  • Merge pull request #119 from khuck/fixes_for_hpx_4441
  • Merge branch 'develop'

v2.1.7 Release to sync up with HPX v1.4.0

10 Dec 23:20

Bug fixes and refactoring to support the new HPX modularization effort. APEX is no longer called from anywhere in HPX, but APEX does still make HPX calls. The previous circular dependency has been refactored out: HPX now has an external_timer class that provides a plugin API, which APEX registers with at program load. When HPX runs, the external_timer class makes callbacks to the registered library (APEX).
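The refactoring described above inverts the dependency: the runtime exposes hooks, and the tool fills them in when it loads. The single-file C++ sketch below shows that general callback-registration pattern. The host and tool namespaces stand in for HPX and APEX, and every name in it (timer_callbacks, register_timer_callbacks, start_timer, registrar, and so on) is hypothetical; it is not the actual HPX external_timer API. In a real build the two sides would live in separate libraries, with the tool's static registrar running when its shared library is loaded.

```cpp
// "host" plays the role of the runtime (HPX): it owns the hooks but never
// refers to the tool directly.
namespace host {

struct timer_callbacks {
    void (*start)(const char* name) = nullptr;
    void (*stop)(const char* name) = nullptr;
};

// Function-local static avoids static-initialization-order problems.
inline timer_callbacks& registry() {
    static timer_callbacks callbacks;
    return callbacks;
}

inline void register_timer_callbacks(const timer_callbacks& callbacks) {
    registry() = callbacks;
}

// Runtime code calls these; they are no-ops when no tool has registered.
inline void start_timer(const char* name) {
    if (registry().start != nullptr) registry().start(name);
}
inline void stop_timer(const char* name) {
    if (registry().stop != nullptr) registry().stop(name);
}

}  // namespace host

// "tool" plays the role of the measurement library (APEX).
namespace tool {

static int active_timers = 0;

static void on_start(const char*) { ++active_timers; }
static void on_stop(const char*) { --active_timers; }

// A static object's constructor runs at load time, before main().
struct registrar {
    registrar() {
        host::timer_callbacks callbacks;
        callbacks.start = &on_start;
        callbacks.stop = &on_stop;
        host::register_timer_callbacks(callbacks);
    }
};
static registrar register_at_load;

}  // namespace tool
```

Because the host only ever sees function pointers, it builds and runs without the tool; linking or preloading the tool is what turns the hooks on.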

List of commits:

v2.1.6

13 Nov 21:22

Refactoring to remove the circular dependency between HPX and APEX. libhpx no longer calls APEX directly; those calls are now handled through a callback API.

  • Initial refactor to eliminate circular build dependency between APEX and HPX
  • Changing to explicit callback registrations for all events
  • Handle PAPI component read failures gracefully.
  • untangling circular dependency between APEX and HPX in cmake
  • Merge remote-tracking branch 'github/develop' into apex_callback_refactoring
  • fixing debug message
  • Restoring nested timers after yield/resume
  • Fixing scoped_thread and return from failed new_task
  • Merge branch 'apex_callback_refactoring' into develop
  • Splitting screen_output into verbose for environment variables