[SYCL][XPTI] Refactoring framework to use 128-bit keys for collision elimination #14467

Merged
merged 46 commits into from
Aug 27, 2024


tovinkere
Contributor

The previous implementation of the XPTI framework used 64-bit hash values to represent trace points in the code, which led to a few hash collisions. This refactoring moves to a 128-bit key to guarantee uniqueness. The changes needed in the SYCL runtime to fully migrate to the newer APIs will be pushed as a separate Part 2 pull request. The current pull request includes changes to the XPTI framework and minor changes to the SYCL runtime to reflect the transition to 128-bit keys and keep the tests valid.

  • 128-bit keys for internal storage and lookups
  • Support 64-bit universal IDs for backward compatibility
  • Updated correctness tests to cover both the legacy API and the new APIs
  • Updated performance tests to report metrics for both 64-bit and 128-bit native APIs
  • Updated SYCL instrumentation to return a new trace event for each instance of a trace point. The earlier implementation always returned the same trace event for a given trace point because the metadata associated with a trace event was deemed invariant. With the need for mutable metadata, this change is required.
  • Minor updates to documentation

NOTE: Since more events are generated due to the creation of a new trace event for each trace point instance, some tests that rely on event sequences may have to be updated.
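
To illustrate the first two bullets, here is a minimal sketch of how a 128-bit key can serve as the internal identity while legacy callers keep using 64-bit IDs. All names here (`Key128`, `Key128Hash`, `KeyRegistry`) and the idea of handing out sequential 64-bit IDs are assumptions made for illustration; they are not the actual XPTI types or its mapping scheme.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical 128-bit key; the real XPTI layout may differ.
struct Key128 {
  std::uint64_t hi; // assumption: e.g. packed file and function IDs
  std::uint64_t lo; // assumption: e.g. packed line and column
  bool operator==(const Key128 &o) const { return hi == o.hi && lo == o.lo; }
};

// The hash only selects a bucket; equality still compares all 128 bits,
// so two distinct keys can never be confused with each other.
struct Key128Hash {
  std::size_t operator()(const Key128 &k) const {
    std::size_t h1 = std::hash<std::uint64_t>{}(k.hi);
    std::size_t h2 = std::hash<std::uint64_t>{}(k.lo);
    return h1 ^ (h2 + 0x9e3779b9u + (h1 << 6) + (h1 >> 2));
  }
};

// Hypothetical registry: 128-bit keys are the source of truth, and each key
// gets a stable 64-bit ID that legacy callers can keep passing around.
class KeyRegistry {
public:
  std::uint64_t legacyId(const Key128 &key) {
    auto it = toLegacy_.find(key);
    if (it != toLegacy_.end())
      return it->second;
    std::uint64_t id = next_++; // sequential IDs cannot collide
    toLegacy_.emplace(key, id);
    fromLegacy_.emplace(id, key);
    return id;
  }

  // Resolves a legacy ID back to its 128-bit key; nullptr if unknown.
  const Key128 *lookup(std::uint64_t id) const {
    auto it = fromLegacy_.find(id);
    return it == fromLegacy_.end() ? nullptr : &it->second;
  }

private:
  std::unordered_map<Key128, std::uint64_t, Key128Hash> toLegacy_;
  std::unordered_map<std::uint64_t, Key128> fromLegacy_;
  std::uint64_t next_ = 1;
};
```

In this sketch the 64-bit value is a handle rather than a hash of the code location, which is why it cannot collide. Whether the framework derives its 64-bit IDs this way is not stated in this PR.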

tovinkere added 15 commits June 20, 2024 23:14
  + Add functions to set and enable scope notification of
    tracepoints when tracepoints are refactored to support
    self notification

Signed-off-by: Vasanth Tovinkere <[email protected]>
 + Moved from std::unordered_map to emhash8 for most containers,
   which improves performance on microbenchmarks by ~1.5X to
   6X for various operations. Also improves the performance of
   callback handlers by ~1.6X
 + Added 128-bit data structure for implementing the collision
   free Universal ID

Signed-off-by: Vasanth Tovinkere <[email protected]>
  + Framework now is built on 128-bit keys using new APIs and
    legacy API support is handled with 64-bit mappings to
    128-bit keys

Signed-off-by: Vasanth Tovinkere <[email protected]>
  + New approach to saving all necessary data in TLS so lookups
    can be avoided. Lookups take the most time per event.
  + Moved from emhash8 to phmap as phmap provides both node_hash_map
    and flat_hash_map and the performance is better than std
    implementations
  + Added tests for the APIs and their correctness
  + Updated documentation for framework implementation and headers

Signed-off-by: Vasanth Tovinkere <[email protected]>
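
The TLS idea in the commit above can be sketched roughly as follows: pay for one table lookup when a tracepoint is entered, stash the result in thread-local storage, and let later calls on the same thread read it back without touching the table. The names `TracepointData`, `GlobalTable`, `enterTracepoint` and `currentTracepoint` are hypothetical and do not correspond to XPTI functions.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical per-tracepoint record.
struct TracepointData {
  std::uint64_t uid64;
  std::string name;
};

// Stand-in for the shared lookup table. It counts lookups so the saving is
// observable. A real table shared across threads would need synchronization.
class GlobalTable {
public:
  void add(std::uint64_t id, std::string name) {
    table_[id] = TracepointData{id, std::move(name)};
  }
  const TracepointData *find(std::uint64_t id) {
    ++lookups_;
    auto it = table_.find(id);
    return it == table_.end() ? nullptr : &it->second;
  }
  int lookups() const { return lookups_; }

private:
  std::unordered_map<std::uint64_t, TracepointData> table_;
  int lookups_ = 0;
};

// Each thread remembers the tracepoint it is currently executing.
thread_local const TracepointData *tlsCurrent = nullptr;

// One lookup on entry; the result is cached in TLS.
const TracepointData *enterTracepoint(GlobalTable &table, std::uint64_t id) {
  tlsCurrent = table.find(id);
  return tlsCurrent;
}

// Later queries on the same thread read TLS and perform no lookup.
const TracepointData *currentTracepoint() { return tlsCurrent; }
```

Calling `currentTracepoint()` any number of times after a single `enterTracepoint()` leaves the lookup count at one, which is the effect the commit message describes.
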
  + Basic collector will now handle a few more events

Signed-off-by: Vasanth Tovinkere <[email protected]>
  + Revamped implementation to use the same amount of memory, but
    have the option of reducing the number of lookups
  + Updated unit tests to reflect the new approach where 128-bit
    and 64-bit keys are always accessible.

Signed-off-by: Vasanth Tovinkere <[email protected]>
  + Updated scoped objects and added new methods that combine
    UID creation, payload registration and trace point creation
    into a single call: xptiRegisterTracepointScope() or
    xptiCreateTracepoint().
  + Updated tests and added documentation

Signed-off-by: Vasanth Tovinkere <[email protected]>
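
As a rough picture of what a scoped object buys the instrumentation author, the sketch below registers a tracepoint and emits begin/end records from a single RAII object. It does not call `xptiRegisterTracepointScope()` or `xptiCreateTracepoint()`, whose signatures are not shown in this PR; `ScopedTracepoint` and `traceLog` are hypothetical stand-ins.

```cpp
#include <string>
#include <vector>

// Hypothetical sink standing in for subscriber notifications.
inline std::vector<std::string> &traceLog() {
  static std::vector<std::string> log;
  return log;
}

// Hypothetical scoped object: the constructor does the work that would
// otherwise take separate calls (build an ID from the code location,
// register it, notify "begin"); the destructor notifies "end".
class ScopedTracepoint {
public:
  ScopedTracepoint(const char *func, const char *file, int line)
      : name_(std::string(func) + "@" + file + ":" + std::to_string(line)) {
    traceLog().push_back("begin " + name_);
  }
  ~ScopedTracepoint() { traceLog().push_back("end " + name_); }

  // Non-copyable so each scope produces exactly one begin/end pair.
  ScopedTracepoint(const ScopedTracepoint &) = delete;
  ScopedTracepoint &operator=(const ScopedTracepoint &) = delete;

private:
  std::string name_;
};
```

Declaring `ScopedTracepoint tp("submit", "queue.cpp", 42);` at the top of a block is then enough to bracket that block with a matching pair of records, including on early return or exception.
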
  + Previous implementation of XPTI required the trace event to be
    invariant for a given code location. However, with multiple
    threads being active at the same tracepoint, the previous design
    could not handle mutable metadata. To address this, every
    visit to a tracepoint now generates a new trace event.

Signed-off-by: Vasanth Tovinkere <[email protected]>
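
A small sketch of the difference this makes: the code-location payload stays shared and invariant, while every visit gets its own event object whose metadata can be changed without affecting other visits. `Payload`, `TraceEvent` and `Tracepoint` are hypothetical types for illustration, not the XPTI data structures.

```cpp
#include <atomic>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Invariant description of a code location, shared by all visits.
struct Payload {
  std::string name;
  std::string file;
  int line;
};

// One event per visit; its metadata is private to that visit.
struct TraceEvent {
  const Payload *payload;
  std::uint64_t instance;
  std::map<std::string, std::string> metadata;
};

class Tracepoint {
public:
  explicit Tracepoint(Payload p) : payload_(std::move(p)) {}

  // Every visit creates a fresh event, so two threads at the same
  // tracepoint never write to the same metadata map.
  std::unique_ptr<TraceEvent> visit() {
    return std::unique_ptr<TraceEvent>(
        new TraceEvent{&payload_, ++count_, {}});
  }

private:
  Payload payload_;
  std::atomic<std::uint64_t> count_{0};
};
```

The cost, as the NOTE in the description points out, is that more events are produced, so tests that depend on exact event sequences may need updating.
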
 + New macros to help make instrumenting code easier. A Part 2
   PR will migrate all instrumentation using older API to use
   new helper classes and macros

Signed-off-by: Vasanth Tovinkere <[email protected]>
Signed-off-by: Vasanth Tovinkere <[email protected]>
@tovinkere tovinkere requested a review from a team as a code owner July 6, 2024 15:51
 + basic_event_collection_linux.cpp test was expecting a
   "kernel_name" metadata for an edge event. That should
   be covered by the source and target universal IDs.
   Edges do not have a kernel associated with them.

Signed-off-by: Vasanth Tovinkere <[email protected]>
Signed-off-by: Vasanth Tovinkere <[email protected]>
  + Since these hash maps excel with different types of keys,
    using both for specific maps gives the best performance.

Signed-off-by: Vasanth Tovinkere <[email protected]>
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
@KseniyaTikhomirova
Contributor

@intel/llvm-gatekeepers Hi, the clang-format check fails here, but the files it reports are third-party project files that we do not want (or are not eligible?) to align with our clang-format style. Could we ignore this check and merge the pull request?
I also tried the clang-format-ignore approach to disable the check for those folders completely, but our formatting check just ignores it. Wrapping code in the files does not seem valid to me either, since the check should be disabled for the whole external project.

Could you please merge the PR as it is, or recommend an appropriate way to disable the clang-format check?

Jenkins failure is a scanning tool failure @stdale-intel or @tovinkere could provide details about. According to comment earlier - new lib addition is approved and scanning tools will be updated.

@againull
Contributor

So, the current version of the PR adds third-party headers into the repo.
Another approach is to fetch the sources via cmake at build time, as we do for other third-party things: https://github.com/intel/llvm/pull/15043/files#diff-b010dfcb1e2569c873284a83f6490f96657ed6c3742aa2bf072c7d16c1229c5c

@intel/llvm-gatekeepers Does anybody have an opinion on which approach would be better for this case? Committing into the repo or fetching via cmake?

@steffenlarsen
Contributor

steffenlarsen commented Aug 13, 2024

I would definitely prefer the fetching solution if possible. That way we also do not have to go through this issue every time we need to update the headers. It may also be good to know what would be realistic to upstream, if XPTI is meant to be upstreamed, @sergey-semenov | @elizabethandrews.

@steffenlarsen
Contributor

> Jenkins failure is a scanning tool failure @stdale-intel or @tovinkere could provide details about. According to comment earlier - new lib addition is approved and scanning tools will be updated.

It doesn't seem to be only the scanning tool. RHEL build fails as well.

@againull againull merged commit 283073a into intel:sycl Aug 27, 2024
13 checks passed
againull added a commit to againull/llvm that referenced this pull request Aug 27, 2024
@againull
Contributor

PR to fix post-commit failures is opened here:
#15209
