Enable tracing of thread pool tasks using NVTX #630
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
/ok to test
Force-pushed from 2aca97e to 4efb3cf.
#define KVIKIO_NVTX_FUNC_RANGE_IMPL() NVTX3_FUNC_RANGE_IN(kvikio::libkvikio_domain)

// Implementation of KVIKIO_NVTX_SCOPED_RANGE(...)
#define KVIKIO_NVTX_SCOPED_RANGE_IMPL_3(message, payload_v, color) \
I'm not sure why the variable has to be named payload_v. Naming it payload causes compile errors; perhaps it is a name look-up issue.
It's probably because of the name nvtx3::payload used in the macro.
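A minimal sketch of the preprocessor behaviour behind this, using a hypothetical fake_nvtx3 namespace instead of NVTX so it builds anywhere. A function-like macro substitutes its argument for every token in its body that matches a parameter name, so a parameter called payload would also rewrite the payload in fake_nvtx3::payload; this happens before any name look-up. The argument-count dispatch (FAKE_EVENT choosing the _2 or _3 variant) is a common idiom and only an assumption about what the _3 suffix in KvikIO's macro name is for.

```cpp
#include <cstdint>

// Hypothetical stand-ins for nvtx3::payload and an event type, so this builds without NVTX.
namespace fake_nvtx3 {
struct payload {
  std::uint64_t value;
};
struct event {
  const char* message;
  payload data;
  std::uint32_t argb;
};
}  // namespace fake_nvtx3

// The second parameter is named payload_v on purpose. If it were named payload, the
// preprocessor would also replace the `payload` token in `fake_nvtx3::payload{...}`,
// producing e.g. `fake_nvtx3::4096{...}`, which does not compile.
#define FAKE_EVENT_IMPL_3(message, payload_v, color) \
  fake_nvtx3::event{message, fake_nvtx3::payload{static_cast<std::uint64_t>(payload_v)}, color}

// Two-argument form with a hypothetical default color.
#define FAKE_EVENT_IMPL_2(message, payload_v) FAKE_EVENT_IMPL_3(message, payload_v, 0xFF76B900u)

// Pick FAKE_EVENT_IMPL_2 or FAKE_EVENT_IMPL_3 by argument count: the real arguments
// shift the candidate names to the right, so NAME lands on the matching variant.
#define FAKE_EVENT_PICK(A1, A2, A3, NAME, ...) NAME
#define FAKE_EVENT(...) \
  FAKE_EVENT_PICK(__VA_ARGS__, FAKE_EVENT_IMPL_3, FAKE_EVENT_IMPL_2, unused)(__VA_ARGS__)

// Both forms are usable in constant expressions.
constexpr fake_nvtx3::event e2 = FAKE_EVENT("pread", 4096);
constexpr fake_nvtx3::event e3 = FAKE_EVENT("pread", 4096, 0xFF0000FFu);
static_assert(e2.data.value == 4096 && e2.argb == 0xFF76B900u, "two-argument form");
static_assert(e3.data.value == 4096 && e3.argb == 0xFF0000FFu, "three-argument form");
```

Renaming payload_v back to payload in FAKE_EVENT_IMPL_3 reproduces the compile error described above, which points to macro substitution rather than C++ name look-up. The dispatch trick relies on a standards-conforming preprocessor; MSVC's traditional preprocessor handles __VA_ARGS__ differently.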
@@ -192,7 +194,8 @@ std::future<std::size_t> FileHandle::pread(void* buf,
                                            std::size_t gds_threshold,
                                            bool sync_default_stream)
 {
-  KVIKIO_NVTX_MARKER("FileHandle::pread()", size);
+  auto& [nvtx_color, call_idx] = detail::get_next_color_and_call_idx();
+  KVIKIO_NVTX_SCOPED_RANGE("FileHandle::pread()", size, nvtx_color);
To be consistent with RemoteHandle, the NVTX marker here is replaced with the scoped range.
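The difference between the old marker and the new scoped range can be sketched without NVTX. Everything below (TraceLog, mark, ScopedRange, next_color_and_call_idx, fake_pread) is hypothetical and only mimics the shape of the diff: a marker records a single instant, while a scoped range is an RAII object that opens when constructed and closes when it leaves scope, so the profiler sees the duration of the whole call. The round-robin palette and the by-value return are assumptions about what detail::get_next_color_and_call_idx() does; its real color scheme and return type are not shown in this thread.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory event log standing in for an NVTX-aware profiler.
struct TraceLog {
  std::vector<std::string> events;
};

// Marker: one instantaneous event with no duration.
inline void mark(TraceLog& log, const std::string& msg) { log.events.push_back("mark " + msg); }

// Scoped range: "push" on construction, "pop" on destruction (RAII), so the range
// covers the enclosing scope, including early returns.
class ScopedRange {
 public:
  ScopedRange(TraceLog& log, std::string msg, std::uint32_t argb)
    : log_(log), msg_(std::move(msg))
  {
    log_.events.push_back("push " + msg_ + " color=" + std::to_string(argb));
  }
  ~ScopedRange() { log_.events.push_back("pop " + msg_); }
  ScopedRange(const ScopedRange&)            = delete;
  ScopedRange& operator=(const ScopedRange&) = delete;

 private:
  TraceLog& log_;
  std::string msg_;
};

// Hypothetical analogue of detail::get_next_color_and_call_idx(): each call gets a
// unique, increasing index and a color picked round-robin from a small palette.
inline std::pair<std::uint32_t, std::uint64_t> next_color_and_call_idx()
{
  static constexpr std::array<std::uint32_t, 3> palette{0xFF76B900u, 0xFF0071C5u, 0xFFED1C24u};
  static std::atomic<std::uint64_t> counter{0};
  std::uint64_t const idx = counter.fetch_add(1, std::memory_order_relaxed);
  return {palette[idx % palette.size()], idx};
}

// Hypothetical function shaped like the FileHandle::pread() change above.
inline std::uint64_t fake_pread(TraceLog& log, std::size_t size)
{
  auto [color, call_idx] = next_color_and_call_idx();
  ScopedRange range(log, "fake_pread size=" + std::to_string(size), color);
  mark(log, "submitted call " + std::to_string(call_idx));
  return call_idx;  // `range` is destroyed here, which records the "pop"
}
```

Each fake_pread() call therefore logs push, mark, pop in that order; with the old marker-only approach, only the instantaneous mark would appear and the call's duration would be invisible.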
Force-pushed from 786d20b to 37df91a.
Looks good; I only have a minor suggestion.
Force-pushed from 37df91a to 32971de.
Force-pushed from 32971de to 0e4ecd5.
Performance check
Four I/O benchmarks from libcudf were used to check whether this PR causes a runtime performance regression.
System
Results
parquet_read_io_compression
orc_read_io_compression
json_read_io
csv_read_io
I have also tested it in an environment without CUDA, with no issues.
/merge
// Rename the worker threads in the thread pool so they are easier to identify in nsys-ui.
// Note: This NVTX feature is currently not supported by nsys-ui.
thread_local std::once_flag call_once_per_thread;
std::call_once(call_once_per_thread,
               [] { nvtx_manager::rename_current_thread("thread pool"); });
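To see what the thread_local std::once_flag does, here is a self-contained sketch in which a hypothetical fake_rename_current_thread() replaces nvtx_manager::rename_current_thread(). Because the flag is thread_local, each thread has its own copy, so the initializer runs once per thread rather than once per process; every later task on that thread still accesses thread-local storage and performs the call_once check.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Counts how many times the per-thread initializer actually ran.
inline std::atomic<int> g_init_count{0};

// Hypothetical stand-in for nvtx_manager::rename_current_thread("thread pool").
inline void fake_rename_current_thread() { g_init_count.fetch_add(1); }

// Mirrors the diff: called at the start of every task. The once_flag is thread_local,
// so each thread owns a separate flag and runs the initializer on its first task only.
// Later tasks still access thread-local storage and perform the call_once check.
inline void on_task_start()
{
  thread_local std::once_flag call_once_per_thread;
  std::call_once(call_once_per_thread, [] { fake_rename_current_thread(); });
}

// Runs tasks_per_thread "tasks" on each of num_threads threads and returns how many
// times the initializer ran.
inline int run_demo(std::size_t num_threads, std::size_t tasks_per_thread)
{
  g_init_count = 0;
  std::vector<std::thread> threads;
  for (std::size_t t = 0; t < num_threads; ++t) {
    threads.emplace_back([tasks_per_thread] {
      for (std::size_t i = 0; i < tasks_per_thread; ++i) { on_task_start(); }
    });
  }
  for (auto& th : threads) { th.join(); }
  return g_init_count.load();
}
```

For example, run_demo(4, 100) makes 400 calls to on_task_start() but returns 4, one initialization per thread.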
Just to note: thread_local compiles into a critical section that blocks other threads (even in the fast-path "already initialised" case, I think). See https://yosefk.com/blog/cxx-thread-local-storage-performance.html
I guess we could use thread_pool's thread initialization?
…thread initialization (#637)
This PR makes the following minor fixes:
- Use the correct file permission flags corresponding to the `644` code.
- Use the correct flag for the `cuMemHostAlloc` call.
- For the thread pool, replace the `thread_local` call-once section (which may negatively affect performance; see #630 (comment)) with a more idiomatic worker thread initialization function.
Authors:
- Tianyu Liu (https://github.com/kingcrimsontianyu)
Approvers:
- Mads R. B. Kristensen (https://github.com/madsbk)
- Vukasin Milovanovic (https://github.com/vuule)
URL: #637
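The fix in #637 moves the renaming into a per-worker initialization function. As a sketch of that idea only (not the API of the thread pool KvikIO actually uses), here is a minimal pool, MiniPool, whose constructor takes a hypothetical init callback that each worker runs once before it starts taking tasks, so individual tasks no longer need a thread_local once_flag or a call_once check.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal thread pool with a worker initialization hook (all names hypothetical).
class MiniPool {
 public:
  MiniPool(std::size_t num_threads, std::function<void()> init)
  {
    assert(num_threads > 0);
    for (std::size_t i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this, init] {
        init();  // runs once per worker, e.g. to rename the thread for the profiler
        for (;;) {
          std::function<void()> task;
          {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
            if (stop_ && tasks_.empty()) { return; }
            task = std::move(tasks_.front());
            tasks_.pop();
          }
          task();  // tasks carry no per-thread setup logic
        }
      });
    }
  }

  void submit(std::function<void()> task)
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

  // Lets the workers drain the remaining queue, then joins them.
  ~MiniPool()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& w : workers_) { w.join(); }
  }

 private:
  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool stop_ = false;
};
```

With three workers and 50 submitted tasks, the init callback runs exactly three times, once per worker, regardless of how the tasks are distributed among them.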
Should we add this test condition to CI?
Yes, a smoke test of the C++ examples would be good!
This PR implements the basic feature outlined in #631.
The two good-to-haves are currently blocked.