[CUDA] Implement urKernelSuggestMaxCooperativeGroupCountExp for Cuda #1796

GeorgeWeb · 2024-06-27T13:16:42Z

This commit implements the experimental urKernelSuggestMaxCooperativeGroupCountExp entry point for the CUDA adapter, which retrieves the maximum number of cooperative groups that can be launched on the device.

The changes also cache the result of the CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT CUDA driver query. The CUDA occupancy query used here has per-SM (streaming multiprocessor) semantics, so the SM count is needed to calculate the device-wide maximum number of cooperative groups.
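As a rough illustration of the calculation described above (not the adapter's actual code), the device-wide value can be sketched as the per-SM occupancy result multiplied by a cached SM count. `DeviceSketch`, `getNumSMs` and `suggestMaxCooperativeGroupCount` are hypothetical names used only for this sketch. The callback stands in for a `cuDeviceGetAttribute` call with `CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT`, and `MaxBlocksPerSM` is assumed to come from a per-SM occupancy query such as `cuOccupancyMaxActiveBlocksPerMultiprocessor`.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical stand-in for the adapter's device object (illustration only).
class DeviceSketch {
public:
  // QuerySMCount is assumed to wrap the driver query for
  // CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT; it is injected here so the
  // sketch compiles and runs without the CUDA driver.
  explicit DeviceSketch(std::function<int()> QuerySMCount)
      : QuerySMCount(std::move(QuerySMCount)) {}

  // Runs the (potentially expensive) query once and reuses the result.
  int getNumSMs() {
    if (!CachedSMCount)
      CachedSMCount = QuerySMCount();
    return *CachedSMCount;
  }

private:
  std::function<int()> QuerySMCount;
  std::optional<int> CachedSMCount;
};

// The occupancy result is per SM, so scale it by the number of SMs to get
// the device-wide maximum number of cooperative groups.
inline uint32_t suggestMaxCooperativeGroupCount(DeviceSketch &Dev,
                                                int MaxBlocksPerSM) {
  return static_cast<uint32_t>(MaxBlocksPerSM) *
         static_cast<uint32_t>(Dev.getNumSMs());
}
```

With a device reporting 4 SMs and an occupancy result of 2 blocks per SM, this sketch would suggest 8 groups, and the SM count callback would run only once across repeated queries.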

Testing and related changes enabling querying this from SYCL: intel/llvm#14333

konradkusiak97

Nice, LGTM.

pbalcer · 2024-06-27T16:27:56Z

2024-06-27T14:32:19.4840797Z Failed Tests (1):
2024-06-27T14:32:19.4849324Z   SYCL :: GroupAlgorithm/root_group.cpp

GeorgeWeb · 2024-06-27T16:36:15Z

2024-06-27T14:32:19.4840797Z Failed Tests (1):
2024-06-27T14:32:19.4849324Z   SYCL :: GroupAlgorithm/root_group.cpp

@pbalcer Yeah, aware, thanks! The root group barrier is currently not supported correctly for cooperative-group kernels in the CUDA backend, so the corresponding intel/llvm PR will XFAIL it until it is implemented.

The test previously passed because the query was returning a single group, so it was calling a work-group-level barrier rather than a device-wide (cross-work-group) one.

GeorgeWeb · 2024-09-06T11:49:20Z

After last rebase, there's a:

SYCL :: Regression/device_num.cpp

e2e failure that seems unrelated.

GeorgeWeb requested a review from a team as a code owner June 27, 2024 13:16

GeorgeWeb requested a review from konradkusiak97 June 27, 2024 13:16

github-actions bot added the cuda CUDA adapter specific issues label Jun 27, 2024

konradkusiak97 approved these changes Jun 27, 2024

GeorgeWeb mentioned this pull request Jun 27, 2024

[SYCL] Implement max_num_work_groups from the launch queries extension intel/llvm#14333

Merged

GeorgeWeb force-pushed the georgi/ur_kernel_max_active_wgs branch 3 times, most recently from 66532d8 to c612317 Compare July 4, 2024 14:06

GeorgeWeb added the experimental Experimental feature additions/changes/specification label Jul 4, 2024

This was referenced Jul 5, 2024

[SYCL][CUDA] XFAIL the root_group barrier test for Cuda until it is implemented correctly intel/llvm#14461

Closed

Failing root_group test when more than 1 work-group is launched. intel/llvm#14462

Closed

GeorgeWeb force-pushed the georgi/ur_kernel_max_active_wgs branch from c612317 to 2359df1 Compare August 13, 2024 12:32

GeorgeWeb force-pushed the georgi/ur_kernel_max_active_wgs branch 2 times, most recently from ac747c3 to 9dcdc62 Compare September 2, 2024 10:48

GeorgeWeb added 2 commits September 6, 2024 11:11

[CUDA] Implement urKernelSuggestMaxCooperativeGroupCountExp for Cuda

9b1ad90

Move Cuda device-specific resource limit checking logic into the adap…

45a781f

…ter backend from the sycl runtime. This change is required in order to implement per-device semantics for the urKernelSuggestMaxCooperativeGroupCountExp query.

GeorgeWeb force-pushed the georgi/ur_kernel_max_active_wgs branch from 9dcdc62 to 45a781f Compare September 6, 2024 10:11

GeorgeWeb added the ready to merge Added to PR's which are ready to merge label Sep 6, 2024

omarahmed1111 merged commit eb63d1a into oneapi-src:main Sep 10, 2024
71 of 72 checks passed

npmiller mentioned this pull request Sep 16, 2024

cudaOccupancyMaxActiveBlocksPerMultiprocessor #1424

Closed

jinz2014 mentioned this pull request Jan 24, 2025

[HIP] Implement urKernelSuggestMaxCooperativeGroupCountExp for HIP #2617

Open
