[CUDA] Implement urKernelSuggestMaxCooperativeGroupCountExp for Cuda #1796
Nice, LGTM.
@pbalcer Yeah, aware, thanks! The root group barrier is currently not supported correctly for cooperative-group kernels in the CUDA backend, so the corresponding intel/llvm PR will be … It previously passed because the query was returning a single group, so it was calling a work-group-level barrier rather than a device-wide (cross-work-group) one.
…ter backend from the sycl runtime
This change is required in order to implement per-device semantics for the urKernelSuggestMaxCooperativeGroupCountExp query.
After the last rebase, there's a SYCL :: Regression/device_num.cpp e2e failure that seems unrelated.
This commit implements the experimental urKernelSuggestMaxCooperativeGroupCountExp query for the Cuda adapter, which retrieves the maximum number of cooperative groups that can be launched on the device.

Additionally, the changes cache the result of the CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT Cuda driver query. This value is used to calculate the device-wide maximum number of cooperative groups, because the Cuda occupancy query used has per-SM (multiprocessor) semantics.

Testing and related changes enabling querying this from SYCL: intel/llvm#14333
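
For readers skimming the description, the calculation can be sketched roughly as below. This is not the code from the PR: `DeviceInfoCache` and `suggestMaxCooperativeGroupCount` are hypothetical names, the driver calls are replaced by a callback so the snippet builds without the CUDA toolkit, and caching lazily on first use is an assumption (the adapter may well populate the value when the device is created). In the adapter, the per-SM value would presumably come from an occupancy call such as cuOccupancyMaxActiveBlocksPerMultiprocessor, and the SM count from cuDeviceGetAttribute with CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical stand-in for the adapter's device object (not the real UR type).
class DeviceInfoCache {
public:
  // `querySmCount` stands in for the driver query, e.g. cuDeviceGetAttribute
  // with CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT.
  explicit DeviceInfoCache(std::function<int()> querySmCount)
      : QuerySmCount(std::move(querySmCount)) {}

  // Runs the query on first use and returns the cached value afterwards.
  // (Assumption: lazy caching; the real adapter may cache eagerly.)
  int multiprocessorCount() {
    if (!SmCount)
      SmCount = QuerySmCount();
    return *SmCount;
  }

private:
  std::function<int()> QuerySmCount;
  std::optional<int> SmCount;
};

// Scales a per-SM occupancy result to a device-wide group count:
//   device-wide groups = active groups per SM * number of SMs.
inline std::uint32_t
suggestMaxCooperativeGroupCount(int maxActiveGroupsPerSm,
                                DeviceInfoCache &device) {
  // The kernel does not fit on an SM with this configuration.
  if (maxActiveGroupsPerSm <= 0)
    return 0;
  return static_cast<std::uint32_t>(maxActiveGroupsPerSm) *
         static_cast<std::uint32_t>(device.multiprocessorCount());
}
```

With a device reporting 80 multiprocessors and an occupancy result of 2 groups per SM, the sketch suggests 160 groups, and repeated queries reuse the cached SM count instead of calling the driver again.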