Commit
[simt] Support "__syncthreads_and", "__syncthreads_or", and "__syncthreads_count" from CUDA. (#8297)

Issue: #8289

### Brief Summary

From the CUDA documentation:

Devices of compute capability 2.x and higher support three variations of `__syncthreads()` described below.

```cpp
int __syncthreads_count(int predicate);
```

is identical to `__syncthreads()` with the additional feature that it evaluates predicate for all threads of the block and returns the number of threads for which predicate evaluates to non-zero.

```cpp
int __syncthreads_and(int predicate);
```

is identical to `__syncthreads()` with the additional feature that it evaluates predicate for all threads of the block and returns non-zero if and only if predicate evaluates to non-zero for all of them.

```cpp
int __syncthreads_or(int predicate);
```

is identical to `__syncthreads()` with the additional feature that it evaluates predicate for all threads of the block and returns non-zero if and only if predicate evaluates to non-zero for any of them.

This PR adds these three operations for CUDA only. The API looks like:

```python
def sync_all_nonzero(predicate): ...    # __syncthreads_and
def sync_any_nonzero(predicate): ...    # __syncthreads_or
def sync_count_nonzero(predicate): ...  # __syncthreads_count
```

The predicate is always expected to be `ti.int32`.

### Walkthrough

The code is adapted from the existing CUDA warp operations, so the implementation is straightforward. I added tests similar to those for the warp operations, and all of them pass on my local machine.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
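To make the quoted semantics concrete, here is a minimal host-side sketch in plain Python that models what each operation returns for one thread block. It is not the code added by this commit and does not call Taichi or CUDA. The `ref_*` function names and the list-of-predicates representation are hypothetical, chosen only for illustration. The sketch also assumes a "non-zero" result is reported as `1`, whereas CUDA only guarantees some non-zero value.

```python
from typing import Sequence


def ref_sync_count_nonzero(predicates: Sequence[int]) -> int:
    """Model of __syncthreads_count: how many threads passed a non-zero predicate."""
    return sum(1 for p in predicates if p != 0)


def ref_sync_all_nonzero(predicates: Sequence[int]) -> int:
    """Model of __syncthreads_and: 1 if every thread's predicate is non-zero, else 0.

    Assumption: the non-zero result is normalized to 1.
    """
    return int(all(p != 0 for p in predicates))


def ref_sync_any_nonzero(predicates: Sequence[int]) -> int:
    """Model of __syncthreads_or: 1 if any thread's predicate is non-zero, else 0.

    Assumption: the non-zero result is normalized to 1.
    """
    return int(any(p != 0 for p in predicates))


# One entry per thread in a hypothetical 8-thread block:
# odd thread indices pass 1, even thread indices pass 0.
block_predicates = [tid % 2 for tid in range(8)]

print(ref_sync_count_nonzero(block_predicates))  # 4
print(ref_sync_all_nonzero(block_predicates))    # 0
print(ref_sync_any_nonzero(block_predicates))    # 1
```

On the device every thread in the block receives the same return value after the barrier. The model reflects this by computing a single result from the whole block's predicates.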
1 parent 1ae0e46, commit b8d7ffd
Showing 6 changed files with 129 additions and 0 deletions.