[simt] Support "__syncthreads_and", "__syncthreads_or", and "__syncthreads_count" from CUDA. #8297

Merged: 3 commits into taichi-dev:master on Oct 31, 2023

wanmeihuali
Contributor

Issue: #8289

Brief Summary

From the CUDA documentation:
Devices of compute capability 2.x and higher support three variations of __syncthreads() described below.

int __syncthreads_count(int predicate);

is identical to __syncthreads() with the additional feature that it evaluates predicate for all threads of the block and returns the number of threads for which predicate evaluates to non-zero.

int __syncthreads_and(int predicate);

is identical to __syncthreads() with the additional feature that it evaluates predicate for all threads of the block and returns non-zero if and only if predicate evaluates to non-zero for all of them.

int __syncthreads_or(int predicate);

is identical to __syncthreads() with the additional feature that it evaluates predicate for all threads of the block and returns non-zero if and only if predicate evaluates to non-zero for any of them.

This PR adds these three operations for the CUDA backend only. The API looks like:

def sync_all_nonzero(predicate): ...  # __syncthreads_and

def sync_any_nonzero(predicate): ...  # __syncthreads_or

def sync_count_nonzero(predicate): ...  # __syncthreads_count

The predicate is always expected to be of type ti.int32.
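
To make the return values concrete, here is a minimal CPU-side sketch in plain Python of what each operation is expected to return for a single block. It is not the Taichi implementation and performs no synchronization. The `*_model` helpers and the list-of-predicates representation of a block are assumptions made purely for illustration.

```python
from typing import List


def sync_count_nonzero_model(predicates: List[int]) -> int:
    """Model of __syncthreads_count: number of threads with a non-zero predicate."""
    return sum(1 for p in predicates if p != 0)


def sync_all_nonzero_model(predicates: List[int]) -> int:
    """Model of __syncthreads_and: non-zero iff every predicate is non-zero."""
    return int(all(p != 0 for p in predicates))


def sync_any_nonzero_model(predicates: List[int]) -> int:
    """Model of __syncthreads_or: non-zero iff at least one predicate is non-zero."""
    return int(any(p != 0 for p in predicates))


# Hypothetical block of 8 threads; thread i passes (i % 2) as its predicate.
block_predicates = [i % 2 for i in range(8)]
count_result = sync_count_nonzero_model(block_predicates)
all_result = sync_all_nonzero_model(block_predicates)
any_result = sync_any_nonzero_model(block_predicates)
```

In the real operations, every thread in the block receives the same result once the barrier completes. The model only reproduces that shared result.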

Walkthrough

The code is adapted from the existing CUDA warp operations, so the implementation is straightforward. I added tests modeled on the warp-operation tests, and all of them pass on my local machine.

@CLAassistant

CLAassistant commented Jul 23, 2023

CLA assistant check
All committers have signed the CLA.

@netlify

netlify bot commented Jul 23, 2023

Deploy Preview for docsite-preview ready!

Name Link
🔨 Latest commit a8f9bd3
🔍 Latest deploy log https://app.netlify.com/sites/docsite-preview/deploys/653f836f93d6e800086e43d9
😎 Deploy Preview https://deploy-preview-8297--docsite-preview.netlify.app
@wanmeihuali changed the title from 'Support "__syncthreads_and", "__syncthreads_or", and "__syncthreads_count" from CUDA.' to '[simt][cuda]Support "__syncthreads_and", "__syncthreads_or", and "__syncthreads_count" from CUDA.' on Jul 23, 2023
@wanmeihuali changed the title from '[simt][cuda]Support "__syncthreads_and", "__syncthreads_or", and "__syncthreads_count" from CUDA.' to '[simt][cuda] Support "__syncthreads_and", "__syncthreads_or", and "__syncthreads_count" from CUDA.' on Jul 23, 2023
@wanmeihuali changed the title from '[simt][cuda] Support "__syncthreads_and", "__syncthreads_or", and "__syncthreads_count" from CUDA.' to '[simt] Support "__syncthreads_and", "__syncthreads_or", and "__syncthreads_count" from CUDA.' on Jul 23, 2023
@lin-hitonami
Contributor

/rebase

Contributor

@lin-hitonami lin-hitonami left a comment

LGTM! Sorry for the late review.

@lin-hitonami lin-hitonami merged commit b8d7ffd into taichi-dev:master Oct 31, 2023
20 checks passed

3 participants