[simt] Support "__syncthreads_and", "__syncthreads_or", and "__syncthreads_count" from CUDA. #8297

wanmeihuali · 2023-07-23T22:25:55Z

Brief Summary

From the CUDA documentation:
Devices of compute capability 2.x and higher support three variations of __syncthreads() described below.

int __syncthreads_count(int predicate);

is identical to __syncthreads() with the additional feature that it evaluates predicate for all threads of the block and returns the number of threads for which predicate evaluates to non-zero.

int __syncthreads_and(int predicate);

is identical to __syncthreads() with the additional feature that it evaluates predicate for all threads of the block and returns non-zero if and only if predicate evaluates to non-zero for all of them.

int __syncthreads_or(int predicate);

is identical to __syncthreads() with the additional feature that it evaluates predicate for all threads of the block and returns non-zero if and only if predicate evaluates to non-zero for any of them.

This PR adds these three operations for the CUDA backend only. The API looks like this:

def sync_all_nonzero(predicate): # __syncthreads_and

def sync_any_nonzero(predicate): # __syncthreads_or

def sync_count_nonzero(predicate): # __syncthreads_count

The predicate is always expected to be ti.int32.
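The return values quoted from the CUDA documentation can be modeled on the host as reductions over one predicate per thread in the block. The following is a pure-Python sketch for illustration only, not the code in this PR: the `_model` function names and the list-of-predicates representation are assumptions, and the barrier itself is not modeled.

```python
from typing import Sequence


def sync_count_nonzero_model(predicates: Sequence[int]) -> int:
    """Model of __syncthreads_count: how many threads passed a non-zero predicate."""
    return sum(1 for p in predicates if p != 0)


def sync_all_nonzero_model(predicates: Sequence[int]) -> int:
    """Model of __syncthreads_and: non-zero iff every predicate is non-zero."""
    return int(all(p != 0 for p in predicates))


def sync_any_nonzero_model(predicates: Sequence[int]) -> int:
    """Model of __syncthreads_or: non-zero iff at least one predicate is non-zero."""
    return int(any(p != 0 for p in predicates))


# Hypothetical block of 8 threads; thread i passes 1 when i is even, else 0.
block_predicates = [1 if i % 2 == 0 else 0 for i in range(8)]

count = sync_count_nonzero_model(block_predicates)
all_nz = sync_all_nonzero_model(block_predicates)
any_nz = sync_any_nonzero_model(block_predicates)
print(count, all_nz, any_nz)  # 4 0 1
```

In a real kernel every thread in the block receives the same result after the barrier; the model returns that shared value once.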

Walkthrough

Overall, the code is adapted from the CUDA warp operations, and the implementation is fairly straightforward. I added tests similar to those for the warp operations, and all tests pass on my local machine.

CLAassistant · 2023-07-23T22:26:00Z

All committers have signed the CLA.

netlify · 2023-07-23T22:26:08Z

✅ Deploy Preview for docsite-preview ready!

Name	Link
🔨 Latest commit	`a8f9bd3`
🔍 Latest deploy log	https://app.netlify.com/sites/docsite-preview/deploys/653f836f93d6e800086e43d9
😎 Deploy Preview	https://deploy-preview-8297--docsite-preview.netlify.app

lin-hitonami · 2023-10-30T10:19:53Z

/rebase


lin-hitonami

LGTM! Sorry for the late review.

wanmeihuali changed the title ~~Support "__syncthreads_and", "__syncthreads_or", and "__syncthreads_count" from CUDA.~~ [simt][cuda]Support "__syncthreads_and", "__syncthreads_or", and "__syncthreads_count" from CUDA. Jul 23, 2023

wanmeihuali changed the title ~~[simt][cuda]Support "__syncthreads_and", "__syncthreads_or", and "__syncthreads_count" from CUDA.~~ [simt][cuda] Support "__syncthreads_and", "__syncthreads_or", and "__syncthreads_count" from CUDA. Jul 23, 2023

wanmeihuali changed the title ~~[simt][cuda] Support "__syncthreads_and", "__syncthreads_or", and "__syncthreads_count" from CUDA.~~ [simt] Support "__syncthreads_and", "__syncthreads_or", and "__syncthreads_count" from CUDA. Jul 23, 2023

wanmeihuali and others added 3 commits October 30, 2023 10:20

try support __syncthreads_and

5ac43b1

support all three syncthread functions.

d93d342

[pre-commit.ci] auto fixes from pre-commit.com hooks

a8f9bd3

for more information, see https://pre-commit.ci

taichi-gardener force-pushed the master branch from 5987825 to a8f9bd3 October 30, 2023 10:20

lin-hitonami approved these changes Oct 31, 2023


lin-hitonami merged commit b8d7ffd into taichi-dev:master Oct 31, 2023
24 checks passed
