Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Add max work-group size kernel properties #14518

Merged
merged 30 commits into from
Sep 25, 2024

Conversation

frasercrmck
Copy link
Contributor

@frasercrmck frasercrmck commented Jul 10, 2024

This patch adds two kernel properties to allow users to specify the maximum work-group size that a kernel will be invoked with.

The max_work_group_size property corresponds to the intel::max_work_group_size function attribute, but can be specified with 1, 2, or 3 dimensions (unlike the attribute which accepts only 3).

The max_linear_work_group_size property is similar but is always a single value which denotes the combined linear (total) work-group size. This can be used when the user cannot guarantee a maximum bound in each of the dimensions they wish to run the kernel, but can guarantee a total. This acts similarly to CUDA's maxThreadsPerBlock launch bounds property.

This patch also wires up the 'max_work_group_size' property to the equivalent SPIR-V execution mode, which should hopefully improve certain use cases.

This patch adds two kernel properties to allow users to specify the
maximum work-group size that a kernel will be invoked with.

The `max_work_group_size` property corresponds to the
`intel::max_work_group_size` function attribute, but can be specified
with 1, 2, or 3 dimensions (unlike the attribute which accepts only 3).

The `max_total_work_group_size` property is similar but is always a
single value which denotes the combined total work-group size. This can
be used when the user cannot guarantee a maximum bound in each of the
dimensions they wish to run the kernel, but can guarantee a total. This
acts similarly to CUDA's `maxThreadsPerBlock` launch bounds property.

This patch also wires up the 'max_work_group_size' property to the
equivalent SPIR-V execution mode, which should hopefully improve certain
use cases.
@frasercrmck frasercrmck requested review from a team as code owners July 10, 2024 14:41
@frasercrmck
Copy link
Contributor Author

Split from #14448, @steffenlarsen and @gmlueck

Copy link
Contributor

@gmlueck gmlueck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spec changes LGTM. Just a minor formatting thing.

Copy link
Contributor

@premanandrao premanandrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a CodeGen lit test to exercise the changes in NVPTX.cpp?

@frasercrmck
Copy link
Contributor Author

Could you please add a CodeGen lit test to exercise the changes in NVPTX.cpp?

Sorry, do you mean a backend CodeGen LIT test, or a frontend CodeGen test?

There are separate (upstream) backend CodeGen tests already for the NVVM annotations, like this one.

This PR adds frontend "CodeGen" tests in sycl/test/check_device_code/extensions/properties, as that's where the LLVM IR codegen tests for the other properties takes place. I think this is the same discussion as in #14502.

Copy link
Contributor

@KseniyaTikhomirova KseniyaTikhomirova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SYCL part looks ok to me.

Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CompileTimePropertiesPass changes LGTM

Copy link
Contributor

@gmlueck gmlueck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spec changes LGTM

@frasercrmck
Copy link
Contributor Author

ping @intel/dpcpp-spirv-reviewers, thanks

frasercrmck added a commit to frasercrmck/unified-runtime that referenced this pull request Aug 29, 2024
These two properties allow the program to specify a maximum work-group
size in various ways.

They are intended to be targeted from languages such as SYCL (see
intel/llvm#14518).

This PR implements them for CUDA and Native CPU. It should also be able
support them for HIP, in the same fashion. Other adapters using SPIR-V
and/or Level Zero would require further changes to both of those
specifications.
@frasercrmck
Copy link
Contributor Author

ping @intel/dpcpp-spirv-reviewers

frasercrmck added a commit to frasercrmck/unified-runtime that referenced this pull request Sep 10, 2024
These two properties allow the program to specify a maximum work-group
size in various ways.

They are intended to be targeted from languages such as SYCL (see
intel/llvm#14518).

This PR implements them for CUDA and Native CPU. It should also be able
support them for HIP, in the same fashion. Other adapters using SPIR-V
and/or Level Zero would require further changes to both of those
specifications.
frasercrmck added a commit to frasercrmck/unified-runtime that referenced this pull request Sep 11, 2024
These two properties allow the program to specify a maximum work-group
size in various ways.

They are intended to be targeted from languages such as SYCL (see
intel/llvm#14518).

This PR implements them for CUDA and Native CPU. It should also be able
support them for HIP, in the same fashion. Other adapters using SPIR-V
and/or Level Zero would require further changes to both of those
specifications.
frasercrmck added a commit to frasercrmck/unified-runtime that referenced this pull request Sep 12, 2024
These two properties allow the program to specify a maximum work-group
size in various ways.

They are intended to be targeted from languages such as SYCL (see
intel/llvm#14518).

This PR implements them for CUDA and Native CPU. It should also be able
support them for HIP, in the same fashion. Other adapters using SPIR-V
and/or Level Zero would require further changes to both of those
specifications.
frasercrmck added a commit to frasercrmck/unified-runtime that referenced this pull request Sep 24, 2024
These two properties allow the program to specify a maximum work-group
size in various ways.

They are intended to be targeted from languages such as SYCL (see
intel/llvm#14518).

This PR implements them for CUDA and Native CPU. It should also be able
support them for HIP, in the same fashion. Other adapters using SPIR-V
and/or Level Zero would require further changes to both of those
specifications.
@frasercrmck
Copy link
Contributor Author

Three E2E tests failing on linux:

Failed Tests (3):
  SYCL :: DeviceDependencies/free_function_kernels.cpp
  SYCL :: LLVMIntrinsicLowering/bitreverse.cpp
  SYCL :: LLVMIntrinsicLowering/sub_byte_bitreverse.cpp

All for the same reason:

clang++: error: option '-fno-sycl-allow-device-dependencies' is deprecated and will be removed in a future release

Looks like a pre-existing problem.

@frasercrmck
Copy link
Contributor Author

Same on Windows:

********************
Failed Tests (2):
  SYCL :: DeviceDependencies/free_function_kernels.cpp
  SYCL :: LLVMIntrinsicLowering/bitreverse.cpp

@frasercrmck
Copy link
Contributor Author

@intel/llvm-gatekeepers this is ready to merge, thanks. The failures are unrelated (see above).

@ldrumm ldrumm merged commit af8361c into intel:sycl Sep 25, 2024
11 of 13 checks passed
@frasercrmck frasercrmck deleted the sycl-max-wg-size-kernel-props branch September 25, 2024 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants