[SYCL] Add max work-group size kernel properties #14518
This patch adds two kernel properties to allow users to specify the maximum work-group size that a kernel will be invoked with. The `max_work_group_size` property corresponds to the `intel::max_work_group_size` function attribute, but can be specified with 1, 2, or 3 dimensions (unlike the attribute which accepts only 3). The `max_total_work_group_size` property is similar but is always a single value which denotes the combined total work-group size. This can be used when the user cannot guarantee a maximum bound in each of the dimensions they wish to run the kernel, but can guarantee a total. This acts similarly to CUDA's `maxThreadsPerBlock` launch bounds property. This patch also wires up the 'max_work_group_size' property to the equivalent SPIR-V execution mode, which should hopefully improve certain use cases.
Split from #14448, @steffenlarsen and @gmlueck
Spec changes LGTM. Just a minor formatting thing.
…ties.asciidoc Co-authored-by: Greg Lueck <[email protected]>
Could you please add a CodeGen lit test to exercise the changes in NVPTX.cpp?
Sorry, do you mean a backend CodeGen LIT test, or a frontend CodeGen test? There are separate (upstream) backend CodeGen tests already for the NVVM annotations, like this one. This PR adds frontend "CodeGen" tests in sycl/test/check_device_code/extensions/properties, as that's where the LLVM IR codegen tests for the other properties live. I think this is the same discussion as in #14502.
SYCL part looks ok to me.
`CompileTimePropertiesPass` changes LGTM
Spec changes LGTM
ping @intel/dpcpp-spirv-reviewers, thanks
These two properties allow the program to specify a maximum work-group size in various ways. They are intended to be targeted from languages such as SYCL (see intel/llvm#14518). This PR implements them for CUDA and Native CPU. It should also be possible to support them for HIP in the same fashion. Other adapters using SPIR-V and/or Level Zero would require further changes to both of those specifications.
ping @intel/dpcpp-spirv-reviewers
Three E2E tests failing on linux:
All for the same reason:
Looks like a pre-existing problem.
Same on Windows:
@intel/llvm-gatekeepers this is ready to merge, thanks. The failures are unrelated (see above).
This patch adds two kernel properties to allow users to specify the maximum work-group size that a kernel will be invoked with.
The `max_work_group_size` property corresponds to the `intel::max_work_group_size` function attribute, but can be specified with 1, 2, or 3 dimensions (unlike the attribute, which accepts only 3).
The `max_linear_work_group_size` property is similar but is always a single value which denotes the combined linear (total) work-group size. This can be used when the user cannot guarantee a maximum bound in each of the dimensions they wish to run the kernel, but can guarantee a total. This acts similarly to CUDA's `maxThreadsPerBlock` launch bounds property.
This patch also wires up the `max_work_group_size` property to the equivalent SPIR-V execution mode, which should hopefully improve certain use cases.
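To make the difference between the two bounds concrete, here is a minimal sketch in plain C++ that models them as simple checks on a launch's work-group shape. It is not the DPC++ API: `LaunchBounds`, `linear_size`, `within_per_dim`, and `within_linear` are hypothetical names invented for illustration, and lower-dimensional sizes are assumed to be padded with 1s.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model of the two bounds described above (not the DPC++ API).
struct LaunchBounds {
    std::array<std::size_t, 3> max_per_dim; // models a per-dimension bound, as in max_work_group_size
    std::size_t max_linear;                 // models a total bound, as in max_linear_work_group_size
};

// Total number of work-items in a work-group of the given shape.
inline std::size_t linear_size(const std::array<std::size_t, 3>& wg) {
    return wg[0] * wg[1] * wg[2];
}

// True if every dimension of the work-group is within its own bound.
inline bool within_per_dim(const LaunchBounds& b,
                           const std::array<std::size_t, 3>& wg) {
    return wg[0] <= b.max_per_dim[0] &&
           wg[1] <= b.max_per_dim[1] &&
           wg[2] <= b.max_per_dim[2];
}

// True if the total work-group size is within the linear bound,
// regardless of how that total is split across dimensions.
inline bool within_linear(const LaunchBounds& b,
                          const std::array<std::size_t, 3>& wg) {
    return linear_size(wg) <= b.max_linear;
}
```

With a per-dimension bound of {16, 16, 1} and a linear bound of 128, a work-group of shape {32, 4, 1} satisfies the linear bound (128 work-items) but not the per-dimension one. This is the situation the linear property is meant for: the total is known, but the shape is not.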