-
Notifications
You must be signed in to change notification settings - Fork 14
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add blog post on OpenCL.jl 0.10. (#49)
- Loading branch information
Showing
1 changed file
with
166 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,166 @@ | ||
+++ | ||
title = "OpenCL.jl 0.10: Now with native Julia kernels" | ||
author = "Tim Besard" | ||
abstract = """ | ||
Version 0.10 of OpenCL.jl is a significant release that adds support for native Julia | ||
kernels. This necessitated a major overhaul of the package's internals, bringing the | ||
package in line with modern Julia GPU programming practices.""" | ||
+++ | ||
|
||
{{abstract}} | ||
|
||
|
||
## Native Julia kernels | ||
|
||
The highlight of this release is the addition of **a compiler that makes it possible to | ||
write OpenCL kernels in Julia** instead of having to use OpenCL C and accompanying | ||
string-based APIs. Let's illustrate using the typical `vadd` vector-additional example, | ||
which starts by generating some data and uploading it to the GPU: | ||
|
||
```julia | ||
using OpenCL | ||
|
||
dims = (2,) | ||
a = round.(rand(Float32, dims) * 100) | ||
b = round.(rand(Float32, dims) * 100) | ||
c = similar(a) | ||
|
||
d_a = CLArray(a) | ||
d_b = CLArray(b) | ||
d_c = CLArray(c) | ||
``` | ||
|
||
The typical way to write a kernel is to use a string with OpenCL C code, which is then | ||
compiled and executed on the GPU. This is done as follows: | ||
|
||
```julia | ||
const source = """ | ||
__kernel void vadd(__global const float *a, | ||
__global const float *b, | ||
__global float *c) { | ||
int i = get_global_id(0); | ||
c[i] = a[i] + b[i]; | ||
}""" | ||
|
||
prog = cl.Program(; source) |> cl.build! | ||
kern = cl.Kernel(prog, "vadd") | ||
|
||
len = prod(dims) | ||
clcall(kern, Tuple{Ptr{Float32}, Ptr{Float32}, Ptr{Float32}}, | ||
d_a, d_b, d_c; global_size=(len,)) | ||
``` | ||
|
||
With the new GPUCompiler.jl-based compiler, you can now write the kernel in Julia just like | ||
with our other back-ends: | ||
|
||
```julia | ||
function vadd(a, b, c) | ||
i = get_global_id() | ||
@inbounds c[i] = a[i] + b[i] | ||
return | ||
end | ||
|
||
len = prod(dims) | ||
@opencl global_size=len vadd(d_a, d_b, d_c) | ||
``` | ||
|
||
This is of course a much more natural way to write kernels, and it also allows for OpenCL.jl | ||
to be plugged into the rest of the JuliaGPU ecosystem. Concretely, OpenCL.jl now implements | ||
the GPUArrays.jl interface, enabling lots of vendor-neutral functionality, and also provides | ||
a KernelAbstractions.jl back-end for use with the plenty of libraries that build on top of | ||
KernelAbstractions.jl. | ||
|
||
There is no free lunch, though, and **the native compiler functionality currently relies on | ||
your OpenCL driver supporting SPIR-V**. This is sadly not a common feature, e.g., neither | ||
NVIDIA or ADM's OpenCL drivers support it, only Intel's. But if you are stuck with a driver | ||
that does not support SPIR-V, there is still hope: SPIR-V can be compiled back to OpenCL C, | ||
using [Google `clspv`](https://github.com/google/clspv). If you are interested, check out | ||
[this issue](https://github.com/JuliaGPU/OpenCL.jl/issues/234) and feel free to reach out. | ||
|
||
|
||
## Breaking API changes | ||
|
||
Existing users of OpenCL.jl will of course have noticed that even the string-based example | ||
above uses a different API than before. In order to support the new compiler, and bring | ||
OpenCL.jl in line with modern Julia programming practices, we have **significantly | ||
overhauled the package's internals as well as some external APIs**. | ||
|
||
The most significant high-level changes include: | ||
|
||
- Memory management is now done using `CLArray`, backed by Shared Virtual Memory (SVM), | ||
instead of opaque buffers. Raw buffers are still supported, but not compatible with native | ||
kernel execution (because they can not be converted to a pointer). | ||
- Kernels are called using the new `clcall` function, which performs automatic conversion | ||
of objects much like how `ccall` works. | ||
|
||
At the lower-level (of the `cl` submodule), the changes are more extensive: | ||
|
||
- Context, device and queue arguments have been removed from most APIs, and are now stored | ||
in task-local storage. These values can be queried (`cl.platform()`, `cl.device()`, etc) | ||
and set (`cl.platform!(platform)`, `cl.device!(device)`, etc) as needed. | ||
- As part of the above change, questionable APIs like `cl.create_some_context()` and | ||
`cl.devices()` have been removed; | ||
- The `Buffer` API has been completely reworked. It now only provides low-level | ||
functionality, such as `unsafe_copyto!` or `unsafe_map!`, while high-level functionality | ||
like `copy!` is implemented for the CLArray type; | ||
- The `cl.info` method, and the `getindex` overloading to access properties of OpenCL | ||
objects, have been replaced by `getproperty` overloading on the objects themselves | ||
(e.g., `cl.info(dev, :name)` and `dev[:name]` are now simply `dev.name`); | ||
- The blocking `cl.launch` has been replaced by a nonblocking `cl.call`, while also removing | ||
the `getindex`-overloading shorthand. However, it's recommended to use the | ||
newly-added `cl.clcall` function, which takes an additional tuple type argument and | ||
performs automatic conversions of arguments to those types. This makes it possible to | ||
pass a `CLArray` to an OpenCL C function expecting Buffer-backed pointers, for example. | ||
- Argument conversion has been removed; the user should make sure Julia arguments passed to | ||
kernels match the OpenCL argument types (i.e., no empty types, 4-element tuples for a | ||
3-element `float3` arguments). | ||
- The `to_host` function has been replaced by simply calling `Array` on the `CLArray`. | ||
- Queue and execution capabilities of a device are now to be queried using dedicated | ||
functions, `cl.queue_properties` and `cl.exec_capabilities`. | ||
|
||
Working towards the first stable version of this package, we anticipate having to make even | ||
more breaking changes. However, we want to get the current changes out there to get feedback | ||
from the community. If some of the removed functionality is crucial to your workflow, feel | ||
free to reach out and we can discuss how to best support it in the future. | ||
|
||
|
||
## JLL-based OpenCL drivers | ||
|
||
Another significant change is the **integration with OpenCL drivers built and provided using | ||
Julia's BinaryBuilder infrastructure**. Over time, this should simplify the installation of | ||
OpenCL drivers by avoiding the need to install global drivers. For now, the only driver | ||
provided as a JLL is a CPU driver based on the [Portable Computing Language (PoCL) | ||
library](https://portablecl.org/). This driver can be used by simply installing and loading | ||
`pocl_jll` before you start using OpenCL.jl: | ||
|
||
```julia-repl | ||
julia> using OpenCL, pocl_jll | ||
julia> OpenCL.versioninfo() | ||
OpenCL.jl version 0.10.0 | ||
Toolchain: | ||
- Julia v1.11.2 | ||
- OpenCL_jll v2024.5.8+1 | ||
Available platforms: 1 | ||
- Portable Computing Language | ||
OpenCL 3.0, PoCL 6.0 Apple, Release, RELOC, SPIR-V, LLVM 16.0.6jl, SLEEF, DISTRO, POCL_DEBUG | ||
· cpu (fp16, fp64, il) | ||
``` | ||
|
||
Notice the `il` capability reported by `OpenCL.versioninfo()`, indicating that PoCL supports | ||
SPIR-V and can thus be used with the new native Julia kernel compiler. In fact, this is one | ||
of the goals of reworking OpenCL.jl: to provide a CPU fallback implementation for use with | ||
Julia GPU libraries. | ||
|
||
|
||
## Work towards OpenCL.jl 1.0 | ||
|
||
This release is a significant step towards a stable 1.0 release of OpenCL.jl, bringing the | ||
package in line with our other Julia GPU-backends. Our focus is on improving OpenCL.jl in | ||
order to support a CPU fallback back-end for KernelAbstractions.jl based on PoCL. If you | ||
are a user of OpenCL.jl, or are interested in using the package in the future, please test | ||
out this release with your application and/or driver, and provide feedback on the changes | ||
we've made. Pull requests are greatly appreciated, and we are happy to help you get started | ||
with contributing to the package. |