Skip to content

Commit

Permalink
Blogpost on CUDA 4.0 (#28)
Browse files Browse the repository at this point in the history
Co-authored-by: Alexis Montoison <[email protected]>
  • Loading branch information
maleadt and amontoison authored Feb 2, 2023
1 parent ff65a83 commit a4b6707
Show file tree
Hide file tree
Showing 2 changed files with 191 additions and 2 deletions.
184 changes: 184 additions & 0 deletions post/2023-02-01-cuda_4.0.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,184 @@
+++
title = "CUDA.jl 4.0"
author = "Tim Besard"
abstract = """
CUDA.jl 4.0 is a breaking release that introduces the use of JLLs to provide the CUDA
toolkit. This makes it possible to compile other binary libaries against the CUDA runtime,
and use them together with CUDA.jl. The release also brings CUSPARSE improvements,
the ability to limit memory use, and many bug fixes and performance improvements."""
+++

{{abstract}}


## JLLs for CUDA artifacts

While CUDA.jl has been using binary artifacts for a while, it was manually managing
installation and selection of them, i.e., not by using standardised JLL packages. This
complicated use of the artifacts by other packages, and made it difficult to build other
binary packages against the CUDA runtime.

With CUDA.jl 4.0, we now use JLLs to load the CUDA driver and runtime. Specifically, there
are two JLLs in play: `CUDA_Driver_jll` and `CUDA_Runtime_jll`. The former is responsible
for loading the CUDA driver library (possibly upgrading it using a forward-compatible
version), and determining the CUDA version that your set-up supports:

```julia-repl
❯ JULIA_DEBUG=CUDA_Driver_jll julia
julia> using CUDA_Driver_jll
┌ System CUDA driver found at libcuda.so.1, detected as version 12.0.0
└ @ CUDA_Driver_jll
┌ System CUDA driver is recent enough; not using forward-compatible driver
└ @ CUDA_Driver_jll
```

With the driver identified and loaded, `CUDA_Runtime_jll` can select a compatible toolkit. By
default, it uses the latest supported toolkit that is compatible with the driver:

```julia-repl
julia> using CUDA_Runtime_jll
julia> CUDA_Runtime_jll.cuda_toolkits
10-element Vector{VersionNumber}:
v"10.2.0"
v"11.0.0"
v"11.1.0"
v"11.2.0"
v"11.3.0"
v"11.4.0"
v"11.5.0"
v"11.6.0"
v"11.7.0"
v"11.8.0"
julia> CUDA_Runtime_jll.host_platform
Linux x86_64 {cuda=11.8}
```

As you can see, the selected CUDA runtime is encoded in the host platform. This makes it
possible for Julia to automatically select compatible versions of other binary packages. For
example, if we install and load `SuiteSparse_GPU_jll`, which right now [provides
builds](https://github.com/JuliaPackaging/Yggdrasil/blob/2f5a64d9f61d0f1b619367b03b5cecae979ed6d1/S/SuiteSparse/SuiteSparse_GPU/build_tarballs.jl#L104-L126)
for CUDA 10.2, 11.0 and 12.0, the artifact resolution code knows to load the build for CUDA
11.0 which is compatible with the selected CUDA 11.8 runtime:

```
julia> using SuiteSparse_GPU_jll
julia> SuiteSparse_GPU_jll.best_wrapper
"~/.julia/packages/SuiteSparse_GPU_jll/.../x86_64-linux-gnu-cuda+11.0.jl"
```

The change to JLLs requires a breaking change: the `JULIA_CUDA_VERSION` and
`JULIA_CUDA_USE_BINARYBUILDER` environment variables have been removed, and are replaced by
preferences that are set in the current environment. For convenience, you can set these
preferences by calling `CUDA.set_runtime_version!`:

```julia-repl
❯ julia --project
julia> using CUDA
julia> CUDA.runtime_version()
v"11.8.0"
julia> CUDA.set_runtime_version!(v"11.7")
┌ Set CUDA Runtime version preference to 11.7,
└ please re-start Julia for this to take effect.
❯ julia --project
julia> using CUDA
julia> CUDA.runtime_version()
v"11.7.0"
julia> using CUDA_Runtime_jll
julia> CUDA_Runtime_jll.host_platform
Linux x86_64 {cuda=11.7}
```

The changed preference is reflected in the host platform, which means that you can use this
mechanism to load a different builds of other binary packages. For example, if you rely on a
package or JLL that does not yet have a build for CUDA 12, you could set the preference to
`v"11.x"` to load an available build.

For discovering a local runtime, you can set the version to `"local"`, which will replace
the use of `CUDA_Runtime_jll` by `CUDA_Runtime_discovery.jl`, an API-compatible package that
replaces the JLL with a local runtime discovery mechanism:

```julia-repl
❯ julia --project
julia> CUDA.set_runtime_version!("local")
┌ Set CUDA Runtime version preference to local,
└ please re-start Julia for this to take effect.
❯ JULIA_DEBUG=CUDA_Runtime_Discovery julia --project
julia> using CUDA
┌ Looking for CUDA toolkit via environment variables CUDA_PATH
└ @ CUDA_Runtime_Discovery
┌ Looking for binary ptxas in /opt/cuda
│ all_locations =
│ 2-element Vector{String}:
│ "/opt/cuda"
│ "/opt/cuda/bin"
└ @ CUDA_Runtime_Discovery
┌ Debug: Found ptxas at /opt/cuda/bin/ptxas
└ @ CUDA_Runtime_Discovery
...
```


## Memory limits

By popular demand, support for memory limits has been reinstated. This functionality had
been removed after the switch to CUDA memory pools, as the memory pool allocator does
not yet support memory limits. Awaiting improvements by NVIDIA, we have added functionality
to impose memory limits from the Julia side, in the form of two environment variables:

- `JULIA_CUDA_SOFT_MEMORY_LIMIT`: This is an advisory limit, used to configure the memory
pool, which will result in the pool being shrunk down to the requested limit at every
synchronization point. That means that the pool may temporarily grow beyond the limit.
This limit is unavailable when disabling memory pools (with `JULIA_CUDA_MEMORY_POOL=none`).
- `JULIA_CUDA_HARD_MEMORY_LIMIT`: This is a hard limit, checked before every allocation.
Doing so is relatively expensive, so it is recommended to use the soft limit instead.

The value of these variables can be formatted as a numer of bytes, optionally followed by
a unit, or as a percentage of the total device memory. Examples: `100M`, `50%`, `1.5GiB`,
`10000`.


## CUSPARSE improvements

Thanks to the work of [@amontoison](https://github.com/amontoison), the CUSPARSE interface
has undergone many improvements:

- Better support of the `CuSparseMatrixCOO` format with, in particular, the addition of
`CuSparseMatrixCOO * CuVector` and `CuSparseMatrixCOO * CuMatrix` products;
- Routines specialized for `-`, `+`, `*` operations between sparse matrices
(`CuSparseMatrixCOO`, `CuSparseMatrixCSC` and `CuSparseMatrixCSR`) have been interfaced;
- New generic routines for backward and forward sweeps with sparse triangular matrices are
now used by `\ `;
- `CuMatrix * CuSparseVector` and `CuMatrix * CuSparseMatrix` products have been added;
- Conversions between sparse and dense matrices have been updated for using more recent and
optimized routines;
- High-level Julia functions for the new set of sparse BLAS 1 routines such as dot products
between `CuSparseVector`;
- Add missing dispatchs for `mul!` and `ldiv!` functions;
- Interfacing of almost all new CUSPARSE routines added by the CUDA toolkits `v"11.x"`.


## Other changes

- Removal of the CUDNN, CUTENSOR, CUTENSORNET and CUSTATEVEC submodules: These have been
moved into their own packages, respectively cuDNN.jl, cuTENSOR.jl, cuTensorNet.jl and
cuStateVec.jl (note the change in capitalization, now following NVIDIA's naming scheme);
- Removal of the NVTX submodule: NVTX.jl should be used instead, which is a more complete
implementation of the NVTX API;
- Support for CUDA 11.8 (support for CUDA 12.0 is being worked on);
- Support for Julia 1.9.


## Backport releases

Because CUDA.jl 4.0 is a breaking release, two additional releases have been made that
backport bugfixes and select features:

- CUDA.jl 3.12.1 and 3.12.2: backports of bugfixes since 3.12
- CUDA.jl 3.13.0: additionally adding the memory limit functionality
9 changes: 7 additions & 2 deletions utils.jl
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,9 @@ function blogpost_entry(fpath)
rpath = joinpath(rpath, "index.md")
end
hidden = pagevar(rpath, :hidden)
!isnothing(hidden) && hidden && return nothing
if hidden === true
return nothing
end
title = pagevar(rpath, :title)::String
y, m, d = getdate(fpath)
rpath = replace(fpath, r"\.md$" => "")
Expand All @@ -84,7 +86,10 @@ end

function hfun_blogposts()
io = IOBuffer()
elements = filter!(!=("index.md"), readdir("post"))
elements = filter(readdir("post")) do entry
entry in ("index.md", ".DS_Store") && return false
return true
end
entries = [blogpost_entry(fp) for fp in elements]
entries = [e for e in entries if !isnothing(e)]
append!(entries, blogpost_external_entries())
Expand Down

0 comments on commit a4b6707

Please sign in to comment.