+++
title = "CUDA.jl 4.0"
author = "Tim Besard"
abstract = """
CUDA.jl 4.0 is a breaking release that introduces the use of JLLs to provide the CUDA
toolkit. This makes it possible to compile other binary libraries against the CUDA runtime,
and use them together with CUDA.jl. The release also brings CUSPARSE improvements,
the ability to limit memory use, and many bug fixes and performance improvements."""
+++

{{abstract}}

## JLLs for CUDA artifacts

While CUDA.jl has been using binary artifacts for a while, it managed their installation
and selection manually, i.e., without using standardised JLL packages. This complicated the
use of these artifacts by other packages, and made it difficult to build other binary
packages against the CUDA runtime.

With CUDA.jl 4.0, we now use JLLs to load the CUDA driver and runtime. Specifically, there
are two JLLs in play: `CUDA_Driver_jll` and `CUDA_Runtime_jll`. The former is responsible
for loading the CUDA driver library (possibly upgrading it using a forward-compatible
version), and determining the CUDA version that your set-up supports:

```julia-repl
❯ JULIA_DEBUG=CUDA_Driver_jll julia
julia> using CUDA_Driver_jll
┌ System CUDA driver found at libcuda.so.1, detected as version 12.0.0
└ @ CUDA_Driver_jll
┌ System CUDA driver is recent enough; not using forward-compatible driver
└ @ CUDA_Driver_jll
```

With the driver identified and loaded, `CUDA_Runtime_jll` can select a compatible toolkit. By
default, it uses the latest supported toolkit that is compatible with the driver:

```julia-repl
julia> using CUDA_Runtime_jll
julia> CUDA_Runtime_jll.cuda_toolkits
10-element Vector{VersionNumber}:
 v"10.2.0"
 v"11.0.0"
 v"11.1.0"
 v"11.2.0"
 v"11.3.0"
 v"11.4.0"
 v"11.5.0"
 v"11.6.0"
 v"11.7.0"
 v"11.8.0"
julia> CUDA_Runtime_jll.host_platform
Linux x86_64 {cuda=11.8}
```

As you can see, the selected CUDA runtime is encoded in the host platform. This makes it
possible for Julia to automatically select compatible versions of other binary packages. For
example, if we install and load `SuiteSparse_GPU_jll`, which right now [provides
builds](https://github.com/JuliaPackaging/Yggdrasil/blob/2f5a64d9f61d0f1b619367b03b5cecae979ed6d1/S/SuiteSparse/SuiteSparse_GPU/build_tarballs.jl#L104-L126)
for CUDA 10.2, 11.0 and 12.0, the artifact resolution code knows to load the build for CUDA
11.0, which is compatible with the selected CUDA 11.8 runtime:

```julia-repl
julia> using SuiteSparse_GPU_jll
julia> SuiteSparse_GPU_jll.best_wrapper
"~/.julia/packages/SuiteSparse_GPU_jll/.../x86_64-linux-gnu-cuda+11.0.jl"
```

The switch to JLLs entails a breaking change: the `JULIA_CUDA_VERSION` and
`JULIA_CUDA_USE_BINARYBUILDER` environment variables have been removed, and are replaced by
preferences that are set in the current environment. For convenience, you can set these
preferences by calling `CUDA.set_runtime_version!`:

```julia-repl
❯ julia --project
julia> using CUDA
julia> CUDA.runtime_version()
v"11.8.0"
julia> CUDA.set_runtime_version!(v"11.7")
┌ Set CUDA Runtime version preference to 11.7,
└ please re-start Julia for this to take effect.

❯ julia --project
julia> using CUDA
julia> CUDA.runtime_version()
v"11.7.0"
julia> using CUDA_Runtime_jll
julia> CUDA_Runtime_jll.host_platform
Linux x86_64 {cuda=11.7}
```

The changed preference is reflected in the host platform, which means that you can use this
mechanism to load different builds of other binary packages. For example, if you rely on a
package or JLL that does not yet have a build for CUDA 12, you could set the preference to
`v"11.x"` to load an available build.
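
As a rough sketch (the version number and the resulting wrapper path are illustrative),
pinning the runtime and restarting Julia should make dependent JLLs resolve a matching build:

```julia-repl
❯ julia --project
julia> using CUDA
julia> CUDA.set_runtime_version!(v"11.8")   # pin to a CUDA 11 runtime; restart Julia afterwards

❯ julia --project
julia> using SuiteSparse_GPU_jll
julia> SuiteSparse_GPU_jll.best_wrapper     # should now resolve to a cuda+11 build, as shown above
```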

To use a local CUDA toolkit instead, you can set the version to `"local"`. This replaces
`CUDA_Runtime_jll` with `CUDA_Runtime_Discovery.jl`, an API-compatible package that discovers
a CUDA runtime installed on the system:

```julia-repl
❯ julia --project
julia> using CUDA
julia> CUDA.set_runtime_version!("local")
┌ Set CUDA Runtime version preference to local,
└ please re-start Julia for this to take effect.

❯ JULIA_DEBUG=CUDA_Runtime_Discovery julia --project
julia> using CUDA
┌ Looking for CUDA toolkit via environment variables CUDA_PATH
└ @ CUDA_Runtime_Discovery
┌ Looking for binary ptxas in /opt/cuda
│ all_locations =
│ 2-element Vector{String}:
│ "/opt/cuda"
│ "/opt/cuda/bin"
└ @ CUDA_Runtime_Discovery
┌ Debug: Found ptxas at /opt/cuda/bin/ptxas
└ @ CUDA_Runtime_Discovery
...
```

## Memory limits

By popular demand, support for memory limits has been reinstated. This functionality had
been removed after the switch to CUDA memory pools, as the memory pool allocator does
not yet support memory limits. Awaiting improvements by NVIDIA, we have added functionality
to impose memory limits from the Julia side, in the form of two environment variables:

- `JULIA_CUDA_SOFT_MEMORY_LIMIT`: This is an advisory limit, used to configure the memory
  pool so that the pool is shrunk down to the requested limit at every synchronization
  point. That means the pool may temporarily grow beyond the limit. This limit is not
  available when memory pools are disabled (with `JULIA_CUDA_MEMORY_POOL=none`).
- `JULIA_CUDA_HARD_MEMORY_LIMIT`: This is a hard limit, checked before every allocation.
  Doing so is relatively expensive, so it is recommended to use the soft limit instead.

The value of these variables can be formatted as a number of bytes, optionally followed by
a unit, or as a percentage of the total device memory. Examples: `100M`, `50%`, `1.5GiB`,
`10000`.
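
For example, here is a minimal sketch of running under a soft limit (the limit value and the
array size are illustrative):

```julia-repl
❯ JULIA_CUDA_SOFT_MEMORY_LIMIT=2GiB julia --project
julia> using CUDA

julia> a = CUDA.rand(4096, 4096);   # allocate as usual; the pool is shrunk back towards 2GiB at sync points

julia> CUDA.memory_status()         # prints the current GPU memory usage
```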

## CUSPARSE improvements

Thanks to the work of [@amontoison](https://github.com/amontoison), the CUSPARSE interface
has undergone many improvements:

- Better support of the `CuSparseMatrixCOO` format, in particular the addition of
  `CuSparseMatrixCOO * CuVector` and `CuSparseMatrixCOO * CuMatrix` products (see the
  sketch after this list);
- Routines specialized for `-`, `+` and `*` operations between sparse matrices
  (`CuSparseMatrixCOO`, `CuSparseMatrixCSC` and `CuSparseMatrixCSR`) have been interfaced;
- New generic routines for backward and forward sweeps with sparse triangular matrices are
  now used by `\ `;
- `CuMatrix * CuSparseVector` and `CuMatrix * CuSparseMatrix` products have been added;
- Conversions between sparse and dense matrices have been updated to use more recent and
  optimized routines;
- High-level Julia functions for the new sparse BLAS 1 routines, such as dot products
  between `CuSparseVector`s, have been added;
- Missing dispatches for the `mul!` and `ldiv!` functions have been added;
- Almost all new CUSPARSE routines introduced by the CUDA 11.x toolkits have been interfaced.
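
As an illustration of the new `CuSparseMatrixCOO` products, here is a small sketch (the
matrix size and density are arbitrary, and the upload assumes the usual construction of a
`CuSparseMatrixCOO` from a CPU-side `SparseMatrixCSC`):

```julia
using CUDA, CUDA.CUSPARSE, SparseArrays

# Build a random sparse matrix on the CPU and upload it in COO storage.
A  = sprand(Float32, 1000, 1000, 0.01)
dA = CuSparseMatrixCOO(A)

# Dense inputs on the GPU.
dx = CUDA.rand(Float32, 1000)
dX = CUDA.rand(Float32, 1000, 4)

dy = dA * dx    # CuSparseMatrixCOO * CuVector product
dY = dA * dX    # CuSparseMatrixCOO * CuMatrix product
```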

## Other changes

- Removal of the CUDNN, CUTENSOR, CUTENSORNET and CUSTATEVEC submodules: these have been
  moved into their own packages, respectively cuDNN.jl, cuTENSOR.jl, cuTensorNet.jl and
  cuStateVec.jl (note the change in capitalization, now following NVIDIA's naming scheme;
  see the sketch after this list);
- Removal of the NVTX submodule: NVTX.jl should be used instead, which is a more complete
  implementation of the NVTX API;
- Support for CUDA 11.8 (support for CUDA 12.0 is being worked on);
- Support for Julia 1.9.
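
As a hedged sketch of what the submodule split means for user code (package and module names
as listed above; the `CUDA.CUDNN`-style submodules are assumed to be how this functionality
was loaded on CUDA.jl 3.x):

```julia
# CUDA.jl 3.x: the functionality lived in submodules of CUDA.jl
# using CUDA.CUDNN, CUDA.CUTENSOR

# CUDA.jl 4.0: add the standalone packages to the environment and load them directly
using cuDNN
using cuTENSOR
```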

## Backport releases

Because CUDA.jl 4.0 is a breaking release, two additional releases have been made that
backport bugfixes and select features:

- CUDA.jl 3.12.1 and 3.12.2: backports of bugfixes since 3.12;
- CUDA.jl 3.13.0: additionally adds the memory limit functionality.