+++
title = "CUDA.jl 4.0"
author = "Tim Besard"
abstract = """
CUDA.jl 4.0 is a breaking release that introduces the use of JLLs to provide the CUDA
toolkit. This makes it possible to compile other binary libraries against the CUDA runtime,
and use them together with CUDA.jl. The release also brings CUSPARSE improvements,
the ability to limit memory use, and many bug fixes and performance improvements."""
+++

{{abstract}}

## JLLs for CUDA artifacts

While CUDA.jl has been using binary artifacts for a while, it managed their installation
and selection manually, i.e., without using standardised JLL packages. This complicated the
use of these artifacts by other packages, and made it difficult to build other binary
packages against the CUDA runtime.

With CUDA.jl 4.0, we now use JLLs to load the CUDA driver and runtime. Specifically, there
are two JLLs in play: `CUDA_Driver_jll` and `CUDA_Runtime_jll`. The former is responsible
for loading the CUDA driver library (possibly upgrading it using a forward-compatible
version), and determining the CUDA version that your set-up supports:

```julia-repl
❯ JULIA_DEBUG=CUDA_Driver_jll julia
julia> using CUDA_Driver_jll
┌ System CUDA driver found at libcuda.so.1, detected as version 12.0.0
└ @ CUDA_Driver_jll
┌ System CUDA driver is recent enough; not using forward-compatible driver
└ @ CUDA_Driver_jll
```

With the driver identified and loaded, `CUDA_Runtime_jll` can select a compatible toolkit. By
default, it uses the latest supported toolkit that is compatible with the driver:

```julia-repl
julia> using CUDA_Runtime_jll
julia> CUDA_Runtime_jll.cuda_toolkits
10-element Vector{VersionNumber}:
 v"10.2.0"
 v"11.0.0"
 v"11.1.0"
 v"11.2.0"
 v"11.3.0"
 v"11.4.0"
 v"11.5.0"
 v"11.6.0"
 v"11.7.0"
 v"11.8.0"
julia> CUDA_Runtime_jll.host_platform
Linux x86_64 {cuda=11.8}
```

As you can see, the selected CUDA runtime is encoded in the host platform. This makes it
possible for Julia to automatically select compatible versions of other binary packages. For
example, if we install and load `SuiteSparse_GPU_jll`, which right now [provides
builds](https://github.com/JuliaPackaging/Yggdrasil/blob/2f5a64d9f61d0f1b619367b03b5cecae979ed6d1/S/SuiteSparse/SuiteSparse_GPU/build_tarballs.jl#L104-L126)
for CUDA 10.2, 11.0 and 12.0, the artifact resolution code knows to load the build for CUDA
11.0, which is compatible with the selected CUDA 11.8 runtime:

```julia-repl
julia> using SuiteSparse_GPU_jll
julia> SuiteSparse_GPU_jll.best_wrapper
"~/.julia/packages/SuiteSparse_GPU_jll/.../x86_64-linux-gnu-cuda+11.0.jl"
```

The switch to JLLs entails a breaking change: the `JULIA_CUDA_VERSION` and
`JULIA_CUDA_USE_BINARYBUILDER` environment variables have been removed, and are replaced by
preferences that are set in the current environment. For convenience, you can set these
preferences by calling `CUDA.set_runtime_version!`:

```julia-repl
❯ julia --project
julia> using CUDA
julia> CUDA.runtime_version()
v"11.8.0"
julia> CUDA.set_runtime_version!(v"11.7")
┌ Set CUDA Runtime version preference to 11.7,
└ please re-start Julia for this to take effect.

❯ julia --project
julia> using CUDA
julia> CUDA.runtime_version()
v"11.7.0"
julia> using CUDA_Runtime_jll
julia> CUDA_Runtime_jll.host_platform
Linux x86_64 {cuda=11.7}
```

The changed preference is reflected in the host platform, which means that you can use this
mechanism to load different builds of other binary packages. For example, if you rely on a
package or JLL that does not yet have a build for CUDA 12, you could set the preference to
`v"11.x"` to load an available build.
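
As a rough sketch (the version number and the resulting wrapper path are illustrative),
pinning the runtime and restarting Julia should make dependent JLLs resolve a matching build:

```julia-repl
❯ julia --project
julia> using CUDA
julia> CUDA.set_runtime_version!(v"11.8")   # pin to a CUDA 11 runtime; restart Julia afterwards

❯ julia --project
julia> using SuiteSparse_GPU_jll
julia> SuiteSparse_GPU_jll.best_wrapper     # should now resolve to a cuda+11 build, as shown above
```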

To use a local CUDA toolkit instead, you can set the version to `"local"`. This replaces
`CUDA_Runtime_jll` with `CUDA_Runtime_Discovery.jl`, an API-compatible package that discovers
a CUDA runtime installed on the system:

```julia-repl
❯ julia --project
julia> using CUDA
julia> CUDA.set_runtime_version!("local")
┌ Set CUDA Runtime version preference to local,
└ please re-start Julia for this to take effect.

❯ JULIA_DEBUG=CUDA_Runtime_Discovery julia --project
julia> using CUDA
┌ Looking for CUDA toolkit via environment variables CUDA_PATH
└ @ CUDA_Runtime_Discovery
┌ Looking for binary ptxas in /opt/cuda
│ all_locations =
│ 2-element Vector{String}:
│ "/opt/cuda"
│ "/opt/cuda/bin"
└ @ CUDA_Runtime_Discovery
┌ Debug: Found ptxas at /opt/cuda/bin/ptxas
└ @ CUDA_Runtime_Discovery
...
```

## Memory limits

By popular demand, support for memory limits has been reinstated. This functionality had
been removed after the switch to CUDA memory pools, as the memory pool allocator does
not yet support memory limits. Awaiting improvements by NVIDIA, we have added functionality
to impose memory limits from the Julia side, in the form of two environment variables:

- `JULIA_CUDA_SOFT_MEMORY_LIMIT`: This is an advisory limit, used to configure the memory
  pool so that the pool is shrunk down to the requested limit at every synchronization
  point. That means the pool may temporarily grow beyond the limit. This limit is not
  available when memory pools are disabled (with `JULIA_CUDA_MEMORY_POOL=none`).
- `JULIA_CUDA_HARD_MEMORY_LIMIT`: This is a hard limit, checked before every allocation.
  Doing so is relatively expensive, so it is recommended to use the soft limit instead.

The value of these variables can be formatted as a number of bytes, optionally followed by
a unit, or as a percentage of the total device memory. Examples: `100M`, `50%`, `1.5GiB`,
`10000`.
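
For example, here is a minimal sketch of running under a soft limit (the limit value and the
array size are illustrative):

```julia-repl
❯ JULIA_CUDA_SOFT_MEMORY_LIMIT=2GiB julia --project
julia> using CUDA

julia> a = CUDA.rand(4096, 4096);   # allocate as usual; the pool is shrunk back towards 2GiB at sync points

julia> CUDA.memory_status()         # prints the current GPU memory usage
```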

## CUSPARSE improvements

Thanks to the work of [@amontoison](https://github.com/amontoison), the CUSPARSE interface
has undergone many improvements:

- Better support of the `CuSparseMatrixCOO` format, in particular the addition of
  `CuSparseMatrixCOO * CuVector` and `CuSparseMatrixCOO * CuMatrix` products (see the
  sketch after this list);
- Routines specialized for `-`, `+` and `*` operations between sparse matrices
  (`CuSparseMatrixCOO`, `CuSparseMatrixCSC` and `CuSparseMatrixCSR`) have been interfaced;
- New generic routines for backward and forward sweeps with sparse triangular matrices are
  now used by `\ `;
- `CuMatrix * CuSparseVector` and `CuMatrix * CuSparseMatrix` products have been added;
- Conversions between sparse and dense matrices have been updated to use more recent and
  optimized routines;
- High-level Julia functions for the new sparse BLAS 1 routines, such as dot products
  between `CuSparseVector`s, have been added;
- Missing dispatches for the `mul!` and `ldiv!` functions have been added;
- Almost all new CUSPARSE routines introduced by the CUDA 11.x toolkits have been interfaced.
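
As an illustration of the new `CuSparseMatrixCOO` products, here is a small sketch (the
matrix size and density are arbitrary, and the upload assumes the usual construction of a
`CuSparseMatrixCOO` from a CPU-side `SparseMatrixCSC`):

```julia
using CUDA, CUDA.CUSPARSE, SparseArrays

# Build a random sparse matrix on the CPU and upload it in COO storage.
A  = sprand(Float32, 1000, 1000, 0.01)
dA = CuSparseMatrixCOO(A)

# Dense inputs on the GPU.
dx = CUDA.rand(Float32, 1000)
dX = CUDA.rand(Float32, 1000, 4)

dy = dA * dx    # CuSparseMatrixCOO * CuVector product
dY = dA * dX    # CuSparseMatrixCOO * CuMatrix product
```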

## Other changes

- Removal of the CUDNN, CUTENSOR, CUTENSORNET and CUSTATEVEC submodules: these have been
  moved into their own packages, respectively cuDNN.jl, cuTENSOR.jl, cuTensorNet.jl and
  cuStateVec.jl (note the change in capitalization, now following NVIDIA's naming scheme;
  see the sketch after this list);
- Removal of the NVTX submodule: NVTX.jl should be used instead, which is a more complete
  implementation of the NVTX API;
- Support for CUDA 11.8 (support for CUDA 12.0 is being worked on);
- Support for Julia 1.9.
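
As a hedged sketch of what the submodule split means for user code (package and module names
as listed above; the `CUDA.CUDNN`-style submodules are assumed to be how this functionality
was loaded on CUDA.jl 3.x):

```julia
# CUDA.jl 3.x: the functionality lived in submodules of CUDA.jl
# using CUDA.CUDNN, CUDA.CUTENSOR

# CUDA.jl 4.0: add the standalone packages to the environment and load them directly
using cuDNN
using cuTENSOR
```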

## Backport releases

Because CUDA.jl 4.0 is a breaking release, two additional releases have been made that
backport bugfixes and select features:

- CUDA.jl 3.12.1 and 3.12.2: backports of bugfixes since 3.12;
- CUDA.jl 3.13.0: additionally adds the memory limit functionality.