Metal.jl v1.4 blog post #47

Merged
merged 5 commits into from
Oct 7, 2024
85 changes: 85 additions & 0 deletions post/2024-10-07-metal-1.4.md
@@ -0,0 +1,85 @@
+++
title = "Metal.jl 1.4: Improved random numbers"
author = "Christian Guinard"
abstract = """
Metal.jl 1.4 adds higher-quality random number generators from the Metal Performance
Shaders library. Some limitations apply, with a fallback to the current implementation
in those situations."""
+++
{{abstract}}


## `Metal.rand` and friends

Using functionality provided by the Metal Performance Shaders (MPS) library, Metal.jl now
comes with much improved GPU random number generators. Uniform distributions using
`Metal.rand` (and its in-place variant `Metal.rand!`) are available for all Metal-supported
integer types and `Float32`. However, due to [Metal API
limitations](https://developer.apple.com/documentation/metal/mtlblitcommandencoder/1400767-copyfrombuffer?language=objc),
8-bit and 16-bit integer arrays may fall back to the lower-quality GPUArrays.jl random number
generator if the array's size in bytes is not a multiple of 4. Normally distributed `Float32`
values can be generated with `Metal.randn` and `Metal.randn!`. `Float16` is not
supported by the MPS library and will always fall back to the GPUArrays implementation.
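
The size rule above is plain arithmetic. As a minimal sketch, the hypothetical helper `byte_size_multiple_of_four` below (not part of Metal.jl) checks whether an array with a given element type and dimensions has a byte size divisible by 4. It only illustrates the rule as stated here; the actual dispatch inside Metal.jl may take other conditions into account.

```julia
# Hypothetical helper, for illustration only (not a Metal.jl function):
# returns true when an array of element type T with the given dimensions
# occupies a number of bytes that is a multiple of 4.
byte_size_multiple_of_four(T::Type, dims::Integer...) =
    (sizeof(T) * prod(dims)) % 4 == 0

byte_size_multiple_of_four(Int8, 6)      # 6 bytes
byte_size_multiple_of_four(Int8, 4, 3)   # 12 bytes
byte_size_multiple_of_four(Int16, 3)     # 6 bytes
byte_size_multiple_of_four(Int16, 2)     # 4 bytes
byte_size_multiple_of_four(Float32, 5)   # 20 bytes
```

Under this rule, a 6-element `Int8` array or a 3-element `Int16` array would be candidates for the GPUArrays fallback, while any array of 32-bit elements always satisfies the size condition.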

The easiest way to use these generators is through the Metal convenience functions
`Metal.rand[n][!]`, which work like the usual functions from the Random.jl standard library:

```julia-repl
julia> a = Metal.rand(Float32, 2)
2-element MtlVector{Float32, Metal.PrivateStorage}:
0.95755994
0.7110207

julia> Metal.randn!(a)
2-element MtlVector{Float32, Metal.PrivateStorage}:
1.7230463
0.55636907
```

However, the Random.jl methods can also be used by providing the appropriate `RNG` either
from `MPS.default_rng()` or `MPS.RNG()` to the standard `Random.rand[n][!]` functions:


```julia-repl
julia> using Random

julia> rng = MPS.RNG();

julia> Random.rand(rng, 2)
2-element MtlVector{Float32, Metal.PrivateStorage}:
0.8941469
0.67628527
```

Seeding is done by calling `Metal.seed!` for the global RNG, or `Random.seed!` when working
with an explicit `RNG` object.
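
As a rough sketch of both seeding styles, assuming a machine with a Metal-capable GPU and Metal.jl 1.4 installed (the seed value `42` is arbitrary and chosen only for illustration):

```julia
using Metal, Random

# Seed the global RNG used by the Metal.rand convenience functions.
Metal.seed!(42)
a = Metal.rand(Float32, 2)

# Seed an explicit RNG object through the standard Random.jl API.
rng = MPS.RNG()
Random.seed!(rng, 42)
b = Random.rand(rng, 2)
```

The two generators are seeded independently here, so `a` and `b` should not be assumed to hold the same values.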


## Other improvements since the last blog post

- Since v0.5: `MtlArray` storage mode has been parameterized, allowing one to create a
shared storage `MtlArray` by calling `MtlArray{eltype, ndims, Metal.SharedStorage}(...)`.
- Since v0.3: MPS-accelerated decompositions were added.
- Various performance improvements.
- *Many* bug fixes.


## Future work

Although Metal.jl is now in v1, there is still work to be done to make it as fast and
feature-complete as possible. In particular:

- Metal.jl is now using native ObjectiveC FFI for wrapping Metal APIs. However, these
wrappers have to be written manually for every piece of Objective-C code. *We are looking
for help with improving Clang.jl and ObjectiveC.jl* to [enable the automatic generation of
these wrappers](https://github.com/JuliaInterop/ObjectiveC.jl/issues/41);
- The MPS wrappers are incomplete, and automatic wrapper generation would greatly help with
full MPS support;
- To implement a full-featured KernelAbstractions.jl back-end, Metal atomic operations need
to [be hooked up to Atomix](https://github.com/JuliaGPU/Metal.jl/issues/218);
- [Full support for BFloat16 values](https://github.com/JuliaGPU/Metal.jl/issues/298), which
Metal has supported since version 3.1 (macOS 14), is not yet available in Metal.jl. There is,
however, a [draft PR](https://github.com/JuliaGPU/Metal.jl/pull/446) in the works. Check
it out if you're interested in helping out;
- Some functionality present in CUDA.jl [could be ported to Metal.jl to improve
usability](https://github.com/JuliaGPU/Metal.jl/issues/443).