Metal.jl v1.4 blog post #47

Merged on Oct 7, 2024 (5 commits)
57 changes: 57 additions & 0 deletions post/2024-10-02-metal-1.4.md
@@ -0,0 +1,57 @@
+++
title = "Metal.jl 1.4: Metal.rand"
author = "Christian Guinard"
abstract = """
Metal.jl 1.4 adds higher-quality on-device random number generation from Metal Performance
Shaders. Some limitations apply, with fallback to the previously-existing rand
implementation in those situations."""
+++
{{abstract}}

## Metal.rand and friends

Using functionality provided by the Metal Performance Shaders, improved on-GPU random number
generation has been implemented. Uniform distributions using `Metal.rand` (and its in-place
variant `Metal.rand!`) are available for all Metal-supported integer types and Float32.
However, due to Metal [API](https://developer.apple.com/documentation/metal/mtlblitcommandencoder/1400767-copyfrombuffer?language=objc)
limitations, arrays of 8-bit and 16-bit integers may fall back to the lower-quality GPUArrays.jl
generator if their size in bytes is not a multiple of 4. Normally distributed Float32 values can be
generated with `Metal.randn` and `Metal.randn!`. Float16 is not supported by the Metal
Performance Shaders RNG, and will always fall back to the GPUArrays implementation.
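
To make the size rule concrete, here is a small sketch in plain Julia. The helper
`size_is_multiple_of_4` is hypothetical and not part of Metal.jl; it only restates the
"size in bytes is a multiple of 4" condition described above, and does not inspect which
generator Metal.jl actually selects.

```julia
# Hypothetical helper, NOT part of Metal.jl: true when an array of `n` elements
# of type `T` occupies a whole number of 4-byte words.
size_is_multiple_of_4(::Type{T}, n::Integer) where {T} = (sizeof(T) * n) % 4 == 0

size_is_multiple_of_4(Int8, 6)    # false: 6 bytes, may use the GPUArrays.jl fallback
size_is_multiple_of_4(Int8, 8)    # true:  8 bytes
size_is_multiple_of_4(Int16, 3)   # false: 6 bytes, may use the GPUArrays.jl fallback
size_is_multiple_of_4(Int16, 4)   # true:  8 bytes
size_is_multiple_of_4(Int32, 5)   # true:  4-byte elements always satisfy the rule
```

In practice this suggests that padding small-integer arrays to a length divisible by 4
(for 8-bit types) or by 2 (for 16-bit types) should keep them on the MPS path.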

The easiest way to use these is through the Metal convenience functions `Metal.rand[n][!]`,
called as you would the usual functions. The regular Random.jl methods can also be used
by passing an appropriate `RNG`, obtained from either `MPS.default_rng()` or `MPS.RNG()`, to the
standard `Random.rand[n][!]` functions.
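
As a rough illustration of both styles, the sketch below assumes a Mac with a supported GPU
and Metal.jl 1.4 or later. The array sizes and variable names are arbitrary, and `MPS` is
accessed as `Metal.MPS` on the assumption that this qualified path is available.

```julia
using Metal    # assumes a working Metal.jl >= 1.4 installation on supported hardware
using Random

# Convenience functions: allocate a GPU array and fill it with random values.
a = Metal.rand(Float32, 1024)       # uniformly distributed Float32 values
b = Metal.randn(Float32, 32, 32)    # normally distributed Float32 values
c = Metal.rand(Int32, 16)           # uniformly distributed integers

# In-place variants overwrite existing GPU arrays.
Metal.rand!(a)
Metal.randn!(b)

# Random.jl interface: pass the MPS-backed RNG explicitly.
rng = Metal.MPS.default_rng()
Random.rand!(rng, a)
Random.randn!(rng, b)
```

The explicit-RNG form is mostly useful for generic code that already takes an `rng` argument.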



## Other improvements since the last blog post

- Since v0.5: `MtlArray` storage mode has been parameterized, allowing one to create a shared storage `MtlArray`
by calling `MtlArray{eltype, ndims, Metal.SharedStorage}(...)`.
- Since v0.3: MPS-accelerated decompositions were added.
- Various performance improvements.
- *Many* bug fixes.
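
For the parameterized storage mode, a minimal sketch might look as follows. It assumes the
usual `undef` array constructor is available for the fully parameterized type; the element
type and length are arbitrary.

```julia
using Metal    # assumes a working Metal.jl installation on supported hardware

# Request shared storage explicitly through the third type parameter.
x = MtlArray{Float32, 1, Metal.SharedStorage}(undef, 8)

# Fill on the GPU, then copy back to a regular CPU array.
x .= 1f0
host = Array(x)
```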


## Future work

Although Metal.jl is now in v1, there is still work to be done to make it as fast and
feature-complete as possible. In particular:

- since the last blog post, Metal.jl has started using native ObjectiveC FFI for wrapping
Metal APIs. However, these wrappers have to be written manually for every piece of
Objective-C code. We are looking for help improving Clang.jl and ObjectiveC.jl to enable
the automatic generation of these wrappers. See tracking [issue](https://github.com/JuliaInterop/ObjectiveC.jl/issues/41);
- the MPS wrappers are incomplete, and automatic wrapper generation would greatly help with
full MPS support;
- support for atomic operations is missing, which is required to implement a full-featured
KernelAbstractions.jl back-end. See tracking [issue](https://github.com/JuliaGPU/Metal.jl/issues/218);
- full support for BFloat16 values, which Metal has supported since version 3.1 (macOS 14),
is not yet in Metal.jl. See tracking [issue](https://github.com/JuliaGPU/Metal.jl/issues/298);
- some functionality present in CUDA.jl could be ported to Metal.jl to improve usability.
See tracking [issue](https://github.com/JuliaGPU/Metal.jl/issues/443);
- general performance improvements. In particular, improvements to the ObjectiveC.jl type model
could greatly reduce the number of allocations currently necessary for every
Objective-C/Metal operation. See tracking [issue](https://github.com/JuliaInterop/ObjectiveC.jl/issues/13).