
M1 GPU mps device integration #596

Merged
pacman100 merged 14 commits into huggingface:main from smangrul/mps-support on Aug 3, 2022

Conversation

@pacman100 (Contributor) commented Aug 2, 2022

What does this PR do?

  1. Enables users to leverage Apple M1 GPUs via the mps device type in PyTorch, for faster training and inference than on CPU.
  2. Sample config after running the accelerate config command:
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MPS
downcast_bf16: 'no'
fsdp_config: {}
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 1
use_cpu: false
  3. Run cv_example.py with and without MPS to gauge the speedup; the speedup is ~7.5x over CPU. This experiment was done on a MacBook Pro with an M1 Pro chip, which has 8 CPU performance cores (+2 efficiency cores), 14 GPU cores, and 16 GB of unified memory.
#mps
time accelerate launch --config_file ~/mps_config.yaml ~/Code/accelerate/examples/cv_example.py --data_dir images

#cpu because of `--cpu` arg
time accelerate launch --config_file ~/mps_config.yaml ~/Code/accelerate/examples/cv_example.py --cpu --data_dir images
| Device | Training + Evaluation Time (mm:ss) | Accuracy after training (%) |
|--------|------------------------------------|-----------------------------|
| CPU    | 38:19.86                           | 89.38                       |
| MPS    | 5:03.98                            | 89.38                       |
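As a sanity check on the reported ~7.5x figure, the speedup can be recomputed from the two wall-clock times in the table (a small standalone Python snippet; the helper name `to_seconds` is illustrative, not from the PR):

```python
# Verify the ~7.5x speedup claim from the timings above.
def to_seconds(mmss: str) -> float:
    """Convert a 'MM:SS.ss' wall-clock string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)

cpu_time = to_seconds("38:19.86")  # CPU training + evaluation
mps_time = to_seconds("5:03.98")   # MPS training + evaluation
speedup = cpu_time / mps_time
print(f"speedup: {speedup:.2f}x")  # → speedup: 7.57x
```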

Note: Prerequisite: install a torch build with mps support.

# Installing torch with M1 support on macOS
# 1. Install Python 3.10.5 (the arm64 build).
# 2. Check the platform:
>>> import platform
>>> platform.platform()
'macOS-12.5-arm64-arm-64bit'
# This is compatible: the macOS version is above 12.3 and it is the arm64 build.
# 3. Install torch 1.12:
#    pip3 install torch torchvision torchaudio
# 4. Test `mps` device support:
>>> import torch
>>> torch.has_mps
True
>>> a = torch.Tensor([10, 11])
>>> a.to("mps")
/Users/mac/ml/lib/python3.10/site-packages/torch/_tensor_str.py:103: UserWarning: The operator 'aten::bitwise_and.Tensor_out' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
tensor([10.0000, 11.0000], device='mps:0')
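In user scripts, the usual portable pattern (a sketch, not code from this PR) is to probe the backend and fall back to CPU when mps is unavailable; `torch.backends.mps.is_available()` is the check newer PyTorch versions recommend over `torch.has_mps`:

```python
import torch

def pick_device() -> torch.device:
    # Prefer the Apple-GPU `mps` backend when it is built and available;
    # otherwise fall back to CPU so the same script runs anywhere.
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
a = torch.tensor([10.0, 11.0], device=device)
print(a.device.type)  # 'mps' on a supported Mac, 'cpu' elsewhere
```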

Plots showing GPU usage and CPU usage for the corresponding runs:

GPU M1 mps enabled:
Screenshot 2022-08-02 at 4 29 58 PM

Only CPU training:
Screenshot 2022-08-02 at 6 58 39 PM

Note: For nlp_example.py the time saving is ~30% over CPU, but the evaluation metrics are much worse than with CPU-only training. This suggests certain operations in the BERT model produce incorrect results on the mps device, which needs to be fixed in PyTorch.
Screenshot 2022-08-02 at 4 13 15 PM

@pacman100 pacman100 requested review from muellerzr and sgugger August 2, 2022 13:34
@HuggingFaceDocBuilderDev commented Aug 2, 2022

The documentation is not available anymore as the PR was closed or merged.

@muellerzr (Collaborator) left a comment:

Thanks for this! Left a suggestion to make sure that we get GPU tests actually running and passing, as I assume that's the right move here :)

@muellerzr muellerzr linked an issue Aug 2, 2022 that may be closed by this pull request
@sgugger (Collaborator) left a comment:

Nice addition! I left some comments, and we should also have some documentation around this integration (flagging, for instance, that BERT loses performance).

Tests can be added in other PRs once we have better access to a machine with M1.

Review threads (resolved): src/accelerate/accelerator.py, src/accelerate/commands/launch.py, src/accelerate/state.py
@muellerzr (Collaborator) left a comment:

A few comments for now until the spacing nits are fixed and I can view it better on the website :)

Review threads (resolved): docs/source/usage_guides/mps.mdx, src/accelerate/accelerator.py
@muellerzr (Collaborator) left a comment:

Great work! I left some final doc nits for you 😄

@pacman100 pacman100 merged commit afa7490 into huggingface:main Aug 3, 2022
@pacman100 pacman100 deleted the smangrul/mps-support branch August 4, 2022 05:47
@faraday commented Sep 9, 2022

notebook_launcher doesn't seem to have been modified to utilize this change that adds MPS device support.

Successfully merging this pull request may close these issues.

Add Apple's M1 metal integration
6 participants