M1 GPU `mps` device integration #596
Conversation
The documentation is not available anymore as the PR was closed or merged.
Thanks for this! Left a suggestion to make sure that we get GPU tests actually running and passing, as I assume that's the right move here :)
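For context on how such tests are typically gated, here is a minimal sketch (not code from this PR; the `requires_mps` marker is hypothetical) of a pytest skip marker keyed on MPS availability:

```python
import pytest
import torch

# Hypothetical marker: skip unless a PyTorch build with MPS support is
# running on Apple Silicon (torch.backends.mps needs PyTorch >= 1.12).
requires_mps = pytest.mark.skipif(
    not torch.backends.mps.is_available(),
    reason="test requires the MPS backend",
)

@requires_mps
def test_add_on_mps():
    x = torch.ones(2, 2, device="mps")
    assert x.sum().item() == 4.0
```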
Nice addition! I left some comments, and we should also have some documentation around this integration (flagging, for instance, that BERT has a loss of performance).
Tests can be added in other PRs once we have better access to a machine with M1.
Co-Authored-By: Sylvain Gugger <[email protected]>
A few comments for now until the spacing nits are fixed and I can view it better on the website :)
Co-authored-by: Zachary Mueller <[email protected]>
Great work! I left some final doc nits for you 😄
Co-Authored-By: Zachary Mueller <[email protected]>
What does this PR do?
This PR adds support for the `mps` device type in PyTorch, enabling faster training and inference than CPU on Apple Silicon. Users can select it through the `accelerate config` command:

![Screenshot 2022-08-02 at 4 13 15 PM](https://user-images.githubusercontent.com/13534540/182389112-8e3a58d7-5ba8-47db-8487-79e5590ab8f4.png)

Ran `cv_example.py` with and without MPS to gauge the speedup: ~7.5X over CPU. This experiment was done on a MacBook Pro with an M1 Pro chip having 8 CPU performance cores (+2 efficiency cores), 14 GPU cores and 16GB of unified memory.

Note: Prerequisite: install a PyTorch build with `mps` support.

Attaching plots showing GPU usage and CPU usage when each is enabled:

M1 GPU with `mps` enabled:

![Screenshot 2022-08-02 at 4 29 58 PM](https://user-images.githubusercontent.com/13534540/182387427-6c059dd3-b2b2-4dc8-8ad3-1c8bc3f159f2.png)
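As an illustration of what the integration looks like from user code, here is a minimal sketch (not code from this PR, and assuming `mps` was selected during `accelerate config`):

```python
import torch
from accelerate import Accelerator

# Prerequisite check: this PyTorch build must ship with MPS support
# and the backend must be usable on this machine.
assert torch.backends.mps.is_built(), "PyTorch was built without MPS support"
assert torch.backends.mps.is_available(), "MPS backend is not available"

# With mps chosen in `accelerate config`, Accelerator picks it up automatically.
accelerator = Accelerator()
model = accelerator.prepare(torch.nn.Linear(8, 2))
print(accelerator.device)  # expected: mps
```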
Only CPU training:

![Screenshot 2022-08-02 at 6 58 39 PM](https://user-images.githubusercontent.com/13534540/182387566-90e86341-1712-4749-a541-5b7fefdd357a.png)
Note: For `nlp_example.py`, the time saving is ~30% over CPU, but the resulting metrics are much worse than with CPU-only training. This suggests certain operations in the BERT model produce incorrect results on the `mps` device, and this needs to be fixed by PyTorch.
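One way to narrow down which operations misbehave is to run the same layers on CPU and on `mps` and compare the outputs. A hypothetical debugging sketch (not part of this PR), using an embedding followed by layer norm as in BERT's input block:

```python
import torch

torch.manual_seed(0)
emb = torch.nn.Embedding(100, 32)   # embedding + layer norm, as in BERT's input block
norm = torch.nn.LayerNorm(32)
ids = torch.randint(0, 100, (4, 16))

cpu_out = norm(emb(ids))                                      # reference result on CPU
mps_out = norm.to("mps")(emb.to("mps")(ids.to("mps"))).cpu()  # same ops on MPS
print(torch.allclose(cpu_out, mps_out, atol=1e-5))            # False would flag a bad op
```

Repeating this layer by layer would isolate which op diverges on the `mps` backend.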