Inference is very slow on Mac m1 #1064
Comments
Is this compiled with the
Yes, I have this in my Cargo.toml: `mistralrs = { git = "https://github.com/EricLBuehler/mistral.rs.git", branch = "master", features = ["metal"] }`
Seconded. It seemed to take a nosedive about a month ago.
@hiive I think I may have a solution for your case. On Metal, our preallocation for a large PagedAttention KV cache can cause slowdowns for some reason. I would recommend checking out the
@hhamud How much memory is available on your system?
Mac M1 Pro, 32 GB
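To give a feel for why a preallocated KV cache can be large relative to 32 GB of unified memory, here is a rough back-of-the-envelope sketch. It uses the standard estimate of one key and one value vector per layer, per KV head, per token. The function name `kv_cache_bytes` and the example dimensions are illustrative assumptions, not values read from mistral.rs or from the model's config, and this says nothing about how mistral.rs actually sizes its PagedAttention cache.

```rust
/// Rough KV cache size estimate in bytes (hypothetical helper, not a mistral.rs API).
/// The leading factor of 2 accounts for storing both keys and values.
fn kv_cache_bytes(
    layers: u64,
    kv_heads: u64,
    head_dim: u64,
    tokens: u64,
    bytes_per_elem: u64,
) -> u64 {
    2 * layers * kv_heads * head_dim * tokens * bytes_per_elem
}

fn main() {
    // Illustrative dimensions only (assumed): 32 layers, 32 KV heads,
    // head dimension 96, 4096 cached tokens, 2 bytes per element (f16).
    let bytes = kv_cache_bytes(32, 32, 96, 4096, 2);
    let gib = bytes as f64 / (1024.0 * 1024.0 * 1024.0);
    println!("estimated KV cache: {bytes} bytes ({gib:.2} GiB)");
}
```

Plugging in the real layer count, head count, and context length from the model's configuration would show how much memory a given cache size implies.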
Describe the bug
I am using 'Sciphi/triplex', a fine-tuned model based on Microsoft's Phi-3.5 mini that extracts entities and relationships from a piece of text. Inference is very slow: it takes approximately 30 s when it should take about 3 s.
Any ideas on why this would be?
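When reporting numbers like 30 s versus 3 s, it can help to measure the inference call in isolation so that model loading is not included. The following is a minimal sketch using only the Rust standard library; `time_it` is a hypothetical helper, and the closure body is a placeholder workload that would be swapped for the actual inference call.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

fn main() {
    // Placeholder workload (assumption): replace this closure body
    // with the real inference request.
    let (sum, elapsed) = time_it(|| (0..1_000_000u64).sum::<u64>());
    println!("result = {sum}, elapsed = {:.3} s", elapsed.as_secs_f64());
}
```

Timing model loading and generation separately makes it easier to tell whether the slowdown comes from startup (for example, cache allocation) or from token generation itself.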
Code is here:
Latest commit or version
master branch, latest commit