Commit
ggml : update softmax n_task calculation (ggml-org#5126)
Update the softmax n_task calculation to use the maximum number of threads available. This improves prompt eval performance by around 5% for DOT kernels and by around 10% for MMLA kernels on AWS Graviton3.
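The commit message does not include the code, so the following is only a minimal sketch of the idea, not the actual ggml change. It assumes the earlier calculation capped the softmax task count at a small constant (4 is used here as an assumed value) and that the updated calculation is bounded only by the thread count and the number of rows to process. The function names `softmax_n_tasks_capped`, `softmax_n_tasks_max`, and `min_i64` are hypothetical and do not come from ggml.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smaller of two 64-bit values. */
static int64_t min_i64(int64_t a, int64_t b) {
    return a < b ? a : b;
}

/* Assumed earlier behaviour: the task count is capped at a small
 * constant (4 is an assumption), so extra threads sit idle during
 * softmax on machines with many cores. */
static int64_t softmax_n_tasks_capped(int64_t n_threads, int64_t n_rows) {
    assert(n_threads > 0 && n_rows > 0);
    return min_i64(min_i64(4, n_threads), n_rows);
}

/* Sketch of the updated behaviour: use as many threads as are
 * available, but never more tasks than there are rows to split. */
static int64_t softmax_n_tasks_max(int64_t n_threads, int64_t n_rows) {
    assert(n_threads > 0 && n_rows > 0);
    return min_i64(n_threads, n_rows);
}
```

Under these assumptions, a 64-thread machine processing 512 rows would go from 4 softmax tasks to 64, which is the kind of change that could plausibly account for the reported gains on a many-core CPU such as Graviton3.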