ggml: softmax op: update the n_task calculation #5126

snadampal · 2024-01-25T21:51:51Z

Updated the n_task calculation to use the maximum number of threads possible. This improved prompt eval performance by around 5% for DOT kernels and by around 10% for MMLA kernels on AWS Graviton3 (r7g.16xl instance).

Fixes #5103
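To make the description concrete, here is a minimal sketch of the kind of change involved, not the actual diff. It assumes the old calculation capped the softmax task count at 4 (as the title of issue #5103 suggests) and that the task count is also bounded by the number of rows to process; the row bound and all function names below are hypothetical and are not taken from ggml.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a ggml function. */
static int min_int(int a, int b) { return a < b ? a : b; }

/* Assumed "before" behaviour: never more than 4 tasks, however many
 * threads are available (the limit described in issue #5103). */
static int softmax_n_tasks_capped(int n_threads, int64_t n_rows) {
    assert(n_threads > 0 && n_rows > 0);
    int by_threads = min_int(4, n_threads);
    /* Assumption: there is no point in having more tasks than rows. */
    return n_rows < by_threads ? (int) n_rows : by_threads;
}

/* Assumed "after" behaviour: use as many threads as are available,
 * still bounded by the number of rows. */
static int softmax_n_tasks_uncapped(int n_threads, int64_t n_rows) {
    assert(n_threads > 0 && n_rows > 0);
    return n_rows < n_threads ? (int) n_rows : n_threads;
}
```

Under these assumptions, a 64-thread run over 512 rows would use 4 tasks with the capped version and 64 with the uncapped one, which is the sort of difference that would matter on a high-core-count machine such as the r7g.16xl instance mentioned above.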

ggml.c

Updated the n_task calculation to use the maximum number of threads possible. This improved prompt eval performance by around 5% for DOT kernels and by around 10% for MMLA kernels on AWS Graviton3.

snadampal mentioned this pull request Jan 25, 2024

[soft max] capping the num tasks to 4 is limiting the prompt eval perf #5103

Closed

ggerganov reviewed Jan 26, 2024


ggml.c (outdated, resolved)

ggml: softmax op: update the n_task calculation

209e413


snadampal force-pushed the softmax_ntask branch from f10e67d to 209e413 Compare January 26, 2024 16:06

ggerganov approved these changes Jan 26, 2024


ggerganov merged commit 7032f4f into ggml-org:master Jan 26, 2024
