[soft max] capping the num tasks to 4 is limiting the prompt eval perf #5103

snadampal · 2024-01-24T00:29:49Z


System: AWS Graviton3, c7g.16xl instance with Ubuntu 22.04
llama.cpp version: latest, commit: 6f9939d

The following commit caps the number of tasks for the soft max op at 4. I would like to understand why 4 was chosen.

commit adf3de4f69ff7e44131222f05f9c7447ac0be3cb (HEAD, tag: b1605)
Author: Georgi Gerganov <[email protected]>
Date:   Sun Dec 3 15:56:22 2023 +0200

    ggml : fix soft max out-of-bounds access (#4307)

    ggml-ci

Without the cap of 4, using only the number of src rows and n_threads to compute n_tasks, prompt eval performance improves by 4% for DOT kernels and 9% for MMLA kernels (PR):
n_tasks = MIN(n_threads, ggml_nrows(node->src[0]));
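To make the difference concrete, here is a minimal sketch comparing the two task-count rules. The helper names are hypothetical, and the exact form of the capped expression (a `MIN` over 4, `n_threads`, and the row count) is an assumption based on the description above, not a quote from the ggml source.

```c
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper: task count with the hard cap of 4 described above.
 * Assumed form: min(4, n_threads, n_rows). */
static int soft_max_tasks_capped(int n_threads, int64_t n_rows) {
    return (int) MIN((int64_t) MIN(4, n_threads), n_rows);
}

/* Hypothetical helper: the proposed rule, limited only by the thread
 * count and the number of rows in src[0]. */
static int soft_max_tasks_uncapped(int n_threads, int64_t n_rows) {
    return (int) MIN((int64_t) n_threads, n_rows);
}
```

Under these assumptions, a run with 64 threads and 1015 rows would get 4 tasks with the cap and 64 tasks without it, so most threads sit idle during soft max in the capped case.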

Reproducer:
./main -m /llama.cpp/models/open_llama_13b/ggml-model-q8_0.gguf -c 1015 -n 256 -t 64 --file <input_file.txt>


ggerganov · 2024-01-25T19:46:57Z

Not really sure where the limit of 4 came from. I probably did some measurements and decided that there was no reason to use a larger number of tasks. But it's likely something to be revisited and updated.

snadampal · 2024-01-25T20:10:54Z

Thanks. Since the cap was added as part of the out-of-bounds fix, could you please share some details about the original issue, so that we can try to arrive at a formula that works for systems with different numbers of threads?

ggerganov · 2024-01-25T20:22:38Z

Back then, we had 2 different functions that specified the number of tasks for each op, and they had to be kept in sync. If they were not in sync (i.e. they returned 2 different values), it could cause out-of-bounds access because the wdata array was too small.
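The failure mode can be sketched as follows. This is a simplified model, not the ggml code: it assumes the scratch buffer is sized for `n_tasks_plan` per-task slices while the op actually runs `n_tasks_exec` tasks, each writing to its own slice. The function name and parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the planner sizes the scratch buffer for
 * n_tasks_plan slices of slice_len floats each. The executor runs
 * n_tasks_exec tasks, and task ith uses the slice
 * [ith * slice_len, (ith + 1) * slice_len).
 * Returns true only if every task's slice fits inside the buffer. */
static bool scratch_in_bounds(int n_tasks_plan, int n_tasks_exec, size_t slice_len) {
    const size_t capacity = (size_t) n_tasks_plan * slice_len;
    for (int ith = 0; ith < n_tasks_exec; ++ith) {
        const size_t end = ((size_t) ith + 1) * slice_len;
        if (end > capacity) {
            return false; /* this task would write past the buffer */
        }
    }
    return true;
}
```

In this model, a mismatch is only dangerous in one direction: planning for 4 tasks and executing 64 overflows the buffer, while planning for 64 and executing 4 merely wastes memory.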

Now we have only one function, ggml_get_n_tasks(), so the original issue is no longer relevant, and in theory it should work with any n_tasks smaller than the number of rows.
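One way to see why the task count is bounded by the row count is to look at how rows could be split across tasks. The sketch below uses a ceiling-division split; the type and function names are hypothetical, and the split is an illustration rather than the exact ggml implementation.

```c
#include <stdint.h>

/* Half-open range of rows [first, last) assigned to one task. */
typedef struct {
    int64_t first;
    int64_t last;
} row_range;

/* Hypothetical helper: split n_rows across n_tasks using ceiling
 * division, clamping both ends so trailing tasks get a short or
 * empty range instead of reading past the last row. */
static row_range rows_for_task(int64_t n_rows, int n_tasks, int ith) {
    const int64_t per_task = (n_rows + n_tasks - 1) / n_tasks;
    int64_t first = per_task * ith;
    int64_t last  = first + per_task;
    if (first > n_rows) first = n_rows;
    if (last  > n_rows) last  = n_rows;
    return (row_range){ first, last };
}
```

With 1015 rows and 64 tasks, each task handles 16 rows except the last one, which handles the remaining 7. Asking for more tasks than rows would only produce empty ranges, which is why limiting the task count by the row count loses nothing.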

snadampal · 2024-01-25T21:52:53Z

Great! I have raised this PR to update it.

snadampal · 2024-01-26T17:29:21Z

The PR is merged, closing the issue.

snadampal added the bug-unconfirmed label Jan 24, 2024

snadampal changed the title ~~[soft max] capping the num tasks to 4 is regressing the prompt eval perf by 9%~~ [soft max] capping the num tasks to 4 is limiting the prompt eval perf Jan 24, 2024

snadampal mentioned this issue Jan 25, 2024

ggml: softmax op: update the n_task calculation #5126

Merged

snadampal closed this as completed Jan 26, 2024
