ggml: add hw_accel in data structure #2054
Closed
This PR serves multiple purposes:
(1) try to fix issue ggerganov/ggml#795
(2) borrow an advantage from PyTorch: the user can specify whether a GGML op (such as mulmat) is accelerated by a specified backend
(3) prepare for submitting Qualcomm's QNN backend upstream to GGML, from "PoC: Add Qualcomm mobile SoC native backend for GGML, zhouwg/kantv#121" — whisper.cpp first, then llama.cpp, because llama.cpp is much more complicated than whisper.cpp
Please refer to this commit: zhouwg/kantv@ce44da6
or
please refer to this commit: https://github.com/zhouwg/kantv/blob/kantv-poc-with-qnn/core/ggml/llamacpp/ggml.c#L16137
This is a workaround (it breaks the OO principle inside the original GGML) for this TODO in the original ggml: https://github.com/ggerganov/ggml/blob/master/src/ggml-backend.c#L1127
I personally think this member is not redundant (it is NOT the same as the existing "backend" field in "struct ggml_tensor") and it will NOT bring side effects to existing code. Of course, I understand we should not bring too much "useful code" into the existing implementation of GGML internals, and we should keep GGML as compact/clean as possible.
Update (04-17-2024): this is not essential; there is another, better workaround for this TODO in the original ggml: https://github.com/ggerganov/ggml/blob/master/src/ggml-backend.c#L1127.
In fact, the "gpu_device" in struct whisper_context_params is semantically equivalent to use_hwaccel: a special value of "gpu_device" could be interpreted as "no hardware acceleration", falling back to the original default backend.
There are 2 * n combinations here: 2 (use_gpu: true/false) * n (gpu_device: 0..n-1).
So I'd like to close this PR accordingly.
Update (04-17-2024, 22:58):
this member is useful/helpful in some scenarios, so I'd like to reopen this PR.