ggml: add hw_accel in data structure #2054
Closed
This PR serves multiple purposes:
(1) try to fix issue ggerganov/ggml#795
(2) borrow an advantage from PyTorch: the user can specify whether a GGML op (such as mulmat) is accelerated by a specified backend
(3) prepare for submitting Qualcomm's QNN backend upstream to GGML, from "PoC: Add Qualcomm mobile SoC native backend for GGML, zhouwg/kantv#121" — whisper.cpp first, then llama.cpp, because llama.cpp is much more complicated than whisper.cpp
Please refer to this commit: zhouwg/kantv@ce44da6
or
please refer to this commit: https://github.com/zhouwg/kantv/blob/kantv-poc-with-qnn/core/ggml/llamacpp/ggml.c#L16137
This is a workaround (it breaks the OO principle inside the original GGML) for this TODO in the original ggml: https://github.com/ggerganov/ggml/blob/master/src/ggml-backend.c#L1127
I personally think this member is not redundant (it is NOT the same as the existing "backend" field in "struct ggml_tensor") and it will NOT bring side effects to existing code. Of course, I understand we should not bring too much "useful code" into the existing implementation of GGML internals, and we should keep GGML as compact/clean as possible.
Update (04-17-2024): this is not essential; there is another, better workaround for this TODO in the original ggml: https://github.com/ggerganov/ggml/blob/master/src/ggml-backend.c#L1127.
In fact, the "gpu_device" in struct whisper_context_params is semantically equivalent to use_hwaccel: a special value of "gpu_device" could be interpreted as "no hardware acceleration", falling back to the original default backend.
There are 2 * n combinations here: 2 (use_gpu: true/false) * n (gpu_device: 0..n-1).
So I'd like to close this PR accordingly.
Update (04-17-2024, 22:58):
this member is useful/helpful in some scenarios, so I'd like to reopen this PR.