Update README with OpenBLAS support #105

Open
wants to merge 4 commits into main

Conversation


@paolorechia paolorechia commented Feb 24, 2025

This should work and provide some speedup depending on your CPU (I tested the compilation following the original whisper.cpp instructions: https://github.com/ggerganov/whisper.cpp?tab=readme-ov-file#blas-cpu-support-via-openblas).

The package installs without any issues.

TODO: verify that the bindings work fine.
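
As a quick sanity check in the meantime (just a sketch; the model name and audio file are placeholders), something like this should transcribe while the init logs print "whisper_backend_init: using BLAS backend":

from pywhispercpp.model import Model

# Load a ggml model by name or path; watch the logs for the BLAS backend line.
model = Model("base.en")
# Transcribe any short WAV file and print the recognized text.
segments = model.transcribe("sample.wav")
for seg in segments:
    print(seg.text)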

@absadiki
Owner

Thanks @paolorechia for taking the time to test the package on the other backends.
You can run the test suite in the tests folder to check that everything is okay.
Let me know if it works and I'll go ahead and merge this PR.

@paolorechia
Author

> Thanks @paolorechia for taking the time to test the package on the other backends. You can run the test suite in the tests folder to check that everything is okay. Let me know if it works and I'll go ahead and merge this PR.

Oh, thanks for letting me know, that’s useful. I was testing with my own code.

Will run the test suite when I get the chance.

I was able to compile with both the OpenBLAS and OpenVINO flags, but it doesn't seem like the Python bindings load the OpenVINO model automatically the way the original whisper.cpp does, or maybe I did something wrong.

Any idea whether your library supports OpenVINO?

@absadiki
Owner

I haven't tested the library with OpenVINO, but if the compilation was successful then it'll probably work.
According to the documentation, I don't think whisper.cpp loads the OpenVINO model automatically; you need to convert it to the IR representation first and place it in the same folder as the ggml model so it can be loaded at runtime.
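
For illustration (this helper is hypothetical, not part of either library), the naming convention whisper.cpp uses, visible in the logs later in this thread, can be sketched as: the encoder IR sits next to the ggml model and shares its base name.

from pathlib import Path

def expected_openvino_encoder_xml(ggml_model_path: str) -> Path:
    # e.g. ggml-large-v3-turbo.bin -> ggml-large-v3-turbo-encoder-openvino.xml
    p = Path(ggml_model_path)
    return p.with_name(p.stem + "-encoder-openvino.xml")

print(expected_openvino_encoder_xml("models/ggml-large-v3-turbo.bin"))
# -> models/ggml-large-v3-turbo-encoder-openvino.xml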

@paolorechia
Author

paolorechia commented Feb 26, 2025

@absadiki you're absolutely right. I was able to convert a model and load it with the whisper-cpp CLI and whisper-server. So maybe if I just copy the converted model to the right path, pywhispercpp will pick it up?

Tests are passing:
[screenshot of the passing test suite]

@paolorechia paolorechia marked this pull request as ready for review February 26, 2025 07:38
@paolorechia
Author

Unfortunately, I don't think it's picking up the OpenVINO model automatically:

from pywhispercpp.model import Model

WHISPER_MODEL = "/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo.bin"
Model(WHISPER_MODEL)
whisper_init_from_file_with_params_no_state: loading model from '/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51866
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 128
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs       = 100
whisper_model_load:      CPU total size =  1623.92 MB
whisper_model_load: model size    = 1623.92 MB
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size  =   31.46 MB
whisper_init_state: kv cross size =   31.46 MB
whisper_init_state: kv pad  size  =    7.86 MB
whisper_init_state: compute buffer (conv)   =   36.13 MB
whisper_init_state: compute buffer (encode) =  926.53 MB
whisper_init_state: compute buffer (cross)  =    9.25 MB
whisper_init_state: compute buffer (decode) =   99.10 MB
whisper_full_with_state: input is too short - 950 ms < 1000 ms. consider padding the input audio with silence

Whereas running the CLI loads it:

./build/bin/whisper-cli -m /home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo.bin -f short-test.wav -t 16
whisper_init_from_file_with_params_no_state: loading model from '/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_init_with_params_no_state: devices    = 2
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51866
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 128
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs       = 100
whisper_model_load:      CPU total size =  1623.92 MB
whisper_model_load: model size    = 1623.92 MB
whisper_backend_init_gpu: no GPU found
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size  =   10.49 MB
whisper_init_state: kv cross size =   31.46 MB
whisper_init_state: kv pad  size  =    7.86 MB
whisper_init_state: compute buffer (conv)   =   36.13 MB
whisper_init_state: compute buffer (encode) =  212.29 MB
whisper_init_state: compute buffer (cross)  =    9.25 MB
whisper_init_state: compute buffer (decode) =   99.10 MB
whisper_ctx_init_openvino_encoder_with_state: loading OpenVINO model from '/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino.xml'
whisper_ctx_init_openvino_encoder_with_state: first run on a device may take a while ...
whisper_openvino_init: path_model = /home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino.xml, device = CPU, cache_dir = /home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino-cache
whisper_ctx_init_openvino_encoder_with_state: OpenVINO model loaded

@absadiki
Owner

@paolorechia, I've checked the whisper-cli source code. I thought OpenVINO was handled like the other backends, but it turns out we need to initialize the OpenVINO encoder manually. Please check out this branch and give it a try.
I've added the OpenVINO parameters to the Model class; you can use it as follows:

model = Model(model_path, use_openvino=True) # this will work similarly to the whisper-cli

Let me know if you find any issues!

@paolorechia
Author

@absadiki wow, thank you so much for looking into this! I'll give it a try this weekend when I get some time and will let you know.

@absadiki
Owner

absadiki commented Mar 1, 2025

You are welcome :)
Sure, take your time.

@paolorechia
Author

@absadiki testing...

from pywhispercpp.model import Model

WHISPER_MODEL = "/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo.bin"

model = Model(WHISPER_MODEL, use_openvino=True)

segments = model.transcribe("test_sound.wav")
print(segments)

It transcribes, but logs an error when initializing the encoder:

whisper_init_from_file_with_params_no_state: loading model from '/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_init_with_params_no_state: devices    = 2
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51866
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 128
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs       = 100
whisper_model_load:      CPU total size =  1623.92 MB
whisper_model_load: model size    = 1623.92 MB
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size  =   10.49 MB
whisper_init_state: kv cross size =   31.46 MB
whisper_init_state: kv pad  size  =    7.86 MB
whisper_init_state: compute buffer (conv)   =   36.13 MB
whisper_init_state: compute buffer (encode) =  212.29 MB
whisper_init_state: compute buffer (cross)  =    9.25 MB
whisper_init_state: compute buffer (decode) =   99.10 MB
whisper_ctx_init_openvino_encoder_with_state: loading OpenVINO model from '/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino.xml'
whisper_ctx_init_openvino_encoder_with_state: first run on a device may take a while ...
whisper_openvino_init: path_model = /home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino.xml, device = CPU, cache_dir = /home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino-cache
in openvino encoder compile routine: exception: Check 'false' failed at src/inference/src/core.cpp:100:
[ NETWORK_NOT_READ ] Unable to read the model: /home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino.xml Please check that model format: xml is supported and the model is correct. Available frontends: 

whisper_ctx_init_openvino_encoder_with_state: failed to init OpenVINO encoder from '/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino.xml'
Progress:   0%
Progress: 100%
[t0=0, t1=300, text=Let's record this video again]

@paolorechia
Author

It seems like it's trying to load the model from the wrong path. I also tried a slightly cleaner code snippet:

from pywhispercpp.model import Model

WHISPER_MODEL = "large-v3-turbo"

model = Model(WHISPER_MODEL, use_openvino=True)

segments = model.transcribe("test_sound.wav")
print(segments)
paolo@paolo-Inspiron-14-Plus-7440:~/dev/whisper-notes/whisper-notes-server$ ls -lh /home/paolo/.local/share/pywhispercpp/models/
total 2.6G
-rw-rw-r-- 1 paolo paolo 1.6G Feb 24 21:36 ggml-large-v3-turbo.bin
drwxr-xr-x 2 paolo paolo 4.0K Mar  2 13:38 ggml-large-v3-turbo-encoder-openvino-cache
-rw-rw-r-- 1 paolo paolo 548M Feb 24 21:54 ggml-large-v3-turbo-q5_0.bin
-rw-rw-r-- 1 paolo paolo 515M Feb 24 22:03 ggml-medium-q5_0.bin

@paolorechia
Author

@absadiki Actually, hold on. I recall I had to manually convert the models; let me copy those to the pywhispercpp folder.

@paolorechia
Author

No luck. I tried copying the *openvino* files that I had in my whisper.cpp folder (cp -r *openvino* /home/paolo/.local/share/pywhispercpp/models/):

paolo@paolo-Inspiron-14-Plus-7440:~/dev/whisper-notes/whisper-notes-server$ ls -lh /home/paolo/.local/share/pywhispercpp/models/
total 5.0G
-rw-rw-r-- 1 paolo paolo 2.0K Mar  2 13:41 convert-whisper-to-openvino.py
drwxr-xr-x 2 paolo paolo 4.0K Mar  2 13:41 ggml-base.en-encoder-openvino-cache
-rw-rw-r-- 1 paolo paolo 1.6G Feb 24 21:36 ggml-large-v3-turbo.bin
-rw-rw-r-- 1 paolo paolo 2.4G Mar  2 13:41 ggml-large-v3-turbo-encoder-openvino.bin
drwxr-xr-x 2 paolo paolo 4.0K Mar  2 13:41 ggml-large-v3-turbo-encoder-openvino-cache
-rw-rw-r-- 1 paolo paolo 1.5M Mar  2 13:41 ggml-large-v3-turbo-encoder-openvino.xml
-rw-rw-r-- 1 paolo paolo 548M Feb 24 21:54 ggml-large-v3-turbo-q5_0.bin
-rw-rw-r-- 1 paolo paolo 515M Feb 24 22:03 ggml-medium-q5_0.bin
drwxrwxr-x 6 paolo paolo 4.0K Mar  2 13:42 openvino_conv_env
-rw-rw-r-- 1 paolo paolo   41 Mar  2 13:42 requirements-openvino.txt

But I still get the same error. Unfortunately, I don't know enough to debug this further.

@paolorechia
Author

I think I'll stop trying to make this work with OpenVINO for now and just stick to OpenBLAS; it's good enough for my use case.

@absadiki
Owner

absadiki commented Mar 2, 2025

@paolorechia,

> But I still get the same error. Unfortunately, I don't know enough to debug this further.

What errors are you encountering after placing all the models in the same folder? I think you were almost there.

I thought the instructions in the whisper.cpp README were clear enough, but here is a step-by-step guide on how to make it work, or at least how it worked on my end (see the sketch after the list):

  1. Download a ggml model to a custom location.
  2. Generate the IR XML encoder using the Python conversion script.
  3. Place the XML file in the same folder as the ggml model. At runtime, whisper.cpp will look for this XML file in the same folder (similar to whisper-cli). Alternatively, you can use openvino_model_path to provide a custom path:
     model = Model('path/to/ggml.bin', use_openvino=True, openvino_model_path='path/to/xml')
  4. source setupvars.sh
  5. python your_script.py
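
Putting steps 3 to 5 together, a minimal sketch (the paths are placeholders, and it assumes setupvars.sh has already been sourced in the current shell):

from pathlib import Path
from pywhispercpp.model import Model

ggml = Path("/path/to/models/ggml-base.en.bin")            # step 1
xml = ggml.with_name(ggml.stem + "-encoder-openvino.xml")  # steps 2 and 3
assert xml.exists(), f"expected the encoder IR at {xml}"

# Loads the XML encoder from the same folder, like whisper-cli does.
model = Model(str(ggml), use_openvino=True)
print(model.transcribe("sample.wav"))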

BTW, if you don't want to convert the models yourself, you can find some pre-converted models in the Intel HF repo. Download the zip folder, extract it to a custom location, and point the Model class to the ggml.bin model inside.

Let me know if you need any further help!

@paolorechia
Author

> What errors are you encountering after placing all the models in the same folder?

in openvino encoder compile routine: exception: Check 'false' failed at src/inference/src/core.cpp:100:
[ NETWORK_NOT_READ ] Unable to read the model: /home/paolo/.local/share/pywhispercpp/models/ggml-large-v3-turbo-encoder-openvino.xml Please check that model format: xml is supported and the model is correct. Available frontends: 

whisper_ctx_init_openvino_encoder_with_state: failed to init OpenVINO encoder from '/home/paolo/.local/share/pywhispercpp/models/ggml-large-v3-turbo-encoder-openvino.xml'

@paolorechia
Author

As far as I can trace, it crashes inside whisper.cpp, in this file:

openvino/whisper-openvino-encoder.cpp

Somewhere inside this code, since the logged exception matches the string printed in the catch block below:

    try {
        ov::Core core;

        if (cache_dir) {
            // enables caching of device-specific 'blobs' during core.compile_model
            // routine. This speeds up calls to compile_model for successive runs.
            core.set_property(ov::cache_dir(cache_dir));
        }

        // Read the OpenVINO encoder IR (.xml/.bin) from disk, producing an ov::Model object.
        std::shared_ptr<ov::Model> model = core.read_model(path_model);

        // Produce a compiled-model object, given the device ("CPU", "GPU", etc.)
        auto compiledModel = core.compile_model(model, device);

        // From the compiled model object, create an infer request. This is the object
        // we will use later on to trigger inference execution.
        context->inferRequest = compiledModel.create_infer_request();
    }
    catch (const std::exception& error) {
        std::cout << "in openvino encoder compile routine: exception: " << error.what() << std::endl;
        delete context;
        context = nullptr;
    }

And this is likely the logged error:

[ NETWORK_NOT_READ ] Unable to read the model: /home/paolo/models/ggml-large-v3-encoder-openvino.xml Please check that model format: xml is supported and the model is correct. Available frontends: 

This doesn't make any sense to me, as the file exists and it worked with the original whisper.cpp.

I also tried downloading some of the files from the HF link you sent me, but it still doesn't work. It seems something is wrong either with my Python environment (I'm using uv) or with how the bindings were compiled.

@paolorechia paolorechia closed this Mar 3, 2025
@paolorechia paolorechia reopened this Mar 3, 2025
@paolorechia
Author

There's also a failed check pointing to this line:

https://github.com/openvinotoolkit/openvino/blob/b4452d5630442e91cf84db5acd3d991f3d1f34c2/src/inference/src/core.cpp#L100C5-L100C27

which just confirms that it's unable to read the model.

@paolorechia
Author

paolorechia commented Mar 3, 2025

Anyway, my hunch is that something is wrong with the OpenVINO environment linked to pywhispercpp. Maybe there's a version mismatch somewhere, or I'm somehow missing the IR (XML) frontend plugin needed to read the file; the empty "Available frontends:" list in the error points that way. I can't tell for sure, and I'm out of time to keep debugging this.
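
One way to test that hypothesis (a sketch; it only checks the openvino package in the Python environment, which may differ from the OpenVINO build the bindings were actually compiled against) would be:

from openvino.runtime import Core
from openvino.frontend import FrontEndManager

# 'ir' must appear in this list for .xml/.bin IR models to be readable.
print(FrontEndManager().get_available_front_ends())

# Try reading the encoder directly to reproduce the NETWORK_NOT_READ error.
core = Core()
model = core.read_model("/home/paolo/.local/share/pywhispercpp/models/ggml-large-v3-turbo-encoder-openvino.xml")
print(model.get_friendly_name())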

I guess if OpenVINO worked on your end, we can close this issue; I won't be investing much more time in it for now.
