Update README with OpenBLAS support #105

Open
wants to merge 4 commits into main

Conversation


@paolorechia paolorechia commented Feb 24, 2025

This should work and provide some speedup depending on your CPU (I tested the compilation following the original whisper.cpp instructions: https://github.com/ggerganov/whisper.cpp?tab=readme-ov-file#blas-cpu-support-via-openblas).

The package installs without any issues.

TODO: verify that the bindings work fine.
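
As a quick sanity check in the meantime (just a sketch; the model name and audio file are placeholders), something like this should transcribe while the init logs print "whisper_backend_init: using BLAS backend":

from pywhispercpp.model import Model

# Load a ggml model by name or path; watch the logs for the BLAS backend line.
model = Model("base.en")
# Transcribe any short WAV file and print the recognized text.
segments = model.transcribe("sample.wav")
for seg in segments:
    print(seg.text)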

@absadiki
Owner

Thanks @paolorechia for taking the time to test the package on the other backends.
You can run the test suite in the tests folder to check that everything is okay.
Let me know if it works and I'll go ahead and merge this PR.

@paolorechia
Author

> Thanks @paolorechia for taking the time to test the package on the other backends. You can run the test suite in the tests folder to check that everything is okay. Let me know if it works and I'll go ahead and merge this PR.

Oh, thanks for letting me know, that’s useful. I was testing with my own code.

Will run the test suite when I get the chance.

I was able to compile with both the OpenBLAS and OpenVINO flags, but it doesn't seem like the Python bindings load the OpenVINO model automatically the way the original whisper.cpp does, or maybe I did something wrong.

Any idea whether your library supports OpenVINO?

@absadiki
Owner

I haven't tested the library with OpenVINO, but if the compilation was successful then it'll probably work.
According to the documentation, I don't think whisper.cpp loads the OpenVINO model automatically; you need to convert it to the IR representation first and place it in the same folder as the ggml model so it can be loaded at runtime.
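
For illustration (this helper is hypothetical, not part of either library), the naming convention whisper.cpp uses, visible in the logs later in this thread, can be sketched as: the encoder IR sits next to the ggml model and shares its base name.

from pathlib import Path

def expected_openvino_encoder_xml(ggml_model_path: str) -> Path:
    # e.g. ggml-large-v3-turbo.bin -> ggml-large-v3-turbo-encoder-openvino.xml
    p = Path(ggml_model_path)
    return p.with_name(p.stem + "-encoder-openvino.xml")

print(expected_openvino_encoder_xml("models/ggml-large-v3-turbo.bin"))
# -> models/ggml-large-v3-turbo-encoder-openvino.xml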

@paolorechia
Author

paolorechia commented Feb 26, 2025

@absadiki you're absolutely right. I was able to convert a model and load it with the whisper-cpp CLI and whisper-server. So maybe if I just copy the converted model to the right path, pywhispercpp will pick it up?

Tests are passing:
[screenshot of the passing test suite]

@paolorechia paolorechia marked this pull request as ready for review February 26, 2025 07:38
@paolorechia
Author

Unfortunately, I don't think it's picking up the OpenVINO model automatically:

from pywhispercpp.model import Model

WHISPER_MODEL = "/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo.bin"
Model(WHISPER_MODEL)
whisper_init_from_file_with_params_no_state: loading model from '/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51866
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 128
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs       = 100
whisper_model_load:      CPU total size =  1623.92 MB
whisper_model_load: model size    = 1623.92 MB
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size  =   31.46 MB
whisper_init_state: kv cross size =   31.46 MB
whisper_init_state: kv pad  size  =    7.86 MB
whisper_init_state: compute buffer (conv)   =   36.13 MB
whisper_init_state: compute buffer (encode) =  926.53 MB
whisper_init_state: compute buffer (cross)  =    9.25 MB
whisper_init_state: compute buffer (decode) =   99.10 MB
whisper_full_with_state: input is too short - 950 ms < 1000 ms. consider padding the input audio with silence

Whereas running the CLI loads it:

./build/bin/whisper-cli -m /home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo.bin -f short-test.wav -t 16
whisper_init_from_file_with_params_no_state: loading model from '/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_init_with_params_no_state: devices    = 2
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51866
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 128
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs       = 100
whisper_model_load:      CPU total size =  1623.92 MB
whisper_model_load: model size    = 1623.92 MB
whisper_backend_init_gpu: no GPU found
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size  =   10.49 MB
whisper_init_state: kv cross size =   31.46 MB
whisper_init_state: kv pad  size  =    7.86 MB
whisper_init_state: compute buffer (conv)   =   36.13 MB
whisper_init_state: compute buffer (encode) =  212.29 MB
whisper_init_state: compute buffer (cross)  =    9.25 MB
whisper_init_state: compute buffer (decode) =   99.10 MB
whisper_ctx_init_openvino_encoder_with_state: loading OpenVINO model from '/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino.xml'
whisper_ctx_init_openvino_encoder_with_state: first run on a device may take a while ...
whisper_openvino_init: path_model = /home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino.xml, device = CPU, cache_dir = /home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino-cache
whisper_ctx_init_openvino_encoder_with_state: OpenVINO model loaded

@absadiki
Owner

@paolorechia, I've checked the whisper-cli source code. I thought OpenVINO was handled like the other backends, but it turns out we need to initialize the OpenVINO encoder manually. Please check out this branch and give it a try.
I've added the OpenVINO parameters to the Model class; you can use it as follows:

model = Model(model_path, use_openvino=True) # this will work similarly to the whisper-cli

Let me know if you find any issues!

@paolorechia
Author

@absadiki wow, thank you so much for looking into this! I'll give it a try this weekend when I get some time and will let you know.

@absadiki
Owner

absadiki commented Mar 1, 2025

You are welcome :)
Sure, take your time.

@paolorechia
Author

@absadiki testing...

from pywhispercpp.model import Model

WHISPER_MODEL = "/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo.bin"

model = Model(WHISPER_MODEL, use_openvino=True)

segments = model.transcribe("test_sound.wav")
print(segments)

It transcribes, but logs an error when initializing the encoder:

whisper_init_from_file_with_params_no_state: loading model from '/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_init_with_params_no_state: devices    = 2
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51866
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 128
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs       = 100
whisper_model_load:      CPU total size =  1623.92 MB
whisper_model_load: model size    = 1623.92 MB
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size  =   10.49 MB
whisper_init_state: kv cross size =   31.46 MB
whisper_init_state: kv pad  size  =    7.86 MB
whisper_init_state: compute buffer (conv)   =   36.13 MB
whisper_init_state: compute buffer (encode) =  212.29 MB
whisper_init_state: compute buffer (cross)  =    9.25 MB
whisper_init_state: compute buffer (decode) =   99.10 MB
whisper_ctx_init_openvino_encoder_with_state: loading OpenVINO model from '/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino.xml'
whisper_ctx_init_openvino_encoder_with_state: first run on a device may take a while ...
whisper_openvino_init: path_model = /home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino.xml, device = CPU, cache_dir = /home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino-cache
in openvino encoder compile routine: exception: Check 'false' failed at src/inference/src/core.cpp:100:
[ NETWORK_NOT_READ ] Unable to read the model: /home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino.xml Please check that model format: xml is supported and the model is correct. Available frontends: 

whisper_ctx_init_openvino_encoder_with_state: failed to init OpenVINO encoder from '/home/paolo/dev/whisper-notes/whisper.cpp/models/ggml-large-v3-turbo-encoder-openvino.xml'
Progress:   0%
Progress: 100%
[t0=0, t1=300, text=Let's record this video again]

@paolorechia
Author

It seems like it's trying to load the model from the wrong path. I also tried a slightly cleaner code snippet:

from pywhispercpp.model import Model

WHISPER_MODEL = "large-v3-turbo"

model = Model(WHISPER_MODEL, use_openvino=True)

segments = model.transcribe("test_sound.wav")
print(segments)
paolo@paolo-Inspiron-14-Plus-7440:~/dev/whisper-notes/whisper-notes-server$ ls -lh /home/paolo/.local/share/pywhispercpp/models/
total 2.6G
-rw-rw-r-- 1 paolo paolo 1.6G Feb 24 21:36 ggml-large-v3-turbo.bin
drwxr-xr-x 2 paolo paolo 4.0K Mar  2 13:38 ggml-large-v3-turbo-encoder-openvino-cache
-rw-rw-r-- 1 paolo paolo 548M Feb 24 21:54 ggml-large-v3-turbo-q5_0.bin
-rw-rw-r-- 1 paolo paolo 515M Feb 24 22:03 ggml-medium-q5_0.bin

@paolorechia
Author

@absadiki Actually, hold on. I recall I had to manually convert the models; let me copy those to the pywhispercpp folder.

@paolorechia
Author

No luck. I tried copying the *openvino* files that I had in my whisper.cpp folder (cp -r *openvino* /home/paolo/.local/share/pywhispercpp/models/):

paolo@paolo-Inspiron-14-Plus-7440:~/dev/whisper-notes/whisper-notes-server$ ls -lh /home/paolo/.local/share/pywhispercpp/models/
total 5.0G
-rw-rw-r-- 1 paolo paolo 2.0K Mar  2 13:41 convert-whisper-to-openvino.py
drwxr-xr-x 2 paolo paolo 4.0K Mar  2 13:41 ggml-base.en-encoder-openvino-cache
-rw-rw-r-- 1 paolo paolo 1.6G Feb 24 21:36 ggml-large-v3-turbo.bin
-rw-rw-r-- 1 paolo paolo 2.4G Mar  2 13:41 ggml-large-v3-turbo-encoder-openvino.bin
drwxr-xr-x 2 paolo paolo 4.0K Mar  2 13:41 ggml-large-v3-turbo-encoder-openvino-cache
-rw-rw-r-- 1 paolo paolo 1.5M Mar  2 13:41 ggml-large-v3-turbo-encoder-openvino.xml
-rw-rw-r-- 1 paolo paolo 548M Feb 24 21:54 ggml-large-v3-turbo-q5_0.bin
-rw-rw-r-- 1 paolo paolo 515M Feb 24 22:03 ggml-medium-q5_0.bin
drwxrwxr-x 6 paolo paolo 4.0K Mar  2 13:42 openvino_conv_env
-rw-rw-r-- 1 paolo paolo   41 Mar  2 13:42 requirements-openvino.txt

But I still get the same error. Unfortunately, I don't know enough to debug this further.

@paolorechia
Author

I think I'll stop trying to make this work with OpenVINO for now and just stick to OpenBLAS; it's good enough for my use case.

@absadiki
Owner

absadiki commented Mar 2, 2025

@paolorechia,

> But I still get the same error. Unfortunately, I don't know enough to debug this further.

What errors are you encountering after placing all the models in the same folder? I think you were almost there.

I thought the instructions in the whisper.cpp README were clear enough, but here is a step-by-step guide on how to make it work, or at least how it worked on my end (see the sketch after the list):

  1. Download a ggml model to a custom location.
  2. Generate the IR XML encoder using the Python conversion script.
  3. Place the XML file in the same folder as the ggml model. At runtime, whisper.cpp will look for this XML file in the same folder (similar to whisper-cli). Alternatively, you can use openvino_model_path to provide a custom path:
     model = Model('path/to/ggml.bin', use_openvino=True, openvino_model_path='path/to/xml')
  4. source setupvars.sh
  5. python your_script.py
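
Putting steps 3 to 5 together, a minimal sketch (the paths are placeholders, and it assumes setupvars.sh has already been sourced in the current shell):

from pathlib import Path
from pywhispercpp.model import Model

ggml = Path("/path/to/models/ggml-base.en.bin")            # step 1
xml = ggml.with_name(ggml.stem + "-encoder-openvino.xml")  # steps 2 and 3
assert xml.exists(), f"expected the encoder IR at {xml}"

# Loads the XML encoder from the same folder, like whisper-cli does.
model = Model(str(ggml), use_openvino=True)
print(model.transcribe("sample.wav"))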

BTW, if you don't want to convert the models yourself, you can find some pre-converted models in the Intel HF repo. Download the zip folder, extract it to a custom location, and point the Model class to the ggml.bin model inside.

Let me know if you need any further help!

@paolorechia
Author

> What errors are you encountering after placing all the models in the same folder?

in openvino encoder compile routine: exception: Check 'false' failed at src/inference/src/core.cpp:100:
[ NETWORK_NOT_READ ] Unable to read the model: /home/paolo/.local/share/pywhispercpp/models/ggml-large-v3-turbo-encoder-openvino.xml Please check that model format: xml is supported and the model is correct. Available frontends: 

whisper_ctx_init_openvino_encoder_with_state: failed to init OpenVINO encoder from '/home/paolo/.local/share/pywhispercpp/models/ggml-large-v3-turbo-encoder-openvino.xml'

@paolorechia
Author

As far as I can trace, it crashes inside whisper.cpp, in this file:

openvino/whisper-openvino-encoder.cpp

Somewhere inside this code, since the logged exception matches the string printed in the catch block below:

    try {
        ov::Core core;

        if (cache_dir) {
            // enables caching of device-specific 'blobs' during core.compile_model
            // routine. This speeds up calls to compile_model for successive runs.
            core.set_property(ov::cache_dir(cache_dir));
        }

        // Read the OpenVINO encoder IR (.xml/.bin) from disk, producing an ov::Model object.
        std::shared_ptr<ov::Model> model = core.read_model(path_model);

        // Produce a compiled-model object, given the device ("CPU", "GPU", etc.)
        auto compiledModel = core.compile_model(model, device);

        // From the compiled model object, create an infer request. This is the object
        // we will use later on to trigger inference execution.
        context->inferRequest = compiledModel.create_infer_request();
    }
    catch (const std::exception& error) {
        std::cout << "in openvino encoder compile routine: exception: " << error.what() << std::endl;
        delete context;
        context = nullptr;
    }

And this is likely the logged error:

[ NETWORK_NOT_READ ] Unable to read the model: /home/paolo/models/ggml-large-v3-encoder-openvino.xml Please check that model format: xml is supported and the model is correct. Available frontends: 

This doesn't make any sense to me, as the file exists and it worked with the original whisper.cpp.

I also tried downloading some of the files from the HF link you sent me, but it still doesn't work. It seems something is wrong either with my Python environment (I'm using uv) or with how the bindings were compiled.

@paolorechia paolorechia closed this Mar 3, 2025
@paolorechia paolorechia reopened this Mar 3, 2025
@paolorechia
Author

There's also a failed check pointing to this line:

https://github.com/openvinotoolkit/openvino/blob/b4452d5630442e91cf84db5acd3d991f3d1f34c2/src/inference/src/core.cpp#L100C5-L100C27

which just confirms that it's unable to read the model.

@paolorechia
Author

paolorechia commented Mar 3, 2025

Anyway, my hunch is that something is wrong with the OpenVINO environment linked to pywhispercpp. Maybe there's a version mismatch somewhere, or I'm somehow missing the IR (XML) frontend plugin needed to read the file; the empty "Available frontends:" list in the error points that way. I can't tell for sure, and I'm out of time to keep debugging this.
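
One way to test that hypothesis (a sketch; it only checks the openvino package in the Python environment, which may differ from the OpenVINO build the bindings were actually compiled against) would be:

from openvino.runtime import Core
from openvino.frontend import FrontEndManager

# 'ir' must appear in this list for .xml/.bin IR models to be readable.
print(FrontEndManager().get_available_front_ends())

# Try reading the encoder directly to reproduce the NETWORK_NOT_READ error.
core = Core()
model = core.read_model("/home/paolo/.local/share/pywhispercpp/models/ggml-large-v3-turbo-encoder-openvino.xml")
print(model.get_friendly_name())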

I guess if OpenVINO worked on your end, we can close this issue; I won't be investing much more time in it for now.
