
Convert whisper models to onnx format #238

Merged (22 commits) on Aug 7, 2023

Conversation

@csukuangfj (Collaborator) commented Aug 5, 2023

We are trying to support whisper models in sherpa-onnx for non-streaming speech recognition.

As a first step, we have converted the model to ONNX format and successfully tested the exported ONNX model in Python using greedy search.

TODOs

  • Modify kaldi-native-fbank to support features used by whisper
  • Add C++ implementation for whisper (implement greedy search first)
  • Video demos
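
As a rough illustration of working with the exported model (a sketch only; the file name below is an assumption, and the export is assumed to produce a separate encoder model), the encoder's input/output signature can be inspected with onnxruntime before wiring up greedy search:

import onnxruntime as ort

# Load the exported encoder (file name is an assumption for illustration).
session = ort.InferenceSession("tiny.en-encoder.onnx",
                               providers=["CPUExecutionProvider"])

# Print the input/output signature to verify the export before decoding.
for inp in session.get_inputs():
    print("input :", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)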

@csukuangfj merged commit 45b9d4a into k2-fsa:master on Aug 7, 2023
@csukuangfj deleted the convert-whisper branch on August 7, 2023 at 04:34
@csukuangfj (Collaborator, Author):

Please visit
https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition
to try whisper models in your browser.

[Screenshot: 2023-08-07 14:18:08]

@jackwenshann:

Hi, how can I convert a pytorch_model.bin downloaded from the official site into ONNX format?

@csukuangfj (Collaborator, Author):

> Hi, how can I convert a pytorch_model.bin downloaded from the official site into ONNX format?

https://k2-fsa.github.io/sherpa/onnx/pretrained_models/whisper/export-onnx.html
This is the documentation. It contains the detailed steps and links to the code.

@jackwenshann:

OK, I'll give it a try.

@jackwenshann:

> Hi, how can I convert a pytorch_model.bin downloaded from the official site into ONNX format?

> https://k2-fsa.github.io/sherpa/onnx/pretrained_models/whisper/export-onnx.html This is the documentation. It contains the detailed steps and links to the code.

Hi, this script automatically downloads the model from the official site and converts it. I have a model fine-tuned from tiny.en and would like to convert the trained .bin file directly into the corresponding ONNX format. Is there a good way to do this? Do I need to convert it to .pt format first? Looking forward to your answer, thanks.

@csukuangfj (Collaborator, Author):

> Hi, how can I convert a pytorch_model.bin downloaded from the official site into ONNX format?

> https://k2-fsa.github.io/sherpa/onnx/pretrained_models/whisper/export-onnx.html This is the documentation. It contains the detailed steps and links to the code.

> Hi, this script automatically downloads the model from the official site and converts it. I have a model fine-tuned from tiny.en and would like to convert the trained .bin file directly into the corresponding ONNX format. Is there a good way to do this? Do I need to convert it to .pt format first? Looking forward to your answer, thanks.

.bin and .pt just differ in file extension. The official .bin is also saved with torch.save().

As long as your model file has the same structure as the official one, you can export it to ONNX with our script.

Did you run into any problems when you ran it?

@jackwenshann:

> Hi, how can I convert a pytorch_model.bin downloaded from the official site into ONNX format?

> https://k2-fsa.github.io/sherpa/onnx/pretrained_models/whisper/export-onnx.html This is the documentation. It contains the detailed steps and links to the code.

> Hi, this script automatically downloads the model from the official site and converts it. I have a model fine-tuned from tiny.en and would like to convert the trained .bin file directly into the corresponding ONNX format. Is there a good way to do this? Do I need to convert it to .pt format first? Looking forward to your answer, thanks.

> .bin and .pt just differ in file extension. The official .bin is also saved with torch.save().

> As long as your model file has the same structure as the official one, you can export it to ONNX with our script.

> Did you run into any problems when you ran it?

Right now, running the script you provided automatically downloads the official model. If I force the script to point to our model instead, it reports the following error:
dims = ModelDimensions(**checkpoint["dims"])
KeyError: 'dims'

@csukuangfj (Collaborator, Author):

How did you save your checkpoint?

As I said above, it has to be fully compatible with the official format.

I suggest you look at what the state_dict inside the official .bin contains.
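
A minimal sketch of such an inspection (the checkpoint file name is an assumption for illustration):

import torch

checkpoint = torch.load("tiny.en.pt", map_location="cpu")
print(checkpoint.keys())                                 # expected: dict_keys(['dims', 'model_state_dict'])
print(checkpoint["dims"])                                # model dimensions read by the export script
print(list(checkpoint["model_state_dict"].keys())[:5])   # first few parameter names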

@ziggy1209:

> Hi, how can I convert a pytorch_model.bin downloaded from the official site into ONNX format?

> https://k2-fsa.github.io/sherpa/onnx/pretrained_models/whisper/export-onnx.html This is the documentation. It contains the detailed steps and links to the code.

> Hi, this script automatically downloads the model from the official site and converts it. I have a model fine-tuned from tiny.en and would like to convert the trained .bin file directly into the corresponding ONNX format. Is there a good way to do this? Do I need to convert it to .pt format first? Looking forward to your answer, thanks.

> .bin and .pt just differ in file extension. The official .bin is also saved with torch.save().
> As long as your model file has the same structure as the official one, you can export it to ONNX with our script.
> Did you run into any problems when you ran it?

> Right now, running the script you provided automatically downloads the official model. If I force the script to point to our model instead, it reports: dims = ModelDimensions(**checkpoint["dims"]) KeyError: 'dims'

Hi there!
I just ran into the same problem while exporting a model I distilled myself to ONNX. There seems to be a mismatch between a model compatible with whisper.load_model() and the model saved by the distillation script.
The keys of a model compatible with whisper.load_model() are ['dims', 'model_state_dict'], whereas the distilled model consists of its state dict only.
I guess a manual update of the keys would do the trick?
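
A minimal sketch of that manual key update, assuming the fine-tuned or distilled state dict already uses openai-whisper parameter names (decoder.blocks..., encoder.blocks...); the file names and the choice of tiny.en as the source of dims are illustrative assumptions:

from dataclasses import asdict

import torch
import whisper

# Borrow the model dimensions from the official model the fine-tune is based on
# (assumed here to be tiny.en).
base = whisper.load_model("tiny.en")
dims = asdict(base.dims)

# Load the fine-tuned/distilled weights (assumed to be a bare state dict).
# If the keys use a different naming scheme, they must be remapped first.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# Save in the {'dims', 'model_state_dict'} layout expected by the export script.
torch.save({"dims": dims, "model_state_dict": state_dict},
           "my-finetuned-tiny.en.pt")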

@csukuangfj (Collaborator, Author):

Please see

wget -O distil-medium-en-original-model.bin https://huggingface.co/distil-whisper/distil-medium.en/resolve/main/original-model.bin

You need to find out how original-model.bin is generated.

@ziggy1209:

> Please see
>
> wget -O distil-medium-en-original-model.bin https://huggingface.co/distil-whisper/distil-medium.en/resolve/main/original-model.bin
>
> You need to find out how original-model.bin is generated.

That model loads fine with whisper.load_model(), but any student model generated by create_student_model.py has a key mismatch with it.

@csukuangfj (Collaborator, Author):

Here is the data contained in original-model.bin (the same holds for models from whisper):

(Pdb) p checkpoint.keys()
dict_keys(['dims', 'model_state_dict'])
(Pdb) p checkpoint['dims']
{'n_mels': 80, 'n_vocab': 51865, 'n_audio_ctx': 1500, 'n_audio_state': 384, 'n_audio_head': 6, 'n_audio_layer': 4, 'n_text_ctx': 448, 'n_text_state': 384, 'n_text_head': 6, 'n_text_layer': 4}

(Pdb) p checkpoint['model_state_dict'].keys()
dict_keys(['decoder.positional_embedding', 'encoder.positional_embedding', 'decoder.token_embedding.weight', 'decoder.blocks.0.mlp_ln.weight', 'decoder.blocks.0.mlp_ln.bias', 'decoder.blocks.0.mlp.0.weight', 'decoder.blocks.0.mlp.0.bias', 'decoder.blocks.0.mlp.2.weight', 'decoder.blocks.0.mlp.2.bias', 'decoder.blocks.0.attn_ln.weight', 'decoder.blocks.0.attn_ln.bias', 'decoder.blocks.0.attn.query.weight', 'decoder.blocks.0.attn.query.bias', 'decoder.blocks.0.attn.key.weight', 'decoder.blocks.0.attn.value.weight', 'decoder.blocks.0.attn.value.bias', 'decoder.blocks.0.attn.out.weight', 'decoder.blocks.0.attn.out.bias', 'decoder.blocks.0.cross_attn_ln.weight', 'decoder.blocks.0.cross_attn_ln.bias', 'decoder.blocks.0.cross_attn.query.weight', 'decoder.blocks.0.cross_attn.query.bias', 'decoder.blocks.0.cross_attn.key.weight', 'decoder.blocks.0.cross_attn.value.weight', 'decoder.blocks.0.cross_attn.value.bias', 'decoder.blocks.0.cross_attn.out.weight', 'decoder.blocks.0.cross_attn.out.bias', 'decoder.blocks.1.mlp_ln.weight', 'decoder.blocks.1.mlp_ln.bias', 'decoder.blocks.1.mlp.0.weight', 'decoder.blocks.1.mlp.0.bias', 'decoder.blocks.1.mlp.2.weight', 'decoder.blocks.1.mlp.2.bias', 'decoder.blocks.1.attn_ln.weight', 'decoder.blocks.1.attn_ln.bias', 'decoder.blocks.1.attn.query.weight', 'decoder.blocks.1.attn.query.bias', 'decoder.blocks.1.attn.key.weight', 'decoder.blocks.1.attn.value.weight', 'decoder.blocks.1.attn.value.bias', 'decoder.blocks.1.attn.out.weight', 'decoder.blocks.1.attn.out.bias', 'decoder.blocks.1.cross_attn_ln.weight', 'decoder.blocks.1.cross_attn_ln.bias', 'decoder.blocks.1.cross_attn.query.weight', 'decoder.blocks.1.cross_attn.query.bias', 'decoder.blocks.1.cross_attn.key.weight', 'decoder.blocks.1.cross_attn.value.weight', 'decoder.blocks.1.cross_attn.value.bias', 'decoder.blocks.1.cross_attn.out.weight', 'decoder.blocks.1.cross_attn.out.bias', 'decoder.blocks.2.mlp_ln.weight', 'decoder.blocks.2.mlp_ln.bias', 'decoder.blocks.2.mlp.0.weight', 'decoder.blocks.2.mlp.0.bias', 'decoder.blocks.2.mlp.2.weight', 'decoder.blocks.2.mlp.2.bias', 'decoder.blocks.2.attn_ln.weight', 'decoder.blocks.2.attn_ln.bias', 'decoder.blocks.2.attn.query.weight', 'decoder.blocks.2.attn.query.bias', 'decoder.blocks.2.attn.key.weight', 'decoder.blocks.2.attn.value.weight', 'decoder.blocks.2.attn.value.bias', 'decoder.blocks.2.attn.out.weight', 'decoder.blocks.2.attn.out.bias', 'decoder.blocks.2.cross_attn_ln.weight', 'decoder.blocks.2.cross_attn_ln.bias', 'decoder.blocks.2.cross_attn.query.weight', 'decoder.blocks.2.cross_attn.query.bias', 'decoder.blocks.2.cross_attn.key.weight', 'decoder.blocks.2.cross_attn.value.weight', 'decoder.blocks.2.cross_attn.value.bias', 'decoder.blocks.2.cross_attn.out.weight', 'decoder.blocks.2.cross_attn.out.bias', 'decoder.blocks.3.mlp_ln.weight', 'decoder.blocks.3.mlp_ln.bias', 'decoder.blocks.3.mlp.0.weight', 'decoder.blocks.3.mlp.0.bias', 'decoder.blocks.3.mlp.2.weight', 'decoder.blocks.3.mlp.2.bias', 'decoder.blocks.3.attn_ln.weight', 'decoder.blocks.3.attn_ln.bias', 'decoder.blocks.3.attn.query.weight', 'decoder.blocks.3.attn.query.bias', 'decoder.blocks.3.attn.key.weight', 'decoder.blocks.3.attn.value.weight', 'decoder.blocks.3.attn.value.bias', 'decoder.blocks.3.attn.out.weight', 'decoder.blocks.3.attn.out.bias', 'decoder.blocks.3.cross_attn_ln.weight', 'decoder.blocks.3.cross_attn_ln.bias', 'decoder.blocks.3.cross_attn.query.weight', 'decoder.blocks.3.cross_attn.query.bias', 'decoder.blocks.3.cross_attn.key.weight', 'decoder.blocks.3.cross_attn.value.weight', 
'decoder.blocks.3.cross_attn.value.bias', 'decoder.blocks.3.cross_attn.out.weight', 'decoder.blocks.3.cross_attn.out.bias', 'decoder.ln.weight', 'decoder.ln.bias', 'encoder.conv1.weight', 'encoder.conv1.bias', 'encoder.conv2.weight', 'encoder.conv2.bias', 'encoder.blocks.0.mlp_ln.weight', 'encoder.blocks.0.mlp_ln.bias', 'encoder.blocks.0.mlp.0.weight', 'encoder.blocks.0.mlp.0.bias', 'encoder.blocks.0.mlp.2.weight', 'encoder.blocks.0.mlp.2.bias', 'encoder.blocks.0.attn_ln.weight', 'encoder.blocks.0.attn_ln.bias', 'encoder.blocks.0.attn.query.weight', 'encoder.blocks.0.attn.query.bias', 'encoder.blocks.0.attn.key.weight', 'encoder.blocks.0.attn.value.weight', 'encoder.blocks.0.attn.value.bias', 'encoder.blocks.0.attn.out.weight', 'encoder.blocks.0.attn.out.bias', 'encoder.blocks.1.mlp_ln.weight', 'encoder.blocks.1.mlp_ln.bias', 'encoder.blocks.1.mlp.0.weight', 'encoder.blocks.1.mlp.0.bias', 'encoder.blocks.1.mlp.2.weight', 'encoder.blocks.1.mlp.2.bias', 'encoder.blocks.1.attn_ln.weight', 'encoder.blocks.1.attn_ln.bias', 'encoder.blocks.1.attn.query.weight', 'encoder.blocks.1.attn.query.bias', 'encoder.blocks.1.attn.key.weight', 'encoder.blocks.1.attn.value.weight', 'encoder.blocks.1.attn.value.bias', 'encoder.blocks.1.attn.out.weight', 'encoder.blocks.1.attn.out.bias', 'encoder.blocks.2.mlp_ln.weight', 'encoder.blocks.2.mlp_ln.bias', 'encoder.blocks.2.mlp.0.weight', 'encoder.blocks.2.mlp.0.bias', 'encoder.blocks.2.mlp.2.weight', 'encoder.blocks.2.mlp.2.bias', 'encoder.blocks.2.attn_ln.weight', 'encoder.blocks.2.attn_ln.bias', 'encoder.blocks.2.attn.query.weight', 'encoder.blocks.2.attn.query.bias', 'encoder.blocks.2.attn.key.weight', 'encoder.blocks.2.attn.value.weight', 'encoder.blocks.2.attn.value.bias', 'encoder.blocks.2.attn.out.weight', 'encoder.blocks.2.attn.out.bias', 'encoder.blocks.3.mlp_ln.weight', 'encoder.blocks.3.mlp_ln.bias', 'encoder.blocks.3.mlp.0.weight', 'encoder.blocks.3.mlp.0.bias', 'encoder.blocks.3.mlp.2.weight', 'encoder.blocks.3.mlp.2.bias', 'encoder.blocks.3.attn_ln.weight', 'encoder.blocks.3.attn_ln.bias', 'encoder.blocks.3.attn.query.weight', 'encoder.blocks.3.attn.query.bias', 'encoder.blocks.3.attn.key.weight', 'encoder.blocks.3.attn.value.weight', 'encoder.blocks.3.attn.value.bias', 'encoder.blocks.3.attn.out.weight', 'encoder.blocks.3.attn.out.bias', 'encoder.ln_post.weight', 'encoder.ln_post.bias'])
(Pdb)

Please make sure your saved model has a structure like the above dict.
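
As a quick sanity check (the file name is an assumption), a checkpoint with that structure should load via whisper.load_model() before being passed to the export script:

import whisper

# If this succeeds, the checkpoint has the {'dims', 'model_state_dict'} layout
# the export script expects.
model = whisper.load_model("my-finetuned-tiny.en.pt")
print(model.dims)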
