Support streaming zipformer CTC #496

Merged: 13 commits, Dec 22, 2023

@csukuangfj csukuangfj commented Dec 20, 2023

Usage

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2

tar xvf sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2

./build/bin/sherpa-onnx \
  --debug=1 \
  --zipformer2-ctc-model=./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/ctc-epoch-20-avg-1-chunk-16-left-128.int8.onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/tokens.txt \
  ./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/DEV_T0000000000.wav \
  ./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/DEV_T0000000001.wav \
  ./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/DEV_T0000000002.wav

The output is given below:

+ ./build/bin/sherpa-onnx --debug=1 --zipformer2-ctc-model=./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/ctc-epoch-20-avg-1-chunk-16-left-128.int8.onnx --tokens=./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/tokens.txt ./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/DEV_T0000000000.wav ./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/DEV_T0000000001.wav ./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/DEV_T0000000002.wav
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./build/bin/sherpa-onnx --debug=1 --zipformer2-ctc-model=./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/ctc-epoch-20-avg-1-chunk-16-left-128.int8.onnx --tokens=./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/tokens.txt ./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/DEV_T0000000000.wav ./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/DEV_T0000000001.wav ./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/DEV_T0000000002.wav 

OnlineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="", decoder="", joiner=""), paraformer=OnlineParaformerModelConfig(encoder="", decoder=""), wenet_ctc=OnlineWenetCtcModelConfig(model="", chunk_size=16, num_left_chunks=4), zipformer2_ctc=OnlineZipformer2CtcModelConfig(model="./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/ctc-epoch-20-avg-1-chunk-16-left-128.int8.onnx"), tokens="./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/tokens.txt", num_threads=1, debug=True, provider="cpu", model_type=""), lm_config=OnlineLMConfig(model="", scale=0.5), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.2, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=True, max_active_paths=4, hotwords_score=1.5, hotwords_file="", decoding_method="greedy_search")
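
The three endpoint rules printed in the config above can be read as a small decision procedure. The sketch below assumes Kaldi-style semantics (a rule fires when all of its conditions hold, and an endpoint is declared when any rule fires); the function and argument names are made up for illustration and are not the sherpa-onnx API.

```python
from dataclasses import dataclass


@dataclass
class EndpointRule:
    must_contain_nonsilence: bool
    min_trailing_silence: float  # seconds
    min_utterance_length: float  # seconds


# Values copied from the EndpointConfig shown in the log.
RULES = [
    EndpointRule(False, 2.4, 0.0),   # rule1: long silence, even with no speech
    EndpointRule(True, 1.2, 0.0),    # rule2: shorter silence after some speech
    EndpointRule(False, 0.0, 20.0),  # rule3: utterance is at least 20 s long
]


def rule_fires(rule, contains_nonsilence, trailing_silence, utterance_length):
    """Assumed semantics: every condition of the rule must be satisfied."""
    return (
        (contains_nonsilence or not rule.must_contain_nonsilence)
        and trailing_silence >= rule.min_trailing_silence
        and utterance_length >= rule.min_utterance_length
    )


def is_endpoint(contains_nonsilence, trailing_silence, utterance_length):
    """Assumed semantics: an endpoint is detected if any rule fires."""
    return any(
        rule_fires(r, contains_nonsilence, trailing_silence, utterance_length)
        for r in RULES
    )
```

Under these assumptions, 1.3 s of trailing silence ends the segment only if something has been decoded already, while 2.4 s of silence ends it unconditionally.
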
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/online-zipformer2-ctc-model.cc:Init:262 ---zipformer2_ctc---
num_heads=4,4,4,8,4,4
num_encoder_layers=2,2,3,4,3,2
cnn_module_kernels=31,31,15,15,15,31
model_type=zipformer2
T=45
model_author=k2-fsa
version=1
comment=streaming ctc zipformer2
left_context_len=128,64,32,16,32,64
decode_chunk_len=32
value_head_dims=12,12,12,12,12,12
encoder_dims=192,256,384,512,384,256
onnx.infer=onnxruntime.quant
query_head_dims=32,32,32,32,32,32

encoder_dims: 192 256 384 512 384 256 
query_head_dims: 32 32 32 32 32 32 
value_head_dims: 12 12 12 12 12 12 
num_heads: 4 4 4 8 4 4 
num_encoder_layers: 2 2 3 4 3 2 
cnn_module_kernels: 31 31 15 15 15 31 
left_context_len: 128 64 32 16 32 64 
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/online-zipformer2-ctc-model.cc:Init:298 T: 45
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/online-zipformer2-ctc-model.cc:Init:299 decode_chunk_len_: 32
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/online-zipformer2-ctc-model.cc:Init:300 vocab_size_: 2000
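
The model metadata printed above is stored as comma-separated strings. The following sketch shows how such values could be parsed and what the chunk figures would mean in milliseconds. It rests on two assumptions that the log does not state: that `T` is the number of feature frames consumed per chunk and `decode_chunk_len` the number of frames the input advances per chunk, and that features use a 10 ms frame shift.

```python
def parse_int_list(value):
    """Turn a metadata string such as '128,64,32' into a list of ints."""
    return [int(x) for x in value.split(",")]


# Metadata values copied from the log above.
META = {
    "encoder_dims": "192,256,384,512,384,256",
    "num_encoder_layers": "2,2,3,4,3,2",
    "left_context_len": "128,64,32,16,32,64",
    "T": "45",
    "decode_chunk_len": "32",
}

FRAME_SHIFT_MS = 10  # assumption: 10 ms feature frame shift

encoder_dims = parse_int_list(META["encoder_dims"])
num_encoder_layers = parse_int_list(META["num_encoder_layers"])
left_context_len = parse_int_list(META["left_context_len"])
chunk_frames = int(META["T"])
shift_frames = int(META["decode_chunk_len"])

# Under the stated assumptions: audio covered by one chunk, and how far
# the input moves forward between consecutive chunks.
chunk_window_ms = chunk_frames * FRAME_SHIFT_MS
chunk_shift_ms = shift_frames * FRAME_SHIFT_MS
overlap_frames = chunk_frames - shift_frames

print(len(encoder_dims), sum(num_encoder_layers))
print(chunk_window_ms, chunk_shift_ms, overlap_frames)
```
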
./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/DEV_T0000000000.wav
Elapsed seconds: 1.1, Real time factor (RTF): 0.19
 对我做了介绍那么我想说的是大家如果对我的研究感兴趣
{"is_final":false, "segment":0, "start_time":0.00, "text": " 对我做了介绍那么我想说的是大家如果对我的研究感兴趣", "timestamps": [0.00, 0.52, 0.76, 0.84, 1.08, 1.24, 1.96, 2.04, 2.24, 2.36, 2.56, 2.68, 2.80, 3.28, 3.40, 3.60, 3.72, 3.84, 3.96, 4.04, 4.16, 4.28, 4.36, 4.60, 4.84], "tokens":[" 对", "我", "做", "了", "介", "绍", "那", "么", "我", "想", "说", "的", "是", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣"]}
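
The real time factor (RTF) reported above is conventionally the processing time divided by the audio duration, so values below 1 mean faster than real time. A minimal sketch follows; the audio duration is not printed in the log, so the roughly 5.8 s figure is back-computed from the rounded numbers and is only approximate.

```python
def real_time_factor(elapsed_seconds, audio_seconds):
    """RTF = processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return elapsed_seconds / audio_seconds


def implied_audio_seconds(elapsed_seconds, rtf):
    """Invert the RTF definition to estimate the audio duration."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return elapsed_seconds / rtf


# Figures for DEV_T0000000000.wav as printed (both are rounded in the log).
estimated_duration = implied_audio_seconds(1.1, 0.19)
print(round(estimated_duration, 1))
```
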

./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/DEV_T0000000001.wav
Elapsed seconds: 1, Real time factor (RTF): 0.2
 重点想谈三个问题首先就是这一轮全球金融动荡的表现
{"is_final":false, "segment":0, "start_time":0.00, "text": " 重点想谈三个问题首先就是这一轮全球金融动荡的表现", "timestamps": [0.00, 0.44, 0.76, 0.96, 1.08, 1.16, 1.40, 1.64, 2.28, 2.44, 2.76, 2.96, 3.20, 3.32, 3.44, 3.64, 3.76, 3.92, 4.04, 4.16, 4.24, 4.28, 4.32, 4.40, 4.60, 4.92], "tokens":[" 重", "点", "想", "谈", "三", "个", "问", "题", "首", "先", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "<0xE8>", "<0x8D>", "<0xA1>", "的", "表", "现"]}
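
In the result above, the `tokens` list contains `<0xE8>`, `<0x8D>`, `<0xA1>` where the `text` field shows 荡. This looks like byte fallback: a character missing from the vocabulary is emitted as its UTF-8 bytes. The helper below is a hypothetical sketch (not part of sherpa-onnx) showing how such byte tokens could be merged back into text.

```python
import re

# Matches tokens of the form <0xNN>, where NN is one hexadecimal byte.
_BYTE_TOKEN = re.compile(r"^<0x([0-9A-Fa-f]{2})>$")


def merge_byte_tokens(tokens):
    """Join tokens into a string, decoding runs of <0xNN> tokens as UTF-8."""
    pieces = []
    pending = bytearray()

    def flush():
        # Decode any accumulated bytes; invalid sequences become U+FFFD.
        if pending:
            pieces.append(pending.decode("utf-8", errors="replace"))
            pending.clear()

    for token in tokens:
        match = _BYTE_TOKEN.match(token)
        if match:
            pending.append(int(match.group(1), 16))
        else:
            flush()
            pieces.append(token)
    flush()
    return "".join(pieces)


# Excerpt of the tokens from DEV_T0000000001.wav.
print(merge_byte_tokens(["动", "<0xE8>", "<0x8D>", "<0xA1>", "的"]))
```

Each byte token carries its own timestamp in the output, so a consumer that wants one timestamp per character would likely keep the time of the first byte in each run.
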

./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/DEV_T0000000002.wav
Elapsed seconds: 0.96, Real time factor (RTF): 0.21
 所深入地分析这一次全球金融动荡背后的根源
{"is_final":false, "segment":0, "start_time":0.00, "text": " 所深入地分析这一次全球金融动荡背后的根源", "timestamps": [0.00, 0.48, 0.72, 0.84, 1.04, 1.36, 1.72, 2.00, 2.24, 2.68, 2.88, 3.00, 3.16, 3.28, 3.36, 3.40, 3.44, 3.64, 3.76, 3.88, 4.00, 4.32], "tokens":[" 所", "深", "入", "地", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "<0xE8>", "<0x8D>", "<0xA1>", "背", "后", "的", "根", "源"]}
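
Each result line is a JSON object whose `tokens` and `timestamps` arrays have the same length, so they can be paired directly. The sketch below uses a shortened excerpt of the last result. All timestamps in the output above are multiples of 0.04 s, which would be consistent with a 40 ms output frame; that interpretation is an assumption, as the PR does not state it.

```python
import json

# Shortened excerpt of the JSON printed for DEV_T0000000002.wav.
LINE = (
    '{"is_final":false, "segment":0, "start_time":0.00, '
    '"text": " 所深入地", '
    '"timestamps": [0.00, 0.48, 0.72, 0.84], '
    '"tokens":[" 所", "深", "入", "地"]}'
)

OUTPUT_FRAME_SECONDS = 0.04  # assumption inferred from the timestamp grid


def token_times(json_line):
    """Return (token, start_time_in_seconds) pairs from one result line."""
    result = json.loads(json_line)
    tokens, times = result["tokens"], result["timestamps"]
    if len(tokens) != len(times):
        raise ValueError("tokens and timestamps differ in length")
    return list(zip(tokens, times))


def frame_index(seconds):
    """Map a timestamp to an output frame index under the 40 ms assumption."""
    return round(seconds / OUTPUT_FRAME_SECONDS)


pairs = token_times(LINE)
for token, start in pairs:
    print(f"{start:5.2f}s  frame {frame_index(start):3d}  {token!r}")
```
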

TODOs

  • Add CI tests
  • Support other APIs, e.g., C/Python/Kotlin/Swift/Go/C#/JavaScript, Android, iOS, etc.

@csukuangfj csukuangfj merged commit e475e75 into k2-fsa:master Dec 22, 2023
176 of 179 checks passed
@csukuangfj csukuangfj deleted the streaming-zipformer-ctc branch December 22, 2023 05:46
XiaYucca pushed a commit to XiaYucca/sherpa-onnx that referenced this pull request Jan 9, 2025
* Support streaming zipformer CTC

* test online zipformer2 CTC

* Update doc of sherpa-onnx.cc

* Add Python APIs for streaming zipformer2 ctc

* Add Python API examples for streaming zipformer2 ctc

* Swift API for streaming zipformer2 CTC

* NodeJS API for streaming zipformer2 CTC

* Kotlin API for streaming zipformer2 CTC

* Golang API for streaming zipformer2 CTC

* C# API for streaming zipformer2 CTC

* Release v1.9.6