Export speaker verification models from NeMo to ONNX
1 parent afc81ec, commit 45c3a76
Showing 10 changed files with 448 additions and 28 deletions.
45 changes: 45 additions & 0 deletions
.github/workflows/export-nemo-speaker-verification-to-onnx.yaml
@@ -0,0 +1,45 @@
name: export-nemo-speaker-verification-to-onnx

on:
  workflow_dispatch:

concurrency:
  group: export-nemo-speaker-verification-to-onnx-${{ github.ref }}
  cancel-in-progress: true

jobs:
  export-nemo-speaker-verification-to-onnx:
    if: github.repository_owner == 'k2-fsa' || github.repository_owner == 'csukuangfj'
    name: export nemo speaker verification models to ONNX
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest]
        python-version: ["3.10"]

    steps:
      - uses: actions/checkout@v4

      - name: Setup Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Run
        shell: bash
        run: |
          cd scripts/nemo/speaker-verification
          ./run.sh
          mv -v *.onnx ../../..
      - name: Release
        uses: svenstaro/upload-release-action@v2
        with:
          file_glob: true
          file: ./*.onnx
          overwrite: true
          repo_name: k2-fsa/sherpa-onnx
          repo_token: ${{ secrets.UPLOAD_GH_SHERPA_ONNX_TOKEN }}
          tag: speaker-recongition-models
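Once the Release step has uploaded the files, each model should be reachable as a release asset. The following is a minimal sketch, assuming GitHub's usual `releases/download/<tag>/<file>` asset URL layout and the `nemo_en_<model>.onnx` file naming used by the export script. `release_asset_url` is a hypothetical helper for illustration and is not part of this commit. Note that the tag is spelled exactly as in the workflow, so any download URL has to use the same spelling.

```python
# Hypothetical helper (not part of the commit): builds download URLs
# for the models that the workflow uploads as release assets.
REPO = "k2-fsa/sherpa-onnx"         # repo_name from the workflow
TAG = "speaker-recongition-models"  # tag from the workflow, spelled as-is


def release_asset_url(model: str, repo: str = REPO, tag: str = TAG) -> str:
    """Return the assumed release-asset URL for an exported model."""
    filename = f"nemo_en_{model}.onnx"  # naming used by the export script
    return f"https://github.com/{repo}/releases/download/{tag}/{filename}"


if __name__ == "__main__":
    for name in (
        "speakerverification_speakernet",
        "titanet_large",
        "titanet_small",
    ):
        print(release_asset_url(name))
```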
@@ -0,0 +1,7 @@
# Introduction

This directory contains scripts for exporting models
from [NeMo](https://github.com/NVIDIA/NeMo/) to ONNX
so that you can use them in `sherpa-onnx`.

- [./speaker-verification](./speaker-verification) contains scripts for exporting speaker verification models.
@@ -0,0 +1,14 @@
# Introduction

This directory contains scripts for exporting speaker verification models
from [NeMo](https://github.com/NVIDIA/NeMo/) to ONNX
so that you can use them in `sherpa-onnx`.

Specifically, the following 4 models are exported for use in `sherpa-onnx`
from
[this page](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speaker_recognition/results.html#speaker-recognition-models):

- [titanet_large](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/titanet_large)
- [titanet_small](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/titanet_small)
- [speakerverification_speakernet](https://ngc.nvidia.com/catalog/models/nvidia:nemo:speakerverification_speakernet)
- [ecapa_tdnn](https://ngc.nvidia.com/catalog/models/nvidia:nemo:ecapa_tdnn)
@@ -0,0 +1,104 @@
#!/usr/bin/env python3
# Copyright 2024 Xiaomi Corp. (authors: Fangjun Kuang)

import argparse
from typing import Dict

import nemo.collections.asr as nemo_asr
import onnx
import torch


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model",
        type=str,
        required=True,
        choices=[
            "speakerverification_speakernet",
            "titanet_large",
            "titanet_small",
            "ecapa_tdnn",
        ],
    )
    return parser.parse_args()


def add_meta_data(filename: str, meta_data: Dict[str, str]):
    """Add meta data to an ONNX model. It is changed in-place.
    Args:
      filename:
        Filename of the ONNX model to be changed.
      meta_data:
        Key-value pairs.
    """
    model = onnx.load(filename)
    for key, value in meta_data.items():
        meta = model.metadata_props.add()
        meta.key = key
        meta.value = str(value)

    onnx.save(model, filename)


@torch.no_grad()
def main():
    args = get_args()
    speaker_model_config = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(
        model_name=args.model, return_config=True
    )
    preprocessor_config = speaker_model_config["preprocessor"]

    print(args.model)
    print(speaker_model_config)
    print(preprocessor_config)

    assert preprocessor_config["n_fft"] == 512, preprocessor_config

    assert (
        preprocessor_config["_target_"]
        == "nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor"
    ), preprocessor_config

    assert preprocessor_config["frame_splicing"] == 1, preprocessor_config

    speaker_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(
        model_name=args.model
    )
    speaker_model.eval()
    filename = f"nemo_en_{args.model}.onnx"
    speaker_model.export(filename)

    print(f"Adding metadata to {filename}")

    comment = "This model is from NeMo."
    url = {
        "titanet_large": "https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/titanet_large",
        "titanet_small": "https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/titanet_small",
        "speakerverification_speakernet": "https://ngc.nvidia.com/catalog/models/nvidia:nemo:speakerverification_speakernet",
        "ecapa_tdnn": "https://ngc.nvidia.com/catalog/models/nvidia:nemo:ecapa_tdnn",
    }[args.model]

    language = "English"

    meta_data = {
        "framework": "nemo",
        "language": language,
        "url": url,
        "comment": comment,
        "sample_rate": preprocessor_config["sample_rate"],
        "output_dim": speaker_model_config["decoder"]["emb_sizes"],
        "feature_normalize_type": preprocessor_config["normalize"],
        "window_size_ms": int(float(preprocessor_config["window_size"]) * 1000),
        "window_stride_ms": int(float(preprocessor_config["window_stride"]) * 1000),
        "window_type": preprocessor_config["window"],  # e.g., hann
        "feat_dim": preprocessor_config["features"],
    }
    print(meta_data)
    add_meta_data(filename=filename, meta_data=meta_data)


if __name__ == "__main__":
    main()
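The metadata written above stores the window size and stride in milliseconds, converted from the preprocessor's values in seconds. The sketch below illustrates that conversion and the sample counts it implies. The config values are assumed for illustration only and were not read from a real NeMo checkpoint; `seconds_to_ms` and `frame_geometry` are hypothetical helper names.

```python
from typing import Any, Dict

# Illustrative values only (assumed, not taken from a real NeMo model).
example_preprocessor: Dict[str, Any] = {
    "sample_rate": 16000,
    "window_size": 0.025,   # seconds
    "window_stride": 0.01,  # seconds
    "n_fft": 512,           # the export script asserts this value
}


def seconds_to_ms(value: Any) -> int:
    """Same conversion as the export script: seconds -> whole milliseconds."""
    return int(float(value) * 1000)


def frame_geometry(cfg: Dict[str, Any]) -> Dict[str, int]:
    """Derive window and stride in milliseconds and in samples."""
    window_ms = seconds_to_ms(cfg["window_size"])
    stride_ms = seconds_to_ms(cfg["window_stride"])
    return {
        "window_size_ms": window_ms,
        "window_stride_ms": stride_ms,
        "window_samples": cfg["sample_rate"] * window_ms // 1000,
        "stride_samples": cfg["sample_rate"] * stride_ms // 1000,
    }


if __name__ == "__main__":
    geometry = frame_geometry(example_preprocessor)
    print(geometry)
    # With these assumed values the window fits inside the FFT size.
    assert geometry["window_samples"] <= example_preprocessor["n_fft"]
```

Because `add_meta_data` stores every value with `str(value)`, a consumer of the ONNX file has to convert numeric fields such as `sample_rate` or `window_size_ms` back with `int(...)` after reading them.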
@@ -0,0 +1,53 @@
#!/usr/bin/env bash
# Copyright 2024 Xiaomi Corp. (authors: Fangjun Kuang)

set -ex

function install_nemo() {
  curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
  python3 get-pip.py

  pip install torch==2.1.0+cpu torchaudio==2.1.0+cpu -f https://download.pytorch.org/whl/torch_stable.html

  pip install wget text-unidecode "matplotlib>=3.3.2" onnx onnxruntime pybind11 Cython einops kaldi-native-fbank soundfile

  sudo apt-get install -q -y sox libsndfile1 ffmpeg python3-pip

  BRANCH='main'
  python3 -m pip install "git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[asr]"
}

install_nemo

model_list=(
  speakerverification_speakernet
  titanet_large
  titanet_small
  # ecapa_tdnn # causes errors, see https://github.com/NVIDIA/NeMo/issues/8168
)

for model in "${model_list[@]}"; do
  python3 ./export-onnx.py --model "$model"
done

ls -lh

function download_test_data() {
  wget -q https://github.com/csukuangfj/sr-data/raw/main/test/3d-speaker/speaker1_a_en_16k.wav
  wget -q https://github.com/csukuangfj/sr-data/raw/main/test/3d-speaker/speaker1_b_en_16k.wav
  wget -q https://github.com/csukuangfj/sr-data/raw/main/test/3d-speaker/speaker2_a_en_16k.wav
}

download_test_data

for model in "${model_list[@]}"; do
  python3 ./test-onnx.py \
    --model "nemo_en_${model}.onnx" \
    --file1 ./speaker1_a_en_16k.wav \
    --file2 ./speaker1_b_en_16k.wav

  python3 ./test-onnx.py \
    --model "nemo_en_${model}.onnx" \
    --file1 ./speaker1_a_en_16k.wav \
    --file2 ./speaker2_a_en_16k.wav
done
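The loop above runs `test-onnx.py` twice per model: once on two recordings of the same speaker and once on recordings of two different speakers. The contents of `test-onnx.py` are not shown in this part of the diff, so the following is only a sketch of one common way to score such pairs, namely cosine similarity between speaker embeddings. The function name and the toy vectors are assumptions for illustration; real embeddings would come from running the exported model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embeddings; the range is [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up 3-dimensional vectors standing in for real embeddings.
    speaker1_a = [0.9, 0.1, 0.2]
    speaker1_b = [0.8, 0.2, 0.1]
    speaker2_a = [-0.1, 0.9, -0.3]
    print("same speaker:     ", cosine_similarity(speaker1_a, speaker1_b))
    print("different speaker:", cosine_similarity(speaker1_a, speaker2_a))
```

With a scoring rule like this, the same-speaker pair is expected to score noticeably higher than the different-speaker pair; the decision threshold is model-dependent and is not specified here.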