[runtime/gpu] Add GPU Hotwords #1860

Merged
merged 20 commits into from
May 24, 2023
f72cecd
add Efficient Conformer implementation
zwglory Dec 26, 2022
7719cec
fix trailing whitespace, formatting and semantic
zwglory Dec 26, 2022
c2e5479
Ensures consistency of forward_chunk interface and deletes all runtim…
zwglory Dec 30, 2022
48331bf
Merge branch 'wenet-e2e:main' into main
zwglory Dec 30, 2022
77553d6
[EfficientConformer] add Aishell-1 Results
zwglory Jan 3, 2023
e78ea0b
Merge branch 'wenet-e2e:main' into main
zwglory Feb 1, 2023
d0297f2
Merge branch 'wenet-e2e:main' into main
zwglory Feb 16, 2023
1b9554a
[EfficientConformer] support ONNX GPU export, add librispeech results…
zwglory Feb 21, 2023
ad7529a
Merge branch 'wenet-e2e:main' into main
zwglory Feb 21, 2023
8b12bd9
[Efficient Conformer] add model params in README.
zwglory Feb 22, 2023
39d2c09
fix trailing whitespace
zwglory Feb 22, 2023
7277209
Merge branch 'wenet-e2e:main' into main
zwglory Mar 20, 2023
6be37a7
[Efficient Conformer] remove concat after to simplify the code flow
zwglory Mar 20, 2023
573a3dc
Merge branch 'wenet-e2e:main' into main
zwglory Mar 21, 2023
d6ba7f1
[Efficient Conformer] add huggingface model download link
zwglory Mar 21, 2023
117d965
Merge branch 'wenet-e2e:main' into main
zwglory Apr 26, 2023
8151915
Add GPU hotwords.
zwglory May 18, 2023
c5b36d4
delete dockerfile, and fix from_dlpack.clone() in model_repo_hotwords
zwglory May 19, 2023
dce2743
[gpu hotwords] fix trailing whitespace.
zwglory May 19, 2023
288c22b
[gpu hotwords] remove hotwords directory and merge it into regular mo…
zwglory May 24, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 86 additions & 0 deletions runtime/gpu/README.md
@@ -192,6 +192,92 @@ Our chunksize is 16 * 4 * 10 = 640 ms, so we should care about the perf of laten
* Add language model: set `--lm_path` in the `convert_start_server.sh`. Notice the path of your language model is the path in docker.
* You may refer to `wenet/bin/recognize_onnx.py` to run inference locally. If you want to add language model locally, you may refer to [here](https://github.com/Slyne/ctc_decoder/blob/master/README.md#usage)

#### Add Hotwords
* Add hotwords: for the offline model, modify `hotwords_path` in `model_repo/scoring/config_template.pbtxt` (`None` -> `/ws/model_repo/scoring/hotwords.yaml`). For the streaming model, modify `hotwords_path` (`None` -> `/ws/model_repo/wenet/hotwords.yaml`) in `model_repo_stateful/wenet/config_template.pbtxt`.
* Then follow the steps in the Instructions section to start the server with hotwords enabled.
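The hotwords file is a plain YAML mapping from word to boosting score: `load_hotwords` in `model_repo/scoring/1/model.py` loads it with `yaml.load` and passes the resulting dict to `HotWordsScorer`. A minimal sketch (these words and scores are illustrative, not a shipped file):

```
# /ws/model_repo/scoring/hotwords.yaml (illustrative)
陈露: 10
庞清: 10
佟健: 10
```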

We use `client.py` to measure the effect of hotwords on the AISHELL-1 test set and the AISHELL-1 hotwords sub-testsets.

[AISHELL-1 Test dataset](https://www.openslr.org/33/)
* The test set contains 7176 utterances (5 hours) from 20 speakers.

| model (FP16) | RTF | CER |
|------------------------------|---------|--------|
| offline model w/o hotwords | 0.00437 | 4.6805 |
| offline model w/ hotwords | 0.00428 | 4.5841 |
| streaming model w/o hotwords | 0.01231 | 5.2777 |
| streaming model w/ hotwords | 0.01195 | 5.1850 |

[AISHELL-1 hotwords sub-testsets](https://www.modelscope.cn/datasets/speech_asr/speech_asr_aishell1_hotwords_testsets/summary)

* The test set contains 235 utterances with 187 entity words.

| model (FP16) | Latency (s) | CER | Recall | Precision | F1-score |
|----------------------------|-------------|-------|--------|-----------|----------|
| offline model w/o hotwords | 5.8673 | 13.85 | 0.27 | 0.99 | 0.43 |
| offline model w/ hotwords | 5.6601 | 11.96 | 0.47 | 0.97 | 0.63 |

Decoding results

| Label | hotwords | pred w/o hotwords | pred w/ hotwords |
|----------------------|-----------|------------------------------|------------------------------|
| 以及拥有陈露的女单项目 | 陈露 | 以及拥有**陈鹭**的女单项目 | 以及拥有**陈露**的女单项目 |
| 庞清和佟健终于可以放心地考虑退役的事情了 | 庞清<br/>佟健 | **庞青**和**董建**终于可以放心地考虑退役的事情了 | **庞清**和**佟健**终于可以放心地考虑退役的事情了 |
| 赵继宏老板电器做厨电已经三十多年了 | 赵继宏 | **赵继红**老板电器做厨店已经三十多年了 | **赵继宏**老板电器做厨电已经三十多年了 |

##### Tested ENV
* CPU: 40-core Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
* GPU: NVIDIA GeForce RTX 2080 Ti
* Acoustic model: [20210601_u2++_conformer_exp (AISHELL-1)](https://github.com/wenet-e2e/wenet/blob/main/docs/pretrained_models.md)

For more results, refer to https://huggingface.co/58AILab/wenet_u2pp_aishell1_with_hotwords/tree/main/results

##### Hotwords usage
Please follow the steps below to use hotwords boosting.
* Step 1. Initialize the HotWordsScorer
```
# If you don't want to use hotwords, set hotwords_scorer=None (default).
# vocab_list is the list of Chinese characters in the vocabulary.
hot_words = {'再接': 10, '再厉': -10, '好好学习': 100}
hotwords_scorer = HotWordsScorer(hot_words, vocab_list, is_character_based=True)
```
If `is_character_based` is True (the default), Chinese characters are first combined into words within the matching window; whenever a combination appears in the hotwords dictionary, its score is added. If `is_character_based` is False, all words within the fixed window are enumerated instead.
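Conceptually, the character-based mode works like the sketch below. This is only an illustration of the matching idea; the real scoring runs inside the C++/SWIG beam-search decoder, and the exact window semantics there may differ.

```python
# Simplified illustration of character-based hotword matching.
# Not the actual decoder implementation.
def hotword_score(chars, hot_words, window_length=4):
    """Sum the boosting scores of every hotword formed by joining
    adjacent characters within the sliding window."""
    score = 0.0
    n = len(chars)
    for i in range(n):
        # Try every span starting at i, up to window_length characters.
        for j in range(i + 1, min(i + window_length, n) + 1):
            word = ''.join(chars[i:j])
            if word in hot_words:
                score += hot_words[word]
    return score

hot_words = {'再接': 10, '再厉': -10, '好好学习': 100}
print(hotword_score(list('再接再厉'), hot_words))  # 10 + (-10) = 0.0
```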

* Step 2. Add hotwords_scorer when decoding
```
result = ctc_beam_search_decoder_batch(batch_chunk_log_prob_seq,
batch_chunk_log_probs_idx,
batch_root_trie,
batch_start,
beam_size, num_processes,
blank_id, space_id,
cutoff_prob, scorer, hotwords_scorer)
```
Please refer to [swig/test/test_zh.py](https://github.com/Slyne/ctc_decoder/blob/master/swig/test/test_zh.py#L108) for an example of decoding with hotwords boosting.

##### Hotwords evaluation

Prepare the decoding result file `with_hotwords_ali.log` and the label file `aishell1_text_hotwords`:
```
# utt \t text
BAC009S0764W0179 国务院发展研究中心市场经济研究所副所长邓郁松认为
BAC009S0764W0205 本报记者王颖春国家发改委近日发出通知
```

Run the evaluation script; separate multiple result files with `;`.
```
cd runtime/gpu/scripts
python compute_hotwords_f1.py \
--label="aishell1_text_hotwords" \
--preds="with_hotwords_ali.log;data/without_hotwords_ali.log" \
--hotword="../model_repo/scoring/hotwords.yaml"
```

Output NER file:
```
BAC009S0764W0179 国务院发展研究中心市场经济研究所副所长邓郁松认为 邓郁松
BAC009S0764W0205 本报记者王颖春国家发改委近日发出通知 王颖春
```
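The metrics in the tables above (and `compute_hotwords_f1.py`, which is not reproduced here) are entity-level. The sketch below shows one plausible way to count precision, recall, and F1 over hotword occurrences; the function name and counting scheme are assumptions, not the script's actual implementation.

```python
# Illustrative entity-level metrics over hotword occurrences.
def entity_prf(labels, preds, hotwords):
    """labels/preds: dict utt -> transcript; hotwords: list of entity words."""
    tp = fp = fn = 0
    for utt, ref in labels.items():
        hyp = preds.get(utt, '')
        for w in hotwords:
            ref_cnt = ref.count(w)
            hyp_cnt = hyp.count(w)
            tp += min(ref_cnt, hyp_cnt)          # matched occurrences
            fp += max(hyp_cnt - ref_cnt, 0)      # spurious predictions
            fn += max(ref_cnt - hyp_cnt, 0)      # missed occurrences
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

labels = {'U1': '以及拥有陈露的女单项目'}
preds = {'U1': '以及拥有陈鹭的女单项目'}
print(entity_prf(labels, preds, ['陈露']))  # 陈露 missed -> (0.0, 0.0, 0.0)
```

With substring counting like this, a hotword that appears inside a longer word is still counted; the real script may segment entities differently.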

#### Dynamic Left Chunks
For the online model, training with the dynamic left chunk option enabled helps further improve model accuracy.
Let's take a look at the table below.
43 changes: 41 additions & 2 deletions runtime/gpu/model_repo/scoring/1/model.py
@@ -1,11 +1,27 @@
# Copyright (c) 2021 NVIDIA CORPORATION
# 2023 58.com(Wuba) Inc AI Lab.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import triton_python_backend_utils as pb_utils
import numpy as np
import multiprocessing
from torch.utils.dlpack import from_dlpack
from swig_decoders import ctc_beam_search_decoder_batch, \
Scorer, PathTrie, TrieVector, map_batch
Scorer, HotWordsScorer, PathTrie, TrieVector, map_batch
import json
import os
import yaml

class TritonPythonModel:
"""Your Python model must use the same class name. Every Python model
@@ -51,6 +67,7 @@ def initialize(self, args):
self.feature_size = encoder_config['dims'][-1]

self.lm = None
self.hotwords_scorer = None
self.init_ctc_rescore(self.model_config['parameters'])
print('Initialized Rescoring!')

@@ -73,6 +90,8 @@ def init_ctc_rescore(self, parameters):
cutoff_prob = float(value)
elif key == "lm_path":
lm_path = value
elif key == "hotwords_path":
hotwords_path = value
elif key == "alpha":
alpha = float(value)
elif key == "beta":
@@ -89,6 +108,17 @@ def init_ctc_rescore(self, parameters):
if lm_path and os.path.exists(lm_path):
self.lm = Scorer(alpha, beta, lm_path, vocab)
print("Successfully load language model!")
if hotwords_path and os.path.exists(hotwords_path):
self.hotwords = self.load_hotwords(hotwords_path)
max_order = 4
if self.hotwords is not None:
for w in self.hotwords:
max_order = max(max_order, len(w))
self.hotwords_scorer = HotWordsScorer(self.hotwords, vocab,
window_length=max_order,
SPACE_ID=-2,
is_character_based=True)
print(f"Successfully load hotwords! Hotwords orders = {max_order}")
self.vocabulary = vocab
self.bidecoder = bidecoder
sos = eos = len(vocab) - 1
@@ -110,6 +140,14 @@ def load_vocab(self, vocab_file):
vocab[id] = char
return id2vocab, vocab

def load_hotwords(self, hotwords_file):
"""
load hotwords.yaml
"""
with open(hotwords_file, 'r', encoding="utf-8") as fin:
configs = yaml.load(fin, Loader=yaml.FullLoader)
return configs

def execute(self, requests):
"""`execute` must be implemented in every Python model. `execute`
function receives a list of pb_utils.InferenceRequest as the only
@@ -184,7 +222,8 @@ def execute(self, requests):
blank_id=self.blank_id,
space_id=-2,
cutoff_prob=self.cutoff_prob,
ext_scorer=self.lm)
ext_scorer=self.lm,
hotwords_scorer=self.hotwords_scorer)
all_hyps = []
all_ctc_score = []
max_seq_len = 0
12 changes: 7 additions & 5 deletions runtime/gpu/model_repo/scoring/config_template.pbtxt
@@ -1,4 +1,5 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2021 NVIDIA CORPORATION
# 2023 58.com(Wuba) Inc AI Lab.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -26,11 +27,14 @@ parameters [
value: { string_value: "#bidecoder"}
},
{
key: "lm_path"
key: "lm_path",
value: { string_value: "#lm_path"}
},
{
key: "hotwords_path",
value : { string_value: "None"}
}
]

input [
{
name: "encoder_out"
@@ -54,15 +58,13 @@ input [
dims: [-1, #beam_size]
}
]

output [
{
name: "OUTPUT0"
data_type: TYPE_STRING
dims: [1]
}
]

dynamic_batching {
preferred_batch_size: [ 16, 32 ]
}