For a complete introduction and usage guide, please see the original repository Alexander-H-Liu/End-to-end-ASR-Pytorch.
- SpecAugment (see the sketch after this list)
- Label Smoothing
- VGG encoder with Layer Normalization
- Learning rate scheduler
- Stabilized joint acoustic and language model beam decoding via thresholding
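To give a rough idea of the SpecAugment item above, here is a minimal sketch of the masking step (Park et al., 2019), without time warping. The function name, mask widths, and mask count are illustrative assumptions, not the repository's actual settings:

```python
import torch

def spec_augment(spectrogram, freq_mask=27, time_mask=70, num_masks=2):
    """Apply frequency and time masking to a (time, freq) log-mel spectrogram.

    A minimal sketch of SpecAugment without time warping; the defaults here
    are assumed for illustration, not taken from this repository's config.
    """
    t, f = spectrogram.shape
    out = spectrogram.clone()
    for _ in range(num_masks):
        # Frequency masking: zero out a random band of consecutive channels.
        width = torch.randint(0, freq_mask + 1, (1,)).item()
        start = torch.randint(0, max(1, f - width), (1,)).item()
        out[:, start:start + width] = 0.0
        # Time masking: zero out a random span of consecutive frames.
        width = torch.randint(0, time_mask + 1, (1,)).item()
        start = torch.randint(0, max(1, t - width), (1,)).item()
        out[start:start + width, :] = 0.0
    return out
```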
Download the LibriSpeech dataset first from the OpenSLR website.
Modify `script/train.sh`, `script/train_lm.sh`, `config/librispeech_asr.yaml`, and `config/librispeech_lm.yaml` first. A GPU is required.
```bash
bash script/train.sh <asr name> <cuda id>
bash script/train_lm.sh <lm name> <cuda id>
```
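For example, `bash script/train.sh asr_baseline 0` trains an ASR experiment named `asr_baseline` on GPU 0, and `bash script/train_lm.sh lm_baseline 0` does the same for the LM (both experiment names are arbitrary placeholders chosen here for illustration).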
Modify `script/test.sh` and `config/librispeech_test.yaml` first. Increasing `--njobs` can speed up decoding, but may cause out-of-memory errors.
```bash
bash script/test.sh <asr name> <cuda id>
```
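For example, `bash script/test.sh asr_baseline 0` decodes with the ASR experiment named `asr_baseline` on GPU 0 (again, the name is a placeholder).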
This baseline consists of a character-based joint CTC-attention ASR model and an RNN LM, both trained on the LibriSpeech `train-clean-100` subset. The perplexity of the LM on the `dev-clean` set is 2.79.
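For context, perplexity is the exponentiated average negative log-likelihood per token (here, per character), so a value of 2.79 means the LM is on average as uncertain as a uniform choice among roughly 2.79 characters. A minimal sketch, assuming the NLL is accumulated in natural log:

```python
import math

def perplexity(total_nll: float, num_chars: int) -> float:
    # Exponentiate the average negative log-likelihood per character.
    # Assumes the NLL was accumulated with the natural logarithm.
    return math.exp(total_nll / num_chars)
```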
| Decoding | Dev WER (%) | Test WER (%) |
|---|---|---|
| Greedy | 14.74 | 14.80 |
| B=2 + LM | 12.89 | 12.93 |
| B=4 + LM | 11.67 | 11.74 |
| B=8 + LM | 11.35 | 11.42 |

`B` denotes the beam size used when decoding with LM fusion.