For a complete introduction and usage guide, please see the original repository Alexander-H-Liu/End-to-end-ASR-Pytorch.
- Added layer-wise transfer learning
- Supports multiple development sets
- Supports FreqCNN (frequency-divided CNN extractor) for whispered speech recognition
- Supports the DLHLP corpus for the course Deep Learning for Human Language Processing
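The frequency-divided CNN idea behind FreqCNN can be sketched roughly as follows: split the mel-spectrogram into a low- and a high-frequency band and give each band its own convolutional stack. This is a minimal illustrative sketch, not the repository's actual implementation; the class name `FreqCNN`, the split point, and the channel sizes are all assumptions.

```python
import torch
import torch.nn as nn

class FreqCNN(nn.Module):
    """Illustrative frequency-divided CNN extractor (hypothetical names/sizes).

    The mel-spectrogram is split at `split` into low/high frequency bands,
    each band goes through its own small conv stack, and the two outputs
    are concatenated along the channel axis.
    """

    def __init__(self, n_mels=80, split=40, channels=32):
        super().__init__()
        self.split = split

        def band_stack():
            # Two 3x3 convs, each downsampling time and frequency by 2.
            return nn.Sequential(
                nn.Conv2d(1, channels, kernel_size=3, stride=2, padding=1),
                nn.ReLU(),
                nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1),
                nn.ReLU(),
            )

        self.low = band_stack()   # mel bins [0, split)
        self.high = band_stack()  # mel bins [split, n_mels)

    def forward(self, x):
        # x: (batch, time, n_mels)
        x = x.unsqueeze(1)  # -> (batch, 1, time, n_mels)
        lo = self.low(x[..., : self.split])
        hi = self.high(x[..., self.split :])
        y = torch.cat([lo, hi], dim=1)  # merge band-specific channels
        b, c, t, f = y.shape
        # Flatten channels x frequency into one feature dim per frame.
        return y.permute(0, 2, 1, 3).reshape(b, t, c * f)

feats = torch.randn(4, 100, 80)        # batch of 4, 100 frames, 80 mel bins
out = FreqCNN()(feats)
print(tuple(out.shape))                # -> (4, 25, 640)
```

The per-band stacks let low- and high-frequency regions learn separate filters, which is the motivation for frequency-weighted approaches in whispered speech, where energy is distributed differently across the spectrum.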
Modify `script/train.sh`, `script/train_lm.sh`, `config/librispeech_asr.yaml`, and `config/librispeech_lm.yaml` first. A GPU is required.
```bash
bash script/train.sh <asr name> <cuda id>
bash script/train_lm.sh <lm name> <cuda id>
```
Modify `script/test.sh` and `config/librispeech_test.yaml` first. Increasing `--njobs` can speed up the decoding process, but might cause OOM.

```bash
bash script/test.sh <asr name> <cuda id>
```
This baseline is composed of a character-based joint CTC-attention ASR model and an RNN-LM, both trained on the LibriSpeech train-clean-100 set. The perplexity of the LM on the dev-clean set is 3.66.
| Decoding | DEV WER(%) | TEST WER(%) |
| --- | --- | --- |
| Greedy | 25.4 | 25.9 |
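For reference, the LM perplexity quoted above is simply the exponential of the average per-character negative log-likelihood. A minimal sketch, with made-up NLL values purely for illustration:

```python
import math

# Perplexity = exp(mean per-token negative log-likelihood, in nats).
# These NLL values are invented for illustration, not from the model.
nll = [1.2, 1.4, 1.3]
ppl = math.exp(sum(nll) / len(nll))
print(round(ppl, 2))  # -> 3.67
```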
This baseline is composed of a character-based joint CTC-attention ASR model and an RNN-LM which were trained on the DLHLP training set.
| Decoding | DEV CER/WER(%) | TEST CER/WER(%) |
| --- | --- | --- |
| SpecAugment + Greedy | 1.0 / 3.4 | 0.8 / 3.1 |
| SpecAugment + Beam=5 | 0.8 / 2.9 | 0.7 / 2.6 |
- CTC beam decoding (testing)
- SpecAugment (will be released)
- Multiple corpora training (will be released)
- Support of WSJ and Switchboard dataset (under construction)
- Combination of CTC and RNN-LM: RNN transducer (under construction)
@inproceedings{liu2019adversarial,
title={Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model},
author={Liu, Alexander and Lee, Hung-yi and Lee, Lin-shan},
  booktitle={International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2019},
organization={IEEE}
}
@inproceedings{alex2019sequencetosequence,
title={Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding},
author={Alexander H. Liu and Tzu-Wei Sung and Shun-Po Chuang and Hung-yi Lee and Lin-shan Lee},
  booktitle={International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2020},
organization={IEEE}
}
@inproceedings{chang2020endtoend,
title={End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Pseudo Whisper Pre-training},
author={Heng-Jui Chang and Alexander H. Liu and Hung-yi Lee and Lin-shan Lee},
booktitle={Spoken Language Technology Workshop (SLT)},
year={2021},
organization={IEEE}
}