
# Offline ST systems for IWSLT 2021

The goal of the Offline Speech Translation Task is to examine automatic methods for translating audio speech in one language into text in the target language, and to answer the question: is the cascaded solution still the dominant technology in spoken language translation? (official website)

Here, we release our systems submitted to IWSLT 2021 and show how to evaluate them. For more details about the model structure and training data, see our system report.

```bibtex
@inproceedings{zhao2021iwslt,
  author       = {Chengqi Zhao and Zhicheng Liu and Jian Tong and Tao Wang
                  and Mingxuan Wang and Rong Ye and Qianqian Dong and Jun Cao and Lei Li},
  booktitle    = {Proceedings of the 18th International Conference on Spoken Language Translation},
  title        = {The Volctrans Neural Speech Translation System for IWSLT 2021},
  year         = {2021},
}
```

## Results & Models

The major training data for the offline ST task is MuST-C V2. We report results on the dev/test sets of both MuST-C V1 and V2 for reference.

### ASR

| Testset | Transformer ASR (WER) [asr.tgz] |
|---|---|
| MuST-C v2 dev | 5.2 [hypo] |
| MuST-C v2 tst-COM | 5.7 [hypo] |
| MuST-C v1 dev | 10.6 [hypo] |
| MuST-C v1 tst-COM | 7.4 [hypo] |
| iwslt.tst2020 | - [hypo] |
| iwslt.tst2021 | - [hypo] |

### MT

We report detokenized BLEU (computed with the sacreBLEU toolkit) for the MT models. Here "w/o punc. & lc" denotes input without punctuation and lowercased, and "w/ punc. & tc" denotes input with punctuation and truecased.

| System | MuST-C v2 dev | MuST-C v2 tst-COM | MuST-C v1 dev | MuST-C v1 tst-COM |
|---|---|---|---|---|
| MT (w/o punc. & lc) [mt1.tgz] | 32.0 [hypo] [hypo_notag] [bleu] | 34.1 [hypo] [hypo_notag] [bleu] | 32.2 [hypo] [hypo_notag] [bleu] | 34.0 [hypo] [hypo_notag] [bleu] |
| MT (w/ punc. & tc) [mt1.tgz] | 33.8 [hypo] [hypo_notag] [bleu] | 36.2 [hypo] [hypo_notag] [bleu] | 33.7 [hypo] [hypo_notag] [bleu] | 35.9 [hypo] [hypo_notag] [bleu] |
| ensemble MT (w/o punc. & lc) [mt1.tgz, mt2.tgz, mt3.tgz, mt4.tgz] | 33.8 [hypo] [hypo_notag] [bleu] | 35.2 [hypo] [hypo_notag] [bleu] | 33.8 [hypo] [hypo_notag] [bleu] | 35.3 [hypo] [hypo_notag] [bleu] |
| ensemble MT (w/ punc. & tc) [mt1.tgz, mt2.tgz, mt3.tgz, mt4.tgz] | 34.7 [hypo] [hypo_notag] [bleu] | 36.7 [hypo] [hypo_notag] [bleu] | 34.6 [hypo] [hypo_notag] [bleu] | 36.2 [hypo] [hypo_notag] [bleu] |

### ST

We report detokenized BLEU (computed with the sacreBLEU toolkit) for the ST models.

The BLEU scores on iwslt.tst2020 and iwslt.tst2021 were provided by the IWSLT 2021 organizers. Note that tst2021 has two references, so the iwslt.tst2021 results are reported as "BLEU ref2 / BLEU ref1 / BLEU both".
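Such multi-reference scores can be recomputed with the sacreBLEU CLI, which accepts several reference files as positional arguments. A minimal sketch, with hypothetical file names for the hypothesis and the two 2021 references:

```bash
# Hypothetical file names: tst2021.hypo.notag.txt is a detokenized hypothesis,
# ref1.de.txt / ref2.de.txt are the two 2021 references.
sacrebleu ref2.de.txt ref1.de.txt -i tst2021.hypo.notag.txt  # "BLEU both"
sacrebleu ref2.de.txt -i tst2021.hypo.notag.txt              # "BLEU ref2"
sacrebleu ref1.de.txt -i tst2021.hypo.notag.txt              # "BLEU ref1"
```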

| # | System | MuST-C v2 dev | MuST-C v2 tst-COM | MuST-C v1 dev | MuST-C v1 tst-COM | iwslt.tst2020 | iwslt.tst2021 |
|---|---|---|---|---|---|---|---|
| 1 | cascade (ASR -> MT) | 29.9 [hypo] [hypo_notag] [bleu] | 32.1 [hypo] [hypo_notag] [bleu] | 28.4 [hypo] [hypo_notag] [bleu] | 31.3 [hypo] [hypo_notag] [bleu] | 21.0 [hypo] [hypo_notag] | 20.3/16.4/27.7 [hypo] [hypo_notag] |
| 2 | cascade (ASR -> ensemble MT) | 31.7 [hypo] [hypo_notag] [bleu] | 33.3 [hypo] [hypo_notag] [bleu] | 30.1 [hypo] [hypo_notag] [bleu] | 32.3 [hypo] [hypo_notag] [bleu] | 22.2 [hypo] [hypo_notag] | 21.8/17.1/29.5 [hypo] [hypo_notag] |
| 3 | direct ST base [st0.tgz] | 23.9 [hypo] [hypo_notag] [bleu] | 23.9 [hypo] [hypo_notag] [bleu] | - | - | - | - |
| 4 | direct ST [st1.tgz] | 28.9 [hypo] [hypo_notag] [bleu] | 29.9 [hypo] [hypo_notag] [bleu] | 27.9 [hypo] [hypo_notag] [bleu] | 29.5 [hypo] [hypo_notag] [bleu] | - [hypo] [hypo_notag] | - [hypo] [hypo_notag] |
| 5 | direct ST++ [st2.tgz] | 29.6 [hypo] [hypo_notag] [bleu] | 30.4 [hypo] [hypo_notag] [bleu] | 28.3 [hypo] [hypo_notag] [bleu] | 29.7 [hypo] [hypo_notag] [bleu] | 21.6 [hypo] [hypo_notag] | 20.4/17.0/28.1 [hypo] [hypo_notag] |
| 6 | direct ST++* [st3.tgz] | 30.0 [hypo] [hypo_notag] [bleu] | 30.2 [hypo] [hypo_notag] [bleu] | 28.2 [hypo] [hypo_notag] [bleu] | 29.6 [hypo] [hypo_notag] [bleu] | - [hypo] [hypo_notag] | - [hypo] [hypo_notag] |
| 7 | ensemble (4, 5, 6) | 30.4 [hypo] [hypo_notag] [bleu] | 31.2 [hypo] [hypo_notag] [bleu] | 29.0 [hypo] [hypo_notag] [bleu] | 30.6 [hypo] [hypo_notag] [bleu] | 22.4 [hypo] [hypo_notag] | 21.1/17.5/29.2 [hypo] [hypo_notag] |
| 8 | direct ST + fbank2vec-512 [f2v_st.tgz] | 28.7 [hypo] [hypo_notag] [bleu] | 29.1 [hypo] [hypo_notag] [bleu] | 26.7 [hypo] [hypo_notag] [bleu] | 27.6 [hypo] [hypo_notag] [bleu] | - | - |
| 9 | PMTL-ST + fbank2vec-768 [f2v_pmtl.tgz] | 29.6 [hypo] [hypo_notag] [bleu] | 29.6 [hypo] [hypo_notag] [bleu] | 26.9 [hypo] [hypo_notag] [bleu] | 28.1 [hypo] [hypo_notag] [bleu] | - | - |
| 10 | PMTL-ST + fbank2vec-768 ++ [f2v_pmtlplus.tgz] | 30.8 [hypo] [hypo_notag] [bleu] | 31.1 [hypo] [hypo_notag] [bleu] | 28.8 [hypo] [hypo_notag] [bleu] | 30.1 [hypo] [hypo_notag] [bleu] | - | - |
| 11 | PMTL-ST + fbank2vec-768 ++* [f2v_pmtlplus2.tgz] | 30.9 [hypo] [hypo_notag] [bleu] | 31.1 [hypo] [hypo_notag] [bleu] | 28.8 [hypo] [hypo_notag] [bleu] | 30.1 [hypo] [hypo_notag] [bleu] | 23.5 [hypo] [hypo_notag] | 21.6/18.2/30.6 [hypo] [hypo_notag] |
| 12 | ensemble (10, 11) | 31.0 [hypo] [hypo_notag] [bleu] | 31.1 [hypo] [hypo_notag] [bleu] | 28.8 [hypo] [hypo_notag] [bleu] | 30.1 [hypo] [hypo_notag] [bleu] | - | - |
| 13 | ensemble (9, 10, 11) | 31.4 [hypo] [hypo_notag] [bleu] | 31.5 [hypo] [hypo_notag] [bleu] | 29.3 [hypo] [hypo_notag] [bleu] | 30.6 [hypo] [hypo_notag] [bleu] | - | - |
| 14 | ensemble (8, 9, 10, 11) | 31.6 [hypo] [hypo_notag] [bleu] | 31.8 [hypo] [hypo_notag] [bleu] | 29.5 [hypo] [hypo_notag] [bleu] | 30.8 [hypo] [hypo_notag] [bleu] | 24.3 [hypo] [hypo_notag] | 21.7/18.7/31.3 [hypo] [hypo_notag] |

## How to reproduce

Here we describe only how to reproduce the hypotheses and BLEU scores above. For more details about the model structure and training data, see our system report; for how to train end-to-end ST models with NeurST, see the speech-to-text recipe.

### MT

Step 1: download and untar the checkpoint for MT (e.g., mt1/).
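For example, assuming the mt1.tgz archive from the table above has been downloaded to the working directory:

```bash
# Unpack the MT checkpoint archive; this produces the mt1/ directory
# expected by the evaluation script in Step 2.
tar -xzf mt1.tgz
```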

Step 2: run

```bash
./scripts/evaluate_mt.sh mustc-v2-dev mt1/ ./
```

It will automatically download the test files, translate them, and generate the following files:

- `mustc_v2.0_en-de.dev.de.hypo.txt`: the translations
- `mustc_v2.0_en-de.dev.de.hypo.notag.txt`: the translations without tags such as applause, laughter, etc.
- `mustc_v2.0_en-de.dev.bleu.txt`: the BLEU scores
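The BLEU file can also be cross-checked by hand with the sacreBLEU CLI. A minimal sketch; the reference file name below is an assumption (use whatever dev reference the script downloaded):

```bash
# Hypothetical reference file name: re-score the untagged hypothesis
# against the MuST-C v2 dev reference with sacreBLEU.
sacrebleu mustc_v2.0_en-de.dev.de.txt -i mustc_v2.0_en-de.dev.de.hypo.notag.txt
```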

### ST (Cascade & E2E)

Step 1: download and untar the checkpoints for ST (e.g., asr/ and mt1/ for the cascade system, st3/ for the end-to-end system)

Step 2: run

```bash
# cascade
./scripts/evaluate_cascade.sh mustc-v2-dev asr/ mt1/ ./

# end-to-end
./scripts/evaluate_e2e.sh mustc-v2-dev st3/ ./
```

It will also generate the hypothesis files and BLEU scores. The available testsets are listed below (see the usage example after the list):

- mustc-v2-dev
- mustc-v2-tst
- mustc-v1-dev
- mustc-v1-tst
- tst2020
- tst2021
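For example, to score the end-to-end model on the MuST-C v1 tst-COMMON set instead of the v2 dev set:

```bash
# Same script as above, different testset identifier.
./scripts/evaluate_e2e.sh mustc-v1-tst st3/ ./
```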

For the IWSLT official testsets (tst2020 & tst2021), only hypothesis files are produced.

Additionally, for model ensembles, we can simply provide multiple checkpoint paths separated by commas, e.g., st1/,st2/,st3/ for the ensemble ST model.
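For instance, to reproduce the ensemble of systems 4, 5, and 6 from the ST table:

```bash
# Ensemble decoding: pass the checkpoint directories as a single
# comma-separated argument (no spaces around the commas).
./scripts/evaluate_e2e.sh mustc-v2-dev st1/,st2/,st3/ ./
```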