# Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference

Source code for the EMNLP 2023 paper: *Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference*.
## Data Preparation

Take German as an example. First, download the MuST-C v1.0 archive `MUSTC_v1.0_en-de.tar.gz` to the `${MUSTC_ROOT}` path and uncompress it:

```bash
LANG=de
MUSTC_ROOT=/path/data/en-${LANG}
cd ${MUSTC_ROOT}
tar -xzvf MUSTC_v1.0_en-de.tar.gz
```
Then, run the script to prepare the data manifests:

```bash
python3 examples/speech_to_text/prep_mustc_data_raw.py \
  --data-root ${MUSTC_ROOT} --tgt-lang ${LANG}
```
The generated `.tsv` manifests are expanded with a source-language text field and doubled with an ASR task (rows whose `tgt_lang` equals `src_lang` and whose `tgt_text` is the English transcript). Here are some example rows from the `.tsv` file:
| id | audio | n_frames | tgt_text | speaker | tgt_lang | src_text | src_lang |
|---|---|---|---|---|---|---|---|
| ted_2529_66 | /xxx/en-de/data/train/wav/ted_2529.wav:9517120:61760 | 61760 | Ich hatte den Vorteil einer Perspektive von dieser Breite. | spk.2529 | de | I had the benefit of a spectrum this wide. | en |
| ted_1257_134 | /xxx/en-de/data/train/wav/ted_1257.wav:13876160:80960 | 80960 | And outside the library, I wanted to make a place to cultivate your mind. | spk.1257 | en | And outside the library, I wanted to make a place to cultivate your mind. | en |
| ... | | | | | | | |
| ted_190_62 | /xxx/en-de/data/train/wav/ted_190.wav:7045920:47360 | 47360 | Simple question: if you can't read and write, how do you manage your contact information? | spk.190 | en | Simple question: if you can't read and write, how do you manage your contact information? | en |
| ted_1771_81 | /xxx/en-de/data/train/wav/ted_1771.wav:9624320:25600 | 25600 | This is my message to you. | spk.1771 | en | This is my message to you. | en |
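For a quick sanity check, the manifest can be read with Python's standard `csv` module. The snippet below is a minimal sketch (the file name is illustrative) that separates the ST rows from the duplicated ASR rows by comparing `tgt_lang` and `src_lang`:

```python
import csv

# Minimal sketch: load a generated manifest and split the ST rows from the
# duplicated ASR rows (ASR rows have tgt_lang == src_lang == "en").
with open("train_wavecif_joint.tsv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f, delimiter="\t"))

st_rows = [r for r in rows if r["tgt_lang"] != r["src_lang"]]
asr_rows = [r for r in rows if r["tgt_lang"] == r["src_lang"]]
print(f"{len(st_rows)} ST rows, {len(asr_rows)} ASR rows")
```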
After preprocessing, the `${MUSTC_ROOT}` directory should look as follows:
```
.
├── en-de
│   ├── config_wave.yaml
│   ├── data
│   ├── dev_wavecif_joint.tsv
│   ├── docs
│   ├── segment
│   ├── spm_unigram10000_st.model
│   ├── spm_unigram10000_st.txt
│   ├── spm_unigram10000_st.vocab
│   ├── train_wavecif_joint.tsv
│   ├── tst-COMMON_wavecif_joint.tsv
│   └── tst-HE_wavecif_joint.tsv
└── MUSTC_v1.0_en-de.tar.gz
```
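If helpful, a short check (a sketch; file names taken from the tree above) can confirm that the manifests and SentencePiece files are in place before training:

```python
from pathlib import Path

# Illustrative check that the files shown in the tree above exist.
root = Path("/path/data/en-de/en-de")  # i.e. ${MUSTC_ROOT}/en-de
expected = [
    "config_wave.yaml",
    "dev_wavecif_joint.tsv",
    "train_wavecif_joint.tsv",
    "tst-COMMON_wavecif_joint.tsv",
    "tst-HE_wavecif_joint.tsv",
    "spm_unigram10000_st.model",
    "spm_unigram10000_st.vocab",
]
missing = [name for name in expected if not (root / name).exists()]
print("all files present" if not missing else f"missing: {missing}")
```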
## Training

- Train an offline model with multitask pretraining:

  ```bash
  sh fast_scripts/train_fast_pretraining.sh
  ```

- Train an offline model with the CIF module:

  ```bash
  sh fast_scripts/train_fast_offline.sh
  ```

- Train an offline model with FKD (Future-aware Knowledge Distillation):

  ```bash
  sh fast_scripts/train_fast.sh
  ```

- Evaluate the offline model:

  ```bash
  sh fast_scripts/eval_offline.sh
  ```
Note that the offline models need to be converted to support the streaming translation task:

```bash
sh fast_scripts/convert_fast_offline_to_online.sh
```

or

```bash
sh fast_scripts/convert_fast_to_online.sh
```
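For reference, such a conversion usually amounts to loading the offline checkpoint, adjusting its configuration for incremental (streaming) encoding, and saving it back. The sketch below is illustrative only: the `simul_type` field is a hypothetical placeholder, and the `convert_*` scripts above are the authoritative implementation.

```python
import torch

# Illustrative sketch of an offline-to-online checkpoint conversion.
# fairseq checkpoints keep the weights under "model" and the training
# configuration under "cfg" (older versions: "args").
ckpt = torch.load("checkpoints/offline_model.pt", map_location="cpu")
cfg = ckpt.get("cfg") or ckpt.get("args")

# Hypothetical: switch the model configuration to its streaming variant.
# The real flag names live in fast_scripts/convert_fast_offline_to_online.sh.
# cfg.model.simul_type = "online"

torch.save(ckpt, "checkpoints/online_model.pt")
```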
## Released Models

We release our models for streaming inference:

- en-de: checkpoint | config | sentencepiece.model | sentencepiece.vocab | sentencepiece.txt
- en-es: checkpoint | config | sentencepiece.model | sentencepiece.vocab | sentencepiece.txt
- en-fr: checkpoint | config | sentencepiece.model | sentencepiece.vocab | sentencepiece.txt
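The released `sentencepiece.model` files can be loaded with the `sentencepiece` Python package; the snippet below (file name and input sentence are illustrative) just verifies that a vocabulary loads and tokenizes:

```python
import sentencepiece as spm

# Illustrative: load a released SentencePiece model and tokenize a sentence.
sp = spm.SentencePieceProcessor(model_file="spm_unigram10000_st.model")
print(sp.encode("I had the benefit of a spectrum this wide.", out_type=str))
```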
## Streaming Inference
- Vanilla strategy:

  ```bash
  sh fast_scripts/eval_simul_vanilla.sh
  ```

- FAI strategy:

  ```bash
  sh fast_scripts/eval_simul_fai.sh
  ```
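For intuition, the sketch below shows one common form of the chunk-based streaming loop such strategies build on: audio arrives chunk by chunk, the model re-decodes the growing prefix, and only tokens on which two consecutive decodes agree are committed (committed tokens are never retracted). Everything here is illustrative; `translate` is a hypothetical stand-in for the converted streaming model, and the repo's scripts define the actual vanilla and FAI policies (FAI additionally exploits predicted future context before decoding, which this sketch does not implement).

```python
from typing import Callable, List

def common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Longest shared prefix of two token lists."""
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return out

def stream_decode(chunks: List[List[float]],
                  translate: Callable[[List[float]], List[str]]) -> List[str]:
    """Generic chunk-based streaming loop (illustrative only).

    `translate` is a hypothetical stand-in mapping an audio prefix to a
    token list; a real system would call the converted streaming model.
    """
    audio: List[float] = []
    committed: List[str] = []
    prev: List[str] = []
    for chunk in chunks:
        audio.extend(chunk)          # a new chunk of speech arrives
        hyp = translate(audio)       # re-decode the growing audio prefix
        stable = common_prefix(prev, hyp)
        # Commit only tokens two consecutive decodes agree on; never retract.
        committed.extend(stable[len(committed):])
        prev = hyp
    committed.extend(prev[len(committed):])  # flush at end of stream
    return committed
```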
## Citation

If the paper or the code helps you, please cite the paper as follows:
```bibtex
@inproceedings{fu-etal-2023-adapting,
    title = "Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference",
    author = "Fu, Biao and Liao, Minpeng and Fan, Kai and Huang, Zhongqiang and Chen, Boxing and Chen, Yidong and Shi, Xiaodong",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.1033",
    doi = "10.18653/v1/2023.emnlp-main.1033",
    pages = "16600--16619",
}
```