wav2vec2mdd

End-to-End Mispronunciation Detection via wav2vec2.0

We provide scripts for fine-tuning wav2vec2.0 on L2-ARCTIC (data processing, fine-tuning, and evaluation). The evaluation part comes from https://github.com/cageyoko/CTC-Attention-Mispronunciation

checkpoint/log

Install Requirements

fairseq
Flashlight Python Bindings (if you have trouble installing them, you can use another decoding pipeline)
Kaldi (required for evaluating the trained model)

Fine-tune a pre-trained model with CTC

This section covers preparing the L2-ARCTIC data manifest and fine-tuning a pre-trained wav2vec2.0 model on it.

Prepare training data manifest

$ python l2_label.py /path/to/waves --dest /manifest/path
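
The output of l2_label.py is not shown here. As a minimal sketch, assuming the usual fairseq layout, each split has a `<split>.tsv` manifest (audio root on the first line, then one `relative/path<TAB>num_samples` line per utterance) and a `<split>.phn` file with one space-separated phoneme sequence per utterance, in the same order. The helper names `count_frames` and `write_manifest` and the example paths are hypothetical and are not taken from l2_label.py.

```python
import wave
from pathlib import Path


def count_frames(wav_path):
    """Return the number of samples per channel in a PCM WAV file."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getnframes()


def write_manifest(wav_root, dest_dir, split, transcripts):
    """Write <split>.tsv and <split>.phn for the utterances in `transcripts`.

    `transcripts` maps a path relative to `wav_root` to its phoneme string,
    e.g. {"spk1/a0001.wav": "sil dh ah sil"} (hypothetical layout).
    """
    wav_root = Path(wav_root).resolve()
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    rel_paths = sorted(transcripts)
    with open(dest_dir / f"{split}.tsv", "w") as tsv, \
            open(dest_dir / f"{split}.phn", "w") as phn:
        # First manifest line: root directory that the relative paths refer to.
        tsv.write(f"{wav_root}\n")
        for rel in rel_paths:
            frames = count_frames(wav_root / rel)
            tsv.write(f"{rel}\t{frames}\n")
            # Label line i must describe utterance i of the manifest.
            phn.write(transcripts[rel] + "\n")
    return rel_paths
```

Writing both files in one loop keeps the manifest and label lines in the same order, which matters because the two files are matched by line position.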

Fine-tune a pre-trained model

Edit run.sh to match your environment:

#!/bin/sh

export CUDA_VISIBLE_DEVICES=1 # GPU device ID
DATASET=/manifest/path

FAIRSEQ_PATH=/path/to/fairseq
valid_subset=valid
model_path=/path/to/pretrain_model.pt  # do not use finetuned model
config_dir=/path/to/config 

config_name=base_finetune # adapted from https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/config/finetuning/base_10m.yaml
labels=phn
python3 $FAIRSEQ_PATH/fairseq_cli/hydra_train.py \
    distributed_training.distributed_port=0 \
    task.labels=$labels \
    task.data=$DATASET \
    dataset.valid_subset=$valid_subset \
    distributed_training.distributed_world_size=1 \
    model.w2v_path=$model_path \
    --config-dir $config_dir \
    --config-name $config_name

Then start fine-tuning:

$ sh run.sh
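
Since run.sh sets `task.data=$DATASET` and `task.labels=phn`, fairseq will look in the manifest directory for `<split>.tsv`, `<split>.phn`, and `dict.phn.txt`. Checking these before launching a long run can save time. The following is a minimal sketch of such a check, not part of this repository; `check_manifest` is a hypothetical helper, and it assumes each dictionary line has the form `<symbol> <count>`.

```python
from pathlib import Path


def check_manifest(manifest_dir, split, labels="phn"):
    """Return a list of human-readable problems found for one data split."""
    manifest_dir = Path(manifest_dir)
    tsv_path = manifest_dir / f"{split}.tsv"
    label_path = manifest_dir / f"{split}.{labels}"
    dict_path = manifest_dir / f"dict.{labels}.txt"

    problems = [
        f"missing file: {path.name}"
        for path in (tsv_path, label_path, dict_path)
        if not path.is_file()
    ]
    if problems:
        return problems

    # The first .tsv line is the audio root, so utterances start at line 2.
    utterances = tsv_path.read_text().splitlines()[1:]
    label_lines = label_path.read_text().splitlines()
    if len(utterances) != len(label_lines):
        problems.append(
            f"{tsv_path.name} has {len(utterances)} utterances but "
            f"{label_path.name} has {len(label_lines)} label lines"
        )

    # Assumed dictionary format: "<symbol> <count>"; only the symbol is used.
    vocab = {
        line.split()[0]
        for line in dict_path.read_text().splitlines()
        if line.strip()
    }
    unknown = {tok for line in label_lines for tok in line.split()} - vocab
    if unknown:
        problems.append(f"labels not in {dict_path.name}: {sorted(unknown)}")
    return problems
```

An empty list means the split passed all three checks. Running it for both `train` and the split named in `valid_subset` covers what run.sh needs.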

Evaluating a CTC model

Clone the evaluation repository locally:

git clone https://github.com/cageyoko/CTC-Attention-Mispronunciation

Edit evaluate.sh to match your environment:

#!/bin/sh

# Evaluating the CTC model
export CUDA_VISIBLE_DEVICES=0
DATASET=/manifest/path
FAIRSEQ_PATH=/path/to/fairseq

python3 $FAIRSEQ_PATH/examples/speech_recognition/infer.py $DATASET --task audio_pretraining \
--nbest 1 --path /path/to/checkpoints/checkpoint_best.pt --gen-subset test --results-path $DATASET --w2l-decoder viterbi \
--lm-weight 0 --word-score -1 --sil-weight 0 --criterion ctc --labels phn --max-tokens 640000

# Kaldi environment setup
export KALDI_ROOT=/path/to/kaldi
[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$KALDI_ROOT/tools/irstlm/bin/:$PWD:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=C

# Compute the MDD results
python3 result.py
align-text ark:ref.txt  ark:annotation.txt ark,t:- | wer_per_utt_details.pl > ref_human_detail
align-text ark:annotation.txt  ark:hypo.txt ark,t:- | wer_per_utt_details.pl > human_our_detail
align-text ark:ref.txt  ark:hypo.txt ark,t:- | wer_per_utt_details.pl > ref_our_detail
python3 ins_del_sub_cor_analysis.py
rm ref_human_detail human_our_detail ref_our_detail
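
The script above decodes with `--w2l-decoder viterbi` and no language model. For a CTC model this kind of decoding is commonly described as best-path decoding: take the highest-scoring symbol in each frame, merge consecutive repeats, and drop the blank symbol. The following is a minimal sketch of that idea in plain Python, not fairseq's implementation; `ctc_greedy_decode` is a hypothetical name, and using index 0 as the blank is an assumption.

```python
def ctc_greedy_decode(frame_scores, symbols, blank=0):
    """Collapse a per-frame score matrix into a phoneme sequence.

    `frame_scores` is a list of frames, each a list of scores with one entry
    per symbol. `blank` is the index of the CTC blank (assumed to be 0 here).
    """
    # Best path: the highest-scoring symbol index in every frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]

    decoded = []
    previous = None
    for index in best_path:
        # Emit a symbol only when it differs from the previous frame
        # (merging repeats) and is not the blank.
        if index != previous and index != blank:
            decoded.append(symbols[index])
        previous = index
    return decoded


if __name__ == "__main__":
    symbols = ["<blank>", "dh", "ah"]
    scores = [
        [0.1, 0.8, 0.1],  # dh
        [0.1, 0.7, 0.2],  # dh (repeat, merged)
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.8, 0.1],  # dh again, kept because a blank separates it
        [0.1, 0.1, 0.8],  # ah
    ]
    print(ctc_greedy_decode(scores, symbols))
```

The blank between the two runs of `dh` is what lets the decoder output the same phoneme twice in a row.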

Then run the evaluation and append its output to the file result:

$ sh evaluate.sh >> result
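
The evaluation compares three phoneme sequences per utterance: the canonical reference (ref), the human annotation, and the model hypothesis (hypo). The following is a minimal sketch of how mispronunciation detection metrics are commonly derived from such a triple. It is not the code in ins_del_sub_cor_analysis.py, and it assumes the three sequences are already aligned position by position, whereas the real pipeline uses align-text to handle insertions and deletions. The function name `mdd_metrics` is hypothetical.

```python
def mdd_metrics(canonical, annotated, predicted):
    """Count MDD outcomes for three position-aligned phoneme sequences.

    TA: correct pronunciation accepted     FR: correct pronunciation rejected
    FA: mispronunciation accepted          CD: mispronunciation detected and
    DE: mispronunciation detected, but         diagnosed as the right phoneme
        diagnosed as the wrong phoneme
    """
    if not (len(canonical) == len(annotated) == len(predicted)):
        raise ValueError("sequences must be aligned to the same length")

    counts = {"TA": 0, "FR": 0, "FA": 0, "CD": 0, "DE": 0}
    for can, ann, pred in zip(canonical, annotated, predicted):
        if ann == can:  # the speaker pronounced the phoneme correctly
            counts["TA" if pred == can else "FR"] += 1
        elif pred == can:  # mispronounced, but the model heard the canonical
            counts["FA"] += 1
        elif pred == ann:  # mispronounced, and the model agrees with the annotator
            counts["CD"] += 1
        else:  # mispronounced, flagged, but with a different phoneme
            counts["DE"] += 1

    true_rejects = counts["CD"] + counts["DE"]
    flagged = true_rejects + counts["FR"]
    mispronounced = true_rejects + counts["FA"]
    precision = true_rejects / flagged if flagged else 0.0
    recall = true_rejects / mispronounced if mispronounced else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return counts, precision, recall, f1
```

For example, with canonical `dh ah k`, annotation `d ah k`, and hypothesis `d ah g`, the first position counts as CD, the second as TA, and the third as FR.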

What's more

We plan to extend the wav2vec2-based model to provide diagnostic information in the near future. Please stay tuned.
