This repository contains SpeechBrain recipes for fine-tuning Wav2Vec2 models on a phone classification task. The following factors were analysed:
- fine-tuning Wav2Vec2,
- pre-training datasets,
- model size,
- fine-tuning datasets.
Results of this work have been published at the Interspeech 2024 conference.
- The `recipes` folder contains all SpeechBrain recipes.
- The results obtained are available in the `confusion-matrix/` folder.
For confidentiality reasons, datasets are not included. This work relies on the C2SI, CommonPhone and BREF corpora.
Details of some of the SpeechBrain recipes set up in this repository (a minimal sketch of the two pooling variants is given after the list):
- `unfrozen-cp-3k-large-accents` is the best recipe published in the Interspeech paper cited below.
- `unfrozen-cp-3k-large-accents-argmax` takes the element-wise maximum over all 6 segments (1024-dim), with a LeakyReLU activation.
- `unfrozen-cp-3k-large-concatenate` takes the concatenation of both central segments (2048-dim) as input to the classifier.
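The sketch below illustrates the two ways of building the classifier input from 6 segment embeddings of 1024 dimensions: max-pooling over all segments (the "argmax" variant) versus concatenating the two central segments. It is not the recipe code; the hidden size (512), the number of phone classes (40), and the choice of segment indices 2 and 3 as the central segments are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical input: 6 segment embeddings of size 1024 per sample,
# e.g. Wav2Vec2 outputs grouped into 6 segments.
batch, n_segments, dim = 8, 6, 1024
segments = torch.randn(batch, n_segments, dim)

# "argmax" variant: element-wise maximum over the 6 segments -> 1024-dim input.
max_pooled, _ = segments.max(dim=1)            # (batch, 1024)
argmax_classifier = nn.Sequential(
    nn.Linear(dim, 512),                       # 512 hidden units: assumption
    nn.LeakyReLU(),
    nn.Linear(512, 40),                        # 40 phone classes: assumption
)
out_max = argmax_classifier(max_pooled)

# "concatenate" variant: concatenate the two central segments -> 2048-dim input.
central = torch.cat([segments[:, 2], segments[:, 3]], dim=-1)   # (batch, 2048)
concat_classifier = nn.Sequential(
    nn.Linear(2 * dim, 512),
    nn.LeakyReLU(),
    nn.Linear(512, 40),
)
out_cat = concat_classifier(central)

print(out_max.shape, out_cat.shape)            # torch.Size([8, 40]) torch.Size([8, 40])
```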
If you use this work, please cite as:
@inproceedings{maisonneuve24,
  title     = {Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models},
  author    = {Malo Maisonneuve and Corinne Fredouille and Muriel Lalain and Alain Ghio and Virginie Woisard},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {1970--1974},
  doi       = {10.21437/Interspeech.2024-267},
  issn      = {2958-1796},
}