Evaluating the Effectiveness of Transformer Layers in Wav2Vec 2.0, XLS-R, and Whisper for Speaker Identification Tasks

slinusc/speaker_identification_evaluation

Abstract

This study evaluates three speech encoder models—Wav2Vec 2.0, XLS-R, and Whisper—on speaker identification tasks. By fine-tuning the models and analyzing their layer-wise representations with SVCCA, k-means clustering, and t-SNE visualizations, we find that Wav2Vec 2.0 and XLS-R capture speaker-specific features effectively in their early layers, with fine-tuning improving both stability and performance, while Whisper performs better in its deeper layers. We also determine the optimal number of transformer layers to retain from each model when fine-tuning for speaker identification.
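The layer-similarity analysis above relies on SVCCA (Singular Vector Canonical Correlation Analysis) to compare representations across transformer layers. As a rough illustration of the idea—not the repository's actual implementation—the following self-contained NumPy sketch computes a simplified SVCCA score between two (samples × features) representation matrices: each matrix is SVD-truncated to the directions explaining most of its variance, and the mean canonical correlation between the resulting subspaces is returned. The function name, the `keep` variance threshold, and all shapes are illustrative assumptions.

```python
import numpy as np

def svcca_similarity(X, Y, keep=0.99):
    """Simplified SVCCA score between two representations of the
    same inputs, each shaped (n_samples, n_features).

    NOTE: an illustrative sketch, not the repository's code.
    """
    def svd_reduce(A, keep):
        A = A - A.mean(axis=0)                      # center features
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        # keep enough singular directions to explain `keep` of variance
        frac = np.cumsum(s**2) / np.sum(s**2)
        k = int(np.searchsorted(frac, keep)) + 1
        return U[:, :k] * s[:k]

    Xr, Yr = svd_reduce(X, keep), svd_reduce(Y, keep)
    # Canonical correlations between the two reduced subspaces are the
    # singular values of Qx^T Qy, with Qx, Qy orthonormal bases.
    Qx, _ = np.linalg.qr(Xr)
    Qy, _ = np.linalg.qr(Yr)
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(rho.mean())
```

In the layer-wise setting described in the abstract, `X` and `Y` would be hidden states of two different layers (or the same layer before and after fine-tuning) collected over a common batch of utterances; a score near 1 indicates the layers encode nearly the same subspace.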
