This code provides the official PyTorch implementation of the paper "Boosting Speech Enhancement with Clean Self-Supervised Features via Conditional Variational Autoencoders". This work has been submitted to ICASSP 2024.
- Ensure you have the VoiceBank-DEMAND dataset. You can download it here
- Set up the BigVGAN vocoder. Instructions and the pre-trained model ("bigvgan_22khz_80band") can be found here
- Transfer the __init__.py file from the ./bigvgan_dummy folder to the BigVGAN repository, then remove the ./bigvgan_dummy folder
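The file transfer in the last step can also be scripted. The following is a minimal sketch, not part of this repository: the helper name `transfer_init_file` is hypothetical, and it assumes the BigVGAN repository has already been cloned to a local directory.

```python
import shutil
from pathlib import Path


def transfer_init_file(dummy_dir, bigvgan_repo):
    """Copy __init__.py from dummy_dir into bigvgan_repo, then delete dummy_dir.

    Hypothetical helper for illustration; both paths are supplied by the caller.
    """
    dummy_dir = Path(dummy_dir)
    bigvgan_repo = Path(bigvgan_repo)
    source = dummy_dir / "__init__.py"

    # Fail early rather than deleting anything if the inputs look wrong.
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found")
    if not bigvgan_repo.is_dir():
        raise NotADirectoryError(f"{bigvgan_repo} is not a directory")

    target = bigvgan_repo / "__init__.py"
    shutil.copy2(source, target)  # copy the file and keep its metadata
    shutil.rmtree(dummy_dir)      # remove the now-unneeded dummy folder
    return target
```

A call such as `transfer_init_file("./bigvgan_dummy", "path_to_bigvgan_repo")` would mirror the manual step, with `path_to_bigvgan_repo` replaced by the actual location of the BigVGAN repository.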
# Install required packages
pip install -r requirements.txt
# Create a directory for training logs
mkdir training_log
# Set up symbolic links for your dataset and BigVGAN directories
# Replace 'path_to_dataset' with the actual path to the directory containing the VoiceBank-DEMAND dataset folder
# Replace 'path_to_bigvgan_repo' with the actual path to the BigVGAN repository directory
ln -s path_to_dataset Dataset
ln -s path_to_bigvgan_repo bigvgan
# Preprocess the dataset to prepare the spectrograms and self-supervised features
python preprocessing.py
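Before preprocessing or training, it can help to confirm that the directories created above are in place. The check below is a rough sketch under the assumption that the scripts expect `training_log`, `Dataset`, and `bigvgan` in the project root, as the setup commands suggest; the function name `missing_setup_dirs` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Directory names taken from the setup commands; assumed to live in the project root.
REQUIRED_DIRS = ("training_log", "Dataset", "bigvgan")


def missing_setup_dirs(project_root="."):
    """Return the required directory names that are absent under project_root.

    Path.is_dir() follows symbolic links, so a link pointing to a
    non-existent target is reported as missing.
    """
    root = Path(project_root)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
```

An empty list from `missing_setup_dirs()` indicates that all three entries resolve to directories; any names returned point to a setup step that still needs to be run or a symbolic link whose target is wrong.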
To train the CVAE model, use the following command:
python train.py --gpu 0 --logdir ssf_cvae
Audio samples are available in the './audio_samples' directory.
This project is licensed under the MIT License.