- 📝: The paper related to this project is available on arXiv.
- 🤗: The dataset used in this project is hosted on Hugging Face.
- 🌐: For more information about the project, visit our project page.
- 🏆: Explore the leaderboards on Papers With Code.
```bash
pip install -r requirements.txt
```
- 🟣: `headset_microphone` (not available for Bandwidth Extension, as it is the reference mic)
- 🟡: `throat_microphone`
- 🟢: `forehead_accelerometer`
- 🔵: `rigid_in_ear_microphone`
- 🔴: `soft_in_ear_microphone`
- 🧊: `temple_vibration_pickup`
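Each sensor corresponds to an audio column in the Hugging Face dataset. The sketch below is a quick way to inspect those channels; the dataset id `Cnam-LMSSC/vibravox`, the `speech_clean` configuration name, and the `audio.<sensor>` column pattern are assumptions based on the dataset card, so double-check them there.

```python
# Minimal sketch: list the sensor channels of one Vibravox example.
# Dataset id, config name, and the "audio.<sensor>" column pattern are
# assumptions; verify against the dataset card on Hugging Face.
from datasets import load_dataset

dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_clean",
                       split="test", streaming=True)
example = next(iter(dataset))
print([key for key in example if key.startswith("audio.")])
```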
- **EBEN for Bandwidth Extension**
  - Train and test on `speech_clean`, for recordings in a quiet environment (an inference sketch follows this task's commands):

    ```bash
    python run.py \
        lightning_datamodule=bwe \
        lightning_datamodule.sensor=throat_microphone \
        lightning_module=eben \
        lightning_module.generator.p=2 \
        +callbacks=[bwe_checkpoint] \
        ++trainer.check_val_every_n_epoch=15 \
        ++trainer.max_epochs=500
    ```
  - Train on `speech_clean` mixed with `speechless_noisy` and test on `speech_noisy`, for recordings in a noisy environment (weights initialized from vibravox_EBEN_models; a mixing sketch also follows these commands):

    ```bash
    python run.py \
        lightning_datamodule=noisybwe \
        lightning_datamodule.sensor=throat_microphone \
        lightning_module=eben \
        lightning_module.description=from_pretrained-throat_microphone \
        ++lightning_module.generator=dummy \
        ++lightning_module.generator._target_=vibravox.torch_modules.dnn.eben_generator.EBENGenerator.from_pretrained \
        ++lightning_module.generator.pretrained_model_name_or_path=Cnam-LMSSC/EBEN_throat_microphone \
        ++lightning_module.discriminator=dummy \
        ++lightning_module.discriminator._target_=vibravox.torch_modules.dnn.eben_discriminator.DiscriminatorEBENMultiScales.from_pretrained \
        ++lightning_module.discriminator.pretrained_model_name_or_path=Cnam-LMSSC/DiscriminatorEBENMultiScales_throat_microphone \
        +callbacks=[bwe_checkpoint] \
        ++callbacks.checkpoint.monitor=validation/torchmetrics_stoi/synthetic \
        ++trainer.check_val_every_n_epoch=15 \
        ++trainer.max_epochs=200
    ```
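After training (or directly from the released checkpoints named in the command above), the generator can be used standalone for bandwidth extension. A minimal inference sketch: the class path and checkpoint id are taken verbatim from the Hydra overrides above, while the `cut_to_valid_length` helper and the forward call returning the enhanced waveform plus its sub-band decomposition are assumptions based on the model card, not guaranteed API.

```python
# Hedged sketch: bandwidth extension with a pretrained EBEN generator.
# Class path and checkpoint id come from the Hydra overrides above;
# cut_to_valid_length and the forward return values are assumptions.
import torch
import torchaudio
from vibravox.torch_modules.dnn.eben_generator import EBENGenerator

generator = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_throat_microphone")
generator.eval()

audio, fs = torchaudio.load("throat_recording.wav")  # hypothetical input file
audio = torchaudio.functional.resample(audio, fs, 16_000)

with torch.no_grad():
    cut = generator.cut_to_valid_length(audio[None, ...])  # assumed helper
    enhanced, _ = generator(cut)  # assumed: (waveform, sub-band decomposition)

torchaudio.save("enhanced.wav", enhanced.squeeze(0), 16_000)
```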
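For the noisy setup, `speech_clean` is mixed with `speechless_noisy` at training time. The repo's datamodule handles this, but conceptually the mixing boils down to scaling the noise to a target signal-to-noise ratio, as in this illustrative (non-repo) sketch:

```python
# Illustrative sketch of the noisy-training recipe: scale a speechless
# noise segment so it sits at a target SNR under the clean speech.
import torch

def mix_at_snr(speech: torch.Tensor, noise: torch.Tensor, snr_db: float) -> torch.Tensor:
    noise = noise[..., : speech.shape[-1]]            # crop noise to speech length
    speech_power = speech.pow(2).mean()
    noise_power = noise.pow(2).mean().clamp_min(1e-10)
    # Solve snr_db = 10 * log10(speech_power / (scale**2 * noise_power)) for scale
    scale = torch.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

noisy = mix_at_snr(torch.randn(1, 16_000), torch.randn(1, 16_000), snr_db=5.0)
```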
- **wav2vec2 for Speech to Phoneme**
  - Train and test on `speech_clean`, for recordings in a quiet environment (weights initialized from facebook/wav2vec2-base-fr-voxpopuli; a transcription sketch follows this task's commands):

    ```bash
    python run.py \
        lightning_datamodule=stp \
        lightning_datamodule.sensor=throat_microphone \
        lightning_module=wav2vec2_for_stp \
        lightning_module.optimizer.lr=1e-5 \
        ++trainer.max_epochs=10
    ```
  - Train and test on `speech_noisy`, for recordings in a noisy environment (weights initialized from vibravox_phonemizers):

    ```bash
    python run.py \
        lightning_datamodule=stp \
        lightning_datamodule.sensor=throat_microphone \
        lightning_datamodule.subset=speech_noisy \
        lightning_datamodule/data_augmentation=aggressive \
        lightning_module=wav2vec2_for_stp \
        lightning_module.wav2vec2_for_ctc.pretrained_model_name_or_path=Cnam-LMSSC/phonemizer_throat_microphone \
        lightning_module.optimizer.lr=1e-6 \
        ++trainer.max_epochs=30
    ```
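Outside of Lightning, the fine-tuned phonemizers are ordinary `transformers` CTC checkpoints. A hedged greedy-decoding sketch, assuming the `Cnam-LMSSC/phonemizer_throat_microphone` checkpoint referenced above ships the usual processor/model pair:

```python
# Hedged sketch: phoneme transcription with a fine-tuned wav2vec2 checkpoint.
# Checkpoint id comes from the command above; greedy CTC decoding is standard.
import torch
import torchaudio
from transformers import AutoModelForCTC, AutoProcessor

processor = AutoProcessor.from_pretrained("Cnam-LMSSC/phonemizer_throat_microphone")
model = AutoModelForCTC.from_pretrained("Cnam-LMSSC/phonemizer_throat_microphone")

audio, fs = torchaudio.load("throat_recording.wav")  # hypothetical input file
audio = torchaudio.functional.resample(audio, fs, 16_000)

inputs = processor(audio.squeeze(0), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1)))
```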
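Phonemizer checkpoints are typically compared by phoneme error rate (PER). One common shortcut, assumed here rather than taken from the repo's evaluation code, is to score phoneme strings character-wise with `torchmetrics`:

```python
# Hedged sketch: PER approximated as a character error rate over phoneme
# strings (one phoneme per character). The repo's metric wiring may differ.
from torchmetrics.text import CharErrorRate

per = CharErrorRate()
predictions = ["bɔ̃ʒuʁ"]  # model output
references = ["bɔ̃ʒuʁ"]   # ground-truth phonemization
print(per(predictions, references))  # tensor(0.) for an exact match
```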
- **ECAPA2 for Speaker Verification**
  - Test the model on `speech_clean` (scoring sketches follow this task's commands):

    ```bash
    python run.py \
        lightning_datamodule=spkv \
        lightning_module=ecapa2 \
        logging=csv \
        ++trainer.limit_train_batches=0 \
        ++trainer.limit_val_batches=0
    ```
  - Test on `speech_clean` mixed with `speechless_noisy` (representative of `speech_noisy`), using the exact same pairs that were used on `speech_clean`, which allows a direct comparison of results:

    ```bash
    python run.py \
        lightning_datamodule=spkv \
        lightning_datamodule.dataset_name=Cnam-LMSSC/vibravox_mixed_for_spkv \
        lightning_datamodule.subset=speech_noisy_mixed \
        lightning_module=ecapa2 \
        logging=csv \
        ++trainer.limit_train_batches=0 \
        ++trainer.limit_val_batches=0
    ```
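ECAPA2 maps each utterance to a fixed-size embedding, and a verification trial reduces to comparing the cosine similarity of two embeddings against a threshold. A self-contained sketch of that decision step (random tensors stand in for real embeddings; 192 is a placeholder dimension):

```python
# Sketch of the verification decision: cosine similarity between two
# utterance embeddings compared against a threshold.
import torch
import torch.nn.functional as F

emb_enrol = torch.randn(1, 192)  # placeholder: enrolment-utterance embedding
emb_test = torch.randn(1, 192)   # placeholder: test-utterance embedding

score = F.cosine_similarity(emb_enrol, emb_test).item()
threshold = 0.5  # would be tuned on validation trials, e.g. at the EER point
print("same speaker" if score >= threshold else "different speaker", round(score, 3))
```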
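Since the mixed subset reuses the exact trial pairs from `speech_clean`, results are directly comparable via the equal error rate. A generic EER computation from raw scores and labels (standard ROC-based estimate, not the repository's evaluation code):

```python
# Generic EER from verification scores and binary labels (1 = same speaker).
import numpy as np

def equal_error_rate(scores: np.ndarray, labels: np.ndarray) -> float:
    order = np.argsort(-scores)                       # sweep thresholds high -> low
    labels = labels[order].astype(float)
    fnr = 1.0 - np.cumsum(labels) / labels.sum()      # miss rate per threshold
    fpr = np.cumsum(1.0 - labels) / (1.0 - labels).sum()  # false-alarm rate
    idx = int(np.argmin(np.abs(fnr - fpr)))           # point where the rates cross
    return float((fnr[idx] + fpr[idx]) / 2.0)

scores = np.array([0.9, 0.8, 0.4, 0.3])
labels = np.array([1, 1, 0, 0])
print(equal_error_rate(scores, labels))  # 0.0: perfectly separated scores
```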