A PyTorch implementation of "Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech". This paper has been submitted to ICASSP 2024.
Arxiv: https://arxiv.org/pdf/2309.08408.pdf
This project targets real-world speech scenarios where conversations are sparsely overlapped.

There are three stages to train ActiveExtract:
- Pretrain an ASD module using TalkSet.
You can train it yourself following https://github.com/TaoRuijie/TalkNet-ASD, or load the pretrained model (Checkpoint/TalkNet_TalkSet.model).
- Pretrain ActiveExtract on the highly overlapped speech dataset VoxCeleb2-2Mix.
The ASD module is kept frozen during this stage.
- Fine-tune ActiveExtract on the sparsely overlapped speech dataset IEMOCAP-2Mix.
The ASD module is kept frozen during this stage as well.
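In stages 2 and 3 above, the ASD module is frozen while the extraction network is trained. A minimal PyTorch sketch of that pattern is shown below; the class and attribute names (`TinyASD`, `TinyExtractor`, `asd`, `head`) are illustrative placeholders, not the actual classes in this repo.

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the pretrained ASD module.
class TinyASD(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(16, 8)

    def forward(self, x):
        return self.net(x)

# Illustrative stand-in for the extraction network that wraps the ASD module.
class TinyExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.asd = TinyASD()
        self.head = nn.Linear(8, 1)

    def forward(self, x):
        return self.head(self.asd(x))

model = TinyExtractor()
# In the real setup you would first load the pretrained ASD weights, e.g.:
# model.asd.load_state_dict(torch.load("Checkpoint/TalkNet_TalkSet.model"))

# Freeze the ASD module so only the extractor's own parameters are updated.
for p in model.asd.parameters():
    p.requires_grad = False

# Pass only trainable parameters to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```

Freezing via `requires_grad = False` keeps the ASD forward pass active (its speaker-activity cues still flow to the extractor) while excluding its weights from gradient updates.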
You can find trained models in the 'Checkpoint' folder.
You can find audio samples from this link: https://activeextract.github.io/
Contact Email: [email protected]