diff --git a/README.md b/README.md
index 414b65c..466ba6a 100644
--- a/README.md
+++ b/README.md
@@ -228,6 +228,23 @@ INTERSPEECH 2024 Papers: A complete collection of influential and exciting resea
 Videos
+
+
+Speaker Diarization
+
+Papers
+
+Preprints
+
+Open Code
+
+Videos
+
+
diff --git a/sections/2024/main/speaker-diarization.md b/sections/2024/main/speaker-diarization.md
new file mode 100644
index 0000000..e3818fb
--- /dev/null
+++ b/sections/2024/main/speaker-diarization.md
@@ -0,0 +1,51 @@
+# INTERSPEECH-2024-Papers
+
+
+
+
+
+
+
+
+
+
+Application
+
+App
+
+Previous Collections
+
+Conference
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+## Speaker Diarization
+
+![Section Papers](https://img.shields.io/badge/Section%20Papers-12-42BA16) ![Preprint Papers](https://img.shields.io/badge/Preprint%20Papers-0-b31b1b) ![Papers with Open Code](https://img.shields.io/badge/Papers%20with%20Open%20Code-0-1D7FBF) ![Papers with Video](https://img.shields.io/badge/Papers%20with%20Video-0-FF0000)
+
+| **Title** | **Repo** | **Paper** | **Video** |
+|-----------|:--------:|:---------:|:---------:|
+| [Investigating Confidence Estimation Measures for Speaker Diarization](https://www.isca-archive.org/interspeech_2024/chowdhury24_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/chowdhury24_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2406.17124-b31b1b.svg)](https://arxiv.org/abs/2406.17124) | :heavy_minus_sign: |
+| [Speakers Unembedded: Embedding-Free Approach to Long-Form Neural Diarization](https://www.isca-archive.org/interspeech_2024/li24x_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/li24x_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2406.18679-b31b1b.svg)](https://arxiv.org/abs/2406.18679) | :heavy_minus_sign: |
+| [On the Success and Limitations of Auxiliary Network based Word-Level End-to-End Neural Speaker Diarization](https://www.isca-archive.org/interspeech_2024/huang24d_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/huang24d_interspeech.pdf) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=uMV1W8MmwSw) |
+| [EEND-M2F: Masked-Attention Mask Transformers for Speaker Diarization](https://www.isca-archive.org/interspeech_2024/harkonen24_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/harkonen24_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2401.12600-b31b1b.svg)](https://arxiv.org/abs/2401.12600) | :heavy_minus_sign: |
+| [AFL-Net: Integrating Audio, Facial, and Lip Modalities with a Two-Step Cross-Attention for Robust Speaker Diarization in the Wild](https://www.isca-archive.org/interspeech_2024/yin24_interspeech.html) | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://afl-net.github.io/afl-net/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/yin24_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2312.05730-b31b1b.svg)](https://arxiv.org/abs/2312.05730) | :heavy_minus_sign: |
+| [Exploiting Wavelet Scattering Transform for an Unsupervised Speaker Diarization in Deep Neural Network Framework](https://www.isca-archive.org/interspeech_2024/arya24_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/arya24_interspeech.pdf) | :heavy_minus_sign: |
+| [Variable Segment Length and Domain-Adapted Feature Optimization for Speaker Diarization](https://www.isca-archive.org/interspeech_2024/zhang24b_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/zhang24b_interspeech.pdf) | :heavy_minus_sign: |
+| [Efficient Speaker Embedding Extraction using a Twofold Sliding Window Algorithm for Speaker Diarization](https://www.isca-archive.org/interspeech_2024/choi24d_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/choi24d_interspeech.pdf) | :heavy_minus_sign: |
+| [DiarizationLM: Speaker Diarization Post-Processing with Large Language Models](https://www.isca-archive.org/interspeech_2024/wang24h_interspeech.html) | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://github.com/google/speaker-id/tree/master/DiarizationLM) <br /> [![GitHub](https://img.shields.io/github/stars/google/speaker-id?style=flat)](https://github.com/google/speaker-id) <br /> [![Hugging Face](https://img.shields.io/badge/🤗-demo-FFD21F.svg)](https://huggingface.co/spaces/diarizers-community/DiarizationLM-GGUF) <br /> [![Hugging Face](https://img.shields.io/badge/🤗-model-FFD21F.svg)](https://huggingface.co/google/DiarizationLM-8b-Fisher-v2) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/wang24h_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2401.03506-b31b1b.svg)](https://arxiv.org/abs/2401.03506) | :heavy_minus_sign: |
+| [Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-Traffic Control](https://www.isca-archive.org/interspeech_2024/blatt24_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/blatt24_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2406.13842-b31b1b.svg)](https://arxiv.org/abs/2406.13842) | :heavy_minus_sign: |
+| [On the Calibration of Powerset Speaker Diarization Models](https://www.isca-archive.org/interspeech_2024/plaquet24_interspeech.html) | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://frenchkrab.github.io/IS2024-powerset-calibration/) <br /> [![GitHub](https://img.shields.io/github/stars/FrenchKrab/IS2024-powerset-calibration?style=flat)](https://github.com/FrenchKrab/IS2024-powerset-calibration) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/plaquet24_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2409.15885-b31b1b.svg)](https://arxiv.org/abs/2409.15885) | :heavy_minus_sign: |
+| [Specializing Self-Supervised Speech Representations for Speaker Segmentation](https://www.isca-archive.org/interspeech_2024/baroudi24_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/baroudi24_interspeech.pdf) | :heavy_minus_sign: |