DmitryRyumin committed Dec 23, 2024
1 parent d9327b7 commit efd0e58
Showing 2 changed files with 68 additions and 0 deletions.
17 changes: 17 additions & 0 deletions README.md
@@ -228,6 +228,23 @@ INTERSPEECH 2024 Papers: A complete collection of influential and exciting resea
<a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/l2-speech-bilingualism-and-code-switching.md"><img src="https://img.shields.io/badge/0-FF0000" alt="Videos"></a>
</td>
</tr>
<tr>
<td>
<a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/speaker-diarization.md">Speaker Diarization</a>
</td>
<td>
<a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/speaker-diarization.md"><img src="https://img.shields.io/badge/12-42BA16" alt="Papers"></a>
</td>
<td>
<a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/speaker-diarization.md"><img src="https://img.shields.io/badge/7-b31b1b" alt="Preprints"></a>
</td>
<td>
<a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/speaker-diarization.md"><img src="https://img.shields.io/badge/0-1D7FBF" alt="Open Code"></a>
</td>
<td>
<a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/speaker-diarization.md"><img src="https://img.shields.io/badge/1-FF0000" alt="Videos"></a>
</td>
</tr>
</tbody>
</table>

51 changes: 51 additions & 0 deletions sections/2024/main/speaker-diarization.md
@@ -0,0 +1,51 @@
# INTERSPEECH-2024-Papers

<table>
<tr>
<td><strong>Application</strong></td>
<td>
<a href="https://huggingface.co/spaces/DmitryRyumin/NewEraAI-Papers" style="float:left;">
<img src="https://img.shields.io/badge/🤗-NewEraAI--Papers-FFD21F.svg" alt="App" />
</a>
</td>
</tr>
<tr>
<td><strong>Previous Collections</strong></td>
<td>
<a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/README_2023.md">
<img src="https://img.shields.io/badge/INTERSPEECH-2023-0C1C43.svg" alt="Conference">
</a>
</td>
</tr>
</table>

<div align="center">
<a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/l2-speech-bilingualism-and-code-switching.md">
<img src="https://cdn.jsdelivr.net/gh/DmitryRyumin/NewEraAI-Papers@main/images/left.svg" width="40" alt="" />
</a>
<a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/README.md">
<img src="https://cdn.jsdelivr.net/gh/DmitryRyumin/NewEraAI-Papers@main/images/home.svg" width="40" alt="" />
</a>
<a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/speech-and-audio-analysis-and-representations.md">
<img src="https://cdn.jsdelivr.net/gh/DmitryRyumin/NewEraAI-Papers@main/images/right.svg" width="40" alt="" />
</a>
</div>

## Speaker Diarization

![Section Papers](https://img.shields.io/badge/Section%20Papers-12-42BA16) ![Preprint Papers](https://img.shields.io/badge/Preprint%20Papers-7-b31b1b) ![Papers with Open Code](https://img.shields.io/badge/Papers%20with%20Open%20Code-0-1D7FBF) ![Papers with Video](https://img.shields.io/badge/Papers%20with%20Video-1-FF0000)

| **Title** | **Repo** | **Paper** | **Video** |
|-----------|:--------:|:---------:|:---------:|
| [Investigating Confidence Estimation Measures for Speaker Diarization](https://www.isca-archive.org/interspeech_2024/chowdhury24_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/chowdhury24_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2406.17124-b31b1b.svg)](https://arxiv.org/abs/2406.17124) | :heavy_minus_sign: |
| [Speakers Unembedded: Embedding-Free Approach to Long-Form Neural Diarization](https://www.isca-archive.org/interspeech_2024/li24x_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/li24x_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2406.18679-b31b1b.svg)](https://arxiv.org/abs/2406.18679) | :heavy_minus_sign: |
| [On the Success and Limitations of Auxiliary Network based Word-Level End-to-End Neural Speaker Diarization](https://www.isca-archive.org/interspeech_2024/huang24d_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/huang24d_interspeech.pdf) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=uMV1W8MmwSw) |
| [EEND-M2F: Masked-Attention Mask Transformers for Speaker Diarization](https://www.isca-archive.org/interspeech_2024/harkonen24_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/harkonen24_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2401.12600-b31b1b.svg)](https://arxiv.org/abs/2401.12600) | :heavy_minus_sign: |
| [AFL-Net: Integrating Audio, Facial, and Lip Modalities with a Two-Step Cross-Attention for Robust Speaker Diarization in the Wild](https://www.isca-archive.org/interspeech_2024/yin24_interspeech.html) | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://afl-net.github.io/afl-net/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/yin24_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2312.05730-b31b1b.svg)](https://arxiv.org/abs/2312.05730) | :heavy_minus_sign: |
| [Exploiting Wavelet Scattering Transform for an Unsupervised Speaker Diarization in Deep Neural Network Framework](https://www.isca-archive.org/interspeech_2024/arya24_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/arya24_interspeech.pdf) | :heavy_minus_sign: |
| [Variable Segment Length and Domain-Adapted Feature Optimization for Speaker Diarization](https://www.isca-archive.org/interspeech_2024/zhang24b_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/zhang24b_interspeech.pdf) | :heavy_minus_sign: |
| [Efficient Speaker Embedding Extraction using a Twofold Sliding Window Algorithm for Speaker Diarization](https://www.isca-archive.org/interspeech_2024/choi24d_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/choi24d_interspeech.pdf) | :heavy_minus_sign: |
| [DiarizationLM: Speaker Diarization Post-Processing with Large Language Models](https://www.isca-archive.org/interspeech_2024/wang24h_interspeech.html) | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://github.com/google/speaker-id/tree/master/DiarizationLM) <br /> [![GitHub](https://img.shields.io/github/stars/google/speaker-id?style=flat)](https://github.com/google/speaker-id) <br /> [![Hugging Face](https://img.shields.io/badge/🤗-demo-FFD21F.svg)](https://huggingface.co/spaces/diarizers-community/DiarizationLM-GGUF) <br /> [![Hugging Face](https://img.shields.io/badge/🤗-model-FFD21F.svg)](https://huggingface.co/google/DiarizationLM-8b-Fisher-v2) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/wang24h_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2401.03506-b31b1b.svg)](https://arxiv.org/abs/2401.03506) | :heavy_minus_sign: |
| [Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-Traffic Control](https://www.isca-archive.org/interspeech_2024/blatt24_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/blatt24_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2406.13842-b31b1b.svg)](https://arxiv.org/abs/2406.13842) | :heavy_minus_sign: |
| [On the Calibration of Powerset Speaker Diarization Models](https://www.isca-archive.org/interspeech_2024/plaquet24_interspeech.html) | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://frenchkrab.github.io/IS2024-powerset-calibration/) <br /> [![GitHub](https://img.shields.io/github/stars/FrenchKrab/IS2024-powerset-calibration?style=flat)](https://github.com/FrenchKrab/IS2024-powerset-calibration) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/plaquet24_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2409.15885-b31b1b.svg)](https://arxiv.org/abs/2409.15885) | :heavy_minus_sign: |
| [Specializing Self-Supervised Speech Representations for Speaker Segmentation](https://www.isca-archive.org/interspeech_2024/baroudi24_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/baroudi24_interspeech.pdf) | :heavy_minus_sign: |
