DmitryRyumin · Dec 23, 2024
commit efd0e58 · 1 parent d9327b7

Showing 2 changed files with 68 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -228,6 +228,23 @@ INTERSPEECH 2024 Papers: A complete collection of influential and exciting resea
                 <a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/l2-speech-bilingualism-and-code-switching.md"><img src="https://img.shields.io/badge/0-FF0000" alt="Videos"></a>
             </td>
         </tr>
+        <tr>
+            <td>
+                <a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/speaker-diarization.md">Speaker Diarization</a>
+            </td>
+            <td>
+                <a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/speaker-diarization.md"><img src="https://img.shields.io/badge/12-42BA16" alt="Papers"></a>
+            </td>
+            <td>
+                <a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/speaker-diarization.md"><img src="https://img.shields.io/badge/0-b31b1b" alt="Preprints"></a>
+            </td>
+            <td>
+                <a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/speaker-diarization.md"><img src="https://img.shields.io/badge/0-1D7FBF" alt="Open Code"></a>
+            </td>
+            <td>
+                <a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/speaker-diarization.md"><img src="https://img.shields.io/badge/0-FF0000" alt="Videos"></a>
+            </td>
+        </tr>
     </tbody>
 </table>
 

diff --git a/sections/2024/main/speaker-diarization.md b/sections/2024/main/speaker-diarization.md
@@ -0,0 +1,51 @@
+# INTERSPEECH-2024-Papers
+
+<table>
+    <tr>
+        <td><strong>Application</strong></td>
+        <td>
+            <a href="https://huggingface.co/spaces/DmitryRyumin/NewEraAI-Papers" style="float:left;">
+                <img src="https://img.shields.io/badge/🤗-NewEraAI--Papers-FFD21F.svg" alt="App" />
+            </a>
+        </td>
+    </tr>
+    <tr>
+        <td><strong>Previous Collections</strong></td>
+        <td>
+            <a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/README_2023.md">
+                <img src="http://img.shields.io/badge/INTERSPEECH-2023-0C1C43.svg" alt="Conference">
+            </a>
+        </td>
+    </tr>
+</table>
+
+<div align="center">
+    <a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/l2-speech-bilingualism-and-code-switching.md">
+        <img src="https://cdn.jsdelivr.net/gh/DmitryRyumin/NewEraAI-Papers@main/images/left.svg" width="40" alt="" />
+    </a>
+    <a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/README.md">
+        <img src="https://cdn.jsdelivr.net/gh/DmitryRyumin/NewEraAI-Papers@main/images/home.svg" width="40" alt="" />
+    </a>
+    <a href="https://github.com/DmitryRyumin/INTERSPEECH-2023-24-Papers/blob/main/sections/2024/main/speech-and-audio-analysis-and-representations.md">
+        <img src="https://cdn.jsdelivr.net/gh/DmitryRyumin/NewEraAI-Papers@main/images/right.svg" width="40" alt="" />
+    </a>
+</div>
+
+## Speaker Diarization
+
+![Section Papers](https://img.shields.io/badge/Section%20Papers-12-42BA16) ![Preprint Papers](https://img.shields.io/badge/Preprint%20Papers-0-b31b1b) ![Papers with Open Code](https://img.shields.io/badge/Papers%20with%20Open%20Code-0-1D7FBF) ![Papers with Video](https://img.shields.io/badge/Papers%20with%20Video-0-FF0000)
+
+| **Title** | **Repo** | **Paper** | **Video** |
+|-----------|:--------:|:---------:|:---------:|
+| [Investigating Confidence Estimation Measures for Speaker Diarization](https://www.isca-archive.org/interspeech_2024/chowdhury24_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/chowdhury24_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2406.17124-b31b1b.svg)](https://arxiv.org/abs/2406.17124) | :heavy_minus_sign: |
+| [Speakers Unembedded: Embedding-Free Approach to Long-Form Neural Diarization](https://www.isca-archive.org/interspeech_2024/li24x_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/li24x_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2406.18679-b31b1b.svg)](https://arxiv.org/abs/2406.18679) | :heavy_minus_sign: |
+| [On the Success and Limitations of Auxiliary Network based Word-Level End-to-End Neural Speaker Diarization](https://www.isca-archive.org/interspeech_2024/huang24d_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/huang24d_interspeech.pdf) | [![YouTube](https://img.shields.io/badge/YouTube-%23FF0000.svg?style=for-the-badge&logo=YouTube&logoColor=white)](https://www.youtube.com/watch?v=uMV1W8MmwSw) |
+| [EEND-M2F: Masked-Attention Mask Transformers for Speaker Diarization](https://www.isca-archive.org/interspeech_2024/harkonen24_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/harkonen24_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2401.12600-b31b1b.svg)](https://arxiv.org/abs/2401.12600) | :heavy_minus_sign: |
+| [AFL-Net: Integrating Audio, Facial, and Lip Modalities with a Two-Step Cross-Attention for Robust Speaker Diarization in the Wild](https://www.isca-archive.org/interspeech_2024/yin24_interspeech.html) | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://afl-net.github.io/afl-net/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/yin24_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2312.05730-b31b1b.svg)](https://arxiv.org/abs/2312.05730) | :heavy_minus_sign: |
+| [Exploiting Wavelet Scattering Transform for an Unsupervised Speaker Diarization in Deep Neural Network Framework](https://www.isca-archive.org/interspeech_2024/arya24_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/arya24_interspeech.pdf) | :heavy_minus_sign: |
+| [Variable Segment Length and Domain-Adapted Feature Optimization for Speaker Diarization](https://www.isca-archive.org/interspeech_2024/zhang24b_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/zhang24b_interspeech.pdf) | :heavy_minus_sign: |
+| [Efficient Speaker Embedding Extraction using a Twofold Sliding Window Algorithm for Speaker Diarization](https://www.isca-archive.org/interspeech_2024/choi24d_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/choi24d_interspeech.pdf) | :heavy_minus_sign: |
+| [DiarizationLM: Speaker Diarization Post-Processing with Large Language Models](https://www.isca-archive.org/interspeech_2024/wang24h_interspeech.html) | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://github.com/google/speaker-id/tree/master/DiarizationLM) <br /> [![GitHub](https://img.shields.io/github/stars/google/speaker-id?style=flat)](https://github.com/google/speaker-id) <br /> [![Hugging Face](https://img.shields.io/badge/🤗-demo-FFD21F.svg)](https://huggingface.co/spaces/diarizers-community/DiarizationLM-GGUF) <br /> [![Hugging Face](https://img.shields.io/badge/🤗-model-FFD21F.svg)](https://huggingface.co/google/DiarizationLM-8b-Fisher-v2) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/wang24h_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2401.03506-b31b1b.svg)](https://arxiv.org/abs/2401.03506) | :heavy_minus_sign: |
+| [Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-Traffic Control](https://www.isca-archive.org/interspeech_2024/blatt24_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/blatt24_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2406.13842-b31b1b.svg)](https://arxiv.org/abs/2406.13842) | :heavy_minus_sign: |
+| [On the Calibration of Powerset Speaker Diarization Models](https://www.isca-archive.org/interspeech_2024/plaquet24_interspeech.html) | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://frenchkrab.github.io/IS2024-powerset-calibration/) <br /> [![GitHub](https://img.shields.io/github/stars/FrenchKrab/IS2024-powerset-calibration?style=flat)](https://github.com/FrenchKrab/IS2024-powerset-calibration) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/plaquet24_interspeech.pdf) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2409.15885-b31b1b.svg)](https://arxiv.org/abs/2409.15885) | :heavy_minus_sign: |
+| [Specializing Self-Supervised Speech Representations for Speaker Segmentation](https://www.isca-archive.org/interspeech_2024/baroudi24_interspeech.html) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2024/baroudi24_interspeech.pdf) | :heavy_minus_sign: |