From 44371bd07496c638dc9e5bbbd681581e6506a354 Mon Sep 17 00:00:00 2001 From: MaloMn Date: Thu, 22 Feb 2024 10:43:06 +0100 Subject: [PATCH] Updated links to current archived urls for ISCA papers --- README.md | 2284 ++++++++++++++++++++++++++--------------------------- 1 file changed, 1142 insertions(+), 1142 deletions(-) diff --git a/README.md b/README.md index 75d7037..15771fd 100644 --- a/README.md +++ b/README.md @@ -282,12 +282,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1686 | Multimodal Personality Traits Assessment (MuPTA) Corpus: The Impact of Spontaneous and Read Speech | [![GitHub](https://img.shields.io/github/stars/DmitryRyumin/OCEANAI?style=flat)](https://github.com/DmitryRyumin/OCEANAI)
[![Documentation Status](https://readthedocs.org/projects/oceanai/badge/?version=latest)](https://oceanai.readthedocs.io/en/latest/?badge=latest)
[![PyPI](https://img.shields.io/pypi/v/oceanai)](https://pypi.org/project/oceanai/)
[![MuPTA](https://img.shields.io/badge/MuPTA-dataset-20BEFF.svg)](https://hci.nw.ru/en/pages/mupta-corpus) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ryumina23_interspeech.pdf) | -| 1049 | MOCKS 1.0: Multilingual Open Custom Keyword Spotting Testset | [![Hugging Face](https://img.shields.io/badge/🤗-MOCKS-FFD21F.svg)](https://huggingface.co/datasets/voiceintelligenceresearch/MOCKS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pudo23_interspeech.pdf) | -| 2150 | MD3: The Multi-Dialect Dataset of Dialogues | [![Kaggle](https://img.shields.io/badge/kaggle-dataset-20BEFF.svg)](https://www.kaggle.com/datasets/jacobeis99/md3en) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/eisenstein23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11355-b31b1b.svg)](https://arxiv.org/abs/2305.11355) | -| 2279 | MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation | [![GitHub](https://img.shields.io/github/stars/facebookresearch/muavic?style=flat)](https://github.com/facebookresearch/muavic) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/anwar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00628-b31b1b.svg)](https://arxiv.org/abs/2303.00628) | -| 1828 | Thai Dialect Corpus and Transfer-based Curriculum Learning Investigation for Dialect Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/suwanbandit23_interspeech.pdf) | -| 2351 | HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xiao23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11252-b31b1b.svg)](https://arxiv.org/abs/2306.11252) | +| 1686 | Multimodal Personality Traits Assessment (MuPTA) Corpus: The Impact of Spontaneous and Read Speech | [![GitHub](https://img.shields.io/github/stars/DmitryRyumin/OCEANAI?style=flat)](https://github.com/DmitryRyumin/OCEANAI)
[![Documentation Status](https://readthedocs.org/projects/oceanai/badge/?version=latest)](https://oceanai.readthedocs.io/en/latest/?badge=latest)
[![PyPI](https://img.shields.io/pypi/v/oceanai)](https://pypi.org/project/oceanai/)
[![MuPTA](https://img.shields.io/badge/MuPTA-dataset-20BEFF.svg)](https://hci.nw.ru/en/pages/mupta-corpus) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ryumina23_interspeech.pdf) | +| 1049 | MOCKS 1.0: Multilingual Open Custom Keyword Spotting Testset | [![Hugging Face](https://img.shields.io/badge/🤗-MOCKS-FFD21F.svg)](https://huggingface.co/datasets/voiceintelligenceresearch/MOCKS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pudo23_interspeech.pdf) | +| 2150 | MD3: The Multi-Dialect Dataset of Dialogues | [![Kaggle](https://img.shields.io/badge/kaggle-dataset-20BEFF.svg)](https://www.kaggle.com/datasets/jacobeis99/md3en) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/eisenstein23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11355-b31b1b.svg)](https://arxiv.org/abs/2305.11355) | +| 2279 | MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation | [![GitHub](https://img.shields.io/github/stars/facebookresearch/muavic?style=flat)](https://github.com/facebookresearch/muavic) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/anwar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00628-b31b1b.svg)](https://arxiv.org/abs/2303.00628) | +| 1828 | Thai Dialect Corpus and Transfer-based Curriculum Learning Investigation for Dialect Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/suwanbandit23_interspeech.pdf) | +| 2351 | HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xiao23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11252-b31b1b.svg)](https://arxiv.org/abs/2306.11252) | @@ -299,12 +299,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 749 | Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03594-b31b1b.svg)](https://arxiv.org/abs/2306.03594) | -| 1292 | Speech Synthesis with Self-Supervisedly Learnt Prosodic Representations | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ttsbylzc.github.io/ttsdemo202303/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23n_interspeech.pdf) | -| 1317 | EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00648-b31b1b.svg)](https://arxiv.org/abs/2306.00648) | -| 806 | Laughter Synthesis using Pseudo Phonetic Tokens with a Large-Scale In-the-Wild Laughter Corpus | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://aria-k-alethia.github.io/2023laughter-demo/)
[![GitHub](https://img.shields.io/github/stars/Aria-K-Alethia/laughter-synthesis?style=flat)](https://github.com/Aria-K-Alethia/laughter-synthesis)
[![Laughterscape](https://img.shields.io/badge/Laughterscape-corpus-20BEFF.svg)](https://sites.google.com/site/shinnosuketakamichi/research-topics/laughter_corpus) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xin23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12442-b31b1b.svg)](https://arxiv.org/abs/2305.12442) | -| 2270 | Explicit Intensity Control for Accented Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ttslr.github.io/Ai-TTS/)
[![GitHub](https://img.shields.io/github/stars/ttslr/Ai-TTS?style=flat)](https://github.com/ttslr/Ai-TTS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23u_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15364-b31b1b.svg)](https://arxiv.org/abs/2210.15364) | -| 834 | Comparing Normalizing Flows and Diffusion Models for Prosody and Acoustic Modelling in Text-to-Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23o_interspeech.pdf) | +| 749 | Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03594-b31b1b.svg)](https://arxiv.org/abs/2306.03594) | +| 1292 | Speech Synthesis with Self-Supervisedly Learnt Prosodic Representations | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ttsbylzc.github.io/ttsdemo202303/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23n_interspeech.pdf) | +| 1317 | EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00648-b31b1b.svg)](https://arxiv.org/abs/2306.00648) | +| 806 | Laughter Synthesis using Pseudo Phonetic Tokens with a Large-Scale In-the-Wild Laughter Corpus | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://aria-k-alethia.github.io/2023laughter-demo/)
[![GitHub](https://img.shields.io/github/stars/Aria-K-Alethia/laughter-synthesis?style=flat)](https://github.com/Aria-K-Alethia/laughter-synthesis)
[![Laughterscape](https://img.shields.io/badge/Laughterscape-corpus-20BEFF.svg)](https://sites.google.com/site/shinnosuketakamichi/research-topics/laughter_corpus) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xin23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12442-b31b1b.svg)](https://arxiv.org/abs/2305.12442) | +| 2270 | Explicit Intensity Control for Accented Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ttslr.github.io/Ai-TTS/)
[![GitHub](https://img.shields.io/github/stars/ttslr/Ai-TTS?style=flat)](https://github.com/ttslr/Ai-TTS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23u_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15364-b31b1b.svg)](https://arxiv.org/abs/2210.15364) | +| 834 | Comparing Normalizing Flows and Diffusion Models for Prosody and Acoustic Modelling in Text-to-Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23o_interspeech.pdf) |
@@ -316,12 +316,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2484 | Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/duquenne23_interspeech.pdf) | -| 1063 | Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13204-b31b1b.svg)](https://arxiv.org/abs/2305.13204) | -| 648 | StyleS2ST: Zero-Shot Style Transfer for Direct Speech-to-Speech Translation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://styles2st.github.io/StyleS2ST/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/song23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17732-b31b1b.svg)](https://arxiv.org/abs/2305.17732) | -| 1767 | Joint Speech Translation and Named Entity Recognition | [![GitHub](https://img.shields.io/github/stars/hlt-mt/FBK-fairseq?style=flat)](https://github.com/hlt-mt/FBK-fairseq/blob/master/fbk_works/JOINT_ST_NER2023.md) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gaido23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.11987-b31b1b.svg)](https://arxiv.org/abs/2210.11987) | -| 2050 | Analysis of Acoustic Information in End-to-End Spoken Language Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sant23_interspeech.pdf) | -| 2004 | LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model using Neural Transducers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23oa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02809-b31b1b.svg)](https://arxiv.org/abs/2211.02809) | +| 2484 | Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/duquenne23_interspeech.pdf) | +| 1063 | Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13204-b31b1b.svg)](https://arxiv.org/abs/2305.13204) | +| 648 | StyleS2ST: Zero-Shot Style Transfer for Direct Speech-to-Speech Translation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://styles2st.github.io/StyleS2ST/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/song23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17732-b31b1b.svg)](https://arxiv.org/abs/2305.17732) | +| 1767 | Joint Speech Translation and Named Entity Recognition | [![GitHub](https://img.shields.io/github/stars/hlt-mt/FBK-fairseq?style=flat)](https://github.com/hlt-mt/FBK-fairseq/blob/master/fbk_works/JOINT_ST_NER2023.md) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gaido23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.11987-b31b1b.svg)](https://arxiv.org/abs/2210.11987) | +| 2050 | Analysis of Acoustic Information in End-to-End Spoken Language Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sant23_interspeech.pdf) | +| 2004 | LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model using Neural Transducers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23oa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02809-b31b1b.svg)](https://arxiv.org/abs/2211.02809) |
@@ -333,12 +333,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1213 | DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models | [![GitHub](https://img.shields.io/github/stars/pyf98/DPHuBERT?style=flat)](https://github.com/pyf98/DPHuBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17651-b31b1b.svg)](https://arxiv.org/abs/2305.17651) | -| 1040 | Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations | [![GitHub](https://img.shields.io/github/stars/salah-zaiem/augmentations_adaptation?style=flat)](https://github.com/salah-zaiem/augmentations_adaptation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zaiem23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00481-b31b1b.svg)](https://arxiv.org/abs/2306.00481) | -| 387 | Dual Acoustic Linguistic Self-Supervised Representation Learning for Cross-Domain Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23e_interspeech.pdf) | -| 2166 | O-1: Self-Training with Oracle and 1-best Hypothesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/baskar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2308.07486-b31b1b.svg)](https://arxiv.org/abs/2308.07486) | -| 822 | MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets | [![GitHub](https://img.shields.io/github/stars/ddlBoJack/MT4SSL?style=flat)](https://github.com/ddlBoJack/MT4SSL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ma23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.07321-b31b1b.svg)](https://arxiv.org/abs/2211.07321) | -| 1802 | Comparing Self-Supervised Pre-Training and Semi-Supervised Training for Speech Recognition in Languages with Weak Language Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lamyeemui23_interspeech.pdf) | +| 1213 | DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models | [![GitHub](https://img.shields.io/github/stars/pyf98/DPHuBERT?style=flat)](https://github.com/pyf98/DPHuBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/peng23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17651-b31b1b.svg)](https://arxiv.org/abs/2305.17651) | +| 1040 | Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations | [![GitHub](https://img.shields.io/github/stars/salah-zaiem/augmentations_adaptation?style=flat)](https://github.com/salah-zaiem/augmentations_adaptation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zaiem23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00481-b31b1b.svg)](https://arxiv.org/abs/2306.00481) | +| 387 | Dual Acoustic Linguistic Self-Supervised Representation Learning for Cross-Domain Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23e_interspeech.pdf) | +| 2166 | O-1: Self-Training with Oracle and 1-best Hypothesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/baskar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2308.07486-b31b1b.svg)](https://arxiv.org/abs/2308.07486) | +| 822 | MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets | [![GitHub](https://img.shields.io/github/stars/ddlBoJack/MT4SSL?style=flat)](https://github.com/ddlBoJack/MT4SSL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ma23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.07321-b31b1b.svg)](https://arxiv.org/abs/2211.07321) | +| 1802 | Comparing Self-Supervised Pre-Training and Semi-Supervised Training for Speech Recognition in Languages with Weak Language Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lamyeemui23_interspeech.pdf) |
@@ -350,12 +350,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1781 | Chinese EFL Learners' Perception of English Prosodic Focus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23da_interspeech.pdf) | -| 315 | Pitch Accent Variation and the Interpretation of Rising and Falling Intonation in American English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sostarics23_interspeech.pdf) | -| 1033 | Tonal Coarticulation as a Cue for Upcoming Prosodic Boundary | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kuang23_interspeech.pdf) | -| 2116 | Alignment of Beat Gestures and Prosodic Prominence in German | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/repp23_interspeech.pdf) | -| 1454 | Creak Prevalence and Prosodic Context in Australian English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/white23_interspeech.pdf) | -| 1651 | Speech Reduction: Position within French Prosodic Structure | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bodur23_interspeech.pdf) | +| 1781 | Chinese EFL Learners' Perception of English Prosodic Focus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23da_interspeech.pdf) | +| 315 | Pitch Accent Variation and the Interpretation of Rising and Falling Intonation in American English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sostarics23_interspeech.pdf) | +| 1033 | Tonal Coarticulation as a Cue for Upcoming Prosodic Boundary | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kuang23_interspeech.pdf) | +| 2116 | Alignment of Beat Gestures and Prosodic Prominence in German | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/repp23_interspeech.pdf) | +| 1454 | Creak Prevalence and Prosodic Context in Australian English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/white23_interspeech.pdf) | +| 1651 | Speech Reduction: Position within French Prosodic Structure | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bodur23_interspeech.pdf) | @@ -367,10 +367,10 @@ Contributions to improve the completeness of this list are greatly appreciated. 
| :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 637 | Transvelar Nasal Coupling Contributing to Speaker Characteristics in Non-nasal Vowels | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhu23d_interspeech.pdf) | -| 286 | Speech Synthesis from Articulatory Movements Recorded by Real-time MRI | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/otani23_interspeech.pdf) | -| 2283 | The ART of Conversation: Measuring Phonetic Convergence and Deliberate Imitation in L2-Speech with a Siamese RNN | [![GitHub](https://img.shields.io/github/stars/byronthecoder/S-RNN-4-ART?style=flat)](https://github.com/byronthecoder/S-RNN-4-ART) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yuan23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05088-b31b1b.svg)](https://arxiv.org/abs/2306.05088) | -| 1933 | Did You See that? Exploring the Role of Vision in the Development of Consonant Feature Contrasts in Children with Cochlear Implants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mahshie23_interspeech.pdf) | +| 637 | Transvelar Nasal Coupling Contributing to Speaker Characteristics in Non-nasal Vowels | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhu23d_interspeech.pdf) | +| 286 | Speech Synthesis from Articulatory Movements Recorded by Real-time MRI | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/otani23_interspeech.pdf) | +| 2283 | The ART of Conversation: Measuring Phonetic Convergence and Deliberate Imitation in L2-Speech with a Siamese RNN | [![GitHub](https://img.shields.io/github/stars/byronthecoder/S-RNN-4-ART?style=flat)](https://github.com/byronthecoder/S-RNN-4-ART) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yuan23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05088-b31b1b.svg)](https://arxiv.org/abs/2306.05088) | +| 1933 | Did You See that? Exploring the Role of Vision in the Development of Consonant Feature Contrasts in Children with Cochlear Implants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mahshie23_interspeech.pdf) |
@@ -382,12 +382,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2017 | Automatic Assessments of Dysarthric Speech: the Usability of Acoustic-Phonetic Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vanbemmel23_interspeech.pdf) | -| 1455 | Classification of Multi-class Vowels and Fricatives from Patients Having Amyotrophic Lateral Sclerosis with Varied Levels of Dysarthria Severity | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/venkatathirumalakumar23_interspeech.pdf) | -| 1627 | Parameter-efficient Dysarthric Speech Recognition using Adapter Fusion and Householder Transformation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/qi23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07090-b31b1b.svg)](https://arxiv.org/abs/2306.07090) | -| 2481 | Few-Shot Dysarthric Speech Recognition with Text-to-Speech Data Augmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hermann23_interspeech.pdf)
[![idiap](https://img.shields.io/badge/idiap.ch.5055-FF6A00.svg)](http://publications.idiap.ch/index.php/publications/show/5055) | -| 1921 | Latent Phrase Matching for Dysarthric Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yee23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05446-b31b1b.svg)](https://arxiv.org/abs/2306.05446) | -| 173 | Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification | [![GitHub](https://img.shields.io/github/stars/juice500ml/dysarthria-gop?style=flat)](https://github.com/juice500ml/dysarthria-gop) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yeo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18392-b31b1b.svg)](https://arxiv.org/abs/2305.18392) | +| 2017 | Automatic Assessments of Dysarthric Speech: the Usability of Acoustic-Phonetic Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vanbemmel23_interspeech.pdf) | +| 1455 | Classification of Multi-class Vowels and Fricatives from Patients Having Amyotrophic Lateral Sclerosis with Varied Levels of Dysarthria Severity | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/venkatathirumalakumar23_interspeech.pdf) | +| 1627 | Parameter-efficient Dysarthric Speech Recognition using Adapter Fusion and Householder Transformation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/qi23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07090-b31b1b.svg)](https://arxiv.org/abs/2306.07090) | +| 2481 | Few-Shot Dysarthric Speech Recognition with Text-to-Speech Data Augmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hermann23_interspeech.pdf)
[![idiap](https://img.shields.io/badge/idiap.ch.5055-FF6A00.svg)](http://publications.idiap.ch/index.php/publications/show/5055) | +| 1921 | Latent Phrase Matching for Dysarthric Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yee23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05446-b31b1b.svg)](https://arxiv.org/abs/2306.05446) | +| 173 | Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification | [![GitHub](https://img.shields.io/github/stars/juice500ml/dysarthria-gop?style=flat)](https://github.com/juice500ml/dysarthria-gop) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yeo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18392-b31b1b.svg)](https://arxiv.org/abs/2305.18392) |
@@ -399,10 +399,10 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1562 | CQNV: A Combination of Coarsely Quantized Bitstream and Neural Vocoder for Low Rate Speech Coding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zheng23c_interspeech.pdf) | -| 1234 | Target Speech Extraction with Conditional Diffusion Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kamo23_interspeech.pdf) | -| 883 | Towards Fully Quantized Neural Networks For Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/ssi-research/FQSE?style=flat)](https://github.com/ssi-research/FQSE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cohen23_interspeech.pdf) | -| 980 | Complex Image Generation SwinTransformer Network for Audio Denoising | [![GitHub](https://img.shields.io/github/stars/YoushanZhang/CoxImgSwinTransformer?style=flat)](https://github.com/YoushanZhang/CoxImgSwinTransformer) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23p_interspeech.pdf) | +| 1562 | CQNV: A Combination of Coarsely Quantized Bitstream and Neural Vocoder for Low Rate Speech Coding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zheng23c_interspeech.pdf) | +| 1234 | Target Speech Extraction with Conditional Diffusion Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kamo23_interspeech.pdf) | +| 883 | Towards Fully Quantized Neural Networks For Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/ssi-research/FQSE?style=flat)](https://github.com/ssi-research/FQSE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cohen23_interspeech.pdf) | +| 980 | Complex Image Generation SwinTransformer Network for Audio Denoising | [![GitHub](https://img.shields.io/github/stars/YoushanZhang/CoxImgSwinTransformer?style=flat)](https://github.com/YoushanZhang/CoxImgSwinTransformer) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23p_interspeech.pdf) | @@ -414,81 +414,81 @@ Contributions to improve the completeness of this list are greatly appreciated. 
| :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2118 | Using Text Injection to Improve Recognition of Personal Identifiers in Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/blau23_interspeech.pdf) | -| 837 | Investigating Wav2Vec2 Context Representations and the Effects of Fine-Tuning, a Case-Study of a Finnish Model | [![GitHub](https://img.shields.io/github/stars/aalto-speech/Wav2vec2Interpretation?style=flat)](https://github.com/aalto-speech/Wav2vec2Interpretation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/grosz23_interspeech.pdf) | -| 872 | Transformer-based Speech Recognition Models for Oral History Archives in English, German, and Czech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lehecka23_interspeech.pdf) | -| 177 | Iteratively Improving Speech Recognition and Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://demosamplesites.github.io/IterativeASR_VC/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/singh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15055-b31b1b.svg)](https://arxiv.org/abs/2305.15055) | -| 2001 | LABERT: A Combination of Local Aggregation and Self-Supervised Speech Representation Learning for Detecting Informative Hidden Units in Low-Resource ASR Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fatehi23_interspeech.pdf)
[![nottingham-repo](https://img.shields.io/badge/nottingham-22183323-1A296B.svg)](https://nottingham-repository.worktribe.com/output/22183323) | -| 746 | TranUSR: Phoneme-to-Word Transcoder based Unified Speech Representation Learning for Cross-Lingual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xue23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13629-b31b1b.svg)](https://arxiv.org/abs/2305.13629) | -| 1124 | Dual-Mode NAM: Effective Top-K Context Injection for End-to-End ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23e_interspeech.pdf) | -| 2417 | GhostRNN: Reducing State Redundancy in RNN with Cheap Operations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhou23g_interspeech.pdf) | -| 1442 | Task-Agnostic Structured Pruning of Speech Representation Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23da_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01385-b31b1b.svg)](https://arxiv.org/abs/2306.01385) | -| 485 | Factual Consistency Oriented Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kanda23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12369-b31b1b.svg)](https://arxiv.org/abs/2302.12369) | -| 1036 | Multi-Head State Space Model for Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fathullah23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12498-b31b1b.svg)](https://arxiv.org/abs/2305.12498) | -| 341 | Cascaded Multi-task Adaptive Learning based on Neural Architecture Search | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23_interspeech.pdf) | -| 2359 | Probing Self-Supervised Speech Models for Phonetic and Phonemic Information: A Case Study in Aspiration | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/martin23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06232-b31b1b.svg)](https://arxiv.org/abs/2306.06232) | -| 739 | Selective Biasing with Trie-based Contextual Adapters for Personalised Speech Recognition using Neural Transducers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/harding23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/selective-biasing-with-trie-based-contextual-adapters-for-personalised-speech-recognition-using-neural-transducers) | -| 213 | A More Accurate Internal Language Model Score Estimation for the Hybrid Autoregressive Transducer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23b_interspeech.pdf) | -| 106 | Attention Gate between Capsules in Fully Capsule-Network Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23_interspeech.pdf) | -|2585 | OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bengaliai.github.io/asr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rakib23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.09688-b31b1b.svg)](https://arxiv.org/abs/2305.09688) | -| 1316 | ML-SUPERB: Multilingual Speech Universal PERformance Benchmark | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/espnet/espnet/tree/master/egs2/ml_superb/asr1) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10615-b31b1b.svg)](https://arxiv.org/abs/2305.10615) | -| 2389 | General-purpose Adversarial Training for Enhanced Automatic Speech Recognition Model Generalization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23l_interspeech.pdf) | -| 275 | Joint Instance Reconstruction and Feature Sub-space Alignment for Cross-Domain Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhao23_interspeech.pdf) | -| 2280 | Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/moriya23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15971-b31b1b.svg)](https://arxiv.org/abs/2305.15971) | -| 1272 | Random Utterance Concatenation based Data Augmentation for Improving Short-Video Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15876-b31b1b.svg)](https://arxiv.org/abs/2210.15876) | -| 1189 | Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers | [![GitHub](https://img.shields.io/github/stars/NMS05/Adapter-Incremental-Continual-Learning-AST?style=flat)](https://github.com/NMS05/Adapter-Incremental-Continual-Learning-AST) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/muthuchamyselvaraj23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14314-b31b1b.svg)](https://arxiv.org/abs/2302.14314) | -| 223 | Rethinking Speech Recognition with a Multimodal Perspective via Acoustic and Semantic Cooperative Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14049-b31b1b.svg)](https://arxiv.org/abs/2305.14049) | -| 923 | Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://liangzheng-zl.github.io/bedit-web/)
[![GitHub](https://img.shields.io/github/stars/Liangzheng-ZL/BEdit-TTS?style=flat)](https://github.com/Liangzheng-ZL/BEdit-TTS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08588-b31b1b.svg)](https://arxiv.org/abs/2306.08588) | -| 2258 | Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01031-b31b1b.svg)](https://arxiv.org/abs/2306.01031) | -| 1184 | DCCRN-KWS: An Audio Bias based Model for Noise Robust Small-Footprint Keyword Spotting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lv23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12331-b31b1b.svg)](https://arxiv.org/abs/2305.12331) | -| 1609 | OTF: Optimal Transport based Fusion of Supervised and Self-Supervised Learning Models for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02541-b31b1b.svg)](https://arxiv.org/abs/2306.02541) | -| 2136 | Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bleeker23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2304.08862-b31b1b.svg)](https://arxiv.org/abs/2304.08862) | -| 788 | Rehearsal-Free Online Continual Learning for Automatic Speech Recognition | [![GitHub](https://img.shields.io/github/stars/StevenVdEeckt/online-cl-for-asr?style=flat)](https://github.com/StevenVdEeckt/online-cl-for-asr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vandereeckt23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.10860-b31b1b.svg)](https://arxiv.org/abs/2306.10860) | -| 496 | ASR Data Augmentation in Low-Resource Settings using Cross-Lingual Multi-Speaker TTS and Cross-Lingual Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/Edresson/Wav2Vec-Wrapper/tree/main/Papers/TTS-Augmentation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/casanova23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2204.00618-b31b1b.svg)](https://arxiv.org/abs/2204.00618) | -| 642 | Personality-aware Training based Speaker Adaptation for End-to-End Speech Recognition | [![GitHub](https://img.shields.io/github/stars/shibeiing/Personality-aware-Training-PAT?style=flat)](https://github.com/shibeiing/Personality-aware-Training-PAT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gu23_interspeech.pdf) | -| 2257 | Target Vocabulary Recognition based on Multi-task Learning with Decomposed Teacher Sequences | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ito23b_interspeech.pdf) | -| 679 | Wave to Syntax: Probing Spoken Language Models for Syntax | [![GitHub](https://img.shields.io/github/stars/techsword/wave-to-syntax?style=flat)](https://github.com/techsword/wave-to-syntax) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18957-b31b1b.svg)](https://arxiv.org/abs/2305.18957) | -| 720 | Effective Training of Attention-based Contextual Biasing Adapters with Synthetic Audio for Personalised ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/naowarat23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/effective-training-of-attention-based-contextual-biasing-adapters-with-synthetic-audio-for-personalised-asr) | -| 630 | Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ma23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08920-b31b1b.svg)](https://arxiv.org/abs/2306.08920) | -| 1118 | SlothSpeech: Denial-of-Service Attack Against Speech Recognition Models | [![GitHub](https://img.shields.io/github/stars/0xrutvij/SlothSpeech?style=flat)](https://github.com/0xrutvij/SlothSpeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/haque23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00794-b31b1b.svg)](https://arxiv.org/abs/2306.00794) | -| 503 | CLRL-Tuning: A Novel Continual Learning Approach for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23d_interspeech.pdf) | -| 159 | Exploring Sources of Racial Bias in Automatic Speech Recognition through the Lens of Rhythmic Variation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lai23_interspeech.pdf) | -| 1440 | Can Contextual Biasing Remain Effective with Whisper and GPT-2? | [![GitHub](https://img.shields.io/github/stars/BriansIDP/WhisperBiasing?style=flat)](https://github.com/BriansIDP/WhisperBiasing) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sun23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01942-b31b1b.svg)](https://arxiv.org/abs/2306.01942) | -| 221 | Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/nttcslab/m2d/tree/master/speech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/niizumi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14079-b31b1b.svg)](https://arxiv.org/abs/2305.14079) | -| 2207 | Improving RNN Transducer Acoustic Models for English Conversational Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cui23c_interspeech.pdf) | -| 1216 | MixRep: Hidden Representation Mixup for Low-Resource Speech Recognition | [![GitHub](https://img.shields.io/github/stars/jiamin1013/mixrep-espnet?style=flat)](https://github.com/jiamin1013/mixrep-espnet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xie23_interspeech.pdf) | -| 1192 | Improving Chinese Mandarin Speech Recognition using Graph Embedding Regularization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23h_interspeech.pdf) | -| 1276 | Adapting Multi-Lingual ASR Models for Handling Multiple Talkers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18747-b31b1b.svg)](https://arxiv.org/abs/2305.18747) | -| 1221 | Adapter-Tuning with Effective Token-Dependent Representation Shift for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ng23c_interspeech.pdf) | -| 1010 | Model-Internal Slot-Triggered Biasing for Domain Expansion in Neural Transducer ASR Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lu23c_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/model-internal-slot-triggered-biasing-for-domain-expansion-in-neural-transducer-asr-models) | -| 2508 | Delay-Penalized CTC Implemented based on Finite State Transducer | [![GitHub](https://img.shields.io/github/stars/k2-fsa/k2?style=flat)](https://github.com/k2-fsa/k2) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yao23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11539-b31b1b.svg)](https://arxiv.org/abs/2305.11539) | -| 2589 | Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR | [![GitHub](https://img.shields.io/github/stars/AI4Bharat/vistaar?style=flat)](https://github.com/AI4Bharat/vistaar) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bhogale23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15386-b31b1b.svg)](https://arxiv.org/abs/2305.15386) | -| 1091 | Domain Adaptive Self-Supervised Training of Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/do23_interspeech.pdf) | -| 1105 | There is more than One Kind of Robustness: Fooling Whisper with Adversarial Examples | [![GitHub](https://img.shields.io/github/stars/RaphaelOlivier/whisper_attack?style=flat)](https://github.com/RaphaelOlivier/whisper_attack) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/olivier23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17316-b31b1b.svg)](https://arxiv.org/abs/2210.17316) | -| 1064 | MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations | [![GitHub](https://img.shields.io/github/stars/CHeggan/MT-SLVR?style=flat)](https://github.com/CHeggan/MT-SLVR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/heggan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17191-b31b1b.svg)](https://arxiv.org/abs/2305.17191) | -| 1176 | Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23l_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06672-b31b1b.svg)](https://arxiv.org/abs/2306.06672) | -| 759 | Blank-Regularized CTC for Frame Skipping in Neural Transducer | [![GitHub](https://img.shields.io/github/stars/k2-fsa/k2?style=flat)](https://github.com/k2-fsa/k2) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23l_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11558-b31b1b.svg)](https://arxiv.org/abs/2305.11558) | -| 2406 | The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jayakumar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19584-b31b1b.svg)](https://arxiv.org/abs/2305.19584) | -| 2354 | Improving RNN-Transducers with Acoustic LookAhead | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/unni23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.05006-b31b1b.svg)](https://arxiv.org/abs/2307.05006) | -| 1847 | Everyone has an Accent | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/markl23_interspeech.pdf) | -| 2124 | Some Voices are too Common: Building Fair Speech Recognition Systems using the Common-Voice Dataset | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/maison23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03773-b31b1b.svg)](https://arxiv.org/abs/2306.03773) | -| 1168 | Information Magnitude based Dynamic Sub-Sampling for Speech-to-Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23u_interspeech.pdf) | -| 353 | Towards Multi-task Learning of Speech and Speaker Recognition | [![GitHub](https://img.shields.io/github/stars/nikvaessen/disjoint-mtl?style=flat)](https://github.com/nikvaessen/disjoint-mtl) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vaessen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12773-b31b1b.svg)](https://arxiv.org/abs/2302.12773) | -| 2186 | Regarding Topology and Variant Frame Rates for Differentiable WFST-based End-to-End ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhao23f_interspeech.pdf) | -| 1012 | 2-bit Conformer Quantization for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rybakov23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16619-b31b1b.svg)](https://arxiv.org/abs/2305.16619) | -| 167 | Time-Domain Speech Enhancement for Robust Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.13318-b31b1b.svg)](https://arxiv.org/abs/2210.13318) | -| 257 | Multi-Channel Multi-Speaker Transformer for Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yifan23_interspeech.pdf) | -| 733 | Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ye23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15875-b31b1b.svg)](https://arxiv.org/abs/2306.15875) | -| 2463 | Dialect Speech Recognition Modeling using Corpus of Japanese Dialects and Self-Supervised Learning-based Model XLSR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/miwa23_interspeech.pdf) | -| 767 | Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12493-b31b1b.svg)](https://arxiv.org/abs/2305.12493) | -| 970 | Competitive and Resource Efficient Factored Hybrid HMM Systems are Simpler Than You Think | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/raissi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12493-b31b1b.svg)](https://arxiv.org/abs/2305.12493) | -| 791 | MMSpeech: Multi-Modal Multi-Task Encoder-Decoder Pre-training for Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhou23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.00500-b31b1b.svg)](https://arxiv.org/abs/2212.00500) | -| 2499 | Biased Self-Supervised Learning for ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kreyssig23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02536-b31b1b.svg)](https://arxiv.org/abs/2211.02536) | -| 1300 | A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23q_interspeech.pdf) | -| 2470 | Wav2Vec 2.0 ASR for Cantonese-Speaking Older Adults in a Clinical Setting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23h_interspeech.pdf) | -| 770 | BAT: Boundary aware Transducer for Memory-Efficient and Low-Latency ASR | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/an23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11571-b31b1b.svg)](https://arxiv.org/abs/2305.11571) | -| 1342 | Bayes Risk Transducer: Transducer with Controllable Alignment Prediction | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tian23_interspeech.pdf) | -| 783 | Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/alastruey23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06954-b31b1b.svg)](https://arxiv.org/abs/2306.06954) | +| 2118 | Using Text Injection to Improve Recognition of Personal Identifiers in Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/blau23_interspeech.pdf) | +| 837 | Investigating Wav2Vec2 Context Representations and the Effects of Fine-Tuning, a Case-Study of a Finnish Model | [![GitHub](https://img.shields.io/github/stars/aalto-speech/Wav2vec2Interpretation?style=flat)](https://github.com/aalto-speech/Wav2vec2Interpretation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/grosz23_interspeech.pdf) | +| 872 | Transformer-based Speech Recognition Models for Oral History Archives in English, German, and Czech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lehecka23_interspeech.pdf) | +| 177 | Iteratively Improving Speech Recognition and Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://demosamplesites.github.io/IterativeASR_VC/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/singh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15055-b31b1b.svg)](https://arxiv.org/abs/2305.15055) | +| 2001 | LABERT: A Combination of Local Aggregation and Self-Supervised Speech Representation Learning for Detecting Informative Hidden Units in Low-Resource ASR Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fatehi23_interspeech.pdf)
[![nottingham-repo](https://img.shields.io/badge/nottingham-22183323-1A296B.svg)](https://nottingham-repository.worktribe.com/output/22183323) | +| 746 | TranUSR: Phoneme-to-Word Transcoder based Unified Speech Representation Learning for Cross-Lingual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xue23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13629-b31b1b.svg)](https://arxiv.org/abs/2305.13629) | +| 1124 | Dual-Mode NAM: Effective Top-K Context Injection for End-to-End ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23e_interspeech.pdf) | +| 2417 | GhostRNN: Reducing State Redundancy in RNN with Cheap Operations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhou23g_interspeech.pdf) | +| 1442 | Task-Agnostic Structured Pruning of Speech Representation Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23da_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01385-b31b1b.svg)](https://arxiv.org/abs/2306.01385) | +| 485 | Factual Consistency Oriented Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kanda23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12369-b31b1b.svg)](https://arxiv.org/abs/2302.12369) | +| 1036 | Multi-Head State Space Model for Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fathullah23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12498-b31b1b.svg)](https://arxiv.org/abs/2305.12498) | +| 341 | Cascaded Multi-task Adaptive Learning based on Neural Architecture Search | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23_interspeech.pdf) | +| 2359 | Probing Self-Supervised Speech Models for Phonetic and Phonemic Information: A Case Study in Aspiration | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/martin23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06232-b31b1b.svg)](https://arxiv.org/abs/2306.06232) | +| 739 | Selective Biasing with Trie-based Contextual Adapters for Personalised Speech Recognition using Neural Transducers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/harding23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/selective-biasing-with-trie-based-contextual-adapters-for-personalised-speech-recognition-using-neural-transducers) | +| 213 | A More Accurate Internal Language Model Score Estimation for the Hybrid Autoregressive Transducer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23b_interspeech.pdf) | +| 106 | Attention Gate between Capsules in Fully Capsule-Network Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23_interspeech.pdf) | +| 2585 | OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bengaliai.github.io/asr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rakib23_interspeech.pdf)<br />
[![arXiv](https://img.shields.io/badge/arXiv-2305.09688-b31b1b.svg)](https://arxiv.org/abs/2305.09688) | +| 1316 | ML-SUPERB: Multilingual Speech Universal PERformance Benchmark | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/espnet/espnet/tree/master/egs2/ml_superb/asr1) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10615-b31b1b.svg)](https://arxiv.org/abs/2305.10615) | +| 2389 | General-purpose Adversarial Training for Enhanced Automatic Speech Recognition Model Generalization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23l_interspeech.pdf) | +| 275 | Joint Instance Reconstruction and Feature Sub-space Alignment for Cross-Domain Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhao23_interspeech.pdf) | +| 2280 | Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/moriya23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15971-b31b1b.svg)](https://arxiv.org/abs/2305.15971) | +| 1272 | Random Utterance Concatenation based Data Augmentation for Improving Short-Video Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15876-b31b1b.svg)](https://arxiv.org/abs/2210.15876) | +| 1189 | Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers | [![GitHub](https://img.shields.io/github/stars/NMS05/Adapter-Incremental-Continual-Learning-AST?style=flat)](https://github.com/NMS05/Adapter-Incremental-Continual-Learning-AST) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/muthuchamyselvaraj23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14314-b31b1b.svg)](https://arxiv.org/abs/2302.14314) | +| 223 | Rethinking Speech Recognition with a Multimodal Perspective via Acoustic and Semantic Cooperative Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14049-b31b1b.svg)](https://arxiv.org/abs/2305.14049) | +| 923 | Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://liangzheng-zl.github.io/bedit-web/)<br />
[![GitHub](https://img.shields.io/github/stars/Liangzheng-ZL/BEdit-TTS?style=flat)](https://github.com/Liangzheng-ZL/BEdit-TTS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08588-b31b1b.svg)](https://arxiv.org/abs/2306.08588) | +| 2258 | Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01031-b31b1b.svg)](https://arxiv.org/abs/2306.01031) | +| 1184 | DCCRN-KWS: An Audio Bias based Model for Noise Robust Small-Footprint Keyword Spotting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lv23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12331-b31b1b.svg)](https://arxiv.org/abs/2305.12331) | +| 1609 | OTF: Optimal Transport based Fusion of Supervised and Self-Supervised Learning Models for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02541-b31b1b.svg)](https://arxiv.org/abs/2306.02541) | +| 2136 | Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bleeker23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2304.08862-b31b1b.svg)](https://arxiv.org/abs/2304.08862) | +| 788 | Rehearsal-Free Online Continual Learning for Automatic Speech Recognition | [![GitHub](https://img.shields.io/github/stars/StevenVdEeckt/online-cl-for-asr?style=flat)](https://github.com/StevenVdEeckt/online-cl-for-asr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vandereeckt23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.10860-b31b1b.svg)](https://arxiv.org/abs/2306.10860) | +| 496 | ASR Data Augmentation in Low-Resource Settings using Cross-Lingual Multi-Speaker TTS and Cross-Lingual Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/Edresson/Wav2Vec-Wrapper/tree/main/Papers/TTS-Augmentation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/casanova23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2204.00618-b31b1b.svg)](https://arxiv.org/abs/2204.00618) | +| 642 | Personality-aware Training based Speaker Adaptation for End-to-End Speech Recognition | [![GitHub](https://img.shields.io/github/stars/shibeiing/Personality-aware-Training-PAT?style=flat)](https://github.com/shibeiing/Personality-aware-Training-PAT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gu23_interspeech.pdf) | +| 2257 | Target Vocabulary Recognition based on Multi-task Learning with Decomposed Teacher Sequences | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ito23b_interspeech.pdf) | +| 679 | Wave to Syntax: Probing Spoken Language Models for Syntax | [![GitHub](https://img.shields.io/github/stars/techsword/wave-to-syntax?style=flat)](https://github.com/techsword/wave-to-syntax) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18957-b31b1b.svg)](https://arxiv.org/abs/2305.18957) | +| 720 | Effective Training of Attention-based Contextual Biasing Adapters with Synthetic Audio for Personalised ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/naowarat23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/effective-training-of-attention-based-contextual-biasing-adapters-with-synthetic-audio-for-personalised-asr) | +| 630 | Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ma23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08920-b31b1b.svg)](https://arxiv.org/abs/2306.08920) | +| 1118 | SlothSpeech: Denial-of-Service Attack Against Speech Recognition Models | [![GitHub](https://img.shields.io/github/stars/0xrutvij/SlothSpeech?style=flat)](https://github.com/0xrutvij/SlothSpeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/haque23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00794-b31b1b.svg)](https://arxiv.org/abs/2306.00794) | +| 503 | CLRL-Tuning: A Novel Continual Learning Approach for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23d_interspeech.pdf) | +| 159 | Exploring Sources of Racial Bias in Automatic Speech Recognition through the Lens of Rhythmic Variation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lai23_interspeech.pdf) | +| 1440 | Can Contextual Biasing Remain Effective with Whisper and GPT-2? | [![GitHub](https://img.shields.io/github/stars/BriansIDP/WhisperBiasing?style=flat)](https://github.com/BriansIDP/WhisperBiasing) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sun23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01942-b31b1b.svg)](https://arxiv.org/abs/2306.01942) | +| 221 | Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/nttcslab/m2d/tree/master/speech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/niizumi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14079-b31b1b.svg)](https://arxiv.org/abs/2305.14079) | +| 2207 | Improving RNN Transducer Acoustic Models for English Conversational Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cui23c_interspeech.pdf) | +| 1216 | MixRep: Hidden Representation Mixup for Low-Resource Speech Recognition | [![GitHub](https://img.shields.io/github/stars/jiamin1013/mixrep-espnet?style=flat)](https://github.com/jiamin1013/mixrep-espnet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xie23_interspeech.pdf) | +| 1192 | Improving Chinese Mandarin Speech Recognition using Graph Embedding Regularization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23h_interspeech.pdf) | +| 1276 | Adapting Multi-Lingual ASR Models for Handling Multiple Talkers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18747-b31b1b.svg)](https://arxiv.org/abs/2305.18747) | +| 1221 | Adapter-Tuning with Effective Token-Dependent Representation Shift for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ng23c_interspeech.pdf) | +| 1010 | Model-Internal Slot-Triggered Biasing for Domain Expansion in Neural Transducer ASR Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lu23c_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/model-internal-slot-triggered-biasing-for-domain-expansion-in-neural-transducer-asr-models) | +| 2508 | Delay-Penalized CTC Implemented based on Finite State Transducer | [![GitHub](https://img.shields.io/github/stars/k2-fsa/k2?style=flat)](https://github.com/k2-fsa/k2) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yao23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11539-b31b1b.svg)](https://arxiv.org/abs/2305.11539) | +| 2589 | Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR | [![GitHub](https://img.shields.io/github/stars/AI4Bharat/vistaar?style=flat)](https://github.com/AI4Bharat/vistaar) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bhogale23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15386-b31b1b.svg)](https://arxiv.org/abs/2305.15386) | +| 1091 | Domain Adaptive Self-Supervised Training of Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/do23_interspeech.pdf) | +| 1105 | There is more than One Kind of Robustness: Fooling Whisper with Adversarial Examples | [![GitHub](https://img.shields.io/github/stars/RaphaelOlivier/whisper_attack?style=flat)](https://github.com/RaphaelOlivier/whisper_attack) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/olivier23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17316-b31b1b.svg)](https://arxiv.org/abs/2210.17316) | +| 1064 | MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations | [![GitHub](https://img.shields.io/github/stars/CHeggan/MT-SLVR?style=flat)](https://github.com/CHeggan/MT-SLVR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/heggan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17191-b31b1b.svg)](https://arxiv.org/abs/2305.17191) | +| 1176 | Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23l_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06672-b31b1b.svg)](https://arxiv.org/abs/2306.06672) | +| 759 | Blank-Regularized CTC for Frame Skipping in Neural Transducer | [![GitHub](https://img.shields.io/github/stars/k2-fsa/k2?style=flat)](https://github.com/k2-fsa/k2) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23l_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11558-b31b1b.svg)](https://arxiv.org/abs/2305.11558) | +| 2406 | The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jayakumar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19584-b31b1b.svg)](https://arxiv.org/abs/2305.19584) | +| 2354 | Improving RNN-Transducers with Acoustic LookAhead | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/unni23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.05006-b31b1b.svg)](https://arxiv.org/abs/2307.05006) | +| 1847 | Everyone has an Accent | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/markl23_interspeech.pdf) | +| 2124 | Some Voices are too Common: Building Fair Speech Recognition Systems using the Common-Voice Dataset | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/maison23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03773-b31b1b.svg)](https://arxiv.org/abs/2306.03773) | +| 1168 | Information Magnitude based Dynamic Sub-Sampling for Speech-to-Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23u_interspeech.pdf) | +| 353 | Towards Multi-task Learning of Speech and Speaker Recognition | [![GitHub](https://img.shields.io/github/stars/nikvaessen/disjoint-mtl?style=flat)](https://github.com/nikvaessen/disjoint-mtl) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vaessen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12773-b31b1b.svg)](https://arxiv.org/abs/2302.12773) | +| 2186 | Regarding Topology and Variant Frame Rates for Differentiable WFST-based End-to-End ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhao23f_interspeech.pdf) | +| 1012 | 2-bit Conformer Quantization for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rybakov23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16619-b31b1b.svg)](https://arxiv.org/abs/2305.16619) | +| 167 | Time-Domain Speech Enhancement for Robust Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.13318-b31b1b.svg)](https://arxiv.org/abs/2210.13318) | +| 257 | Multi-Channel Multi-Speaker Transformer for Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yifan23_interspeech.pdf) | +| 733 | Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ye23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15875-b31b1b.svg)](https://arxiv.org/abs/2306.15875) | +| 2463 | Dialect Speech Recognition Modeling using Corpus of Japanese Dialects and Self-Supervised Learning-based Model XLSR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/miwa23_interspeech.pdf) | +| 767 | Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12493-b31b1b.svg)](https://arxiv.org/abs/2305.12493) | +| 970 | Competitive and Resource Efficient Factored Hybrid HMM Systems are Simpler Than You Think | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/raissi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12493-b31b1b.svg)](https://arxiv.org/abs/2305.12493) | +| 791 | MMSpeech: Multi-Modal Multi-Task Encoder-Decoder Pre-training for Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhou23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.00500-b31b1b.svg)](https://arxiv.org/abs/2212.00500) | +| 2499 | Biased Self-Supervised Learning for ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kreyssig23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02536-b31b1b.svg)](https://arxiv.org/abs/2211.02536) | +| 1300 | A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23q_interspeech.pdf) | +| 2470 | Wav2Vec 2.0 ASR for Cantonese-Speaking Older Adults in a Clinical Setting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23h_interspeech.pdf) | +| 770 | BAT: Boundary aware Transducer for Memory-Efficient and Low-Latency ASR | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/an23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11571-b31b1b.svg)](https://arxiv.org/abs/2305.11571) | +| 1342 | Bayes Risk Transducer: Transducer with Controllable Alignment Prediction | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tian23_interspeech.pdf) | +| 783 | Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/alastruey23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06954-b31b1b.svg)](https://arxiv.org/abs/2306.06954) |
@@ -500,91 +500,91 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1173 | Robust Prototype Learning for Anomalous Sound Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zeng23b_interspeech.pdf) | -| 982 | A Multimodal Prototypical Approach for Unsupervised Sound Classification | [![GitHub](https://img.shields.io/github/stars/sakshamsingh1/audio_text_proto?style=flat)](https://github.com/sakshamsingh1/audio_text_proto) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kushwaha23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12300-b31b1b.svg)](https://arxiv.org/abs/2306.12300) | -| 563 | Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms | [![GitHub](https://img.shields.io/github/stars/ph-w2000/S2pecNet?style=flat)](https://github.com/ph-w2000/S2pecNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wen23_interspeech.pdf) | -| 1082 | Adapting Language-Audio Models as Few-Shot Audio Learners | [![GitHub](https://img.shields.io/github/stars/JinhuaLiang/lam4fsl?style=flat)](https://github.com/JinhuaLiang/lam4fsl) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17719-b31b1b.svg)](https://arxiv.org/abs/2305.17719) | -| 734 | TFECN: Time-Frequency Enhanced ConvNet for Audio Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23l_interspeech.pdf) | -| 350 | Resolution Consistency Training on Time-Frequency Domain for Semi-Supervised Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/choi23b_interspeech.pdf) | -| 1174 | Fine-Tuning Audio Spectrogram Transformer with Task-Aware Adapters for Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23n_interspeech.pdf) | -| 1210 | Small Footprint Multi-Channel Network for Keyword Spotting with Centroid based Awareness | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2204.05445-b31b1b.svg)](https://arxiv.org/abs/2204.05445) | -| 1380 | Few-Shot Class-Incremental Audio Classification using Adaptively-Refined Prototypes | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xie23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18045-b31b1b.svg)](https://arxiv.org/abs/2305.18045) | -| 1549 | Interpretable Latent Space using Space-Filling Curves for Phonetic Analysis in Voice Conversion | [![GitLab](https://img.shields.io/gitlab/stars/speech-interaction-technology-aalto-university/sfvq)](https://gitlab.com/speech-interaction-technology-aalto-university/sfvq) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vali23_interspeech.pdf)
[![Aalto](https://img.shields.io/badge/aalto-fi-005EB8.svg)](https://research.aalto.fi/en/publications/interpretable-latent-space-using-space-filling-curves-for-phoneti) | -| 1861 | Topological Data Analysis for Speech Processing | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://topohubert.github.io/speech-topology-webpages/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tulchinskii23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.17223-b31b1b.svg)](https://arxiv.org/abs/2211.17223) | -| 1329 | Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation | [![GitHub](https://img.shields.io/github/stars/sungnyun/ARMHuBERT?style=flat)](https://github.com/sungnyun/ARMHuBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11685-b31b1b.svg)](https://arxiv.org/abs/2305.11685) | -| 932 | Personalized Acoustic Scene Classification in Ultra-Low Power Embedded Devices using Privacy-Preserving Data Augmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/koppelmann23_interspeech.pdf) | -| 176 | Background Domain Switch: A Novel Data Augmentation Technique for Robust Sound Event Detection | [![GitHub](https://img.shields.io/github/stars/boschresearch/soundsee-background-domain-switch?style=flat)](https://github.com/boschresearch/soundsee-background-domain-switch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23_interspeech.pdf) | -| 1021 | Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning | [![GitHub](https://img.shields.io/github/stars/Yuanbo2020/HGRL?style=flat)](https://github.com/Yuanbo2020/HGRL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hou23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://personal.ee.surrey.ac.uk/Personal/W.Wang/papers/Hou%20etal_INTERSPEECH_2023.pdf) | -| 2416 | Anomalous Sound Detection using Self-Attention-based Frequency Pattern Analysis of Machine Sounds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23fa_interspeech.pdf) | -| 1478 | Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xin23c_interspeech.pdf) | -| 575 | Differential Privacy enabled Dementia Classification: An Exploration of the Privacy-Accuracy Trade-off in Speech Signal Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bn23_interspeech.pdf) | -| 1595 | Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://anonymous.4open.science/w/INTERSPEECH2023-F8C4/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ka_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05709-b31b1b.svg)](https://arxiv.org/abs/2306.05709) | -| 1816 | Towards Multi-Lingual Audio Question Answering | [![GitHub](https://img.shields.io/github/stars/swarupbehera/mAQA?style=flat)](https://github.com/swarupbehera/mAQA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/behera23_interspeech.pdf) | -| 1344 | Blind Estimation of Room Impulse Response from Monaural Reverberant Speech with Segmental Generative Neural Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liao23_interspeech.pdf)
[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/370843606_Blind_Estimation_of_Room_Impulse_Response_from_Monaural_Reverberant_Speech_with_Segmental_Generative_Neural_Network) | -| 358 | Emotion-aware Audio-Driven Face Animation via Contrastive Feature Disentanglement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ren23_interspeech.pdf) | -| 591 | Anomalous Sound Detection based on Sound Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shimonishi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15859-b31b1b.svg)](https://arxiv.org/abs/2305.15859) | -| 2089 | Random Forest Classification of Breathing Phases from Audio Signals Recorded using Mobile Devices | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fahed23_interspeech.pdf) | -| 1581 | GRAVO: Learning to Generate Relevant Audio from Visual Features with Noisy Online Videos | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ahn23b_interspeech.pdf) | -| 477 | Wav2ToBI: A New Approach to Automatic ToBI Transcription | [![GitHub](https://img.shields.io/github/stars/reginazhai/Wav2ToBI?style=flat)](https://github.com/reginazhai/Wav2ToBI) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhai23_interspeech.pdf) | -| 344 | Joint-Former: Jointly Regularized and Locally Down-Sampled Conformer for Semi-Supervised Sound Event Detection | [![GitHub](https://img.shields.io/github/stars/mastergofujs/Joint-Former?style=flat)](https://github.com/mastergofujs/Joint-Former) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23b_interspeech.pdf) | -| 245 | Towards Attention-based Contrastive Learning for Audio Spoof Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/goel23_interspeech.pdf) | -| 2488 | Masked Audio Modeling with CLAP and Multi-Objective Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xin23d_interspeech.pdf) | -| 1904 | Few-Shot Open-Set Learning for On-Device Customization of KeyWord Spotting Systems | [![GitHub](https://img.shields.io/github/stars/mrusci/ondevice-fewshot-kws?style=flat)](https://github.com/mrusci/ondevice-fewshot-kws) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rusci23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02161-b31b1b.svg)](https://arxiv.org/abs/2306.02161) | -| 481 | Self-Supervised Dataset Pruning for Efficient Training in Audio Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/azeemi23_interspeech.pdf) | -| 491 | Semantic Segmentation with Bidirectional Language Models Improves Long-Form ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18419-b31b1b.svg)](https://arxiv.org/abs/2305.18419) | -| 684 | Multi-Microphone Automatic Speech Segmentation in Meetings based on Circular Harmonics Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mariotte23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04268-b31b1b.svg)](https://arxiv.org/abs/2306.04268) | -| 542 | Advanced RawNet2 with Attention-based Channel Masking for Synthetic Speech Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23h_interspeech.pdf) | -| 88 | Insights Into End-to-End Audio-to-Score Transcription with Real Recordings: A Case Study with Saxophone Works | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/martinezsevilla23_interspeech.pdf) | -| 2193 | Whisper-AT: Noise-Robust Automatic Speech Recognizers are also Strong Audio Event Taggers | [![GitHub](https://img.shields.io/github/stars/YuanGongND/whisper-at?style=flat)](https://github.com/YuanGongND/whisper-at)
[![PyPI](https://img.shields.io/pypi/v/whisper-at)](https://pypi.org/project/whisper-at/)
[![Whisper-AT](https://img.shields.io/badge/🤗-demo-FFD21F.svg)](https://huggingface.co/spaces/yuangongfdu/whisper-at) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gong23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.03183-b31b1b.svg)](https://arxiv.org/abs/2307.03183) | -| 1621 | Synthetic Voice Spoofing Detection based on Feature Pyramid Conformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gong23b_interspeech.pdf) | -| 1383 | Learning a Self-Supervised Domain-Invariant Feature Representation for Generalized Audio Deepfake Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xie23c_interspeech.pdf) | -| 2011 | Application of Knowledge Distillation to Multi-Task Speech Representation Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kerpicci23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.16611-b31b1b.svg)](https://arxiv.org/abs/2210.16611) | -| 2297 | DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jiang23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18441-b31b1b.svg)](https://arxiv.org/abs/2305.18441) | -| 1965 | Variational Classifier for Unsupervised Anomalous Sound Detection under Domain Generalization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/almudevar23_interspeech.pdf) | -| 745 | FlexiAST: Flexibility is What AST Needs | [![GitHub](https://img.shields.io/github/stars/JiuFengSC/FlexiAST_INTERSPEECH23?style=flat)](https://github.com/JiuFengSC/FlexiAST_INTERSPEECH23) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/feng23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.09286-b31b1b.svg)](https://arxiv.org/abs/2307.09286) | -| 1579 | MCR-Data2vec 2.0: Improving Self-Supervised Speech Pre-training via Model-Level Consistency Regularization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yoon23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08463-b31b1b.svg)](https://arxiv.org/abs/2306.08463) | -| 914 | Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention | [![GitHub](https://img.shields.io/github/stars/liuxubo717/V-ACT?style=flat)](https://github.com/liuxubo717/V-ACT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23l_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.16428-b31b1b.svg)](https://arxiv.org/abs/2210.16428) | -| 165 | Time-Frequency Domain Filter-and-Sum Network for Multi-Channel Speech Separation | [![GitHub](https://img.shields.io/github/stars/JonathanDZ/TF-FaSNet?style=flat)](https://github.com/JonathanDZ/TF-FaSNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/deng23_interspeech.pdf) | -| 801 | Audio-Visual Fusion using Multiscale Temporal Convolutional Attention for Time-Domain Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23h_interspeech.pdf) | -| 1431 | An Efficient Speech Separation Network based on Recurrent Fusion Dilated Convolution and Channel Attention | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ca_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05887-b31b1b.svg)](https://arxiv.org/abs/2306.05887) | -| 2015 | Binaural Sound Localization in Noisy Environments using Frequency-based Audio Vision Transformer (FAViT) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/phokhinanan23_interspeech.pdf) | -| 1723 | Contrastive Learning based Deep Latent Masking for Music Source Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23i_interspeech.pdf) | -| 655 | Speaker Extraction with Detection of Presence and Absence of Target Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23k_interspeech.pdf) | -| 889 | PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23k_interspeech.pdf) | -| 2117 | Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning | [![GitHub](https://img.shields.io/github/stars/apple/ml-spatial-librispeech?style=flat)](https://github.com/apple/ml-spatial-librispeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sarabia23_interspeech.pdf)
[![Apple](https://img.shields.io/badge/apple-ml-FE9901.svg)](https://machinelearning.apple.com/research/spatial-librispeech) | -| 1309 | Image-Driven Audio-Visual Universal Source Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23q_interspeech.pdf) | -| 2520 | Joint Blind Source Separation and Dereverberation for Automatic Speech Recognition using Delayed-Subsource | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fras23_interspeech.pdf) | -| 1766 | SDNet: Stream-Attention and Dual-Feature Learning Network for Ad-hoc Array Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23na_interspeech.pdf) | -| 2451 | Deeply Supervised Curriculum Learning for Deep Neural Network-based Sound Source Localization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/baek23_interspeech.pdf) | -| 164 | Multi-Channel Separation of Dynamic Speech and Sound Events | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://www.robinscheibler.org/interspeech2023-moving-iva-samples/)
[![GitHub](https://img.shields.io/github/stars/fakufaku/interspeech2023-moving-iva-samples?style=flat)](https://github.com/fakufaku/interspeech2023-moving-iva-samples) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fujimura23_interspeech.pdf) | -| 2545 | Rethinking the Visual Cues in Audio-Visual Speaker Extraction | [![GitHub](https://img.shields.io/github/stars/mrjunjieli/DAVSE?style=flat)](https://github.com/mrjunjieli/DAVSE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23ja_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02625-b31b1b.svg)](https://arxiv.org/abs/2306.02625) | -| 85 | Using Semi-Supervised Learning for Monaural Time-Domain Speech Separation with a Self-Supervised Learning-based SI-SNR Estimator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/dang23_interspeech.pdf) | -| 1158 | Investigation of Training Mute-Expressive End-to-End Speech Separation Networks for an Unknown Number of Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23d_interspeech.pdf) | -| 2369 | SR-SRP: Super-Resolution based SRP-PHAT for Sound Source Localization and Tracking | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cho23_interspeech.pdf) | -| 613 | Dual-Memory Multi-Modal Learning for Continual Spoken Keyword Spotting with Confidence Selection and Diversity Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23g_interspeech.pdf) | -| 714 | FN-SSL: Full-Band and Narrow-Band Fusion for Sound Source Localization | [![GitHub](https://img.shields.io/github/stars/Audio-WestlakeU/FN-SSL?style=flat)](https://github.com/Audio-WestlakeU/FN-SSL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19610-b31b1b.svg)](https://arxiv.org/abs/2305.19610) | -| 696 | A Neural State-Space Modeling Approach to Efficient Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16932-b31b1b.svg)](https://arxiv.org/abs/2305.16932) | -| 1777 | Locate and Beamform: Two-Dimensional Locating All-Neural Beamformer for Multi-Channel Speech Separation | [![GitHub](https://img.shields.io/github/stars/FYJNEVERFOLLOWS/LaBNet?style=flat)](https://github.com/FYJNEVERFOLLOWS/LaBNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fu23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10821-b31b1b.svg)](https://arxiv.org/abs/2305.10821) | -| 518 | Monaural Speech Separation Method based on Recurrent Attention with Parallel Branches | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23f_interspeech.pdf) | -| 979 | Ontology-Aware Learning and Evaluation for Audio Tagging | [![GitHub](https://img.shields.io/github/stars/haoheliu/ontology-aware-audio-tagging?style=flat)](https://github.com/haoheliu/ontology-aware-audio-tagging) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.12195-b31b1b.svg)](https://arxiv.org/abs/2211.12195) | -| 951 | What do Self-Supervised Speech Representations Encode? An Analysis of Languages, Varieties, Speaking Styles and Speakers | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://gitlab.tugraz.at/speech/speechcodebookanalysis) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/linke23_interspeech.pdf) | -| 1696 | A Compressed Synthetic Speech Detection Method with Compression Feature Embedding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23ca_interspeech.pdf) | -| 572 | Outlier-aware Inlier Modeling and Multi-Scale Scoring for Anomalous Sound Detection via Multitask Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23j_interspeech.pdf) | -| 263 | MOSLight: A Lightweight Data-Efficient System for Non-Intrusive Speech Quality Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23c_interspeech.pdf) | -| 1626 | A Multi-Scale Attentive Transformer for Multi-Instrument Symbolic Music Generation | [![GitHub](https://img.shields.io/github/stars/HaRry-qaq/MSAT?style=flat)](https://github.com/HaRry-qaq/MSAT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wei23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16592-b31b1b.svg)](https://arxiv.org/abs/2305.16592) | -| 2494 | MTANet: Multi-band Time-Frequency Attention Network for Singing Melody Extraction from Polyphonic Music | [![GitHub](https://img.shields.io/github/stars/Annmixiu/MTANet?style=flat)](https://github.com/Annmixiu/MTANet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23i_interspeech.pdf) | -| 119 | Xiaoicesing 2: A High-Fidelity Singing Voice Synthesizer based on Generative Adversarial Network | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://wavelandspeech.github.io/xiaoice2/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chunhui23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.14666-b31b1b.svg)](https://arxiv.org/abs/2210.14666) | -| 2190 | Do Vocal Breath Sounds Encode Gender cues for Automatic Gender Classification? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/solanki23_interspeech.pdf) | -| 202 | Automatic Exploration of Optimal Data Processing Operations for Sound Data Augmentation using Improved Differentiable Automatic Data Augmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sugiura23_interspeech.pdf) | -| 1430 | A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis | [![GitHub](https://img.shields.io/github/stars/xiaoli1996/SSBPR?style=flat)](https://github.com/xiaoli1996/SSBPR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xiao23b_interspeech.pdf) | -| 528 | RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music | [![GitHub](https://img.shields.io/github/stars/Dream-High/RMVPE?style=flat)](https://github.com/Dream-High/RMVPE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wei23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15412-b31b1b.svg)](https://arxiv.org/abs/2306.15412) | -| 832 | Spatialization Quality Metric for Binaural Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/manocha23_interspeech.pdf) | -| 428 | AsthmaSCELNet: A Lightweight Supervised Contrastive Embedding Learning Framework for Asthma Classification using Lung Sounds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/roy23_interspeech.pdf) | -| 1426 | Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification | [![GitHub](https://img.shields.io/github/stars/raymin0223/patch-mix_contrastive_learning?style=flat)](https://github.com/raymin0223/patch-mix_contrastive_learning) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bae23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14032-b31b1b.svg)](https://arxiv.org/abs/2305.14032) | -| 2115 | Remote Assessment for ALS using Multimodal Dialog Agents: Data Quality, Feasibility and Task Compliance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/richter23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1ydqKLgO18TFrMzFC2Bz_6y3Uml0bUaaN/view) | -| 852 | AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://pages.cs.huji.ac.il/adiyoss-lab/AudioToken/)
[![GitHub](https://img.shields.io/github/stars/guyyariv/AudioToken?style=flat)](https://github.com/guyyariv/AudioToken) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yariv23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13050-b31b1b.svg)](https://arxiv.org/abs/2305.13050) | -| 209 | Obstructive Sleep Apnea Screening with Breathing Sounds and Respiratory Effort: A Multimodal Deep Learning Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/romero23_interspeech.pdf) | -| 2275 | Investigation of Music Emotion Recognition based on Segmented Semi-Supervised Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sun23f_interspeech.pdf) | +| 1173 | Robust Prototype Learning for Anomalous Sound Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zeng23b_interspeech.pdf) | +| 982 | A Multimodal Prototypical Approach for Unsupervised Sound Classification | [![GitHub](https://img.shields.io/github/stars/sakshamsingh1/audio_text_proto?style=flat)](https://github.com/sakshamsingh1/audio_text_proto) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kushwaha23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12300-b31b1b.svg)](https://arxiv.org/abs/2306.12300) | +| 563 | Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms | [![GitHub](https://img.shields.io/github/stars/ph-w2000/S2pecNet?style=flat)](https://github.com/ph-w2000/S2pecNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wen23_interspeech.pdf) | +| 1082 | Adapting Language-Audio Models as Few-Shot Audio Learners | [![GitHub](https://img.shields.io/github/stars/JinhuaLiang/lam4fsl?style=flat)](https://github.com/JinhuaLiang/lam4fsl) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17719-b31b1b.svg)](https://arxiv.org/abs/2305.17719) | +| 734 | TFECN: Time-Frequency Enhanced ConvNet for Audio Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23l_interspeech.pdf) | +| 350 | Resolution Consistency Training on Time-Frequency Domain for Semi-Supervised Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/choi23b_interspeech.pdf) | +| 1174 | Fine-Tuning Audio Spectrogram Transformer with Task-Aware Adapters for Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23n_interspeech.pdf) | +| 1210 | Small Footprint Multi-Channel Network for Keyword Spotting with Centroid based Awareness | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2204.05445-b31b1b.svg)](https://arxiv.org/abs/2204.05445) | +| 1380 | Few-Shot Class-Incremental Audio Classification using Adaptively-Refined Prototypes | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xie23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18045-b31b1b.svg)](https://arxiv.org/abs/2305.18045) | +| 1549 | Interpretable Latent Space using Space-Filling Curves for Phonetic Analysis in Voice Conversion | [![GitLab](https://img.shields.io/gitlab/stars/speech-interaction-technology-aalto-university/sfvq)](https://gitlab.com/speech-interaction-technology-aalto-university/sfvq) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vali23_interspeech.pdf)
[![Aalto](https://img.shields.io/badge/aalto-fi-005EB8.svg)](https://research.aalto.fi/en/publications/interpretable-latent-space-using-space-filling-curves-for-phoneti) | +| 1861 | Topological Data Analysis for Speech Processing | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://topohubert.github.io/speech-topology-webpages/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tulchinskii23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.17223-b31b1b.svg)](https://arxiv.org/abs/2211.17223) | +| 1329 | Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation | [![GitHub](https://img.shields.io/github/stars/sungnyun/ARMHuBERT?style=flat)](https://github.com/sungnyun/ARMHuBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11685-b31b1b.svg)](https://arxiv.org/abs/2305.11685) | +| 932 | Personalized Acoustic Scene Classification in Ultra-Low Power Embedded Devices using Privacy-Preserving Data Augmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/koppelmann23_interspeech.pdf) | +| 176 | Background Domain Switch: A Novel Data Augmentation Technique for Robust Sound Event Detection | [![GitHub](https://img.shields.io/github/stars/boschresearch/soundsee-background-domain-switch?style=flat)](https://github.com/boschresearch/soundsee-background-domain-switch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23_interspeech.pdf) | +| 1021 | Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning | [![GitHub](https://img.shields.io/github/stars/Yuanbo2020/HGRL?style=flat)](https://github.com/Yuanbo2020/HGRL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hou23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://personal.ee.surrey.ac.uk/Personal/W.Wang/papers/Hou%20etal_INTERSPEECH_2023.pdf) | +| 2416 | Anomalous Sound Detection using Self-Attention-based Frequency Pattern Analysis of Machine Sounds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23fa_interspeech.pdf) | +| 1478 | Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xin23c_interspeech.pdf) | +| 575 | Differential Privacy enabled Dementia Classification: An Exploration of the Privacy-Accuracy Trade-off in Speech Signal Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bn23_interspeech.pdf) | +| 1595 | Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://anonymous.4open.science/w/INTERSPEECH2023-F8C4/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ka_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05709-b31b1b.svg)](https://arxiv.org/abs/2306.05709) | +| 1816 | Towards Multi-Lingual Audio Question Answering | [![GitHub](https://img.shields.io/github/stars/swarupbehera/mAQA?style=flat)](https://github.com/swarupbehera/mAQA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/behera23_interspeech.pdf) | +| 1344 | Blind Estimation of Room Impulse Response from Monaural Reverberant Speech with Segmental Generative Neural Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liao23_interspeech.pdf)
[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/370843606_Blind_Estimation_of_Room_Impulse_Response_from_Monaural_Reverberant_Speech_with_Segmental_Generative_Neural_Network) | +| 358 | Emotion-aware Audio-Driven Face Animation via Contrastive Feature Disentanglement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ren23_interspeech.pdf) | +| 591 | Anomalous Sound Detection based on Sound Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shimonishi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15859-b31b1b.svg)](https://arxiv.org/abs/2305.15859) | +| 2089 | Random Forest Classification of Breathing Phases from Audio Signals Recorded using Mobile Devices | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fahed23_interspeech.pdf) | +| 1581 | GRAVO: Learning to Generate Relevant Audio from Visual Features with Noisy Online Videos | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ahn23b_interspeech.pdf) | +| 477 | Wav2ToBI: A New Approach to Automatic ToBI Transcription | [![GitHub](https://img.shields.io/github/stars/reginazhai/Wav2ToBI?style=flat)](https://github.com/reginazhai/Wav2ToBI) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhai23_interspeech.pdf) | +| 344 | Joint-Former: Jointly Regularized and Locally Down-Sampled Conformer for Semi-Supervised Sound Event Detection | [![GitHub](https://img.shields.io/github/stars/mastergofujs/Joint-Former?style=flat)](https://github.com/mastergofujs/Joint-Former) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23b_interspeech.pdf) | +| 245 | Towards Attention-based Contrastive Learning for Audio Spoof Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/goel23_interspeech.pdf) | +| 2488 | Masked Audio Modeling with CLAP and Multi-Objective Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xin23d_interspeech.pdf) | +| 1904 | Few-Shot Open-Set Learning for On-Device Customization of KeyWord Spotting Systems | [![GitHub](https://img.shields.io/github/stars/mrusci/ondevice-fewshot-kws?style=flat)](https://github.com/mrusci/ondevice-fewshot-kws) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rusci23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02161-b31b1b.svg)](https://arxiv.org/abs/2306.02161) | +| 481 | Self-Supervised Dataset Pruning for Efficient Training in Audio Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/azeemi23_interspeech.pdf) | +| 491 | Semantic Segmentation with Bidirectional Language Models Improves Long-Form ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18419-b31b1b.svg)](https://arxiv.org/abs/2305.18419) | +| 684 | Multi-Microphone Automatic Speech Segmentation in Meetings based on Circular Harmonics Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mariotte23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04268-b31b1b.svg)](https://arxiv.org/abs/2306.04268) | +| 542 | Advanced RawNet2 with Attention-based Channel Masking for Synthetic Speech Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23h_interspeech.pdf) | +| 88 | Insights Into End-to-End Audio-to-Score Transcription with Real Recordings: A Case Study with Saxophone Works | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/martinezsevilla23_interspeech.pdf) | +| 2193 | Whisper-AT: Noise-Robust Automatic Speech Recognizers are also Strong Audio Event Taggers | [![GitHub](https://img.shields.io/github/stars/YuanGongND/whisper-at?style=flat)](https://github.com/YuanGongND/whisper-at)
[![PyPI](https://img.shields.io/pypi/v/whisper-at)](https://pypi.org/project/whisper-at/)
[![Whisper-AT](https://img.shields.io/badge/🤗-demo-FFD21F.svg)](https://huggingface.co/spaces/yuangongfdu/whisper-at) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gong23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.03183-b31b1b.svg)](https://arxiv.org/abs/2307.03183) | +| 1621 | Synthetic Voice Spoofing Detection based on Feature Pyramid Conformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gong23b_interspeech.pdf) | +| 1383 | Learning a Self-Supervised Domain-Invariant Feature Representation for Generalized Audio Deepfake Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xie23c_interspeech.pdf) | +| 2011 | Application of Knowledge Distillation to Multi-Task Speech Representation Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kerpicci23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.16611-b31b1b.svg)](https://arxiv.org/abs/2210.16611) | +| 2297 | DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jiang23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18441-b31b1b.svg)](https://arxiv.org/abs/2305.18441) | +| 1965 | Variational Classifier for Unsupervised Anomalous Sound Detection under Domain Generalization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/almudevar23_interspeech.pdf) | +| 745 | FlexiAST: Flexibility is What AST Needs | [![GitHub](https://img.shields.io/github/stars/JiuFengSC/FlexiAST_INTERSPEECH23?style=flat)](https://github.com/JiuFengSC/FlexiAST_INTERSPEECH23) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/feng23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.09286-b31b1b.svg)](https://arxiv.org/abs/2307.09286) | +| 1579 | MCR-Data2vec 2.0: Improving Self-Supervised Speech Pre-training via Model-Level Consistency Regularization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yoon23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08463-b31b1b.svg)](https://arxiv.org/abs/2306.08463) | +| 914 | Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention | [![GitHub](https://img.shields.io/github/stars/liuxubo717/V-ACT?style=flat)](https://github.com/liuxubo717/V-ACT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23l_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.16428-b31b1b.svg)](https://arxiv.org/abs/2210.16428) | +| 165 | Time-Frequency Domain Filter-and-Sum Network for Multi-Channel Speech Separation | [![GitHub](https://img.shields.io/github/stars/JonathanDZ/TF-FaSNet?style=flat)](https://github.com/JonathanDZ/TF-FaSNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/deng23_interspeech.pdf) | +| 801 | Audio-Visual Fusion using Multiscale Temporal Convolutional Attention for Time-Domain Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23h_interspeech.pdf) | +| 1431 | An Efficient Speech Separation Network based on Recurrent Fusion Dilated Convolution and Channel Attention | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ca_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05887-b31b1b.svg)](https://arxiv.org/abs/2306.05887) | +| 2015 | Binaural Sound Localization in Noisy Environments using Frequency-based Audio Vision Transformer (FAViT) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/phokhinanan23_interspeech.pdf) | +| 1723 | Contrastive Learning based Deep Latent Masking for Music Source Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23i_interspeech.pdf) | +| 655 | Speaker Extraction with Detection of Presence and Absence of Target Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23k_interspeech.pdf) | +| 889 | PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23k_interspeech.pdf) | +| 2117 | Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning | [![GitHub](https://img.shields.io/github/stars/apple/ml-spatial-librispeech?style=flat)](https://github.com/apple/ml-spatial-librispeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sarabia23_interspeech.pdf)
[![Apple](https://img.shields.io/badge/apple-ml-FE9901.svg)](https://machinelearning.apple.com/research/spatial-librispeech) | +| 1309 | Image-Driven Audio-Visual Universal Source Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23q_interspeech.pdf) | +| 2520 | Joint Blind Source Separation and Dereverberation for Automatic Speech Recognition using Delayed-Subsource | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fras23_interspeech.pdf) | +| 1766 | SDNet: Stream-Attention and Dual-Feature Learning Network for Ad-hoc Array Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23na_interspeech.pdf) | +| 2451 | Deeply Supervised Curriculum Learning for Deep Neural Network-based Sound Source Localization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/baek23_interspeech.pdf) | +| 164 | Multi-Channel Separation of Dynamic Speech and Sound Events | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://www.robinscheibler.org/interspeech2023-moving-iva-samples/)
[![GitHub](https://img.shields.io/github/stars/fakufaku/interspeech2023-moving-iva-samples?style=flat)](https://github.com/fakufaku/interspeech2023-moving-iva-samples) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fujimura23_interspeech.pdf) | +| 2545 | Rethinking the Visual Cues in Audio-Visual Speaker Extraction | [![GitHub](https://img.shields.io/github/stars/mrjunjieli/DAVSE?style=flat)](https://github.com/mrjunjieli/DAVSE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23ja_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02625-b31b1b.svg)](https://arxiv.org/abs/2306.02625) | +| 85 | Using Semi-Supervised Learning for Monaural Time-Domain Speech Separation with a Self-Supervised Learning-based SI-SNR Estimator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/dang23_interspeech.pdf) | +| 1158 | Investigation of Training Mute-Expressive End-to-End Speech Separation Networks for an Unknown Number of Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23d_interspeech.pdf) | +| 2369 | SR-SRP: Super-Resolution based SRP-PHAT for Sound Source Localization and Tracking | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cho23_interspeech.pdf) | +| 613 | Dual-Memory Multi-Modal Learning for Continual Spoken Keyword Spotting with Confidence Selection and Diversity Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23g_interspeech.pdf) | +| 714 | FN-SSL: Full-Band and Narrow-Band Fusion for Sound Source Localization | [![GitHub](https://img.shields.io/github/stars/Audio-WestlakeU/FN-SSL?style=flat)](https://github.com/Audio-WestlakeU/FN-SSL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19610-b31b1b.svg)](https://arxiv.org/abs/2305.19610) | +| 696 | A Neural State-Space Modeling Approach to Efficient Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16932-b31b1b.svg)](https://arxiv.org/abs/2305.16932) | +| 1777 | Locate and Beamform: Two-Dimensional Locating All-Neural Beamformer for Multi-Channel Speech Separation | [![GitHub](https://img.shields.io/github/stars/FYJNEVERFOLLOWS/LaBNet?style=flat)](https://github.com/FYJNEVERFOLLOWS/LaBNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fu23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10821-b31b1b.svg)](https://arxiv.org/abs/2305.10821) | +| 518 | Monaural Speech Separation Method based on Recurrent Attention with Parallel Branches | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23f_interspeech.pdf) | +| 979 | Ontology-Aware Learning and Evaluation for Audio Tagging | [![GitHub](https://img.shields.io/github/stars/haoheliu/ontology-aware-audio-tagging?style=flat)](https://github.com/haoheliu/ontology-aware-audio-tagging) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.12195-b31b1b.svg)](https://arxiv.org/abs/2211.12195) | +| 951 | What do Self-Supervised Speech Representations Encode? An Analysis of Languages, Varieties, Speaking Styles and Speakers | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://gitlab.tugraz.at/speech/speechcodebookanalysis) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/linke23_interspeech.pdf) | +| 1696 | A Compressed Synthetic Speech Detection Method with Compression Feature Embedding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23ca_interspeech.pdf) | +| 572 | Outlier-aware Inlier Modeling and Multi-Scale Scoring for Anomalous Sound Detection via Multitask Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23j_interspeech.pdf) | +| 263 | MOSLight: A Lightweight Data-Efficient System for Non-Intrusive Speech Quality Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23c_interspeech.pdf) | +| 1626 | A Multi-Scale Attentive Transformer for Multi-Instrument Symbolic Music Generation | [![GitHub](https://img.shields.io/github/stars/HaRry-qaq/MSAT?style=flat)](https://github.com/HaRry-qaq/MSAT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wei23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16592-b31b1b.svg)](https://arxiv.org/abs/2305.16592) | +| 2494 | MTANet: Multi-band Time-Frequency Attention Network for Singing Melody Extraction from Polyphonic Music | [![GitHub](https://img.shields.io/github/stars/Annmixiu/MTANet?style=flat)](https://github.com/Annmixiu/MTANet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23i_interspeech.pdf) | +| 119 | Xiaoicesing 2: A High-Fidelity Singing Voice Synthesizer based on Generative Adversarial Network | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://wavelandspeech.github.io/xiaoice2/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chunhui23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.14666-b31b1b.svg)](https://arxiv.org/abs/2210.14666) | +| 2190 | Do Vocal Breath Sounds Encode Gender cues for Automatic Gender Classification? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/solanki23_interspeech.pdf) | +| 202 | Automatic Exploration of Optimal Data Processing Operations for Sound Data Augmentation using Improved Differentiable Automatic Data Augmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sugiura23_interspeech.pdf) | +| 1430 | A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis | [![GitHub](https://img.shields.io/github/stars/xiaoli1996/SSBPR?style=flat)](https://github.com/xiaoli1996/SSBPR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xiao23b_interspeech.pdf) | +| 528 | RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music | [![GitHub](https://img.shields.io/github/stars/Dream-High/RMVPE?style=flat)](https://github.com/Dream-High/RMVPE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wei23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15412-b31b1b.svg)](https://arxiv.org/abs/2306.15412) | +| 832 | Spatialization Quality Metric for Binaural Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/manocha23_interspeech.pdf) | +| 428 | AsthmaSCELNet: A Lightweight Supervised Contrastive Embedding Learning Framework for Asthma Classification using Lung Sounds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/roy23_interspeech.pdf) | +| 1426 | Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification | [![GitHub](https://img.shields.io/github/stars/raymin0223/patch-mix_contrastive_learning?style=flat)](https://github.com/raymin0223/patch-mix_contrastive_learning) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bae23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14032-b31b1b.svg)](https://arxiv.org/abs/2305.14032) | +| 2115 | Remote Assessment for ALS using Multimodal Dialog Agents: Data Quality, Feasibility and Task Compliance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/richter23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1ydqKLgO18TFrMzFC2Bz_6y3Uml0bUaaN/view) | +| 852 | AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://pages.cs.huji.ac.il/adiyoss-lab/AudioToken/)
[![GitHub](https://img.shields.io/github/stars/guyyariv/AudioToken?style=flat)](https://github.com/guyyariv/AudioToken) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yariv23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13050-b31b1b.svg)](https://arxiv.org/abs/2305.13050) | +| 209 | Obstructive Sleep Apnea Screening with Breathing Sounds and Respiratory Effort: A Multimodal Deep Learning Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/romero23_interspeech.pdf) | +| 2275 | Investigation of Music Emotion Recognition based on Segmented Semi-Supervised Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sun23f_interspeech.pdf) |
@@ -596,56 +596,56 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2344 | Diacritic Recognition Performance in Arabic ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/aldarmaki23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14022-b31b1b.svg)](https://arxiv.org/abs/2302.14022) | -| 990 | Personalization for BERT-based Discriminative Speech Recognition Rescoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kolehmainen23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/personalization-for-bert-based-discriminative-speech-recognition-rescoring) | -| 2182 | On the N-gram Approximation of Pre-trained Language Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/krishnan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06892-b31b1b.svg)](https://arxiv.org/abs/2306.06892) | -| 2147 | Record Deduplication for Entity Distribution Modeling in ASR Transcripts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06246-b31b1b.svg)](https://arxiv.org/abs/2306.06246) | -| 2205 | Learning When to Trust Which Teacher for Weakly Supervised ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/agrawal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12012-b31b1b.svg)](https://arxiv.org/abs/2306.12012) | -| 1313 | Text-Only Domain Adaptation using Unified Speech-Text Representation in Transducer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04076-b31b1b.svg)](https://arxiv.org/abs/2306.04076) | -| 1378 | Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhu23f_interspeech.pdf) | -| 2479 | Knowledge Distillation Approach for Efficient Internal Language Model Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23u_interspeech.pdf) | -| 276 | Language Model Personalization for Improved Touchscreen Typing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/adhikary23_interspeech.pdf) | -| 1223 | Blank Collapse: Compressing CTC Emission for the Faster Decoding | [![GitHub](https://img.shields.io/github/stars/minkjung/blankcollapse?style=flat)](https://github.com/minkjung/blankcollapse) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jung23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17017-b31b1b.svg)](https://arxiv.org/abs/2210.17017) | -| 403 | Improving Joint Speech-Text Representations without Alignment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/peyser23_interspeech.pdf) | -| 1941 | Leveraging Cross-Utterance Context for ASR Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/flynn23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16903-b31b1b.svg)](https://arxiv.org/abs/2306.16903) | -| 423 | Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation | [![GitHub](https://img.shields.io/github/stars/MingLunHan/CIF-HieraDist?style=flat)](https://github.com/MingLunHan/CIF-HieraDist) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/han23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.13003-b31b1b.svg)](https://arxiv.org/abs/2301.13003) | -| 1517 | Integration of Frame- and Label-Synchronous Beam Search for Streaming Encoder-Decoder Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tsunoo23_interspeech.pdf) | -| 1071 | A Neural Time Alignment Module for End-to-End Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jiang23e_interspeech.pdf) | -| 599 | Accelerating Transducers through Adjacent Token Merging | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16009-b31b1b.svg)](https://arxiv.org/abs/2306.16009) | -| 617 | Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/feng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11569-b31b1b.svg)](https://arxiv.org/abs/2305.11569) | -| 2292 | Language-Routing Mixture of Experts for Multi-Lingual and Code-Switching Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23sa_interspeech.pdf) | -| 1437 | Embedding Articulatory Constraints for Low-Resource Speech Recognition based on Large Pre-trained Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23h_interspeech.pdf) | -| 2051 | Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18108-b31b1b.svg)](https://arxiv.org/abs/2305.18108) | -| 768 | SpellMapper: A Non-Autoregressive Neural Spellchecker for ASR Customization with Candidate Retrieval based on N-Gram Mappings | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb)
[![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/antonova23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02317-b31b1b.svg)](https://arxiv.org/abs/2306.02317) | -| 2037 | Text Injection for Capitalization and Turn-Taking Prediction in Speech Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bijwadia23_interspeech.pdf) | -| 1281 | Confidence-based Ensembles of End-to-End Speech Recognition Models | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Confidence_Ensembles.ipynb)
[![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gitman23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15824-b31b1b.svg)](https://arxiv.org/abs/2306.15824) | -| 1050 | Unsupervised Code-Switched Text Generation from Parallel Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chi23_interspeech.pdf) | -| 258 | A Binary Keyword Spotting System With Error-Diffusion Speech Feature Binarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23b_interspeech.pdf) | -| 621 | Language-Universal Phonetic Encoder for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/feng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11576-b31b1b.svg)](https://arxiv.org/abs/2305.11576) | -| 863 | A Lexical-aware Non-Autoregressive Transformer-based ASR Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10839-b31b1b.svg)](https://arxiv.org/abs/2305.10839) | -| 1841 | Improving Under-Resourced Code-Switched Speech Recognition: Large Pre-trained Models or Architectural Interventions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vanvuren23_interspeech.pdf) | -| 61 | A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization | [![GitHub](https://img.shields.io/github/stars/SamsungLabs/myQASR?style=flat)](https://github.com/SamsungLabs/myQASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fish23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.12659-b31b1b.svg)](https://arxiv.org/abs/2307.12659) | -| 137 | Modeling Dependent Structure for Utterances in ASR Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2209.05281-b31b1b.svg)](https://arxiv.org/abs/2209.05281) | -| 757 | ASR for Low Resource and Multilingual Noisy Code-Mixed Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/verma23_interspeech.pdf) | -| 390 | Accurate and Reliable Confidence Estimation based on Non-Autoregressive End-to-End Speech Recognition System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10680-b31b1b.svg)](https://arxiv.org/abs/2305.10680) | -| 737 | Combining Multilingual Resources and Models to Develop State-of-the-Art E2E ASR for Swedish | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mateju23_interspeech.pdf) | -| 1171 | Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-Streaming Transducer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.06735-b31b1b.svg)](https://arxiv.org/abs/2301.06735) | -| 1867 | Towards Continually Learning New Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pham23_interspeech.pdf) | -| 1616 | N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ma23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00456-b31b1b.svg)](https://arxiv.org/abs/2303.00456) | -| 1432 | SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhu23g_interspeech.pdf) | -| 1162 | miniStreamer: Enhancing Small Conformer with Chunked-Context Masking for Streaming ASR Applications on the Edge | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gulzar23_interspeech.pdf) | -| 1469 | CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning | [![GitHub](https://img.shields.io/github/stars/louislau1129/CoMFLP?style=flat)](https://github.com/louislau1129/CoMFLP) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23p_interspeech.pdf) | -| 1337 | Exploration on HuBERT with Multiple Resolution | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01084-b31b1b.svg)](https://arxiv.org/abs/2306.01084) | -| 2045 | Quantization-aware and Tensor-compressed Training of Transformers for Natural Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23s_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01076-b31b1b.svg)](https://arxiv.org/abs/2306.01076) | -| 2355 | Word-Level Confidence Estimation for CTC Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/naowarat23b_interspeech.pdf) | -| 2235 | Multilingual Contextual Adapters to Improve Custom Word Recognition in Low-Resource Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kulshreshtha23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00759-b31b1b.svg)](https://arxiv.org/abs/2307.00759) | -| 614 | Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zheng23_interspeech.pdf) | -| 1303 | 4D ASR: Joint Modeling of CTC, Attention, Transducer, and Mask-Predict Decoders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sudo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.10818-b31b1b.svg)](https://arxiv.org/abs/2212.10818) | -| 1086 | Neural Model Reprogramming with Similarity based Mapping for Low-Resource Spoken Command Recognition | [![GitHub](https://img.shields.io/github/stars/dodohow1011/SpeechAdvReprogram?style=flat)](https://github.com/dodohow1011/SpeechAdvReprogram) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2110.03894-b31b1b.svg)](https://arxiv.org/abs/2110.03894) | -| 262 | Language-Specific Boundary Learning for Improving Mandarin-English Code-Switching Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fan23_interspeech.pdf) | -| 480 | Mixture-of-Expert Conformer for Streaming Multilingual ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hu23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15663-b31b1b.svg)](https://arxiv.org/abs/2305.15663) | -| 1665 | Lossless 4-bit Quantization of Architecture Compressed Conformer ASR Systems on the 300-hr Switch-board Corpus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23x_interspeech.pdf) | -| 2544 | Compressed MoE ASR Model based on Knowledge Distillation and Quantization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yuan23c_interspeech.pdf) | +| 2344 | Diacritic Recognition Performance in Arabic ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/aldarmaki23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14022-b31b1b.svg)](https://arxiv.org/abs/2302.14022) | +| 990 | Personalization for BERT-based Discriminative Speech Recognition Rescoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kolehmainen23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/personalization-for-bert-based-discriminative-speech-recognition-rescoring) | +| 2182 | On the N-gram Approximation of Pre-trained Language Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/krishnan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06892-b31b1b.svg)](https://arxiv.org/abs/2306.06892) | +| 2147 | Record Deduplication for Entity Distribution Modeling in ASR Transcripts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06246-b31b1b.svg)](https://arxiv.org/abs/2306.06246) | +| 2205 | Learning When to Trust Which Teacher for Weakly Supervised ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/agrawal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12012-b31b1b.svg)](https://arxiv.org/abs/2306.12012) | +| 1313 | Text-Only Domain Adaptation using Unified Speech-Text Representation in Transducer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04076-b31b1b.svg)](https://arxiv.org/abs/2306.04076) | +| 1378 | Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhu23f_interspeech.pdf) | +| 2479 | Knowledge Distillation Approach for Efficient Internal Language Model Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23u_interspeech.pdf) | +| 276 | Language Model Personalization for Improved Touchscreen Typing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/adhikary23_interspeech.pdf) | +| 1223 | Blank Collapse: Compressing CTC Emission for the Faster Decoding | [![GitHub](https://img.shields.io/github/stars/minkjung/blankcollapse?style=flat)](https://github.com/minkjung/blankcollapse) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jung23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17017-b31b1b.svg)](https://arxiv.org/abs/2210.17017) | +| 403 | Improving Joint Speech-Text Representations without Alignment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/peyser23_interspeech.pdf) | +| 1941 | Leveraging Cross-Utterance Context for ASR Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/flynn23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16903-b31b1b.svg)](https://arxiv.org/abs/2306.16903) | +| 423 | Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation | [![GitHub](https://img.shields.io/github/stars/MingLunHan/CIF-HieraDist?style=flat)](https://github.com/MingLunHan/CIF-HieraDist) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/han23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.13003-b31b1b.svg)](https://arxiv.org/abs/2301.13003) | +| 1517 | Integration of Frame- and Label-Synchronous Beam Search for Streaming Encoder-Decoder Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tsunoo23_interspeech.pdf) | +| 1071 | A Neural Time Alignment Module for End-to-End Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jiang23e_interspeech.pdf) | +| 599 | Accelerating Transducers through Adjacent Token Merging | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16009-b31b1b.svg)](https://arxiv.org/abs/2306.16009) | +| 617 | Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/feng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11569-b31b1b.svg)](https://arxiv.org/abs/2305.11569) | +| 2292 | Language-Routing Mixture of Experts for Multi-Lingual and Code-Switching Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23sa_interspeech.pdf) | +| 1437 | Embedding Articulatory Constraints for Low-Resource Speech Recognition based on Large Pre-trained Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23h_interspeech.pdf) | +| 2051 | Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18108-b31b1b.svg)](https://arxiv.org/abs/2305.18108) | +| 768 | SpellMapper: A Non-Autoregressive Neural Spellchecker for ASR Customization with Candidate Retrieval based on N-Gram Mappings | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb)
[![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/antonova23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02317-b31b1b.svg)](https://arxiv.org/abs/2306.02317) | +| 2037 | Text Injection for Capitalization and Turn-Taking Prediction in Speech Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bijwadia23_interspeech.pdf) | +| 1281 | Confidence-based Ensembles of End-to-End Speech Recognition Models | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Confidence_Ensembles.ipynb)
[![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gitman23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15824-b31b1b.svg)](https://arxiv.org/abs/2306.15824) | +| 1050 | Unsupervised Code-Switched Text Generation from Parallel Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chi23_interspeech.pdf) | +| 258 | A Binary Keyword Spotting System With Error-Diffusion Speech Feature Binarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23b_interspeech.pdf) | +| 621 | Language-Universal Phonetic Encoder for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/feng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11576-b31b1b.svg)](https://arxiv.org/abs/2305.11576) | +| 863 | A Lexical-aware Non-Autoregressive Transformer-based ASR Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10839-b31b1b.svg)](https://arxiv.org/abs/2305.10839) | +| 1841 | Improving Under-Resourced Code-Switched Speech Recognition: Large Pre-trained Models or Architectural Interventions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vanvuren23_interspeech.pdf) | +| 61 | A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization | [![GitHub](https://img.shields.io/github/stars/SamsungLabs/myQASR?style=flat)](https://github.com/SamsungLabs/myQASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fish23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.12659-b31b1b.svg)](https://arxiv.org/abs/2307.12659) | +| 137 | Modeling Dependent Structure for Utterances in ASR Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2209.05281-b31b1b.svg)](https://arxiv.org/abs/2209.05281) | +| 757 | ASR for Low Resource and Multilingual Noisy Code-Mixed Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/verma23_interspeech.pdf) | +| 390 | Accurate and Reliable Confidence Estimation based on Non-Autoregressive End-to-End Speech Recognition System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10680-b31b1b.svg)](https://arxiv.org/abs/2305.10680) | +| 737 | Combining Multilingual Resources and Models to Develop State-of-the-Art E2E ASR for Swedish | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mateju23_interspeech.pdf) | +| 1171 | Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-Streaming Transducer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.06735-b31b1b.svg)](https://arxiv.org/abs/2301.06735) | +| 1867 | Towards Continually Learning New Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pham23_interspeech.pdf) | +| 1616 | N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ma23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00456-b31b1b.svg)](https://arxiv.org/abs/2303.00456) | +| 1432 | SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhu23g_interspeech.pdf) | +| 1162 | miniStreamer: Enhancing Small Conformer with Chunked-Context Masking for Streaming ASR Applications on the Edge | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gulzar23_interspeech.pdf) | +| 1469 | CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning | [![GitHub](https://img.shields.io/github/stars/louislau1129/CoMFLP?style=flat)](https://github.com/louislau1129/CoMFLP) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23p_interspeech.pdf) | +| 1337 | Exploration on HuBERT with Multiple Resolution | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01084-b31b1b.svg)](https://arxiv.org/abs/2306.01084) | +| 2045 | Quantization-aware and Tensor-compressed Training of Transformers for Natural Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23s_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01076-b31b1b.svg)](https://arxiv.org/abs/2306.01076) | +| 2355 | Word-Level Confidence Estimation for CTC Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/naowarat23b_interspeech.pdf) | +| 2235 | Multilingual Contextual Adapters to Improve Custom Word Recognition in Low-Resource Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kulshreshtha23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00759-b31b1b.svg)](https://arxiv.org/abs/2307.00759) | +| 614 | Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zheng23_interspeech.pdf) | +| 1303 | 4D ASR: Joint Modeling of CTC, Attention, Transducer, and Mask-Predict Decoders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sudo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.10818-b31b1b.svg)](https://arxiv.org/abs/2212.10818) | +| 1086 | Neural Model Reprogramming with Similarity based Mapping for Low-Resource Spoken Command Recognition | [![GitHub](https://img.shields.io/github/stars/dodohow1011/SpeechAdvReprogram?style=flat)](https://github.com/dodohow1011/SpeechAdvReprogram) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2110.03894-b31b1b.svg)](https://arxiv.org/abs/2110.03894) | +| 262 | Language-Specific Boundary Learning for Improving Mandarin-English Code-Switching Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fan23_interspeech.pdf) | +| 480 | Mixture-of-Expert Conformer for Streaming Multilingual ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hu23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15663-b31b1b.svg)](https://arxiv.org/abs/2305.15663) | +| 1665 | Lossless 4-bit Quantization of Architecture Compressed Conformer ASR Systems on the 300-hr Switch-board Corpus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23x_interspeech.pdf) | +| 2544 | Compressed MoE ASR Model based on Knowledge Distillation and Quantization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yuan23c_interspeech.pdf) |
@@ -657,41 +657,41 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2044 | Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model | [![GitHub](https://img.shields.io/github/stars/jasonppy/syllable-discovery?style=flat)](https://github.com/jasonppy/syllable-discovery) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11435-b31b1b.svg)](https://arxiv.org/abs/2305.11435) | -| 2032 | Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization | [![GitHub](https://img.shields.io/github/stars/jasonppy/PromptingWhisper?style=flat)](https://github.com/jasonppy/PromptingWhisper) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11095-b31b1b.svg)](https://arxiv.org/abs/2305.11095) | -| 235 | Progress and Prospects for Spoken Language Technology: Results from Five Sexennial Surveys | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/moore23_interspeech.pdf) | -| 268 | Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sanabria23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02153-b31b1b.svg)](https://arxiv.org/abs/2306.02153) | -| 601 | CASA-ASR: Context-Aware Speaker-Attributed ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12459-b31b1b.svg)](https://arxiv.org/abs/2305.12459) | -| 1321 | Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/takahashi23_interspeech.pdf) | -| 1167 | AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in the SUPERB Benchmark | [![GitHub](https://img.shields.io/github/stars/liyunlongaaa/AD-TUNING?style=flat)](https://github.com/liyunlongaaa/AD-TUNING) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23n_interspeech.pdf) | -| 190 | Distilling Knowledge from Gaussian Process Teacher to Neural Network Student | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wong23_interspeech.pdf) | -| 135 | Segmental SpeechCLIP: Utilizing Pretrained Image-Text Models for Audio-Visual Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bhati23_interspeech.pdf) | -| 421 | Towards Hate Speech Detection in Low-Resource Languages: Comparing ASR to Acoustic Word Embeddings on Wolof and Swahili | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jacobs23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00410-b31b1b.svg)](https://arxiv.org/abs/2306.00410) | -| 385 | Mitigating Catastrophic Forgetting for Few-Shot Spoken Word Classification through Meta-Learning | [![GitHub](https://img.shields.io/github/stars/ByteFuse/MAMLCon?style=flat)](https://github.com/ByteFuse/MAMLCon) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vandermerwe23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13080-b31b1b.svg)](https://arxiv.org/abs/2305.13080) | -| 664 | Online Punctuation Restoration using ELECTRA Model for Streaming ASR Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/polacek23_interspeech.pdf) | -| 2066 | Language Agnostic Data-Driven Inverse Text Normalization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23s_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.08506-b31b1b.svg)](https://arxiv.org/abs/2301.08506) | -| 1079 | How to Estimate Model Transferability of Pre-trained Speech Models? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01015-b31b1b.svg)](https://arxiv.org/abs/2306.01015) | -| 1655 | Transcribing Speech as Spoken and Written Dual Text using an Autoregressive Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ihori23_interspeech.pdf) | -| 587 | Phonetic and Prosody-aware Self-Supervised Learning Approach for Non-Native Fluency Scoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11438-b31b1b.svg)](https://arxiv.org/abs/2305.11438) | -| 380 | Disentangling the Contribution of Non-Native Speech in Automated Pronunciation Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23_interspeech.pdf) | -| 337 | A Joint Model for Pronunciation Assessment and Mispronunciation Detection and Diagnosis with Multi-task Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ryu23_interspeech.pdf) | -| 1635 | Assessing Intelligibility in Non-Native Speech: Comparing Measures Obtained at Different Levels | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wei23d_interspeech.pdf) | -| 585 | End-to-End Word-Level Pronunciation Assessment with MASK Pre-training | [![GitHub](https://img.shields.io/github/stars/liangyukang/MPA-InterSpeech2023?style=flat)](https://github.com/liangyukang/MPA-InterSpeech2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02682-b31b1b.svg)](https://arxiv.org/abs/2306.02682) | -| 550 | A Hierarchical Context-aware Modeling Approach for Multi-Aspect and Multi-Granular Pronunciation Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chao23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18146-b31b1b.svg)](https://arxiv.org/abs/2305.18146) | -| 2541 | Automatic Prediction of Language Learners' Listenability using Speech and Text Features Extracted from Listening Drills | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23j_interspeech.pdf) | -| 2371 | Assessment of Non-Native Speech Intelligibility using Wav2vec2-based Mispronunciation Detection and Multi-Level Goodness of Pronunciation Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shekar23b_interspeech.pdf) | -| 1899 | Adapting an Unadaptable ASR System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ma23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01208-b31b1b.svg)](https://arxiv.org/abs/2306.01208) | -| 533 | Addressing Cold Start Problem for End-to-End Automatic Speech Scoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/park23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14310-b31b1b.svg)](https://arxiv.org/abs/2306.14310) | -| 816 | Improving Grapheme-to-Phoneme Conversion by Learning Pronunciations from Speech Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ribeiro23b_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/improving-grapheme-to-phoneme-conversion-by-learning-pronunciations-from-speech-recordings) | -| 2577 | Orthography-based Pronunciation Scoring for Better CAPT Feedback | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/richter23b_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://catir.github.io/art/capt_phone_ctc_2023.pdf) | -| 1592 | Zero-Shot Automatic Pronunciation Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23r_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19563-b31b1b.svg)](https://arxiv.org/abs/2305.19563) | -| 364 | Mispronunciation Detection and Diagnosis Model for Tonal Language, Applied to Vietnamese | [![GitHub](https://img.shields.io/github/stars/VietMDDDataset/VietMDD?style=flat)](https://github.com/VietMDDDataset/VietMDD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huu23_interspeech.pdf) | -| 793 | An Efficient and Noise-Robust Audiovisual Encoder for Audiovisual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23k_interspeech.pdf) | -| 540 | A Novel Self-training Approach for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/singh23b_interspeech.pdf) | -| 1428 | FunASR: A Fundamental End-to-End Speech Recognition Toolkit | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11013-b31b1b.svg)](https://arxiv.org/abs/2305.11013) | -| 487 | Streaming Audio-Visual Speech Recognition with Alignment Regularization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ma23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02133-b31b1b.svg)](https://arxiv.org/abs/2211.02133) | -| 462 | SparseVSR: Lightweight and Noise Robust Visual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fernandezlopez23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.04552-b31b1b.svg)](https://arxiv.org/abs/2307.04552) | -| 2262 | Multimodal Speech Recognition for Language-Guided Embodied Agents | [![GitHub](https://img.shields.io/github/stars/Cylumn/embodied-multimodal-asr?style=flat)](https://github.com/Cylumn/embodied-multimodal-asr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14030-b31b1b.svg)](https://arxiv.org/abs/2302.14030) | +| 2044 | Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model | [![GitHub](https://img.shields.io/github/stars/jasonppy/syllable-discovery?style=flat)](https://github.com/jasonppy/syllable-discovery) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/peng23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11435-b31b1b.svg)](https://arxiv.org/abs/2305.11435) | +| 2032 | Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization | [![GitHub](https://img.shields.io/github/stars/jasonppy/PromptingWhisper?style=flat)](https://github.com/jasonppy/PromptingWhisper) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/peng23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11095-b31b1b.svg)](https://arxiv.org/abs/2305.11095) | +| 235 | Progress and Prospects for Spoken Language Technology: Results from Five Sexennial Surveys | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/moore23_interspeech.pdf) | +| 268 | Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sanabria23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02153-b31b1b.svg)](https://arxiv.org/abs/2306.02153) | +| 601 | CASA-ASR: Context-Aware Speaker-Attributed ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12459-b31b1b.svg)](https://arxiv.org/abs/2305.12459) | +| 1321 | Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/takahashi23_interspeech.pdf) | +| 1167 | AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in the SUPERB Benchmark | [![GitHub](https://img.shields.io/github/stars/liyunlongaaa/AD-TUNING?style=flat)](https://github.com/liyunlongaaa/AD-TUNING) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23n_interspeech.pdf) | +| 190 | Distilling Knowledge from Gaussian Process Teacher to Neural Network Student | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wong23_interspeech.pdf) | +| 135 | Segmental SpeechCLIP: Utilizing Pretrained Image-Text Models for Audio-Visual Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bhati23_interspeech.pdf) | +| 421 | Towards Hate Speech Detection in Low-Resource Languages: Comparing ASR to Acoustic Word Embeddings on Wolof and Swahili | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jacobs23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00410-b31b1b.svg)](https://arxiv.org/abs/2306.00410) | +| 385 | Mitigating Catastrophic Forgetting for Few-Shot Spoken Word Classification through Meta-Learning | [![GitHub](https://img.shields.io/github/stars/ByteFuse/MAMLCon?style=flat)](https://github.com/ByteFuse/MAMLCon) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vandermerwe23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13080-b31b1b.svg)](https://arxiv.org/abs/2305.13080) | +| 664 | Online Punctuation Restoration using ELECTRA Model for Streaming ASR Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/polacek23_interspeech.pdf) | +| 2066 | Language Agnostic Data-Driven Inverse Text Normalization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23s_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.08506-b31b1b.svg)](https://arxiv.org/abs/2301.08506) | +| 1079 | How to Estimate Model Transferability of Pre-trained Speech Models? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01015-b31b1b.svg)](https://arxiv.org/abs/2306.01015) | +| 1655 | Transcribing Speech as Spoken and Written Dual Text using an Autoregressive Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ihori23_interspeech.pdf) | +| 587 | Phonetic and Prosody-aware Self-Supervised Learning Approach for Non-Native Fluency Scoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11438-b31b1b.svg)](https://arxiv.org/abs/2305.11438) | +| 380 | Disentangling the Contribution of Non-Native Speech in Automated Pronunciation Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23_interspeech.pdf) | +| 337 | A Joint Model for Pronunciation Assessment and Mispronunciation Detection and Diagnosis with Multi-task Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ryu23_interspeech.pdf) | +| 1635 | Assessing Intelligibility in Non-Native Speech: Comparing Measures Obtained at Different Levels | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wei23d_interspeech.pdf) | +| 585 | End-to-End Word-Level Pronunciation Assessment with MASK Pre-training | [![GitHub](https://img.shields.io/github/stars/liangyukang/MPA-InterSpeech2023?style=flat)](https://github.com/liangyukang/MPA-InterSpeech2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02682-b31b1b.svg)](https://arxiv.org/abs/2306.02682) | +| 550 | A Hierarchical Context-aware Modeling Approach for Multi-Aspect and Multi-Granular Pronunciation Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chao23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18146-b31b1b.svg)](https://arxiv.org/abs/2305.18146) | +| 2541 | Automatic Prediction of Language Learners' Listenability using Speech and Text Features Extracted from Listening Drills | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23j_interspeech.pdf) | +| 2371 | Assessment of Non-Native Speech Intelligibility using Wav2vec2-based Mispronunciation Detection and Multi-Level Goodness of Pronunciation Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shekar23b_interspeech.pdf) | +| 1899 | Adapting an Unadaptable ASR System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ma23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01208-b31b1b.svg)](https://arxiv.org/abs/2306.01208) | +| 533 | Addressing Cold Start Problem for End-to-End Automatic Speech Scoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/park23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14310-b31b1b.svg)](https://arxiv.org/abs/2306.14310) | +| 816 | Improving Grapheme-to-Phoneme Conversion by Learning Pronunciations from Speech Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ribeiro23b_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/improving-grapheme-to-phoneme-conversion-by-learning-pronunciations-from-speech-recordings) | +| 2577 | Orthography-based Pronunciation Scoring for Better CAPT Feedback | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/richter23b_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://catir.github.io/art/capt_phone_ctc_2023.pdf) | +| 1592 | Zero-Shot Automatic Pronunciation Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23r_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19563-b31b1b.svg)](https://arxiv.org/abs/2305.19563) | +| 364 | Mispronunciation Detection and Diagnosis Model for Tonal Language, Applied to Vietnamese | [![GitHub](https://img.shields.io/github/stars/VietMDDDataset/VietMDD?style=flat)](https://github.com/VietMDDDataset/VietMDD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huu23_interspeech.pdf) | +| 793 | An Efficient and Noise-Robust Audiovisual Encoder for Audiovisual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23k_interspeech.pdf) | +| 540 | A Novel Self-training Approach for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/singh23b_interspeech.pdf) | +| 1428 | FunASR: A Fundamental End-to-End Speech Recognition Toolkit | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11013-b31b1b.svg)](https://arxiv.org/abs/2305.11013) | +| 487 | Streaming Audio-Visual Speech Recognition with Alignment Regularization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ma23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02133-b31b1b.svg)](https://arxiv.org/abs/2211.02133) | +| 462 | SparseVSR: Lightweight and Noise Robust Visual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fernandezlopez23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.04552-b31b1b.svg)](https://arxiv.org/abs/2307.04552) | +| 2262 | Multimodal Speech Recognition for Language-Guided Embodied Agents | [![GitHub](https://img.shields.io/github/stars/Cylumn/embodied-multimodal-asr?style=flat)](https://github.com/Cylumn/embodied-multimodal-asr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14030-b31b1b.svg)](https://arxiv.org/abs/2302.14030) |
@@ -703,12 +703,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 643 | NoRefER: A Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning | [![GitHub](https://img.shields.io/github/stars/aixplain/NoRefER?style=flat)](https://github.com/aixplain/NoRefER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yuksel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12577-b31b1b.svg)](https://arxiv.org/abs/2306.12577) | -| 2128 | Scaling Laws for Discriminative Speech Recognition Rescoring Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gu23c_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/scaling-laws-for-discriminative-speech-recognition-rescoring-models)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15815-b31b1b.svg)](https://arxiv.org/abs/2306.15815) | -| 2429 | Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/thu-spmi/CAT/blob/master/docs/energy-based_LM_training.md) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12676-b31b1b.svg)](https://arxiv.org/abs/2305.12676) | -| 1362 | Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/feng23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.00066-b31b1b.svg)](https://arxiv.org/abs/2301.00066) | -| 1251 | Memory Network-based End-To-End Neural ES-KMeans for Improved Word Segmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/iwamoto23_interspeech.pdf) | -| 1320 | Retraining-free Customized ASR for Enharmonic Words based on a Named-Entity-Aware Model and Phoneme Similarity Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sudo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17846-b31b1b.svg)](https://arxiv.org/abs/2305.17846) | +| 643 | NoRefER: A Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning | [![GitHub](https://img.shields.io/github/stars/aixplain/NoRefER?style=flat)](https://github.com/aixplain/NoRefER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yuksel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12577-b31b1b.svg)](https://arxiv.org/abs/2306.12577) | +| 2128 | Scaling Laws for Discriminative Speech Recognition Rescoring Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gu23c_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/scaling-laws-for-discriminative-speech-recognition-rescoring-models)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15815-b31b1b.svg)](https://arxiv.org/abs/2306.15815) | +| 2429 | Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/thu-spmi/CAT/blob/master/docs/energy-based_LM_training.md) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12676-b31b1b.svg)](https://arxiv.org/abs/2305.12676) | +| 1362 | Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/feng23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.00066-b31b1b.svg)](https://arxiv.org/abs/2301.00066) | +| 1251 | Memory Network-based End-To-End Neural ES-KMeans for Improved Word Segmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/iwamoto23_interspeech.pdf) | +| 1320 | Retraining-free Customized ASR for Enharmonic Words based on a Named-Entity-Aware Model and Phoneme Similarity Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sudo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17846-b31b1b.svg)](https://arxiv.org/abs/2305.17846) |
@@ -720,12 +720,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 304 | Lightweight and Efficient Spoken Language Identification of Long-form Audio | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhu23c_interspeech.pdf) | -| 1109 | End-to-End Spoken Language Diarization with Wav2vec Embeddings | [![GitHub](https://img.shields.io/github/stars/jagabandhumishra/W2V-E2E-Language-Diarization?style=flat)](https://github.com/jagabandhumishra/W2V-E2E-Language-Diarization) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mishra23_interspeech.pdf) | -| 1986 | Efficient Spoken Language Recognition via Multilabel Classification | [![Dropbox](https://img.shields.io/badge/Dropbox-Video-%233B4D98.svg?style=for-the-badge&logo=Dropbox&logoColor=white)](https://www.dropbox.com/scl/fi/625psvljnntyiajrzmy9w/20230821-Interspeech-ONieto-Paper1986.mp4?dl=0&rlkey=w2nkc7zn9fvqcc5iwldlqbbrb) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nieto23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01945-b31b1b.svg)](https://arxiv.org/abs/2306.01945) | -| 1529 | Description and Analysis of ABC Submission to NIST LRE 2022 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/matejka23_interspeech.pdf) | -| 1790 | Exploring the Impact of Pretrained Models and Web-Scraped Data for the 2022 NIST Language Recognition Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/alumae23_interspeech.pdf) | -| 1094 | Advances in Language Recognition in Low Resource African Languages: The JHU-MIT Submission for NIST LRE22 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/villalba23_interspeech.pdf) | +| 304 | Lightweight and Efficient Spoken Language Identification of Long-form Audio | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhu23c_interspeech.pdf) | +| 1109 | End-to-End Spoken Language Diarization with Wav2vec Embeddings | [![GitHub](https://img.shields.io/github/stars/jagabandhumishra/W2V-E2E-Language-Diarization?style=flat)](https://github.com/jagabandhumishra/W2V-E2E-Language-Diarization) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mishra23_interspeech.pdf) | +| 1986 | Efficient Spoken Language Recognition via Multilabel Classification | [![Dropbox](https://img.shields.io/badge/Dropbox-Video-%233B4D98.svg?style=for-the-badge&logo=Dropbox&logoColor=white)](https://www.dropbox.com/scl/fi/625psvljnntyiajrzmy9w/20230821-Interspeech-ONieto-Paper1986.mp4?dl=0&rlkey=w2nkc7zn9fvqcc5iwldlqbbrb) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nieto23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01945-b31b1b.svg)](https://arxiv.org/abs/2306.01945) | +| 1529 | Description and Analysis of ABC Submission to NIST LRE 2022 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/matejka23_interspeech.pdf) | +| 1790 | Exploring the Impact of Pretrained Models and Web-Scraped Data for the 2022 NIST Language Recognition Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/alumae23_interspeech.pdf) | +| 1094 | Advances in Language Recognition in Low Resource African Languages: The JHU-MIT Submission for NIST LRE22 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/villalba23_interspeech.pdf) |
@@ -737,12 +737,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1436 | DeePMOS: Deep Posterior Mean-Opinion-Score of Speech | [![GitHub](https://img.shields.io/github/stars/Hope-Liang/DeePMOS?style=flat)](https://github.com/Hope-Liang/DeePMOS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liang23d_interspeech.pdf) | -| 1644 | The Role of Formant and Excitation Source Features in Perceived Naturalness of Low Resource Tribal Language TTS: An Empirical Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/dasare23_interspeech.pdf) | -| 811 | A No-Reference Speech Quality Assessment Method based on Neural Network with Densely Connected Convolutional Architecture | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gong23_interspeech.pdf) | -| 2507 | Probing Speech Quality Information in ASR Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ta23_interspeech.pdf) | -| 589 | Preference-based Training Framework for Automatic Speech Quality Assessment using Deep Neural Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hu23d_interspeech.pdf) | -| 389 | Crowdsourced Data Validation for ASR Training | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/phatthiyaphaibun23_interspeech.pdf) | +| 1436 | DeePMOS: Deep Posterior Mean-Opinion-Score of Speech | [![GitHub](https://img.shields.io/github/stars/Hope-Liang/DeePMOS?style=flat)](https://github.com/Hope-Liang/DeePMOS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liang23d_interspeech.pdf) | +| 1644 | The Role of Formant and Excitation Source Features in Perceived Naturalness of Low Resource Tribal Language TTS: An Empirical Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/dasare23_interspeech.pdf) | +| 811 | A No-Reference Speech Quality Assessment Method based on Neural Network with Densely Connected Convolutional Architecture | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gong23_interspeech.pdf) | +| 2507 | Probing Speech Quality Information in ASR Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ta23_interspeech.pdf) | +| 589 | Preference-based Training Framework for Automatic Speech Quality Assessment using Deep Neural Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hu23d_interspeech.pdf) | +| 389 | Crowdsourced Data Validation for ASR Training | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/phatthiyaphaibun23_interspeech.pdf) | @@ -754,12 +754,12 @@ Contributions 
to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2296 | Re-Investigating the Efficient Transfer Learning of Speech Foundation Model using Feature Fusion Methods | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huo23b_interspeech.pdf) | -| 1556 | Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/alibaba/easyrobust/tree/main/examples/asr/WAPAT)
[![GitHub](https://img.shields.io/github/stars/alibaba/easyrobust?style=flat)](https://github.com/alibaba/easyrobust) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/qi23_interspeech.pdf) | -| 509 | InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lai23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16342-b31b1b.svg)](https://arxiv.org/abs/2305.16342) | -| 579 | Transductive Feature Space Regularization for Few-Shot Bioacoustic Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tan23_interspeech.pdf) | -| 615 | Incorporating L2 Phonemes using Articulatory Features for Robust Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02534-b31b1b.svg)](https://arxiv.org/abs/2306.02534) | -| 1510 | On the (In)Efficiency of Acoustic Feature Extractors for Self-Supervised Speech Representation Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/parcollet23_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-040060.svg)](https://hal.science/hal-04116371) | +| 2296 | Re-Investigating the Efficient Transfer Learning of Speech Foundation Model using Feature Fusion Methods | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huo23b_interspeech.pdf) | +| 1556 | Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/alibaba/easyrobust/tree/main/examples/asr/WAPAT)
[![GitHub](https://img.shields.io/github/stars/alibaba/easyrobust?style=flat)](https://github.com/alibaba/easyrobust) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/qi23_interspeech.pdf) | +| 509 | InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lai23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16342-b31b1b.svg)](https://arxiv.org/abs/2305.16342) | +| 579 | Transductive Feature Space Regularization for Few-Shot Bioacoustic Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tan23_interspeech.pdf) | +| 615 | Incorporating L2 Phonemes using Articulatory Features for Robust Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02534-b31b1b.svg)](https://arxiv.org/abs/2306.02534) | +| 1510 | On the (In)Efficiency of Acoustic Feature Extractors for Self-Supervised Speech Representation Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/parcollet23_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-040060.svg)](https://hal.science/hal-04116371) |
@@ -771,10 +771,10 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1846 | Phonemic Competition in End-to-End ASR models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tenbosch23_interspeech.pdf) | -| 443 | Automatic Speaker Recognition with Variation Across Vocal Conditions: A Controlled Experiment with Implications for Forensics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hughes23_interspeech.pdf) | -| 1398 | Exploring Graph Theory Methods for the Analysis of Pronunciation Variation in Spontaneous Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/geiger23_interspeech.pdf) | -| 680 | Automatic Speaker Recognition Performance with Matched and Mismatched Female Bilingual Speech Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nuttall23_interspeech.pdf) | +| 1846 | Phonemic Competition in End-to-End ASR models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tenbosch23_interspeech.pdf) | +| 443 | Automatic Speaker Recognition with Variation Across Vocal Conditions: A Controlled Experiment with Implications for Forensics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hughes23_interspeech.pdf) | +| 1398 | Exploring Graph Theory Methods for the Analysis of Pronunciation Variation in Spontaneous Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/geiger23_interspeech.pdf) | +| 680 | Automatic Speaker Recognition Performance with Matched and Mismatched Female Bilingual Speech Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nuttall23_interspeech.pdf) | @@ -786,12 +786,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2303 | FACTSpeech: Speaking a Foreign Language Pronunciation using Only Your Native Characters | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://atozto9.github.io/demo/FACTSpeech/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23x_interspeech.pdf) | -| 934 | Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02579-b31b1b.svg)](https://arxiv.org/abs/2306.02579) | -| 363 | DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://goarsenal.github.io/DSE-TTS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14145-b31b1b.svg)](https://arxiv.org/abs/2306.14145) | -| 1467 | Generating Multilingual Gender-Ambiguous Text-to-Speech Voices | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://innoetics.github.io/publications/gender-ambiguous/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/markopoulos23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00375-b31b1b.svg)](https://arxiv.org/abs/2211.00375) | -| 2330 | RADMMM: Multilingual Multiaccented Multispeaker Text-to-Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/badlani23_interspeech.pdf)
[![NVidia AI](https://img.shields.io/badge/NVidia-AI-78B900.svg)](https://research.nvidia.com/labs/adlr/projects/radmmm/) | -| 861 | Multilingual Context-based Pronunciation Learning for Text-to-Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/comini23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/multilingual-context-based-pronunciation-learning-for-text-to-speech) | +| 2303 | FACTSpeech: Speaking a Foreign Language Pronunciation using Only Your Native Characters | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://atozto9.github.io/demo/FACTSpeech/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23x_interspeech.pdf) | +| 934 | Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02579-b31b1b.svg)](https://arxiv.org/abs/2306.02579) | +| 363 | DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://goarsenal.github.io/DSE-TTS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14145-b31b1b.svg)](https://arxiv.org/abs/2306.14145) | +| 1467 | Generating Multilingual Gender-Ambiguous Text-to-Speech Voices | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://innoetics.github.io/publications/gender-ambiguous/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/markopoulos23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00375-b31b1b.svg)](https://arxiv.org/abs/2211.00375) | +| 2330 | RADMMM: Multilingual Multiaccented Multispeaker Text-to-Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/badlani23_interspeech.pdf)
[![NVidia AI](https://img.shields.io/badge/NVidia-AI-78B900.svg)](https://research.nvidia.com/labs/adlr/projects/radmmm/) | +| 861 | Multilingual Context-based Pronunciation Learning for Text-to-Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/comini23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/multilingual-context-based-pronunciation-learning-for-text-to-speech) |
@@ -803,35 +803,35 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2170 | Personalized Adaptation with Pre-trained Speech Encoders for Continuous Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tran23c_interspeech.pdf) | -| 1113 | The Importance of Calibration: Rethinking Confidence and Performance of Speech Multi-Label Emotion Classifiers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chou23_interspeech.pdf)
[![BIIC](https://img.shields.io/badge/biic-research-F7C552.svg)](https://biic.ee.nthu.edu.tw/research.php?id=166) | -| 1080 | A Preliminary Study on Augmenting Speech Emotion Recognition using a Diffusion Model | [![Emulation AI](https://img.shields.io/badge/Emulation-AI-161B1F.svg)](https://emulationai.com/research/diffusion-ser/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/malik23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11413-b31b1b.svg)](https://arxiv.org/abs/2305.11413) | -| 454 | Privacy Risks in Speech Emotion Recognition: A Systematic Study on Gender Inference Attack | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/alsenani23_interspeech.pdf) | -| 2111 | Episodic Memory For Domain-Adaptable, Robust Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tavernor23_interspeech.pdf) | -| 80 | Stable Speech Emotion Recognition with Head-k-Pooling Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ding23_interspeech.pdf) | -| 890 | A Context-Constrained Sentence Modeling for Deception Detection in Real Interrogation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23d_interspeech.pdf) | -| 819 | MetricAug: A Distortion Metric-Lead Augmentation Strategy for Training Noise-Robust Speech Emotion Recognizer | [![GitHub](https://img.shields.io/github/stars/crowpeter/MetricAug?style=flat)](https://github.com/crowpeter/MetricAug) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23c_interspeech.pdf) | -| 240 | The Co-use of Laughter and Head Gestures Across Speech Styles | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ludusan23_interspeech.pdf) | -| 1351 | EmotionNAS: Two-Stream Neural Architecture Search for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sun23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.13617-b31b1b.svg)](https://arxiv.org/abs/2203.13617) | -| 136 | Pre-Finetuning for Few-Shot Emotional Speech Recognition | [![GitHub](https://img.shields.io/github/stars/maxlchen/Speech-PreFinetuning?style=flat)](https://github.com/maxlchen/Speech-PreFinetuning) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12921-b31b1b.svg)](https://arxiv.org/abs/2302.12921) | -| 293 | Integrating Emotion Recognition with Speech Recognition and Speaker Diarization for Conversations | [![GitHub](https://img.shields.io/github/stars/W-Wu/sTEER?style=flat)](https://github.com/W-Wu/sTEER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23_interspeech.pdf) | -| 1075 | Utility-Preserving Privacy-Enabled Speech Embeddings for Emotion Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lavania23_interspeech.pdf) | -| 1923 | Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews | [![GitHub](https://img.shields.io/github/stars/idiap/Node_weighted_GCN_for_depression_detection?style=flat)](https://github.com/idiap/Node_weighted_GCN_for_depression_detection) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/burdisso23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00920-b31b1b.svg)](https://arxiv.org/abs/2307.00920) | -| 1914 | Laughter in Task-based Settings: Whom We Talk to Affects How, When, and How Often We Laugh | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/branco23_interspeech.pdf) | -| 653 | Exploring Downstream Transfer of Self-Supervised Features for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fang23b_interspeech.pdf) | -| 1758 | Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/deoliveira23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19184-b31b1b.svg)](https://arxiv.org/abs/2305.19184) | -| 756 | Two-Stage Finetuning of Wav2vec 2.0 for Speech Emotion Recognition with ASR and Gender Pretraining | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23d_interspeech.pdf) | -| 1311 | Investigating Acoustic Cues for Multilingual Abuse Detection | [![GitHub](https://img.shields.io/github/stars/Cross-Caps/ACMAD?style=flat)](https://github.com/Cross-Caps/ACMAD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/thakran23_interspeech.pdf) | -| 1600 | A Novel Frequency Warping Scale for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/singh23c_interspeech.pdf) | -| 1170 | Multi-Scale Temporal Transformer for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23m_interspeech.pdf) | -| 1169 | Distant Speech Emotion Recognition in an Indoor Human-Robot Interaction Scenario | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/grageda23_interspeech.pdf) | -| 2498 | A Study on Prosodic Entrainment in Relation to Therapist Empathy in Counseling Conversation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tao23b_interspeech.pdf) | -| 2375 | Improving Joint Speech and Emotion Recognition using Global Style Tokens | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kyung23_interspeech.pdf) | -| 1163 | Speech Emotion Recognition by Estimating Emotional Label Sequences with Phoneme Class Attribute | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nagase23_interspeech.pdf) | -| 274 | Unsupervised Transfer Components Learning for Cross-Domain Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jiang23_interspeech.pdf) | -| 1090 | Dual Memory Fusion for Multimodal Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/prisayad23_interspeech.pdf) | -| 311 | Hybrid Dataset for Speech Emotion Recognition in Russian Language | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kondratenko23_interspeech.pdf) | -| 396 | Speech Emotion Recognition using Decomposed Speech via Multi-Task Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hsu23_interspeech.pdf) | +| 2170 | Personalized Adaptation with Pre-trained Speech Encoders for Continuous Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tran23c_interspeech.pdf) | +| 1113 | The Importance of Calibration: Rethinking Confidence and Performance of Speech Multi-Label Emotion Classifiers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chou23_interspeech.pdf)<br>
[![BIIC](https://img.shields.io/badge/biic-research-F7C552.svg)](https://biic.ee.nthu.edu.tw/research.php?id=166) | +| 1080 | A Preliminary Study on Augmenting Speech Emotion Recognition using a Diffusion Model | [![Emulation AI](https://img.shields.io/badge/Emulation-AI-161B1F.svg)](https://emulationai.com/research/diffusion-ser/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/malik23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11413-b31b1b.svg)](https://arxiv.org/abs/2305.11413) | +| 454 | Privacy Risks in Speech Emotion Recognition: A Systematic Study on Gender Inference Attack | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/alsenani23_interspeech.pdf) | +| 2111 | Episodic Memory For Domain-Adaptable, Robust Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tavernor23_interspeech.pdf) | +| 80 | Stable Speech Emotion Recognition with Head-k-Pooling Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ding23_interspeech.pdf) | +| 890 | A Context-Constrained Sentence Modeling for Deception Detection in Real Interrogation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23d_interspeech.pdf) | +| 819 | MetricAug: A Distortion Metric-Lead Augmentation Strategy for Training Noise-Robust Speech Emotion Recognizer | [![GitHub](https://img.shields.io/github/stars/crowpeter/MetricAug?style=flat)](https://github.com/crowpeter/MetricAug) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23c_interspeech.pdf) | +| 240 | The Co-use of Laughter and Head Gestures Across Speech Styles | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ludusan23_interspeech.pdf) | +| 1351 | EmotionNAS: Two-Stream Neural Architecture Search for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sun23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.13617-b31b1b.svg)](https://arxiv.org/abs/2203.13617) | +| 136 | Pre-Finetuning for Few-Shot Emotional Speech Recognition | [![GitHub](https://img.shields.io/github/stars/maxlchen/Speech-PreFinetuning?style=flat)](https://github.com/maxlchen/Speech-PreFinetuning) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12921-b31b1b.svg)](https://arxiv.org/abs/2302.12921) | +| 293 | Integrating Emotion Recognition with Speech Recognition and Speaker Diarization for Conversations | [![GitHub](https://img.shields.io/github/stars/W-Wu/sTEER?style=flat)](https://github.com/W-Wu/sTEER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23_interspeech.pdf) | +| 1075 | Utility-Preserving Privacy-Enabled Speech Embeddings for Emotion Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lavania23_interspeech.pdf) | +| 1923 | Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews | [![GitHub](https://img.shields.io/github/stars/idiap/Node_weighted_GCN_for_depression_detection?style=flat)](https://github.com/idiap/Node_weighted_GCN_for_depression_detection) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/burdisso23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00920-b31b1b.svg)](https://arxiv.org/abs/2307.00920) | +| 1914 | Laughter in Task-based Settings: Whom We Talk to Affects How, When, and How Often We Laugh | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/branco23_interspeech.pdf) | +| 653 | Exploring Downstream Transfer of Self-Supervised Features for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fang23b_interspeech.pdf) | +| 1758 | Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/deoliveira23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19184-b31b1b.svg)](https://arxiv.org/abs/2305.19184) | +| 756 | Two-Stage Finetuning of Wav2vec 2.0 for Speech Emotion Recognition with ASR and Gender Pretraining | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23d_interspeech.pdf) | +| 1311 | Investigating Acoustic Cues for Multilingual Abuse Detection | [![GitHub](https://img.shields.io/github/stars/Cross-Caps/ACMAD?style=flat)](https://github.com/Cross-Caps/ACMAD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/thakran23_interspeech.pdf) | +| 1600 | A Novel Frequency Warping Scale for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/singh23c_interspeech.pdf) | +| 1170 | Multi-Scale Temporal Transformer for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23m_interspeech.pdf) | +| 1169 | Distant Speech Emotion Recognition in an Indoor Human-Robot Interaction Scenario | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/grageda23_interspeech.pdf) | +| 2498 | A Study on Prosodic Entrainment in Relation to Therapist Empathy in Counseling Conversation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tao23b_interspeech.pdf) | +| 2375 | Improving Joint Speech and Emotion Recognition using Global Style Tokens | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kyung23_interspeech.pdf) | +| 1163 | Speech Emotion Recognition by Estimating Emotional Label Sequences with Phoneme Class Attribute | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nagase23_interspeech.pdf) | +| 274 | Unsupervised Transfer Components Learning for Cross-Domain Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jiang23_interspeech.pdf) | +| 1090 | Dual Memory Fusion for Multimodal Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/prisayad23_interspeech.pdf) | +| 311 | Hybrid Dataset for Speech Emotion Recognition in Russian Language | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kondratenko23_interspeech.pdf) | +| 396 | Speech Emotion Recognition using Decomposed Speech via Multi-Task Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hsu23_interspeech.pdf) |
@@ -843,43 +843,43 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 46 | FC-MTLF: A Fine- and Coarse-grained Multi-task Learning Framework for Cross-Lingual Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cheng23b_interspeech.pdf) | -| 93 | C2A-SLU: Cross and Contrastive Attention for Improving ASR Robustness in Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cheng23c_interspeech.pdf) | -| 2300 | Tri-level Joint Natural Language Understanding for Multi-turn Conversational Datasets | [![GitHub](https://img.shields.io/github/stars/adlnlp/Tri-NLU?style=flat)](https://github.com/adlnlp/Tri-NLU) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/weld23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17729-b31b1b.svg)](https://arxiv.org/abs/2305.17729) | -| 2234 | Semantic Enrichment Towards Efficient Speech Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/laperriere23_interspeech.pdf) | -| 1299 | Tensor Decomposition for Minimization of E2E SLU Model Toward On-Device Processing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kashiwagi23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01247-b31b1b.svg)](https://arxiv.org/abs/2306.01247) | -| 699 | DiffSLU: Knowledge Distillation based Diffusion Model for Cross-Lingual Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mao23_interspeech.pdf) | -| 1962 | Integrating Pretrained ASR and LM to perform Sequence Generation for Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/arora23_interspeech.pdf) | -| 644 | Contrastive Learning based ASR Robust Knowledge Selection for Spoken Dialogue System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhu23e_interspeech.pdf) | -| 1859 | Unsupervised Dialogue Topic Segmentation in Hyperdimensional Space | [![GitHub](https://img.shields.io/github/stars/seongminp/hyperseg?style=flat)](https://github.com/seongminp/hyperseg) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/park23f_interspeech.pdf) | -| 198 | An Investigation of the Combination of Rehearsal and Knowledge Distillation in Continual Learning for Spoken Language Understanding | [![GitHub](https://img.shields.io/github/stars/umbertocappellazzo/CL_SLU?style=flat)](https://github.com/umbertocappellazzo/CL_SLU) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cappellazzo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.08161-b31b1b.svg)](https://arxiv.org/abs/2211.08161) | -| 1740 | Enhancing New Intent Discovery via Robust Neighbor-based Contrastive Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23h_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://chenmengdx.github.io/papers/IS23-NID.pdf) | -| 211 | Personalized Predictive ASR for Latency Reduction in Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/schwarz23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13794-b31b1b.svg)](https://arxiv.org/abs/2305.13794) | -| 1419 | Compositional Generalization in Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ray23_interspeech.pdf) | -| 2314 | Sampling Bias in NLU Models: Impact and Mitigation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23ha_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/sampling-bias-in-nlu-models-impact-and-mitigation) | -| 1038 | 5IDER: Unified Query Rewriting for Steering, Intent Carryover, Disfluencies, Entity Carryover and Repair | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lu23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01855-b31b1b.svg)](https://arxiv.org/abs/2306.01855) | -| 1236 | Emotion Awareness in Multi-utterance Turn for Improving Emotion Prediction in Multi-Speaker Conversation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23e_interspeech.pdf) | -| 1505 | WhiSLU: End-to-End Spoken Language Understanding with Whisper | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ga_interspeech.pdf) | -| 1947 | Relationship between Auditory and Semantic Entrainment using Deep Neural Networks (DNN) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kejriwal23b_interspeech.pdf) | -| 1929 | Unsupervised Auditory and Semantic Entrainment Models with Deep Neural Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kejriwal23_interspeech.pdf) | -| 952 | Prosodic Features Improve Sentence Segmentation and Parsing in English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nielsen23_interspeech.pdf) | -| 320 | Estimation of Listening Response Timing by Generative Model and Parameter Control of Response Substantialness using Dynamic-Prompt-Tune | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/muromachi23_interspeech.pdf) | -| 1885 | Parameter Selection for Analyzing Conversations with Autism Spectrum Disorder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chowdhury23_interspeech.pdf) | -| 2341 | Efficient Multimodal Neural Networks for Trigger-Less Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/buddi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12063-b31b1b.svg)](https://arxiv.org/abs/2305.12063) | -| 2332 | Rapid Lexical Alignment to a Conversational Agent | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ostrand23_interspeech.pdf) | -| 578 | Multimodal Turn-Taking Model using Visual cues for End-of-Utterance Prediction in Spoken Dialogue Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kurata23_interspeech.pdf) | -| 1464 | Audio-Visual Praise Estimation for Conversational Video based on Synchronization-Guided Multimodal Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hojo23_interspeech.pdf) | -| 1618 | Improving the Response Timing Estimation for Spoken Dialogue Systems by Reducing the Effect of Speech Recognition Delay | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sakuma23_interspeech.pdf) | -| 555 | Focus-Attention-Enhanced Cross-Modal Transformer with Metric Learning for Multimodal Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23c_interspeech.pdf) | -| 1717 | A Multiple-Teacher Pruning based Self-Distillation (MT-PSD) Approach to Model Compression for Audio-Visual Wake Word Spotting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23la_interspeech.pdf) | -| 789 | Abusive Speech Detection in Indic Languages using Acoustic Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/spiesberger23_interspeech.pdf) | -| 1791 | Listening to Silences In Contact Center Conversations using Textual cues | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ingle23_interspeech.pdf) | -| 2475 | I Learned Error, I Can Fix It!: A Detector-Corrector Structure for ASR Error Calibration | [![GitHub](https://img.shields.io/github/stars/yeonheuiyeon/Detector_Corrector_SLU?style=flat)](https://github.com/yeonheuiyeon/Detector_Corrector_SLU) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yeen23_interspeech.pdf) | -| 1074 | Verbal and Nonverbal Feedback Signals in Response to Increasing Levels of Miscommunication | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/garnier23b_interspeech.pdf) | -| 76 | Speech-based Classification of Defensive Communication: A Novel Dataset and Results | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/amiriparian23_interspeech.pdf) | -| 1951 | Quantifying the Perceptual Value of Lexical and Non-Lexical Channels in Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sarenne.github.io/is-2023/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wallbridge23_interspeech.pdf) | -| 1267 | Relationships between Gender, Personality Traits and Features of Multi-Modal Data to Responses to Spoken Dialog Systems Breakdown | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tsubokura23_interspeech.pdf) | -| 1650 | Speaker-aware Cross-Modal Fusion Architecture for Conversational Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhao23e_interspeech.pdf) | +| 46 | FC-MTLF: A Fine- and Coarse-grained Multi-task Learning Framework for Cross-Lingual Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cheng23b_interspeech.pdf) | +| 93 | C2A-SLU: Cross and Contrastive Attention for Improving ASR Robustness in Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cheng23c_interspeech.pdf) | +| 2300 | Tri-level Joint Natural Language Understanding for Multi-turn Conversational Datasets | [![GitHub](https://img.shields.io/github/stars/adlnlp/Tri-NLU?style=flat)](https://github.com/adlnlp/Tri-NLU) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/weld23_interspeech.pdf)<br>
[![arXiv](https://img.shields.io/badge/arXiv-2305.17729-b31b1b.svg)](https://arxiv.org/abs/2305.17729) | +| 2234 | Semantic Enrichment Towards Efficient Speech Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/laperriere23_interspeech.pdf) | +| 1299 | Tensor Decomposition for Minimization of E2E SLU Model Toward On-Device Processing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kashiwagi23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01247-b31b1b.svg)](https://arxiv.org/abs/2306.01247) | +| 699 | DiffSLU: Knowledge Distillation based Diffusion Model for Cross-Lingual Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mao23_interspeech.pdf) | +| 1962 | Integrating Pretrained ASR and LM to perform Sequence Generation for Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/arora23_interspeech.pdf) | +| 644 | Contrastive Learning based ASR Robust Knowledge Selection for Spoken Dialogue System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhu23e_interspeech.pdf) | +| 1859 | Unsupervised Dialogue Topic Segmentation in Hyperdimensional Space | [![GitHub](https://img.shields.io/github/stars/seongminp/hyperseg?style=flat)](https://github.com/seongminp/hyperseg) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/park23f_interspeech.pdf) | +| 198 | An Investigation of the Combination of Rehearsal and Knowledge Distillation in Continual Learning for Spoken Language Understanding | [![GitHub](https://img.shields.io/github/stars/umbertocappellazzo/CL_SLU?style=flat)](https://github.com/umbertocappellazzo/CL_SLU) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cappellazzo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.08161-b31b1b.svg)](https://arxiv.org/abs/2211.08161) | +| 1740 | Enhancing New Intent Discovery via Robust Neighbor-based Contrastive Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23h_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://chenmengdx.github.io/papers/IS23-NID.pdf) | +| 211 | Personalized Predictive ASR for Latency Reduction in Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/schwarz23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13794-b31b1b.svg)](https://arxiv.org/abs/2305.13794) | +| 1419 | Compositional Generalization in Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ray23_interspeech.pdf) | +| 2314 | Sampling Bias in NLU Models: Impact and Mitigation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23ha_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/sampling-bias-in-nlu-models-impact-and-mitigation) | +| 1038 | 5IDER: Unified Query Rewriting for Steering, Intent Carryover, Disfluencies, Entity Carryover and Repair | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lu23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01855-b31b1b.svg)](https://arxiv.org/abs/2306.01855) | +| 1236 | Emotion Awareness in Multi-utterance Turn for Improving Emotion Prediction in Multi-Speaker Conversation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23e_interspeech.pdf) | +| 1505 | WhiSLU: End-to-End Spoken Language Understanding with Whisper | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ga_interspeech.pdf) | +| 1947 | Relationship between Auditory and Semantic Entrainment using Deep Neural Networks (DNN) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kejriwal23b_interspeech.pdf) | +| 1929 | Unsupervised Auditory and Semantic Entrainment Models with Deep Neural Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kejriwal23_interspeech.pdf) | +| 952 | Prosodic Features Improve Sentence Segmentation and Parsing in English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nielsen23_interspeech.pdf) | +| 320 | Estimation of Listening Response Timing by Generative Model and Parameter Control of Response Substantialness using Dynamic-Prompt-Tune | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/muromachi23_interspeech.pdf) | +| 1885 | Parameter Selection for Analyzing Conversations with Autism Spectrum Disorder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chowdhury23_interspeech.pdf) | +| 2341 | Efficient Multimodal Neural Networks for Trigger-Less Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/buddi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12063-b31b1b.svg)](https://arxiv.org/abs/2305.12063) | +| 2332 | Rapid Lexical Alignment to a Conversational Agent | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ostrand23_interspeech.pdf) | +| 578 | Multimodal Turn-Taking Model using Visual cues for End-of-Utterance Prediction in Spoken Dialogue Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kurata23_interspeech.pdf) | +| 1464 | Audio-Visual Praise Estimation for Conversational Video based on Synchronization-Guided Multimodal Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hojo23_interspeech.pdf) | +| 1618 | Improving the Response Timing Estimation for Spoken Dialogue Systems by Reducing the Effect of Speech Recognition Delay | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sakuma23_interspeech.pdf) | +| 555 | Focus-Attention-Enhanced Cross-Modal Transformer with Metric Learning for Multimodal Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23c_interspeech.pdf) | +| 1717 | A Multiple-Teacher Pruning based Self-Distillation (MT-PSD) Approach to Model Compression for Audio-Visual Wake Word Spotting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23la_interspeech.pdf) | +| 789 | Abusive Speech Detection in Indic Languages using Acoustic Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/spiesberger23_interspeech.pdf) | +| 1791 | Listening to Silences In Contact Center Conversations using Textual cues | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ingle23_interspeech.pdf) | +| 2475 | I Learned Error, I Can Fix It!: A Detector-Corrector Structure for ASR Error Calibration | [![GitHub](https://img.shields.io/github/stars/yeonheuiyeon/Detector_Corrector_SLU?style=flat)](https://github.com/yeonheuiyeon/Detector_Corrector_SLU) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yeen23_interspeech.pdf) | +| 1074 | Verbal and Nonverbal Feedback Signals in Response to Increasing Levels of Miscommunication | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/garnier23b_interspeech.pdf) | +| 76 | Speech-based Classification of Defensive Communication: A Novel Dataset and Results | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/amiriparian23_interspeech.pdf) | +| 1951 | Quantifying the Perceptual Value of Lexical and Non-Lexical Channels in Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sarenne.github.io/is-2023/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wallbridge23_interspeech.pdf) | +| 1267 | Relationships between Gender, Personality Traits and Features of Multi-Modal Data to Responses to Spoken Dialog Systems Breakdown | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tsubokura23_interspeech.pdf) | +| 1650 | Speaker-aware Cross-Modal Fusion Architecture for Conversational Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhao23e_interspeech.pdf) |
@@ -891,64 +891,64 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 936 | Biophysically-Inspired Single-Channel Speech Enhancement in the Time Domain | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wen23b_interspeech.pdf) | -| 1902 | On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jalal23_interspeech.pdf) | -| 1901 | How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shim23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00044-b31b1b.svg)](https://arxiv.org/abs/2306.00044) | -| 1287 | CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://cleanunet2.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kong23c_interspeech.pdf) | -| 521 | A Two-Stage Progressive Neural Network for Acoustic echo Cancellation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23e_interspeech.pdf)
[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/371040399_A_Two-stage_Progressive_Neural_Network_for_Acoustic_Echo_Cancellation) | -| 537 | An Intra-BRNN and GB-RVQ based End-to-End Neural Audio Codec | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23_interspeech.pdf) | -| 1066 | Real-Time Personalised Speech Enhancement Transformers with Dynamic Cross-Attended Speaker Representations | [![GitHub](https://img.shields.io/github/stars/shucongzhang/CrossAttnPse?style=flat)](https://github.com/shucongzhang/CrossAttnPse) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23r_interspeech.pdf) | -| 280 | CFTNet: Complex-Valued Frequency Transformation Network for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mamun23_interspeech.pdf) | -| 623 | Feature Normalization for Fine-Tuning Self-Supervised Models in Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08406-b31b1b.svg)](https://arxiv.org/abs/2306.08406) | -| 1490 | Multi-Mode Neural Speech Coding based on Deep Generative Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xiao23c_interspeech.pdf) | -| 751 | Streaming Dual-Path Transformer for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bae23_interspeech.pdf) | -| 1848 | Sequence-to-Sequence Multi-Modal Speech In-Painting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kadkhodaeielyaderani23_interspeech.pdf) | -| 984 | Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23q_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.02583-b31b1b.svg)](https://arxiv.org/abs/2305.02583) | -| 551 | Differentially Private Adapters for Parameter Efficient Acoustic Modeling | [![GitHub](https://img.shields.io/github/stars/Chun-wei-Ho/Private-Speech-Adapter?style=flat)](https://github.com/Chun-wei-Ho/Private-Speech-Adapter) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ho23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11360-b31b1b.svg)](https://arxiv.org/abs/2305.11360) | -| 780 | Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://zhengrachel.github.io/UTIforAVSE-demo/)
[![GitHub](https://img.shields.io/github/stars/ZhengRachel/UTIforAVSE-demo?style=flat)](https://github.com/ZhengRachel/UTIforAVSE-demo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zheng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14933-b31b1b.svg)](https://arxiv.org/abs/2305.14933) | -| 2568 | Consonant-Emphasis Method Incorporating Robust Consonant-Section Detection to Improve Intelligibility of Bone-Conducted Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/uezu23_interspeech.pdf) | -| 1578 | Downstream Task-Agnostic Speech Enhancement with Self-Supervised Representation Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sato23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14723-b31b1b.svg)](https://arxiv.org/abs/2305.14723) | -| 2305 | Perceptual Improvement of Deep Neural Network (DNN) Speech Coder using Parametric and Nonparametric Density Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/byun23_interspeech.pdf) | -| 2437 | DeFT-AN RT: Real-Time Multichannel Speech Enhancement using Dense Frequency-Time Attentive Network and Non-overlapping Synthesis Window | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23j_interspeech.pdf) | -| 1376 | PCNN: A Lightweight Parallel Conformer Neural Network for Efficient Monaural Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23f_interspeech.pdf) | -| 1364 | Exploring the Interactions between Target Positive and Negative Information for Acoustic Echo Cancellation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/han23b_interspeech.pdf) | -| 365 | Iterative Autoregression: A Novel Trick to Improve your Low-Latency Speech Enhancement Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/andreev23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.01751-b31b1b.svg)](https://arxiv.org/abs/2211.01751) | -| 1084 | A Multi-Dimensional Deep Structured State Space Approach to Speech Enhancement using Small-Footprint Models | [![GitHub](https://img.shields.io/github/stars/Kuray107/S4ND-U-Net_speech_enhancement?style=flat)](https://github.com/Kuray107/S4ND-U-Net_speech_enhancement) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ku23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00331-b31b1b.svg)](https://arxiv.org/abs/2306.00331) | -| 705 | Domain Adaptation for Speech Enhancement in a Large Domain Gap | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/frenkel23_interspeech.pdf) | -| 456 | SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zadorozhnyy23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.14474-b31b1b.svg)](https://arxiv.org/abs/2210.14474) | -| 339 | A Mask Free Neural Network for Monaural Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/ioyy900205/MFNet?style=flat)](https://github.com/ioyy900205/MFNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04286-b31b1b.svg)](https://arxiv.org/abs/2306.04286) | -| 1548 | A Training and Inference Strategy using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech | [![GitHub](https://img.shields.io/github/stars/Sinica-SLAM/Ny-EnhTT?style=flat)](https://github.com/Sinica-SLAM/Ny-EnhTT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15368-b31b1b.svg)](https://arxiv.org/abs/2210.15368) | -| 2418 | A Simple RNN Model for Lightweight, Low-Compute and Low-Latency Multichannel Speech Enhancement in the Time Domain | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pandey23b_interspeech.pdf) | -| 1433 | High Fidelity Speech Enhancement with Band-Split RNN | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.00406-b31b1b.svg)](https://arxiv.org/abs/2212.00406) | -| 218 | Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://thuhcsi.github.io/interspeech2023-NS-Extractor/)
[![GitHub](https://img.shields.io/github/stars/thuhcsi/interspeech2023-NS-Extractor?style=flat)](https://github.com/thuhcsi/interspeech2023-NS-Extractor) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16241-b31b1b.svg)](https://arxiv.org/abs/2306.16241) | -| 882 | DFSNet: A Steerable Neural Beamformer Invariant to Microphone Array Configuration for Real-Time, Low-Latency Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kovalyov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.13407-b31b1b.svg)](https://arxiv.org/abs/2302.13407) | -| 1323 | Speaker-Aware Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.01126-b31b1b.svg)](https://arxiv.org/abs/2303.01126) | -| 1116 | Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/araki23_interspeech.pdf) | -| 799 | EffCRN: An Efficient Convolutional Recurrent Network for High-Performance Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sach23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02778-b31b1b.svg)](https://arxiv.org/abs/2306.02778) | -| 1795 | HAD-ANC: A Hybrid System Comprising an Adaptive Filter and Deep Neural Networks for Active Noise Control | [![GitHub](https://img.shields.io/github/stars/wndvlf96/HAD-ANC?style=flat)](https://github.com/wndvlf96/HAD-ANC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/park23e_interspeech.pdf) | -| 886 | MSAF: A Multiple Self-Attention Field Method for Speech Enhancement | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://mmf-sasegan.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chu23_interspeech.pdf) | -| 2302 | Ultra Dual-Path Compression for Joint echo Cancellation and Noise Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23t_interspeech.pdf) | -| 971 | ABC-KD: Attention-based-Compression Knowledge Distillation for Deep Learning-based Noise Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16665-b31b1b.svg)](https://arxiv.org/abs/2305.16665) | -| 1532 | PLCMOS – a Data-Driven Non-Intrusive Metric for the Evaluation of Packet Loss Concealment Algorithms | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/microsoft/PLC-Challenge/tree/main/PLCMOS)
[![PyPI](https://img.shields.io/pypi/v/speechmos)](https://pypi.org/project/speechmos/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/diener23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15127-b31b1b.svg)](https://arxiv.org/abs/2305.15127) | -| 1910 | Multi-Dataset Co-training with Sharpness-aware Optimization for Audio Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shim23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19953-b31b1b.svg)](https://arxiv.org/abs/2305.19953) | -| 1445 | Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/sp-uhh/sgmse-bbed?style=flat)](https://github.com/sp-uhh/sgmse-bbed) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lay23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14748-b31b1b.svg)](https://arxiv.org/abs/2302.14748) | -| 901 | Complex-valued Neural Networks for Voice Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/muller23_interspeech.pdf) | -| 1028 | DeepVQE: Real Time Deep Voice Quality Enhancement for Joint Acoustic echo Cancellation, Noise Suppression and Dereverberation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ristea.github.io/deep-vqe/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ristea23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03177-b31b1b.svg)](https://arxiv.org/abs/2306.03177) | -| 1547 | Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/sony/diffiner?style=flat)](https://github.com/sony/diffiner) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sawata23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17287-b31b1b.svg)](https://arxiv.org/abs/2210.17287) | -| 1642 | HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01411-b31b1b.svg)](https://arxiv.org/abs/2306.01411) | -| 1441 | MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yxlu-0102.github.io/mpsenet-demo/)
[![GitHub](https://img.shields.io/github/stars/yxlu-0102/MP-SENet?style=flat)](https://github.com/yxlu-0102/MP-SENet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lu23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13686-b31b1b.svg)](https://arxiv.org/abs/2305.13686) | -| 565 | TRIDENTSE: Guiding Speech Enhancement with 32 Global Tokens | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yin23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.12995-b31b1b.svg)](https://arxiv.org/abs/2210.12995) | -| 1254 | Detection of Cross-Dataset Fake Audio based on Prosodic and Pronunciation Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23x_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13700-b31b1b.svg)](https://arxiv.org/abs/2305.13700) | -| 1890 | Self-Supervised Learning with Diffusion based Multichannel Speech Enhancement for Speaker Verification under Noisy Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/dowerah23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.02244-b31b1b.svg)](https://arxiv.org/abs/2307.02244) | -| 1341 | Two-Stage Voice Anonymization for Enhanced Privacy | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nespoli23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16069-b31b1b.svg)](https://arxiv.org/abs/2306.16069) | -| 2055 | Personalized Dereverberation of Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://dereverb.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23h_interspeech.pdf) | -| 580 | Weighted Von Mises Distribution-based Loss Function for Real-Time STFT Phase Reconstruction using DNN | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/binhthien23_interspeech.pdf) | -| 272 | Deep Multi-Frame Filtering for Hearing Aids | [![GitHub](https://img.shields.io/github/stars/rikorose/deepfilternet?style=flat)](https://github.com/rikorose/deepfilternet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/schroter23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.08225-b31b1b.svg)](https://arxiv.org/abs/2305.08225) | -| 1232 | Aligning Speech Enhancement for Improving Downstream Classification Performance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xiong23_interspeech.pdf) | -| 420 | DNN-based Parameter Estimation for MVDR Beamforming and Post-Filtering | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23b_interspeech.pdf) | -| 675 | FRA-RIR: Fast Random Approximation of the Image-Source | [![GitHub](https://img.shields.io/github/stars/tencent-ailab/FRA-RIR?style=flat)](https://github.com/tencent-ailab/FRA-RIR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/luo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2208.04101-b31b1b.svg)](https://arxiv.org/abs/2208.04101) | -| 686 | Rethinking Complex-Valued Deep Neural Networks for Monaural Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.04320-b31b1b.svg)](https://arxiv.org/abs/2301.04320) | -| 186 | Harmonic Enhancement using Learnable Comb Filter for Light-Weight Full-band Speech Enhancement Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/le23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00812-b31b1b.svg)](https://arxiv.org/abs/2306.00812) | +| 936 | Biophysically-Inspired Single-Channel Speech Enhancement in the Time Domain | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wen23b_interspeech.pdf) | +| 1902 | On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jalal23_interspeech.pdf) | +| 1901 | How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shim23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00044-b31b1b.svg)](https://arxiv.org/abs/2306.00044) | +| 1287 | CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://cleanunet2.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kong23c_interspeech.pdf) | +| 521 | A Two-Stage Progressive Neural Network for Acoustic Echo Cancellation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23e_interspeech.pdf)
[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/371040399_A_Two-stage_Progressive_Neural_Network_for_Acoustic_Echo_Cancellation) | +| 537 | An Intra-BRNN and GB-RVQ based End-to-End Neural Audio Codec | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23_interspeech.pdf) | +| 1066 | Real-Time Personalised Speech Enhancement Transformers with Dynamic Cross-Attended Speaker Representations | [![GitHub](https://img.shields.io/github/stars/shucongzhang/CrossAttnPse?style=flat)](https://github.com/shucongzhang/CrossAttnPse) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23r_interspeech.pdf) | +| 280 | CFTNet: Complex-Valued Frequency Transformation Network for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mamun23_interspeech.pdf) | +| 623 | Feature Normalization for Fine-Tuning Self-Supervised Models in Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08406-b31b1b.svg)](https://arxiv.org/abs/2306.08406) | +| 1490 | Multi-Mode Neural Speech Coding based on Deep Generative Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xiao23c_interspeech.pdf) | +| 751 | Streaming Dual-Path Transformer for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bae23_interspeech.pdf) | +| 1848 | Sequence-to-Sequence Multi-Modal Speech In-Painting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kadkhodaeielyaderani23_interspeech.pdf) | +| 984 | Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23q_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.02583-b31b1b.svg)](https://arxiv.org/abs/2305.02583) | +| 551 | Differentially Private Adapters for Parameter Efficient Acoustic Modeling | [![GitHub](https://img.shields.io/github/stars/Chun-wei-Ho/Private-Speech-Adapter?style=flat)](https://github.com/Chun-wei-Ho/Private-Speech-Adapter) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ho23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11360-b31b1b.svg)](https://arxiv.org/abs/2305.11360) | +| 780 | Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://zhengrachel.github.io/UTIforAVSE-demo/)
[![GitHub](https://img.shields.io/github/stars/ZhengRachel/UTIforAVSE-demo?style=flat)](https://github.com/ZhengRachel/UTIforAVSE-demo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zheng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14933-b31b1b.svg)](https://arxiv.org/abs/2305.14933) | +| 2568 | Consonant-Emphasis Method Incorporating Robust Consonant-Section Detection to Improve Intelligibility of Bone-Conducted Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/uezu23_interspeech.pdf) | +| 1578 | Downstream Task-Agnostic Speech Enhancement with Self-Supervised Representation Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sato23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14723-b31b1b.svg)](https://arxiv.org/abs/2305.14723) | +| 2305 | Perceptual Improvement of Deep Neural Network (DNN) Speech Coder using Parametric and Nonparametric Density Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/byun23_interspeech.pdf) | +| 2437 | DeFT-AN RT: Real-Time Multichannel Speech Enhancement using Dense Frequency-Time Attentive Network and Non-overlapping Synthesis Window | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23j_interspeech.pdf) | +| 1376 | PCNN: A Lightweight Parallel Conformer Neural Network for Efficient Monaural Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23f_interspeech.pdf) | +| 1364 | Exploring the Interactions between Target Positive and Negative Information for Acoustic Echo Cancellation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/han23b_interspeech.pdf) | +| 365 | Iterative Autoregression: A Novel Trick to Improve your Low-Latency Speech Enhancement Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/andreev23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.01751-b31b1b.svg)](https://arxiv.org/abs/2211.01751) | +| 1084 | A Multi-Dimensional Deep Structured State Space Approach to Speech Enhancement using Small-Footprint Models | [![GitHub](https://img.shields.io/github/stars/Kuray107/S4ND-U-Net_speech_enhancement?style=flat)](https://github.com/Kuray107/S4ND-U-Net_speech_enhancement) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ku23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00331-b31b1b.svg)](https://arxiv.org/abs/2306.00331) | +| 705 | Domain Adaptation for Speech Enhancement in a Large Domain Gap | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/frenkel23_interspeech.pdf) | +| 456 | SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zadorozhnyy23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.14474-b31b1b.svg)](https://arxiv.org/abs/2210.14474) | +| 339 | A Mask Free Neural Network for Monaural Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/ioyy900205/MFNet?style=flat)](https://github.com/ioyy900205/MFNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04286-b31b1b.svg)](https://arxiv.org/abs/2306.04286) | +| 1548 | A Training and Inference Strategy using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech | [![GitHub](https://img.shields.io/github/stars/Sinica-SLAM/Ny-EnhTT?style=flat)](https://github.com/Sinica-SLAM/Ny-EnhTT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15368-b31b1b.svg)](https://arxiv.org/abs/2210.15368) | +| 2418 | A Simple RNN Model for Lightweight, Low-Compute and Low-Latency Multichannel Speech Enhancement in the Time Domain | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pandey23b_interspeech.pdf) | +| 1433 | High Fidelity Speech Enhancement with Band-Split RNN | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.00406-b31b1b.svg)](https://arxiv.org/abs/2212.00406) | +| 218 | Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://thuhcsi.github.io/interspeech2023-NS-Extractor/)
[![GitHub](https://img.shields.io/github/stars/thuhcsi/interspeech2023-NS-Extractor?style=flat)](https://github.com/thuhcsi/interspeech2023-NS-Extractor) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16241-b31b1b.svg)](https://arxiv.org/abs/2306.16241) | +| 882 | DFSNet: A Steerable Neural Beamformer Invariant to Microphone Array Configuration for Real-Time, Low-Latency Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kovalyov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.13407-b31b1b.svg)](https://arxiv.org/abs/2302.13407) | +| 1323 | Speaker-Aware Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.01126-b31b1b.svg)](https://arxiv.org/abs/2303.01126) | +| 1116 | Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/araki23_interspeech.pdf) | +| 799 | EffCRN: An Efficient Convolutional Recurrent Network for High-Performance Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sach23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02778-b31b1b.svg)](https://arxiv.org/abs/2306.02778) | +| 1795 | HAD-ANC: A Hybrid System Comprising an Adaptive Filter and Deep Neural Networks for Active Noise Control | [![GitHub](https://img.shields.io/github/stars/wndvlf96/HAD-ANC?style=flat)](https://github.com/wndvlf96/HAD-ANC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/park23e_interspeech.pdf) | +| 886 | MSAF: A Multiple Self-Attention Field Method for Speech Enhancement | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://mmf-sasegan.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chu23_interspeech.pdf) | +| 2302 | Ultra Dual-Path Compression for Joint Echo Cancellation and Noise Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23t_interspeech.pdf) | +| 971 | ABC-KD: Attention-based-Compression Knowledge Distillation for Deep Learning-based Noise Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16665-b31b1b.svg)](https://arxiv.org/abs/2305.16665) | +| 1532 | PLCMOS – a Data-Driven Non-Intrusive Metric for the Evaluation of Packet Loss Concealment Algorithms | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/microsoft/PLC-Challenge/tree/main/PLCMOS)
[![PyPI](https://img.shields.io/pypi/v/speechmos)](https://pypi.org/project/speechmos/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/diener23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15127-b31b1b.svg)](https://arxiv.org/abs/2305.15127) | +| 1910 | Multi-Dataset Co-training with Sharpness-aware Optimization for Audio Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shim23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19953-b31b1b.svg)](https://arxiv.org/abs/2305.19953) | +| 1445 | Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/sp-uhh/sgmse-bbed?style=flat)](https://github.com/sp-uhh/sgmse-bbed) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lay23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14748-b31b1b.svg)](https://arxiv.org/abs/2302.14748) | +| 901 | Complex-valued Neural Networks for Voice Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/muller23_interspeech.pdf) | +| 1028 | DeepVQE: Real Time Deep Voice Quality Enhancement for Joint Acoustic Echo Cancellation, Noise Suppression and Dereverberation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ristea.github.io/deep-vqe/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ristea23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03177-b31b1b.svg)](https://arxiv.org/abs/2306.03177) | +| 1547 | Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/sony/diffiner?style=flat)](https://github.com/sony/diffiner) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sawata23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17287-b31b1b.svg)](https://arxiv.org/abs/2210.17287) | +| 1642 | HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01411-b31b1b.svg)](https://arxiv.org/abs/2306.01411) | +| 1441 | MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yxlu-0102.github.io/mpsenet-demo/)
[![GitHub](https://img.shields.io/github/stars/yxlu-0102/MP-SENet?style=flat)](https://github.com/yxlu-0102/MP-SENet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lu23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13686-b31b1b.svg)](https://arxiv.org/abs/2305.13686) | +| 565 | TRIDENTSE: Guiding Speech Enhancement with 32 Global Tokens | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yin23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.12995-b31b1b.svg)](https://arxiv.org/abs/2210.12995) | +| 1254 | Detection of Cross-Dataset Fake Audio based on Prosodic and Pronunciation Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23x_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13700-b31b1b.svg)](https://arxiv.org/abs/2305.13700) | +| 1890 | Self-Supervised Learning with Diffusion based Multichannel Speech Enhancement for Speaker Verification under Noisy Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/dowerah23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.02244-b31b1b.svg)](https://arxiv.org/abs/2307.02244) | +| 1341 | Two-Stage Voice Anonymization for Enhanced Privacy | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nespoli23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16069-b31b1b.svg)](https://arxiv.org/abs/2306.16069) | +| 2055 | Personalized Dereverberation of Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://dereverb.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23h_interspeech.pdf) | +| 580 | Weighted Von Mises Distribution-based Loss Function for Real-Time STFT Phase Reconstruction using DNN | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/binhthien23_interspeech.pdf) | +| 272 | Deep Multi-Frame Filtering for Hearing Aids | [![GitHub](https://img.shields.io/github/stars/rikorose/deepfilternet?style=flat)](https://github.com/rikorose/deepfilternet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/schroter23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.08225-b31b1b.svg)](https://arxiv.org/abs/2305.08225) | +| 1232 | Aligning Speech Enhancement for Improving Downstream Classification Performance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xiong23_interspeech.pdf) | +| 420 | DNN-based Parameter Estimation for MVDR Beamforming and Post-Filtering | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23b_interspeech.pdf) | +| 675 | FRA-RIR: Fast Random Approximation of the Image-Source | [![GitHub](https://img.shields.io/github/stars/tencent-ailab/FRA-RIR?style=flat)](https://github.com/tencent-ailab/FRA-RIR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/luo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2208.04101-b31b1b.svg)](https://arxiv.org/abs/2208.04101) | +| 686 | Rethinking Complex-Valued Deep Neural Networks for Monaural Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.04320-b31b1b.svg)](https://arxiv.org/abs/2301.04320) | +| 186 | Harmonic Enhancement using Learnable Comb Filter for Light-Weight Full-band Speech Enhancement Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/le23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00812-b31b1b.svg)](https://arxiv.org/abs/2306.00812) |
@@ -960,28 +960,28 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1023 | Detection of Emotional Hotspots in Meetings using a Cross-Corpus Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/stemmer23_interspeech.pdf) | -| 1412 | Detection of Laughter and Screaming using the Attention and CTC Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/matsuda23_interspeech.pdf) | -| 1852 | Capturing Formality in Speech Across Domains and Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bhattacharya23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://www.cs.columbia.edu/speech/PaperFiles/2023/interspeech23_formality.pdf) | -| 460 | Towards Robust Family-Infant Audio Analysis based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/jialuli3/speechbrain/tree/infant-voc-classification/recipes/wav2vec_LittleBeats)
[![Hugging Face](https://img.shields.io/badge/🤗-lijialudew-FFD21F.svg)](https://huggingface.co/lijialudew/wav2vec_LittleBeats_LENA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12530-b31b1b.svg)](https://arxiv.org/abs/2305.12530) | -| 778 | Cues to Next-Speaker Projection in Conversational Swedish: Evidence from Reaction Times | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/feindt23_interspeech.pdf)
[![psyArXiv](https://img.shields.io/badge/psyArXiv-Preprints-226B79.svg)](https://psyarxiv.com/qasge/) | -| 1200 | Multiple Instance Learning for Inference of Child Attachment from Paralinguistic Aspects of Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/buker23_interspeech.pdf) | -| 2070 | Speaker Embeddings as Individuality Proxy for Voice Stress Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05915-b31b1b.svg)](https://arxiv.org/abs/2306.05915) | -| 2213 | From Interval to Ordinal: A HMM based Approach for Emotion Label Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23j_interspeech.pdf) | -| 661 | Turbo your Multi-Modal Classification with Contrastive Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23l_interspeech.pdf) | -| 497 | Towards Paralinguistic-Only Speech Representations for End-to-End Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ioannides23_interspeech.pdf) | -| 1360 | SOT: Self-Supervised Learning-Assisted Optimal Transport for Unsupervised Adaptive Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23y_interspeech.pdf) | -| 2464 | On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bansal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12540-b31b1b.svg)](https://arxiv.org/abs/2305.12540) | -| 830 | Speaking State Decoder with Transition Detection for Next Speaker Prediction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lu23b_interspeech.pdf) | -| 1507 | What are Differences? Comparing DNN and Human by their Performance and Characteristics in Speaker Age Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kitagishi23_interspeech.pdf) | -| 846 | Effects of Perceived Gender on the Perceived Social Function of Laughter | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/arts23_interspeech.pdf) | -| 1999 | Implicit Phonetic Information Modeling for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/purohit23_interspeech.pdf)
[![idiap](https://img.shields.io/badge/idiap.ch.5062-FF6A00.svg)](https://publications.idiap.ch/publications/show/5062) | -| 1034 | Computation and Memory Efficient Noise Adaptation of Wav2Vec2.0 for Noisy Speech Emotion Recognition with Skip Connection Adapters | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/leem23_interspeech.pdf) | -| 300 | Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23b_interspeech.pdf) | -| 1108 | Preference Learning Labels by Anchoring on Consecutive Annotations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/naini23_interspeech.pdf) | -| 2561 | Transforming the Embeddings: A Lightweight Technique for Speech Emotion Recognition Tasks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chetiaphukan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18640-b31b1b.svg)](https://arxiv.org/abs/2305.18640) | -| 543 | Learning Local to Global Feature Aggregation for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01491-b31b1b.svg)](https://arxiv.org/abs/2306.01491) | -| 842 | Supervised Contrastive Learning with Nearest Neighbor Search for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23q_interspeech.pdf) | +| 1023 | Detection of Emotional Hotspots in Meetings using a Cross-Corpus Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/stemmer23_interspeech.pdf) | +| 1412 | Detection of Laughter and Screaming using the Attention and CTC Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/matsuda23_interspeech.pdf) | +| 1852 | Capturing Formality in Speech Across Domains and Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bhattacharya23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://www.cs.columbia.edu/speech/PaperFiles/2023/interspeech23_formality.pdf) | +| 460 | Towards Robust Family-Infant Audio Analysis based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/jialuli3/speechbrain/tree/infant-voc-classification/recipes/wav2vec_LittleBeats)
[![Hugging Face](https://img.shields.io/badge/🤗-lijialudew-FFD21F.svg)](https://huggingface.co/lijialudew/wav2vec_LittleBeats_LENA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12530-b31b1b.svg)](https://arxiv.org/abs/2305.12530) | +| 778 | Cues to Next-Speaker Projection in Conversational Swedish: Evidence from Reaction Times | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/feindt23_interspeech.pdf)
[![psyArXiv](https://img.shields.io/badge/psyArXiv-Preprints-226B79.svg)](https://psyarxiv.com/qasge/) | +| 1200 | Multiple Instance Learning for Inference of Child Attachment from Paralinguistic Aspects of Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/buker23_interspeech.pdf) | +| 2070 | Speaker Embeddings as Individuality Proxy for Voice Stress Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05915-b31b1b.svg)](https://arxiv.org/abs/2306.05915) | +| 2213 | From Interval to Ordinal: A HMM based Approach for Emotion Label Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23j_interspeech.pdf) | +| 661 | Turbo your Multi-Modal Classification with Contrastive Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23l_interspeech.pdf) | +| 497 | Towards Paralinguistic-Only Speech Representations for End-to-End Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ioannides23_interspeech.pdf) | +| 1360 | SOT: Self-Supervised Learning-Assisted Optimal Transport for Unsupervised Adaptive Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23y_interspeech.pdf) | +| 2464 | On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bansal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12540-b31b1b.svg)](https://arxiv.org/abs/2305.12540) | +| 830 | Speaking State Decoder with Transition Detection for Next Speaker Prediction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lu23b_interspeech.pdf) | +| 1507 | What are Differences? Comparing DNN and Human by their Performance and Characteristics in Speaker Age Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kitagishi23_interspeech.pdf) | +| 846 | Effects of Perceived Gender on the Perceived Social Function of Laughter | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/arts23_interspeech.pdf) | +| 1999 | Implicit Phonetic Information Modeling for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/purohit23_interspeech.pdf)
[![idiap](https://img.shields.io/badge/idiap.ch.5062-FF6A00.svg)](https://publications.idiap.ch/publications/show/5062) | +| 1034 | Computation and Memory Efficient Noise Adaptation of Wav2Vec2.0 for Noisy Speech Emotion Recognition with Skip Connection Adapters | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/leem23_interspeech.pdf) | +| 300 | Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23b_interspeech.pdf) | +| 1108 | Preference Learning Labels by Anchoring on Consecutive Annotations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/naini23_interspeech.pdf) | +| 2561 | Transforming the Embeddings: A Lightweight Technique for Speech Emotion Recognition Tasks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chetiaphukan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18640-b31b1b.svg)](https://arxiv.org/abs/2305.18640) | +| 543 | Learning Local to Global Feature Aggregation for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01491-b31b1b.svg)](https://arxiv.org/abs/2306.01491) | +| 842 | Supervised Contrastive Learning with Nearest Neighbor Search for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23q_interspeech.pdf) |
@@ -993,12 +993,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1088 | Real-Time Joint Personalized Speech Enhancement and Acoustic echo Cancellation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/eskimez23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02773-b31b1b.svg)](https://arxiv.org/abs/2211.02773) | -| 514 | TaylorBeamixer: Learning Taylor-Inspired All-Neural Multi-Channel Speech Enhancement from Beam-Space Dictionary Perspective | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://andong-li-speech.github.io/TaylorBM-Demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.12024-b31b1b.svg)](https://arxiv.org/abs/2211.12024) | -| 865 | MFT-CRN:Multi-Scale Fourier Transform for Monaural Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23s_interspeech.pdf) | -| 1265 | Variance-Preserving-based Interpolation Diffusion Models for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/guo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08527-b31b1b.svg)](https://arxiv.org/abs/2306.08527) | -| 318 | Multi-Input Multi-Output Complex Spectral Mapping for Speaker Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/taherian23_interspeech.pdf) | -| 992 | Short-Term Extrapolation of Speech Signals using Recursive Neural Networks in the STFT Domain | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/oberhag23_interspeech.pdf) | +| 1088 | Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/eskimez23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02773-b31b1b.svg)](https://arxiv.org/abs/2211.02773) | +| 514 | TaylorBeamixer: Learning Taylor-Inspired All-Neural Multi-Channel Speech Enhancement from Beam-Space Dictionary Perspective | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://andong-li-speech.github.io/TaylorBM-Demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.12024-b31b1b.svg)](https://arxiv.org/abs/2211.12024) | +| 865 | MFT-CRN: Multi-Scale Fourier Transform for Monaural Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23s_interspeech.pdf) | +| 1265 | Variance-Preserving-based Interpolation Diffusion Models for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/guo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08527-b31b1b.svg)](https://arxiv.org/abs/2306.08527) | +| 318 | Multi-Input Multi-Output Complex Spectral Mapping for Speaker Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/taherian23_interspeech.pdf) | +| 992 | Short-Term Extrapolation of Speech Signals using Recursive Neural Networks in the STFT Domain | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/oberhag23_interspeech.pdf) |
@@ -1010,12 +1010,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1843 | Listener Sensitivity to Deviating Obstruents in WaveNet | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pandey23_interspeech.pdf) | -| 981 | How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/park23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00697-b31b1b.svg)](https://arxiv.org/abs/2306.00697) | -| 2014 | MOS vs. AB: Evaluating Text-to-Speech Systems Reliably using Clustered Standard Errors | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/camp23_interspeech.pdf) | -| 851 | RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23r_interspeech.pdf) | -| 2013 | Can Better Perception Become a Disadvantage? Synthetic Speech Perception in Congenitally Blind Users | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/melnikleroy23_interspeech.pdf) | -| 1076 | Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cooper23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10608-b31b1b.svg)](https://arxiv.org/abs/2305.10608) | +| 1843 | Listener Sensitivity to Deviating Obstruents in WaveNet | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pandey23_interspeech.pdf) | +| 981 | How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/park23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00697-b31b1b.svg)](https://arxiv.org/abs/2306.00697) | +| 2014 | MOS vs. AB: Evaluating Text-to-Speech Systems Reliably using Clustered Standard Errors | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/camp23_interspeech.pdf) | +| 851 | RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23r_interspeech.pdf) | +| 2013 | Can Better Perception Become a Disadvantage? Synthetic Speech Perception in Congenitally Blind Users | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/melnikleroy23_interspeech.pdf) | +| 1076 | Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cooper23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10608-b31b1b.svg)](https://arxiv.org/abs/2305.10608) |
@@ -1027,12 +1027,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1799 | Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://mutiann.github.io/papers/ChatGPT_SLU/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/he23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13512-b31b1b.svg)](https://arxiv.org/abs/2305.13512) | -| 1760 | Improving End-to-End SLU performance with Prosodic Attention and Distillation | [![GitHub](https://img.shields.io/github/stars/skit-ai/slu-prosody?style=flat)](https://github.com/skit-ai/slu-prosody) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rajaa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.08067-b31b1b.svg)](https://arxiv.org/abs/2305.08067) | -| 2575 | Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23n_interspeech.pdf) | -|758 | Cross-Modal Semantic Alignment before Fusion for Two-Pass End-to-End Spoken Language | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23c_interspeech.pdf) | -| 2018 | ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sunder23_interspeech.pdf) | -| 41 | GhostT5: Generate More Features with Cheap Operations to Improve Textless Spoken Question Answering | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cheng23_interspeech.pdf) | +| 1799 | Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://mutiann.github.io/papers/ChatGPT_SLU/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/he23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13512-b31b1b.svg)](https://arxiv.org/abs/2305.13512) | +| 1760 | Improving End-to-End SLU performance with Prosodic Attention and Distillation | [![GitHub](https://img.shields.io/github/stars/skit-ai/slu-prosody?style=flat)](https://github.com/skit-ai/slu-prosody) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rajaa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.08067-b31b1b.svg)](https://arxiv.org/abs/2305.08067) | +| 2575 | Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23n_interspeech.pdf) | +| 758 | Cross-Modal Semantic Alignment before Fusion for Two-Pass End-to-End Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23c_interspeech.pdf) | +| 2018 | ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sunder23_interspeech.pdf) | +| 41 | GhostT5: Generate More Features with Cheap Operations to Improve Textless Spoken Question Answering | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cheng23_interspeech.pdf) |
@@ -1044,16 +1044,16 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 278 | Obstructive Sleep Apnea Detection using Pretrained Speech Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23b_interspeech.pdf) | -| 620 | EEG-based Auditory Attention Detection with Spatiotemporal Graph and Graph Convolutional Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23f_interspeech.pdf) | -| 1966 | Silent Speech Recognition with Articulator Positions Estimated from Tongue Ultrasound and Lip Video | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/beeson23_interspeech.pdf) | -| 1377 | Auditory Attention Detection in Real-Life Scenarios using Common Spatial Patterns from EEG | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23r_interspeech.pdf) | -| 1381 | Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG | [![GitHub](https://img.shields.io/github/stars/yorgoon/DiffE?style=flat)](https://github.com/yorgoon/DiffE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23g_interspeech.pdf) | -| 40 | Towards Ultrasound Tongue Image Prediction from EEG During Speech Production | [![GitHub](https://img.shields.io/github/stars/BME-SmartLab/EEG-to-UTI?style=flat)](https://github.com/BME-SmartLab/EEG-to-UTI) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/csapo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05374-b31b1b.svg)](https://arxiv.org/abs/2306.05374) | -| 1607 | Adaptation of Tongue Ultrasound-based Silent Speech Interfaces using Spatial Transformer Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/toth23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19130-b31b1b.svg)](https://arxiv.org/abs/2305.19130) | -| 174 | STE-GAN: Speech-to-Electromyography Signal Conversion using Generative Adversarial Networks | [![GitHub](https://img.shields.io/github/stars/scheck-k/ste-gan?style=flat)](https://github.com/scheck-k/ste-gan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/scheck23_interspeech.pdf) | -| 1881 | Spanish Phone Confusion Analysis for EMG-based Silent Speech Interfaces | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/salomons23_interspeech.pdf) | -| 805 | Hybrid Silent Speech Interface through Fusion of Electroencephalography and Electromyography | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://stone-wave.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23l_interspeech.pdf) | +| 278 | Obstructive Sleep Apnea Detection using Pretrained Speech Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23b_interspeech.pdf) | +| 620 | EEG-based Auditory Attention Detection with Spatiotemporal Graph and Graph Convolutional Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23f_interspeech.pdf) | +| 1966 | Silent Speech Recognition with Articulator Positions Estimated from Tongue Ultrasound and Lip Video | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/beeson23_interspeech.pdf) | +| 1377 | Auditory Attention Detection in Real-Life Scenarios using Common Spatial Patterns from EEG | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23r_interspeech.pdf) | +| 1381 | Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG | [![GitHub](https://img.shields.io/github/stars/yorgoon/DiffE?style=flat)](https://github.com/yorgoon/DiffE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23g_interspeech.pdf) | +| 40 | Towards Ultrasound Tongue Image Prediction from EEG During Speech Production | [![GitHub](https://img.shields.io/github/stars/BME-SmartLab/EEG-to-UTI?style=flat)](https://github.com/BME-SmartLab/EEG-to-UTI) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/csapo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05374-b31b1b.svg)](https://arxiv.org/abs/2306.05374) | +| 1607 | Adaptation of Tongue Ultrasound-based Silent Speech Interfaces using Spatial Transformer Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/toth23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19130-b31b1b.svg)](https://arxiv.org/abs/2305.19130) | +| 174 | STE-GAN: Speech-to-Electromyography Signal Conversion using Generative Adversarial Networks | [![GitHub](https://img.shields.io/github/stars/scheck-k/ste-gan?style=flat)](https://github.com/scheck-k/ste-gan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/scheck23_interspeech.pdf) | +| 1881 | Spanish Phone Confusion Analysis for EMG-based Silent Speech Interfaces | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/salomons23_interspeech.pdf) | +| 805 | Hybrid Silent Speech Interface through Fusion of Electroencephalography and Electromyography | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://stone-wave.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23l_interspeech.pdf) |
@@ -1065,12 +1065,12 @@ Contributions to improve the completeness of this list are greatly appreciated.
| :id: | **Title** | **Repo** | **Paper** |
|------|-----------|:--------:|:---------:|
-| 1968 | Can Self-Supervised Neural Representations Pre-trained on Human Speech Distinguish Animal Callers? | [![GitHub](https://img.shields.io/github/stars/idiap/ssl-caller-detection?style=flat)](https://github.com/idiap/ssl-caller-detection) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sarkar23_interspeech.pdf)<br>
[![arXiv](https://img.shields.io/badge/arXiv-2305.14035-b31b1b.svg)](https://arxiv.org/abs/2305.14035) | -| 2342 | Discovering COVID-19 Coughing and Breathing Patterns from Unlabeled Data using Contrastive Learning with Varying Pre-Training Domains | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cai23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01864-b31b1b.svg)](https://arxiv.org/abs/2306.01864) | -| 330 | Background-aware Modeling for Weakly Supervised Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xin23_interspeech.pdf) | -| 1065 | How to (Virtually) Train Your Speaker Localizer | [![GitHub](https://img.shields.io/github/stars/prerak23/Dir_SrcMic_DOA?style=flat)](https://github.com/prerak23/Dir_SrcMic_DOA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/srivastava23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.16958-b31b1b.svg)](https://arxiv.org/abs/2211.16958) | -| 2271 | MMER: Multimodal Multi-task Learning for Speech Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/Sreyan88/MMER?style=flat)](https://github.com/Sreyan88/MMER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ghosh23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.16794-b31b1b.svg)](https://arxiv.org/abs/2203.16794) | -| 909 | A Multi-task Learning Framework for Sound Event Detection using High-Level Acoustic Characteristics of Sounds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/khandelwal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10729-b31b1b.svg)](https://arxiv.org/abs/2305.10729) | +| 1968 | Can Self-Supervised Neural Representations Pre-trained on Human Speech Distinguish Animal Callers? | [![GitHub](https://img.shields.io/github/stars/idiap/ssl-caller-detection?style=flat)](https://github.com/idiap/ssl-caller-detection) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sarkar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14035-b31b1b.svg)](https://arxiv.org/abs/2305.14035) | +| 2342 | Discovering COVID-19 Coughing and Breathing Patterns from Unlabeled Data using Contrastive Learning with Varying Pre-Training Domains | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cai23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01864-b31b1b.svg)](https://arxiv.org/abs/2306.01864) | +| 330 | Background-aware Modeling for Weakly Supervised Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xin23_interspeech.pdf) | +| 1065 | How to (Virtually) Train Your Speaker Localizer | [![GitHub](https://img.shields.io/github/stars/prerak23/Dir_SrcMic_DOA?style=flat)](https://github.com/prerak23/Dir_SrcMic_DOA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/srivastava23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.16958-b31b1b.svg)](https://arxiv.org/abs/2211.16958) | +| 2271 | MMER: Multimodal Multi-task Learning for Speech Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/Sreyan88/MMER?style=flat)](https://github.com/Sreyan88/MMER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ghosh23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.16794-b31b1b.svg)](https://arxiv.org/abs/2203.16794) | +| 909 | A Multi-task Learning Framework for Sound Event Detection using High-Level Acoustic Characteristics of Sounds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/khandelwal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10729-b31b1b.svg)](https://arxiv.org/abs/2305.10729) |
@@ -1082,11 +1082,11 @@ Contributions to improve the completeness of this list are greatly appreciated.
| :id: | **Title** | **Repo** | **Paper** |
|------|-----------|:--------:|:---------:|
-| 2194 | A Multimodal Investigation of Speech, Text, Cognitive and Facial Video Features for Characterizing Depression with and without Medication | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/neumann23b_interspeech.pdf)<br>
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1n6ymnLKt21RDfawu9WHsd8tgmBPuz9SC/view) | -| 307 | Understanding Disrupted Sentences using Underspecified Abstract Meaning Representation | [![GitHub](https://img.shields.io/github/stars/amazon-science/disrupt-amr?style=flat)](https://github.com/amazon-science/disrupt-amr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/addlesee23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/understanding-disrupted-sentences-using-underspecified-abstract-meaning-representation) | -| 2109 | Developing Speech Processing Pipelines for Police Accountability | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/field23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06086-b31b1b.svg)](https://arxiv.org/abs/2306.06086) | -| 2086 | Prosody-Controllable Gender-Ambiguous Speech Synthesis: A Tool for Investigating Implicit Bias in Speech Perception | [![GitHub](https://img.shields.io/github/stars/evaszekely/ambiguous?style=flat)](https://github.com/evaszekely/ambiguous) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/szekely23_interspeech.pdf) | -| 848 | Affective Attributes of French Caregivers' Professional Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rouas23_interspeech.pdf) | +| 2194 | A Multimodal Investigation of Speech, Text, Cognitive and Facial Video Features for Characterizing Depression with and without Medication | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/neumann23b_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1n6ymnLKt21RDfawu9WHsd8tgmBPuz9SC/view) | +| 307 | Understanding Disrupted Sentences using Underspecified Abstract Meaning Representation | [![GitHub](https://img.shields.io/github/stars/amazon-science/disrupt-amr?style=flat)](https://github.com/amazon-science/disrupt-amr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/addlesee23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/understanding-disrupted-sentences-using-underspecified-abstract-meaning-representation) | +| 2109 | Developing Speech Processing Pipelines for Police Accountability | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/field23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06086-b31b1b.svg)](https://arxiv.org/abs/2306.06086) | +| 2086 | Prosody-Controllable Gender-Ambiguous Speech Synthesis: A Tool for Investigating Implicit Bias in Speech Perception | [![GitHub](https://img.shields.io/github/stars/evaszekely/ambiguous?style=flat)](https://github.com/evaszekely/ambiguous) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/szekely23_interspeech.pdf) | +| 848 | Affective Attributes of French Caregivers' Professional Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rouas23_interspeech.pdf) |
@@ -1098,54 +1098,54 @@ Contributions to improve the completeness of this list are greatly appreciated.
| :id: | **Title** | **Repo** | **Paper** |
|------|-----------|:--------:|:---------:|
-| 180 | Pragmatic Pertinence: A Learnable Confidence Metric to Assess the Subjective Quality of LM-Generated Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bellegarda23_interspeech.pdf) |
-| 2078 | ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23ea_interspeech.pdf)<br>
[![arXiv](https://img.shields.io/badge/arXiv-2305.16065-b31b1b.svg)](https://arxiv.org/abs/2305.16065) | -| 916 | BASS: Block-wise Adaptation for Speech Summarization | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sharma23_interspeech.pdf) | -| 1258 | Speaker Tracking using Graph Attention Networks with Varying Duration Utterances in Multi-Channel Naturalistic Data: Fearless Steps Apollo 11 Audio Corpus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shekar23_interspeech.pdf) | -| 36 | Combining Language Corpora in a Japanese Electromagnetic Articulography Database for Acoustic-to-Articulatory Inversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yan23_interspeech.pdf) | -| 523 | A Dual Attention-based Modality-Collaborative Fusion Network for Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/zxiaohen/Speech-emotion-recognition-MCFN?style=flat)](https://github.com/zxiaohen/Speech-emotion-recognition-MCFN) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23g_interspeech.pdf) | -| 2174 | Large Dataset Generation of Synchronized Music Audio and Lyrics at Scale using Teacher-Student Paradigm | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chivriga23_interspeech.pdf) | -| 483 | Enc-Dec RNN Acoustic Word Embeddings Learned via Pairwise Prediction | [![GitHub](https://img.shields.io/github/stars/madhavlab/2023_adhiraj_encdecPairwisePred?style=flat)](https://github.com/madhavlab/2023_adhiraj_encdecPairwisePred) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/banerjee23_interspeech.pdf) | -| 864 | Query based Acoustic Summarization for Podcasts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kotey23_interspeech.pdf) | -| 1242 | Spot Keywords from Very Noisy and Mixed Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17706-b31b1b.svg)](https://arxiv.org/abs/2305.17706) | -| 891 | Knowledge Distillation on Joint Task End-to-End Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nayem23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/knowledge-distillation-on-joint-task-end-to-end-speech-translation) | -| 343 | Investigating Pre-trained Audio Encoders in the Low-Resource Condition | [![GitHub](https://img.shields.io/github/stars/YangHao97/investigateAudioEncoders?style=flat)](https://github.com/YangHao97/investigateAudioEncoders) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17733-b31b1b.svg)](https://arxiv.org/abs/2305.17733) | -| 1718 | Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18096-b31b1b.svg)](https://arxiv.org/abs/2305.18096) | -| 823 | MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information | [![GitHub](https://img.shields.io/github/stars/SpringHuo/MAVD?style=flat)](https://github.com/YangHao97/investigateAudioEncoders) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02263-b31b1b.svg)](https://arxiv.org/abs/2306.02263) | -| 1674 | CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://cnceleb.org/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23y_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16049-b31b1b.svg)](https://arxiv.org/abs/2305.16049) | -| 1762 | Improving Zero-Shot Cross-Domain Slot Filling via Transformer-based Slot Semantics Fusion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23ca_interspeech.pdf) | -| 619 | Rethinking Transfer and Auxiliary Learning for Improving Audio Captioning Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shin23_interspeech.pdf) | -| 1468 | Boosting Punctuation Restoration with Data Generation and Reinforcement Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lai23c_interspeech.pdf) | -| 695 | J-ToneNet: A Transformer-based Encoding Network for Improving Tone Classification in Continuous Speech via F0 Sequences | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23e_interspeech.pdf) | -| 1152 | Towards Cross-Language Prosody Transfer for Dialog | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://www.cs.utep.edu/nigel/abstracts/interspeech2023.html)
[![GitHub](https://img.shields.io/github/stars/joneavila/DRAL?style=flat)](https://github.com/joneavila/DRAL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/avila23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.cs.utep.edu/nigel/papers/interspeech2023.pdf) | -| 2506 | Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kesiraju23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00208-b31b1b.svg)](https://arxiv.org/abs/2306.00208) | -| 1980 | ITALIC: An Italian Intent Classification Dataset | [![GitHub](https://img.shields.io/github/stars/RiTA-nlp/ITALIC?style=flat)](https://github.com/RiTA-nlp/ITALIC)
[![Zenodo](https://img.shields.io/badge/Zenodo-dataset-FFD1BF.svg)](https://zenodo.org/record/8040649) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/koudounas23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08502-b31b1b.svg)](https://arxiv.org/abs/2306.08502) | -| 1778 | Perceptual and Task-Oriented Assessment of a Semantic Metric for ASR Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rugayan23_interspeech.pdf) | -| 1466 | How ChatGPT is Robust for Spoken Language Understanding? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23s_interspeech.pdf) | -| 1233 | GigaST: A 10,000-hour Pseudo Speech Translation Corpus | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://st-benchmark.github.io/resources/GigaST.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ye23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2204.03939-b31b1b.svg)](https://arxiv.org/abs/2204.03939) | -| 1570 | Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fan23b_interspeech.pdf) | -| 2473 | Crowdsource-based Validation of the Audio Cocktail as a Sound Browsing Tool | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fallgren23_interspeech.pdf) | -| 1675 | PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts | [![GitHub](https://img.shields.io/github/stars/cpii-cai/PunCantonese?style=flat)](https://github.com/cpii-cai/PunCantonese) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23z_interspeech.pdf) | -| 1358 | Speech-to-Face Conversion using Denoising Diffusion Probabilistic Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kato23_interspeech.pdf) | -| 2255 | Inter-Connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nishikawa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16897-b31b1b.svg)](https://arxiv.org/abs/2305.16897) | -| 1068 | How Does Pretraining Improve Discourse-aware Translation? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19847-b31b1b.svg)](https://arxiv.org/abs/2305.19847) | -| 1135 | PATCorrect: Non-Autoregressive Phoneme-Augmented Transformer for ASR Error Correction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.05040-b31b1b.svg)](https://arxiv.org/abs/2302.05040) | -| 161 | Model-assisted Lexical Tone Evaluation of Three-Year-Old Chinese-Speaking Children by also Considering Segment Production | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tseng23_interspeech.pdf) | -| 1392 | Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding | [![GitHub](https://img.shields.io/github/stars/declare-lab/segue?style=flat)](https://github.com/declare-lab/segue) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tan23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12301-b31b1b.svg)](https://arxiv.org/abs/2305.12301) | -| 1582 | Joint Time and Frequency Transformer for Chinese Opera Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23u_interspeech.pdf) | -| 116 | AdaMS: Deep Metric Learning with Adaptive Margin and Adaptive Scale for Acoustic Word Discrimination | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jung23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.14564-b31b1b.svg)](https://arxiv.org/abs/2210.14564) | -| 2252 | Investigating Reproducibility at Interspeech Conferences: A Longitudinal and Comparative Perspective | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/arvan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.10033-b31b1b.svg)](https://arxiv.org/abs/2306.10033) | -| 2250 | Combining Heterogeneous Structures for Event Causality Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pouranbenveyseh23_interspeech.pdf) | -| 1208 | An Efficient Approach for the Automated Segmentation and Transcription of the People's Speech Corpus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/biswas23_interspeech.pdf) | -| 1425 | Diverse Feature Mapping and Fusion via Multitask Learning for Multilingual Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23g_interspeech.pdf) | -| 903 | Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text | [![GitHub](https://img.shields.io/github/stars/apptek/ArabicDiacritizationInterspeech2023?style=flat)](https://github.com/apptek/ArabicDiacritizationInterspeech2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bahar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03557-b31b1b.svg)](https://arxiv.org/abs/2306.03557) | -| 466 | Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin | [![GitHub](https://img.shields.io/github/stars/muhammed-saeed/CLaT?style=flat)](https://github.com/muhammed-saeed/CLaT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00382-b31b1b.svg)](https://arxiv.org/abs/2307.00382) | -| 1878 | Efficient Adaptation of Spoken Language Understanding based on End-to-End Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23j_interspeech.pdf) | -| 597 | PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords | [![GitHub](https://img.shields.io/github/stars/ncsoft/PhonMatchNet?style=flat)](https://github.com/ncsoft/PhonMatchNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23d_interspeech.pdf) | -| 69 | Mix before Align: Towards Zero-Shot Cross-Lingual Sentiment Analysis via Soft-Mix and Multi-View Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhu23_interspeech.pdf) | -| 170 | AlignAtt: using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/papi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11408-b31b1b.svg)](https://arxiv.org/abs/2305.11408) | -| 2225 | Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/polak23_interspeech.pdf) | -| 1979 | Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages | [![GitHub](https://img.shields.io/github/stars/unza-speech-lab/zambezi-voice?style=flat)](https://github.com/unza-speech-lab/zambezi-voice) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sikasote23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04428-b31b1b.svg)](https://arxiv.org/abs/2306.04428) | +| 180 | Pragmatic Pertinence: A Learnable Confidence Metric to Assess the Subjective Quality of LM-Generated Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bellegarda23_interspeech.pdf) | +| 2078 | ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23ea_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16065-b31b1b.svg)](https://arxiv.org/abs/2305.16065) | +| 916 | BASS: Block-wise Adaptation for Speech Summarization | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sharma23_interspeech.pdf) | +| 1258 | Speaker Tracking using Graph Attention Networks with Varying Duration Utterances in Multi-Channel Naturalistic Data: Fearless Steps Apollo 11 Audio Corpus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shekar23_interspeech.pdf) | +| 36 | Combining Language Corpora in a Japanese Electromagnetic Articulography Database for Acoustic-to-Articulatory Inversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yan23_interspeech.pdf) | +| 523 | A Dual Attention-based Modality-Collaborative Fusion Network for Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/zxiaohen/Speech-emotion-recognition-MCFN?style=flat)](https://github.com/zxiaohen/Speech-emotion-recognition-MCFN) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23g_interspeech.pdf) | +| 2174 | Large Dataset Generation of Synchronized Music Audio and Lyrics at Scale using Teacher-Student Paradigm | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chivriga23_interspeech.pdf) | +| 483 | Enc-Dec RNN Acoustic Word Embeddings Learned via Pairwise Prediction | [![GitHub](https://img.shields.io/github/stars/madhavlab/2023_adhiraj_encdecPairwisePred?style=flat)](https://github.com/madhavlab/2023_adhiraj_encdecPairwisePred) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/banerjee23_interspeech.pdf) | +| 864 | Query based Acoustic Summarization for Podcasts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kotey23_interspeech.pdf) | +| 1242 | Spot Keywords from Very Noisy and Mixed Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17706-b31b1b.svg)](https://arxiv.org/abs/2305.17706) | +| 891 | Knowledge Distillation on Joint Task End-to-End Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nayem23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/knowledge-distillation-on-joint-task-end-to-end-speech-translation) | +| 343 | Investigating Pre-trained Audio Encoders in the Low-Resource Condition | [![GitHub](https://img.shields.io/github/stars/YangHao97/investigateAudioEncoders?style=flat)](https://github.com/YangHao97/investigateAudioEncoders) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17733-b31b1b.svg)](https://arxiv.org/abs/2305.17733) | +| 1718 | Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18096-b31b1b.svg)](https://arxiv.org/abs/2305.18096) |
+| 823 | MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information | [![GitHub](https://img.shields.io/github/stars/SpringHuo/MAVD?style=flat)](https://github.com/SpringHuo/MAVD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23o_interspeech.pdf)<br>
[![arXiv](https://img.shields.io/badge/arXiv-2306.02263-b31b1b.svg)](https://arxiv.org/abs/2306.02263) | +| 1674 | CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://cnceleb.org/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23y_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16049-b31b1b.svg)](https://arxiv.org/abs/2305.16049) | +| 1762 | Improving Zero-Shot Cross-Domain Slot Filling via Transformer-based Slot Semantics Fusion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23ca_interspeech.pdf) | +| 619 | Rethinking Transfer and Auxiliary Learning for Improving Audio Captioning Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shin23_interspeech.pdf) | +| 1468 | Boosting Punctuation Restoration with Data Generation and Reinforcement Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lai23c_interspeech.pdf) | +| 695 | J-ToneNet: A Transformer-based Encoding Network for Improving Tone Classification in Continuous Speech via F0 Sequences | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23e_interspeech.pdf) | +| 1152 | Towards Cross-Language Prosody Transfer for Dialog | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://www.cs.utep.edu/nigel/abstracts/interspeech2023.html)
[![GitHub](https://img.shields.io/github/stars/joneavila/DRAL?style=flat)](https://github.com/joneavila/DRAL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/avila23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.cs.utep.edu/nigel/papers/interspeech2023.pdf) | +| 2506 | Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kesiraju23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00208-b31b1b.svg)](https://arxiv.org/abs/2306.00208) | +| 1980 | ITALIC: An Italian Intent Classification Dataset | [![GitHub](https://img.shields.io/github/stars/RiTA-nlp/ITALIC?style=flat)](https://github.com/RiTA-nlp/ITALIC)
[![Zenodo](https://img.shields.io/badge/Zenodo-dataset-FFD1BF.svg)](https://zenodo.org/record/8040649) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/koudounas23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08502-b31b1b.svg)](https://arxiv.org/abs/2306.08502) | +| 1778 | Perceptual and Task-Oriented Assessment of a Semantic Metric for ASR Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rugayan23_interspeech.pdf) | +| 1466 | How ChatGPT is Robust for Spoken Language Understanding? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23s_interspeech.pdf) | +| 1233 | GigaST: A 10,000-hour Pseudo Speech Translation Corpus | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://st-benchmark.github.io/resources/GigaST.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ye23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2204.03939-b31b1b.svg)](https://arxiv.org/abs/2204.03939) | +| 1570 | Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fan23b_interspeech.pdf) | +| 2473 | Crowdsource-based Validation of the Audio Cocktail as a Sound Browsing Tool | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fallgren23_interspeech.pdf) | +| 1675 | PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts | [![GitHub](https://img.shields.io/github/stars/cpii-cai/PunCantonese?style=flat)](https://github.com/cpii-cai/PunCantonese) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23z_interspeech.pdf) | +| 1358 | Speech-to-Face Conversion using Denoising Diffusion Probabilistic Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kato23_interspeech.pdf) | +| 2255 | Inter-Connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nishikawa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16897-b31b1b.svg)](https://arxiv.org/abs/2305.16897) | +| 1068 | How Does Pretraining Improve Discourse-aware Translation? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19847-b31b1b.svg)](https://arxiv.org/abs/2305.19847) | +| 1135 | PATCorrect: Non-Autoregressive Phoneme-Augmented Transformer for ASR Error Correction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.05040-b31b1b.svg)](https://arxiv.org/abs/2302.05040) | +| 161 | Model-assisted Lexical Tone Evaluation of Three-Year-Old Chinese-Speaking Children by also Considering Segment Production | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tseng23_interspeech.pdf) | +| 1392 | Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding | [![GitHub](https://img.shields.io/github/stars/declare-lab/segue?style=flat)](https://github.com/declare-lab/segue) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tan23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12301-b31b1b.svg)](https://arxiv.org/abs/2305.12301) | +| 1582 | Joint Time and Frequency Transformer for Chinese Opera Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23u_interspeech.pdf) | +| 116 | AdaMS: Deep Metric Learning with Adaptive Margin and Adaptive Scale for Acoustic Word Discrimination | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jung23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.14564-b31b1b.svg)](https://arxiv.org/abs/2210.14564) | +| 2252 | Investigating Reproducibility at Interspeech Conferences: A Longitudinal and Comparative Perspective | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/arvan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.10033-b31b1b.svg)](https://arxiv.org/abs/2306.10033) | +| 2250 | Combining Heterogeneous Structures for Event Causality Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pouranbenveyseh23_interspeech.pdf) | +| 1208 | An Efficient Approach for the Automated Segmentation and Transcription of the People's Speech Corpus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/biswas23_interspeech.pdf) | +| 1425 | Diverse Feature Mapping and Fusion via Multitask Learning for Multilingual Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23g_interspeech.pdf) | +| 903 | Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text | [![GitHub](https://img.shields.io/github/stars/apptek/ArabicDiacritizationInterspeech2023?style=flat)](https://github.com/apptek/ArabicDiacritizationInterspeech2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bahar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03557-b31b1b.svg)](https://arxiv.org/abs/2306.03557) | +| 466 | Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin | [![GitHub](https://img.shields.io/github/stars/muhammed-saeed/CLaT?style=flat)](https://github.com/muhammed-saeed/CLaT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00382-b31b1b.svg)](https://arxiv.org/abs/2307.00382) | +| 1878 | Efficient Adaptation of Spoken Language Understanding based on End-to-End Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23j_interspeech.pdf) | +| 597 | PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords | [![GitHub](https://img.shields.io/github/stars/ncsoft/PhonMatchNet?style=flat)](https://github.com/ncsoft/PhonMatchNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23d_interspeech.pdf) | +| 69 | Mix before Align: Towards Zero-Shot Cross-Lingual Sentiment Analysis via Soft-Mix and Multi-View Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhu23_interspeech.pdf) | +| 170 | AlignAtt: using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/papi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11408-b31b1b.svg)](https://arxiv.org/abs/2305.11408) | +| 2225 | Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/polak23_interspeech.pdf) | +| 1979 | Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages | [![GitHub](https://img.shields.io/github/stars/unza-speech-lab/zambezi-voice?style=flat)](https://github.com/unza-speech-lab/zambezi-voice) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sikasote23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04428-b31b1b.svg)](https://arxiv.org/abs/2306.04428) |
@@ -1157,34 +1157,34 @@ Contributions to improve the completeness of this list are greatly appreciated.
| :id: | **Title** | **Repo** | **Paper** |
|------|-----------|:--------:|:---------:|
-| 2421 | Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23m_interspeech.pdf)<br>
[![arXiv](https://img.shields.io/badge/arXiv-2305.13108-b31b1b.svg)](https://arxiv.org/abs/2305.13108) | -| 2198 | Multimodal Locally Enhanced Transformer for Continuous Sign Language Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/papadimitriou23_interspeech.pdf) | -| 1759 | Towards Supporting an Early Diagnosis of Multiple Sclerosis using Vocal Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gonzalezmachorro23_interspeech.pdf) | -| 1891 | Whisper Features for Dysarthric Severity-Level Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rathod23_interspeech.pdf) | -| 2191 | A New Benchmark of Aphasia Speech Recognition and Detection based on E-Branchformer and Multi-task Learning | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet/tree/master/egs2/aphasiabank/asr1) [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13331-b31b1b.svg)](https://arxiv.org/abs/2305.13331) | -| 222 | Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yue23_interspeech.pdf) | -| 2026 | A Stutter Seldom Comes Alone - Cross-Corpus Stuttering Detection as a Multi-label Problem | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bayerl23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19255-b31b1b.svg)](https://arxiv.org/abs/2305.19255) | -| 1542 | Transfer Learning to Aid Dysarthria Severity Classification for Patients with Amyotrophic Lateral Sclerosis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bhattacharjee23_interspeech.pdf) | -| 2203 | DuTa-VC: A Duration-aware Typical-to-Atypical Voice Conversion Approach with Diffusion Probabilistic Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://wanghelin1997.github.io/DuTa-VC-Demo/)
[![GitHub](https://img.shields.io/github/stars/WangHelin1997/DuTa-VC?style=flat)](https://github.com/WangHelin1997/DuTa-VC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23qa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.10588-b31b1b.svg)](https://arxiv.org/abs/2306.10588) | -| 201 | CNVVE: Dataset and Benchmark for Classifying Non-verbal Voice | [![GitHub](https://img.shields.io/github/stars/hedeshy/CNVVE?style=flat)](https://github.com/hedeshy/CNVVE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hedeshy23_interspeech.pdf)
[![University of Southampton](https://img.shields.io/badge/soton-ac-015C84.svg)](https://eprints.soton.ac.uk/478344/) | -| 1541 | Arabic Dysarthric Speech Recognition using Adversarial and Signal-based Augmentation | [![GitHub](https://img.shields.io/github/stars/massabaali7/AR_Dysarthric?style=flat)](https://github.com/massabaali7/AR_Dysarthric) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/baali23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04368-b31b1b.svg)](https://arxiv.org/abs/2306.04368) | -| 1887 | Weakly-Supervised Forced Alignment of Disfluent Speech using Phoneme-level Modeling | [![GitHub](https://img.shields.io/github/stars/zelaki/WSFA?style=flat)](https://github.com/zelaki/WSFA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kouzelis23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00996-b31b1b.svg)](https://arxiv.org/abs/2306.00996) | -| 1998 | Glottal Source Analysis of Voice Deficits in Basal Ganglia Dysfunction: Evidence from de novo Parkinson's Disease and Huntington's Disease | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/novotny23_interspeech.pdf) | -| 2478 | An Analysis of Glottal Features of Chronic Kidney Disease Speech and its Application to CKD Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mun23b_interspeech.pdf) | -| 983 | Weakly Supervised Glottis Segmentation in High-Speed Video Endoscopy using Bounding Box Labels | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/belagali23_interspeech.pdf) | -| 1669 | Investigating the Dynamics of Hand and Lips in French Cued Speech using Attention Mechanisms and CTC-based Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sankar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08290-b31b1b.svg)](https://arxiv.org/abs/2306.08290) | -| 670 | Hearing Loss Affects Emotion Perception in Older Adults: Evidence from a Prosody-Semantics Stroop Task | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23h_interspeech.pdf) | -| 554 | Cochlear-Implant Listeners Listening to Cochlear-Implant Simulated Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kong23b_interspeech.pdf) | -| 2168 | Validation of a Task-Independent Cepstral Peak Prominence Measure with Voice Activity Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/murton23_interspeech.pdf) | -| 1679 | Score-balanced Loss for Multi-aspect Pronunciation Assessment | [![GitHub](https://img.shields.io/github/stars/doheejin/SB_loss_PA?style=flat)](https://github.com/doheejin/SB_loss_PA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/do23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16664-b31b1b.svg)](https://arxiv.org/abs/2305.16664) | -| 2108 | Federated Learning for Secure Development of AI Models for Parkinson's Disease Detection using Speech from Different Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tayebiarasteh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11284-b31b1b.svg)](https://arxiv.org/abs/2305.11284) | -| 652 | F0inTFS: A Lightweight Periodicity Enhancement Strategy for Cochlear Implants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhou23c_interspeech.pdf) | -| 1678 | Differentiating Acoustic and Physiological Features in Speech for Hypoxia Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/obrien23_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-04154914.svg)](https://hal.science/hal-04154914) | -| 786 | Mandarin Electrolaryngeal Speech Voice Conversion using Cross-Domain Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06653-b31b1b.svg)](https://arxiv.org/abs/2306.06653) | -| 866 | Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chien23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06652-b31b1b.svg)](https://arxiv.org/abs/2306.06652) | -| 1744 | Which Aspects of Motor Speech Disorder are Captured by Mel Frequency Cepstral Coefficients? Evidence from the Change in STN-DBS Conditions in Parkinson's Disease | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/illner23_interspeech.pdf) | -| 1096 | Detecting Manifest Huntington's Disease using Vocal Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/subramanian23_interspeech.pdf) | -| 1623 | Exploring Multi-Task Learning and Data Augmentation in Dementia Detection with Self-Supervised Pre-trained Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23q_interspeech.pdf) | +| 2421 | Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13108-b31b1b.svg)](https://arxiv.org/abs/2305.13108) | +| 2198 | Multimodal Locally Enhanced Transformer for Continuous Sign Language Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/papadimitriou23_interspeech.pdf) | +| 1759 | Towards Supporting an Early Diagnosis of Multiple Sclerosis using Vocal Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gonzalezmachorro23_interspeech.pdf) | +| 1891 | Whisper Features for Dysarthric Severity-Level Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rathod23_interspeech.pdf) | +| 2191 | A New Benchmark of Aphasia Speech Recognition and Detection based on E-Branchformer and Multi-task Learning | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet/tree/master/egs2/aphasiabank/asr1) [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13331-b31b1b.svg)](https://arxiv.org/abs/2305.13331) | +| 222 | Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yue23_interspeech.pdf) | +| 2026 | A Stutter Seldom Comes Alone - Cross-Corpus Stuttering Detection as a Multi-label Problem | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bayerl23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19255-b31b1b.svg)](https://arxiv.org/abs/2305.19255) | +| 1542 | Transfer Learning to Aid Dysarthria Severity Classification for Patients with Amyotrophic Lateral Sclerosis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bhattacharjee23_interspeech.pdf) | +| 2203 | DuTa-VC: A Duration-aware Typical-to-Atypical Voice Conversion Approach with Diffusion Probabilistic Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://wanghelin1997.github.io/DuTa-VC-Demo/)
[![GitHub](https://img.shields.io/github/stars/WangHelin1997/DuTa-VC?style=flat)](https://github.com/WangHelin1997/DuTa-VC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23qa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.10588-b31b1b.svg)](https://arxiv.org/abs/2306.10588) | +| 201 | CNVVE: Dataset and Benchmark for Classifying Non-verbal Voice | [![GitHub](https://img.shields.io/github/stars/hedeshy/CNVVE?style=flat)](https://github.com/hedeshy/CNVVE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hedeshy23_interspeech.pdf)
[![University of Southampton](https://img.shields.io/badge/soton-ac-015C84.svg)](https://eprints.soton.ac.uk/478344/) | +| 1541 | Arabic Dysarthric Speech Recognition using Adversarial and Signal-based Augmentation | [![GitHub](https://img.shields.io/github/stars/massabaali7/AR_Dysarthric?style=flat)](https://github.com/massabaali7/AR_Dysarthric) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/baali23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04368-b31b1b.svg)](https://arxiv.org/abs/2306.04368) | +| 1887 | Weakly-Supervised Forced Alignment of Disfluent Speech using Phoneme-level Modeling | [![GitHub](https://img.shields.io/github/stars/zelaki/WSFA?style=flat)](https://github.com/zelaki/WSFA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kouzelis23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00996-b31b1b.svg)](https://arxiv.org/abs/2306.00996) | +| 1998 | Glottal Source Analysis of Voice Deficits in Basal Ganglia Dysfunction: Evidence from de novo Parkinson's Disease and Huntington's Disease | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/novotny23_interspeech.pdf) | +| 2478 | An Analysis of Glottal Features of Chronic Kidney Disease Speech and its Application to CKD Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mun23b_interspeech.pdf) | +| 983 | Weakly Supervised Glottis Segmentation in High-Speed Video Endoscopy using Bounding Box Labels | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/belagali23_interspeech.pdf) | +| 1669 | Investigating the Dynamics of Hand and Lips in French Cued Speech using Attention Mechanisms and CTC-based Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sankar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08290-b31b1b.svg)](https://arxiv.org/abs/2306.08290) | +| 670 | Hearing Loss Affects Emotion Perception in Older Adults: Evidence from a Prosody-Semantics Stroop Task | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23h_interspeech.pdf) | +| 554 | Cochlear-Implant Listeners Listening to Cochlear-Implant Simulated Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kong23b_interspeech.pdf) | +| 2168 | Validation of a Task-Independent Cepstral Peak Prominence Measure with Voice Activity Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/murton23_interspeech.pdf) | +| 1679 | Score-balanced Loss for Multi-aspect Pronunciation Assessment | [![GitHub](https://img.shields.io/github/stars/doheejin/SB_loss_PA?style=flat)](https://github.com/doheejin/SB_loss_PA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/do23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16664-b31b1b.svg)](https://arxiv.org/abs/2305.16664) | +| 2108 | Federated Learning for Secure Development of AI Models for Parkinson's Disease Detection using Speech from Different Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tayebiarasteh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11284-b31b1b.svg)](https://arxiv.org/abs/2305.11284) | +| 652 | F0inTFS: A Lightweight Periodicity Enhancement Strategy for Cochlear Implants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhou23c_interspeech.pdf) | +| 1678 | Differentiating Acoustic and Physiological Features in Speech for Hypoxia Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/obrien23_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-04154914.svg)](https://hal.science/hal-04154914) | +| 786 | Mandarin Electrolaryngeal Speech Voice Conversion using Cross-Domain Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06653-b31b1b.svg)](https://arxiv.org/abs/2306.06653) | +| 866 | Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chien23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06652-b31b1b.svg)](https://arxiv.org/abs/2306.06652) | +| 1744 | Which Aspects of Motor Speech Disorder are Captured by Mel Frequency Cepstral Coefficients? Evidence from the Change in STN-DBS Conditions in Parkinson's Disease | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/illner23_interspeech.pdf) | +| 1096 | Detecting Manifest Huntington's Disease using Vocal Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/subramanian23_interspeech.pdf) | +| 1623 | Exploring Multi-Task Learning and Data Augmentation in Dementia Detection with Self-Supervised Pre-trained Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23q_interspeech.pdf) |
@@ -1196,12 +1196,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 478 | Matching Latent Encoding for Audio-Text based Keyword Spotting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nishu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05245-b31b1b.svg)](https://arxiv.org/abs/2306.05245) | -| 1215 | Self-Paced Pattern Augmentation for Spoken Term Detection in Zero-Resource | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/p23_interspeech.pdf) | -| 2362 | On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23y_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/on-device-constrained-self-supervised-speech-representation-learning-for-keyword-spotting-via-knowledge-distillation) | -| 90 | Online Continual Learning in Keyword Spotting for Low-Resource Devices via Pooling High-Order Temporal Statistics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/michieli23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.12660-b31b1b.svg)](https://arxiv.org/abs/2307.12660) | -| 689 | Improving Small Footprint Few-Shot Keyword Spotting with Supervision on Auxiliary Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23j_interspeech.pdf) | -| 2222 | Robust Keyword Spotting for Noisy Environments by Leveraging Speech Enhancement and Speech Presence Probability | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23t_interspeech.pdf) | +| 478 | Matching Latent Encoding for Audio-Text based Keyword Spotting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nishu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05245-b31b1b.svg)](https://arxiv.org/abs/2306.05245) | +| 1215 | Self-Paced Pattern Augmentation for Spoken Term Detection in Zero-Resource | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/p23_interspeech.pdf) | +| 2362 | On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23y_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/on-device-constrained-self-supervised-speech-representation-learning-for-keyword-spotting-via-knowledge-distillation) | +| 90 | Online Continual Learning in Keyword Spotting for Low-Resource Devices via Pooling High-Order Temporal Statistics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/michieli23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.12660-b31b1b.svg)](https://arxiv.org/abs/2307.12660) | +| 689 | Improving Small Footprint Few-Shot Keyword Spotting with Supervision on Auxiliary Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23j_interspeech.pdf) | +| 2222 | Robust Keyword Spotting for Noisy Environments by Leveraging Speech Enhancement and Speech Presence Probability | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23t_interspeech.pdf) |
@@ -1213,12 +1213,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 831 | Enhancing the Unified Streaming and Non-Streaming Model with Contrastive Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00755-b31b1b.svg)](https://arxiv.org/abs/2306.00755) | -| 1497 | ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/song23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10649-b31b1b.svg)](https://arxiv.org/abs/2305.10649) | -| 361 | Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01296-b31b1b.svg)](https://arxiv.org/abs/2306.01296) | -| 1129 | DCTX-Conformer: Dynamic Context Carry-over for Low Latency Unified Streaming and Non-Streaming Conformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huybrechts23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08175-b31b1b.svg)](https://arxiv.org/abs/2306.08175) | -| 1121 | Knowledge Distillation from Non-Streaming to Streaming ASR Encoder using Auxiliary Non-Streaming Layer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shim23_interspeech.pdf) | -| 884 | Adaptive Contextual Biasing for Transducer based Streaming Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00804-b31b1b.svg)](https://arxiv.org/abs/2306.00804) | +| 831 | Enhancing the Unified Streaming and Non-Streaming Model with Contrastive Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00755-b31b1b.svg)](https://arxiv.org/abs/2306.00755) | +| 1497 | ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/song23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10649-b31b1b.svg)](https://arxiv.org/abs/2305.10649) | +| 361 | Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01296-b31b1b.svg)](https://arxiv.org/abs/2306.01296) | +| 1129 | DCTX-Conformer: Dynamic Context Carry-over for Low Latency Unified Streaming and Non-Streaming Conformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huybrechts23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08175-b31b1b.svg)](https://arxiv.org/abs/2306.08175) | +| 1121 | Knowledge Distillation from Non-Streaming to Streaming ASR Encoder using Auxiliary Non-Streaming Layer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shim23_interspeech.pdf) | +| 884 | Adaptive Contextual Biasing for Transducer based Streaming Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00804-b31b1b.svg)](https://arxiv.org/abs/2306.00804) |
@@ -1230,12 +1230,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1753 | Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://avlit-interspeech.github.io/)
[![GitHub](https://img.shields.io/github/stars/hmartelb/avlit?style=flat)](https://github.com/hmartelb/avlit) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/martel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00160-b31b1b.svg)](https://arxiv.org/abs/2306.00160) | -| 1389 | Remixing-based Unsupervised Source Separation from Scratch | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/saijo23_interspeech.pdf) | -| 577 | CAPTDURE: Captioned Sound Dataset of Single Sources | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/okamoto23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17758-b31b1b.svg)](https://arxiv.org/abs/2305.17758) | -| 488 | Recursive Sound Source Separation with Deep Learning-based Beamforming for Unknown Number of Sources | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/munakata23_interspeech.pdf) | -| 2537 | Multi-Channel Speech Separation with Cross-Attention and Beamforming | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mosner23_interspeech.pdf) | -| 185 | Background-Sound Controllable Voice Source Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/eom23_interspeech.pdf) | +| 1753 | Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://avlit-interspeech.github.io/)
[![GitHub](https://img.shields.io/github/stars/hmartelb/avlit?style=flat)](https://github.com/hmartelb/avlit) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/martel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00160-b31b1b.svg)](https://arxiv.org/abs/2306.00160) | +| 1389 | Remixing-based Unsupervised Source Separation from Scratch | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/saijo23_interspeech.pdf) | +| 577 | CAPTDURE: Captioned Sound Dataset of Single Sources | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/okamoto23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17758-b31b1b.svg)](https://arxiv.org/abs/2305.17758) | +| 488 | Recursive Sound Source Separation with Deep Learning-based Beamforming for Unknown Number of Sources | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/munakata23_interspeech.pdf) | +| 2537 | Multi-Channel Speech Separation with Cross-Attention and Beamforming | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mosner23_interspeech.pdf) | +| 185 | Background-Sound Controllable Voice Source Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/eom23_interspeech.pdf) |
@@ -1247,12 +1247,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1922 | A Neural Architecture for Selective Attention to Speech Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jurov23_interspeech.pdf) | -| 1122 | Quantifying Informational Masking due to Masker Intelligibility in Same-Talker Speech-in-Speech Perception | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huo23_interspeech.pdf) | -| 1476 | On the Benefits of Self-Supervised Learned Speech Representations for Predicting Human Phonetic Misperceptions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cuervo23_interspeech.pdf) | -| 2154 | Predicting Perceptual Centers Located at Vowel Onset in German Speech using Long Short-Term Memory Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/schulz23_interspeech.pdf) | -| 63 | Exploring the Mutual Intelligibility Breakdown Caused by Sculpting Speech from a Competing Speech Signal | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cooke23_interspeech.pdf) | -| 2103 | Perception of Incomplete Voicing Neutralization of Obstruents in Tohoku Japanese | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kitahara23_interspeech.pdf) | +| 1922 | A Neural Architecture for Selective Attention to Speech Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jurov23_interspeech.pdf) | +| 1122 | Quantifying Informational Masking due to Masker Intelligibility in Same-Talker Speech-in-Speech Perception | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huo23_interspeech.pdf) | +| 1476 | On the Benefits of Self-Supervised Learned Speech Representations for Predicting Human Phonetic Misperceptions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cuervo23_interspeech.pdf) | +| 2154 | Predicting Perceptual Centers Located at Vowel Onset in German Speech using Long Short-Term Memory Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/schulz23_interspeech.pdf) | +| 63 | Exploring the Mutual Intelligibility Breakdown Caused by Sculpting Speech from a Competing Speech Signal | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cooke23_interspeech.pdf) | +| 2103 | Perception of Incomplete Voicing Neutralization of Obstruents in Tohoku Japanese | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kitahara23_interspeech.pdf) | @@ -1264,12 +1264,12 @@ Contributions to improve the completeness of this list are greatly 
appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1879 | The Emergence of Obstruent-Intrinsic f0 and VOT as Cues to the Fortis/Lenis Contrast in West Central Bavarian | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pohnlein23_interspeech.pdf) | -| 431 | 〈'〉 in Tsimane': A Preliminary Investigation | [![GIN](https://img.shields.io/badge/G-Node-2854A4.svg)](https://gin.g-node.org/William-N-Havard/tsimane-glottal-interspeech23) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/havard23_interspeech.pdf) | -| 2200 | Segmental Features of Brazilian (Santa Catarina) Hunsrik | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hoffmann23_interspeech.pdf) | -| 2337 | Opening or Closing? An Electroglottographic Analysis of Voiceless Coda Consonants in Australian English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ratko23_interspeech.pdf) | -| 295 | Increasing Aspiration of Word-Medial Fortis Plosives in Swiss Standard German | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zebe23_interspeech.pdf) | -| 1456 | Lexical Stress and Velar Palatalization in Italian: A Spatio-Temporal Interaction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shao23_interspeech.pdf) | +| 1879 | The Emergence of Obstruent-Intrinsic f0 and VOT as Cues to the Fortis/Lenis Contrast in West Central Bavarian | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pohnlein23_interspeech.pdf) | +| 431 | 〈'〉 in Tsimane': A Preliminary Investigation | [![GIN](https://img.shields.io/badge/G-Node-2854A4.svg)](https://gin.g-node.org/William-N-Havard/tsimane-glottal-interspeech23) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/havard23_interspeech.pdf) | +| 2200 | Segmental Features of Brazilian (Santa Catarina) Hunsrik | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hoffmann23_interspeech.pdf) | +| 2337 | Opening or Closing? An Electroglottographic Analysis of Voiceless Coda Consonants in Australian English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ratko23_interspeech.pdf) | +| 295 | Increasing Aspiration of Word-Medial Fortis Plosives in Swiss Standard German | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zebe23_interspeech.pdf) | +| 1456 | Lexical Stress and Velar Palatalization in Italian: A Spatio-Temporal Interaction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shao23_interspeech.pdf) | @@ -1281,65 +1281,65 @@ Contributions to improve the completeness of this list are greatly appreciated. 
| :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1989 | Vietnam-Celeb: A Large-Scale Dataset for Vietnamese Speaker Recognition | [![GitHub](https://img.shields.io/github/stars/thanhpv2102/Vietnam-Celeb.Interspeech?style=flat)](https://github.com/thanhpv2102/Vietnam-Celeb.Interspeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pham23b_interspeech.pdf) | -| 2254 | What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://is23-2254.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06524-b31b1b.svg)](https://arxiv.org/abs/2306.06524) | -| 241 | The 2022 NIST Language Recognition Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14624-b31b1b.svg)](https://arxiv.org/abs/2302.14624) | -| 155 | Description and Analysis of the KPT system for NIST Language Recognition Evaluation 2022 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sarni23_interspeech.pdf) | -| 1725 | ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention | [![GitHub](https://img.shields.io/github/stars/Yip-Jia-Qi/ACA-Net?style=flat)](https://github.com/Yip-Jia-Qi/ACA-Net) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yip23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12121-b31b1b.svg)](https://arxiv.org/abs/2305.12121) | -| 402 | Branch-ECAPA-TDNN: A Parallel Branch Architecture to Capture Local and Global Features for Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yao23_interspeech.pdf) | -| 2052 | Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/singh23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07501-b31b1b.svg)](https://arxiv.org/abs/2306.07501)| -| 2569 | Wavelet Scattering Transform for Improving Generalization in Low-Resourced Spoken Language Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/dey23_interspeech.pdf) | -| 1407 | A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-trained General Purpose Speech Model | [![GitHub](https://img.shields.io/github/stars/Srijith-rkr/KAUST-Whisper-Adapter?style=flat)](https://github.com/Srijith-rkr/KAUST-Whisper-Adapter) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/radhakrishnan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11244-b31b1b.svg)](https://arxiv.org/abs/2305.11244)| -| 2272 | HABLA: A Dataset of Latin American Spanish Accents for Voice Anti-Spoofing | [![Zenodo](https://img.shields.io/badge/Zenodo-dataset-FFD1BF.svg)](https://zenodo.org/record/7370805) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tamayoflorez23_interspeech.pdf) | -| 1702 | Self-Supervised Learning Representation based Accent Recognition with Persistent Accent Memory | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23aa_interspeech.pdf) | -| 800 | Extremely Low Bit Quantization for Mobile Speaker Verification Systems Under 1MB Memory | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23g_interspeech.pdf) | -| 1974 | Unsupervised Out-of-Distribution Dialect Detection with Mahalanobis Distance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/das23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.10326-b31b1b.svg)](https://arxiv.org/abs/2302.10326) | -| 105 | Pyannote.Audio 2.1 Speaker Diarization Pipeline: Principle, Benchmark and Recipe | [![GitHub](https://img.shields.io/github/stars/pyannote/pyannote-audio?style=flat)](https://github.com/pyannote/pyannote-audio) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bredin23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://huggingface.co/bhuvanesh25/pyannote-diar-copy/resolve/main/technical_report_2.1.pdf) | -| 1524 | Model Compression for DNN-based Speaker Verification using Weight Quantization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17326-b31b1b.svg)](https://arxiv.org/abs/2210.17326) | -| 1354 | Multi-Resolution Approach to Identification of Spoken Languages and to Improve Overall Language Diarization System using Whisper Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vachhani23_interspeech.pdf) | -| 125 | Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zeng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10940-b31b1b.svg)](https://arxiv.org/abs/2305.10940) | -| 849 | Dynamic Fully-Connected Layer for Large-Scale Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/song23b_interspeech.pdf) | -| 844 | Reversible Neural Networks for Memory-Efficient Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23i_interspeech.pdf) | -| 777 | ECAPA++: Fine-grained Deep Embedding Learning for TDNN based Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23f_interspeech.pdf) | -| 1206 | TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13701-b31b1b.svg)](https://arxiv.org/abs/2305.13701) | -| 100 | Fooling Speaker Identification Systems with Adversarial Background Music | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zuo23_interspeech.pdf) | -| 1314 | Mutual Information-based Embedding Decoupling for Generalizable Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23r_interspeech.pdf) | -| 574 | Target Active Speaker Detection with Audio-Visual Cues | [![GitHub](https://img.shields.io/github/stars/Jiang-Yidi/TS-TalkNet?style=flat)](https://github.com/Jiang-Yidi/TS-TalkNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jiang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12831-b31b1b.svg)](https://arxiv.org/abs/2305.12831) | -| 2401 | Improving End-to-End Neural Diarization using Conversational Summary Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/broughton23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.13863-b31b1b.svg)](https://arxiv.org/abs/2306.13863) | -| 2039 | Phase Perturbation Improves Channel Robustness for Speech Spoofing Countermeasures | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yongyi.dev/phase-antispoofing/)
[![GitHub](https://img.shields.io/github/stars/yongyizang/PhaseAntispoofing_INTERSPEECH?style=flat)](https://github.com/yongyizang/PhaseAntispoofing_INTERSPEECH) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03389-b31b1b.svg)](https://arxiv.org/abs/2306.03389) | -| 210 | Improving Training Datasets for Resource-constrained Speaker Recognition Neural Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bousquet23_interspeech.pdf) | -| 1498 | Instance-based Temporal Normalization for Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lertpetchpun23_interspeech.pdf) | -| 881 | On the Robustness of Wav2Vec 2.0 based Speaker Recognition Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/novoselov23_interspeech.pdf) | -| 697 | P-Vectors: A Parallel-coupled TDNN/Transformer Network for Speaker Verification | [![GitHub](https://img.shields.io/github/stars/xyw7/pvector?style=flat)](https://github.com/xyw7/pvector) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14778-b31b1b.svg)](https://arxiv.org/abs/2305.14778) | -| 1249 | Group GMM-ResNet for Detection of Synthetic Speech Attacks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lei23_interspeech.pdf) | -| 452 | Robust Training for Speaker Verification against Noisy Labels | [![GitHub](https://img.shields.io/github/stars/PunkMale/OR-Gate?style=flat)](https://github.com/PunkMale/OR-Gate) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.12080-b31b1b.svg)](https://arxiv.org/abs/2211.12080) | -| 1404 | Self-Distillation into Self-Attention Heads for Improving Transformer-based End-to-End Neural Speaker Diarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jeoung23_interspeech.pdf) | -| 1217 | Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022 | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://www.cnceleb.org/competition) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00815-b31b1b.svg)](https://arxiv.org/abs/2211.00815) | -| 1648 | Describing the Phonetics in the Underlying Speech Attributes for Deep and Interpretable Speaker Recognition | [![GitHub](https://img.shields.io/github/stars/LIAvignon/BA-LR?style=flat)](https://github.com/LIAvignon/BA-LR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/benamor23_interspeech.pdf) | -| 1214 | Range-based Equal Error Rate for Spoof Localization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17739-b31b1b.svg)](https://arxiv.org/abs/2305.17739) | -| 1888 | Exploring the English Accent-Independent Features for Speech Emotion Recognition using Filter and Wrapper-based Methods for Feature Selection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tabassum23_interspeech.pdf) | -| 205 | Powerset Multi-Class Cross Entropy Loss for Neural Speaker Diarization | [![GitHub](https://img.shields.io/github/stars/FrenchKrab/IS2023-powerset-diarization?style=flat)](https://github.com/FrenchKrab/IS2023-powerset-diarization) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/plaquet23_interspeech.pdf) | -| 394 | A Method of Audio-Visual Person Verification by Mining Connections between Time Series | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sun23_interspeech.pdf) | -| 605 | One-Step Knowledge Distillation and Fine-Tuning in using Large Pre-trained Self-Supervised Learning Models for Speaker Verification | [![GitHub](https://img.shields.io/github/stars/jungwoo4021/OS-KDFT?style=flat)](https://github.com/jungwoo4021/OS-KDFT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/heo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17394-b31b1b.svg)](https://arxiv.org/abs/2305.17394) | -| 409 | Defense Against Adversarial Attacks on Audio DeepFake Detection | [![GitHub](https://img.shields.io/github/stars/piotrkawa/audio-deepfake-adversarial-attacks?style=flat)](https://github.com/piotrkawa/audio-deepfake-adversarial-attacks) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kawa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.14597-b31b1b.svg)](https://arxiv.org/abs/2212.14597) | -| 1820 | A Conformer-based Classifier for Variable-Length Utterance Processing in Anti-Spoofing | [![GitHub](https://img.shields.io/github/stars/ErosRos/conformer-based-classifier-for-anti-spoofing?style=flat)](https://github.com/ErosRos/conformer-based-classifier-for-anti-spoofing) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rosello23_interspeech.pdf) | -| 1557 | Conformer-based Language Embedding with Self-Knowledge Distillation for Spoken Language Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ia_interspeech.pdf) | -| 2419 | CommonAccent: Exploring Large Acoustic Pre-trained Models for Accent Classification based on Common Voice | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zuluagagomez23_interspeech.pdf)
[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/371136510_CommonAccent_Exploring_Large_Acoustic_Pretrained_Models_for_Accent_Classification_Based_on_Common_Voice) | -| 266 | From Adaptive Score Normalization to Adaptive Data Normalization for Speaker Verification Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cumani23_interspeech.pdf) | -| 1513 | CAM++: A Fast and Efficient Network for Speaker Verification using Context-aware Masking | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/3D-Speaker?style=flat)](https://github.com/alibaba-damo-academy/3D-Speaker) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ha_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00332-b31b1b.svg)](https://arxiv.org/abs/2303.00332) | -| 1928 | North Sámi Dialect Identification with Self-Supervised Speech Models | [![GitHub](https://img.shields.io/github/stars/skakouros/sami_dialects?style=flat)](https://github.com/skakouros/sami_dialects) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kakouros23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11864-b31b1b.svg)](https://arxiv.org/abs/2305.11864) | -| 2289 | Encoder-Decoder Multimodal Speaker Change Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jung23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00680-b31b1b.svg)](https://arxiv.org/abs/2306.00680) | -| 1603 | Disentangled Representation Learning for Multilingual Speaker Recognition | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://mm.kaist.ac.kr/projects/voxceleb1-b/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nam23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00437-b31b1b.svg)](https://arxiv.org/abs/2211.00437) | -| 2310 | A Compact End-to-End Model with Local and Global Context for Spoken Language Identification | [![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jia23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15781-b31b1b.svg)](https://arxiv.org/abs/2210.15781) | -| 1005 | On the Robustness of Arabic Speech Dialect Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sullivan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03789-b31b1b.svg)](https://arxiv.org/abs/2306.03789) | -| 927 | Adaptive Neural Network Quantization for Lightweight Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23u_interspeech.pdf) | -| 1205 | Adversarial Diffusion Probability Model For Cross-Domain Speaker Verification Integrating Contrastive Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/su23_interspeech.pdf) | -| 1554 | Chinese Dialect Recognition based on Transfer Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jiang23f_interspeech.pdf) | -| 270 | Spoofing Attacker also Benefits from Self-Supervised Pretrained Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ito23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15518-b31b1b.svg)](https://arxiv.org/abs/2305.15518) | -| 854 | Label aware Speech Representation Learning for Language Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vashishth23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04374-b31b1b.svg)](https://arxiv.org/abs/2306.04374) | -| 1761 | Exploring the Impact of Back-end Network on Wav2vec 2.0 for Dialect Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/luo23c_interspeech.pdf) | -| 453 | Improving Speaker Verification with Self-pretrained Transformer Models | [![GitHub](https://img.shields.io/github/stars/JunyiPeng00/Interspeech23_SelfPretraining?style=flat)](https://github.com/JunyiPeng00/Interspeech23_SelfPretraining) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10517-b31b1b.svg)](https://arxiv.org/abs/2305.10517) | -| 372 | Handling the Alignment for Wake Word Detection: A Comparison Between Alignment-based, Alignment-Free and Hybrid Approaches | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ribeiro23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.08950-b31b1b.svg)](https://arxiv.org/abs/2302.08950) | +| 1989 | Vietnam-Celeb: A Large-Scale Dataset for Vietnamese Speaker Recognition | [![GitHub](https://img.shields.io/github/stars/thanhpv2102/Vietnam-Celeb.Interspeech?style=flat)](https://github.com/thanhpv2102/Vietnam-Celeb.Interspeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pham23b_interspeech.pdf) | +| 2254 | What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://is23-2254.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06524-b31b1b.svg)](https://arxiv.org/abs/2306.06524) | +| 241 | The 2022 NIST Language Recognition Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14624-b31b1b.svg)](https://arxiv.org/abs/2302.14624) | +| 155 | Description and Analysis of the KPT system for NIST Language Recognition Evaluation 2022 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sarni23_interspeech.pdf) | +| 1725 | ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention | [![GitHub](https://img.shields.io/github/stars/Yip-Jia-Qi/ACA-Net?style=flat)](https://github.com/Yip-Jia-Qi/ACA-Net) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yip23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12121-b31b1b.svg)](https://arxiv.org/abs/2305.12121) | +| 402 | Branch-ECAPA-TDNN: A Parallel Branch Architecture to Capture Local and Global Features for Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yao23_interspeech.pdf) | +| 2052 | Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/singh23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07501-b31b1b.svg)](https://arxiv.org/abs/2306.07501)| +| 2569 | Wavelet Scattering Transform for Improving Generalization in Low-Resourced Spoken Language Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/dey23_interspeech.pdf) | +| 1407 | A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-trained General Purpose Speech Model | [![GitHub](https://img.shields.io/github/stars/Srijith-rkr/KAUST-Whisper-Adapter?style=flat)](https://github.com/Srijith-rkr/KAUST-Whisper-Adapter) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/radhakrishnan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11244-b31b1b.svg)](https://arxiv.org/abs/2305.11244)| +| 2272 | HABLA: A Dataset of Latin American Spanish Accents for Voice Anti-Spoofing | [![Zenodo](https://img.shields.io/badge/Zenodo-dataset-FFD1BF.svg)](https://zenodo.org/record/7370805) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tamayoflorez23_interspeech.pdf) | +| 1702 | Self-Supervised Learning Representation based Accent Recognition with Persistent Accent Memory | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23aa_interspeech.pdf) | +| 800 | Extremely Low Bit Quantization for Mobile Speaker Verification Systems Under 1MB Memory | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23g_interspeech.pdf) | +| 1974 | Unsupervised Out-of-Distribution Dialect Detection with Mahalanobis Distance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/das23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.10326-b31b1b.svg)](https://arxiv.org/abs/2302.10326) | +| 105 | Pyannote.Audio 2.1 Speaker Diarization Pipeline: Principle, Benchmark and Recipe | [![GitHub](https://img.shields.io/github/stars/pyannote/pyannote-audio?style=flat)](https://github.com/pyannote/pyannote-audio) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bredin23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://huggingface.co/bhuvanesh25/pyannote-diar-copy/resolve/main/technical_report_2.1.pdf) | +| 1524 | Model Compression for DNN-based Speaker Verification using Weight Quantization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17326-b31b1b.svg)](https://arxiv.org/abs/2210.17326) | +| 1354 | Multi-Resolution Approach to Identification of Spoken Languages and to Improve Overall Language Diarization System using Whisper Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vachhani23_interspeech.pdf) | +| 125 | Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zeng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10940-b31b1b.svg)](https://arxiv.org/abs/2305.10940) | +| 849 | Dynamic Fully-Connected Layer for Large-Scale Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/song23b_interspeech.pdf) | +| 844 | Reversible Neural Networks for Memory-Efficient Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23i_interspeech.pdf) | +| 777 | ECAPA++: Fine-grained Deep Embedding Learning for TDNN based Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23f_interspeech.pdf) | +| 1206 | TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13701-b31b1b.svg)](https://arxiv.org/abs/2305.13701) | +| 100 | Fooling Speaker Identification Systems with Adversarial Background Music | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zuo23_interspeech.pdf) | +| 1314 | Mutual Information-based Embedding Decoupling for Generalizable Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23r_interspeech.pdf) | +| 574 | Target Active Speaker Detection with Audio-Visual Cues | [![GitHub](https://img.shields.io/github/stars/Jiang-Yidi/TS-TalkNet?style=flat)](https://github.com/Jiang-Yidi/TS-TalkNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jiang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12831-b31b1b.svg)](https://arxiv.org/abs/2305.12831) | +| 2401 | Improving End-to-End Neural Diarization using Conversational Summary Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/broughton23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.13863-b31b1b.svg)](https://arxiv.org/abs/2306.13863) | +| 2039 | Phase Perturbation Improves Channel Robustness for Speech Spoofing Countermeasures | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yongyi.dev/phase-antispoofing/)
[![GitHub](https://img.shields.io/github/stars/yongyizang/PhaseAntispoofing_INTERSPEECH?style=flat)](https://github.com/yongyizang/PhaseAntispoofing_INTERSPEECH) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03389-b31b1b.svg)](https://arxiv.org/abs/2306.03389) | +| 210 | Improving Training Datasets for Resource-constrained Speaker Recognition Neural Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bousquet23_interspeech.pdf) | +| 1498 | Instance-based Temporal Normalization for Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lertpetchpun23_interspeech.pdf) | +| 881 | On the Robustness of Wav2Vec 2.0 based Speaker Recognition Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/novoselov23_interspeech.pdf) | +| 697 | P-Vectors: A Parallel-coupled TDNN/Transformer Network for Speaker Verification | [![GitHub](https://img.shields.io/github/stars/xyw7/pvector?style=flat)](https://github.com/xyw7/pvector) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14778-b31b1b.svg)](https://arxiv.org/abs/2305.14778) | +| 1249 | Group GMM-ResNet for Detection of Synthetic Speech Attacks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lei23_interspeech.pdf) | +| 452 | Robust Training for Speaker Verification against Noisy Labels | [![GitHub](https://img.shields.io/github/stars/PunkMale/OR-Gate?style=flat)](https://github.com/PunkMale/OR-Gate) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.12080-b31b1b.svg)](https://arxiv.org/abs/2211.12080) | +| 1404 | Self-Distillation into Self-Attention Heads for Improving Transformer-based End-to-End Neural Speaker Diarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jeoung23_interspeech.pdf) | +| 1217 | Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022 | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://www.cnceleb.org/competition) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00815-b31b1b.svg)](https://arxiv.org/abs/2211.00815) | +| 1648 | Describing the Phonetics in the Underlying Speech Attributes for Deep and Interpretable Speaker Recognition | [![GitHub](https://img.shields.io/github/stars/LIAvignon/BA-LR?style=flat)](https://github.com/LIAvignon/BA-LR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/benamor23_interspeech.pdf) | +| 1214 | Range-based Equal Error Rate for Spoof Localization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17739-b31b1b.svg)](https://arxiv.org/abs/2305.17739) | +| 1888 | Exploring the English Accent-Independent Features for Speech Emotion Recognition using Filter and Wrapper-based Methods for Feature Selection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tabassum23_interspeech.pdf) | +| 205 | Powerset Multi-Class Cross Entropy Loss for Neural Speaker Diarization | [![GitHub](https://img.shields.io/github/stars/FrenchKrab/IS2023-powerset-diarization?style=flat)](https://github.com/FrenchKrab/IS2023-powerset-diarization) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/plaquet23_interspeech.pdf) | +| 394 | A Method of Audio-Visual Person Verification by Mining Connections between Time Series | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sun23_interspeech.pdf) | +| 605 | One-Step Knowledge Distillation and Fine-Tuning in using Large Pre-trained Self-Supervised Learning Models for Speaker Verification | [![GitHub](https://img.shields.io/github/stars/jungwoo4021/OS-KDFT?style=flat)](https://github.com/jungwoo4021/OS-KDFT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/heo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17394-b31b1b.svg)](https://arxiv.org/abs/2305.17394) | +| 409 | Defense Against Adversarial Attacks on Audio DeepFake Detection | [![GitHub](https://img.shields.io/github/stars/piotrkawa/audio-deepfake-adversarial-attacks?style=flat)](https://github.com/piotrkawa/audio-deepfake-adversarial-attacks) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kawa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.14597-b31b1b.svg)](https://arxiv.org/abs/2212.14597) | +| 1820 | A Conformer-based Classifier for Variable-Length Utterance Processing in Anti-Spoofing | [![GitHub](https://img.shields.io/github/stars/ErosRos/conformer-based-classifier-for-anti-spoofing?style=flat)](https://github.com/ErosRos/conformer-based-classifier-for-anti-spoofing) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rosello23_interspeech.pdf) | +| 1557 | Conformer-based Language Embedding with Self-Knowledge Distillation for Spoken Language Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ia_interspeech.pdf) | +| 2419 | CommonAccent: Exploring Large Acoustic Pre-trained Models for Accent Classification based on Common Voice | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zuluagagomez23_interspeech.pdf)
[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/371136510_CommonAccent_Exploring_Large_Acoustic_Pretrained_Models_for_Accent_Classification_Based_on_Common_Voice) | +| 266 | From Adaptive Score Normalization to Adaptive Data Normalization for Speaker Verification Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cumani23_interspeech.pdf) | +| 1513 | CAM++: A Fast and Efficient Network for Speaker Verification using Context-aware Masking | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/3D-Speaker?style=flat)](https://github.com/alibaba-damo-academy/3D-Speaker) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ha_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00332-b31b1b.svg)](https://arxiv.org/abs/2303.00332) | +| 1928 | North Sámi Dialect Identification with Self-Supervised Speech Models | [![GitHub](https://img.shields.io/github/stars/skakouros/sami_dialects?style=flat)](https://github.com/skakouros/sami_dialects) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kakouros23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11864-b31b1b.svg)](https://arxiv.org/abs/2305.11864) | +| 2289 | Encoder-Decoder Multimodal Speaker Change Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jung23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00680-b31b1b.svg)](https://arxiv.org/abs/2306.00680) | +| 1603 | Disentangled Representation Learning for Multilingual Speaker Recognition | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://mm.kaist.ac.kr/projects/voxceleb1-b/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nam23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00437-b31b1b.svg)](https://arxiv.org/abs/2211.00437) | +| 2310 | A Compact End-to-End Model with Local and Global Context for Spoken Language Identification | [![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jia23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15781-b31b1b.svg)](https://arxiv.org/abs/2210.15781) | +| 1005 | On the Robustness of Arabic Speech Dialect Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sullivan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03789-b31b1b.svg)](https://arxiv.org/abs/2306.03789) | +| 927 | Adaptive Neural Network Quantization for Lightweight Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23u_interspeech.pdf) | +| 1205 | Adversarial Diffusion Probability Model For Cross-Domain Speaker Verification Integrating Contrastive Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/su23_interspeech.pdf) | +| 1554 | Chinese Dialect Recognition based on Transfer Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jiang23f_interspeech.pdf) | +| 270 | Spoofing Attacker also Benefits from Self-Supervised Pretrained Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ito23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15518-b31b1b.svg)](https://arxiv.org/abs/2305.15518) | +| 854 | Label aware Speech Representation Learning for Language Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vashishth23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04374-b31b1b.svg)](https://arxiv.org/abs/2306.04374) | +| 1761 | Exploring the Impact of Back-end Network on Wav2vec 2.0 for Dialect Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/luo23c_interspeech.pdf) | +| 453 | Improving Speaker Verification with Self-pretrained Transformer Models | [![GitHub](https://img.shields.io/github/stars/JunyiPeng00/Interspeech23_SelfPretraining?style=flat)](https://github.com/JunyiPeng00/Interspeech23_SelfPretraining) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/peng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10517-b31b1b.svg)](https://arxiv.org/abs/2305.10517) | +| 372 | Handling the Alignment for Wake Word Detection: A Comparison Between Alignment-based, Alignment-Free and Hybrid Approaches | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ribeiro23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.08950-b31b1b.svg)](https://arxiv.org/abs/2302.08950) |
@@ -1351,23 +1351,23 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2336 | Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yoon23d_interspeech.pdf) | -| 160 | Streaming Parrotron for On-Device Speech-to-Speech Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rybakov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.13761-b31b1b.svg)](https://arxiv.org/abs/2210.13761) | -| 2407 | Exploiting Emotion Information in Speaker Embeddings for Expressive Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://controllable-tts.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shaheen23_interspeech.pdf) | -| 2518 | E2E-S2S-VC: End-to-End Sequence-to-Sequence Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ast-astrec.nict.go.jp/demo_samples/e2e-s2s-vc/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/okamoto23b_interspeech.pdf) | -| 2403 | DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer | [![GitHub](https://img.shields.io/github/stars/lakahaga/dc-comix-tts?style=flat)](https://github.com/lakahaga/dc-comix-tts) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/choi23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19567-b31b1b.svg)](https://arxiv.org/abs/2305.19567) | -| 419 | Voice Conversion with Just Nearest Neighbors | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bshall.github.io/knn-vc/)
[![GitHub](https://img.shields.io/github/stars/bshall/knn-vc?style=flat)](https://github.com/bshall/knn-vc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/baas23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18975-b31b1b.svg)](https://arxiv.org/abs/2305.18975) | -| 1193 | CFVC: Conditional Filtering for Controllable Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](http://www.kecl.ntt.co.jp/people/tanaka.ko/projects/cfvc/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tanaka23_interspeech.pdf) | -| 1157 | DualVC: Dual-mode Voice Conversion using Intra-Model Knowledge Distillation and Hybrid Predictive Coding | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://dualvc.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ning23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12425-b31b1b.svg)](https://arxiv.org/abs/2305.12425) | -| 39 | Attention-based Interactive Disentangling Network for Instance-Level Emotional Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ainn-evc.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23_interspeech.pdf) | -| 836 | ALO-VC: Any-to-Any Low-Latency One-Shot Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bohan7.github.io/ALO-VC-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01100-b31b1b.svg)](https://arxiv.org/abs/2306.01100) | -| 1978 | Evaluating and Reducing the Distance between Synthetic and Real Speech Distributions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/minixhofer23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.16049-b31b1b.svg)](https://arxiv.org/abs/2211.16049) | -| 2202 | Decoupling Segmental and Prosodic cues of Non-Native Speech through Vector Quantization | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://anonymousis23.github.io/demos/prosody-accent-conversion/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/quamer23_interspeech.pdf) | -| 2383 | VC-T: Streaming Voice Conversion based on Neural Transducer | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ntt-hilab-gensp.github.io/is2023vct/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kanagawa23_interspeech.pdf) | -| 191 | Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion Preserving Voice Conversion | [![GitHub](https://img.shields.io/github/stars/suhitaghosh10/emo-stargan?style=flat)](https://github.com/suhitaghosh10/emo-stargan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ghosh23_interspeech.pdf) | -| 1788 | ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://melissachen15.notion.site/melissachen15/ControlVC-Audio-Demo-dd0ea58c5b7f434a81af9cbcd67f56f6) [![GitHub](https://img.shields.io/github/stars/MelissaChen15/control-vc?style=flat)](https://github.com/MelissaChen15/control-vc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23r_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2209.11866-b31b1b.svg)](https://arxiv.org/abs/2209.11866) | -| 1356 | Reverberation-Controllable Voice Conversion using Reverberation Time Estimator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/choi23e_interspeech.pdf) | -| 2558 | Cross-Utterance Conditioned Coherent Speech Editing | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://speechediting-8gdxbpso7cc72014-1307012619.tcloudbaseapp.com/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yu23d_interspeech.pdf) | +| 2336 | Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yoon23d_interspeech.pdf) | +| 160 | Streaming Parrotron for On-Device Speech-to-Speech Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rybakov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.13761-b31b1b.svg)](https://arxiv.org/abs/2210.13761) | +| 2407 | Exploiting Emotion Information in Speaker Embeddings for Expressive Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://controllable-tts.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shaheen23_interspeech.pdf) | +| 2518 | E2E-S2S-VC: End-to-End Sequence-to-Sequence Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ast-astrec.nict.go.jp/demo_samples/e2e-s2s-vc/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/okamoto23b_interspeech.pdf) | +| 2403 | DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer | [![GitHub](https://img.shields.io/github/stars/lakahaga/dc-comix-tts?style=flat)](https://github.com/lakahaga/dc-comix-tts) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/choi23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19567-b31b1b.svg)](https://arxiv.org/abs/2305.19567) | +| 419 | Voice Conversion with Just Nearest Neighbors | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bshall.github.io/knn-vc/)
[![GitHub](https://img.shields.io/github/stars/bshall/knn-vc?style=flat)](https://github.com/bshall/knn-vc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/baas23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18975-b31b1b.svg)](https://arxiv.org/abs/2305.18975) | +| 1193 | CFVC: Conditional Filtering for Controllable Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](http://www.kecl.ntt.co.jp/people/tanaka.ko/projects/cfvc/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tanaka23_interspeech.pdf) | +| 1157 | DualVC: Dual-mode Voice Conversion using Intra-Model Knowledge Distillation and Hybrid Predictive Coding | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://dualvc.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ning23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12425-b31b1b.svg)](https://arxiv.org/abs/2305.12425) | +| 39 | Attention-based Interactive Disentangling Network for Instance-Level Emotional Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ainn-evc.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23_interspeech.pdf) | +| 836 | ALO-VC: Any-to-Any Low-Latency One-Shot Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bohan7.github.io/ALO-VC-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01100-b31b1b.svg)](https://arxiv.org/abs/2306.01100) | +| 1978 | Evaluating and Reducing the Distance between Synthetic and Real Speech Distributions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/minixhofer23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.16049-b31b1b.svg)](https://arxiv.org/abs/2211.16049) | +| 2202 | Decoupling Segmental and Prosodic cues of Non-Native Speech through Vector Quantization | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://anonymousis23.github.io/demos/prosody-accent-conversion/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/quamer23_interspeech.pdf) | +| 2383 | VC-T: Streaming Voice Conversion based on Neural Transducer | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ntt-hilab-gensp.github.io/is2023vct/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kanagawa23_interspeech.pdf) | +| 191 | Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion Preserving Voice Conversion | [![GitHub](https://img.shields.io/github/stars/suhitaghosh10/emo-stargan?style=flat)](https://github.com/suhitaghosh10/emo-stargan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ghosh23_interspeech.pdf) | +| 1788 | ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://melissachen15.notion.site/melissachen15/ControlVC-Audio-Demo-dd0ea58c5b7f434a81af9cbcd67f56f6) [![GitHub](https://img.shields.io/github/stars/MelissaChen15/control-vc?style=flat)](https://github.com/MelissaChen15/control-vc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23r_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2209.11866-b31b1b.svg)](https://arxiv.org/abs/2209.11866) | +| 1356 | Reverberation-Controllable Voice Conversion using Reverberation Time Estimator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/choi23e_interspeech.pdf) | +| 2558 | Cross-Utterance Conditioned Coherent Speech Editing | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://speechediting-8gdxbpso7cc72014-1307012619.tcloudbaseapp.com/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yu23d_interspeech.pdf) |
@@ -1379,35 +1379,35 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2287 | An Automatic Multimodal Approach to Analyze Linguistic and Acoustic Cues on Parkinson's Disease Patients | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/escobargrisales23_interspeech.pdf) | -| 1332 | Personalization for Robust Voice Pathology Detection in Sound Waves | [![GitHub](https://img.shields.io/github/stars/Fsoft-AIC/RoPADet?style=flat)](https://github.com/Fsoft-AIC/RoPADet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tran23_interspeech.pdf) | -| 2249 | Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/meng23d_interspeech.pdf) | -| 1990 | Capturing Mismatch between Textual and Acoustic Emotion Expressions for Mood Identification in Bipolar Disorder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/niu23b_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://emp.engin.umich.edu/wp-content/uploads/sites/67/2023/06/Capturing_Mismatch_between_Textual_and_Acoustic_Emotion_Expressions_for_Mood_Identification_in_Bipolar_Disorder-3.pdf) | -| 296 | FTA-Net: A Frequency and Time Attention Network for Speech Depression Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23d_interspeech.pdf) | -| 1709 | Bayesian Networks for the Robust and Unbiased Prediction of Depression and its Symptoms Utilizing Speech and Multimodal Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fara23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://readpaper.com/paper/4770892998779076609) | -| 1263 | Hyper-Parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23y_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15265-b31b1b.svg)](https://arxiv.org/abs/2306.15265) | -| 1721 | Classifying Depression Symptom Severity: Assessment of Speech Representations in Personalized and Generalized Machine Learning Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/campbell23_interspeech.pdf) | -| 1946 | Active Learning for Abnormal Lung Sound Data Curation and Detection in Asthma | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ghaffarzadegan23_interspeech.pdf) | -| 2079 | Automatic Assessment of Alzheimer's across Three Languages using Speech and Language Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pereztoro23_interspeech.pdf) | -| 301 | On-the-Fly Feature based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition | [![GitHub](https://img.shields.io/github/stars/timspeech/on_the_fly_adapt?style=flat)](https://github.com/timspeech/on_the_fly_adapt) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/geng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.14593-b31b1b.svg)](https://arxiv.org/abs/2203.14593) | -| 1722 | Relationship between LTAS-based Spectral Moments and Acoustic Parameters of Hypokinetic Dysarthria in Parkinson's Disease | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/svihlik23_interspeech.pdf) | -| 963 | Respiratory Distress Estimation in Human-Robot Interaction Scenario | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/alvarado23_interspeech.pdf) | -| 1771 | Prediction of the Gender-based Violence Victim Condition using Speech: What do Machine Learning Models rely on? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/reynerfuentes23_interspeech.pdf) | -| 1916 | Whisper Encoder features for Infant Cry Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/charola23_interspeech.pdf) | -| 1997 | Classifying Dementia in the Presence of Depression: A Cross-Corpus Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/braun23_interspeech.pdf) | -| 297 | Exploiting Cross-Domain and Cross-Lingual Ultrasound Tongue Imaging Features for Elderly and Dysarthric Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2206.07327-b31b1b.svg)](https://arxiv.org/abs/2206.07327) | -| 464 | Multi-Class Detection of Pathological Speech with Latent Features: How does It Perform on Unseen Data? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wagner23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15336-b31b1b.svg)](https://arxiv.org/abs/2210.15336) | -| 2002 | Responsiveness, Sensitivity and Clinical Utility of Timing-Related Speech Biomarkers for Remote Monitoring of ALS Disease Progression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kothare23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1-W1buG48sqQnd9uld2c-z-Ls0NSS-bNn/view) | -| 322 | Use of Speech Impairment Severity for Dysarthric Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/geng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10659-b31b1b.svg)](https://arxiv.org/abs/2305.10659) | -| 721 | MMLung: Moving Closer to Practical Lung Health Estimation using Smartphones | [![GitHub](https://img.shields.io/github/stars/MohammedMosuily/mmlung?style=flat)](https://github.com/MohammedMosuily/mmlung) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mosuily23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://mobiuk.org/2023/abstract/S5_P1_Mosuily_MMLung.pdf) | -| 913 | Investigating the Utility of Synthetic Data for Doctor-Patient Conversation Summarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23i_interspeech.pdf) | -| 2101 | Non-Uniform Speaker Disentanglement for Depression Detection from Raw Speech Signals | [![GitHub](https://img.shields.io/github/stars/kingformatty/NUSD?style=flat)](https://github.com/kingformatty/NUSD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23pa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01861-b31b1b.svg)](https://arxiv.org/abs/2306.01861) | -| 753 | PoCaPNet: A Novel Approach for Surgical Phase Recognition using Speech and X-Ray Images | [![GitHub](https://img.shields.io/github/stars/kubicndmr/PoCaPNet?style=flat)](https://github.com/kubicndmr/PoCaPNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/demir23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15993-b31b1b.svg)](https://arxiv.org/abs/2305.15993) | -| 2100 | Combining Multiple Multimodal Speech Features into an Interpretable Index Score for Capturing Disease Progression in Amyotrophic Lateral Sclerosis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/neumann23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1FfcQifvTL9bTD7SBU7y_A3APgX8N_Vd0/view) | -| 1438 | The MASCFLICHT Corpus: Face Mask Type and Coverage Area Recognition from Speech | [![Zenodo](https://img.shields.io/badge/Zenodo-dataset-FFD1BF.svg)](https://zenodo.org/record/7985457) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mallolragolta23_interspeech.pdf) | -| 1435 | Towards Reference Speech Characterization for Health Applications | [![GitHub](https://img.shields.io/github/stars/mcatarinatb/reference-speech-characterization?style=flat)](https://github.com/mcatarinatb/reference-speech-characterization) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/botelho23_interspeech.pdf) | -| 2146 | Automatic Classification of Hypokinetic and Hyperkinetic Dysarthria based on GMM-Supervectors | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/riosurrego23_interspeech.pdf) | -| 947 | Towards Robust Paralinguistic Assessment for Real-World Mobile Health (mHealth) Monitoring: an Initial Study of Reverberation Effects on Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/dineley23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12514-b31b1b.svg)](https://arxiv.org/abs/2305.12514) | +| 2287 | An Automatic Multimodal Approach to Analyze Linguistic and Acoustic Cues on Parkinson's Disease Patients | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/escobargrisales23_interspeech.pdf) | +| 1332 | Personalization for Robust Voice Pathology Detection in Sound Waves | [![GitHub](https://img.shields.io/github/stars/Fsoft-AIC/RoPADet?style=flat)](https://github.com/Fsoft-AIC/RoPADet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tran23_interspeech.pdf) | +| 2249 | Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/meng23d_interspeech.pdf) | +| 1990 | Capturing Mismatch between Textual and Acoustic Emotion Expressions for Mood Identification in Bipolar Disorder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/niu23b_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://emp.engin.umich.edu/wp-content/uploads/sites/67/2023/06/Capturing_Mismatch_between_Textual_and_Acoustic_Emotion_Expressions_for_Mood_Identification_in_Bipolar_Disorder-3.pdf) | +| 296 | FTA-Net: A Frequency and Time Attention Network for Speech Depression Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23d_interspeech.pdf) | +| 1709 | Bayesian Networks for the Robust and Unbiased Prediction of Depression and its Symptoms Utilizing Speech and Multimodal Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fara23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://readpaper.com/paper/4770892998779076609) | +| 1263 | Hyper-Parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23y_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15265-b31b1b.svg)](https://arxiv.org/abs/2306.15265) | +| 1721 | Classifying Depression Symptom Severity: Assessment of Speech Representations in Personalized and Generalized Machine Learning Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/campbell23_interspeech.pdf) | +| 1946 | Active Learning for Abnormal Lung Sound Data Curation and Detection in Asthma | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ghaffarzadegan23_interspeech.pdf) | +| 2079 | Automatic Assessment of Alzheimer's across Three Languages using Speech and Language Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pereztoro23_interspeech.pdf) | +| 301 | On-the-Fly Feature based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition | [![GitHub](https://img.shields.io/github/stars/timspeech/on_the_fly_adapt?style=flat)](https://github.com/timspeech/on_the_fly_adapt) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/geng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.14593-b31b1b.svg)](https://arxiv.org/abs/2203.14593) | +| 1722 | Relationship between LTAS-based Spectral Moments and Acoustic Parameters of Hypokinetic Dysarthria in Parkinson's Disease | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/svihlik23_interspeech.pdf) | +| 963 | Respiratory Distress Estimation in Human-Robot Interaction Scenario | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/alvarado23_interspeech.pdf) | +| 1771 | Prediction of the Gender-based Violence Victim Condition using Speech: What do Machine Learning Models rely on? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/reynerfuentes23_interspeech.pdf) | +| 1916 | Whisper Encoder features for Infant Cry Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/charola23_interspeech.pdf) | +| 1997 | Classifying Dementia in the Presence of Depression: A Cross-Corpus Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/braun23_interspeech.pdf) | +| 297 | Exploiting Cross-Domain and Cross-Lingual Ultrasound Tongue Imaging Features for Elderly and Dysarthric Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2206.07327-b31b1b.svg)](https://arxiv.org/abs/2206.07327) | +| 464 | Multi-Class Detection of Pathological Speech with Latent Features: How does It Perform on Unseen Data? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wagner23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15336-b31b1b.svg)](https://arxiv.org/abs/2210.15336) | +| 2002 | Responsiveness, Sensitivity and Clinical Utility of Timing-Related Speech Biomarkers for Remote Monitoring of ALS Disease Progression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kothare23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1-W1buG48sqQnd9uld2c-z-Ls0NSS-bNn/view) | +| 322 | Use of Speech Impairment Severity for Dysarthric Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/geng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10659-b31b1b.svg)](https://arxiv.org/abs/2305.10659) | +| 721 | MMLung: Moving Closer to Practical Lung Health Estimation using Smartphones | [![GitHub](https://img.shields.io/github/stars/MohammedMosuily/mmlung?style=flat)](https://github.com/MohammedMosuily/mmlung) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mosuily23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://mobiuk.org/2023/abstract/S5_P1_Mosuily_MMLung.pdf) | +| 913 | Investigating the Utility of Synthetic Data for Doctor-Patient Conversation Summarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23i_interspeech.pdf) | +| 2101 | Non-Uniform Speaker Disentanglement for Depression Detection from Raw Speech Signals | [![GitHub](https://img.shields.io/github/stars/kingformatty/NUSD?style=flat)](https://github.com/kingformatty/NUSD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23pa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01861-b31b1b.svg)](https://arxiv.org/abs/2306.01861) | +| 753 | PoCaPNet: A Novel Approach for Surgical Phase Recognition using Speech and X-Ray Images | [![GitHub](https://img.shields.io/github/stars/kubicndmr/PoCaPNet?style=flat)](https://github.com/kubicndmr/PoCaPNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/demir23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15993-b31b1b.svg)](https://arxiv.org/abs/2305.15993) | +| 2100 | Combining Multiple Multimodal Speech Features into an Interpretable Index Score for Capturing Disease Progression in Amyotrophic Lateral Sclerosis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/neumann23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1FfcQifvTL9bTD7SBU7y_A3APgX8N_Vd0/view) | +| 1438 | The MASCFLICHT Corpus: Face Mask Type and Coverage Area Recognition from Speech | [![Zenodo](https://img.shields.io/badge/Zenodo-dataset-FFD1BF.svg)](https://zenodo.org/record/7985457) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mallolragolta23_interspeech.pdf) | +| 1435 | Towards Reference Speech Characterization for Health Applications | [![GitHub](https://img.shields.io/github/stars/mcatarinatb/reference-speech-characterization?style=flat)](https://github.com/mcatarinatb/reference-speech-characterization) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/botelho23_interspeech.pdf) | +| 2146 | Automatic Classification of Hypokinetic and Hyperkinetic Dysarthria based on GMM-Supervectors | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/riosurrego23_interspeech.pdf) | +| 947 | Towards Robust Paralinguistic Assessment for Real-World Mobile Health (mHealth) Monitoring: an Initial Study of Reverberation Effects on Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/dineley23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12514-b31b1b.svg)](https://arxiv.org/abs/2305.12514) |
@@ -1419,12 +1419,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2228 | Conmer: Streaming Conformer without Self-Attention for Interactive Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/radfar23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/conmer-streaming-conformer-without-self-attention-for-interactive-voice-assistants) | -| 1255 | Intra-Ensemble: A New Method for Combining Intermediate Outputs in Transformer-based Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23e_interspeech.pdf) | -| 1194 | A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet) [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11073-b31b1b.svg)](https://arxiv.org/abs/2305.11073) | -| 1611 | HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mai23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18281-b31b1b.svg)](https://arxiv.org/abs/2305.18281) | -| 893 | Memory-Augmented Conformer for Improved End-To-End Long-form ASR | [![GitHub](https://img.shields.io/github/stars/Miamoto/Conformer-NTM?style=flat)](https://github.com/Miamoto/Conformer-NTM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/carvalho23_interspeech.pdf) | -| 552 | Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cui23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.13307-b31b1b.svg)](https://arxiv.org/abs/2306.13307) | +| 2228 | Conmer: Streaming Conformer without Self-Attention for Interactive Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/radfar23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/conmer-streaming-conformer-without-self-attention-for-interactive-voice-assistants) | +| 1255 | Intra-Ensemble: A New Method for Combining Intermediate Outputs in Transformer-based Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23e_interspeech.pdf) | +| 1194 | A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet) [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/peng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11073-b31b1b.svg)](https://arxiv.org/abs/2305.11073) | +| 1611 | HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mai23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18281-b31b1b.svg)](https://arxiv.org/abs/2305.18281) | +| 893 | Memory-Augmented Conformer for Improved End-To-End Long-form ASR | [![GitHub](https://img.shields.io/github/stars/Miamoto/Conformer-NTM?style=flat)](https://github.com/Miamoto/Conformer-NTM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/carvalho23_interspeech.pdf) | +| 552 | Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cui23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.13307-b31b1b.svg)](https://arxiv.org/abs/2306.13307) |
@@ -1436,16 +1436,16 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1294 | An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/3D-Speaker?style=flat)](https://github.com/alibaba-damo-academy/3D-Speaker) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12838-b31b1b.svg)](https://arxiv.org/abs/2305.12838) | -| 1286 | A Study on Visualization of Voiceprint Feature | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23x_interspeech.pdf) | -| 1083 | VoxTube: A Multilingual Speaker Recognition Dataset | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://idrnd.github.io/VoxTube/)
[![GitHub](https://img.shields.io/github/stars/IDRnD/VoxTube?style=flat)](https://github.com/IDRnD/VoxTube) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yakovlev23_interspeech.pdf) | -| 1298 | Visualizing Data Augmentation in Deep Speaker Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16070-b31b1b.svg)](https://arxiv.org/abs/2305.16070) | -| 1565 | Ordered and Binary Speaker Embedding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ja_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16043-b31b1b.svg)](https://arxiv.org/abs/2305.16043) | -| 2031 | Self-FiLM: Conditioning GANs with Self-Supervised Representations for Bandwidth Extension based Speaker Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kataria23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.03657-b31b1b.svg)](https://arxiv.org/abs/2303.03657) | -| 1202 | Curriculum Learning for Self-Supervised Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/heo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.14525-b31b1b.svg)](https://arxiv.org/abs/2203.14525) | -| 1558 | Introducing Self-Supervised Phonetic Information for Text-Independent Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23aa_interspeech.pdf) | -| 1379 | A Teacher-Student Approach for Extracting Informative Speaker Embeddings from Speech Mixtures | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cordlandwehr23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00634-b31b1b.svg)](https://arxiv.org/abs/2306.00634) | -| 1479 | Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lepage23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03664-b31b1b.svg)](https://arxiv.org/abs/2306.03664) | +| 1294 | An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/3D-Speaker?style=flat)](https://github.com/alibaba-damo-academy/3D-Speaker) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12838-b31b1b.svg)](https://arxiv.org/abs/2305.12838) | +| 1286 | A Study on Visualization of Voiceprint Feature | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23x_interspeech.pdf) | +| 1083 | VoxTube: A Multilingual Speaker Recognition Dataset | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://idrnd.github.io/VoxTube/)
[![GitHub](https://img.shields.io/github/stars/IDRnD/VoxTube?style=flat)](https://github.com/IDRnD/VoxTube) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yakovlev23_interspeech.pdf) | +| 1298 | Visualizing Data Augmentation in Deep Speaker Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16070-b31b1b.svg)](https://arxiv.org/abs/2305.16070) | +| 1565 | Ordered and Binary Speaker Embedding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ja_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16043-b31b1b.svg)](https://arxiv.org/abs/2305.16043) | +| 2031 | Self-FiLM: Conditioning GANs with Self-Supervised Representations for Bandwidth Extension based Speaker Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kataria23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.03657-b31b1b.svg)](https://arxiv.org/abs/2303.03657) | +| 1202 | Curriculum Learning for Self-Supervised Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/heo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.14525-b31b1b.svg)](https://arxiv.org/abs/2203.14525) | +| 1558 | Introducing Self-Supervised Phonetic Information for Text-Independent Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23aa_interspeech.pdf) | +| 1379 | A Teacher-Student Approach for Extracting Informative Speaker Embeddings from Speech Mixtures | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cordlandwehr23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00634-b31b1b.svg)](https://arxiv.org/abs/2306.00634) | +| 1479 | Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lepage23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03664-b31b1b.svg)](https://arxiv.org/abs/2306.03664) |
@@ -1457,12 +1457,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1630 | Fast and Efficient Multilingual Self-Supervised Pre-training for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23ba_interspeech.pdf) | -| 1338 | UniSplice: Universal Cross-Lingual Data Splicing for Low-Resource ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23z_interspeech.pdf) | -| 772 | Allophant: Cross-Lingual Phoneme Recognition with Articulatory Attributes | [![GitHub](https://img.shields.io/github/stars/kgnlp/allophant?style=flat)](https://github.com/kgnlp/allophant) [![GitHub](https://img.shields.io/github/stars/Aariciah/allophoible?style=flat)](https://github.com/Aariciah/allophoible) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/glocker23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04306-b31b1b.svg)](https://arxiv.org/abs/2306.04306) | -| 97 | Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.01571-b31b1b.svg)](https://arxiv.org/abs/2211.01571) | -| 1061 | Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-training for Adaptation to Unseen Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rouditchenko23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12606-b31b1b.svg)](https://arxiv.org/abs/2305.12606) | -| 1444 | DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model | [![GitHub](https://img.shields.io/github/stars/backspacetg/distilXLSR?style=flat)](https://github.com/backspacetg/distilXLSR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ea_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01303-b31b1b.svg)](https://arxiv.org/abs/2306.01303) | +| 1630 | Fast and Efficient Multilingual Self-Supervised Pre-training for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23ba_interspeech.pdf) | +| 1338 | UniSplice: Universal Cross-Lingual Data Splicing for Low-Resource ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23z_interspeech.pdf) | +| 772 | Allophant: Cross-Lingual Phoneme Recognition with Articulatory Attributes | [![GitHub](https://img.shields.io/github/stars/kgnlp/allophant?style=flat)](https://github.com/kgnlp/allophant) [![GitHub](https://img.shields.io/github/stars/Aariciah/allophoible?style=flat)](https://github.com/Aariciah/allophoible) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/glocker23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04306-b31b1b.svg)](https://arxiv.org/abs/2306.04306) | +| 97 | Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.01571-b31b1b.svg)](https://arxiv.org/abs/2211.01571) | +| 1061 | Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-training for Adaptation to Unseen Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rouditchenko23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12606-b31b1b.svg)](https://arxiv.org/abs/2305.12606) | +| 1444 | DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model | [![GitHub](https://img.shields.io/github/stars/backspacetg/distilXLSR?style=flat)](https://github.com/backspacetg/distilXLSR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ea_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01303-b31b1b.svg)](https://arxiv.org/abs/2306.01303) |
@@ -1474,12 +1474,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 251 | Emotional Voice Conversion with Semi-Supervised Generative Modeling | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://haizhu1.github.io/sgevc/)
[![GitHub](https://img.shields.io/github/stars/haizhu1/sgevc?style=flat)](https://github.com/haizhu1/sgevc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhu23b_interspeech.pdf) | -| 817 | Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-Shot Speaker Adaptation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://diff-hiervc.github.io/)
[![GitHub](https://img.shields.io/github/stars/hayeong0/Diff-HierVC?style=flat)](https://github.com/hayeong0/Diff-HierVC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/choi23d_interspeech.pdf) | -| 215 | S2CD-VC: Self-Heuristic Speaker Content Disentanglement for Any-to-Any Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://wmaiga.github.io/S2CD/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wei23_interspeech.pdf) | -| 1508 | Flow-VAE VC: End-to-End Flow Framework with Contrastive Loss for Zero-Shot Voice Conversion | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://blog.frostmiku.com/Flow-VAE-VC/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23g_interspeech.pdf) | -| 1602 | Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech Augmentation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hhhuazi.github.io) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23s_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12259-b31b1b.svg)](https://arxiv.org/abs/2306.12259) | -| 2298 | End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://lvc-vc.github.io/lvc-vc-demo/)
[![GitHub](https://img.shields.io/github/stars/wonjune-kang/lvc-vc?style=flat)](https://github.com/wonjune-kang/lvc-vc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2205.09784-b31b1b.svg)](https://arxiv.org/abs/2205.09784) | +| 251 | Emotional Voice Conversion with Semi-Supervised Generative Modeling | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://haizhu1.github.io/sgevc/)
[![GitHub](https://img.shields.io/github/stars/haizhu1/sgevc?style=flat)](https://github.com/haizhu1/sgevc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhu23b_interspeech.pdf) | +| 817 | Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-Shot Speaker Adaptation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://diff-hiervc.github.io/)
[![GitHub](https://img.shields.io/github/stars/hayeong0/Diff-HierVC?style=flat)](https://github.com/hayeong0/Diff-HierVC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/choi23d_interspeech.pdf) | +| 215 | S2CD-VC: Self-Heuristic Speaker Content Disentanglement for Any-to-Any Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://wmaiga.github.io/S2CD/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wei23_interspeech.pdf) | +| 1508 | Flow-VAE VC: End-to-End Flow Framework with Contrastive Loss for Zero-Shot Voice Conversion | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://blog.frostmiku.com/Flow-VAE-VC/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23g_interspeech.pdf) | +| 1602 | Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech Augmentation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hhhuazi.github.io) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23s_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12259-b31b1b.svg)](https://arxiv.org/abs/2306.12259) | +| 2298 | End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://lvc-vc.github.io/lvc-vc-demo/)
[![GitHub](https://img.shields.io/github/stars/wonjune-kang/lvc-vc?style=flat)](https://github.com/wonjune-kang/lvc-vc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2205.09784-b31b1b.svg)](https://arxiv.org/abs/2205.09784) |
@@ -1491,18 +1491,18 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2093 | Multimodal Assessment of Bulbar Amyotrophic Lateral Sclerosis (ALS) using a Novel Remote Speech Assessment App | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/simmatis23_interspeech.pdf) | -| 2181 | On the use of High Frequency Information for Voice Pathology Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/martinez23_interspeech.pdf) | -| 1784 | Do Phonatory Features Display Robustness to Characterize Parkinsonian Speech Across Corpora? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/favaro23_interspeech.pdf) | -| 2531 | Severity Classification of Parkinson's Disease from Speech using Single Frequency Filtering-based Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kadiri23_interspeech.pdf) | -| 1915 | Comparison of Acoustic Measures of Dysphonia in Parkinson's Disease and Huntington's Disease: Effect of Sex and Speaking Task | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/simek23_interspeech.pdf) | -| 1734 | Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses | [![GitHub](https://img.shields.io/github/stars/LuciaGomZa/INTERSPEECH2023_AlzheimersDisease?style=flat)](https://github.com/LuciaGomZa/INTERSPEECH2023_AlzheimersDisease) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gomezzaragoza23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03443-b31b1b.svg)](https://arxiv.org/abs/2306.03443) | -| 1574 | A Pipeline to Evaluate the Effects of Noise on Machine Learning Detection of Laryngeal Cancer | [![GitHub](https://img.shields.io/github/stars/mary-paterson/Interspeech2023-EvaluationPipeline?style=flat)](https://github.com/mary-paterson/Interspeech2023-EvaluationPipeline) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/paterson23_interspeech.pdf) | -| 2474 | ReCLR: Reference-Enhanced Contrastive Learning of Audio Representation for Depression Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23ga_interspeech.pdf) | -| 234 | Automated Multiple Sclerosis Screening based on Encoded Speech Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/egaslopez23_interspeech.pdf) | -| 1934 | Cross-Lingual Features for Alzheimer's Dementia Detection from Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/melistas23_interspeech.pdf) | -| 1653 | Careful Whisper - Leveraging Advances in Automatic Speech Recognition for Robust and Interpretable Aphasia Subtype Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zusag23_interspeech.pdf) | -| 1868 | Behavioral Analysis of Pathological Speaker Embeddings of Patients During Oncological Treatment of Oral Cancer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/thienpondt23_interspeech.pdf) | +| 2093 | Multimodal Assessment of Bulbar Amyotrophic Lateral Sclerosis (ALS) using a Novel Remote Speech Assessment App | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/simmatis23_interspeech.pdf) | +| 2181 | On the use of High Frequency Information for Voice Pathology Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/martinez23_interspeech.pdf) | +| 1784 | Do Phonatory Features Display Robustness to Characterize Parkinsonian Speech Across Corpora? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/favaro23_interspeech.pdf) | +| 2531 | Severity Classification of Parkinson's Disease from Speech using Single Frequency Filtering-based Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kadiri23_interspeech.pdf) | +| 1915 | Comparison of Acoustic Measures of Dysphonia in Parkinson's Disease and Huntington's Disease: Effect of Sex and Speaking Task | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/simek23_interspeech.pdf) | +| 1734 | Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses | [![GitHub](https://img.shields.io/github/stars/LuciaGomZa/INTERSPEECH2023_AlzheimersDisease?style=flat)](https://github.com/LuciaGomZa/INTERSPEECH2023_AlzheimersDisease) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gomezzaragoza23_interspeech.pdf)&#13;
[![arXiv](https://img.shields.io/badge/arXiv-2306.03443-b31b1b.svg)](https://arxiv.org/abs/2306.03443) | +| 1574 | A Pipeline to Evaluate the Effects of Noise on Machine Learning Detection of Laryngeal Cancer | [![GitHub](https://img.shields.io/github/stars/mary-paterson/Interspeech2023-EvaluationPipeline?style=flat)](https://github.com/mary-paterson/Interspeech2023-EvaluationPipeline) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/paterson23_interspeech.pdf) | +| 2474 | ReCLR: Reference-Enhanced Contrastive Learning of Audio Representation for Depression Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23ga_interspeech.pdf) | +| 234 | Automated Multiple Sclerosis Screening based on Encoded Speech Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/egaslopez23_interspeech.pdf) | +| 1934 | Cross-Lingual Features for Alzheimer's Dementia Detection from Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/melistas23_interspeech.pdf) | +| 1653 | Careful Whisper - Leveraging Advances in Automatic Speech Recognition for Robust and Interpretable Aphasia Subtype Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zusag23_interspeech.pdf) | +| 1868 | Behavioral Analysis of Pathological Speaker Embeddings of Patients During Oncological Treatment of Oral Cancer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/thienpondt23_interspeech.pdf) |
@@ -1514,12 +1514,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1832 | LanSER: Language-Model Supported Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gong23c_interspeech.pdf) | -| 463 | Fine-tuned RoBERTa Model with a CNN-LSTM Network for Conversational Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/luo23_interspeech.pdf) | -| 1591 | Emotion Label Encoding using Word Embeddings for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/stanley23_interspeech.pdf) | -| 2444 | Discrimination of the Different Intents Carried by the Same Text through Integrating Multimodal Information | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23ia_interspeech.pdf) | -| 510 | Meta-Domain Adversarial Contrastive Learning for Alleviating Individual Bias in Self-Sentiment Predictions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23f_interspeech.pdf) | -| 413 | SWRR: Feature Map Classifier based on Sliding Window Attention and High-Response Feature Reuse for Multimodal Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhao23b_interspeech.pdf) | +| 1832 | LanSER: Language-Model Supported Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gong23c_interspeech.pdf) | +| 463 | Fine-tuned RoBERTa Model with a CNN-LSTM Network for Conversational Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/luo23_interspeech.pdf) | +| 1591 | Emotion Label Encoding using Word Embeddings for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/stanley23_interspeech.pdf) | +| 2444 | Discrimination of the Different Intents Carried by the Same Text through Integrating Multimodal Information | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23ia_interspeech.pdf) | +| 510 | Meta-Domain Adversarial Contrastive Learning for Alleviating Individual Bias in Self-Sentiment Predictions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23f_interspeech.pdf) | +| 413 | SWRR: Feature Map Classifier based on Sliding Window Attention and High-Response Feature Reuse for Multimodal Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhao23b_interspeech.pdf) | @@ -1531,38 +1531,38 @@ Contributions to improve the completeness of this list are greatly appreciated. 
| :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1443 | Effects of Meter, Genre and Experience on Pausing, Lengthening and Prosodic Phrasing in German Poetry Reading | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wagner23b_interspeech.pdf) | -| 1142 | Comparing First Spectral Moment of Australian English /s/ between Straight and Gay Voices using Three Analysis Window Sizes | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/szalay23_interspeech.pdf) | -| 2584 | Universal Automatic Phonetic Transcription into the International Phonetic Alphabet | [![GitHub](https://img.shields.io/github/stars/ctaguchi/multipa?style=flat)](https://github.com/ctaguchi/multipa) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/taguchi23_interspeech.pdf) | -| 2134 | Voice Twins: Discovering Extremely Similar-Sounding, Unrelated Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gerlach23_interspeech.pdf) | -| 1042 | Filling the Population Statistics Gap: Swiss German Reference Data on F0 and Speech Tempo for Forensic Contexts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hedegard23_interspeech.pdf) | -| 1619 | Investigating the Syntax-Discourse Interface in the Phonetic Implementation of Discourse Markers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hutin23_interspeech.pdf) | -| 2214 | Evaluation of a Forensic Automatic Speaker Recognition System with Emotional Speech Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/essery23_interspeech.pdf) | -| 1052 | An Outlier Analysis of Vowel Formants from a Corpus Phonetics Pipeline | [![GitHub](https://img.shields.io/github/stars/emilyahn/outliers?style=flat)](https://github.com/emilyahn/outliers) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ahn23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.eleanorchodroff.com/articles/AhnLevowWrightChodroff_Outliers_Interspeech_2023.pdf) | -| 340 | The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features | [![GitHub](https://img.shields.io/github/stars/Oscarwasoccupied/Interspeech2023_The_Hidden_Dance_of_Phonemes_and_Visage?style=flat)](https://github.com/Oscarwasoccupied/Interspeech2023_The_Hidden_Dance_of_Phonemes_and_Visage) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/qu23_interspeech.pdf) | -| 1880 | Beatboxing Kick Drum Kinematics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/blaylock23_interspeech.pdf) | -| 536 | Effects of Hearing Loss and Amplification on Mandarin Consonant Perception | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhou23b_interspeech.pdf) | -| 2020 | An Acoustic Analysis of Fricative Variation in Three Accents of English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/adams23_interspeech.pdf) | -| 109 | Acoustic Cues to Stress Perception in Spanish – a Mismatch Negativity Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bros23_interspeech.pdf) | -| 976 | Bulgarian Unstressed Vowel Reduction: Received Views vs Corpus Findings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sabev23_interspeech.pdf) | -| 1764 | An Investigation of Indian Native Language Phonemic Influences on L2 English Pronunciations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jain23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.09284-b31b1b.svg)](https://arxiv.org/abs/2212.09284) | -| 498 | Identifying Stable Sections for Formant Frequency Extraction of French Nasal Vowels based on Difference Thresholds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/park23b_interspeech.pdf) | -| 1903 | Evaluation of Delexicalization Methods for Research on Emotional Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/audibert23_interspeech.pdf) | -| 1772 | Nonbinary American English Speakers Encode Gender in Vowel Acoustics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hope23_interspeech.pdf) | -| 44 | Coarticulation of Sibe Vowels and Dorsal Fricatives in Spontaneous Speech: An Acoustic Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sharp23_interspeech.pdf) | -| 1013 | Using Speech Synthesis to Explain Automatic Speaker Recognition: A New Application of Synthetic Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/brown23_interspeech.pdf) | -| 2534 | Same F0, Different Tones: A Multidimensional Investigation of Zhangzhou Tones | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23i_interspeech.pdf) | -| 1985 | Discovering Phonetic Feature Event Patterns in Transformer Embeddings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/english23_interspeech.pdf) | -| 2204 | A System for Generating Voice Source Signals that Implements the Transformed LF-Model Parameter Control | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ra_interspeech.pdf) | -| 2352 | Speaker-Independent Speech Inversion for Estimation of Nasalance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/siriwardena23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00203-b31b1b.svg)](https://arxiv.org/abs/2306.00203) | -| 1359 | Effects of Tonal Coarticulation and Prosodic Positions on Tonal Contours of Low Rising Tones: In the Case of Xiamen Dialect | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hu23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02251-b31b1b.svg)](https://arxiv.org/abs/2306.02251) | -| 2187 | Durational and Non-Durational Correlates of Lexical and Derived Geminates in Arabic | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/issa23_interspeech.pdf) | -| 68 | Mapping Phonemes to Acoustic Symbols and Codes using Synchrony in Speech Modulation Vectors Estimated by the Travellingwave Filter Bank | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rao23_interspeech.pdf) | -| 1480 | Rhythmic Characteristics of L2 German Speech by Advanced Chinese Learners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ge23_interspeech.pdf) | -| 1538 | (Dis)agreement and Preference Structure are Reflected in Matching Along Distinct Acoustic-Prosodic Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kelterer23_interspeech.pdf) | -| 995 | Vowel Reduction by Greek-Speaking Children: The Effect of Stress and Word Length | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/christodoulidou23_interspeech.pdf) | -| 1822 | Pitch Distributions in a Very Large Corpus of Spontaneous Finnish Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lennes23_interspeech.pdf) | -| 828 | Speech Enhancement Patterns in Human-Robot Interaction: A Cross-Linguistic Perspective | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://osf.io/qwyzv/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kudera23_interspeech.pdf) | +| 1443 | Effects of Meter, Genre and Experience on Pausing, Lengthening and Prosodic Phrasing in German Poetry Reading | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wagner23b_interspeech.pdf) | +| 1142 | Comparing First Spectral Moment of Australian English /s/ between Straight and Gay Voices using Three Analysis Window Sizes | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/szalay23_interspeech.pdf) | +| 2584 | Universal Automatic Phonetic Transcription into the International Phonetic Alphabet | [![GitHub](https://img.shields.io/github/stars/ctaguchi/multipa?style=flat)](https://github.com/ctaguchi/multipa) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/taguchi23_interspeech.pdf) | +| 2134 | Voice Twins: Discovering Extremely Similar-Sounding, Unrelated Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gerlach23_interspeech.pdf) | +| 1042 | Filling the Population Statistics Gap: Swiss German Reference Data on F0 and Speech Tempo for Forensic Contexts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hedegard23_interspeech.pdf) | +| 1619 | Investigating the Syntax-Discourse Interface in the Phonetic Implementation of Discourse Markers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hutin23_interspeech.pdf) | +| 2214 | Evaluation of a Forensic Automatic Speaker Recognition System with Emotional Speech Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/essery23_interspeech.pdf) | +| 1052 | An Outlier Analysis of Vowel Formants from a Corpus Phonetics Pipeline | [![GitHub](https://img.shields.io/github/stars/emilyahn/outliers?style=flat)](https://github.com/emilyahn/outliers) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ahn23_interspeech.pdf)&#13;
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.eleanorchodroff.com/articles/AhnLevowWrightChodroff_Outliers_Interspeech_2023.pdf) | +| 340 | The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features | [![GitHub](https://img.shields.io/github/stars/Oscarwasoccupied/Interspeech2023_The_Hidden_Dance_of_Phonemes_and_Visage?style=flat)](https://github.com/Oscarwasoccupied/Interspeech2023_The_Hidden_Dance_of_Phonemes_and_Visage) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/qu23_interspeech.pdf) | +| 1880 | Beatboxing Kick Drum Kinematics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/blaylock23_interspeech.pdf) | +| 536 | Effects of Hearing Loss and Amplification on Mandarin Consonant Perception | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhou23b_interspeech.pdf) | +| 2020 | An Acoustic Analysis of Fricative Variation in Three Accents of English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/adams23_interspeech.pdf) | +| 109 | Acoustic Cues to Stress Perception in Spanish – a Mismatch Negativity Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bros23_interspeech.pdf) | +| 976 | Bulgarian Unstressed Vowel Reduction: Received Views vs Corpus Findings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sabev23_interspeech.pdf) | +| 1764 | An Investigation of Indian Native Language Phonemic Influences on L2 English Pronunciations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jain23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.09284-b31b1b.svg)](https://arxiv.org/abs/2212.09284) | +| 498 | Identifying Stable Sections for Formant Frequency Extraction of French Nasal Vowels based on Difference Thresholds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/park23b_interspeech.pdf) | +| 1903 | Evaluation of Delexicalization Methods for Research on Emotional Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/audibert23_interspeech.pdf) | +| 1772 | Nonbinary American English Speakers Encode Gender in Vowel Acoustics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hope23_interspeech.pdf) | +| 44 | Coarticulation of Sibe Vowels and Dorsal Fricatives in Spontaneous Speech: An Acoustic Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sharp23_interspeech.pdf) | +| 1013 | Using Speech Synthesis to Explain Automatic Speaker Recognition: A New Application of Synthetic Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/brown23_interspeech.pdf) | +| 2534 | Same F0, Different Tones: A Multidimensional Investigation of Zhangzhou Tones | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23i_interspeech.pdf) | +| 1985 | Discovering Phonetic Feature Event Patterns in Transformer Embeddings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/english23_interspeech.pdf) | +| 2204 | A System for Generating Voice Source Signals that Implements the Transformed LF-Model Parameter Control | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ra_interspeech.pdf) | +| 2352 | Speaker-Independent Speech Inversion for Estimation of Nasalance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/siriwardena23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00203-b31b1b.svg)](https://arxiv.org/abs/2306.00203) | +| 1359 | Effects of Tonal Coarticulation and Prosodic Positions on Tonal Contours of Low Rising Tones: In the Case of Xiamen Dialect | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hu23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02251-b31b1b.svg)](https://arxiv.org/abs/2306.02251) | +| 2187 | Durational and Non-Durational Correlates of Lexical and Derived Geminates in Arabic | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/issa23_interspeech.pdf) | +| 68 | Mapping Phonemes to Acoustic Symbols and Codes using Synchrony in Speech Modulation Vectors Estimated by the Travellingwave Filter Bank | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rao23_interspeech.pdf) | +| 1480 | Rhythmic Characteristics of L2 German Speech by Advanced Chinese Learners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ge23_interspeech.pdf) | +| 1538 | (Dis)agreement and Preference Structure are Reflected in Matching Along Distinct Acoustic-Prosodic Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kelterer23_interspeech.pdf) | +| 995 | Vowel Reduction by Greek-Speaking Children: The Effect of Stress and Word Length | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/christodoulidou23_interspeech.pdf) | +| 1822 | Pitch Distributions in a Very Large Corpus of Spontaneous Finnish Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lennes23_interspeech.pdf) | +| 828 | Speech Enhancement Patterns in Human-Robot Interaction: A Cross-Linguistic Perspective | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://osf.io/qwyzv/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kudera23_interspeech.pdf) |
@@ -1574,12 +1574,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1026 | Masking Kernel for Learning Energy-Efficient Representations for Speaker Recognition and Mobile Health | [![GitHub](https://img.shields.io/github/stars/aditthapron/windowMasking?style=flat)](https://github.com/aditthapron/windowMasking) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ditthapron23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.04161-b31b1b.svg)](https://arxiv.org/abs/2302.04161) | -| 727 | eSTImate: A Real-Time Speech Transmission Index Estimator with Speech Enhancement Auxiliary Task using Self-Attention Feature Pyramid Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xiang23_interspeech.pdf) | -| 815 | Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive Feature Learning in Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23n_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05861-b31b1b.svg)](https://arxiv.org/abs/2306.05861) | -| 2138 | Privacy-Preserving Representation Learning for Speech Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tran23b_interspeech.pdf) | -| 448 | Vocoder Drift in X-Vector–based Speaker Anonymization | [![GitHub](https://img.shields.io/github/stars/eurecom-asp/vocoder-drift?style=flat)](https://github.com/eurecom-asp/vocoder-drift) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/panariello23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02892-b31b1b.svg)](https://arxiv.org/abs/2306.02892) | -| 703 | Malafide: A Novel Adversarial Convolutive Noise Attack Against Deepfake and Spoofing Detection Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/panariello23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07655-b31b1b.svg)](https://arxiv.org/abs/2306.07655) | +| 1026 | Masking Kernel for Learning Energy-Efficient Representations for Speaker Recognition and Mobile Health | [![GitHub](https://img.shields.io/github/stars/aditthapron/windowMasking?style=flat)](https://github.com/aditthapron/windowMasking) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ditthapron23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.04161-b31b1b.svg)](https://arxiv.org/abs/2302.04161) | +| 727 | eSTImate: A Real-Time Speech Transmission Index Estimator with Speech Enhancement Auxiliary Task using Self-Attention Feature Pyramid Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xiang23_interspeech.pdf) | +| 815 | Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive Feature Learning in Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23n_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05861-b31b1b.svg)](https://arxiv.org/abs/2306.05861) | +| 2138 | Privacy-Preserving Representation Learning for Speech Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tran23b_interspeech.pdf) | +| 448 | Vocoder Drift in X-Vector–based Speaker Anonymization | [![GitHub](https://img.shields.io/github/stars/eurecom-asp/vocoder-drift?style=flat)](https://github.com/eurecom-asp/vocoder-drift) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/panariello23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02892-b31b1b.svg)](https://arxiv.org/abs/2306.02892) | +| 703 | Malafide: A Novel Adversarial Convolutive Noise Attack Against Deepfake and Spoofing Detection Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/panariello23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07655-b31b1b.svg)](https://arxiv.org/abs/2306.07655) |
@@ -1591,12 +1591,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1087 | Speech Self-Supervised Representation Benchmarking: Are We Doing it Right? | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/salah-zaiem/speechbrain-2/tree/develop/recipes/SSL_benchmark) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zaiem23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00452-b31b1b.svg)](https://arxiv.org/abs/2306.00452) | -| 383 | An Extension of Disentanglement Metrics and its Application to Voice | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23d_interspeech.pdf) | -| 2131 | An Information-Theoretic Analysis of Self-Supervised Discrete Representations of Speech | [![GitHub](https://img.shields.io/github/stars/uds-lsv/phone2unit?style=flat)](https://github.com/uds-lsv/phone2unit) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/abdullah23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02405-b31b1b.svg)](https://arxiv.org/abs/2306.02405) | -| 1823 | SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? | [![GitHub](https://img.shields.io/github/stars/ashi-ta/speechGLUE?style=flat)](https://github.com/ashi-ta/speechGLUE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ashihara23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08374-b31b1b.svg)](https://arxiv.org/abs/2306.08374) | -| 1418 | Comparison of GIF- and SSL-based Features in Pathological Voice Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sasou23_interspeech.pdf) | -| 1617 | What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions | [![GitHub](https://img.shields.io/github/stars/Hanyu-Meng/Adapting-LEAF?style=flat)](https://github.com/Hanyu-Meng/Adapting-LEAF) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/meng23c_interspeech.pdf) | +| 1087 | Speech Self-Supervised Representation Benchmarking: Are We Doing it Right? | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/salah-zaiem/speechbrain-2/tree/develop/recipes/SSL_benchmark) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zaiem23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00452-b31b1b.svg)](https://arxiv.org/abs/2306.00452) | +| 383 | An Extension of Disentanglement Metrics and its Application to Voice | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23d_interspeech.pdf) | +| 2131 | An Information-Theoretic Analysis of Self-Supervised Discrete Representations of Speech | [![GitHub](https://img.shields.io/github/stars/uds-lsv/phone2unit?style=flat)](https://github.com/uds-lsv/phone2unit) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/abdullah23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02405-b31b1b.svg)](https://arxiv.org/abs/2306.02405) | +| 1823 | SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? | [![GitHub](https://img.shields.io/github/stars/ashi-ta/speechGLUE?style=flat)](https://github.com/ashi-ta/speechGLUE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ashihara23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08374-b31b1b.svg)](https://arxiv.org/abs/2306.08374) | +| 1418 | Comparison of GIF- and SSL-based Features in Pathological Voice Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sasou23_interspeech.pdf) | +| 1617 | What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions | [![GitHub](https://img.shields.io/github/stars/Hanyu-Meng/Adapting-LEAF?style=flat)](https://github.com/Hanyu-Meng/Adapting-LEAF) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/meng23c_interspeech.pdf) |
@@ -1608,12 +1608,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1640 | End-to-End Joint Target and Non-Target Speakers ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/masumura23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02273-b31b1b.svg)](https://arxiv.org/abs/2306.02273) | -| 144 | Improving Frame-Level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07949-b31b1b.svg)](https://arxiv.org/abs/2306.07949) | -| 564 | Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-Level Timestamp Prediction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/makishima23_interspeech.pdf) | -| 101 | Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition | [![GitHub](https://img.shields.io/github/stars/YUCHEN005/DPSL-ASR?style=flat)](https://github.com/YUCHEN005/DPSL-ASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.14838-b31b1b.svg)](https://arxiv.org/abs/2203.14838) | -| 142 | Multi-Pass Training and Cross-Information Fusion for Low-Resource End-to-End Accented Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11309-b31b1b.svg)](https://arxiv.org/abs/2306.11309) | -| 906 | Text-Only Domain Adaptation for End-to-End ASR using Integrated Text-to-Mel-Spectrogram Generator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bataev23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14036-b31b1b.svg)](https://arxiv.org/abs/2302.14036) | +| 1640 | End-to-End Joint Target and Non-Target Speakers ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/masumura23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02273-b31b1b.svg)](https://arxiv.org/abs/2306.02273) | +| 144 | Improving Frame-Level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07949-b31b1b.svg)](https://arxiv.org/abs/2306.07949) | +| 564 | Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-Level Timestamp Prediction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/makishima23_interspeech.pdf) | +| 101 | Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition | [![GitHub](https://img.shields.io/github/stars/YUCHEN005/DPSL-ASR?style=flat)](https://github.com/YUCHEN005/DPSL-ASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.14838-b31b1b.svg)](https://arxiv.org/abs/2203.14838) | +| 142 | Multi-Pass Training and Cross-Information Fusion for Low-Resource End-to-End Accented Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11309-b31b1b.svg)](https://arxiv.org/abs/2306.11309) | +| 906 | Text-Only Domain Adaptation for End-to-End ASR using Integrated Text-to-Mel-Spectrogram Generator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bataev23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14036-b31b1b.svg)](https://arxiv.org/abs/2302.14036) |
@@ -1625,12 +1625,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 461 | Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/NVIDIA/NeMo/tree/main/examples/slu/speech_intent_slot)
[![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23_interspeech.pdf) | -| 277 | Relation-based Counterfactual Data Augmentation and Contrastive Learning for Robustifying Natural Language Inference Models | [![GitHub](https://img.shields.io/github/stars/hryang06/rda-rcl?style=flat)](https://github.com/hryang06/rda-rcl) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23b_interspeech.pdf) | -| 1307 | Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/matsuura23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04233-b31b1b.svg)](https://arxiv.org/abs/2306.04233) | -| 1136 | Audio Retrieval with WavText5K and CLAP Training | [![GitHub](https://img.shields.io/github/stars/microsoft/WavText5K?style=flat)](https://github.com/microsoft/WavText5K) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/deshmukh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2209.14275-b31b1b.svg)](https://arxiv.org/abs/2209.14275) | -| 242 | Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding | [![GitHub](https://img.shields.io/github/stars/umbertocappellazzo/slurp-seqkd?style=flat)](https://github.com/umbertocappellazzo/slurp-seqkd) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cappellazzo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13899-b31b1b.svg)](https://arxiv.org/abs/2305.13899) | -| 1652 | Contrastive Disentangled Learning for Memory-Augmented Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chien23b_interspeech.pdf) | +| 461 | Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/NVIDIA/NeMo/tree/main/examples/slu/speech_intent_slot)
[![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23_interspeech.pdf) | +| 277 | Relation-based Counterfactual Data Augmentation and Contrastive Learning for Robustifying Natural Language Inference Models | [![GitHub](https://img.shields.io/github/stars/hryang06/rda-rcl?style=flat)](https://github.com/hryang06/rda-rcl) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23b_interspeech.pdf) | +| 1307 | Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/matsuura23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04233-b31b1b.svg)](https://arxiv.org/abs/2306.04233) | +| 1136 | Audio Retrieval with WavText5K and CLAP Training | [![GitHub](https://img.shields.io/github/stars/microsoft/WavText5K?style=flat)](https://github.com/microsoft/WavText5K) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/deshmukh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2209.14275-b31b1b.svg)](https://arxiv.org/abs/2209.14275) | +| 242 | Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding | [![GitHub](https://img.shields.io/github/stars/umbertocappellazzo/slurp-seqkd?style=flat)](https://github.com/umbertocappellazzo/slurp-seqkd) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cappellazzo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13899-b31b1b.svg)](https://arxiv.org/abs/2305.13899) | +| 1652 | Contrastive Disentangled Learning for Memory-Augmented Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chien23b_interspeech.pdf) |
@@ -1642,12 +1642,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 438 | ProsAudit, a Prosodic Benchmark for Self-Supervised Speech Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/deseyssel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12057-b31b1b.svg)](https://arxiv.org/abs/2302.12057) | -| 871 | Self-Supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12464-b31b1b.svg)](https://arxiv.org/abs/2305.12464) | -| 1862 | Evaluating Context-Invariance in Unsupervised Speech Representations | [![GitHub](https://img.shields.io/github/stars/perceptimatic/irpam2023?style=flat)](https://github.com/perceptimatic/irpam2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hallap23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15775-b31b1b.svg)](https://arxiv.org/abs/2210.15775) | -| 1390 | CoBERT: Self-Supervised Speech Representation Learning through Code Representation Learning | [![GitHub](https://img.shields.io/github/stars/mct10/CoBERT?style=flat)](https://github.com/mct10/CoBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/meng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.04062-b31b1b.svg)](https://arxiv.org/abs/2210.04062) | -| 847 | Self-Supervised Fine-tuning for Improved Content Representations by Speaker-Invariant Clustering | [![GitHub](https://img.shields.io/github/stars/vectominist/spin?style=flat)](https://github.com/vectominist/spin) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11072-b31b1b.svg)](https://arxiv.org/abs/2305.11072) | -| 359 | Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23d_interspeech.pdf) | +| 438 | ProsAudit, a Prosodic Benchmark for Self-Supervised Speech Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/deseyssel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12057-b31b1b.svg)](https://arxiv.org/abs/2302.12057) | +| 871 | Self-Supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12464-b31b1b.svg)](https://arxiv.org/abs/2305.12464) | +| 1862 | Evaluating Context-Invariance in Unsupervised Speech Representations | [![GitHub](https://img.shields.io/github/stars/perceptimatic/irpam2023?style=flat)](https://github.com/perceptimatic/irpam2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hallap23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15775-b31b1b.svg)](https://arxiv.org/abs/2210.15775) | +| 1390 | CoBERT: Self-Supervised Speech Representation Learning through Code Representation Learning | [![GitHub](https://img.shields.io/github/stars/mct10/CoBERT?style=flat)](https://github.com/mct10/CoBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/meng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.04062-b31b1b.svg)](https://arxiv.org/abs/2210.04062) | +| 847 | Self-Supervised Fine-tuning for Improved Content Representations by Speaker-Invariant Clustering | [![GitHub](https://img.shields.io/github/stars/vectominist/spin?style=flat)](https://github.com/vectominist/spin) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11072-b31b1b.svg)](https://arxiv.org/abs/2305.11072) | +| 359 | Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23d_interspeech.pdf) |
@@ -1659,12 +1659,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1571 | Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hcy71o.github.io/AILTTS_demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yoon23b_interspeech.pdf) | -| 2313 | Adapter-based Extension of Multi-Speaker Text-To-Speech Model for New Speakers | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hsiehjackson.github.io/adapter-tts-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hsieh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00585-b31b1b.svg)](https://arxiv.org/abs/2211.00585) | -| 2574 | SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sivaguru23_interspeech.pdf) | -| 2326 | UnitSpeech: Speaker-Adaptive Speech Synthesis with Untranscribed Data | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://unitspeech.github.io)
[![GitHub](https://img.shields.io/github/stars/gmltmd789/UnitSpeech?style=flat)](https://github.com/gmltmd789/UnitSpeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16083-b31b1b.svg)](https://arxiv.org/abs/2306.16083) | -| 677 | LightVoc: an Upsampling-Free GAN Vocoder based on Conformer and Inverse Short-time Fourier Transform | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://lightvoc.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/dang23b_interspeech.pdf) | -| 1095 | ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sarulab-speech.github.io/demo_ChatGPT_EDSS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/saito23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13724-b31b1b.svg)](https://arxiv.org/abs/2305.13724) | +| 1571 | Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hcy71o.github.io/AILTTS_demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yoon23b_interspeech.pdf) | +| 2313 | Adapter-based Extension of Multi-Speaker Text-To-Speech Model for New Speakers | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hsiehjackson.github.io/adapter-tts-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hsieh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00585-b31b1b.svg)](https://arxiv.org/abs/2211.00585) | +| 2574 | SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sivaguru23_interspeech.pdf) | +| 2326 | UnitSpeech: Speaker-Adaptive Speech Synthesis with Untranscribed Data | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://unitspeech.github.io)
[![GitHub](https://img.shields.io/github/stars/gmltmd789/UnitSpeech?style=flat)](https://github.com/gmltmd789/UnitSpeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16083-b31b1b.svg)](https://arxiv.org/abs/2306.16083) | +| 677 | LightVoc: an Upsampling-Free GAN Vocoder based on Conformer and Inverse Short-time Fourier Transform | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://lightvoc.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/dang23b_interspeech.pdf) | +| 1095 | ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sarulab-speech.github.io/demo_ChatGPT_EDSS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/saito23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13724-b31b1b.svg)](https://arxiv.org/abs/2305.13724) |
@@ -1676,39 +1676,39 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1330 | Human Transcription Quality Improvement | [![GitHub](https://img.shields.io/github/stars/GenerateAI/TransAudioUI?style=flat)](https://github.com/GenerateAI/TransAudioUI)
[![GitHub](https://img.shields.io/github/stars/GenerateAI/LibriCrowd?style=flat)](https://github.com/GenerateAI/LibriCrowd) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23f_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/human-transcription-quality-improvement) | -| 1604 | The Effect of Masking Noise on Listeners' Spectral Tilt Preferences | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/simantiraki23_interspeech.pdf) | -| 1967 | The Effect of Whistled Vowels on Whistled Word Categorization for Naive Listeners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tranngoc23_interspeech.pdf) | -| 1481 | Automatic Deep Neural Network-based Segmental Pronunciation Error Detection of L2 English Speech (L1 Bengali) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bharati23_interspeech.pdf) | -| 1662 | The Effect of Stress on Mandarin Tonal Perception in Continuous Speech for Spanish-Speaking Learners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hao23_interspeech.pdf) | -| 1918 | Combining Acoustic and Aerodynamic Data Collection: A Perceptual Evaluation of Acoustic Distortions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/elmerich23_interspeech.pdf) | -| 953 | Estimating Virtual Targets for Lingual Stop Consonants using General Tau Theory | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/elie23b_interspeech.pdf) | -| 1931 | Using Random Forests to Classify Language as a Function of Syllable Timing in Two Groups: Children with Cochlear Implants and with Normal Hearing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gibson23_interspeech.pdf) | -| 2256 | An Improved End-to-End Audio-Visual Speech Recognition Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23w_interspeech.pdf) | -| 1954 | What Influences the Foreign Accent Strength? 
Phonological and Grammatical Errors in the Perception of Accentedness | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://osf.io/k2mta/?view_only=f65bdededa9c4ad0b81c43c380ae5b3b) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wesoek23_interspeech.pdf) | -| 2077 | Investigating the Perception Production Link through Perceptual Adaptation and Phonetic Convergence | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huttner23_interspeech.pdf) | -| 1385 | Emotion Prompting for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhou23f_interspeech.pdf) | -| 1196 | Speech-in-Speech Recognition is Modulated by Familiarity to Dialect | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chin23_interspeech.pdf) | -| 673 | BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-Talker Conditions | [![GitHub](https://img.shields.io/github/stars/jzhangU/Basen?style=flat)](https://github.com/jzhangU/Basen) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.09994-b31b1b.svg)](https://arxiv.org/abs/2305.09994) | -| 2046 | Are Retroflex-to-Dental Sibilant Substitutions in Polish Children's Speech an Example of a Covert Contrast? A Preliminary Acoustic Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/miodonska23_interspeech.pdf) | -| 1123 | First Language Effects on Second Language Perception: Evidence from English Low-Vowel Nasal Sequences Perceived by L1 Mandarin Chinese Listeners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23s_interspeech.pdf) | -| 2247 | Motor Control Similarity between Speakers Saying "a Souk" using Inverse Atlas Tongue Modeling | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/maity23_interspeech.pdf) | -| 910 | Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04980-b31b1b.svg)](https://arxiv.org/abs/2306.04980) | -| 317 | A Relationship between Vocal Fold Vibration and Droplet Production | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yoshinaga23_interspeech.pdf) | -| 803 | Audio, Visual and Audiovisual Intelligibility of Vowels Produced in Noise | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/garnier23_interspeech.pdf) | -| 172 | Optimal Control of Speech with Context-Dependent Articulatory Targets | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/elie23_interspeech.pdf) | -| 593 | Computational Modeling of Auditory Brainstem Responses Derived from Modified Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cheng23d_interspeech.pdf) | -| 1732 | Leveraging Label Information for Multimodal Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/Digimonseeker/LE-MER?style=flat)](https://github.com/Digimonseeker/LE-MER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ma_interspeech.pdf) | -| 1465 | Improving End-to-End Modeling for Mandarin-English Code-Switching using Lightweight Switch-Routing Mixture-of-Experts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tan23c_interspeech.pdf) | -| 1803 | Frequency Patterns of Individual Speaker Characteristics at Higher and Lower Spectral Ranges | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23ea_interspeech.pdf) | -| 1818 | Adaptation to Predictive Prosodic cues in Non-Native Standard Dialect | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gosselkeberthelsen23_interspeech.pdf) | -| 1007 | Head Movements in Two- and Four-Person Inter-Active Conversational Tasks in Noisy and Moderately Reverberant Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/archerboyd23_interspeech.pdf) | -| 334 | Second Language Identification of Vietnamese Tones by Native Mandarin Learners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23d_interspeech.pdf) | -| 203 | Nasal Vowel Production and Grammatical Processing in French-Speaking Children with Cochlear Implants and Normal-Hearing Peers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fagniart23_interspeech.pdf) | -| 412 | Emotion Classification with EEG Responses Evoked by Emotional Prosody of Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23f_interspeech.pdf) | -| 145 | L2-Mandarin Regional Accent Variability During Mandarin Tone-Word Training 
Facilitates English listeners' Subsequent tone Categorizations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23b_interspeech.pdf) | -| 1680 | HumanDiffusion: Diffusion Model using Perceptual Gradients | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ueda23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12169-b31b1b.svg)](https://arxiv.org/abs/2306.12169) | -| 2087 | Queer Events, Relationships, and Sports: Does Topic Influence Speakers' Acoustic Expression of Sexual Orientation? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kachel23_interspeech.pdf) | +| 1330 | Human Transcription Quality Improvement | [![GitHub](https://img.shields.io/github/stars/GenerateAI/TransAudioUI?style=flat)](https://github.com/GenerateAI/TransAudioUI)
[![GitHub](https://img.shields.io/github/stars/GenerateAI/LibriCrowd?style=flat)](https://github.com/GenerateAI/LibriCrowd) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23f_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/human-transcription-quality-improvement) | +| 1604 | The Effect of Masking Noise on Listeners' Spectral Tilt Preferences | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/simantiraki23_interspeech.pdf) | +| 1967 | The Effect of Whistled Vowels on Whistled Word Categorization for Naive Listeners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tranngoc23_interspeech.pdf) | +| 1481 | Automatic Deep Neural Network-based Segmental Pronunciation Error Detection of L2 English Speech (L1 Bengali) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bharati23_interspeech.pdf) | +| 1662 | The Effect of Stress on Mandarin Tonal Perception in Continuous Speech for Spanish-Speaking Learners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hao23_interspeech.pdf) | +| 1918 | Combining Acoustic and Aerodynamic Data Collection: A Perceptual Evaluation of Acoustic Distortions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/elmerich23_interspeech.pdf) | +| 953 | Estimating Virtual Targets for Lingual Stop Consonants using General Tau Theory | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/elie23b_interspeech.pdf) | +| 1931 | Using Random Forests to Classify Language as a Function of Syllable Timing in Two Groups: Children with Cochlear Implants and with Normal Hearing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gibson23_interspeech.pdf) | +| 2256 | An Improved End-to-End Audio-Visual Speech Recognition Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23w_interspeech.pdf) | +| 1954 | What Influences the Foreign Accent Strength? 
Phonological and Grammatical Errors in the Perception of Accentedness | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://osf.io/k2mta/?view_only=f65bdededa9c4ad0b81c43c380ae5b3b) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wesoek23_interspeech.pdf) | +| 2077 | Investigating the Perception Production Link through Perceptual Adaptation and Phonetic Convergence | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huttner23_interspeech.pdf) | +| 1385 | Emotion Prompting for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhou23f_interspeech.pdf) | +| 1196 | Speech-in-Speech Recognition is Modulated by Familiarity to Dialect | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chin23_interspeech.pdf) | +| 673 | BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-Talker Conditions | [![GitHub](https://img.shields.io/github/stars/jzhangU/Basen?style=flat)](https://github.com/jzhangU/Basen) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.09994-b31b1b.svg)](https://arxiv.org/abs/2305.09994) | +| 2046 | Are Retroflex-to-Dental Sibilant Substitutions in Polish Children's Speech an Example of a Covert Contrast? A Preliminary Acoustic Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/miodonska23_interspeech.pdf) | +| 1123 | First Language Effects on Second Language Perception: Evidence from English Low-Vowel Nasal Sequences Perceived by L1 Mandarin Chinese Listeners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23s_interspeech.pdf) | +| 2247 | Motor Control Similarity between Speakers Saying "a Souk" using Inverse Atlas Tongue Modeling | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/maity23_interspeech.pdf) | +| 910 | Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04980-b31b1b.svg)](https://arxiv.org/abs/2306.04980) | +| 317 | A Relationship between Vocal Fold Vibration and Droplet Production | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yoshinaga23_interspeech.pdf) | +| 803 | Audio, Visual and Audiovisual Intelligibility of Vowels Produced in Noise | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/garnier23_interspeech.pdf) | +| 172 | Optimal Control of Speech with Context-Dependent Articulatory Targets | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/elie23_interspeech.pdf) | +| 593 | Computational Modeling of Auditory Brainstem Responses Derived from Modified Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cheng23d_interspeech.pdf) | +| 1732 | Leveraging Label Information for Multimodal Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/Digimonseeker/LE-MER?style=flat)](https://github.com/Digimonseeker/LE-MER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ma_interspeech.pdf) | +| 1465 | Improving End-to-End Modeling for Mandarin-English Code-Switching using Lightweight Switch-Routing Mixture-of-Experts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tan23c_interspeech.pdf) | +| 1803 | Frequency Patterns of Individual Speaker Characteristics at Higher and Lower Spectral Ranges | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23ea_interspeech.pdf) | +| 1818 | Adaptation to Predictive Prosodic cues in Non-Native Standard Dialect | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gosselkeberthelsen23_interspeech.pdf) | +| 1007 | Head Movements in Two- and Four-Person Inter-Active Conversational Tasks in Noisy and Moderately Reverberant Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/archerboyd23_interspeech.pdf) | +| 334 | Second Language Identification of Vietnamese Tones by Native Mandarin Learners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23d_interspeech.pdf) | +| 203 | Nasal Vowel Production and Grammatical Processing in French-Speaking Children with Cochlear Implants and Normal-Hearing Peers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fagniart23_interspeech.pdf) | +| 412 | Emotion Classification with EEG Responses Evoked by Emotional Prosody of Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23f_interspeech.pdf) | +| 145 | L2-Mandarin Regional Accent Variability During Mandarin Tone-Word Training Facilitates English listeners' Subsequent tone Categorizations | :heavy_minus_sign: | 
[![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23b_interspeech.pdf) | +| 1680 | HumanDiffusion: Diffusion Model using Perceptual Gradients | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ueda23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12169-b31b1b.svg)](https://arxiv.org/abs/2306.12169) | +| 2087 | Queer Events, Relationships, and Sports: Does Topic Influence Speakers' Acoustic Expression of Sexual Orientation? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kachel23_interspeech.pdf) |
@@ -1720,12 +1720,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 583 | Factorised Speaker-Environment Adaptive Training of Conformer Speech Recognition Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/deng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14608-b31b1b.svg)](https://arxiv.org/abs/2306.14608) | -| 1349 | Text Only Domain Adaptation with Phoneme Guided Data Splicing for End-to-End Speech Recognition | [![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23aa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14036-b31b1b.svg)](https://arxiv.org/abs/2302.14036) | - | 327 | Cross-Lingual Cross-Age Adaptation for Low-Resource Elderly Speech Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/hltchkust/elderly_ser?style=flat)](https://github.com/hltchkust/elderly_ser) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cahyawijaya23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14517-b31b1b.svg)](https://arxiv.org/abs/2306.14517) | - | 2215 | Modular Domain Adaptation for Conformer-based Streaming ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23fa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13408-b31b1b.svg)](https://arxiv.org/abs/2305.13408) | - | 2192 | Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bhatia23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00453-b31b1b.svg)](https://arxiv.org/abs/2307.00453) | -| 1282 | SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization | [![GitHub](https://img.shields.io/github/stars/drumpt/SGEM?style=flat)](https://github.com/drumpt/SGEM/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01981-b31b1b.svg)](https://arxiv.org/abs/2306.01981) | +| 583 | Factorised Speaker-Environment Adaptive Training of Conformer Speech Recognition Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/deng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14608-b31b1b.svg)](https://arxiv.org/abs/2306.14608) | +| 1349 | Text Only Domain Adaptation with Phoneme Guided Data Splicing for End-to-End Speech Recognition | [![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23aa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14036-b31b1b.svg)](https://arxiv.org/abs/2302.14036) | + | 327 | Cross-Lingual Cross-Age Adaptation for Low-Resource Elderly Speech Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/hltchkust/elderly_ser?style=flat)](https://github.com/hltchkust/elderly_ser) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cahyawijaya23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14517-b31b1b.svg)](https://arxiv.org/abs/2306.14517) | + | 2215 | Modular Domain Adaptation for Conformer-based Streaming ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23fa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13408-b31b1b.svg)](https://arxiv.org/abs/2305.13408) | + | 2192 | Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bhatia23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00453-b31b1b.svg)](https://arxiv.org/abs/2307.00453) | +| 1282 | SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization | [![GitHub](https://img.shields.io/github/stars/drumpt/SGEM?style=flat)](https://github.com/drumpt/SGEM/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01981-b31b1b.svg)](https://arxiv.org/abs/2306.01981) |
@@ -1737,32 +1737,32 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 858 | Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions | [![GitHub](https://img.shields.io/github/stars/DigitalPhonetics/IMS-Toucan?style=flat)](https://github.com/DigitalPhonetics/IMS-Toucan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lux23_interspeech.pdf) | -| 2242 | Dual Audio Encoders based Mandarin Prosodic Boundary Prediction by using Multi-Granularity Prosodic Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23ga_interspeech.pdf) | -| 645 | NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://anonymousdemo.fun/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02448-b31b1b.svg)](https://arxiv.org/abs/2211.02448) | -| 782 | MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://speech11.github.io/MaskedSpeech/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23n_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.06170-b31b1b.svg)](https://arxiv.org/abs/2211.06170) | -| 2469 | Narrator or Character: Voice Modulation in an Expressive Multi-Speaker TTS | [![GitHub](https://img.shields.io/github/stars/tpavankalyan/Storynory?style=flat)](https://github.com/tpavankalyan/Storynory) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pavankalyan23_interspeech.pdf) | -| 843 | CASEIN: Cascading Explicit and Implicit Control for Fine-grained Emotion Intensity Regulation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cui23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00020-b31b1b.svg)](https://arxiv.org/abs/2307.00020) | -| 1405 | Semi-Supervised Learning for Continuous Emotional Intensity Controllable Speech Synthesis with Disentangled Representations | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://tinyurl.com/2p8vdcnd) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/oh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.06160-b31b1b.svg)](https://arxiv.org/abs/2211.06160) | -| 1905 | Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://speechbot.github.io/expresso/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nguyen23_interspeech.pdf) | -| 1460 | ComedicSpeech: Adaptive Text to Speech For Stand-up Comedy in Low-Resource Scenario | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://xh621.github.io/stand-up-comedy-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23fa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12200-b31b1b.svg)](https://arxiv.org/abs/2305.12200) | -| 1552 | Neural Speech Synthesis with Enriched Phrase Boundaries | [![GitHub](https://img.shields.io/github/stars/mkunes/w2v2_audioFrameClassification?style=flat)](https://github.com/mkunes/w2v2_audioFrameClassification) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kunesova23_interspeech.pdf) | -| 437 | Cross-Lingual Prosody Transfer for Expressive Machine Dubbing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/swiatkowski23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11658-b31b1b.svg)](https://arxiv.org/abs/2306.11658) | -| 2178 | Synthesis after a couple PINTs: Investigating the Role of Pause-Internal Phonetic Particles in Speech Synthesis and Perception | [![GitHub](https://img.shields.io/github/stars/MikeyElmers/paper_interspeech23?style=flat)](https://github.com/MikeyElmers/paper_interspeech23) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/elmers23_interspeech.pdf) | -| 433 | Accentor: An Explicit Lexical Stress Model for TTS Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/geneva23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://lml.bas.bg/~stoyan/interspeech2023.pdf) | -| 1032 | A Neural TTS System with Parallel Prosody Transfer from Unseen Speakers | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://ibm.biz/IS23-TBE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shechtman23_interspeech.pdf) | -| 715 | Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://diffvar.github.io/DDPM-prosody-predictor/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16749-b31b1b.svg)](https://arxiv.org/abs/2305.16749) | -| 289 | Prosody Modeling with 3D Visual Information for Expressive Video Dubbing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23c_interspeech.pdf) | -| 1528 | LightClone: Speaker-Guided Parallel Subnet Selection for Few-Shot Voice Cloning | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://lightclone2023.github.io/INTERSPEECH2023-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23f_interspeech.pdf) | -| 1671 | EE-TTS: Emphatic Expressive TTS with Linguistic Information | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://expressive-emphatic-ttsdemo.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhong23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12107-b31b1b.svg)](https://arxiv.org/abs/2305.12107) | -| 1673 | Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ogun23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17724-b31b1b.svg)](https://arxiv.org/abs/2305.17724) | -| 122 | ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://contextspeech.github.io/demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xiao23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00782-b31b1b.svg)](https://arxiv.org/abs/2307.00782) | -| 1779 | PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://promptstyle.github.io/PromptStyle) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19522-b31b1b.svg)](https://arxiv.org/abs/2305.19522) -| 1639 | Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://diffcorrect.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tian23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17436-b31b1b.svg)](https://arxiv.org/abs/2305.17436) | -| 2453 | A Generative Framework for Conversational Laughter: Its "Language Model" and Laughter Sound Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mori23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03465-b31b1b.svg)](https://arxiv.org/abs/2306.03465) | -| 1754 | Towards Spontaneous Style Modeling with Semi-Supervised Pre-training for Conversational Text-to-Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://thuhcsi.github.io/interspeech2023-spontaneousTTS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23ba_interspeech.pdf) | -| 2072 | Beyond Style: Synthesizing Speech with Pragmatic Functions | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://www.speech.kth.se/tts-demos/beyond_style/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lameris23_interspeech.pdf) | -| 965 | eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/abbas23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11327-b31b1b.svg)](https://arxiv.org/abs/2306.11327) | +| 858 | Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions | [![GitHub](https://img.shields.io/github/stars/DigitalPhonetics/IMS-Toucan?style=flat)](https://github.com/DigitalPhonetics/IMS-Toucan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lux23_interspeech.pdf) | +| 2242 | Dual Audio Encoders based Mandarin Prosodic Boundary Prediction by using Multi-Granularity Prosodic Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23ga_interspeech.pdf) | +| 645 | NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://anonymousdemo.fun/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02448-b31b1b.svg)](https://arxiv.org/abs/2211.02448) | +| 782 | MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://speech11.github.io/MaskedSpeech/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23n_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.06170-b31b1b.svg)](https://arxiv.org/abs/2211.06170) | +| 2469 | Narrator or Character: Voice Modulation in an Expressive Multi-Speaker TTS | [![GitHub](https://img.shields.io/github/stars/tpavankalyan/Storynory?style=flat)](https://github.com/tpavankalyan/Storynory) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pavankalyan23_interspeech.pdf) | +| 843 | CASEIN: Cascading Explicit and Implicit Control for Fine-grained Emotion Intensity Regulation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cui23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00020-b31b1b.svg)](https://arxiv.org/abs/2307.00020) | +| 1405 | Semi-Supervised Learning for Continuous Emotional Intensity Controllable Speech Synthesis with Disentangled Representations | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://tinyurl.com/2p8vdcnd) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/oh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.06160-b31b1b.svg)](https://arxiv.org/abs/2211.06160) | +| 1905 | Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://speechbot.github.io/expresso/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nguyen23_interspeech.pdf) | +| 1460 | ComedicSpeech: Adaptive Text to Speech For Stand-up Comedy in Low-Resource Scenario | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://xh621.github.io/stand-up-comedy-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23fa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12200-b31b1b.svg)](https://arxiv.org/abs/2305.12200) | +| 1552 | Neural Speech Synthesis with Enriched Phrase Boundaries | [![GitHub](https://img.shields.io/github/stars/mkunes/w2v2_audioFrameClassification?style=flat)](https://github.com/mkunes/w2v2_audioFrameClassification) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kunesova23_interspeech.pdf) | +| 437 | Cross-Lingual Prosody Transfer for Expressive Machine Dubbing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/swiatkowski23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11658-b31b1b.svg)](https://arxiv.org/abs/2306.11658) | +| 2178 | Synthesis after a couple PINTs: Investigating the Role of Pause-Internal Phonetic Particles in Speech Synthesis and Perception | [![GitHub](https://img.shields.io/github/stars/MikeyElmers/paper_interspeech23?style=flat)](https://github.com/MikeyElmers/paper_interspeech23) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/elmers23_interspeech.pdf) | +| 433 | Accentor: An Explicit Lexical Stress Model for TTS Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/geneva23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://lml.bas.bg/~stoyan/interspeech2023.pdf) | +| 1032 | A Neural TTS System with Parallel Prosody Transfer from Unseen Speakers | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://ibm.biz/IS23-TBE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shechtman23_interspeech.pdf) | +| 715 | Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://diffvar.github.io/DDPM-prosody-predictor/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16749-b31b1b.svg)](https://arxiv.org/abs/2305.16749) | +| 289 | Prosody Modeling with 3D Visual Information for Expressive Video Dubbing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23c_interspeech.pdf) | +| 1528 | LightClone: Speaker-Guided Parallel Subnet Selection for Few-Shot Voice Cloning | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://lightclone2023.github.io/INTERSPEECH2023-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23f_interspeech.pdf) | +| 1671 | EE-TTS: Emphatic Expressive TTS with Linguistic Information | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://expressive-emphatic-ttsdemo.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhong23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12107-b31b1b.svg)](https://arxiv.org/abs/2305.12107) | +| 1673 | Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ogun23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17724-b31b1b.svg)](https://arxiv.org/abs/2305.17724) | +| 122 | ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://contextspeech.github.io/demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xiao23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00782-b31b1b.svg)](https://arxiv.org/abs/2307.00782) | +| 1779 | PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://promptstyle.github.io/PromptStyle) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19522-b31b1b.svg)](https://arxiv.org/abs/2305.19522) | +| 1639 | Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://diffcorrect.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tian23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17436-b31b1b.svg)](https://arxiv.org/abs/2305.17436) | +| 2453 | A Generative Framework for Conversational Laughter: Its "Language Model" and Laughter Sound Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mori23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03465-b31b1b.svg)](https://arxiv.org/abs/2306.03465) | +| 1754 | Towards Spontaneous Style Modeling with Semi-Supervised Pre-training for Conversational Text-to-Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://thuhcsi.github.io/interspeech2023-spontaneousTTS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23ba_interspeech.pdf) | +| 2072 | Beyond Style: Synthesizing Speech with Pragmatic Functions | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://www.speech.kth.se/tts-demos/beyond_style/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lameris23_interspeech.pdf) | +| 965 | eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/abbas23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11327-b31b1b.svg)](https://arxiv.org/abs/2306.11327) |
@@ -1774,12 +1774,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1146 | BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://soumitri2001.github.io/BeAts) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/deb23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02680-b31b1b.svg)](https://arxiv.org/abs/2306.02680) | -| 370 | Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech based on Metric Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kashiwagi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14203-b31b1b.svg)](https://arxiv.org/abs/2305.14203) | -| 989 | Whistle-to-Text: Automatic Recognition of the Silbo Gomero Whistled Language | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jakubiak23_interspeech.pdf) | -| 663 | A Novel Interpretable and Generalizable Re-Synchronization Model for Cued Speech based on a Multi-Cuer Corpus | [![GitHub](https://img.shields.io/github/stars/lufei321/ReSync-CS?style=flat)](https://github.com/lufei321/ReSync-CS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02596-b31b1b.svg)](https://arxiv.org/abs/2306.02596) | -| 668 | Visually Grounded Few-Shot Word Acquisition with Fewer Shots | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nortje23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15937-b31b1b.svg)](https://arxiv.org/abs/2305.15937) | -| 183 | JAMFN: Joint Attention Multi-Scale Fusion Network for Depression Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhou23_interspeech.pdf) | +| 1146 | BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://soumitri2001.github.io/BeAts) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/deb23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02680-b31b1b.svg)](https://arxiv.org/abs/2306.02680) | +| 370 | Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech based on Metric Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kashiwagi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14203-b31b1b.svg)](https://arxiv.org/abs/2305.14203) | +| 989 | Whistle-to-Text: Automatic Recognition of the Silbo Gomero Whistled Language | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jakubiak23_interspeech.pdf) | +| 663 | A Novel Interpretable and Generalizable Re-Synchronization Model for Cued Speech based on a Multi-Cuer Corpus | [![GitHub](https://img.shields.io/github/stars/lufei321/ReSync-CS?style=flat)](https://github.com/lufei321/ReSync-CS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02596-b31b1b.svg)](https://arxiv.org/abs/2306.02596) | +| 668 | Visually Grounded Few-Shot Word Acquisition with Fewer Shots | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nortje23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15937-b31b1b.svg)](https://arxiv.org/abs/2305.15937) | +| 183 | JAMFN: Joint Attention Multi-Scale Fusion Network for Depression Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhou23_interspeech.pdf) |
@@ -1791,12 +1791,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1485 | Prompt Guided Copy Mechanism for Conversational Question Answering | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23z_interspeech.pdf) | -| 1240 | Composing Spoken Hints for Follow-on Question Suggestion in Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/faustini23_interspeech.pdf) | -| 1391 | On Monotonic Aggregation for Open-Domain QA | [![GitHub](https://img.shields.io/github/stars/YeonseokJeong/Judge-Specialist?style=flat)](https://github.com/YeonseokJeong/Judge-Specialist) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/han23c_interspeech.pdf) | -| 2240 | Question-Context Alignment and Answer-Context Dependencies for Effective Answer Sentence Selection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nguyen23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02196-b31b1b.svg)](https://arxiv.org/abs/2306.02196) | -| 1606 | Multi-Scale Attention for Audio Question Answering | [![GitHub](https://img.shields.io/github/stars/GeWu-Lab/MWAFM?style=flat)](https://github.com/GeWu-Lab/MWAFM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17993-b31b1b.svg)](https://arxiv.org/abs/2305.17993) | -| 539 | Enhancing Visual Question Answering via Deconstructing Questions and Explicating Answers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23f_interspeech.pdf) | +| 1485 | Prompt Guided Copy Mechanism for Conversational Question Answering | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23z_interspeech.pdf) | +| 1240 | Composing Spoken Hints for Follow-on Question Suggestion in Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/faustini23_interspeech.pdf) | +| 1391 | On Monotonic Aggregation for Open-Domain QA | [![GitHub](https://img.shields.io/github/stars/YeonseokJeong/Judge-Specialist?style=flat)](https://github.com/YeonseokJeong/Judge-Specialist) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/han23c_interspeech.pdf) | +| 2240 | Question-Context Alignment and Answer-Context Dependencies for Effective Answer Sentence Selection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nguyen23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02196-b31b1b.svg)](https://arxiv.org/abs/2306.02196) | +| 1606 | Multi-Scale Attention for Audio Question Answering | [![GitHub](https://img.shields.io/github/stars/GeWu-Lab/MWAFM?style=flat)](https://github.com/GeWu-Lab/MWAFM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17993-b31b1b.svg)](https://arxiv.org/abs/2305.17993) | +| 539 | Enhancing Visual Question Answering via Deconstructing Questions and Explicating Answers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23f_interspeech.pdf) |
@@ -1808,22 +1808,22 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1749 | SEF-Net: Speaker Embedding Free Target Spekaer Extraction Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zeng23c_interspeech.pdf) | -| 1530 | Overlap aware Continuous Speech Separation without Permutation Invariant Training Linfeng | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yu23c_interspeech.pdf) | -| 1952 | Cascaded Encoders for Fine-Tuning ASR Models on Overlapped Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rose23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16398-b31b1b.svg)](https://arxiv.org/abs/2306.16398) | -| 2069 | TokenSplit: using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/erdogan23_interspeech.pdf) | -| 1422 | Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/meng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16263-b31b1b.svg)](https://arxiv.org/abs/2305.16263) | -| 2098 | Time-Domain Transformer-based Audiovisual Speaker Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ahmadikalkhorani23_interspeech.pdf) | -| 628 | Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/delcroix23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13580-b31b1b.svg)](https://arxiv.org/abs/2305.13580) | -| 1502 | Unsupervised Adaptation with Quality-aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/niu23_interspeech.pdf) | -| 1521 | BA-SOT: Boundary-aware Serialized Output Training for Multi-Talker ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13716-b31b1b.svg)](https://arxiv.org/abs/2305.13716) | -| 1172 | Improving Label Assignments Learning by Dynamic Sample Dropout Combined with Layer-wise Optimization in Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23e_interspeech.pdf) | -| 975 | Joint Compensation of Multi-Talker Noise and Reverberation for Speech Enhancement with Cochlear Implants using One or More Microphones | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gaultier23_interspeech.pdf) | -| 494 | Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yousefi23_interspeech.pdf) | -| 42 | GPU-accelerated Guided Source Separation for Meeting Transcription | [![GitHub](https://img.shields.io/github/stars/desh2608/gss?style=flat)](https://github.com/desh2608/gss) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/raj23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.05271-b31b1b.svg)](https://arxiv.org/abs/2212.05271) | -| 1280 | Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/Emrys365/fairseq/tree/wavlm/examples/tshubert) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16286-b31b1b.svg)](https://arxiv.org/abs/2305.16286) | -| 2076 | Directional Speech Recognition for Speaker Disambiguation and Cross-talk Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23j_interspeech.pdf) | -| 1815 | Mixture Encoder for Joint Speech Separation and Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/berger23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12173-b31b1b.svg)](https://arxiv.org/abs/2306.12173) | +| 1749 | SEF-Net: Speaker Embedding Free Target Spekaer Extraction Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zeng23c_interspeech.pdf) | +| 1530 | Overlap aware Continuous Speech Separation without Permutation Invariant Training Linfeng | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yu23c_interspeech.pdf) | +| 1952 | Cascaded Encoders for Fine-Tuning ASR Models on Overlapped Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rose23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16398-b31b1b.svg)](https://arxiv.org/abs/2306.16398) | +| 2069 | TokenSplit: using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/erdogan23_interspeech.pdf) | +| 1422 | Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/meng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16263-b31b1b.svg)](https://arxiv.org/abs/2305.16263) | +| 2098 | Time-Domain Transformer-based Audiovisual Speaker Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ahmadikalkhorani23_interspeech.pdf) | +| 628 | Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/delcroix23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13580-b31b1b.svg)](https://arxiv.org/abs/2305.13580) | +| 1502 | Unsupervised Adaptation with Quality-aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/niu23_interspeech.pdf) | +| 1521 | BA-SOT: Boundary-aware Serialized Output Training for Multi-Talker ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13716-b31b1b.svg)](https://arxiv.org/abs/2305.13716) | +| 1172 | Improving Label Assignments Learning by Dynamic Sample Dropout Combined with Layer-wise Optimization in Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23e_interspeech.pdf) | +| 975 | Joint Compensation of Multi-Talker Noise and Reverberation for Speech Enhancement with Cochlear Implants using One or More Microphones | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gaultier23_interspeech.pdf) | +| 494 | Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yousefi23_interspeech.pdf) | +| 42 | GPU-accelerated Guided Source Separation for Meeting Transcription | [![GitHub](https://img.shields.io/github/stars/desh2608/gss?style=flat)](https://github.com/desh2608/gss) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/raj23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.05271-b31b1b.svg)](https://arxiv.org/abs/2212.05271) | +| 1280 | Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/Emrys365/fairseq/tree/wavlm/examples/tshubert) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16286-b31b1b.svg)](https://arxiv.org/abs/2305.16286) | +| 2076 | Directional Speech Recognition for Speaker Disambiguation and Cross-talk Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23j_interspeech.pdf) | +| 1815 | Mixture Encoder for Joint Speech Separation and Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/berger23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12173-b31b1b.svg)](https://arxiv.org/abs/2306.12173) |
@@ -1835,10 +1835,10 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 206 | Aberystwyth English Pre-Aspiration in Apparent Time |:heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hejna23_interspeech.pdf) | -| 1154 | Speech Entrainment in Chinese Story-Style Talk Shows: The Interaction Between Gender and Role | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sun23c_interspeech.pdf) | -| 1414 | Sociodemographic and Attitudinal Effects on Dialect Speakers' Articulation of the Standard Language: Evidence from German-Speaking Switzerland | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/steiner23_interspeech.pdf) | -| 1704 | Vowel Normalisation in Latent Space for Sociolinguistics |:heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/burridge23_interspeech.pdf) | +| 206 | Aberystwyth English Pre-Aspiration in Apparent Time |:heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hejna23_interspeech.pdf) | +| 1154 | Speech Entrainment in Chinese Story-Style Talk Shows: The Interaction Between Gender and Role | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sun23c_interspeech.pdf) | +| 1414 | Sociodemographic and Attitudinal Effects on Dialect Speakers' Articulation of the Standard Language: Evidence from German-Speaking Switzerland | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/steiner23_interspeech.pdf) | +| 1704 | Vowel Normalisation in Latent Space for Sociolinguistics |:heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/burridge23_interspeech.pdf) | @@ -1850,12 +1850,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1228 | Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23n_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10704-b31b1b.svg)](https://arxiv.org/abs/2305.10704) | -| 1447 | Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lahiri23_interspeech.pdf) | -| 2367 | The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://displace2023.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/baghel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00830-b31b1b.svg)](https://arxiv.org/abs/2303.00830) | -| 1982 | Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/paturi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.09313-b31b1b.svg)](https://arxiv.org/abs/2306.09313) | -| 1839 | The SpeeD-ZevoTech Submission at DISPLACE 2023 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pirlogeanu23_interspeech.pdf) | -| 656 | End-to-End Neural Speaker Diarization with Absolute Speaker Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23g_interspeech.pdf) | +| 1228 | Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23n_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10704-b31b1b.svg)](https://arxiv.org/abs/2305.10704) | +| 1447 | Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lahiri23_interspeech.pdf) | +| 2367 | The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://displace2023.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/baghel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00830-b31b1b.svg)](https://arxiv.org/abs/2303.00830) | +| 1982 | Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/paturi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.09313-b31b1b.svg)](https://arxiv.org/abs/2306.09313) | +| 1839 | The SpeeD-ZevoTech Submission at DISPLACE 2023 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pirlogeanu23_interspeech.pdf) | +| 656 | End-to-End Neural Speaker Diarization with Absolute Speaker Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23g_interspeech.pdf) |
@@ -1867,12 +1867,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1402 | Towards Single Integrated Spoofing-aware Speaker Verification Embeddings | [![GitHub](https://img.shields.io/github/stars/sasv-challenge/ASVSpoof5-SASVBaseline?style=flat)](https://github.com/sasv-challenge/ASVSpoof5-SASVBaseline) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mun23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19051-b31b1b.svg)](https://arxiv.org/abs/2305.19051) | -| 1352 | Pseudo-Siamese Network based Timbre-Reserved Black-Box Adversarial Attack in Speaker Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ba_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19020-b31b1b.svg)](https://arxiv.org/abs/2305.19020) | -| 2335 | Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion | [![GitHub](https://img.shields.io/github/stars/ttslr/M2S-ADD?style=flat)](https://github.com/ttslr/M2S-ADD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16353-b31b1b.svg)](https://arxiv.org/abs/2305.16353) | -| 1166 | Robust Audio Anti-Spoofing Countermeasure with Joint Training of Front-end and Back-end and Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23v_interspeech.pdf) | -| 1537 | Improved DeepFake Detection using Whisper Features | [![GitHub](https://img.shields.io/github/stars/piotrkawa/deepfake-whisper-features?style=flat)](https://github.com/piotrkawa/deepfake-whisper-features) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kawa23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01428-b31b1b.svg)](https://arxiv.org/abs/2306.01428) | -| 371 | DoubleDeceiver: Deceiving the Speaker Verification System Protected by Spoofing Countermeasures | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23c_interspeech.pdf) | +| 1402 | Towards Single Integrated Spoofing-aware Speaker Verification Embeddings | [![GitHub](https://img.shields.io/github/stars/sasv-challenge/ASVSpoof5-SASVBaseline?style=flat)](https://github.com/sasv-challenge/ASVSpoof5-SASVBaseline) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mun23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19051-b31b1b.svg)](https://arxiv.org/abs/2305.19051) | +| 1352 | Pseudo-Siamese Network based Timbre-Reserved Black-Box Adversarial Attack in Speaker Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ba_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19020-b31b1b.svg)](https://arxiv.org/abs/2305.19020) | +| 2335 | Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion | [![GitHub](https://img.shields.io/github/stars/ttslr/M2S-ADD?style=flat)](https://github.com/ttslr/M2S-ADD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16353-b31b1b.svg)](https://arxiv.org/abs/2305.16353) | +| 1166 | Robust Audio Anti-Spoofing Countermeasure with Joint Training of Front-end and Back-end and Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23v_interspeech.pdf) | +| 1537 | Improved DeepFake Detection using Whisper Features | [![GitHub](https://img.shields.io/github/stars/piotrkawa/deepfake-whisper-features?style=flat)](https://github.com/piotrkawa/deepfake-whisper-features) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kawa23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01428-b31b1b.svg)](https://arxiv.org/abs/2306.01428) | +| 371 | DoubleDeceiver: Deceiving the Speaker Verification System Protected by Spoofing Countermeasures | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23c_interspeech.pdf) |
@@ -1884,12 +1884,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2209 | On Training a Neural Residual Acoustic echo Suppressor for Improved ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/panchapagesan23_interspeech.pdf) | -| 1429 | Extending DNN-based Multiplicative Masking to Deep Subband Filtering for Improved Dereverberation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://jmlemercier.github.io/2023/05/30/interspeech2023.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lemercier23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00529-b31b1b.svg)](https://arxiv.org/abs/2303.00529) | -| 378 | UnSE: Unsupervised Speech Enhancement using Optimal Transport | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://jiang-wenbin.github.io/UnSE/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jiang23b_interspeech.pdf) | -| 1130 | MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://rookiejunchen.github.io/MC-SpEx_demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16250-b31b1b.svg)](https://arxiv.org/abs/2306.16250) | -| 2177 | Causal Signal-based DCCRN with Overlapped-Frame Prediction for Online Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bartolewska23_interspeech.pdf) | -| 1511 | Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23q_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08454-b31b1b.svg)](https://arxiv.org/abs/2306.08454) | +| 2209 | On Training a Neural Residual Acoustic echo Suppressor for Improved ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/panchapagesan23_interspeech.pdf) | +| 1429 | Extending DNN-based Multiplicative Masking to Deep Subband Filtering for Improved Dereverberation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://jmlemercier.github.io/2023/05/30/interspeech2023.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lemercier23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00529-b31b1b.svg)](https://arxiv.org/abs/2303.00529) | +| 378 | UnSE: Unsupervised Speech Enhancement using Optimal Transport | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://jiang-wenbin.github.io/UnSE/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jiang23b_interspeech.pdf) | +| 1130 | MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://rookiejunchen.github.io/MC-SpEx_demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16250-b31b1b.svg)](https://arxiv.org/abs/2306.16250) | +| 2177 | Causal Signal-based DCCRN with Overlapped-Frame Prediction for Online Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bartolewska23_interspeech.pdf) | +| 1511 | Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23q_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08454-b31b1b.svg)](https://arxiv.org/abs/2306.08454) |
@@ -1901,12 +1901,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2183 | A Metric-Driven Approach to Conformer Layer Pruning for Efficient ASR Inference | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bekal23_interspeech.pdf) | -| 1981 | Distillation Strategies for Discriminative Speech Recognition Rescoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gurunathshivakumar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.09452-b31b1b.svg)](https://arxiv.org/abs/2306.09452) | -| 969 | Another Point of View on Visual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pouthier23_interspeech.pdf) | -| 1062 | RASR2: The RWTH ASR Toolkit for Generic Sequence-to-Sequence Speech Recognition | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/rwth-i6/rasr/tree/generic-seq2seq-decoder) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhou23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17782-b31b1b.svg)](https://arxiv.org/abs/2305.17782) | -| 486 | Streaming Speech-to-Confusion Network Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/filimonov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03778-b31b1b.svg)](https://arxiv.org/abs/2306.03778) | -| 809 | Accurate and Structured Pruning for Efficient Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jiang23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19549-b31b1b.svg)](https://arxiv.org/abs/2305.19549) | +| 2183 | A Metric-Driven Approach to Conformer Layer Pruning for Efficient ASR Inference | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bekal23_interspeech.pdf) | +| 1981 | Distillation Strategies for Discriminative Speech Recognition Rescoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gurunathshivakumar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.09452-b31b1b.svg)](https://arxiv.org/abs/2306.09452) | +| 969 | Another Point of View on Visual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pouthier23_interspeech.pdf) | +| 1062 | RASR2: The RWTH ASR Toolkit for Generic Sequence-to-Sequence Speech Recognition | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/rwth-i6/rasr/tree/generic-seq2seq-decoder) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhou23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17782-b31b1b.svg)](https://arxiv.org/abs/2305.17782) | +| 486 | Streaming Speech-to-Confusion Network Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/filimonov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03778-b31b1b.svg)](https://arxiv.org/abs/2306.03778) | +| 809 | Accurate and Structured Pruning for Efficient Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jiang23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19549-b31b1b.svg)](https://arxiv.org/abs/2305.19549) |
@@ -1918,11 +1918,11 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1446 | MERLIon CCS Challenge: A English-Mandarin Code-Switching Child-directed Speech Corpus for Language Identification and Diarization | [![GitHub](https://img.shields.io/github/stars/MERLIon-Challenge/merlion-ccs-2023?style=flat)](https://github.com/MERLIon-Challenge/merlion-ccs-2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chua23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18881-b31b1b.svg)](https://arxiv.org/abs/2305.18881) | -| 1335 | Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech | [![GitHub](https://img.shields.io/github/stars/shashikg/LID-Code-Switching?style=flat)](https://github.com/shashikg/LID-Code-Switching) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gupta23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00736-b31b1b.svg)](https://arxiv.org/abs/2306.00736) | -| 1707 | Investigating Model Performance in Language Identification: beyond Simple Error Statistics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/styles23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18925-b31b1b.svg)](https://arxiv.org/abs/2305.18925) | -| 2533 | Improving Wav2vec2-based Spoken Language Identification by Learning Phonological Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shahin23_interspeech.pdf) | -| 2047 | Language Identification Networks for Multilingual Everyday Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/praveen23_interspeech.pdf) | +| 1446 | MERLIon CCS Challenge: A English-Mandarin Code-Switching Child-directed Speech Corpus for Language Identification and Diarization | [![GitHub](https://img.shields.io/github/stars/MERLIon-Challenge/merlion-ccs-2023?style=flat)](https://github.com/MERLIon-Challenge/merlion-ccs-2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chua23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18881-b31b1b.svg)](https://arxiv.org/abs/2305.18881) | +| 1335 | Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech | [![GitHub](https://img.shields.io/github/stars/shashikg/LID-Code-Switching?style=flat)](https://github.com/shashikg/LID-Code-Switching) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gupta23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00736-b31b1b.svg)](https://arxiv.org/abs/2306.00736) | +| 1707 | Investigating Model Performance in Language Identification: beyond Simple Error Statistics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/styles23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18925-b31b1b.svg)](https://arxiv.org/abs/2305.18925) | +| 2533 | Improving Wav2vec2-based Spoken Language Identification by Learning Phonological Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shahin23_interspeech.pdf) | +| 2047 | Language Identification Networks for Multilingual Everyday Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/praveen23_interspeech.pdf) |
@@ -1934,12 +1934,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2038 | Classification of Vocal Intensity Category from Speech using the Wav2vec2 and Whisper Embeddings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kodali23_interspeech.pdf) | -| 1668 | The Effect of Clinical Intervention on the Speech of Individuals with PTSD: Features and Recognition Performances | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kathan23_interspeech.pdf) | -| 470 | Analysis and Automatic Prediction of Exertion from Speech: Contrasting Objective and Subjective Measures Collected while Running | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/triantafyllopoulos23_interspeech.pdf) | -| 894 | The Androids Corpus: A New Publicly Available Benchmark for Speech based Depression Detection | [![GitHub](https://img.shields.io/github/stars/androidscorpus/data?style=flat)](https://github.com/androidscorpus/data) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tao23_interspeech.pdf) | -| 658 | Comparing Hand-Crafted Features to Spectrograms for Autism Severity Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/eni23_interspeech.pdf) | -| 839 | Acoustic Characteristics of Depression in Older Adults' Speech: the Role of Covariates | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mijnders23_interspeech.pdf) | +| 2038 | Classification of Vocal Intensity Category from Speech using the Wav2vec2 and Whisper Embeddings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kodali23_interspeech.pdf) | +| 1668 | The Effect of Clinical Intervention on the Speech of Individuals with PTSD: Features and Recognition Performances | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kathan23_interspeech.pdf) | +| 470 | Analysis and Automatic Prediction of Exertion from Speech: Contrasting Objective and Subjective Measures Collected while Running | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/triantafyllopoulos23_interspeech.pdf) | +| 894 | The Androids Corpus: A New Publicly Available Benchmark for Speech based Depression Detection | [![GitHub](https://img.shields.io/github/stars/androidscorpus/data?style=flat)](https://github.com/androidscorpus/data) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tao23_interspeech.pdf) | +| 658 | Comparing Hand-Crafted Features to Spectrograms for Autism Severity Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/eni23_interspeech.pdf) | +| 839 | Acoustic Characteristics of Depression in Older Adults' Speech: the Role of Covariates | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mijnders23_interspeech.pdf) |
@@ -1951,10 +1951,10 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 943 | Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sun23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18753-b31b1b.svg)](https://arxiv.org/abs/2305.18753) | -| 1564 | Adapting a ConvNeXt Model to Audio Classification on AudioSet | [![GitHub](https://img.shields.io/github/stars/topel/audioset-convnext-inf?style=flat)](https://github.com/topel/audioset-convnext-inf) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pellegrini23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00830-b31b1b.svg)](https://arxiv.org/abs/2306.00830) | -| 1610 | Few-Shot Class-Incremental Audio Classification using Stochastic Classifier | [![GitHub](https://img.shields.io/github/stars/vinceasvp/meta-sc?style=flat)](https://github.com/vinceasvp/meta-sc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02053-b31b1b.svg)](https://arxiv.org/abs/2306.02053) | -| 1614 | Enhance Temporal Relations in Audio Captioning with Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xie23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01533-b31b1b.svg)](https://arxiv.org/abs/2306.01533) | +| 943 | Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sun23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18753-b31b1b.svg)](https://arxiv.org/abs/2305.18753) | +| 1564 | Adapting a ConvNeXt Model to Audio Classification on AudioSet | [![GitHub](https://img.shields.io/github/stars/topel/audioset-convnext-inf?style=flat)](https://github.com/topel/audioset-convnext-inf) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pellegrini23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00830-b31b1b.svg)](https://arxiv.org/abs/2306.00830) | +| 1610 | Few-Shot Class-Incremental Audio Classification using Stochastic Classifier | [![GitHub](https://img.shields.io/github/stars/vinceasvp/meta-sc?style=flat)](https://github.com/vinceasvp/meta-sc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02053-b31b1b.svg)](https://arxiv.org/abs/2306.02053) | +| 1614 | Enhance Temporal Relations in Audio Captioning with Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xie23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01533-b31b1b.svg)](https://arxiv.org/abs/2306.01533) |
@@ -1966,28 +1966,28 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 407 | Epoch-based Spectrum Estimation for Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/cadia-lvl/ebs/tree/interspeech2023/)
[![GitHub](https://img.shields.io/github/stars/cadia-lvl/ebs?style=flat)](https://github.com/cadia-lvl/ebs/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gunason23_interspeech.pdf) | -| 1996 | OverFlow: Putting Flows on Top of Neural Transducers for Better TTS | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://shivammehta25.github.io/OverFlow/)
[![GitHub](https://img.shields.io/github/stars/shivammehta25/OverFlow?style=flat)](https://github.com/shivammehta25/OverFlow) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mehta23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.06892-b31b1b.svg)](https://arxiv.org/abs/2211.06892) | -| 1568 | AdapterMix: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation | [![GitHub](https://img.shields.io/github/stars/declare-lab/adapter-mix?style=flat)](https://github.com/declare-lab/adapter-mix) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mehrish23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18028-b31b1b.svg)](https://arxiv.org/abs/2305.18028) | -| 506 | Prior-Free Guided TTS: An Improved and Efficient Diffusion-based Text-Guided Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/choi23c_interspeech.pdf) | -| 367 | UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/iashchenko23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00721-b31b1b.svg)](https://arxiv.org/abs/2306.00721) | -| 1301 | Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hcy71o.github.io/SparseTTS-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yoon23_interspeech.pdf) | -| 1151 | Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://gwh22.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/guan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04301-b31b1b.svg)](https://arxiv.org/abs/2306.04301) | -| 879 | Towards Robust FastSpeech 2 by Modelling Residual Multimodality | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sony.github.io/ai-research-code/tvcgmm/project_page/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kogel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01442-b31b1b.svg)](https://arxiv.org/abs/2306.01442) | -| 1137 | Real Time Spectrogram Inversion on Mobile Phone | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/google-research/google-research/tree/master/specinvert) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rybakov23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.00756-b31b1b.svg)](https://arxiv.org/abs/2203.00756) | -| 58 | Automatic Tuning of Loss Trade-offs without Hyper-Parameter Search in End-to-End Zero-Shot Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://cnaigithub.github.io/Auto_Tuning_Zeroshot_TTS_and_VC/)
[![GitHub](https://img.shields.io/github/stars/cnaigithub/Auto_Tuning_Zeroshot_TTS_and_VC?style=flat)](https://github.com/cnaigithub/Auto_Tuning_Zeroshot_TTS_and_VC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/park23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16699-b31b1b.svg)](https://arxiv.org/abs/2305.16699) | -| 2056 | A Low-Resource Pipeline for Text-to-Speech from Found Data With Application to Scottish Gaelic | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/dan-wells/kiss-aligner/tree/main/egs/learngaelic_litir) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wells23_interspeech.pdf) | -| 2173 | Self-Supervised Solution to the Control Problem of Articulatory Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://tensortract.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/krug23_interspeech.pdf) | -| 1128 | Hierarchical Timbre-Cadence Speaker Encoder for Zero-Shot Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://srtts.github.io/tc-zstts/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23f_interspeech.pdf) | -| 754 | ZET-Speech: Zero-Shot adaptive Emotion-Controllable Text-to-Speech Synthesis with Diffusion and Style-based Models | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://zet-speech.github.io/ZET-Speech-Demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13831-b31b1b.svg)](https://arxiv.org/abs/2305.13831) | -| 690 | Improving WaveRNN with Heuristic Dynamic Blending for Fast and High-Quality GPU Vocoding | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://muyangdu.github.io/WaveRNN-Heuristic-Dynamic-Blending/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/du23_interspeech.pdf) | -| 194 | Intelligible Lip-to-Speech Synthesis with Speech Units | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://choijeongsoo.github.io/lip2speech-unit/)
[![GitHub](https://img.shields.io/github/stars/choijeongsoo/lip2speech-unit?style=flat)](https://github.com/choijeongsoo/lip2speech-unit) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/choi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19603-b31b1b.svg)](https://arxiv.org/abs/2305.19603) | -| 1212 | Parameter-Efficient Learning for Text-to-Speech Accent Adaptation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://tts-research.github.io)
[![GitHub](https://img.shields.io/github/stars/TTS-Research/PEL-TTS?style=flat)](https://github.com/TTS-Research/PEL-TTS)
[![GitHub](https://img.shields.io/github/stars/Li-JEN/PEL-accent-adaptaion?style=flat)](https://github.com/Li-JEN/PEL-accent-adaptaion) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11320-b31b1b.svg)](https://arxiv.org/abs/2305.11320) | -| 820 | Controlling Formant Frequencies with Neural Text-to-Speech for the Manipulation of Perceived Speaker Age | [![GitHub](https://img.shields.io/github/stars/ziafkhan/FastPitch?style=flat)](https://github.com/ziafkhan/FastPitch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/khan23_interspeech.pdf) | -| 2379 | FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder with Multiple STFTs | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://kallavinka8045.github.io/is2023/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10823-b31b1b.svg)](https://arxiv.org/abs/2305.10823) | -| 1726 | iSTFTNet2: Faster and more Lightweight iSTFT-based Neural Vocoder using 1D-2D CNN | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/istftnet2/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kaneko23_interspeech.pdf) | -| 534 | VITS2: Improving Quality and Efficiency of Single Stage Text to Speech with Adversarial Learning and Architecture Design | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://vits-2.github.io/demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kong23_interspeech.pdf) | -| 1175 | Controlling Multi-Class Human Vocalization Generation via a Simple Segment-based Labeling Scheme | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/luong23_interspeech.pdf) | +| 407 | Epoch-based Spectrum Estimation for Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/cadia-lvl/ebs/tree/interspeech2023/)
[![GitHub](https://img.shields.io/github/stars/cadia-lvl/ebs?style=flat)](https://github.com/cadia-lvl/ebs/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gunason23_interspeech.pdf) | +| 1996 | OverFlow: Putting Flows on Top of Neural Transducers for Better TTS | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://shivammehta25.github.io/OverFlow/)
[![GitHub](https://img.shields.io/github/stars/shivammehta25/OverFlow?style=flat)](https://github.com/shivammehta25/OverFlow) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mehta23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.06892-b31b1b.svg)](https://arxiv.org/abs/2211.06892) | +| 1568 | AdapterMix: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation | [![GitHub](https://img.shields.io/github/stars/declare-lab/adapter-mix?style=flat)](https://github.com/declare-lab/adapter-mix) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mehrish23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18028-b31b1b.svg)](https://arxiv.org/abs/2305.18028) | +| 506 | Prior-Free Guided TTS: An Improved and Efficient Diffusion-based Text-Guided Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/choi23c_interspeech.pdf) | +| 367 | UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/iashchenko23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00721-b31b1b.svg)](https://arxiv.org/abs/2306.00721) | +| 1301 | Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hcy71o.github.io/SparseTTS-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yoon23_interspeech.pdf) | +| 1151 | Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://gwh22.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/guan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04301-b31b1b.svg)](https://arxiv.org/abs/2306.04301) | +| 879 | Towards Robust FastSpeech 2 by Modelling Residual Multimodality | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sony.github.io/ai-research-code/tvcgmm/project_page/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kogel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01442-b31b1b.svg)](https://arxiv.org/abs/2306.01442) | +| 1137 | Real Time Spectrogram Inversion on Mobile Phone | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/google-research/google-research/tree/master/specinvert) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rybakov23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.00756-b31b1b.svg)](https://arxiv.org/abs/2203.00756) | +| 58 | Automatic Tuning of Loss Trade-offs without Hyper-Parameter Search in End-to-End Zero-Shot Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://cnaigithub.github.io/Auto_Tuning_Zeroshot_TTS_and_VC/)
[![GitHub](https://img.shields.io/github/stars/cnaigithub/Auto_Tuning_Zeroshot_TTS_and_VC?style=flat)](https://github.com/cnaigithub/Auto_Tuning_Zeroshot_TTS_and_VC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/park23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16699-b31b1b.svg)](https://arxiv.org/abs/2305.16699) | +| 2056 | A Low-Resource Pipeline for Text-to-Speech from Found Data With Application to Scottish Gaelic | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/dan-wells/kiss-aligner/tree/main/egs/learngaelic_litir) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wells23_interspeech.pdf) | +| 2173 | Self-Supervised Solution to the Control Problem of Articulatory Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://tensortract.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/krug23_interspeech.pdf) | +| 1128 | Hierarchical Timbre-Cadence Speaker Encoder for Zero-Shot Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://srtts.github.io/tc-zstts/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23f_interspeech.pdf) | +| 754 | ZET-Speech: Zero-Shot adaptive Emotion-Controllable Text-to-Speech Synthesis with Diffusion and Style-based Models | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://zet-speech.github.io/ZET-Speech-Demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13831-b31b1b.svg)](https://arxiv.org/abs/2305.13831) | +| 690 | Improving WaveRNN with Heuristic Dynamic Blending for Fast and High-Quality GPU Vocoding | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://muyangdu.github.io/WaveRNN-Heuristic-Dynamic-Blending/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/du23_interspeech.pdf) | +| 194 | Intelligible Lip-to-Speech Synthesis with Speech Units | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://choijeongsoo.github.io/lip2speech-unit/)
[![GitHub](https://img.shields.io/github/stars/choijeongsoo/lip2speech-unit?style=flat)](https://github.com/choijeongsoo/lip2speech-unit) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/choi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19603-b31b1b.svg)](https://arxiv.org/abs/2305.19603) | +| 1212 | Parameter-Efficient Learning for Text-to-Speech Accent Adaptation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://tts-research.github.io)
[![GitHub](https://img.shields.io/github/stars/TTS-Research/PEL-TTS?style=flat)](https://github.com/TTS-Research/PEL-TTS)
[![GitHub](https://img.shields.io/github/stars/Li-JEN/PEL-accent-adaptaion?style=flat)](https://github.com/Li-JEN/PEL-accent-adaptaion) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11320-b31b1b.svg)](https://arxiv.org/abs/2305.11320) | +| 820 | Controlling Formant Frequencies with Neural Text-to-Speech for the Manipulation of Perceived Speaker Age | [![GitHub](https://img.shields.io/github/stars/ziafkhan/FastPitch?style=flat)](https://github.com/ziafkhan/FastPitch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/khan23_interspeech.pdf) | +| 2379 | FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder with Multiple STFTs | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://kallavinka8045.github.io/is2023/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10823-b31b1b.svg)](https://arxiv.org/abs/2305.10823) | +| 1726 | iSTFTNet2: Faster and more Lightweight iSTFT-based Neural Vocoder using 1D-2D CNN | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/istftnet2/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kaneko23_interspeech.pdf) | +| 534 | VITS2: Improving Quality and Efficiency of Single Stage Text to Speech with Adversarial Learning and Architecture Design | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://vits-2.github.io/demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kong23_interspeech.pdf) | +| 1175 | Controlling Multi-Class Human Vocalization Generation via a Simple Segment-based Labeling Scheme | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/luong23_interspeech.pdf) |
@@ -1999,12 +1999,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1608 | HierVST: Hierarchical Adaptive Zero-Shot Voice Style Transfer | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hiervst.github.io) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23i_interspeech.pdf) | -| 391 | VISinger2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://zhangyongmao.github.io/VISinger2/)
[![GitHub](https://img.shields.io/github/stars/zhangyongmao/VISinger2?style=flat)](https://github.com/zhangyongmao/VISinger2) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02903-b31b1b.svg)](https://arxiv.org/abs/2211.02903) | -| 700 | EdenTTS: A Simple and Efficient Parallel Text-to-Speech Architecture with Collaborative Duration-Alignment Learning | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://edenynm.github.io/edentts-demo/)
[![GitHub](https://img.shields.io/github/stars/younengma/eden-tts?style=flat)](https://github.com/younengma/eden-tts)| [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ma23c_interspeech.pdf) | -| 368 | Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://gzs-tv.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23c_interspeech.pdf) | -| 1020 | Speech Inpainting: Context-based Speech Synthesis Guided by Video | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ipcv.github.io/avsi/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/montesinos23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00489-b31b1b.svg)](https://arxiv.org/abs/2306.00489) | -| 2243 | STEN-TTS: Improving Zero-Shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tran23d_interspeech.pdf) | +| 1608 | HierVST: Hierarchical Adaptive Zero-Shot Voice Style Transfer | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hiervst.github.io) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23i_interspeech.pdf) | +| 391 | VISinger2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://zhangyongmao.github.io/VISinger2/)
[![GitHub](https://img.shields.io/github/stars/zhangyongmao/VISinger2?style=flat)](https://github.com/zhangyongmao/VISinger2) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02903-b31b1b.svg)](https://arxiv.org/abs/2211.02903) | +| 700 | EdenTTS: A Simple and Efficient Parallel Text-to-Speech Architecture with Collaborative Duration-Alignment Learning | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://edenynm.github.io/edentts-demo/)
[![GitHub](https://img.shields.io/github/stars/younengma/eden-tts?style=flat)](https://github.com/younengma/eden-tts)| [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ma23c_interspeech.pdf) | +| 368 | Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://gzs-tv.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23c_interspeech.pdf) | +| 1020 | Speech Inpainting: Context-based Speech Synthesis Guided by Video | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ipcv.github.io/avsi/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/montesinos23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00489-b31b1b.svg)](https://arxiv.org/abs/2306.00489) | +| 2243 | STEN-TTS: Improving Zero-Shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tran23d_interspeech.pdf) |
@@ -2016,12 +2016,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 933 | Average Token Delay: A Latency Metric for Simultaneous Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kano23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.13173-b31b1b.svg)](https://arxiv.org/abs/2211.13173) | -| 1450 | Automatic Speech Recognition Transformer with Global Contextual Information Decoder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/qian23_interspeech.pdf) | -| 1333 | Time-Synchronous One-Pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sudo23c_interspeech.pdf) | -| 2065 | Prefix Search Decoding for RNN Transducers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/praveen23b_interspeech.pdf) | -| 78 | WhisperX: Time-Accurate Speech Transcription of Long-Form Audio | [![GitHub](https://img.shields.io/github/stars/m-bain/whisperX?style=flat)](https://github.com/m-bain/whisperX) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bain23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00747-b31b1b.svg)](https://arxiv.org/abs/2303.00747) | -| 2449 | Implementing Contextual Biasing in GPU Decoder for Online ASR | [![GitHub](https://img.shields.io/github/stars/idiap/contextual-biasing-on-gpus?style=flat)](https://github.com/idiap/contextual-biasing-on-gpus) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nigmatulina23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15685-b31b1b.svg)](https://arxiv.org/abs/2306.15685) | +| 933 | Average Token Delay: A Latency Metric for Simultaneous Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kano23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.13173-b31b1b.svg)](https://arxiv.org/abs/2211.13173) | +| 1450 | Automatic Speech Recognition Transformer with Global Contextual Information Decoder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/qian23_interspeech.pdf) | +| 1333 | Time-Synchronous One-Pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sudo23c_interspeech.pdf) | +| 2065 | Prefix Search Decoding for RNN Transducers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/praveen23b_interspeech.pdf) | +| 78 | WhisperX: Time-Accurate Speech Transcription of Long-Form Audio | [![GitHub](https://img.shields.io/github/stars/m-bain/whisperX?style=flat)](https://github.com/m-bain/whisperX) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bain23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00747-b31b1b.svg)](https://arxiv.org/abs/2303.00747) | +| 2449 | Implementing Contextual Biasing in GPU Decoder for Online ASR | [![GitHub](https://img.shields.io/github/stars/idiap/contextual-biasing-on-gpus?style=flat)](https://github.com/idiap/contextual-biasing-on-gpus) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nigmatulina23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15685-b31b1b.svg)](https://arxiv.org/abs/2306.15685) |
@@ -2033,12 +2033,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2487 | MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-Level Feature Fusion | [![GitHub](https://img.shields.io/github/stars/Woo-jin-Chung/MF-PAM_mfpam_pitch_estimation_pytorch?style=flat)](https://github.com/Woo-jin-Chung/MF-PAM_mfpam_pitch_estimation_pytorch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chung23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.09640-b31b1b.svg)](https://arxiv.org/abs/2306.09640) | -| 2211 | Enhancing Speech Articulation Analysis using A Geometric Transformation of the X-ray Microbeam Dataset | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/attia23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10775-b31b1b.svg)](https://arxiv.org/abs/2305.10775) | -| 1729 | Matching Acoustic and Perceptual Measures of Phonation Assessment in Disordered Speech - A Case Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jouaiti23_interspeech.pdf) | -| 283 | Improved Contextualized Speech Representations for Tonal Analysis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yuan23_interspeech.pdf) | -| 1738 | A Study on the Importance of Formant Transitions for Stop-Consonant Classification in VCV Sequence | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chandrasekar23_interspeech.pdf)
[![idiap](https://img.shields.io/badge/idiap.ch.5064-FF6A00.svg)](https://publications.idiap.ch/index.php/publications/show/5064) | -| 2229 | FusedF0: Improving DNN-based F0 Estimation by Fusion of Summary-Correlograms and Raw Waveform Representations of Speech Signals | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/eren23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://www.seas.ucla.edu/spapl/paper/Eray_IS_2023.pdf) | +| 2487 | MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-Level Feature Fusion | [![GitHub](https://img.shields.io/github/stars/Woo-jin-Chung/MF-PAM_mfpam_pitch_estimation_pytorch?style=flat)](https://github.com/Woo-jin-Chung/MF-PAM_mfpam_pitch_estimation_pytorch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chung23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.09640-b31b1b.svg)](https://arxiv.org/abs/2306.09640) | +| 2211 | Enhancing Speech Articulation Analysis using A Geometric Transformation of the X-ray Microbeam Dataset | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/attia23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10775-b31b1b.svg)](https://arxiv.org/abs/2305.10775) | +| 1729 | Matching Acoustic and Perceptual Measures of Phonation Assessment in Disordered Speech - A Case Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jouaiti23_interspeech.pdf) | +| 283 | Improved Contextualized Speech Representations for Tonal Analysis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yuan23_interspeech.pdf) | +| 1738 | A Study on the Importance of Formant Transitions for Stop-Consonant Classification in VCV Sequence | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chandrasekar23_interspeech.pdf)
[![idiap](https://img.shields.io/badge/idiap.ch.5064-FF6A00.svg)](https://publications.idiap.ch/index.php/publications/show/5064) | +| 2229 | FusedF0: Improving DNN-based F0 Estimation by Fusion of Summary-Correlograms and Raw Waveform Representations of Speech Signals | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/eren23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://www.seas.ucla.edu/spapl/paper/Eray_IS_2023.pdf) |
@@ -2050,25 +2050,25 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 928 | Using Commercial ASR Solutions to Assess Reading Skills in Children: A Case Report | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/piton23_interspeech.pdf) | -| 907 | Uncertainty Estimation for Connectionist Temporal Classification based Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rumberg23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.tnt.uni-hannover.de/papers/data/1678/2023-Rumberg-Uncertainty_Estimation_for_Connectionist_Temporal_Classification_Based_Speech_Recognition.pdf) | -| 2185 | Speech Breathing Behavior During Pauses in Children | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/charuau23_interspeech.pdf) | -| 926 | Exploiting Diversity of Automatic Transcripts from Distinct Speech Recognition Techniques for Children's Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gebauer23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.tnt.uni-hannover.de/papers/data/1679/gebauer_interspeech23_childspeechdiversity.pdf) | -| 1924 | Acoustic-to-Articulatory Speech Inversion Features for Mispronunciation Detection of /r/ in Child Speech Sound Disorders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/benway23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16085-b31b1b.svg)](https://arxiv.org/abs/2305.16085) | -| 978 | BabySLM: Language-Acquisition-Friendly Benchmark of Self-Supervised Spoken Language Models | [![GitHub](https://img.shields.io/github/stars/MarvinLvn/BabySLM?style=flat)](https://github.com/MarvinLvn/BabySLM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lavechin23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01506-b31b1b.svg)](https://arxiv.org/abs/2306.01506) | -| 702 | Data Augmentation for Children ASR and Child-adult Speaker Classification using Voice Conversion Methods | [![GitHub](https://img.shields.io/github/stars/zhao-shuyang/childrenize?style=flat)](https://github.com/zhao-shuyang/childrenize) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhao23c_interspeech.pdf) | -| 2236 | Developmental Articulatory and Acoustic Features for Six to Ten Year Old Children | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shetty23_interspeech.pdf) | -| 2251 | Automatically Predicting Perceived Conversation Quality in a Pediatric Sample Enriched for Autism | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23u_interspeech.pdf) | -| 1257 | An Equitable Framework for Automatically Assessing Children's Oral Narrative Language Abilities | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/johnson23_interspeech.pdf) | -| 743 | An Analysis of Goodness of Pronunciation for Child Speech | [![GitHub](https://img.shields.io/github/stars/frank613/GOPs?style=flat)](https://github.com/frank613/GOPs) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cao23_interspeech.pdf) | -| 1569 | Measuring Language Development from Child-centered Recordings | [![GitHub](https://img.shields.io/github/stars/yaya-sy/EntropyBasedCLDMetrics?style=flat)](https://github.com/yaya-sy/EntropyBasedCLDMetrics) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sy23_interspeech.pdf) | -| 2057 | Speaking Clearly, Understanding Better: Predicting the L2 Narrative Comprehension of Chinese Bilingual Kindergarten Children based on Speech Intelligibility using a Machine Learning Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hung23_interspeech.pdf) | -| 312 | Classifying Rhoticity of /r/ in Speech Sound Disorder using Age-and-Sex Normalized Formants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/benway23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16111-b31b1b.svg)](https://arxiv.org/abs/2305.16111) | -| 1273 | Understanding Spoken Language Development of Children with ASD using Pre-trained Speech Embeddings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14117-b31b1b.svg)](https://arxiv.org/abs/2305.14117) | -| 2099 | Measuring Phonological Precision in Children with Cleft Lip and Palate | [![GitHub](https://img.shields.io/github/stars/TAriasVergara/PhonoQ?style=flat)](https://github.com/TAriasVergara/PhonoQ) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ariasvergara23_interspeech.pdf) | -| 937 | A Study on Using Duration and Formant Features in Automatic Detection of Speech Sound Disorder in Children | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ng23_interspeech.pdf) | -| 1873 | Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://clpclf.github.io/clp-clf/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/baumann23_interspeech.pdf)| -| 1882 | Prospective Validation of Motor-based Intervention with Automated Mispronunciation Detection of Rhotics in Residual Speech Sound Disorders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/benway23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19090-b31b1b.svg)](https://arxiv.org/abs/2305.19090) | +| 928 | Using Commercial ASR Solutions to Assess Reading Skills in Children: A Case Report | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/piton23_interspeech.pdf) | +| 907 | Uncertainty Estimation for Connectionist Temporal Classification based Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rumberg23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.tnt.uni-hannover.de/papers/data/1678/2023-Rumberg-Uncertainty_Estimation_for_Connectionist_Temporal_Classification_Based_Speech_Recognition.pdf) | +| 2185 | Speech Breathing Behavior During Pauses in Children | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/charuau23_interspeech.pdf) | +| 926 | Exploiting Diversity of Automatic Transcripts from Distinct Speech Recognition Techniques for Children's Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gebauer23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.tnt.uni-hannover.de/papers/data/1679/gebauer_interspeech23_childspeechdiversity.pdf) | +| 1924 | Acoustic-to-Articulatory Speech Inversion Features for Mispronunciation Detection of /r/ in Child Speech Sound Disorders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/benway23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16085-b31b1b.svg)](https://arxiv.org/abs/2305.16085) | +| 978 | BabySLM: Language-Acquisition-Friendly Benchmark of Self-Supervised Spoken Language Models | [![GitHub](https://img.shields.io/github/stars/MarvinLvn/BabySLM?style=flat)](https://github.com/MarvinLvn/BabySLM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lavechin23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01506-b31b1b.svg)](https://arxiv.org/abs/2306.01506) | +| 702 | Data Augmentation for Children ASR and Child-adult Speaker Classification using Voice Conversion Methods | [![GitHub](https://img.shields.io/github/stars/zhao-shuyang/childrenize?style=flat)](https://github.com/zhao-shuyang/childrenize) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhao23c_interspeech.pdf) | +| 2236 | Developmental Articulatory and Acoustic Features for Six to Ten Year Old Children | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shetty23_interspeech.pdf) | +| 2251 | Automatically Predicting Perceived Conversation Quality in a Pediatric Sample Enriched for Autism | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23u_interspeech.pdf) | +| 1257 | An Equitable Framework for Automatically Assessing Children's Oral Narrative Language Abilities | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/johnson23_interspeech.pdf) | +| 743 | An Analysis of Goodness of Pronunciation for Child Speech | [![GitHub](https://img.shields.io/github/stars/frank613/GOPs?style=flat)](https://github.com/frank613/GOPs) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cao23_interspeech.pdf) | +| 1569 | Measuring Language Development from Child-centered Recordings | [![GitHub](https://img.shields.io/github/stars/yaya-sy/EntropyBasedCLDMetrics?style=flat)](https://github.com/yaya-sy/EntropyBasedCLDMetrics) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sy23_interspeech.pdf) | +| 2057 | Speaking Clearly, Understanding Better: Predicting the L2 Narrative Comprehension of Chinese Bilingual Kindergarten Children based on Speech Intelligibility using a Machine Learning Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hung23_interspeech.pdf) | +| 312 | Classifying Rhoticity of /r/ in Speech Sound Disorder using Age-and-Sex Normalized Formants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/benway23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16111-b31b1b.svg)](https://arxiv.org/abs/2305.16111) | +| 1273 | Understanding Spoken Language Development of Children with ASD using Pre-trained Speech Embeddings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14117-b31b1b.svg)](https://arxiv.org/abs/2305.14117) | +| 2099 | Measuring Phonological Precision in Children with Cleft Lip and Palate | [![GitHub](https://img.shields.io/github/stars/TAriasVergara/PhonoQ?style=flat)](https://github.com/TAriasVergara/PhonoQ) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ariasvergara23_interspeech.pdf) | +| 937 | A Study on Using Duration and Formant Features in Automatic Detection of Speech Sound Disorder in Children | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ng23_interspeech.pdf) | +| 1873 | Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://clpclf.github.io/clp-clf/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/baumann23_interspeech.pdf)| +| 1882 | Prospective Validation of Motor-based Intervention with Automated Mispronunciation Detection of Rhotics in Residual Speech Sound Disorders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/benway23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19090-b31b1b.svg)](https://arxiv.org/abs/2305.19090) |
@@ -2080,12 +2080,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2238 | Parameter-Efficient Low-Resource Dialogue State Tracking by Prompt Tuning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ma23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.10915-b31b1b.svg)](https://arxiv.org/abs/2301.10915) | -| 2525 | An Autoregressive Conversational Dynamics Model for Dialogue Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mcneill23_interspeech.pdf) | -| 1983 | Style-Transfer based Speech and Audio-Visual Scene Understanding for Robot Action Sequence Acquisition from Videos | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hori23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15644-b31b1b.svg)](https://arxiv.org/abs/2306.15644) | -| 1037 | Speech aware Dialog System Technology Challenge (DSTC11) | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://dstc11.dstc.community/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/soltau23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.08704-b31b1b.svg)](https://arxiv.org/abs/2212.08704) | -| 1397 | Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision | [![GitHub](https://img.shields.io/github/stars/thu-spmi/JSA-KRTOD?style=flat)](https://github.com/thu-spmi/JSA-KRTOD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cai23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13199-b31b1b.svg)](https://arxiv.org/abs/2305.13199) | -| 2513 | Tracking Must Go On: Dialogue State Tracking with Verified Self-Training | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23k_interspeech.pdf) | +| 2238 | Parameter-Efficient Low-Resource Dialogue State Tracking by Prompt Tuning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ma23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.10915-b31b1b.svg)](https://arxiv.org/abs/2301.10915) | +| 2525 | An Autoregressive Conversational Dynamics Model for Dialogue Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mcneill23_interspeech.pdf) | +| 1983 | Style-Transfer based Speech and Audio-Visual Scene Understanding for Robot Action Sequence Acquisition from Videos | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hori23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15644-b31b1b.svg)](https://arxiv.org/abs/2306.15644) | +| 1037 | Speech aware Dialog System Technology Challenge (DSTC11) | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://dstc11.dstc.community/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/soltau23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.08704-b31b1b.svg)](https://arxiv.org/abs/2212.08704) | +| 1397 | Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision | [![GitHub](https://img.shields.io/github/stars/thu-spmi/JSA-KRTOD?style=flat)](https://github.com/thu-spmi/JSA-KRTOD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cai23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13199-b31b1b.svg)](https://arxiv.org/abs/2305.13199) | +| 2513 | Tracking Must Go On: Dialogue State Tracking with Verified Self-Training | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23k_interspeech.pdf) |
@@ -2097,12 +2097,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 558 | GL-SSD: Global and Local Speech Style Disentanglement by Vector Quantization for Robust Sentence Boundary Detection in Speech Stream | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23i_interspeech.pdf) | -| 598 | Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12450-b31b1b.svg)](https://arxiv.org/abs/2305.12450) | -| 2466 | Dynamic Encoder RNN for Online Voice Activity Detection in Adverse Noise Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gudepu23_interspeech.pdf)| -| 996 | Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/moussa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.05641-b31b1b.svg)](https://arxiv.org/abs/2307.05641) | -| 716 | Real-Time Causal Spectro-Temporal Voice Activity Detection based on Convolutional Encoding and Residual Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23k_interspeech.pdf) | -| 2413 | SVVAD: Personal Voice Activity Detection for Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19581-b31b1b.svg)](https://arxiv.org/abs/2305.19581) | +| 558 | GL-SSD: Global and Local Speech Style Disentanglement by Vector Quantization for Robust Sentence Boundary Detection in Speech Stream | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23i_interspeech.pdf) | +| 598 | Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12450-b31b1b.svg)](https://arxiv.org/abs/2305.12450) | +| 2466 | Dynamic Encoder RNN for Online Voice Activity Detection in Adverse Noise Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gudepu23_interspeech.pdf)| +| 996 | Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/moussa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.05641-b31b1b.svg)](https://arxiv.org/abs/2307.05641) | +| 716 | Real-Time Causal Spectro-Temporal Voice Activity Detection based on Convolutional Encoding and Residual Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23k_interspeech.pdf) | +| 2413 | SVVAD: Personal Voice Activity Detection for Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19581-b31b1b.svg)](https://arxiv.org/abs/2305.19581) |
@@ -2114,12 +2114,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1613 | Learning Cross-Lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/farooq23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08577-b31b1b.svg)](https://arxiv.org/abs/2306.08577) | -| 2122 | AfriNames: Most ASR models "butcher" African Names | [![Hugging Face](https://img.shields.io/badge/🤗-tobiolatunji-FFD21F.svg)](https://huggingface.co/datasets/tobiolatunji/afrispeech-200) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/olatunji23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00253-b31b1b.svg)](https://arxiv.org/abs/2306.00253) | -| 2528 | Towards Dialect-Inclusive Recognition in a Low-Resource Language: are Balanced Corpora the Answer? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lonergan23_interspeech.pdf) | -| 2588 | Svarah: Evaluating English ASR Systems on Indian Accents | [![GitHub](https://img.shields.io/github/stars/AI4Bharat/Svarah?style=flat)](https://github.com/AI4Bharat/Svarah) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/javed23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15760-b31b1b.svg)](https://arxiv.org/abs/2305.15760) | -| 1044 | N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/talafha23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02902-b31b1b.svg)](https://arxiv.org/abs/2306.02902) | -| 1014 | The MALACH Corpus: Results with End-to-End Architectures and Pretraining | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/picheny23_interspeech.pdf) | +| 1613 | Learning Cross-Lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/farooq23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08577-b31b1b.svg)](https://arxiv.org/abs/2306.08577) | +| 2122 | AfriNames: Most ASR models "butcher" African Names | [![Hugging Face](https://img.shields.io/badge/🤗-tobiolatunji-FFD21F.svg)](https://huggingface.co/datasets/tobiolatunji/afrispeech-200) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/olatunji23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00253-b31b1b.svg)](https://arxiv.org/abs/2306.00253) | +| 2528 | Towards Dialect-Inclusive Recognition in a Low-Resource Language: are Balanced Corpora the Answer? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lonergan23_interspeech.pdf) | +| 2588 | Svarah: Evaluating English ASR Systems on Indian Accents | [![GitHub](https://img.shields.io/github/stars/AI4Bharat/Svarah?style=flat)](https://github.com/AI4Bharat/Svarah) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/javed23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15760-b31b1b.svg)](https://arxiv.org/abs/2305.15760) | +| 1044 | N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/talafha23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02902-b31b1b.svg)](https://arxiv.org/abs/2306.02902) | +| 1014 | The MALACH Corpus: Results with End-to-End Architectures and Pretraining | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/picheny23_interspeech.pdf) |
@@ -2131,12 +2131,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 232 | Unsupervised Speech Enhancement with Deep Dynamical Generative Speech and Noise Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07820-b31b1b.svg)](https://arxiv.org/abs/2306.07820) | -| 857 | Noise-Robust Bandwidth Expansion for 8K Speech Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23f_interspeech.pdf) | -| 113 | mdctGAN: Taming Transformer-based GAN for Speech Super-Resolution with Modified DCT Spectra | [![GitHub](https://img.shields.io/github/stars/neoncloud/mdctgan?style=flat)](https://github.com/neoncloud/mdctgan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shuai23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11104-b31b1b.svg)](https://arxiv.org/abs/2305.11104) | -| 625 | Zoneformer: On-Device Neural Beamformer for In-Car Multi-Zone Speech Separation, Enhancement and echo Cancellation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yongxuustc.github.io/zf/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23b_interspeech.pdf) | -| 634 | Low-Complexity Broadband Beampattern Synthesis using Array Response Control | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23c_interspeech.pdf) | -| 904 | A GAN Speech Inpainting Model for Audio Editing Software | [![GitHub](https://img.shields.io/github/stars/HXZhao1/GSIM?style=flat)](https://github.com/HXZhao1/GSIM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhao23d_interspeech.pdf) | +| 232 | Unsupervised Speech Enhancement with Deep Dynamical Generative Speech and Noise Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07820-b31b1b.svg)](https://arxiv.org/abs/2306.07820) | +| 857 | Noise-Robust Bandwidth Expansion for 8K Speech Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23f_interspeech.pdf) | +| 113 | mdctGAN: Taming Transformer-based GAN for Speech Super-Resolution with Modified DCT Spectra | [![GitHub](https://img.shields.io/github/stars/neoncloud/mdctgan?style=flat)](https://github.com/neoncloud/mdctgan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shuai23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11104-b31b1b.svg)](https://arxiv.org/abs/2305.11104) | +| 625 | Zoneformer: On-Device Neural Beamformer for In-Car Multi-Zone Speech Separation, Enhancement and echo Cancellation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yongxuustc.github.io/zf/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23b_interspeech.pdf) | +| 634 | Low-Complexity Broadband Beampattern Synthesis using Array Response Control | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23c_interspeech.pdf) | +| 904 | A GAN Speech Inpainting Model for Audio Editing Software | [![GitHub](https://img.shields.io/github/stars/HXZhao1/GSIM?style=flat)](https://github.com/HXZhao1/GSIM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhao23d_interspeech.pdf) |
@@ -2148,10 +2148,10 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2316 | Deep Speech Synthesis from MRI-based Articulatory Representations | [![GitHub](https://img.shields.io/github/stars/articulatory/articulatory?style=flat)](https://github.com/articulatory/articulatory) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.02471-b31b1b.svg)](https://arxiv.org/abs/2307.02471) | -| 562 | Learning to Compute the Articulatory Representations of Speech with the MIRRORNET | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yashish92.github.io/MirrorNet-for-speech/)
[![GitHub](https://img.shields.io/github/stars/Yashish92/MirrorNet-for-speech?style=flat)](https://github.com/Yashish92/MirrorNet-for-speech)| [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/siriwardena23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.16454-b31b1b.svg)](https://arxiv.org/abs/2210.16454) | -| 804 | Generating High-Resolution 3D Real-Time MRI of the Vocal Tract | [![GitHub](https://img.shields.io/github/stars/tonioser/supplementary-material-Interspeech2023-paper804?style=flat)](https://github.com/tonioser/supplementary-material-Interspeech2023-paper804) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/strauch23_interspeech.pdf) | -| 1593 | Exploring a Classification Approach using Quantised Articulatory Movements for Acoustic to Articulatory Inversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bandekar23_interspeech.pdf) | +| 2316 | Deep Speech Synthesis from MRI-based Articulatory Representations | [![GitHub](https://img.shields.io/github/stars/articulatory/articulatory?style=flat)](https://github.com/articulatory/articulatory) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.02471-b31b1b.svg)](https://arxiv.org/abs/2307.02471) | +| 562 | Learning to Compute the Articulatory Representations of Speech with the MIRRORNET | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yashish92.github.io/MirrorNet-for-speech/)
[![GitHub](https://img.shields.io/github/stars/Yashish92/MirrorNet-for-speech?style=flat)](https://github.com/Yashish92/MirrorNet-for-speech)| [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/siriwardena23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.16454-b31b1b.svg)](https://arxiv.org/abs/2210.16454) | +| 804 | Generating High-Resolution 3D Real-Time MRI of the Vocal Tract | [![GitHub](https://img.shields.io/github/stars/tonioser/supplementary-material-Interspeech2023-paper804?style=flat)](https://github.com/tonioser/supplementary-material-Interspeech2023-paper804) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/strauch23_interspeech.pdf) | +| 1593 | Exploring a Classification Approach using Quantised Articulatory Movements for Acoustic to Articulatory Inversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bandekar23_interspeech.pdf) |
@@ -2163,15 +2163,15 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 633 | Coherence Estimation Tracks Auditory Attention in Listeners with Hearing Impairment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/keding23_interspeech.pdf) | -| 2378 | Enhancing the EEG Speech Match Mismatch Tasks with Word Boundaries | [![GitHub](https://img.shields.io/github/stars/iiscleap/EEGspeech-MatchMismatch?style=flat)](https://github.com/iiscleap/EEGspeech-MatchMismatch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/soman23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00366-b31b1b.svg)](https://arxiv.org/abs/2307.00366) | -| 1347 | Similar Hierarchical Representation of Speech and Other Complex Sounds in the Brain and Deep Residual Networks: an MEG Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cheng23e_interspeech.pdf) | -| 121 | Speech Taskonomy: Which Speech Tasks are the most Predictive of fMRI Brain Activity? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/oota23_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-040060.svg)](https://hal.science/hal-04131475) | -| 282 | MEG Encoding using Word Context Semantics in Listening Stories | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/oota23b_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-040060.svg)](https://hal.science/hal-04148324) | -| 1949 | Investigating the Cortical Tracking of Speech and Music with Sung Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cantisani23_interspeech.pdf) | -| 414 | Exploring Auditory Attention Decoding using Speaker Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/qiu23_interspeech.pdf)| -| 1776 | Effects of Spectral Degradation on the Cortical Tracking of the Speech Envelope | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/macintyre23_interspeech.pdf) | -| 964 | Effects of Spectral and Temporal Modulation Degradation on Intelligibility and Cortical Tracking of Speech Signals | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/calderondepalma23_interspeech.pdf) | +| 633 | Coherence Estimation Tracks Auditory Attention in Listeners with Hearing Impairment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/keding23_interspeech.pdf) | +| 2378 | Enhancing the EEG Speech Match Mismatch Tasks with Word Boundaries | [![GitHub](https://img.shields.io/github/stars/iiscleap/EEGspeech-MatchMismatch?style=flat)](https://github.com/iiscleap/EEGspeech-MatchMismatch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/soman23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00366-b31b1b.svg)](https://arxiv.org/abs/2307.00366) | +| 1347 | Similar Hierarchical Representation of Speech and Other Complex Sounds in the Brain and Deep Residual Networks: an MEG Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cheng23e_interspeech.pdf) | +| 121 | Speech Taskonomy: Which Speech Tasks are the most Predictive of fMRI Brain Activity? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/oota23_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-040060.svg)](https://hal.science/hal-04131475) | +| 282 | MEG Encoding using Word Context Semantics in Listening Stories | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/oota23b_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-040060.svg)](https://hal.science/hal-04148324) | +| 1949 | Investigating the Cortical Tracking of Speech and Music with Sung Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cantisani23_interspeech.pdf) | +| 414 | Exploring Auditory Attention Decoding using Speaker Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/qiu23_interspeech.pdf)| +| 1776 | Effects of Spectral Degradation on the Cortical Tracking of the Speech Envelope | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/macintyre23_interspeech.pdf) | +| 964 | Effects of Spectral and Temporal Modulation Degradation on Intelligibility and Cortical Tracking of Speech Signals | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/calderondepalma23_interspeech.pdf) |
@@ -2183,12 +2183,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2061 | Transfer Learning for Personality Perception via Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23da_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16076-b31b1b.svg)](https://arxiv.org/abs/2305.16076) | -| 1131 | A Stimulus-Organism-Response Model of Willingness to Buy from Advertising Speech using Voice Quality | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://ntt-hilab-gensp.github.io/is2023-SOR-VQ/)| [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nagano23_interspeech.pdf) | -| 1835 | Voice Passing: A Non-Binary Voice Gender Prediction System for evaluating Transgender | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/doukhan23_interspeech.pdf) | -| 1139 | Influence of Personal Traits on Impressions of One's Own Voice | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yanagida23_interspeech.pdf) | -| 887 | Pardon my Disfluency: The Impact of Disfluency Effects on the Perception of Speaker Competence and Confidence | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kirkland23_interspeech.pdf) | -| 711 | Cross-Linguistic Emotion Perception in Human and TTS Voices | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://michelledcohn.com/2023/05/19/interspeech-2023-paper-on-cross-cultural-emotion-perception/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gessinger23_interspeech.pdf)| +| 2061 | Transfer Learning for Personality Perception via Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23da_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16076-b31b1b.svg)](https://arxiv.org/abs/2305.16076) | +| 1131 | A Stimulus-Organism-Response Model of Willingness to Buy from Advertising Speech using Voice Quality | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://ntt-hilab-gensp.github.io/is2023-SOR-VQ/)| [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nagano23_interspeech.pdf) | +| 1835 | Voice Passing: A Non-Binary Voice Gender Prediction System for evaluating Transgender | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/doukhan23_interspeech.pdf) | +| 1139 | Influence of Personal Traits on Impressions of One's Own Voice | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yanagida23_interspeech.pdf) | +| 887 | Pardon my Disfluency: The Impact of Disfluency Effects on the Perception of Speaker Competence and Confidence | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kirkland23_interspeech.pdf) | +| 711 | Cross-Linguistic Emotion Perception in Human and TTS Voices | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://michelledcohn.com/2023/05/19/interspeech-2023-paper-on-cross-cultural-emotion-perception/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gessinger23_interspeech.pdf)|
@@ -2200,10 +2200,10 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1302 | Joint Learning Feature and Model Adaptation for Unsupervised Acoustic Modelling of Child Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/duan23_interspeech.pdf) | -| 1681 | Automatic Assessment of Oral Reading Accuracy for Reading Diagnostics | [![GitHub](https://img.shields.io/github/stars/bomolenaar/jasmin_data_prep?style=flat)](https://github.com/bomolenaar/jasmin_data_prep)
[![GitHub](https://img.shields.io/github/stars/cristiantg/kaldi_egs_CGN?style=flat)](https://github.com/cristiantg/kaldi_egs_CGN/tree/onPonyLand) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/molenaar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03444-b31b1b.svg)](https://arxiv.org/abs/2306.03444) | -| 2084 | An ASR-enabled Reading Tutor: Investigating Feedback to Optimize Interaction for Learning to Read | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bai23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://aichildinteraction.github.io/preprint/AIAIC23_paper_7671.pdf) | -| 935 | Adaptation of Whisper Models to Child Speech Recognition | [![GitHub](https://img.shields.io/github/stars/C3Imaging/whisper_child_asr?style=flat)](https://github.com/C3Imaging/whisper_child_asr)
[![Hugging Face](https://img.shields.io/badge/🤗-rishabhjain16-FFD21F.svg)](https://huggingface.co/rishabhjain16) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jain23_interspeech.pdf) | +| 1302 | Joint Learning Feature and Model Adaptation for Unsupervised Acoustic Modelling of Child Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/duan23_interspeech.pdf) | +| 1681 | Automatic Assessment of Oral Reading Accuracy for Reading Diagnostics | [![GitHub](https://img.shields.io/github/stars/bomolenaar/jasmin_data_prep?style=flat)](https://github.com/bomolenaar/jasmin_data_prep)
[![GitHub](https://img.shields.io/github/stars/cristiantg/kaldi_egs_CGN?style=flat)](https://github.com/cristiantg/kaldi_egs_CGN/tree/onPonyLand) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/molenaar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03444-b31b1b.svg)](https://arxiv.org/abs/2306.03444) | +| 2084 | An ASR-enabled Reading Tutor: Investigating Feedback to Optimize Interaction for Learning to Read | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bai23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://aichildinteraction.github.io/preprint/AIAIC23_paper_7671.pdf) | +| 935 | Adaptation of Whisper Models to Child Speech Recognition | [![GitHub](https://img.shields.io/github/stars/C3Imaging/whisper_child_asr?style=flat)](https://github.com/C3Imaging/whisper_child_asr)
[![Hugging Face](https://img.shields.io/badge/🤗-rishabhjain16-FFD21F.svg)](https://huggingface.co/rishabhjain16) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jain23_interspeech.pdf) |
@@ -2215,28 +2215,28 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2064 | Automatic Evaluation of Turn-Taking Cues in Conversational Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://erikekstedt.github.io/vap_tts/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ekstedt23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17971-b31b1b.svg)](https://arxiv.org/abs/2305.17971) | -| 441 | Expressive Machine Dubbing through Phrase-Level Cross-Lingual Prosody Transfer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/swiatkowski23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11662-b31b1b.svg)](https://arxiv.org/abs/2306.11662) | -| 1691 | Robust Feature Decoupling in Voice Conversion by using Locality-based Instance Normalization | [![GitHub](https://img.shields.io/github/stars/BrightGu/LoINVC?style=flat)](https://github.com/BrightGu/LoINVC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gu23b_interspeech.pdf) | -| 612 | Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jia23_interspeech.pdf) | -| 2148 | The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://phat-do.github.io/nodict-IS23/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/do23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00535-b31b1b.svg)](https://arxiv.org/abs/2306.00535) | -| 1727 | GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bytecong.github.io/GenerTTS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cong23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15304-b31b1b.svg)](https://arxiv.org/abs/2306.15304) | -| 1285 | Analysis of Mean Opinion Scores in Subjective Evaluation of Synthetic Speech based on Tail Probabilities | [![GitHub](https://img.shields.io/github/stars/todalab/mos-analysis-interspeech2023?style=flat)](https://github.com/todalab/mos-analysis-interspeech2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yasuda23_interspeech.pdf) | -| 1584 | LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://google.github.io/df-conformer/)
[![Openslr](https://img.shields.io/badge/OpenSLR-dataset-FFD1BF.svg)](http://www.openslr.org/141/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/koizumi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18802-b31b1b.svg)](https://arxiv.org/abs/2305.18802) | -| 1067 | UniFLG: Unified Facial Landmark Generator from Text or Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://rinnakk.github.io/research/publications/UniFLG/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mitsui23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14337-b31b1b.svg)](https://arxiv.org/abs/2302.14337) | -| 444 | XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech | [![GitHub](https://img.shields.io/github/stars/VinAIResearch/XPhoneBERT?style=flat)](https://github.com/VinAIResearch/XPhoneBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/thenguyen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19709-b31b1b.svg)](https://arxiv.org/abs/2305.19709) | -| 2224 | ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus | [![ClArTTS](https://img.shields.io/badge/ClArTTS-dataset-CBB2FF.svg)](https://www.clartts.com) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kulkarni23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00069-b31b1b.svg)](https://arxiv.org/abs/2303.00069) | -| 154 | Diffusion-based Accent Modelling in Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/deja23_interspeech.pdf) | -| 249 | Multilingual Text-to-Speech Synthesis for Turkic Languages using Transliteration | [![GitHub](https://img.shields.io/github/stars/IS2AI/TurkicTTS?style=flat)](https://github.com/IS2AI/TurkicTTS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yeshpanov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15749-b31b1b.svg)](https://arxiv.org/abs/2305.15749) | -| 553 | CVTE-Poly: A New Benchmark for Chinese Polyphone Disambiguation | [![GitHub](https://img.shields.io/github/stars/NewZsh/polyphone?style=flat)](https://github.com/NewZsh/polyphone) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23h_interspeech.pdf) | -| 709 | Improve Bilingual TTS using Language and Phonology Embedding with Embedding Strength Modulator | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://fyyang1996.github.io/esm/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.03435-b31b1b.svg)](https://arxiv.org/abs/2212.03435) | -| 2179 | High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ranacm.github.io/DSU-AVO/)
[![GitHub](https://img.shields.io/github/stars/RanaCM/DSU-AVO?style=flat)](https://github.com/RanaCM/DSU-AVO) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lu23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.17005-b31b1b.svg)](https://arxiv.org/abs/2306.17005) | -| 1097 | PronScribe: Highly Accurate Multimodal Phonemic Transcription From Speech and Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yu23_interspeech.pdf) | -| 2158 | Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](phat-do.github.io/sigul22) |[![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/do23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19396-b31b1b.svg)](https://arxiv.org/abs/2305.19396) | -| 416 | Why We Should Report the Details in Subjective Evaluation of TTS More Rigorously | [![GitHub](https://img.shields.io/github/stars/d223302/SubjectiveEvaluation?style=flat)](https://github.com/d223302/SubjectiveEvaluation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chiang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02044-b31b1b.svg)](https://arxiv.org/abs/2306.02044) | -| 1622 | Speaker-Independent Neural Formant Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://perezpoz.github.io/neuralformants) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/perezzarazaga23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01957-b31b1b.svg)](https://arxiv.org/abs/2306.01957) | -| 1098 | CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sython.org/Corpus/STUDIES-2/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/saito23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13713-b31b1b.svg)](https://arxiv.org/abs/2305.13713) | -| 430 | SASPEECH: A Hebrew Single Speaker Dataset for Text to Speech and Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://anonymous19283746.github.io/saspeech/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sharoni23_interspeech.pdf) | +| 2064 | Automatic Evaluation of Turn-Taking Cues in Conversational Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://erikekstedt.github.io/vap_tts/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ekstedt23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17971-b31b1b.svg)](https://arxiv.org/abs/2305.17971) | +| 441 | Expressive Machine Dubbing through Phrase-Level Cross-Lingual Prosody Transfer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/swiatkowski23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11662-b31b1b.svg)](https://arxiv.org/abs/2306.11662) | +| 1691 | Robust Feature Decoupling in Voice Conversion by using Locality-based Instance Normalization | [![GitHub](https://img.shields.io/github/stars/BrightGu/LoINVC?style=flat)](https://github.com/BrightGu/LoINVC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gu23b_interspeech.pdf) | +| 612 | Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jia23_interspeech.pdf) | +| 2148 | The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://phat-do.github.io/nodict-IS23/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/do23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00535-b31b1b.svg)](https://arxiv.org/abs/2306.00535) | +| 1727 | GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bytecong.github.io/GenerTTS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cong23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15304-b31b1b.svg)](https://arxiv.org/abs/2306.15304) | +| 1285 | Analysis of Mean Opinion Scores in Subjective Evaluation of Synthetic Speech based on Tail Probabilities | [![GitHub](https://img.shields.io/github/stars/todalab/mos-analysis-interspeech2023?style=flat)](https://github.com/todalab/mos-analysis-interspeech2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yasuda23_interspeech.pdf) | +| 1584 | LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://google.github.io/df-conformer/)
[![Openslr](https://img.shields.io/badge/OpenSLR-dataset-FFD1BF.svg)](http://www.openslr.org/141/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/koizumi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18802-b31b1b.svg)](https://arxiv.org/abs/2305.18802) | +| 1067 | UniFLG: Unified Facial Landmark Generator from Text or Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://rinnakk.github.io/research/publications/UniFLG/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mitsui23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14337-b31b1b.svg)](https://arxiv.org/abs/2302.14337) | +| 444 | XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech | [![GitHub](https://img.shields.io/github/stars/VinAIResearch/XPhoneBERT?style=flat)](https://github.com/VinAIResearch/XPhoneBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/thenguyen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19709-b31b1b.svg)](https://arxiv.org/abs/2305.19709) | +| 2224 | ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus | [![ClArTTS](https://img.shields.io/badge/ClArTTS-dataset-CBB2FF.svg)](https://www.clartts.com) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kulkarni23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00069-b31b1b.svg)](https://arxiv.org/abs/2303.00069) | +| 154 | Diffusion-based Accent Modelling in Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/deja23_interspeech.pdf) | +| 249 | Multilingual Text-to-Speech Synthesis for Turkic Languages using Transliteration | [![GitHub](https://img.shields.io/github/stars/IS2AI/TurkicTTS?style=flat)](https://github.com/IS2AI/TurkicTTS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yeshpanov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15749-b31b1b.svg)](https://arxiv.org/abs/2305.15749) | +| 553 | CVTE-Poly: A New Benchmark for Chinese Polyphone Disambiguation | [![GitHub](https://img.shields.io/github/stars/NewZsh/polyphone?style=flat)](https://github.com/NewZsh/polyphone) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23h_interspeech.pdf) | +| 709 | Improve Bilingual TTS using Language and Phonology Embedding with Embedding Strength Modulator | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://fyyang1996.github.io/esm/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.03435-b31b1b.svg)](https://arxiv.org/abs/2212.03435) | +| 2179 | High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ranacm.github.io/DSU-AVO/)
[![GitHub](https://img.shields.io/github/stars/RanaCM/DSU-AVO?style=flat)](https://github.com/RanaCM/DSU-AVO) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lu23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.17005-b31b1b.svg)](https://arxiv.org/abs/2306.17005) | +| 1097 | PronScribe: Highly Accurate Multimodal Phonemic Transcription From Speech and Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yu23_interspeech.pdf) | +| 2158 | Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://phat-do.github.io/sigul22) |[![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/do23d_interspeech.pdf)<br>
[![arXiv](https://img.shields.io/badge/arXiv-2305.19396-b31b1b.svg)](https://arxiv.org/abs/2305.19396) | +| 416 | Why We Should Report the Details in Subjective Evaluation of TTS More Rigorously | [![GitHub](https://img.shields.io/github/stars/d223302/SubjectiveEvaluation?style=flat)](https://github.com/d223302/SubjectiveEvaluation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chiang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02044-b31b1b.svg)](https://arxiv.org/abs/2306.02044) | +| 1622 | Speaker-Independent Neural Formant Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://perezpoz.github.io/neuralformants) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/perezzarazaga23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01957-b31b1b.svg)](https://arxiv.org/abs/2306.01957) | +| 1098 | CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sython.org/Corpus/STUDIES-2/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/saito23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13713-b31b1b.svg)](https://arxiv.org/abs/2305.13713) | +| 430 | SASPEECH: A Hebrew Single Speaker Dataset for Text to Speech and Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://anonymous19283746.github.io/saspeech/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sharoni23_interspeech.pdf) |
@@ -2248,18 +2248,18 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2618 | A Personalised Speech Communication Application for Dysarthric Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gibson23b_interspeech.pdf) | -| 2624 | Video Multimodal Emotion Recognition System for Real World Applications | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23l_interspeech.pdf) | -| 2626 | Promoting Mental Self-Disclosure in a Spoken Dialogue System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rohmatillah23_interspeech.pdf) | -| 2632 | "Select Language, Modality or Put on a Mask!" Experiments with Multimodal Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bujnowski23_interspeech.pdf) | -| 2635 | My Vowels Matter: Formant Automation Tools for Diverse Child Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/valentine23_interspeech.pdf) | -| 2636 | NEMA: An Ecologically Valid Tool for Assessing Hearing Devices, Advanced Algorithms, and Communication in Diverse Listening Environments | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chongwhite23_interspeech.pdf) | -| 2644 | When Words Speak Just as Loudly as Actions: Virtual Agent based Remote Health Assessment Integrating What Patients Say with What They Do | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ramanarayanan23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1wxkBg7fqSi0yV6uLjNO4FyhT3cEKoDhF/view) | -| 2648 | Stuttering Detection Application | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/motepalli23_interspeech.pdf)| -| 2649 | Providing Interpretable Insights for Neurological Speech and Cognitive Disorders from Interactive Serious Games | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zusag23b_interspeech.pdf) | -| 2651 | Automated Neural Nursing Assistant (ANNA): An Over-the-Phone System for Cognitive Monitoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/solinsky23_interspeech.pdf) | -| 2656 | 5G-IoT Cloud based Demonstration of Real-Time Audio-Visual Speech Enhancement for Multimodal Hearing-aids | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://cogmhear.org/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gupta23b_interspeech.pdf) | -| 2671 | Towards Two-Point Neuron-Inspired Energy-Efficient Multimodal Open Master Hearing aid | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/raza23_interspeech.pdf) | +| 2618 | A Personalised Speech Communication Application for Dysarthric Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gibson23b_interspeech.pdf) | +| 2624 | Video Multimodal Emotion Recognition System for Real World Applications | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23l_interspeech.pdf) | +| 2626 | Promoting Mental Self-Disclosure in a Spoken Dialogue System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rohmatillah23_interspeech.pdf) | +| 2632 | "Select Language, Modality or Put on a Mask!" Experiments with Multimodal Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bujnowski23_interspeech.pdf) | +| 2635 | My Vowels Matter: Formant Automation Tools for Diverse Child Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/valentine23_interspeech.pdf) | +| 2636 | NEMA: An Ecologically Valid Tool for Assessing Hearing Devices, Advanced Algorithms, and Communication in Diverse Listening Environments | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chongwhite23_interspeech.pdf) | +| 2644 | When Words Speak Just as Loudly as Actions: Virtual Agent based Remote Health Assessment Integrating What Patients Say with What They Do | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ramanarayanan23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1wxkBg7fqSi0yV6uLjNO4FyhT3cEKoDhF/view) | +| 2648 | Stuttering Detection Application | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/motepalli23_interspeech.pdf)| +| 2649 | Providing Interpretable Insights for Neurological Speech and Cognitive Disorders from Interactive Serious Games | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zusag23b_interspeech.pdf) | +| 2651 | Automated Neural Nursing Assistant (ANNA): An Over-the-Phone System for Cognitive Monitoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/solinsky23_interspeech.pdf) | +| 2656 | 5G-IoT Cloud based Demonstration of Real-Time Audio-Visual Speech Enhancement for Multimodal Hearing-aids | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://cogmhear.org/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gupta23b_interspeech.pdf) | +| 2671 | Towards Two-Point Neuron-Inspired Energy-Efficient Multimodal Open Master Hearing aid | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/raza23_interspeech.pdf) |
@@ -2271,16 +2271,16 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2614 | DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/Rikorose/DeepFilterNet?style=flat)](https://github.com/Rikorose/DeepFilterNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/schroter23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.08227-b31b1b.svg)](https://arxiv.org/abs/2305.08227) | -| 2615 | Nkululeko: Machine Learning Experiments on Speaker Characteristics without Programming | [![GitHub](https://img.shields.io/github/stars/felixbur/nkululeko?style=flat)](https://github.com/felixbur/nkululeko) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/burkhardt23_interspeech.pdf) | -| 2625 | Sp1NY: A Quick and Flexible Python Speech Visualization Tool | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lemaguer23_interspeech.pdf) | -| 2629 | Intonation Control for Neural Text-to-Speech Synthesis with Polynomial Models of F0 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/corkey23_interspeech.pdf) | -| 2634 | So-to-Speak: an Exploratory Platform for Investigating the Interplay between Style and Prosody in TTS | [![GitHub](https://img.shields.io/github/stars/evaszekely/So_To_Speak?style=flat)](https://github.com/evaszekely/So_To_Speak) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/szekely23b_interspeech.pdf) | -| 2638 | Comparing /b/ and /d/ with a Single Physical Model of the Human Vocal Tract to Visualize Droplets Produced while Speaking | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/arai23_interspeech.pdf) | -| 2640 | Show & Tell: Voice Activity Projection and Turn-taking | [![GitHub](https://img.shields.io/github/stars/ErikEkstedt/VoiceActivityProjection?style=flat)](https://github.com/ErikEkstedt/VoiceActivityProjection) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ekstedt23b_interspeech.pdf)| -| 2652 | Real-Time Detection of Soft Voice for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cordourier23_interspeech.pdf) | -| 2655 | Data Augmentation for Diverse Voice Conversion in Noisy Environments | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tanna23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10684-b31b1b.svg)](https://arxiv.org/abs/2305.10684) | -| 2667 | Application for Real-Time Audio-Visual Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gogate23_interspeech.pdf)| +| 2614 | DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/Rikorose/DeepFilterNet?style=flat)](https://github.com/Rikorose/DeepFilterNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/schroter23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.08227-b31b1b.svg)](https://arxiv.org/abs/2305.08227) | +| 2615 | Nkululeko: Machine Learning Experiments on Speaker Characteristics without Programming | [![GitHub](https://img.shields.io/github/stars/felixbur/nkululeko?style=flat)](https://github.com/felixbur/nkululeko) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/burkhardt23_interspeech.pdf) | +| 2625 | Sp1NY: A Quick and Flexible Python Speech Visualization Tool | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lemaguer23_interspeech.pdf) | +| 2629 | Intonation Control for Neural Text-to-Speech Synthesis with Polynomial Models of F0 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/corkey23_interspeech.pdf) | +| 2634 | So-to-Speak: an Exploratory Platform for Investigating the Interplay between Style and Prosody in TTS | [![GitHub](https://img.shields.io/github/stars/evaszekely/So_To_Speak?style=flat)](https://github.com/evaszekely/So_To_Speak) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/szekely23b_interspeech.pdf) | +| 2638 | Comparing /b/ and /d/ with a Single Physical Model of the Human Vocal Tract to Visualize Droplets Produced while Speaking | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/arai23_interspeech.pdf) | +| 2640 | Show & Tell: Voice Activity Projection and Turn-taking | [![GitHub](https://img.shields.io/github/stars/ErikEkstedt/VoiceActivityProjection?style=flat)](https://github.com/ErikEkstedt/VoiceActivityProjection) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ekstedt23b_interspeech.pdf)| +| 2652 | Real-Time Detection of Soft Voice for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cordourier23_interspeech.pdf) | +| 2655 | Data Augmentation for Diverse Voice Conversion in Noisy Environments | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tanna23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10684-b31b1b.svg)](https://arxiv.org/abs/2305.10684) | +| 2667 | Application for Real-Time Audio-Visual Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gogate23_interspeech.pdf)|
@@ -2292,17 +2292,17 @@ Contributions to improve the completeness of this list are greatly appreciated.
| :id: | **Title** | **Repo** | **Paper** |
|------|-----------|:--------:|:---------:|
-| 2623 | A Unified Framework to Improve Learners' Skills of Perception and Production based on Speech Shadowing and Overlapping | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/minematsu23_interspeech.pdf) |
-| 2633 | Speak & Improve: L2 English Speaking Practice Tool | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nicholls23_interspeech.pdf) |
-| 2641 | Measuring Prosody in Child Speech using SoapBox Fluency API | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nicolao23_interspeech.pdf) |
-| 2650 | Teaching Non-native Sound Contrasts using Visual Biofeedback | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nissen23_interspeech.pdf) |
-| 2654 | Large-Scale Automatic Audiobook Creation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/walsh23_interspeech.pdf) |
-| 2658 | QVoice: Arabic Speech Pronunciation Learning Application | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/elkheir23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2305.07445-b31b1b.svg)](https://arxiv.org/abs/2305.07445) |
-| 2659 | Asking Questions: an Innovative Way to Interact with Oral History Archives | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/svec23_interspeech.pdf) |
-| 2660 | DisfluencyFixer: A Tool to Enhance Language Learning through Speech to Speech Disfluency Correction | [![React](https://img.shields.io/badge/react-%2320232a.svg?style=for-the-badge&logo=react&logoColor=%2361DAFB)](https://www.cfilt.iitb.ac.in/speech2text/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bhat23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2305.16957-b31b1b.svg)](https://arxiv.org/abs/2305.16957) |
-| 2661 | Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/prakash23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2211.01338-b31b1b.svg)](https://arxiv.org/abs/2211.01338) |
-| 2668 | MyVoice: Arabic Speech Resource Collaboration Platform | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/elshahawy23_interspeech.pdf)|
-| 2669 | Personal Primer Prototype 1: Invitation to Make Your Own Embooked Speech-based Educational Artifact | [![GitHub](https://img.shields.io/github/stars/hromi/lesen-mikroserver?style=flat)](https://github.com/hromi/lesen-mikroserver) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hromada23_interspeech.pdf)<br>[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/371491906_Personal_Primer_Prototype_1_Invitation_to_Make_Your_Own_Embooked_Speech-Based_Educational_Artifact) |
+| 2623 | A Unified Framework to Improve Learners' Skills of Perception and Production based on Speech Shadowing and Overlapping | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/minematsu23_interspeech.pdf) |
+| 2633 | Speak & Improve: L2 English Speaking Practice Tool | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nicholls23_interspeech.pdf) |
+| 2641 | Measuring Prosody in Child Speech using SoapBox Fluency API | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nicolao23_interspeech.pdf) |
+| 2650 | Teaching Non-native Sound Contrasts using Visual Biofeedback | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nissen23_interspeech.pdf) |
+| 2654 | Large-Scale Automatic Audiobook Creation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/walsh23_interspeech.pdf) |
+| 2658 | QVoice: Arabic Speech Pronunciation Learning Application | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/elkheir23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2305.07445-b31b1b.svg)](https://arxiv.org/abs/2305.07445) |
+| 2659 | Asking Questions: an Innovative Way to Interact with Oral History Archives | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/svec23_interspeech.pdf) |
+| 2660 | DisfluencyFixer: A Tool to Enhance Language Learning through Speech to Speech Disfluency Correction | [![React](https://img.shields.io/badge/react-%2320232a.svg?style=for-the-badge&logo=react&logoColor=%2361DAFB)](https://www.cfilt.iitb.ac.in/speech2text/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bhat23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2305.16957-b31b1b.svg)](https://arxiv.org/abs/2305.16957) |
+| 2661 | Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/prakash23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2211.01338-b31b1b.svg)](https://arxiv.org/abs/2211.01338) |
+| 2668 | MyVoice: Arabic Speech Resource Collaboration Platform | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/elshahawy23_interspeech.pdf)|
+| 2669 | Personal Primer Prototype 1: Invitation to Make Your Own Embooked Speech-based Educational Artifact | [![GitHub](https://img.shields.io/github/stars/hromi/lesen-mikroserver?style=flat)](https://github.com/hromi/lesen-mikroserver) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hromada23_interspeech.pdf)<br>[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/371491906_Personal_Primer_Prototype_1_Invitation_to_Make_Your_Own_Embooked_Speech-Based_Educational_Artifact) |
@@ -2314,18 +2314,18 @@ Contributions to improve the completeness of this list are greatly appreciated.
| :id: | **Title** | **Repo** | **Paper** |
|------|-----------|:--------:|:---------:|
-| 2621 | Let's Give a Voice to Conversational Agents in Virtual Reality | [![GitHub](https://img.shields.io/github/stars/sislab-unitn/Let-s-Give-a-Voice-to-Conversational-Agents-in-VR?style=flat)](https://github.com/sislab-unitn/Let-s-Give-a-Voice-to-Conversational-Agents-in-VR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yin23b_interspeech.pdf) |
-| 2622 | FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator | :heavy_minus_sign: |[![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/baali23b_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2306.07936-b31b1b.svg)](https://arxiv.org/abs/2306.07936) |
-| 2637 | Video Summarization Leveraging Multimodal Information for Presentations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23x_interspeech.pdf) |
-| 2645 | What Questions are My Customers Asking?: Towards Actionable Insights from Customer Questions in Contact Center Calls | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nathan23_interspeech.pdf) |
-| 2646 | COnVoy: A Contact Center Operated Pipeline for Voice of Customer Discovery | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tripathi23_interspeech.pdf) |
-| 2653 | NeMo Forced Aligner and its Application to Word Alignment for Subtitle Generation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rastorgueva23_interspeech.pdf) |
-| 2662 | CauSE: Causal Search Engine for Understanding Contact-Center Conversations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pattnaik23_interspeech.pdf)|
-| 2663 | Tailored Real-Time Call Summarization System for Contact Centers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sachdeva23_interspeech.pdf) |
-| 2647 | Federated Learning Toolkit with Voice-based User Verification Demo | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mandke23_interspeech.pdf) |
-| 2657 | Learning when to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models | [![GitHub](https://img.shields.io/github/stars/liamdugan/speech-to-speech?style=flat)](https://github.com/liamdugan/speech-to-speech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/dugan23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2306.01201-b31b1b.svg)](https://arxiv.org/abs/2306.01201) |
-| 2628 | Fast Enrollable Streaming Keyword Spotting System: Training and Inference using a Web Browser | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cho23b_interspeech.pdf) |
-| 2665 | Cross-Lingual/Cross-Channel Intent Detection in Contact-Center Conversations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/agrawal23b_interspeech.pdf) |
+| 2621 | Let's Give a Voice to Conversational Agents in Virtual Reality | [![GitHub](https://img.shields.io/github/stars/sislab-unitn/Let-s-Give-a-Voice-to-Conversational-Agents-in-VR?style=flat)](https://github.com/sislab-unitn/Let-s-Give-a-Voice-to-Conversational-Agents-in-VR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yin23b_interspeech.pdf) |
+| 2622 | FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator | :heavy_minus_sign: |[![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/baali23b_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2306.07936-b31b1b.svg)](https://arxiv.org/abs/2306.07936) |
+| 2637 | Video Summarization Leveraging Multimodal Information for Presentations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23x_interspeech.pdf) |
+| 2645 | What Questions are My Customers Asking?: Towards Actionable Insights from Customer Questions in Contact Center Calls | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nathan23_interspeech.pdf) |
+| 2646 | COnVoy: A Contact Center Operated Pipeline for Voice of Customer Discovery | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tripathi23_interspeech.pdf) |
+| 2653 | NeMo Forced Aligner and its Application to Word Alignment for Subtitle Generation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rastorgueva23_interspeech.pdf) |
+| 2662 | CauSE: Causal Search Engine for Understanding Contact-Center Conversations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pattnaik23_interspeech.pdf)|
+| 2663 | Tailored Real-Time Call Summarization System for Contact Centers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sachdeva23_interspeech.pdf) |
+| 2647 | Federated Learning Toolkit with Voice-based User Verification Demo | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mandke23_interspeech.pdf) |
+| 2657 | Learning when to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models | [![GitHub](https://img.shields.io/github/stars/liamdugan/speech-to-speech?style=flat)](https://github.com/liamdugan/speech-to-speech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/dugan23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2306.01201-b31b1b.svg)](https://arxiv.org/abs/2306.01201) |
+| 2628 | Fast Enrollable Streaming Keyword Spotting System: Training and Inference using a Web Browser | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cho23b_interspeech.pdf) |
+| 2665 | Cross-Lingual/Cross-Channel Intent Detection in Contact-Center Conversations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/agrawal23b_interspeech.pdf) |
---