From 44371bd07496c638dc9e5bbbd681581e6506a354 Mon Sep 17 00:00:00 2001 From: MaloMn Date: Thu, 22 Feb 2024 10:43:06 +0100 Subject: [PATCH] Updated links to current archived urls for ISCA papers --- README.md | 2284 ++++++++++++++++++++++++++--------------------------- 1 file changed, 1142 insertions(+), 1142 deletions(-) diff --git a/README.md b/README.md index 75d7037..15771fd 100644 --- a/README.md +++ b/README.md @@ -282,12 +282,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1686 | Multimodal Personality Traits Assessment (MuPTA) Corpus: The Impact of Spontaneous and Read Speech | [![GitHub](https://img.shields.io/github/stars/DmitryRyumin/OCEANAI?style=flat)](https://github.com/DmitryRyumin/OCEANAI)
[![Documentation Status](https://readthedocs.org/projects/oceanai/badge/?version=latest)](https://oceanai.readthedocs.io/en/latest/?badge=latest)
[![PyPI](https://img.shields.io/pypi/v/oceanai)](https://pypi.org/project/oceanai/)
[![MuPTA](https://img.shields.io/badge/MuPTA-dataset-20BEFF.svg)](https://hci.nw.ru/en/pages/mupta-corpus) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ryumina23_interspeech.pdf) | -| 1049 | MOCKS 1.0: Multilingual Open Custom Keyword Spotting Testset | [![Hugging Face](https://img.shields.io/badge/🤗-MOCKS-FFD21F.svg)](https://huggingface.co/datasets/voiceintelligenceresearch/MOCKS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pudo23_interspeech.pdf) | -| 2150 | MD3: The Multi-Dialect Dataset of Dialogues | [![Kaggle](https://img.shields.io/badge/kaggle-dataset-20BEFF.svg)](https://www.kaggle.com/datasets/jacobeis99/md3en) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/eisenstein23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11355-b31b1b.svg)](https://arxiv.org/abs/2305.11355) | -| 2279 | MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation | [![GitHub](https://img.shields.io/github/stars/facebookresearch/muavic?style=flat)](https://github.com/facebookresearch/muavic) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/anwar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00628-b31b1b.svg)](https://arxiv.org/abs/2303.00628) | -| 1828 | Thai Dialect Corpus and Transfer-based Curriculum Learning Investigation for Dialect Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/suwanbandit23_interspeech.pdf) | -| 2351 | HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xiao23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11252-b31b1b.svg)](https://arxiv.org/abs/2306.11252) | +| 1686 | Multimodal Personality Traits Assessment (MuPTA) Corpus: The Impact of Spontaneous and Read Speech | [![GitHub](https://img.shields.io/github/stars/DmitryRyumin/OCEANAI?style=flat)](https://github.com/DmitryRyumin/OCEANAI)
[![Documentation Status](https://readthedocs.org/projects/oceanai/badge/?version=latest)](https://oceanai.readthedocs.io/en/latest/?badge=latest)
[![PyPI](https://img.shields.io/pypi/v/oceanai)](https://pypi.org/project/oceanai/)
[![MuPTA](https://img.shields.io/badge/MuPTA-dataset-20BEFF.svg)](https://hci.nw.ru/en/pages/mupta-corpus) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ryumina23_interspeech.pdf) | +| 1049 | MOCKS 1.0: Multilingual Open Custom Keyword Spotting Testset | [![Hugging Face](https://img.shields.io/badge/🤗-MOCKS-FFD21F.svg)](https://huggingface.co/datasets/voiceintelligenceresearch/MOCKS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pudo23_interspeech.pdf) | +| 2150 | MD3: The Multi-Dialect Dataset of Dialogues | [![Kaggle](https://img.shields.io/badge/kaggle-dataset-20BEFF.svg)](https://www.kaggle.com/datasets/jacobeis99/md3en) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/eisenstein23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11355-b31b1b.svg)](https://arxiv.org/abs/2305.11355) | +| 2279 | MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation | [![GitHub](https://img.shields.io/github/stars/facebookresearch/muavic?style=flat)](https://github.com/facebookresearch/muavic) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/anwar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00628-b31b1b.svg)](https://arxiv.org/abs/2303.00628) | +| 1828 | Thai Dialect Corpus and Transfer-based Curriculum Learning Investigation for Dialect Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/suwanbandit23_interspeech.pdf) | +| 2351 | HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xiao23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11252-b31b1b.svg)](https://arxiv.org/abs/2306.11252) | @@ -299,12 +299,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 749 | Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03594-b31b1b.svg)](https://arxiv.org/abs/2306.03594) | -| 1292 | Speech Synthesis with Self-Supervisedly Learnt Prosodic Representations | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ttsbylzc.github.io/ttsdemo202303/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23n_interspeech.pdf) | -| 1317 | EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00648-b31b1b.svg)](https://arxiv.org/abs/2306.00648) | -| 806 | Laughter Synthesis using Pseudo Phonetic Tokens with a Large-Scale In-the-Wild Laughter Corpus | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://aria-k-alethia.github.io/2023laughter-demo/)
[![GitHub](https://img.shields.io/github/stars/Aria-K-Alethia/laughter-synthesis?style=flat)](https://github.com/Aria-K-Alethia/laughter-synthesis)
[![Laughterscape](https://img.shields.io/badge/Laughterscape-corpus-20BEFF.svg)](https://sites.google.com/site/shinnosuketakamichi/research-topics/laughter_corpus) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xin23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12442-b31b1b.svg)](https://arxiv.org/abs/2305.12442) | -| 2270 | Explicit Intensity Control for Accented Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ttslr.github.io/Ai-TTS/)
[![GitHub](https://img.shields.io/github/stars/ttslr/Ai-TTS?style=flat)](https://github.com/ttslr/Ai-TTS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23u_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15364-b31b1b.svg)](https://arxiv.org/abs/2210.15364) | -| 834 | Comparing Normalizing Flows and Diffusion Models for Prosody and Acoustic Modelling in Text-to-Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23o_interspeech.pdf) | +| 749 | Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03594-b31b1b.svg)](https://arxiv.org/abs/2306.03594) | +| 1292 | Speech Synthesis with Self-Supervisedly Learnt Prosodic Representations | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ttsbylzc.github.io/ttsdemo202303/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23n_interspeech.pdf) | +| 1317 | EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00648-b31b1b.svg)](https://arxiv.org/abs/2306.00648) | +| 806 | Laughter Synthesis using Pseudo Phonetic Tokens with a Large-Scale In-the-Wild Laughter Corpus | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://aria-k-alethia.github.io/2023laughter-demo/)
[![GitHub](https://img.shields.io/github/stars/Aria-K-Alethia/laughter-synthesis?style=flat)](https://github.com/Aria-K-Alethia/laughter-synthesis)
[![Laughterscape](https://img.shields.io/badge/Laughterscape-corpus-20BEFF.svg)](https://sites.google.com/site/shinnosuketakamichi/research-topics/laughter_corpus) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xin23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12442-b31b1b.svg)](https://arxiv.org/abs/2305.12442) | +| 2270 | Explicit Intensity Control for Accented Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ttslr.github.io/Ai-TTS/)
[![GitHub](https://img.shields.io/github/stars/ttslr/Ai-TTS?style=flat)](https://github.com/ttslr/Ai-TTS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23u_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15364-b31b1b.svg)](https://arxiv.org/abs/2210.15364) | +| 834 | Comparing Normalizing Flows and Diffusion Models for Prosody and Acoustic Modelling in Text-to-Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23o_interspeech.pdf) |
@@ -316,12 +316,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2484 | Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/duquenne23_interspeech.pdf) | -| 1063 | Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13204-b31b1b.svg)](https://arxiv.org/abs/2305.13204) | -| 648 | StyleS2ST: Zero-Shot Style Transfer for Direct Speech-to-Speech Translation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://styles2st.github.io/StyleS2ST/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/song23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17732-b31b1b.svg)](https://arxiv.org/abs/2305.17732) | -| 1767 | Joint Speech Translation and Named Entity Recognition | [![GitHub](https://img.shields.io/github/stars/hlt-mt/FBK-fairseq?style=flat)](https://github.com/hlt-mt/FBK-fairseq/blob/master/fbk_works/JOINT_ST_NER2023.md) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gaido23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.11987-b31b1b.svg)](https://arxiv.org/abs/2210.11987) | -| 2050 | Analysis of Acoustic Information in End-to-End Spoken Language Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sant23_interspeech.pdf) | -| 2004 | LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model using Neural Transducers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23oa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02809-b31b1b.svg)](https://arxiv.org/abs/2211.02809) | +| 2484 | Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/duquenne23_interspeech.pdf) | +| 1063 | Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13204-b31b1b.svg)](https://arxiv.org/abs/2305.13204) | +| 648 | StyleS2ST: Zero-Shot Style Transfer for Direct Speech-to-Speech Translation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://styles2st.github.io/StyleS2ST/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/song23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17732-b31b1b.svg)](https://arxiv.org/abs/2305.17732) | +| 1767 | Joint Speech Translation and Named Entity Recognition | [![GitHub](https://img.shields.io/github/stars/hlt-mt/FBK-fairseq?style=flat)](https://github.com/hlt-mt/FBK-fairseq/blob/master/fbk_works/JOINT_ST_NER2023.md) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gaido23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.11987-b31b1b.svg)](https://arxiv.org/abs/2210.11987) | +| 2050 | Analysis of Acoustic Information in End-to-End Spoken Language Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sant23_interspeech.pdf) | +| 2004 | LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model using Neural Transducers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23oa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02809-b31b1b.svg)](https://arxiv.org/abs/2211.02809) |
@@ -333,12 +333,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1213 | DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models | [![GitHub](https://img.shields.io/github/stars/pyf98/DPHuBERT?style=flat)](https://github.com/pyf98/DPHuBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17651-b31b1b.svg)](https://arxiv.org/abs/2305.17651) | -| 1040 | Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations | [![GitHub](https://img.shields.io/github/stars/salah-zaiem/augmentations_adaptation?style=flat)](https://github.com/salah-zaiem/augmentations_adaptation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zaiem23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00481-b31b1b.svg)](https://arxiv.org/abs/2306.00481) | -| 387 | Dual Acoustic Linguistic Self-Supervised Representation Learning for Cross-Domain Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23e_interspeech.pdf) | -| 2166 | O-1: Self-Training with Oracle and 1-best Hypothesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/baskar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2308.07486-b31b1b.svg)](https://arxiv.org/abs/2308.07486) | -| 822 | MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets | [![GitHub](https://img.shields.io/github/stars/ddlBoJack/MT4SSL?style=flat)](https://github.com/ddlBoJack/MT4SSL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ma23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.07321-b31b1b.svg)](https://arxiv.org/abs/2211.07321) | -| 1802 | Comparing Self-Supervised Pre-Training and Semi-Supervised Training for Speech Recognition in Languages with Weak Language Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lamyeemui23_interspeech.pdf) | +| 1213 | DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models | [![GitHub](https://img.shields.io/github/stars/pyf98/DPHuBERT?style=flat)](https://github.com/pyf98/DPHuBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/peng23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17651-b31b1b.svg)](https://arxiv.org/abs/2305.17651) | +| 1040 | Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations | [![GitHub](https://img.shields.io/github/stars/salah-zaiem/augmentations_adaptation?style=flat)](https://github.com/salah-zaiem/augmentations_adaptation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zaiem23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00481-b31b1b.svg)](https://arxiv.org/abs/2306.00481) | +| 387 | Dual Acoustic Linguistic Self-Supervised Representation Learning for Cross-Domain Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23e_interspeech.pdf) | +| 2166 | O-1: Self-Training with Oracle and 1-best Hypothesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/baskar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2308.07486-b31b1b.svg)](https://arxiv.org/abs/2308.07486) | +| 822 | MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets | [![GitHub](https://img.shields.io/github/stars/ddlBoJack/MT4SSL?style=flat)](https://github.com/ddlBoJack/MT4SSL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ma23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.07321-b31b1b.svg)](https://arxiv.org/abs/2211.07321) | +| 1802 | Comparing Self-Supervised Pre-Training and Semi-Supervised Training for Speech Recognition in Languages with Weak Language Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lamyeemui23_interspeech.pdf) |
@@ -350,12 +350,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1781 | Chinese EFL Learners' Perception of English Prosodic Focus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23da_interspeech.pdf) | -| 315 | Pitch Accent Variation and the Interpretation of Rising and Falling Intonation in American English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sostarics23_interspeech.pdf) | -| 1033 | Tonal Coarticulation as a Cue for Upcoming Prosodic Boundary | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kuang23_interspeech.pdf) | -| 2116 | Alignment of Beat Gestures and Prosodic Prominence in German | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/repp23_interspeech.pdf) | -| 1454 | Creak Prevalence and Prosodic Context in Australian English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/white23_interspeech.pdf) | -| 1651 | Speech Reduction: Position within French Prosodic Structure | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bodur23_interspeech.pdf) | +| 1781 | Chinese EFL Learners' Perception of English Prosodic Focus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23da_interspeech.pdf) | +| 315 | Pitch Accent Variation and the Interpretation of Rising and Falling Intonation in American English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sostarics23_interspeech.pdf) | +| 1033 | Tonal Coarticulation as a Cue for Upcoming Prosodic Boundary | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kuang23_interspeech.pdf) | +| 2116 | Alignment of Beat Gestures and Prosodic Prominence in German | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/repp23_interspeech.pdf) | +| 1454 | Creak Prevalence and Prosodic Context in Australian English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/white23_interspeech.pdf) | +| 1651 | Speech Reduction: Position within French Prosodic Structure | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bodur23_interspeech.pdf) | @@ -367,10 +367,10 @@ Contributions to improve the completeness of this list are greatly appreciated. 
| :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 637 | Transvelar Nasal Coupling Contributing to Speaker Characteristics in Non-nasal Vowels | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhu23d_interspeech.pdf) | -| 286 | Speech Synthesis from Articulatory Movements Recorded by Real-time MRI | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/otani23_interspeech.pdf) | -| 2283 | The ART of Conversation: Measuring Phonetic Convergence and Deliberate Imitation in L2-Speech with a Siamese RNN | [![GitHub](https://img.shields.io/github/stars/byronthecoder/S-RNN-4-ART?style=flat)](https://github.com/byronthecoder/S-RNN-4-ART) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yuan23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05088-b31b1b.svg)](https://arxiv.org/abs/2306.05088) | -| 1933 | Did You See that? Exploring the Role of Vision in the Development of Consonant Feature Contrasts in Children with Cochlear Implants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mahshie23_interspeech.pdf) | +| 637 | Transvelar Nasal Coupling Contributing to Speaker Characteristics in Non-nasal Vowels | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhu23d_interspeech.pdf) | +| 286 | Speech Synthesis from Articulatory Movements Recorded by Real-time MRI | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/otani23_interspeech.pdf) | +| 2283 | The ART of Conversation: Measuring Phonetic Convergence and Deliberate Imitation in L2-Speech with a Siamese RNN | [![GitHub](https://img.shields.io/github/stars/byronthecoder/S-RNN-4-ART?style=flat)](https://github.com/byronthecoder/S-RNN-4-ART) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yuan23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05088-b31b1b.svg)](https://arxiv.org/abs/2306.05088) | +| 1933 | Did You See that? Exploring the Role of Vision in the Development of Consonant Feature Contrasts in Children with Cochlear Implants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mahshie23_interspeech.pdf) |
@@ -382,12 +382,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2017 | Automatic Assessments of Dysarthric Speech: the Usability of Acoustic-Phonetic Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vanbemmel23_interspeech.pdf) | -| 1455 | Classification of Multi-class Vowels and Fricatives from Patients Having Amyotrophic Lateral Sclerosis with Varied Levels of Dysarthria Severity | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/venkatathirumalakumar23_interspeech.pdf) | -| 1627 | Parameter-efficient Dysarthric Speech Recognition using Adapter Fusion and Householder Transformation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/qi23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07090-b31b1b.svg)](https://arxiv.org/abs/2306.07090) | -| 2481 | Few-Shot Dysarthric Speech Recognition with Text-to-Speech Data Augmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hermann23_interspeech.pdf)
[![idiap](https://img.shields.io/badge/idiap.ch.5055-FF6A00.svg)](http://publications.idiap.ch/index.php/publications/show/5055) | -| 1921 | Latent Phrase Matching for Dysarthric Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yee23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05446-b31b1b.svg)](https://arxiv.org/abs/2306.05446) | -| 173 | Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification | [![GitHub](https://img.shields.io/github/stars/juice500ml/dysarthria-gop?style=flat)](https://github.com/juice500ml/dysarthria-gop) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yeo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18392-b31b1b.svg)](https://arxiv.org/abs/2305.18392) | +| 2017 | Automatic Assessments of Dysarthric Speech: the Usability of Acoustic-Phonetic Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vanbemmel23_interspeech.pdf) | +| 1455 | Classification of Multi-class Vowels and Fricatives from Patients Having Amyotrophic Lateral Sclerosis with Varied Levels of Dysarthria Severity | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/venkatathirumalakumar23_interspeech.pdf) | +| 1627 | Parameter-efficient Dysarthric Speech Recognition using Adapter Fusion and Householder Transformation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/qi23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07090-b31b1b.svg)](https://arxiv.org/abs/2306.07090) | +| 2481 | Few-Shot Dysarthric Speech Recognition with Text-to-Speech Data Augmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hermann23_interspeech.pdf)
[![idiap](https://img.shields.io/badge/idiap.ch.5055-FF6A00.svg)](http://publications.idiap.ch/index.php/publications/show/5055) | +| 1921 | Latent Phrase Matching for Dysarthric Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yee23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05446-b31b1b.svg)](https://arxiv.org/abs/2306.05446) | +| 173 | Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification | [![GitHub](https://img.shields.io/github/stars/juice500ml/dysarthria-gop?style=flat)](https://github.com/juice500ml/dysarthria-gop) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yeo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18392-b31b1b.svg)](https://arxiv.org/abs/2305.18392) |
@@ -399,10 +399,10 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1562 | CQNV: A Combination of Coarsely Quantized Bitstream and Neural Vocoder for Low Rate Speech Coding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zheng23c_interspeech.pdf) | -| 1234 | Target Speech Extraction with Conditional Diffusion Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kamo23_interspeech.pdf) | -| 883 | Towards Fully Quantized Neural Networks For Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/ssi-research/FQSE?style=flat)](https://github.com/ssi-research/FQSE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cohen23_interspeech.pdf) | -| 980 | Complex Image Generation SwinTransformer Network for Audio Denoising | [![GitHub](https://img.shields.io/github/stars/YoushanZhang/CoxImgSwinTransformer?style=flat)](https://github.com/YoushanZhang/CoxImgSwinTransformer) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23p_interspeech.pdf) | +| 1562 | CQNV: A Combination of Coarsely Quantized Bitstream and Neural Vocoder for Low Rate Speech Coding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zheng23c_interspeech.pdf) | +| 1234 | Target Speech Extraction with Conditional Diffusion Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kamo23_interspeech.pdf) | +| 883 | Towards Fully Quantized Neural Networks For Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/ssi-research/FQSE?style=flat)](https://github.com/ssi-research/FQSE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cohen23_interspeech.pdf) | +| 980 | Complex Image Generation SwinTransformer Network for Audio Denoising | [![GitHub](https://img.shields.io/github/stars/YoushanZhang/CoxImgSwinTransformer?style=flat)](https://github.com/YoushanZhang/CoxImgSwinTransformer) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23p_interspeech.pdf) | @@ -414,81 +414,81 @@ Contributions to improve the completeness of this list are greatly appreciated. 
| :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2118 | Using Text Injection to Improve Recognition of Personal Identifiers in Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/blau23_interspeech.pdf) | -| 837 | Investigating Wav2Vec2 Context Representations and the Effects of Fine-Tuning, a Case-Study of a Finnish Model | [![GitHub](https://img.shields.io/github/stars/aalto-speech/Wav2vec2Interpretation?style=flat)](https://github.com/aalto-speech/Wav2vec2Interpretation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/grosz23_interspeech.pdf) | -| 872 | Transformer-based Speech Recognition Models for Oral History Archives in English, German, and Czech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lehecka23_interspeech.pdf) | -| 177 | Iteratively Improving Speech Recognition and Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://demosamplesites.github.io/IterativeASR_VC/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/singh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15055-b31b1b.svg)](https://arxiv.org/abs/2305.15055) | -| 2001 | LABERT: A Combination of Local Aggregation and Self-Supervised Speech Representation Learning for Detecting Informative Hidden Units in Low-Resource ASR Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fatehi23_interspeech.pdf)
[![nottingham-repo](https://img.shields.io/badge/nottingham-22183323-1A296B.svg)](https://nottingham-repository.worktribe.com/output/22183323) | -| 746 | TranUSR: Phoneme-to-Word Transcoder based Unified Speech Representation Learning for Cross-Lingual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xue23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13629-b31b1b.svg)](https://arxiv.org/abs/2305.13629) | -| 1124 | Dual-Mode NAM: Effective Top-K Context Injection for End-to-End ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23e_interspeech.pdf) | -| 2417 | GhostRNN: Reducing State Redundancy in RNN with Cheap Operations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhou23g_interspeech.pdf) | -| 1442 | Task-Agnostic Structured Pruning of Speech Representation Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23da_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01385-b31b1b.svg)](https://arxiv.org/abs/2306.01385) | -| 485 | Factual Consistency Oriented Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kanda23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12369-b31b1b.svg)](https://arxiv.org/abs/2302.12369) | -| 1036 | Multi-Head State Space Model for Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fathullah23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12498-b31b1b.svg)](https://arxiv.org/abs/2305.12498) | -| 341 | Cascaded Multi-task Adaptive Learning based on Neural Architecture Search | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23_interspeech.pdf) | -| 2359 | Probing Self-Supervised Speech Models for Phonetic and Phonemic Information: A Case Study in Aspiration | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/martin23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06232-b31b1b.svg)](https://arxiv.org/abs/2306.06232) | -| 739 | Selective Biasing with Trie-based Contextual Adapters for Personalised Speech Recognition using Neural Transducers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/harding23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/selective-biasing-with-trie-based-contextual-adapters-for-personalised-speech-recognition-using-neural-transducers) | -| 213 | A More Accurate Internal Language Model Score Estimation for the Hybrid Autoregressive Transducer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23b_interspeech.pdf) | -| 106 | Attention Gate between Capsules in Fully Capsule-Network Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23_interspeech.pdf) | -|2585 | OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bengaliai.github.io/asr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rakib23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.09688-b31b1b.svg)](https://arxiv.org/abs/2305.09688) | -| 1316 | ML-SUPERB: Multilingual Speech Universal PERformance Benchmark | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/espnet/espnet/tree/master/egs2/ml_superb/asr1) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10615-b31b1b.svg)](https://arxiv.org/abs/2305.10615) | -| 2389 | General-purpose Adversarial Training for Enhanced Automatic Speech Recognition Model Generalization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23l_interspeech.pdf) | -| 275 | Joint Instance Reconstruction and Feature Sub-space Alignment for Cross-Domain Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhao23_interspeech.pdf) | -| 2280 | Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/moriya23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15971-b31b1b.svg)](https://arxiv.org/abs/2305.15971) | -| 1272 | Random Utterance Concatenation based Data Augmentation for Improving Short-Video Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15876-b31b1b.svg)](https://arxiv.org/abs/2210.15876) | -| 1189 | Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers | [![GitHub](https://img.shields.io/github/stars/NMS05/Adapter-Incremental-Continual-Learning-AST?style=flat)](https://github.com/NMS05/Adapter-Incremental-Continual-Learning-AST) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/muthuchamyselvaraj23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14314-b31b1b.svg)](https://arxiv.org/abs/2302.14314) | -| 223 | Rethinking Speech Recognition with a Multimodal Perspective via Acoustic and Semantic Cooperative Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14049-b31b1b.svg)](https://arxiv.org/abs/2305.14049) | -| 923 | Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://liangzheng-zl.github.io/bedit-web/)
[![GitHub](https://img.shields.io/github/stars/Liangzheng-ZL/BEdit-TTS?style=flat)](https://github.com/Liangzheng-ZL/BEdit-TTS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08588-b31b1b.svg)](https://arxiv.org/abs/2306.08588) | -| 2258 | Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01031-b31b1b.svg)](https://arxiv.org/abs/2306.01031) | -| 1184 | DCCRN-KWS: An Audio Bias based Model for Noise Robust Small-Footprint Keyword Spotting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lv23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12331-b31b1b.svg)](https://arxiv.org/abs/2305.12331) | -| 1609 | OTF: Optimal Transport based Fusion of Supervised and Self-Supervised Learning Models for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02541-b31b1b.svg)](https://arxiv.org/abs/2306.02541) | -| 2136 | Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bleeker23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2304.08862-b31b1b.svg)](https://arxiv.org/abs/2304.08862) | -| 788 | Rehearsal-Free Online Continual Learning for Automatic Speech Recognition | [![GitHub](https://img.shields.io/github/stars/StevenVdEeckt/online-cl-for-asr?style=flat)](https://github.com/StevenVdEeckt/online-cl-for-asr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vandereeckt23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.10860-b31b1b.svg)](https://arxiv.org/abs/2306.10860) | -| 496 | ASR Data Augmentation in Low-Resource Settings using Cross-Lingual Multi-Speaker TTS and Cross-Lingual Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/Edresson/Wav2Vec-Wrapper/tree/main/Papers/TTS-Augmentation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/casanova23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2204.00618-b31b1b.svg)](https://arxiv.org/abs/2204.00618) | -| 642 | Personality-aware Training based Speaker Adaptation for End-to-End Speech Recognition | [![GitHub](https://img.shields.io/github/stars/shibeiing/Personality-aware-Training-PAT?style=flat)](https://github.com/shibeiing/Personality-aware-Training-PAT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gu23_interspeech.pdf) | -| 2257 | Target Vocabulary Recognition based on Multi-task Learning with Decomposed Teacher Sequences | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ito23b_interspeech.pdf) | -| 679 | Wave to Syntax: Probing Spoken Language Models for Syntax | [![GitHub](https://img.shields.io/github/stars/techsword/wave-to-syntax?style=flat)](https://github.com/techsword/wave-to-syntax) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18957-b31b1b.svg)](https://arxiv.org/abs/2305.18957) | -| 720 | Effective Training of Attention-based Contextual Biasing Adapters with Synthetic Audio for Personalised ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/naowarat23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/effective-training-of-attention-based-contextual-biasing-adapters-with-synthetic-audio-for-personalised-asr) | -| 630 | Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ma23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08920-b31b1b.svg)](https://arxiv.org/abs/2306.08920) | -| 1118 | SlothSpeech: Denial-of-Service Attack Against Speech Recognition Models | [![GitHub](https://img.shields.io/github/stars/0xrutvij/SlothSpeech?style=flat)](https://github.com/0xrutvij/SlothSpeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/haque23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00794-b31b1b.svg)](https://arxiv.org/abs/2306.00794) | -| 503 | CLRL-Tuning: A Novel Continual Learning Approach for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23d_interspeech.pdf) | -| 159 | Exploring Sources of Racial Bias in Automatic Speech Recognition through the Lens of Rhythmic Variation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lai23_interspeech.pdf) | -| 1440 | Can Contextual Biasing Remain Effective with Whisper and GPT-2? | [![GitHub](https://img.shields.io/github/stars/BriansIDP/WhisperBiasing?style=flat)](https://github.com/BriansIDP/WhisperBiasing) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sun23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01942-b31b1b.svg)](https://arxiv.org/abs/2306.01942) | -| 221 | Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/nttcslab/m2d/tree/master/speech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/niizumi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14079-b31b1b.svg)](https://arxiv.org/abs/2305.14079) | -| 2207 | Improving RNN Transducer Acoustic Models for English Conversational Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cui23c_interspeech.pdf) | -| 1216 | MixRep: Hidden Representation Mixup for Low-Resource Speech Recognition | [![GitHub](https://img.shields.io/github/stars/jiamin1013/mixrep-espnet?style=flat)](https://github.com/jiamin1013/mixrep-espnet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xie23_interspeech.pdf) | -| 1192 | Improving Chinese Mandarin Speech Recognition using Graph Embedding Regularization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23h_interspeech.pdf) | -| 1276 | Adapting Multi-Lingual ASR Models for Handling Multiple Talkers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18747-b31b1b.svg)](https://arxiv.org/abs/2305.18747) | -| 1221 | Adapter-Tuning with Effective Token-Dependent Representation Shift for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ng23c_interspeech.pdf) | -| 1010 | Model-Internal Slot-Triggered Biasing for Domain Expansion in Neural Transducer ASR Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lu23c_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/model-internal-slot-triggered-biasing-for-domain-expansion-in-neural-transducer-asr-models) | -| 2508 | Delay-Penalized CTC Implemented based on Finite State Transducer | [![GitHub](https://img.shields.io/github/stars/k2-fsa/k2?style=flat)](https://github.com/k2-fsa/k2) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yao23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11539-b31b1b.svg)](https://arxiv.org/abs/2305.11539) | -| 2589 | Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR | [![GitHub](https://img.shields.io/github/stars/AI4Bharat/vistaar?style=flat)](https://github.com/AI4Bharat/vistaar) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bhogale23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15386-b31b1b.svg)](https://arxiv.org/abs/2305.15386) | -| 1091 | Domain Adaptive Self-Supervised Training of Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/do23_interspeech.pdf) | -| 1105 | There is more than One Kind of Robustness: Fooling Whisper with Adversarial Examples | [![GitHub](https://img.shields.io/github/stars/RaphaelOlivier/whisper_attack?style=flat)](https://github.com/RaphaelOlivier/whisper_attack) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/olivier23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17316-b31b1b.svg)](https://arxiv.org/abs/2210.17316) | -| 1064 | MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations | [![GitHub](https://img.shields.io/github/stars/CHeggan/MT-SLVR?style=flat)](https://github.com/CHeggan/MT-SLVR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/heggan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17191-b31b1b.svg)](https://arxiv.org/abs/2305.17191) | -| 1176 | Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23l_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06672-b31b1b.svg)](https://arxiv.org/abs/2306.06672) | -| 759 | Blank-Regularized CTC for Frame Skipping in Neural Transducer | [![GitHub](https://img.shields.io/github/stars/k2-fsa/k2?style=flat)](https://github.com/k2-fsa/k2) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23l_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11558-b31b1b.svg)](https://arxiv.org/abs/2305.11558) | -| 2406 | The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jayakumar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19584-b31b1b.svg)](https://arxiv.org/abs/2305.19584) | -| 2354 | Improving RNN-Transducers with Acoustic LookAhead | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/unni23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.05006-b31b1b.svg)](https://arxiv.org/abs/2307.05006) | -| 1847 | Everyone has an Accent | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/markl23_interspeech.pdf) | -| 2124 | Some Voices are too Common: Building Fair Speech Recognition Systems using the Common-Voice Dataset | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/maison23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03773-b31b1b.svg)](https://arxiv.org/abs/2306.03773) | -| 1168 | Information Magnitude based Dynamic Sub-Sampling for Speech-to-Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23u_interspeech.pdf) | -| 353 | Towards Multi-task Learning of Speech and Speaker Recognition | [![GitHub](https://img.shields.io/github/stars/nikvaessen/disjoint-mtl?style=flat)](https://github.com/nikvaessen/disjoint-mtl) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vaessen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12773-b31b1b.svg)](https://arxiv.org/abs/2302.12773) | -| 2186 | Regarding Topology and Variant Frame Rates for Differentiable WFST-based End-to-End ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhao23f_interspeech.pdf) | -| 1012 | 2-bit Conformer Quantization for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rybakov23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16619-b31b1b.svg)](https://arxiv.org/abs/2305.16619) | -| 167 | Time-Domain Speech Enhancement for Robust Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.13318-b31b1b.svg)](https://arxiv.org/abs/2210.13318) | -| 257 | Multi-Channel Multi-Speaker Transformer for Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yifan23_interspeech.pdf) | -| 733 | Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ye23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15875-b31b1b.svg)](https://arxiv.org/abs/2306.15875) | -| 2463 | Dialect Speech Recognition Modeling using Corpus of Japanese Dialects and Self-Supervised Learning-based Model XLSR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/miwa23_interspeech.pdf) | -| 767 | Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12493-b31b1b.svg)](https://arxiv.org/abs/2305.12493) | -| 970 | Competitive and Resource Efficient Factored Hybrid HMM Systems are Simpler Than You Think | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/raissi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12493-b31b1b.svg)](https://arxiv.org/abs/2305.12493) | -| 791 | MMSpeech: Multi-Modal Multi-Task Encoder-Decoder Pre-training for Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhou23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.00500-b31b1b.svg)](https://arxiv.org/abs/2212.00500) | -| 2499 | Biased Self-Supervised Learning for ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kreyssig23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02536-b31b1b.svg)](https://arxiv.org/abs/2211.02536) | -| 1300 | A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23q_interspeech.pdf) | -| 2470 | Wav2Vec 2.0 ASR for Cantonese-Speaking Older Adults in a Clinical Setting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23h_interspeech.pdf) | -| 770 | BAT: Boundary aware Transducer for Memory-Efficient and Low-Latency ASR | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/an23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11571-b31b1b.svg)](https://arxiv.org/abs/2305.11571) | -| 1342 | Bayes Risk Transducer: Transducer with Controllable Alignment Prediction | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tian23_interspeech.pdf) | -| 783 | Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/alastruey23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06954-b31b1b.svg)](https://arxiv.org/abs/2306.06954) | +| 2118 | Using Text Injection to Improve Recognition of Personal Identifiers in Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/blau23_interspeech.pdf) | +| 837 | Investigating Wav2Vec2 Context Representations and the Effects of Fine-Tuning, a Case-Study of a Finnish Model | [![GitHub](https://img.shields.io/github/stars/aalto-speech/Wav2vec2Interpretation?style=flat)](https://github.com/aalto-speech/Wav2vec2Interpretation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/grosz23_interspeech.pdf) | +| 872 | Transformer-based Speech Recognition Models for Oral History Archives in English, German, and Czech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lehecka23_interspeech.pdf) | +| 177 | Iteratively Improving Speech Recognition and Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://demosamplesites.github.io/IterativeASR_VC/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/singh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15055-b31b1b.svg)](https://arxiv.org/abs/2305.15055) | +| 2001 | LABERT: A Combination of Local Aggregation and Self-Supervised Speech Representation Learning for Detecting Informative Hidden Units in Low-Resource ASR Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fatehi23_interspeech.pdf)
[![nottingham-repo](https://img.shields.io/badge/nottingham-22183323-1A296B.svg)](https://nottingham-repository.worktribe.com/output/22183323) | +| 746 | TranUSR: Phoneme-to-Word Transcoder based Unified Speech Representation Learning for Cross-Lingual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xue23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13629-b31b1b.svg)](https://arxiv.org/abs/2305.13629) | +| 1124 | Dual-Mode NAM: Effective Top-K Context Injection for End-to-End ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23e_interspeech.pdf) | +| 2417 | GhostRNN: Reducing State Redundancy in RNN with Cheap Operations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhou23g_interspeech.pdf) | +| 1442 | Task-Agnostic Structured Pruning of Speech Representation Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23da_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01385-b31b1b.svg)](https://arxiv.org/abs/2306.01385) | +| 485 | Factual Consistency Oriented Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kanda23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12369-b31b1b.svg)](https://arxiv.org/abs/2302.12369) | +| 1036 | Multi-Head State Space Model for Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fathullah23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12498-b31b1b.svg)](https://arxiv.org/abs/2305.12498) | +| 341 | Cascaded Multi-task Adaptive Learning based on Neural Architecture Search | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23_interspeech.pdf) | +| 2359 | Probing Self-Supervised Speech Models for Phonetic and Phonemic Information: A Case Study in Aspiration | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/martin23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06232-b31b1b.svg)](https://arxiv.org/abs/2306.06232) | +| 739 | Selective Biasing with Trie-based Contextual Adapters for Personalised Speech Recognition using Neural Transducers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/harding23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/selective-biasing-with-trie-based-contextual-adapters-for-personalised-speech-recognition-using-neural-transducers) | +| 213 | A More Accurate Internal Language Model Score Estimation for the Hybrid Autoregressive Transducer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23b_interspeech.pdf) | +| 106 | Attention Gate between Capsules in Fully Capsule-Network Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23_interspeech.pdf) | +| 2585 | OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bengaliai.github.io/asr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rakib23_interspeech.pdf)<br />
[![arXiv](https://img.shields.io/badge/arXiv-2305.09688-b31b1b.svg)](https://arxiv.org/abs/2305.09688) | +| 1316 | ML-SUPERB: Multilingual Speech Universal PERformance Benchmark | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/espnet/espnet/tree/master/egs2/ml_superb/asr1) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10615-b31b1b.svg)](https://arxiv.org/abs/2305.10615) | +| 2389 | General-purpose Adversarial Training for Enhanced Automatic Speech Recognition Model Generalization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23l_interspeech.pdf) | +| 275 | Joint Instance Reconstruction and Feature Sub-space Alignment for Cross-Domain Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhao23_interspeech.pdf) | +| 2280 | Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/moriya23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15971-b31b1b.svg)](https://arxiv.org/abs/2305.15971) | +| 1272 | Random Utterance Concatenation based Data Augmentation for Improving Short-Video Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15876-b31b1b.svg)](https://arxiv.org/abs/2210.15876) | +| 1189 | Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers | [![GitHub](https://img.shields.io/github/stars/NMS05/Adapter-Incremental-Continual-Learning-AST?style=flat)](https://github.com/NMS05/Adapter-Incremental-Continual-Learning-AST) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/muthuchamyselvaraj23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14314-b31b1b.svg)](https://arxiv.org/abs/2302.14314) | +| 223 | Rethinking Speech Recognition with a Multimodal Perspective via Acoustic and Semantic Cooperative Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14049-b31b1b.svg)](https://arxiv.org/abs/2305.14049) | +| 923 | Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://liangzheng-zl.github.io/bedit-web/)<br />
[![GitHub](https://img.shields.io/github/stars/Liangzheng-ZL/BEdit-TTS?style=flat)](https://github.com/Liangzheng-ZL/BEdit-TTS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08588-b31b1b.svg)](https://arxiv.org/abs/2306.08588) | +| 2258 | Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01031-b31b1b.svg)](https://arxiv.org/abs/2306.01031) | +| 1184 | DCCRN-KWS: An Audio Bias based Model for Noise Robust Small-Footprint Keyword Spotting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lv23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12331-b31b1b.svg)](https://arxiv.org/abs/2305.12331) | +| 1609 | OTF: Optimal Transport based Fusion of Supervised and Self-Supervised Learning Models for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02541-b31b1b.svg)](https://arxiv.org/abs/2306.02541) | +| 2136 | Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bleeker23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2304.08862-b31b1b.svg)](https://arxiv.org/abs/2304.08862) | +| 788 | Rehearsal-Free Online Continual Learning for Automatic Speech Recognition | [![GitHub](https://img.shields.io/github/stars/StevenVdEeckt/online-cl-for-asr?style=flat)](https://github.com/StevenVdEeckt/online-cl-for-asr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vandereeckt23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.10860-b31b1b.svg)](https://arxiv.org/abs/2306.10860) | +| 496 | ASR Data Augmentation in Low-Resource Settings using Cross-Lingual Multi-Speaker TTS and Cross-Lingual Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/Edresson/Wav2Vec-Wrapper/tree/main/Papers/TTS-Augmentation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/casanova23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2204.00618-b31b1b.svg)](https://arxiv.org/abs/2204.00618) | +| 642 | Personality-aware Training based Speaker Adaptation for End-to-End Speech Recognition | [![GitHub](https://img.shields.io/github/stars/shibeiing/Personality-aware-Training-PAT?style=flat)](https://github.com/shibeiing/Personality-aware-Training-PAT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gu23_interspeech.pdf) | +| 2257 | Target Vocabulary Recognition based on Multi-task Learning with Decomposed Teacher Sequences | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ito23b_interspeech.pdf) | +| 679 | Wave to Syntax: Probing Spoken Language Models for Syntax | [![GitHub](https://img.shields.io/github/stars/techsword/wave-to-syntax?style=flat)](https://github.com/techsword/wave-to-syntax) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18957-b31b1b.svg)](https://arxiv.org/abs/2305.18957) | +| 720 | Effective Training of Attention-based Contextual Biasing Adapters with Synthetic Audio for Personalised ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/naowarat23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/effective-training-of-attention-based-contextual-biasing-adapters-with-synthetic-audio-for-personalised-asr) | +| 630 | Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ma23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08920-b31b1b.svg)](https://arxiv.org/abs/2306.08920) | +| 1118 | SlothSpeech: Denial-of-Service Attack Against Speech Recognition Models | [![GitHub](https://img.shields.io/github/stars/0xrutvij/SlothSpeech?style=flat)](https://github.com/0xrutvij/SlothSpeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/haque23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00794-b31b1b.svg)](https://arxiv.org/abs/2306.00794) | +| 503 | CLRL-Tuning: A Novel Continual Learning Approach for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23d_interspeech.pdf) | +| 159 | Exploring Sources of Racial Bias in Automatic Speech Recognition through the Lens of Rhythmic Variation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lai23_interspeech.pdf) | +| 1440 | Can Contextual Biasing Remain Effective with Whisper and GPT-2? | [![GitHub](https://img.shields.io/github/stars/BriansIDP/WhisperBiasing?style=flat)](https://github.com/BriansIDP/WhisperBiasing) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sun23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01942-b31b1b.svg)](https://arxiv.org/abs/2306.01942) | +| 221 | Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/nttcslab/m2d/tree/master/speech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/niizumi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14079-b31b1b.svg)](https://arxiv.org/abs/2305.14079) | +| 2207 | Improving RNN Transducer Acoustic Models for English Conversational Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cui23c_interspeech.pdf) | +| 1216 | MixRep: Hidden Representation Mixup for Low-Resource Speech Recognition | [![GitHub](https://img.shields.io/github/stars/jiamin1013/mixrep-espnet?style=flat)](https://github.com/jiamin1013/mixrep-espnet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xie23_interspeech.pdf) | +| 1192 | Improving Chinese Mandarin Speech Recognition using Graph Embedding Regularization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23h_interspeech.pdf) | +| 1276 | Adapting Multi-Lingual ASR Models for Handling Multiple Talkers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18747-b31b1b.svg)](https://arxiv.org/abs/2305.18747) | +| 1221 | Adapter-Tuning with Effective Token-Dependent Representation Shift for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ng23c_interspeech.pdf) | +| 1010 | Model-Internal Slot-Triggered Biasing for Domain Expansion in Neural Transducer ASR Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lu23c_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/model-internal-slot-triggered-biasing-for-domain-expansion-in-neural-transducer-asr-models) | +| 2508 | Delay-Penalized CTC Implemented based on Finite State Transducer | [![GitHub](https://img.shields.io/github/stars/k2-fsa/k2?style=flat)](https://github.com/k2-fsa/k2) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yao23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11539-b31b1b.svg)](https://arxiv.org/abs/2305.11539) | +| 2589 | Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR | [![GitHub](https://img.shields.io/github/stars/AI4Bharat/vistaar?style=flat)](https://github.com/AI4Bharat/vistaar) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bhogale23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15386-b31b1b.svg)](https://arxiv.org/abs/2305.15386) | +| 1091 | Domain Adaptive Self-Supervised Training of Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/do23_interspeech.pdf) | +| 1105 | There is more than One Kind of Robustness: Fooling Whisper with Adversarial Examples | [![GitHub](https://img.shields.io/github/stars/RaphaelOlivier/whisper_attack?style=flat)](https://github.com/RaphaelOlivier/whisper_attack) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/olivier23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17316-b31b1b.svg)](https://arxiv.org/abs/2210.17316) | +| 1064 | MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations | [![GitHub](https://img.shields.io/github/stars/CHeggan/MT-SLVR?style=flat)](https://github.com/CHeggan/MT-SLVR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/heggan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17191-b31b1b.svg)](https://arxiv.org/abs/2305.17191) | +| 1176 | Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23l_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06672-b31b1b.svg)](https://arxiv.org/abs/2306.06672) | +| 759 | Blank-Regularized CTC for Frame Skipping in Neural Transducer | [![GitHub](https://img.shields.io/github/stars/k2-fsa/k2?style=flat)](https://github.com/k2-fsa/k2) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23l_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11558-b31b1b.svg)](https://arxiv.org/abs/2305.11558) | +| 2406 | The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jayakumar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19584-b31b1b.svg)](https://arxiv.org/abs/2305.19584) | +| 2354 | Improving RNN-Transducers with Acoustic LookAhead | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/unni23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.05006-b31b1b.svg)](https://arxiv.org/abs/2307.05006) | +| 1847 | Everyone has an Accent | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/markl23_interspeech.pdf) | +| 2124 | Some Voices are too Common: Building Fair Speech Recognition Systems using the Common-Voice Dataset | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/maison23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03773-b31b1b.svg)](https://arxiv.org/abs/2306.03773) | +| 1168 | Information Magnitude based Dynamic Sub-Sampling for Speech-to-Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23u_interspeech.pdf) | +| 353 | Towards Multi-task Learning of Speech and Speaker Recognition | [![GitHub](https://img.shields.io/github/stars/nikvaessen/disjoint-mtl?style=flat)](https://github.com/nikvaessen/disjoint-mtl) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vaessen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12773-b31b1b.svg)](https://arxiv.org/abs/2302.12773) | +| 2186 | Regarding Topology and Variant Frame Rates for Differentiable WFST-based End-to-End ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhao23f_interspeech.pdf) | +| 1012 | 2-bit Conformer Quantization for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rybakov23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16619-b31b1b.svg)](https://arxiv.org/abs/2305.16619) | +| 167 | Time-Domain Speech Enhancement for Robust Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.13318-b31b1b.svg)](https://arxiv.org/abs/2210.13318) | +| 257 | Multi-Channel Multi-Speaker Transformer for Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yifan23_interspeech.pdf) | +| 733 | Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ye23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15875-b31b1b.svg)](https://arxiv.org/abs/2306.15875) | +| 2463 | Dialect Speech Recognition Modeling using Corpus of Japanese Dialects and Self-Supervised Learning-based Model XLSR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/miwa23_interspeech.pdf) | +| 767 | Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12493-b31b1b.svg)](https://arxiv.org/abs/2305.12493) | +| 970 | Competitive and Resource Efficient Factored Hybrid HMM Systems are Simpler Than You Think | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/raissi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12493-b31b1b.svg)](https://arxiv.org/abs/2305.12493) | +| 791 | MMSpeech: Multi-Modal Multi-Task Encoder-Decoder Pre-training for Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhou23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.00500-b31b1b.svg)](https://arxiv.org/abs/2212.00500) | +| 2499 | Biased Self-Supervised Learning for ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kreyssig23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02536-b31b1b.svg)](https://arxiv.org/abs/2211.02536) | +| 1300 | A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23q_interspeech.pdf) | +| 2470 | Wav2Vec 2.0 ASR for Cantonese-Speaking Older Adults in a Clinical Setting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23h_interspeech.pdf) | +| 770 | BAT: Boundary aware Transducer for Memory-Efficient and Low-Latency ASR | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/an23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11571-b31b1b.svg)](https://arxiv.org/abs/2305.11571) | +| 1342 | Bayes Risk Transducer: Transducer with Controllable Alignment Prediction | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tian23_interspeech.pdf) | +| 783 | Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/alastruey23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06954-b31b1b.svg)](https://arxiv.org/abs/2306.06954) |
@@ -500,91 +500,91 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1173 | Robust Prototype Learning for Anomalous Sound Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zeng23b_interspeech.pdf) | -| 982 | A Multimodal Prototypical Approach for Unsupervised Sound Classification | [![GitHub](https://img.shields.io/github/stars/sakshamsingh1/audio_text_proto?style=flat)](https://github.com/sakshamsingh1/audio_text_proto) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kushwaha23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12300-b31b1b.svg)](https://arxiv.org/abs/2306.12300) | -| 563 | Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms | [![GitHub](https://img.shields.io/github/stars/ph-w2000/S2pecNet?style=flat)](https://github.com/ph-w2000/S2pecNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wen23_interspeech.pdf) | -| 1082 | Adapting Language-Audio Models as Few-Shot Audio Learners | [![GitHub](https://img.shields.io/github/stars/JinhuaLiang/lam4fsl?style=flat)](https://github.com/JinhuaLiang/lam4fsl) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17719-b31b1b.svg)](https://arxiv.org/abs/2305.17719) | -| 734 | TFECN: Time-Frequency Enhanced ConvNet for Audio Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23l_interspeech.pdf) | -| 350 | Resolution Consistency Training on Time-Frequency Domain for Semi-Supervised Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/choi23b_interspeech.pdf) | -| 1174 | Fine-Tuning Audio Spectrogram Transformer with Task-Aware Adapters for Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23n_interspeech.pdf) | -| 1210 | Small Footprint Multi-Channel Network for Keyword Spotting with Centroid based Awareness | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2204.05445-b31b1b.svg)](https://arxiv.org/abs/2204.05445) | -| 1380 | Few-Shot Class-Incremental Audio Classification using Adaptively-Refined Prototypes | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xie23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18045-b31b1b.svg)](https://arxiv.org/abs/2305.18045) | -| 1549 | Interpretable Latent Space using Space-Filling Curves for Phonetic Analysis in Voice Conversion | [![GitLab](https://img.shields.io/gitlab/stars/speech-interaction-technology-aalto-university/sfvq)](https://gitlab.com/speech-interaction-technology-aalto-university/sfvq) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vali23_interspeech.pdf)
[![Aalto](https://img.shields.io/badge/aalto-fi-005EB8.svg)](https://research.aalto.fi/en/publications/interpretable-latent-space-using-space-filling-curves-for-phoneti) | -| 1861 | Topological Data Analysis for Speech Processing | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://topohubert.github.io/speech-topology-webpages/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tulchinskii23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.17223-b31b1b.svg)](https://arxiv.org/abs/2211.17223) | -| 1329 | Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation | [![GitHub](https://img.shields.io/github/stars/sungnyun/ARMHuBERT?style=flat)](https://github.com/sungnyun/ARMHuBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11685-b31b1b.svg)](https://arxiv.org/abs/2305.11685) | -| 932 | Personalized Acoustic Scene Classification in Ultra-Low Power Embedded Devices using Privacy-Preserving Data Augmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/koppelmann23_interspeech.pdf) | -| 176 | Background Domain Switch: A Novel Data Augmentation Technique for Robust Sound Event Detection | [![GitHub](https://img.shields.io/github/stars/boschresearch/soundsee-background-domain-switch?style=flat)](https://github.com/boschresearch/soundsee-background-domain-switch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23_interspeech.pdf) | -| 1021 | Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning | [![GitHub](https://img.shields.io/github/stars/Yuanbo2020/HGRL?style=flat)](https://github.com/Yuanbo2020/HGRL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hou23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://personal.ee.surrey.ac.uk/Personal/W.Wang/papers/Hou%20etal_INTERSPEECH_2023.pdf) | -| 2416 | Anomalous Sound Detection using Self-Attention-based Frequency Pattern Analysis of Machine Sounds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23fa_interspeech.pdf) | -| 1478 | Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xin23c_interspeech.pdf) | -| 575 | Differential Privacy enabled Dementia Classification: An Exploration of the Privacy-Accuracy Trade-off in Speech Signal Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bn23_interspeech.pdf) | -| 1595 | Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://anonymous.4open.science/w/INTERSPEECH2023-F8C4/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ka_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05709-b31b1b.svg)](https://arxiv.org/abs/2306.05709) | -| 1816 | Towards Multi-Lingual Audio Question Answering | [![GitHub](https://img.shields.io/github/stars/swarupbehera/mAQA?style=flat)](https://github.com/swarupbehera/mAQA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/behera23_interspeech.pdf) | -| 1344 | Blind Estimation of Room Impulse Response from Monaural Reverberant Speech with Segmental Generative Neural Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liao23_interspeech.pdf)
[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/370843606_Blind_Estimation_of_Room_Impulse_Response_from_Monaural_Reverberant_Speech_with_Segmental_Generative_Neural_Network) | -| 358 | Emotion-aware Audio-Driven Face Animation via Contrastive Feature Disentanglement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ren23_interspeech.pdf) | -| 591 | Anomalous Sound Detection based on Sound Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shimonishi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15859-b31b1b.svg)](https://arxiv.org/abs/2305.15859) | -| 2089 | Random Forest Classification of Breathing Phases from Audio Signals Recorded using Mobile Devices | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fahed23_interspeech.pdf) | -| 1581 | GRAVO: Learning to Generate Relevant Audio from Visual Features with Noisy Online Videos | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ahn23b_interspeech.pdf) | -| 477 | Wav2ToBI: A New Approach to Automatic ToBI Transcription | [![GitHub](https://img.shields.io/github/stars/reginazhai/Wav2ToBI?style=flat)](https://github.com/reginazhai/Wav2ToBI) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhai23_interspeech.pdf) | -| 344 | Joint-Former: Jointly Regularized and Locally Down-Sampled Conformer for Semi-Supervised Sound Event Detection | [![GitHub](https://img.shields.io/github/stars/mastergofujs/Joint-Former?style=flat)](https://github.com/mastergofujs/Joint-Former) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23b_interspeech.pdf) | -| 245 | Towards Attention-based Contrastive Learning for Audio Spoof Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/goel23_interspeech.pdf) | -| 2488 | Masked Audio Modeling with CLAP and Multi-Objective Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xin23d_interspeech.pdf) | -| 1904 | Few-Shot Open-Set Learning for On-Device Customization of KeyWord Spotting Systems | [![GitHub](https://img.shields.io/github/stars/mrusci/ondevice-fewshot-kws?style=flat)](https://github.com/mrusci/ondevice-fewshot-kws) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rusci23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02161-b31b1b.svg)](https://arxiv.org/abs/2306.02161) | -| 481 | Self-Supervised Dataset Pruning for Efficient Training in Audio Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/azeemi23_interspeech.pdf) | -| 491 | Semantic Segmentation with Bidirectional Language Models Improves Long-Form ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18419-b31b1b.svg)](https://arxiv.org/abs/2305.18419) | -| 684 | Multi-Microphone Automatic Speech Segmentation in Meetings based on Circular Harmonics Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mariotte23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04268-b31b1b.svg)](https://arxiv.org/abs/2306.04268) | -| 542 | Advanced RawNet2 with Attention-based Channel Masking for Synthetic Speech Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23h_interspeech.pdf) | -| 88 | Insights Into End-to-End Audio-to-Score Transcription with Real Recordings: A Case Study with Saxophone Works | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/martinezsevilla23_interspeech.pdf) | -| 2193 | Whisper-AT: Noise-Robust Automatic Speech Recognizers are also Strong Audio Event Taggers | [![GitHub](https://img.shields.io/github/stars/YuanGongND/whisper-at?style=flat)](https://github.com/YuanGongND/whisper-at)
[![PyPI](https://img.shields.io/pypi/v/whisper-at)](https://pypi.org/project/whisper-at/)
[![Whisper-AT](https://img.shields.io/badge/🤗-demo-FFD21F.svg)](https://huggingface.co/spaces/yuangongfdu/whisper-at) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gong23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.03183-b31b1b.svg)](https://arxiv.org/abs/2307.03183) | -| 1621 | Synthetic Voice Spoofing Detection based on Feature Pyramid Conformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gong23b_interspeech.pdf) | -| 1383 | Learning a Self-Supervised Domain-Invariant Feature Representation for Generalized Audio Deepfake Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xie23c_interspeech.pdf) | -| 2011 | Application of Knowledge Distillation to Multi-Task Speech Representation Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kerpicci23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.16611-b31b1b.svg)](https://arxiv.org/abs/2210.16611) | -| 2297 | DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jiang23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18441-b31b1b.svg)](https://arxiv.org/abs/2305.18441) | -| 1965 | Variational Classifier for Unsupervised Anomalous Sound Detection under Domain Generalization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/almudevar23_interspeech.pdf) | -| 745 | FlexiAST: Flexibility is What AST Needs | [![GitHub](https://img.shields.io/github/stars/JiuFengSC/FlexiAST_INTERSPEECH23?style=flat)](https://github.com/JiuFengSC/FlexiAST_INTERSPEECH23) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/feng23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.09286-b31b1b.svg)](https://arxiv.org/abs/2307.09286) | -| 1579 | MCR-Data2vec 2.0: Improving Self-Supervised Speech Pre-training via Model-Level Consistency Regularization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yoon23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08463-b31b1b.svg)](https://arxiv.org/abs/2306.08463) | -| 914 | Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention | [![GitHub](https://img.shields.io/github/stars/liuxubo717/V-ACT?style=flat)](https://github.com/liuxubo717/V-ACT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23l_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.16428-b31b1b.svg)](https://arxiv.org/abs/2210.16428) | -| 165 | Time-Frequency Domain Filter-and-Sum Network for Multi-Channel Speech Separation | [![GitHub](https://img.shields.io/github/stars/JonathanDZ/TF-FaSNet?style=flat)](https://github.com/JonathanDZ/TF-FaSNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/deng23_interspeech.pdf) | -| 801 | Audio-Visual Fusion using Multiscale Temporal Convolutional Attention for Time-Domain Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23h_interspeech.pdf) | -| 1431 | An Efficient Speech Separation Network based on Recurrent Fusion Dilated Convolution and Channel Attention | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ca_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05887-b31b1b.svg)](https://arxiv.org/abs/2306.05887) | -| 2015 | Binaural Sound Localization in Noisy Environments using Frequency-based Audio Vision Transformer (FAViT) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/phokhinanan23_interspeech.pdf) | -| 1723 | Contrastive Learning based Deep Latent Masking for Music Source Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23i_interspeech.pdf) | -| 655 | Speaker Extraction with Detection of Presence and Absence of Target Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23k_interspeech.pdf) | -| 889 | PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23k_interspeech.pdf) | -| 2117 | Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning | [![GitHub](https://img.shields.io/github/stars/apple/ml-spatial-librispeech?style=flat)](https://github.com/apple/ml-spatial-librispeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sarabia23_interspeech.pdf)
[![Apple](https://img.shields.io/badge/apple-ml-FE9901.svg)](https://machinelearning.apple.com/research/spatial-librispeech) | -| 1309 | Image-Driven Audio-Visual Universal Source Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23q_interspeech.pdf) | -| 2520 | Joint Blind Source Separation and Dereverberation for Automatic Speech Recognition using Delayed-Subsource | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fras23_interspeech.pdf) | -| 1766 | SDNet: Stream-Attention and Dual-Feature Learning Network for Ad-hoc Array Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23na_interspeech.pdf) | -| 2451 | Deeply Supervised Curriculum Learning for Deep Neural Network-based Sound Source Localization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/baek23_interspeech.pdf) | -| 164 | Multi-Channel Separation of Dynamic Speech and Sound Events | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://www.robinscheibler.org/interspeech2023-moving-iva-samples/)
[![GitHub](https://img.shields.io/github/stars/fakufaku/interspeech2023-moving-iva-samples?style=flat)](https://github.com/fakufaku/interspeech2023-moving-iva-samples) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fujimura23_interspeech.pdf) | -| 2545 | Rethinking the Visual Cues in Audio-Visual Speaker Extraction | [![GitHub](https://img.shields.io/github/stars/mrjunjieli/DAVSE?style=flat)](https://github.com/mrjunjieli/DAVSE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23ja_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02625-b31b1b.svg)](https://arxiv.org/abs/2306.02625) | -| 85 | Using Semi-Supervised Learning for Monaural Time-Domain Speech Separation with a Self-Supervised Learning-based SI-SNR Estimator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/dang23_interspeech.pdf) | -| 1158 | Investigation of Training Mute-Expressive End-to-End Speech Separation Networks for an Unknown Number of Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23d_interspeech.pdf) | -| 2369 | SR-SRP: Super-Resolution based SRP-PHAT for Sound Source Localization and Tracking | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cho23_interspeech.pdf) | -| 613 | Dual-Memory Multi-Modal Learning for Continual Spoken Keyword Spotting with Confidence Selection and Diversity Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23g_interspeech.pdf) | -| 714 | FN-SSL: Full-Band and Narrow-Band Fusion for Sound Source Localization | [![GitHub](https://img.shields.io/github/stars/Audio-WestlakeU/FN-SSL?style=flat)](https://github.com/Audio-WestlakeU/FN-SSL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19610-b31b1b.svg)](https://arxiv.org/abs/2305.19610) | -| 696 | A Neural State-Space Modeling Approach to Efficient Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16932-b31b1b.svg)](https://arxiv.org/abs/2305.16932) | -| 1777 | Locate and Beamform: Two-Dimensional Locating All-Neural Beamformer for Multi-Channel Speech Separation | [![GitHub](https://img.shields.io/github/stars/FYJNEVERFOLLOWS/LaBNet?style=flat)](https://github.com/FYJNEVERFOLLOWS/LaBNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fu23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10821-b31b1b.svg)](https://arxiv.org/abs/2305.10821) | -| 518 | Monaural Speech Separation Method based on Recurrent Attention with Parallel Branches | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23f_interspeech.pdf) | -| 979 | Ontology-Aware Learning and Evaluation for Audio Tagging | [![GitHub](https://img.shields.io/github/stars/haoheliu/ontology-aware-audio-tagging?style=flat)](https://github.com/haoheliu/ontology-aware-audio-tagging) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.12195-b31b1b.svg)](https://arxiv.org/abs/2211.12195) | -| 951 | What do Self-Supervised Speech Representations Encode? An Analysis of Languages, Varieties, Speaking Styles and Speakers | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://gitlab.tugraz.at/speech/speechcodebookanalysis) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/linke23_interspeech.pdf) | -| 1696 | A Compressed Synthetic Speech Detection Method with Compression Feature Embedding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23ca_interspeech.pdf) | -| 572 | Outlier-aware Inlier Modeling and Multi-Scale Scoring for Anomalous Sound Detection via Multitask Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23j_interspeech.pdf) | -| 263 | MOSLight: A Lightweight Data-Efficient System for Non-Intrusive Speech Quality Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23c_interspeech.pdf) | -| 1626 | A Multi-Scale Attentive Transformer for Multi-Instrument Symbolic Music Generation | [![GitHub](https://img.shields.io/github/stars/HaRry-qaq/MSAT?style=flat)](https://github.com/HaRry-qaq/MSAT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wei23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16592-b31b1b.svg)](https://arxiv.org/abs/2305.16592) | -| 2494 | MTANet: Multi-band Time-Frequency Attention Network for Singing Melody Extraction from Polyphonic Music | [![GitHub](https://img.shields.io/github/stars/Annmixiu/MTANet?style=flat)](https://github.com/Annmixiu/MTANet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23i_interspeech.pdf) | -| 119 | Xiaoicesing 2: A High-Fidelity Singing Voice Synthesizer based on Generative Adversarial Network | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://wavelandspeech.github.io/xiaoice2/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chunhui23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.14666-b31b1b.svg)](https://arxiv.org/abs/2210.14666) | -| 2190 | Do Vocal Breath Sounds Encode Gender cues for Automatic Gender Classification? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/solanki23_interspeech.pdf) | -| 202 | Automatic Exploration of Optimal Data Processing Operations for Sound Data Augmentation using Improved Differentiable Automatic Data Augmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sugiura23_interspeech.pdf) | -| 1430 | A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis | [![GitHub](https://img.shields.io/github/stars/xiaoli1996/SSBPR?style=flat)](https://github.com/xiaoli1996/SSBPR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xiao23b_interspeech.pdf) | -| 528 | RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music | [![GitHub](https://img.shields.io/github/stars/Dream-High/RMVPE?style=flat)](https://github.com/Dream-High/RMVPE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wei23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15412-b31b1b.svg)](https://arxiv.org/abs/2306.15412) | -| 832 | Spatialization Quality Metric for Binaural Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/manocha23_interspeech.pdf) | -| 428 | AsthmaSCELNet: A Lightweight Supervised Contrastive Embedding Learning Framework for Asthma Classification using Lung Sounds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/roy23_interspeech.pdf) | -| 1426 | Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification | [![GitHub](https://img.shields.io/github/stars/raymin0223/patch-mix_contrastive_learning?style=flat)](https://github.com/raymin0223/patch-mix_contrastive_learning) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bae23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14032-b31b1b.svg)](https://arxiv.org/abs/2305.14032) | -| 2115 | Remote Assessment for ALS using Multimodal Dialog Agents: Data Quality, Feasibility and Task Compliance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/richter23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1ydqKLgO18TFrMzFC2Bz_6y3Uml0bUaaN/view) | -| 852 | AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://pages.cs.huji.ac.il/adiyoss-lab/AudioToken/)
[![GitHub](https://img.shields.io/github/stars/guyyariv/AudioToken?style=flat)](https://github.com/guyyariv/AudioToken) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yariv23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13050-b31b1b.svg)](https://arxiv.org/abs/2305.13050) | -| 209 | Obstructive Sleep Apnea Screening with Breathing Sounds and Respiratory Effort: A Multimodal Deep Learning Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/romero23_interspeech.pdf) | -| 2275 | Investigation of Music Emotion Recognition based on Segmented Semi-Supervised Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sun23f_interspeech.pdf) | +| 1173 | Robust Prototype Learning for Anomalous Sound Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zeng23b_interspeech.pdf) | +| 982 | A Multimodal Prototypical Approach for Unsupervised Sound Classification | [![GitHub](https://img.shields.io/github/stars/sakshamsingh1/audio_text_proto?style=flat)](https://github.com/sakshamsingh1/audio_text_proto) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kushwaha23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12300-b31b1b.svg)](https://arxiv.org/abs/2306.12300) | +| 563 | Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms | [![GitHub](https://img.shields.io/github/stars/ph-w2000/S2pecNet?style=flat)](https://github.com/ph-w2000/S2pecNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wen23_interspeech.pdf) | +| 1082 | Adapting Language-Audio Models as Few-Shot Audio Learners | [![GitHub](https://img.shields.io/github/stars/JinhuaLiang/lam4fsl?style=flat)](https://github.com/JinhuaLiang/lam4fsl) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17719-b31b1b.svg)](https://arxiv.org/abs/2305.17719) | +| 734 | TFECN: Time-Frequency Enhanced ConvNet for Audio Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23l_interspeech.pdf) | +| 350 | Resolution Consistency Training on Time-Frequency Domain for Semi-Supervised Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/choi23b_interspeech.pdf) | +| 1174 | Fine-Tuning Audio Spectrogram Transformer with Task-Aware Adapters for Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23n_interspeech.pdf) | +| 1210 | Small Footprint Multi-Channel Network for Keyword Spotting with Centroid based Awareness | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2204.05445-b31b1b.svg)](https://arxiv.org/abs/2204.05445) | +| 1380 | Few-Shot Class-Incremental Audio Classification using Adaptively-Refined Prototypes | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xie23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18045-b31b1b.svg)](https://arxiv.org/abs/2305.18045) | +| 1549 | Interpretable Latent Space using Space-Filling Curves for Phonetic Analysis in Voice Conversion | [![GitLab](https://img.shields.io/gitlab/stars/speech-interaction-technology-aalto-university/sfvq)](https://gitlab.com/speech-interaction-technology-aalto-university/sfvq) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vali23_interspeech.pdf)
[![Aalto](https://img.shields.io/badge/aalto-fi-005EB8.svg)](https://research.aalto.fi/en/publications/interpretable-latent-space-using-space-filling-curves-for-phoneti) | +| 1861 | Topological Data Analysis for Speech Processing | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://topohubert.github.io/speech-topology-webpages/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tulchinskii23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.17223-b31b1b.svg)](https://arxiv.org/abs/2211.17223) | +| 1329 | Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation | [![GitHub](https://img.shields.io/github/stars/sungnyun/ARMHuBERT?style=flat)](https://github.com/sungnyun/ARMHuBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11685-b31b1b.svg)](https://arxiv.org/abs/2305.11685) | +| 932 | Personalized Acoustic Scene Classification in Ultra-Low Power Embedded Devices using Privacy-Preserving Data Augmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/koppelmann23_interspeech.pdf) | +| 176 | Background Domain Switch: A Novel Data Augmentation Technique for Robust Sound Event Detection | [![GitHub](https://img.shields.io/github/stars/boschresearch/soundsee-background-domain-switch?style=flat)](https://github.com/boschresearch/soundsee-background-domain-switch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23_interspeech.pdf) | +| 1021 | Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning | [![GitHub](https://img.shields.io/github/stars/Yuanbo2020/HGRL?style=flat)](https://github.com/Yuanbo2020/HGRL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hou23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://personal.ee.surrey.ac.uk/Personal/W.Wang/papers/Hou%20etal_INTERSPEECH_2023.pdf) | +| 2416 | Anomalous Sound Detection using Self-Attention-based Frequency Pattern Analysis of Machine Sounds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23fa_interspeech.pdf) | +| 1478 | Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xin23c_interspeech.pdf) | +| 575 | Differential Privacy enabled Dementia Classification: An Exploration of the Privacy-Accuracy Trade-off in Speech Signal Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bn23_interspeech.pdf) | +| 1595 | Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://anonymous.4open.science/w/INTERSPEECH2023-F8C4/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ka_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05709-b31b1b.svg)](https://arxiv.org/abs/2306.05709) | +| 1816 | Towards Multi-Lingual Audio Question Answering | [![GitHub](https://img.shields.io/github/stars/swarupbehera/mAQA?style=flat)](https://github.com/swarupbehera/mAQA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/behera23_interspeech.pdf) | +| 1344 | Blind Estimation of Room Impulse Response from Monaural Reverberant Speech with Segmental Generative Neural Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liao23_interspeech.pdf)
[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/370843606_Blind_Estimation_of_Room_Impulse_Response_from_Monaural_Reverberant_Speech_with_Segmental_Generative_Neural_Network) | +| 358 | Emotion-aware Audio-Driven Face Animation via Contrastive Feature Disentanglement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ren23_interspeech.pdf) | +| 591 | Anomalous Sound Detection based on Sound Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shimonishi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15859-b31b1b.svg)](https://arxiv.org/abs/2305.15859) | +| 2089 | Random Forest Classification of Breathing Phases from Audio Signals Recorded using Mobile Devices | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fahed23_interspeech.pdf) | +| 1581 | GRAVO: Learning to Generate Relevant Audio from Visual Features with Noisy Online Videos | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ahn23b_interspeech.pdf) | +| 477 | Wav2ToBI: A New Approach to Automatic ToBI Transcription | [![GitHub](https://img.shields.io/github/stars/reginazhai/Wav2ToBI?style=flat)](https://github.com/reginazhai/Wav2ToBI) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhai23_interspeech.pdf) | +| 344 | Joint-Former: Jointly Regularized and Locally Down-Sampled Conformer for Semi-Supervised Sound Event Detection | [![GitHub](https://img.shields.io/github/stars/mastergofujs/Joint-Former?style=flat)](https://github.com/mastergofujs/Joint-Former) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23b_interspeech.pdf) | +| 245 | Towards Attention-based Contrastive Learning for Audio Spoof Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/goel23_interspeech.pdf) | +| 2488 | Masked Audio Modeling with CLAP and Multi-Objective Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xin23d_interspeech.pdf) | +| 1904 | Few-Shot Open-Set Learning for On-Device Customization of KeyWord Spotting Systems | [![GitHub](https://img.shields.io/github/stars/mrusci/ondevice-fewshot-kws?style=flat)](https://github.com/mrusci/ondevice-fewshot-kws) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rusci23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02161-b31b1b.svg)](https://arxiv.org/abs/2306.02161) | +| 481 | Self-Supervised Dataset Pruning for Efficient Training in Audio Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/azeemi23_interspeech.pdf) | +| 491 | Semantic Segmentation with Bidirectional Language Models Improves Long-Form ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18419-b31b1b.svg)](https://arxiv.org/abs/2305.18419) | +| 684 | Multi-Microphone Automatic Speech Segmentation in Meetings based on Circular Harmonics Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mariotte23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04268-b31b1b.svg)](https://arxiv.org/abs/2306.04268) | +| 542 | Advanced RawNet2 with Attention-based Channel Masking for Synthetic Speech Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23h_interspeech.pdf) | +| 88 | Insights Into End-to-End Audio-to-Score Transcription with Real Recordings: A Case Study with Saxophone Works | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/martinezsevilla23_interspeech.pdf) | +| 2193 | Whisper-AT: Noise-Robust Automatic Speech Recognizers are also Strong Audio Event Taggers | [![GitHub](https://img.shields.io/github/stars/YuanGongND/whisper-at?style=flat)](https://github.com/YuanGongND/whisper-at)
[![PyPI](https://img.shields.io/pypi/v/whisper-at)](https://pypi.org/project/whisper-at/)
[![Whisper-AT](https://img.shields.io/badge/🤗-demo-FFD21F.svg)](https://huggingface.co/spaces/yuangongfdu/whisper-at) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gong23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.03183-b31b1b.svg)](https://arxiv.org/abs/2307.03183) | +| 1621 | Synthetic Voice Spoofing Detection based on Feature Pyramid Conformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gong23b_interspeech.pdf) | +| 1383 | Learning a Self-Supervised Domain-Invariant Feature Representation for Generalized Audio Deepfake Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xie23c_interspeech.pdf) | +| 2011 | Application of Knowledge Distillation to Multi-Task Speech Representation Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kerpicci23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.16611-b31b1b.svg)](https://arxiv.org/abs/2210.16611) | +| 2297 | DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jiang23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18441-b31b1b.svg)](https://arxiv.org/abs/2305.18441) | +| 1965 | Variational Classifier for Unsupervised Anomalous Sound Detection under Domain Generalization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/almudevar23_interspeech.pdf) | +| 745 | FlexiAST: Flexibility is What AST Needs | [![GitHub](https://img.shields.io/github/stars/JiuFengSC/FlexiAST_INTERSPEECH23?style=flat)](https://github.com/JiuFengSC/FlexiAST_INTERSPEECH23) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/feng23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.09286-b31b1b.svg)](https://arxiv.org/abs/2307.09286) | +| 1579 | MCR-Data2vec 2.0: Improving Self-Supervised Speech Pre-training via Model-Level Consistency Regularization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yoon23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08463-b31b1b.svg)](https://arxiv.org/abs/2306.08463) | +| 914 | Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention | [![GitHub](https://img.shields.io/github/stars/liuxubo717/V-ACT?style=flat)](https://github.com/liuxubo717/V-ACT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23l_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.16428-b31b1b.svg)](https://arxiv.org/abs/2210.16428) | +| 165 | Time-Frequency Domain Filter-and-Sum Network for Multi-Channel Speech Separation | [![GitHub](https://img.shields.io/github/stars/JonathanDZ/TF-FaSNet?style=flat)](https://github.com/JonathanDZ/TF-FaSNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/deng23_interspeech.pdf) | +| 801 | Audio-Visual Fusion using Multiscale Temporal Convolutional Attention for Time-Domain Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23h_interspeech.pdf) | +| 1431 | An Efficient Speech Separation Network based on Recurrent Fusion Dilated Convolution and Channel Attention | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ca_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05887-b31b1b.svg)](https://arxiv.org/abs/2306.05887) | +| 2015 | Binaural Sound Localization in Noisy Environments using Frequency-based Audio Vision Transformer (FAViT) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/phokhinanan23_interspeech.pdf) | +| 1723 | Contrastive Learning based Deep Latent Masking for Music Source Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23i_interspeech.pdf) | +| 655 | Speaker Extraction with Detection of Presence and Absence of Target Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23k_interspeech.pdf) | +| 889 | PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23k_interspeech.pdf) | +| 2117 | Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning | [![GitHub](https://img.shields.io/github/stars/apple/ml-spatial-librispeech?style=flat)](https://github.com/apple/ml-spatial-librispeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sarabia23_interspeech.pdf)
[![Apple](https://img.shields.io/badge/apple-ml-FE9901.svg)](https://machinelearning.apple.com/research/spatial-librispeech) | +| 1309 | Image-Driven Audio-Visual Universal Source Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23q_interspeech.pdf) | +| 2520 | Joint Blind Source Separation and Dereverberation for Automatic Speech Recognition using Delayed-Subsource | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fras23_interspeech.pdf) | +| 1766 | SDNet: Stream-Attention and Dual-Feature Learning Network for Ad-hoc Array Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23na_interspeech.pdf) | +| 2451 | Deeply Supervised Curriculum Learning for Deep Neural Network-based Sound Source Localization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/baek23_interspeech.pdf) | +| 164 | Multi-Channel Separation of Dynamic Speech and Sound Events | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://www.robinscheibler.org/interspeech2023-moving-iva-samples/)
[![GitHub](https://img.shields.io/github/stars/fakufaku/interspeech2023-moving-iva-samples?style=flat)](https://github.com/fakufaku/interspeech2023-moving-iva-samples) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fujimura23_interspeech.pdf) | +| 2545 | Rethinking the Visual Cues in Audio-Visual Speaker Extraction | [![GitHub](https://img.shields.io/github/stars/mrjunjieli/DAVSE?style=flat)](https://github.com/mrjunjieli/DAVSE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23ja_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02625-b31b1b.svg)](https://arxiv.org/abs/2306.02625) | +| 85 | Using Semi-Supervised Learning for Monaural Time-Domain Speech Separation with a Self-Supervised Learning-based SI-SNR Estimator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/dang23_interspeech.pdf) | +| 1158 | Investigation of Training Mute-Expressive End-to-End Speech Separation Networks for an Unknown Number of Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23d_interspeech.pdf) | +| 2369 | SR-SRP: Super-Resolution based SRP-PHAT for Sound Source Localization and Tracking | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cho23_interspeech.pdf) | +| 613 | Dual-Memory Multi-Modal Learning for Continual Spoken Keyword Spotting with Confidence Selection and Diversity Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23g_interspeech.pdf) | +| 714 | FN-SSL: Full-Band and Narrow-Band Fusion for Sound Source Localization | [![GitHub](https://img.shields.io/github/stars/Audio-WestlakeU/FN-SSL?style=flat)](https://github.com/Audio-WestlakeU/FN-SSL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19610-b31b1b.svg)](https://arxiv.org/abs/2305.19610) | +| 696 | A Neural State-Space Modeling Approach to Efficient Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16932-b31b1b.svg)](https://arxiv.org/abs/2305.16932) | +| 1777 | Locate and Beamform: Two-Dimensional Locating All-Neural Beamformer for Multi-Channel Speech Separation | [![GitHub](https://img.shields.io/github/stars/FYJNEVERFOLLOWS/LaBNet?style=flat)](https://github.com/FYJNEVERFOLLOWS/LaBNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fu23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10821-b31b1b.svg)](https://arxiv.org/abs/2305.10821) | +| 518 | Monaural Speech Separation Method based on Recurrent Attention with Parallel Branches | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23f_interspeech.pdf) | +| 979 | Ontology-Aware Learning and Evaluation for Audio Tagging | [![GitHub](https://img.shields.io/github/stars/haoheliu/ontology-aware-audio-tagging?style=flat)](https://github.com/haoheliu/ontology-aware-audio-tagging) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.12195-b31b1b.svg)](https://arxiv.org/abs/2211.12195) | +| 951 | What do Self-Supervised Speech Representations Encode? An Analysis of Languages, Varieties, Speaking Styles and Speakers | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://gitlab.tugraz.at/speech/speechcodebookanalysis) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/linke23_interspeech.pdf) | +| 1696 | A Compressed Synthetic Speech Detection Method with Compression Feature Embedding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23ca_interspeech.pdf) | +| 572 | Outlier-aware Inlier Modeling and Multi-Scale Scoring for Anomalous Sound Detection via Multitask Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23j_interspeech.pdf) | +| 263 | MOSLight: A Lightweight Data-Efficient System for Non-Intrusive Speech Quality Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23c_interspeech.pdf) | +| 1626 | A Multi-Scale Attentive Transformer for Multi-Instrument Symbolic Music Generation | [![GitHub](https://img.shields.io/github/stars/HaRry-qaq/MSAT?style=flat)](https://github.com/HaRry-qaq/MSAT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wei23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16592-b31b1b.svg)](https://arxiv.org/abs/2305.16592) | +| 2494 | MTANet: Multi-band Time-Frequency Attention Network for Singing Melody Extraction from Polyphonic Music | [![GitHub](https://img.shields.io/github/stars/Annmixiu/MTANet?style=flat)](https://github.com/Annmixiu/MTANet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23i_interspeech.pdf) | +| 119 | Xiaoicesing 2: A High-Fidelity Singing Voice Synthesizer based on Generative Adversarial Network | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://wavelandspeech.github.io/xiaoice2/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chunhui23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.14666-b31b1b.svg)](https://arxiv.org/abs/2210.14666) | +| 2190 | Do Vocal Breath Sounds Encode Gender cues for Automatic Gender Classification? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/solanki23_interspeech.pdf) | +| 202 | Automatic Exploration of Optimal Data Processing Operations for Sound Data Augmentation using Improved Differentiable Automatic Data Augmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sugiura23_interspeech.pdf) | +| 1430 | A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis | [![GitHub](https://img.shields.io/github/stars/xiaoli1996/SSBPR?style=flat)](https://github.com/xiaoli1996/SSBPR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xiao23b_interspeech.pdf) | +| 528 | RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music | [![GitHub](https://img.shields.io/github/stars/Dream-High/RMVPE?style=flat)](https://github.com/Dream-High/RMVPE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wei23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15412-b31b1b.svg)](https://arxiv.org/abs/2306.15412) | +| 832 | Spatialization Quality Metric for Binaural Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/manocha23_interspeech.pdf) | +| 428 | AsthmaSCELNet: A Lightweight Supervised Contrastive Embedding Learning Framework for Asthma Classification using Lung Sounds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/roy23_interspeech.pdf) | +| 1426 | Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification | [![GitHub](https://img.shields.io/github/stars/raymin0223/patch-mix_contrastive_learning?style=flat)](https://github.com/raymin0223/patch-mix_contrastive_learning) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bae23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14032-b31b1b.svg)](https://arxiv.org/abs/2305.14032) | +| 2115 | Remote Assessment for ALS using Multimodal Dialog Agents: Data Quality, Feasibility and Task Compliance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/richter23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1ydqKLgO18TFrMzFC2Bz_6y3Uml0bUaaN/view) | +| 852 | AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://pages.cs.huji.ac.il/adiyoss-lab/AudioToken/)
[![GitHub](https://img.shields.io/github/stars/guyyariv/AudioToken?style=flat)](https://github.com/guyyariv/AudioToken) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yariv23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13050-b31b1b.svg)](https://arxiv.org/abs/2305.13050) | +| 209 | Obstructive Sleep Apnea Screening with Breathing Sounds and Respiratory Effort: A Multimodal Deep Learning Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/romero23_interspeech.pdf) | +| 2275 | Investigation of Music Emotion Recognition based on Segmented Semi-Supervised Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sun23f_interspeech.pdf) |
@@ -596,56 +596,56 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2344 | Diacritic Recognition Performance in Arabic ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/aldarmaki23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14022-b31b1b.svg)](https://arxiv.org/abs/2302.14022) | -| 990 | Personalization for BERT-based Discriminative Speech Recognition Rescoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kolehmainen23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/personalization-for-bert-based-discriminative-speech-recognition-rescoring) | -| 2182 | On the N-gram Approximation of Pre-trained Language Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/krishnan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06892-b31b1b.svg)](https://arxiv.org/abs/2306.06892) | -| 2147 | Record Deduplication for Entity Distribution Modeling in ASR Transcripts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06246-b31b1b.svg)](https://arxiv.org/abs/2306.06246) | -| 2205 | Learning When to Trust Which Teacher for Weakly Supervised ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/agrawal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12012-b31b1b.svg)](https://arxiv.org/abs/2306.12012) | -| 1313 | Text-Only Domain Adaptation using Unified Speech-Text Representation in Transducer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04076-b31b1b.svg)](https://arxiv.org/abs/2306.04076) | -| 1378 | Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhu23f_interspeech.pdf) | -| 2479 | Knowledge Distillation Approach for Efficient Internal Language Model Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23u_interspeech.pdf) | -| 276 | Language Model Personalization for Improved Touchscreen Typing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/adhikary23_interspeech.pdf) | -| 1223 | Blank Collapse: Compressing CTC Emission for the Faster Decoding | [![GitHub](https://img.shields.io/github/stars/minkjung/blankcollapse?style=flat)](https://github.com/minkjung/blankcollapse) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jung23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17017-b31b1b.svg)](https://arxiv.org/abs/2210.17017) | -| 403 | Improving Joint Speech-Text Representations without Alignment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/peyser23_interspeech.pdf) | -| 1941 | Leveraging Cross-Utterance Context for ASR Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/flynn23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16903-b31b1b.svg)](https://arxiv.org/abs/2306.16903) | -| 423 | Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation | [![GitHub](https://img.shields.io/github/stars/MingLunHan/CIF-HieraDist?style=flat)](https://github.com/MingLunHan/CIF-HieraDist) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/han23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.13003-b31b1b.svg)](https://arxiv.org/abs/2301.13003) | -| 1517 | Integration of Frame- and Label-Synchronous Beam Search for Streaming Encoder-Decoder Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tsunoo23_interspeech.pdf) | -| 1071 | A Neural Time Alignment Module for End-to-End Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jiang23e_interspeech.pdf) | -| 599 | Accelerating Transducers through Adjacent Token Merging | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16009-b31b1b.svg)](https://arxiv.org/abs/2306.16009) | -| 617 | Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/feng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11569-b31b1b.svg)](https://arxiv.org/abs/2305.11569) | -| 2292 | Language-Routing Mixture of Experts for Multi-Lingual and Code-Switching Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23sa_interspeech.pdf) | -| 1437 | Embedding Articulatory Constraints for Low-Resource Speech Recognition based on Large Pre-trained Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23h_interspeech.pdf) | -| 2051 | Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18108-b31b1b.svg)](https://arxiv.org/abs/2305.18108) | -| 768 | SpellMapper: A Non-Autoregressive Neural Spellchecker for ASR Customization with Candidate Retrieval based on N-Gram Mappings | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb)
[![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/antonova23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02317-b31b1b.svg)](https://arxiv.org/abs/2306.02317) | -| 2037 | Text Injection for Capitalization and Turn-Taking Prediction in Speech Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bijwadia23_interspeech.pdf) | -| 1281 | Confidence-based Ensembles of End-to-End Speech Recognition Models | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Confidence_Ensembles.ipynb)
[![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gitman23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15824-b31b1b.svg)](https://arxiv.org/abs/2306.15824) | -| 1050 | Unsupervised Code-Switched Text Generation from Parallel Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chi23_interspeech.pdf) | -| 258 | A Binary Keyword Spotting System With Error-Diffusion Speech Feature Binarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23b_interspeech.pdf) | -| 621 | Language-Universal Phonetic Encoder for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/feng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11576-b31b1b.svg)](https://arxiv.org/abs/2305.11576) | -| 863 | A Lexical-aware Non-Autoregressive Transformer-based ASR Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10839-b31b1b.svg)](https://arxiv.org/abs/2305.10839) | -| 1841 | Improving Under-Resourced Code-Switched Speech Recognition: Large Pre-trained Models or Architectural Interventions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vanvuren23_interspeech.pdf) | -| 61 | A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization | [![GitHub](https://img.shields.io/github/stars/SamsungLabs/myQASR?style=flat)](https://github.com/SamsungLabs/myQASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fish23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.12659-b31b1b.svg)](https://arxiv.org/abs/2307.12659) | -| 137 | Modeling Dependent Structure for Utterances in ASR Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2209.05281-b31b1b.svg)](https://arxiv.org/abs/2209.05281) | -| 757 | ASR for Low Resource and Multilingual Noisy Code-Mixed Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/verma23_interspeech.pdf) | -| 390 | Accurate and Reliable Confidence Estimation based on Non-Autoregressive End-to-End Speech Recognition System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10680-b31b1b.svg)](https://arxiv.org/abs/2305.10680) | -| 737 | Combining Multilingual Resources and Models to Develop State-of-the-Art E2E ASR for Swedish | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mateju23_interspeech.pdf) | -| 1171 | Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-Streaming Transducer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.06735-b31b1b.svg)](https://arxiv.org/abs/2301.06735) | -| 1867 | Towards Continually Learning New Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pham23_interspeech.pdf) | -| 1616 | N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ma23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00456-b31b1b.svg)](https://arxiv.org/abs/2303.00456) | -| 1432 | SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhu23g_interspeech.pdf) | -| 1162 | miniStreamer: Enhancing Small Conformer with Chunked-Context Masking for Streaming ASR Applications on the Edge | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gulzar23_interspeech.pdf) | -| 1469 | CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning | [![GitHub](https://img.shields.io/github/stars/louislau1129/CoMFLP?style=flat)](https://github.com/louislau1129/CoMFLP) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23p_interspeech.pdf) | -| 1337 | Exploration on HuBERT with Multiple Resolution | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01084-b31b1b.svg)](https://arxiv.org/abs/2306.01084) | -| 2045 | Quantization-aware and Tensor-compressed Training of Transformers for Natural Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23s_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01076-b31b1b.svg)](https://arxiv.org/abs/2306.01076) | -| 2355 | Word-Level Confidence Estimation for CTC Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/naowarat23b_interspeech.pdf) | -| 2235 | Multilingual Contextual Adapters to Improve Custom Word Recognition in Low-Resource Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kulshreshtha23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00759-b31b1b.svg)](https://arxiv.org/abs/2307.00759) | -| 614 | Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zheng23_interspeech.pdf) | -| 1303 | 4D ASR: Joint Modeling of CTC, Attention, Transducer, and Mask-Predict Decoders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sudo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.10818-b31b1b.svg)](https://arxiv.org/abs/2212.10818) | -| 1086 | Neural Model Reprogramming with Similarity based Mapping for Low-Resource Spoken Command Recognition | [![GitHub](https://img.shields.io/github/stars/dodohow1011/SpeechAdvReprogram?style=flat)](https://github.com/dodohow1011/SpeechAdvReprogram) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2110.03894-b31b1b.svg)](https://arxiv.org/abs/2110.03894) | -| 262 | Language-Specific Boundary Learning for Improving Mandarin-English Code-Switching Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fan23_interspeech.pdf) | -| 480 | Mixture-of-Expert Conformer for Streaming Multilingual ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hu23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15663-b31b1b.svg)](https://arxiv.org/abs/2305.15663) | -| 1665 | Lossless 4-bit Quantization of Architecture Compressed Conformer ASR Systems on the 300-hr Switch-board Corpus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23x_interspeech.pdf) | -| 2544 | Compressed MoE ASR Model based on Knowledge Distillation and Quantization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yuan23c_interspeech.pdf) | +| 2344 | Diacritic Recognition Performance in Arabic ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/aldarmaki23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14022-b31b1b.svg)](https://arxiv.org/abs/2302.14022) | +| 990 | Personalization for BERT-based Discriminative Speech Recognition Rescoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kolehmainen23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/personalization-for-bert-based-discriminative-speech-recognition-rescoring) | +| 2182 | On the N-gram Approximation of Pre-trained Language Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/krishnan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06892-b31b1b.svg)](https://arxiv.org/abs/2306.06892) | +| 2147 | Record Deduplication for Entity Distribution Modeling in ASR Transcripts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06246-b31b1b.svg)](https://arxiv.org/abs/2306.06246) | +| 2205 | Learning When to Trust Which Teacher for Weakly Supervised ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/agrawal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12012-b31b1b.svg)](https://arxiv.org/abs/2306.12012) | +| 1313 | Text-Only Domain Adaptation using Unified Speech-Text Representation in Transducer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04076-b31b1b.svg)](https://arxiv.org/abs/2306.04076) | +| 1378 | Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhu23f_interspeech.pdf) | +| 2479 | Knowledge Distillation Approach for Efficient Internal Language Model Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23u_interspeech.pdf) | +| 276 | Language Model Personalization for Improved Touchscreen Typing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/adhikary23_interspeech.pdf) | +| 1223 | Blank Collapse: Compressing CTC Emission for the Faster Decoding | [![GitHub](https://img.shields.io/github/stars/minkjung/blankcollapse?style=flat)](https://github.com/minkjung/blankcollapse) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jung23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17017-b31b1b.svg)](https://arxiv.org/abs/2210.17017) | +| 403 | Improving Joint Speech-Text Representations without Alignment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/peyser23_interspeech.pdf) | +| 1941 | Leveraging Cross-Utterance Context for ASR Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/flynn23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16903-b31b1b.svg)](https://arxiv.org/abs/2306.16903) | +| 423 | Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation | [![GitHub](https://img.shields.io/github/stars/MingLunHan/CIF-HieraDist?style=flat)](https://github.com/MingLunHan/CIF-HieraDist) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/han23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.13003-b31b1b.svg)](https://arxiv.org/abs/2301.13003) | +| 1517 | Integration of Frame- and Label-Synchronous Beam Search for Streaming Encoder-Decoder Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tsunoo23_interspeech.pdf) | +| 1071 | A Neural Time Alignment Module for End-to-End Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jiang23e_interspeech.pdf) | +| 599 | Accelerating Transducers through Adjacent Token Merging | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16009-b31b1b.svg)](https://arxiv.org/abs/2306.16009) | +| 617 | Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/feng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11569-b31b1b.svg)](https://arxiv.org/abs/2305.11569) | +| 2292 | Language-Routing Mixture of Experts for Multi-Lingual and Code-Switching Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23sa_interspeech.pdf) | +| 1437 | Embedding Articulatory Constraints for Low-Resource Speech Recognition based on Large Pre-trained Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23h_interspeech.pdf) | +| 2051 | Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18108-b31b1b.svg)](https://arxiv.org/abs/2305.18108) | +| 768 | SpellMapper: A Non-Autoregressive Neural Spellchecker for ASR Customization with Candidate Retrieval based on N-Gram Mappings | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb)
[![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/antonova23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02317-b31b1b.svg)](https://arxiv.org/abs/2306.02317) | +| 2037 | Text Injection for Capitalization and Turn-Taking Prediction in Speech Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bijwadia23_interspeech.pdf) | +| 1281 | Confidence-based Ensembles of End-to-End Speech Recognition Models | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Confidence_Ensembles.ipynb)
[![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gitman23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15824-b31b1b.svg)](https://arxiv.org/abs/2306.15824) | +| 1050 | Unsupervised Code-Switched Text Generation from Parallel Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chi23_interspeech.pdf) | +| 258 | A Binary Keyword Spotting System With Error-Diffusion Speech Feature Binarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23b_interspeech.pdf) | +| 621 | Language-Universal Phonetic Encoder for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/feng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11576-b31b1b.svg)](https://arxiv.org/abs/2305.11576) | +| 863 | A Lexical-aware Non-Autoregressive Transformer-based ASR Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10839-b31b1b.svg)](https://arxiv.org/abs/2305.10839) | +| 1841 | Improving Under-Resourced Code-Switched Speech Recognition: Large Pre-trained Models or Architectural Interventions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vanvuren23_interspeech.pdf) | +| 61 | A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization | [![GitHub](https://img.shields.io/github/stars/SamsungLabs/myQASR?style=flat)](https://github.com/SamsungLabs/myQASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fish23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.12659-b31b1b.svg)](https://arxiv.org/abs/2307.12659) | +| 137 | Modeling Dependent Structure for Utterances in ASR Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2209.05281-b31b1b.svg)](https://arxiv.org/abs/2209.05281) | +| 757 | ASR for Low Resource and Multilingual Noisy Code-Mixed Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/verma23_interspeech.pdf) | +| 390 | Accurate and Reliable Confidence Estimation based on Non-Autoregressive End-to-End Speech Recognition System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10680-b31b1b.svg)](https://arxiv.org/abs/2305.10680) | +| 737 | Combining Multilingual Resources and Models to Develop State-of-the-Art E2E ASR for Swedish | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mateju23_interspeech.pdf) | +| 1171 | Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-Streaming Transducer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.06735-b31b1b.svg)](https://arxiv.org/abs/2301.06735) | +| 1867 | Towards Continually Learning New Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pham23_interspeech.pdf) | +| 1616 | N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ma23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00456-b31b1b.svg)](https://arxiv.org/abs/2303.00456) | +| 1432 | SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhu23g_interspeech.pdf) | +| 1162 | miniStreamer: Enhancing Small Conformer with Chunked-Context Masking for Streaming ASR Applications on the Edge | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gulzar23_interspeech.pdf) | +| 1469 | CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning | [![GitHub](https://img.shields.io/github/stars/louislau1129/CoMFLP?style=flat)](https://github.com/louislau1129/CoMFLP) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23p_interspeech.pdf) | +| 1337 | Exploration on HuBERT with Multiple Resolution | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01084-b31b1b.svg)](https://arxiv.org/abs/2306.01084) | +| 2045 | Quantization-aware and Tensor-compressed Training of Transformers for Natural Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23s_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01076-b31b1b.svg)](https://arxiv.org/abs/2306.01076) | +| 2355 | Word-Level Confidence Estimation for CTC Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/naowarat23b_interspeech.pdf) | +| 2235 | Multilingual Contextual Adapters to Improve Custom Word Recognition in Low-Resource Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kulshreshtha23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00759-b31b1b.svg)](https://arxiv.org/abs/2307.00759) | +| 614 | Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zheng23_interspeech.pdf) | +| 1303 | 4D ASR: Joint Modeling of CTC, Attention, Transducer, and Mask-Predict Decoders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sudo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.10818-b31b1b.svg)](https://arxiv.org/abs/2212.10818) | +| 1086 | Neural Model Reprogramming with Similarity based Mapping for Low-Resource Spoken Command Recognition | [![GitHub](https://img.shields.io/github/stars/dodohow1011/SpeechAdvReprogram?style=flat)](https://github.com/dodohow1011/SpeechAdvReprogram) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2110.03894-b31b1b.svg)](https://arxiv.org/abs/2110.03894) | +| 262 | Language-Specific Boundary Learning for Improving Mandarin-English Code-Switching Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fan23_interspeech.pdf) | +| 480 | Mixture-of-Expert Conformer for Streaming Multilingual ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hu23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15663-b31b1b.svg)](https://arxiv.org/abs/2305.15663) | +| 1665 | Lossless 4-bit Quantization of Architecture Compressed Conformer ASR Systems on the 300-hr Switch-board Corpus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23x_interspeech.pdf) | +| 2544 | Compressed MoE ASR Model based on Knowledge Distillation and Quantization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yuan23c_interspeech.pdf) |
@@ -657,41 +657,41 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2044 | Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model | [![GitHub](https://img.shields.io/github/stars/jasonppy/syllable-discovery?style=flat)](https://github.com/jasonppy/syllable-discovery) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11435-b31b1b.svg)](https://arxiv.org/abs/2305.11435) | -| 2032 | Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization | [![GitHub](https://img.shields.io/github/stars/jasonppy/PromptingWhisper?style=flat)](https://github.com/jasonppy/PromptingWhisper) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11095-b31b1b.svg)](https://arxiv.org/abs/2305.11095) | -| 235 | Progress and Prospects for Spoken Language Technology: Results from Five Sexennial Surveys | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/moore23_interspeech.pdf) | -| 268 | Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sanabria23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02153-b31b1b.svg)](https://arxiv.org/abs/2306.02153) | -| 601 | CASA-ASR: Context-Aware Speaker-Attributed ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12459-b31b1b.svg)](https://arxiv.org/abs/2305.12459) | -| 1321 | Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/takahashi23_interspeech.pdf) | -| 1167 | AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in the SUPERB Benchmark | [![GitHub](https://img.shields.io/github/stars/liyunlongaaa/AD-TUNING?style=flat)](https://github.com/liyunlongaaa/AD-TUNING) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23n_interspeech.pdf) | -| 190 | Distilling Knowledge from Gaussian Process Teacher to Neural Network Student | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wong23_interspeech.pdf) | -| 135 | Segmental SpeechCLIP: Utilizing Pretrained Image-Text Models for Audio-Visual Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bhati23_interspeech.pdf) | -| 421 | Towards Hate Speech Detection in Low-Resource Languages: Comparing ASR to Acoustic Word Embeddings on Wolof and Swahili | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jacobs23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00410-b31b1b.svg)](https://arxiv.org/abs/2306.00410) | -| 385 | Mitigating Catastrophic Forgetting for Few-Shot Spoken Word Classification through Meta-Learning | [![GitHub](https://img.shields.io/github/stars/ByteFuse/MAMLCon?style=flat)](https://github.com/ByteFuse/MAMLCon) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vandermerwe23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13080-b31b1b.svg)](https://arxiv.org/abs/2305.13080) | -| 664 | Online Punctuation Restoration using ELECTRA Model for Streaming ASR Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/polacek23_interspeech.pdf) | -| 2066 | Language Agnostic Data-Driven Inverse Text Normalization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23s_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.08506-b31b1b.svg)](https://arxiv.org/abs/2301.08506) | -| 1079 | How to Estimate Model Transferability of Pre-trained Speech Models? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01015-b31b1b.svg)](https://arxiv.org/abs/2306.01015) | -| 1655 | Transcribing Speech as Spoken and Written Dual Text using an Autoregressive Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ihori23_interspeech.pdf) | -| 587 | Phonetic and Prosody-aware Self-Supervised Learning Approach for Non-Native Fluency Scoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11438-b31b1b.svg)](https://arxiv.org/abs/2305.11438) | -| 380 | Disentangling the Contribution of Non-Native Speech in Automated Pronunciation Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23_interspeech.pdf) | -| 337 | A Joint Model for Pronunciation Assessment and Mispronunciation Detection and Diagnosis with Multi-task Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ryu23_interspeech.pdf) | -| 1635 | Assessing Intelligibility in Non-Native Speech: Comparing Measures Obtained at Different Levels | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wei23d_interspeech.pdf) | -| 585 | End-to-End Word-Level Pronunciation Assessment with MASK Pre-training | [![GitHub](https://img.shields.io/github/stars/liangyukang/MPA-InterSpeech2023?style=flat)](https://github.com/liangyukang/MPA-InterSpeech2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02682-b31b1b.svg)](https://arxiv.org/abs/2306.02682) | -| 550 | A Hierarchical Context-aware Modeling Approach for Multi-Aspect and Multi-Granular Pronunciation Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chao23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18146-b31b1b.svg)](https://arxiv.org/abs/2305.18146) | -| 2541 | Automatic Prediction of Language Learners' Listenability using Speech and Text Features Extracted from Listening Drills | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23j_interspeech.pdf) | -| 2371 | Assessment of Non-Native Speech Intelligibility using Wav2vec2-based Mispronunciation Detection and Multi-Level Goodness of Pronunciation Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shekar23b_interspeech.pdf) | -| 1899 | Adapting an Unadaptable ASR System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ma23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01208-b31b1b.svg)](https://arxiv.org/abs/2306.01208) | -| 533 | Addressing Cold Start Problem for End-to-End Automatic Speech Scoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/park23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14310-b31b1b.svg)](https://arxiv.org/abs/2306.14310) | -| 816 | Improving Grapheme-to-Phoneme Conversion by Learning Pronunciations from Speech Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ribeiro23b_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/improving-grapheme-to-phoneme-conversion-by-learning-pronunciations-from-speech-recordings) | -| 2577 | Orthography-based Pronunciation Scoring for Better CAPT Feedback | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/richter23b_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://catir.github.io/art/capt_phone_ctc_2023.pdf) | -| 1592 | Zero-Shot Automatic Pronunciation Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23r_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19563-b31b1b.svg)](https://arxiv.org/abs/2305.19563) | -| 364 | Mispronunciation Detection and Diagnosis Model for Tonal Language, Applied to Vietnamese | [![GitHub](https://img.shields.io/github/stars/VietMDDDataset/VietMDD?style=flat)](https://github.com/VietMDDDataset/VietMDD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huu23_interspeech.pdf) | -| 793 | An Efficient and Noise-Robust Audiovisual Encoder for Audiovisual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23k_interspeech.pdf) | -| 540 | A Novel Self-training Approach for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/singh23b_interspeech.pdf) | -| 1428 | FunASR: A Fundamental End-to-End Speech Recognition Toolkit | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11013-b31b1b.svg)](https://arxiv.org/abs/2305.11013) | -| 487 | Streaming Audio-Visual Speech Recognition with Alignment Regularization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ma23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02133-b31b1b.svg)](https://arxiv.org/abs/2211.02133) | -| 462 | SparseVSR: Lightweight and Noise Robust Visual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fernandezlopez23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.04552-b31b1b.svg)](https://arxiv.org/abs/2307.04552) | -| 2262 | Multimodal Speech Recognition for Language-Guided Embodied Agents | [![GitHub](https://img.shields.io/github/stars/Cylumn/embodied-multimodal-asr?style=flat)](https://github.com/Cylumn/embodied-multimodal-asr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14030-b31b1b.svg)](https://arxiv.org/abs/2302.14030) | +| 2044 | Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model | [![GitHub](https://img.shields.io/github/stars/jasonppy/syllable-discovery?style=flat)](https://github.com/jasonppy/syllable-discovery) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/peng23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11435-b31b1b.svg)](https://arxiv.org/abs/2305.11435) | +| 2032 | Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization | [![GitHub](https://img.shields.io/github/stars/jasonppy/PromptingWhisper?style=flat)](https://github.com/jasonppy/PromptingWhisper) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/peng23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11095-b31b1b.svg)](https://arxiv.org/abs/2305.11095) | +| 235 | Progress and Prospects for Spoken Language Technology: Results from Five Sexennial Surveys | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/moore23_interspeech.pdf) | +| 268 | Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sanabria23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02153-b31b1b.svg)](https://arxiv.org/abs/2306.02153) | +| 601 | CASA-ASR: Context-Aware Speaker-Attributed ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12459-b31b1b.svg)](https://arxiv.org/abs/2305.12459) | +| 1321 | Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/takahashi23_interspeech.pdf) | +| 1167 | AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in the SUPERB Benchmark | [![GitHub](https://img.shields.io/github/stars/liyunlongaaa/AD-TUNING?style=flat)](https://github.com/liyunlongaaa/AD-TUNING) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23n_interspeech.pdf) | +| 190 | Distilling Knowledge from Gaussian Process Teacher to Neural Network Student | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wong23_interspeech.pdf) | +| 135 | Segmental SpeechCLIP: Utilizing Pretrained Image-Text Models for Audio-Visual Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bhati23_interspeech.pdf) | +| 421 | Towards Hate Speech Detection in Low-Resource Languages: Comparing ASR to Acoustic Word Embeddings on Wolof and Swahili | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jacobs23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00410-b31b1b.svg)](https://arxiv.org/abs/2306.00410) | +| 385 | Mitigating Catastrophic Forgetting for Few-Shot Spoken Word Classification through Meta-Learning | [![GitHub](https://img.shields.io/github/stars/ByteFuse/MAMLCon?style=flat)](https://github.com/ByteFuse/MAMLCon) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vandermerwe23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13080-b31b1b.svg)](https://arxiv.org/abs/2305.13080) | +| 664 | Online Punctuation Restoration using ELECTRA Model for Streaming ASR Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/polacek23_interspeech.pdf) | +| 2066 | Language Agnostic Data-Driven Inverse Text Normalization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23s_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.08506-b31b1b.svg)](https://arxiv.org/abs/2301.08506) | +| 1079 | How to Estimate Model Transferability of Pre-trained Speech Models? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01015-b31b1b.svg)](https://arxiv.org/abs/2306.01015) | +| 1655 | Transcribing Speech as Spoken and Written Dual Text using an Autoregressive Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ihori23_interspeech.pdf) | +| 587 | Phonetic and Prosody-aware Self-Supervised Learning Approach for Non-Native Fluency Scoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11438-b31b1b.svg)](https://arxiv.org/abs/2305.11438) | +| 380 | Disentangling the Contribution of Non-Native Speech in Automated Pronunciation Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23_interspeech.pdf) | +| 337 | A Joint Model for Pronunciation Assessment and Mispronunciation Detection and Diagnosis with Multi-task Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ryu23_interspeech.pdf) | +| 1635 | Assessing Intelligibility in Non-Native Speech: Comparing Measures Obtained at Different Levels | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wei23d_interspeech.pdf) | +| 585 | End-to-End Word-Level Pronunciation Assessment with MASK Pre-training | [![GitHub](https://img.shields.io/github/stars/liangyukang/MPA-InterSpeech2023?style=flat)](https://github.com/liangyukang/MPA-InterSpeech2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02682-b31b1b.svg)](https://arxiv.org/abs/2306.02682) | +| 550 | A Hierarchical Context-aware Modeling Approach for Multi-Aspect and Multi-Granular Pronunciation Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chao23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18146-b31b1b.svg)](https://arxiv.org/abs/2305.18146) | +| 2541 | Automatic Prediction of Language Learners' Listenability using Speech and Text Features Extracted from Listening Drills | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23j_interspeech.pdf) | +| 2371 | Assessment of Non-Native Speech Intelligibility using Wav2vec2-based Mispronunciation Detection and Multi-Level Goodness of Pronunciation Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shekar23b_interspeech.pdf) | +| 1899 | Adapting an Unadaptable ASR System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ma23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01208-b31b1b.svg)](https://arxiv.org/abs/2306.01208) | +| 533 | Addressing Cold Start Problem for End-to-End Automatic Speech Scoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/park23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14310-b31b1b.svg)](https://arxiv.org/abs/2306.14310) | +| 816 | Improving Grapheme-to-Phoneme Conversion by Learning Pronunciations from Speech Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ribeiro23b_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/improving-grapheme-to-phoneme-conversion-by-learning-pronunciations-from-speech-recordings) | +| 2577 | Orthography-based Pronunciation Scoring for Better CAPT Feedback | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/richter23b_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://catir.github.io/art/capt_phone_ctc_2023.pdf) | +| 1592 | Zero-Shot Automatic Pronunciation Assessment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23r_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19563-b31b1b.svg)](https://arxiv.org/abs/2305.19563) | +| 364 | Mispronunciation Detection and Diagnosis Model for Tonal Language, Applied to Vietnamese | [![GitHub](https://img.shields.io/github/stars/VietMDDDataset/VietMDD?style=flat)](https://github.com/VietMDDDataset/VietMDD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huu23_interspeech.pdf) | +| 793 | An Efficient and Noise-Robust Audiovisual Encoder for Audiovisual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23k_interspeech.pdf) | +| 540 | A Novel Self-training Approach for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/singh23b_interspeech.pdf) | +| 1428 | FunASR: A Fundamental End-to-End Speech Recognition Toolkit | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11013-b31b1b.svg)](https://arxiv.org/abs/2305.11013) | +| 487 | Streaming Audio-Visual Speech Recognition with Alignment Regularization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ma23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02133-b31b1b.svg)](https://arxiv.org/abs/2211.02133) | +| 462 | SparseVSR: Lightweight and Noise Robust Visual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fernandezlopez23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.04552-b31b1b.svg)](https://arxiv.org/abs/2307.04552) | +| 2262 | Multimodal Speech Recognition for Language-Guided Embodied Agents | [![GitHub](https://img.shields.io/github/stars/Cylumn/embodied-multimodal-asr?style=flat)](https://github.com/Cylumn/embodied-multimodal-asr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14030-b31b1b.svg)](https://arxiv.org/abs/2302.14030) |
@@ -703,12 +703,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 643 | NoRefER: A Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning | [![GitHub](https://img.shields.io/github/stars/aixplain/NoRefER?style=flat)](https://github.com/aixplain/NoRefER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yuksel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12577-b31b1b.svg)](https://arxiv.org/abs/2306.12577) | -| 2128 | Scaling Laws for Discriminative Speech Recognition Rescoring Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gu23c_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/scaling-laws-for-discriminative-speech-recognition-rescoring-models)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15815-b31b1b.svg)](https://arxiv.org/abs/2306.15815) | -| 2429 | Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/thu-spmi/CAT/blob/master/docs/energy-based_LM_training.md) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12676-b31b1b.svg)](https://arxiv.org/abs/2305.12676) | -| 1362 | Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/feng23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.00066-b31b1b.svg)](https://arxiv.org/abs/2301.00066) | -| 1251 | Memory Network-based End-To-End Neural ES-KMeans for Improved Word Segmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/iwamoto23_interspeech.pdf) | -| 1320 | Retraining-free Customized ASR for Enharmonic Words based on a Named-Entity-Aware Model and Phoneme Similarity Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sudo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17846-b31b1b.svg)](https://arxiv.org/abs/2305.17846) | +| 643 | NoRefER: A Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning | [![GitHub](https://img.shields.io/github/stars/aixplain/NoRefER?style=flat)](https://github.com/aixplain/NoRefER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yuksel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12577-b31b1b.svg)](https://arxiv.org/abs/2306.12577) | +| 2128 | Scaling Laws for Discriminative Speech Recognition Rescoring Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gu23c_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/scaling-laws-for-discriminative-speech-recognition-rescoring-models)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15815-b31b1b.svg)](https://arxiv.org/abs/2306.15815) | +| 2429 | Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/thu-spmi/CAT/blob/master/docs/energy-based_LM_training.md) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12676-b31b1b.svg)](https://arxiv.org/abs/2305.12676) | +| 1362 | Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/feng23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.00066-b31b1b.svg)](https://arxiv.org/abs/2301.00066) | +| 1251 | Memory Network-based End-To-End Neural ES-KMeans for Improved Word Segmentation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/iwamoto23_interspeech.pdf) | +| 1320 | Retraining-free Customized ASR for Enharmonic Words based on a Named-Entity-Aware Model and Phoneme Similarity Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sudo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17846-b31b1b.svg)](https://arxiv.org/abs/2305.17846) |
@@ -720,12 +720,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 304 | Lightweight and Efficient Spoken Language Identification of Long-form Audio | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhu23c_interspeech.pdf) | -| 1109 | End-to-End Spoken Language Diarization with Wav2vec Embeddings | [![GitHub](https://img.shields.io/github/stars/jagabandhumishra/W2V-E2E-Language-Diarization?style=flat)](https://github.com/jagabandhumishra/W2V-E2E-Language-Diarization) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mishra23_interspeech.pdf) | -| 1986 | Efficient Spoken Language Recognition via Multilabel Classification | [![Dropbox](https://img.shields.io/badge/Dropbox-Video-%233B4D98.svg?style=for-the-badge&logo=Dropbox&logoColor=white)](https://www.dropbox.com/scl/fi/625psvljnntyiajrzmy9w/20230821-Interspeech-ONieto-Paper1986.mp4?dl=0&rlkey=w2nkc7zn9fvqcc5iwldlqbbrb) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nieto23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01945-b31b1b.svg)](https://arxiv.org/abs/2306.01945) | -| 1529 | Description and Analysis of ABC Submission to NIST LRE 2022 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/matejka23_interspeech.pdf) | -| 1790 | Exploring the Impact of Pretrained Models and Web-Scraped Data for the 2022 NIST Language Recognition Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/alumae23_interspeech.pdf) | -| 1094 | Advances in Language Recognition in Low Resource African Languages: The JHU-MIT Submission for NIST LRE22 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/villalba23_interspeech.pdf) | +| 304 | Lightweight and Efficient Spoken Language Identification of Long-form Audio | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhu23c_interspeech.pdf) | +| 1109 | End-to-End Spoken Language Diarization with Wav2vec Embeddings | [![GitHub](https://img.shields.io/github/stars/jagabandhumishra/W2V-E2E-Language-Diarization?style=flat)](https://github.com/jagabandhumishra/W2V-E2E-Language-Diarization) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mishra23_interspeech.pdf) | +| 1986 | Efficient Spoken Language Recognition via Multilabel Classification | [![Dropbox](https://img.shields.io/badge/Dropbox-Video-%233B4D98.svg?style=for-the-badge&logo=Dropbox&logoColor=white)](https://www.dropbox.com/scl/fi/625psvljnntyiajrzmy9w/20230821-Interspeech-ONieto-Paper1986.mp4?dl=0&rlkey=w2nkc7zn9fvqcc5iwldlqbbrb) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nieto23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01945-b31b1b.svg)](https://arxiv.org/abs/2306.01945) | +| 1529 | Description and Analysis of ABC Submission to NIST LRE 2022 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/matejka23_interspeech.pdf) | +| 1790 | Exploring the Impact of Pretrained Models and Web-Scraped Data for the 2022 NIST Language Recognition Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/alumae23_interspeech.pdf) | +| 1094 | Advances in Language Recognition in Low Resource African Languages: The JHU-MIT Submission for NIST LRE22 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/villalba23_interspeech.pdf) |
@@ -737,12 +737,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1436 | DeePMOS: Deep Posterior Mean-Opinion-Score of Speech | [![GitHub](https://img.shields.io/github/stars/Hope-Liang/DeePMOS?style=flat)](https://github.com/Hope-Liang/DeePMOS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liang23d_interspeech.pdf) | -| 1644 | The Role of Formant and Excitation Source Features in Perceived Naturalness of Low Resource Tribal Language TTS: An Empirical Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/dasare23_interspeech.pdf) | -| 811 | A No-Reference Speech Quality Assessment Method based on Neural Network with Densely Connected Convolutional Architecture | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gong23_interspeech.pdf) | -| 2507 | Probing Speech Quality Information in ASR Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ta23_interspeech.pdf) | -| 589 | Preference-based Training Framework for Automatic Speech Quality Assessment using Deep Neural Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hu23d_interspeech.pdf) | -| 389 | Crowdsourced Data Validation for ASR Training | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/phatthiyaphaibun23_interspeech.pdf) | +| 1436 | DeePMOS: Deep Posterior Mean-Opinion-Score of Speech | [![GitHub](https://img.shields.io/github/stars/Hope-Liang/DeePMOS?style=flat)](https://github.com/Hope-Liang/DeePMOS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liang23d_interspeech.pdf) | +| 1644 | The Role of Formant and Excitation Source Features in Perceived Naturalness of Low Resource Tribal Language TTS: An Empirical Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/dasare23_interspeech.pdf) | +| 811 | A No-Reference Speech Quality Assessment Method based on Neural Network with Densely Connected Convolutional Architecture | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gong23_interspeech.pdf) | +| 2507 | Probing Speech Quality Information in ASR Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ta23_interspeech.pdf) | +| 589 | Preference-based Training Framework for Automatic Speech Quality Assessment using Deep Neural Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hu23d_interspeech.pdf) | +| 389 | Crowdsourced Data Validation for ASR Training | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/phatthiyaphaibun23_interspeech.pdf) | @@ -754,12 +754,12 @@ Contributions 
to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2296 | Re-Investigating the Efficient Transfer Learning of Speech Foundation Model using Feature Fusion Methods | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huo23b_interspeech.pdf) | -| 1556 | Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/alibaba/easyrobust/tree/main/examples/asr/WAPAT)
[![GitHub](https://img.shields.io/github/stars/alibaba/easyrobust?style=flat)](https://github.com/alibaba/easyrobust) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/qi23_interspeech.pdf) | -| 509 | InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lai23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16342-b31b1b.svg)](https://arxiv.org/abs/2305.16342) | -| 579 | Transductive Feature Space Regularization for Few-Shot Bioacoustic Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tan23_interspeech.pdf) | -| 615 | Incorporating L2 Phonemes using Articulatory Features for Robust Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02534-b31b1b.svg)](https://arxiv.org/abs/2306.02534) | -| 1510 | On the (In)Efficiency of Acoustic Feature Extractors for Self-Supervised Speech Representation Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/parcollet23_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-040060.svg)](https://hal.science/hal-04116371) | +| 2296 | Re-Investigating the Efficient Transfer Learning of Speech Foundation Model using Feature Fusion Methods | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huo23b_interspeech.pdf) | +| 1556 | Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/alibaba/easyrobust/tree/main/examples/asr/WAPAT)
[![GitHub](https://img.shields.io/github/stars/alibaba/easyrobust?style=flat)](https://github.com/alibaba/easyrobust) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/qi23_interspeech.pdf) | +| 509 | InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lai23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16342-b31b1b.svg)](https://arxiv.org/abs/2305.16342) | +| 579 | Transductive Feature Space Regularization for Few-Shot Bioacoustic Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tan23_interspeech.pdf) | +| 615 | Incorporating L2 Phonemes using Articulatory Features for Robust Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02534-b31b1b.svg)](https://arxiv.org/abs/2306.02534) | +| 1510 | On the (In)Efficiency of Acoustic Feature Extractors for Self-Supervised Speech Representation Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/parcollet23_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-040060.svg)](https://hal.science/hal-04116371) |
@@ -771,10 +771,10 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1846 | Phonemic Competition in End-to-End ASR models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tenbosch23_interspeech.pdf) | -| 443 | Automatic Speaker Recognition with Variation Across Vocal Conditions: A Controlled Experiment with Implications for Forensics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hughes23_interspeech.pdf) | -| 1398 | Exploring Graph Theory Methods for the Analysis of Pronunciation Variation in Spontaneous Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/geiger23_interspeech.pdf) | -| 680 | Automatic Speaker Recognition Performance with Matched and Mismatched Female Bilingual Speech Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nuttall23_interspeech.pdf) | +| 1846 | Phonemic Competition in End-to-End ASR models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tenbosch23_interspeech.pdf) | +| 443 | Automatic Speaker Recognition with Variation Across Vocal Conditions: A Controlled Experiment with Implications for Forensics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hughes23_interspeech.pdf) | +| 1398 | Exploring Graph Theory Methods for the Analysis of Pronunciation Variation in Spontaneous Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/geiger23_interspeech.pdf) | +| 680 | Automatic Speaker Recognition Performance with Matched and Mismatched Female Bilingual Speech Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nuttall23_interspeech.pdf) | @@ -786,12 +786,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2303 | FACTSpeech: Speaking a Foreign Language Pronunciation using Only Your Native Characters | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://atozto9.github.io/demo/FACTSpeech/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23x_interspeech.pdf) | -| 934 | Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02579-b31b1b.svg)](https://arxiv.org/abs/2306.02579) | -| 363 | DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://goarsenal.github.io/DSE-TTS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14145-b31b1b.svg)](https://arxiv.org/abs/2306.14145) | -| 1467 | Generating Multilingual Gender-Ambiguous Text-to-Speech Voices | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://innoetics.github.io/publications/gender-ambiguous/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/markopoulos23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00375-b31b1b.svg)](https://arxiv.org/abs/2211.00375) | -| 2330 | RADMMM: Multilingual Multiaccented Multispeaker Text-to-Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/badlani23_interspeech.pdf)
[![NVidia AI](https://img.shields.io/badge/NVidia-AI-78B900.svg)](https://research.nvidia.com/labs/adlr/projects/radmmm/) | -| 861 | Multilingual Context-based Pronunciation Learning for Text-to-Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/comini23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/multilingual-context-based-pronunciation-learning-for-text-to-speech) | +| 2303 | FACTSpeech: Speaking a Foreign Language Pronunciation using Only Your Native Characters | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://atozto9.github.io/demo/FACTSpeech/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23x_interspeech.pdf) | +| 934 | Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02579-b31b1b.svg)](https://arxiv.org/abs/2306.02579) | +| 363 | DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://goarsenal.github.io/DSE-TTS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14145-b31b1b.svg)](https://arxiv.org/abs/2306.14145) | +| 1467 | Generating Multilingual Gender-Ambiguous Text-to-Speech Voices | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://innoetics.github.io/publications/gender-ambiguous/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/markopoulos23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00375-b31b1b.svg)](https://arxiv.org/abs/2211.00375) | +| 2330 | RADMMM: Multilingual Multiaccented Multispeaker Text-to-Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/badlani23_interspeech.pdf)
[![NVidia AI](https://img.shields.io/badge/NVidia-AI-78B900.svg)](https://research.nvidia.com/labs/adlr/projects/radmmm/) | +| 861 | Multilingual Context-based Pronunciation Learning for Text-to-Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/comini23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/multilingual-context-based-pronunciation-learning-for-text-to-speech) |
@@ -803,35 +803,35 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2170 | Personalized Adaptation with Pre-trained Speech Encoders for Continuous Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tran23c_interspeech.pdf) | -| 1113 | The Importance of Calibration: Rethinking Confidence and Performance of Speech Multi-Label Emotion Classifiers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chou23_interspeech.pdf)
[![BIIC](https://img.shields.io/badge/biic-research-F7C552.svg)](https://biic.ee.nthu.edu.tw/research.php?id=166) | -| 1080 | A Preliminary Study on Augmenting Speech Emotion Recognition using a Diffusion Model | [![Emulation AI](https://img.shields.io/badge/Emulation-AI-161B1F.svg)](https://emulationai.com/research/diffusion-ser/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/malik23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11413-b31b1b.svg)](https://arxiv.org/abs/2305.11413) | -| 454 | Privacy Risks in Speech Emotion Recognition: A Systematic Study on Gender Inference Attack | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/alsenani23_interspeech.pdf) | -| 2111 | Episodic Memory For Domain-Adaptable, Robust Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tavernor23_interspeech.pdf) | -| 80 | Stable Speech Emotion Recognition with Head-k-Pooling Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ding23_interspeech.pdf) | -| 890 | A Context-Constrained Sentence Modeling for Deception Detection in Real Interrogation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23d_interspeech.pdf) | -| 819 | MetricAug: A Distortion Metric-Lead Augmentation Strategy for Training Noise-Robust Speech Emotion Recognizer | [![GitHub](https://img.shields.io/github/stars/crowpeter/MetricAug?style=flat)](https://github.com/crowpeter/MetricAug) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23c_interspeech.pdf) | -| 240 | The Co-use of Laughter and Head Gestures Across Speech Styles | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ludusan23_interspeech.pdf) | -| 1351 | EmotionNAS: Two-Stream Neural Architecture Search for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sun23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.13617-b31b1b.svg)](https://arxiv.org/abs/2203.13617) | -| 136 | Pre-Finetuning for Few-Shot Emotional Speech Recognition | [![GitHub](https://img.shields.io/github/stars/maxlchen/Speech-PreFinetuning?style=flat)](https://github.com/maxlchen/Speech-PreFinetuning) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12921-b31b1b.svg)](https://arxiv.org/abs/2302.12921) | -| 293 | Integrating Emotion Recognition with Speech Recognition and Speaker Diarization for Conversations | [![GitHub](https://img.shields.io/github/stars/W-Wu/sTEER?style=flat)](https://github.com/W-Wu/sTEER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23_interspeech.pdf) | -| 1075 | Utility-Preserving Privacy-Enabled Speech Embeddings for Emotion Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lavania23_interspeech.pdf) | -| 1923 | Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews | [![GitHub](https://img.shields.io/github/stars/idiap/Node_weighted_GCN_for_depression_detection?style=flat)](https://github.com/idiap/Node_weighted_GCN_for_depression_detection) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/burdisso23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00920-b31b1b.svg)](https://arxiv.org/abs/2307.00920) | -| 1914 | Laughter in Task-based Settings: Whom We Talk to Affects How, When, and How Often We Laugh | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/branco23_interspeech.pdf) | -| 653 | Exploring Downstream Transfer of Self-Supervised Features for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fang23b_interspeech.pdf) | -| 1758 | Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/deoliveira23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19184-b31b1b.svg)](https://arxiv.org/abs/2305.19184) | -| 756 | Two-Stage Finetuning of Wav2vec 2.0 for Speech Emotion Recognition with ASR and Gender Pretraining | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23d_interspeech.pdf) | -| 1311 | Investigating Acoustic Cues for Multilingual Abuse Detection | [![GitHub](https://img.shields.io/github/stars/Cross-Caps/ACMAD?style=flat)](https://github.com/Cross-Caps/ACMAD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/thakran23_interspeech.pdf) | -| 1600 | A Novel Frequency Warping Scale for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/singh23c_interspeech.pdf) | -| 1170 | Multi-Scale Temporal Transformer for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23m_interspeech.pdf) | -| 1169 | Distant Speech Emotion Recognition in an Indoor Human-Robot Interaction Scenario | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/grageda23_interspeech.pdf) | -| 2498 | A Study on Prosodic Entrainment in Relation to Therapist Empathy in Counseling Conversation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tao23b_interspeech.pdf) | -| 2375 | Improving Joint Speech and Emotion Recognition using Global Style Tokens | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kyung23_interspeech.pdf) | -| 1163 | Speech Emotion Recognition by Estimating Emotional Label Sequences with Phoneme Class Attribute | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nagase23_interspeech.pdf) | -| 274 | Unsupervised Transfer Components Learning for Cross-Domain Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jiang23_interspeech.pdf) | -| 1090 | Dual Memory Fusion for Multimodal Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/prisayad23_interspeech.pdf) | -| 311 | Hybrid Dataset for Speech Emotion Recognition in Russian Language | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kondratenko23_interspeech.pdf) | -| 396 | Speech Emotion Recognition using Decomposed Speech via Multi-Task Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hsu23_interspeech.pdf) | +| 2170 | Personalized Adaptation with Pre-trained Speech Encoders for Continuous Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tran23c_interspeech.pdf) | +| 1113 | The Importance of Calibration: Rethinking Confidence and Performance of Speech Multi-Label Emotion Classifiers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chou23_interspeech.pdf)<br>
[![BIIC](https://img.shields.io/badge/biic-research-F7C552.svg)](https://biic.ee.nthu.edu.tw/research.php?id=166) | +| 1080 | A Preliminary Study on Augmenting Speech Emotion Recognition using a Diffusion Model | [![Emulation AI](https://img.shields.io/badge/Emulation-AI-161B1F.svg)](https://emulationai.com/research/diffusion-ser/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/malik23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11413-b31b1b.svg)](https://arxiv.org/abs/2305.11413) | +| 454 | Privacy Risks in Speech Emotion Recognition: A Systematic Study on Gender Inference Attack | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/alsenani23_interspeech.pdf) | +| 2111 | Episodic Memory For Domain-Adaptable, Robust Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tavernor23_interspeech.pdf) | +| 80 | Stable Speech Emotion Recognition with Head-k-Pooling Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ding23_interspeech.pdf) | +| 890 | A Context-Constrained Sentence Modeling for Deception Detection in Real Interrogation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23d_interspeech.pdf) | +| 819 | MetricAug: A Distortion Metric-Lead Augmentation Strategy for Training Noise-Robust Speech Emotion Recognizer | [![GitHub](https://img.shields.io/github/stars/crowpeter/MetricAug?style=flat)](https://github.com/crowpeter/MetricAug) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23c_interspeech.pdf) | +| 240 | The Co-use of Laughter and Head Gestures Across Speech Styles | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ludusan23_interspeech.pdf) | +| 1351 | EmotionNAS: Two-Stream Neural Architecture Search for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sun23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.13617-b31b1b.svg)](https://arxiv.org/abs/2203.13617) | +| 136 | Pre-Finetuning for Few-Shot Emotional Speech Recognition | [![GitHub](https://img.shields.io/github/stars/maxlchen/Speech-PreFinetuning?style=flat)](https://github.com/maxlchen/Speech-PreFinetuning) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12921-b31b1b.svg)](https://arxiv.org/abs/2302.12921) | +| 293 | Integrating Emotion Recognition with Speech Recognition and Speaker Diarization for Conversations | [![GitHub](https://img.shields.io/github/stars/W-Wu/sTEER?style=flat)](https://github.com/W-Wu/sTEER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23_interspeech.pdf) | +| 1075 | Utility-Preserving Privacy-Enabled Speech Embeddings for Emotion Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lavania23_interspeech.pdf) | +| 1923 | Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews | [![GitHub](https://img.shields.io/github/stars/idiap/Node_weighted_GCN_for_depression_detection?style=flat)](https://github.com/idiap/Node_weighted_GCN_for_depression_detection) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/burdisso23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00920-b31b1b.svg)](https://arxiv.org/abs/2307.00920) | +| 1914 | Laughter in Task-based Settings: Whom We Talk to Affects How, When, and How Often We Laugh | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/branco23_interspeech.pdf) | +| 653 | Exploring Downstream Transfer of Self-Supervised Features for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fang23b_interspeech.pdf) | +| 1758 | Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/deoliveira23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19184-b31b1b.svg)](https://arxiv.org/abs/2305.19184) | +| 756 | Two-Stage Finetuning of Wav2vec 2.0 for Speech Emotion Recognition with ASR and Gender Pretraining | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23d_interspeech.pdf) | +| 1311 | Investigating Acoustic Cues for Multilingual Abuse Detection | [![GitHub](https://img.shields.io/github/stars/Cross-Caps/ACMAD?style=flat)](https://github.com/Cross-Caps/ACMAD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/thakran23_interspeech.pdf) | +| 1600 | A Novel Frequency Warping Scale for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/singh23c_interspeech.pdf) | +| 1170 | Multi-Scale Temporal Transformer for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23m_interspeech.pdf) | +| 1169 | Distant Speech Emotion Recognition in an Indoor Human-Robot Interaction Scenario | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/grageda23_interspeech.pdf) | +| 2498 | A Study on Prosodic Entrainment in Relation to Therapist Empathy in Counseling Conversation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tao23b_interspeech.pdf) | +| 2375 | Improving Joint Speech and Emotion Recognition using Global Style Tokens | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kyung23_interspeech.pdf) | +| 1163 | Speech Emotion Recognition by Estimating Emotional Label Sequences with Phoneme Class Attribute | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nagase23_interspeech.pdf) | +| 274 | Unsupervised Transfer Components Learning for Cross-Domain Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jiang23_interspeech.pdf) | +| 1090 | Dual Memory Fusion for Multimodal Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/prisayad23_interspeech.pdf) | +| 311 | Hybrid Dataset for Speech Emotion Recognition in Russian Language | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kondratenko23_interspeech.pdf) | +| 396 | Speech Emotion Recognition using Decomposed Speech via Multi-Task Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hsu23_interspeech.pdf) |
@@ -843,43 +843,43 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 46 | FC-MTLF: A Fine- and Coarse-grained Multi-task Learning Framework for Cross-Lingual Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cheng23b_interspeech.pdf) | -| 93 | C2A-SLU: Cross and Contrastive Attention for Improving ASR Robustness in Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cheng23c_interspeech.pdf) | -| 2300 | Tri-level Joint Natural Language Understanding for Multi-turn Conversational Datasets | [![GitHub](https://img.shields.io/github/stars/adlnlp/Tri-NLU?style=flat)](https://github.com/adlnlp/Tri-NLU) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/weld23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17729-b31b1b.svg)](https://arxiv.org/abs/2305.17729) | -| 2234 | Semantic Enrichment Towards Efficient Speech Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/laperriere23_interspeech.pdf) | -| 1299 | Tensor Decomposition for Minimization of E2E SLU Model Toward On-Device Processing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kashiwagi23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01247-b31b1b.svg)](https://arxiv.org/abs/2306.01247) | -| 699 | DiffSLU: Knowledge Distillation based Diffusion Model for Cross-Lingual Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mao23_interspeech.pdf) | -| 1962 | Integrating Pretrained ASR and LM to perform Sequence Generation for Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/arora23_interspeech.pdf) | -| 644 | Contrastive Learning based ASR Robust Knowledge Selection for Spoken Dialogue System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhu23e_interspeech.pdf) | -| 1859 | Unsupervised Dialogue Topic Segmentation in Hyperdimensional Space | [![GitHub](https://img.shields.io/github/stars/seongminp/hyperseg?style=flat)](https://github.com/seongminp/hyperseg) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/park23f_interspeech.pdf) | -| 198 | An Investigation of the Combination of Rehearsal and Knowledge Distillation in Continual Learning for Spoken Language Understanding | [![GitHub](https://img.shields.io/github/stars/umbertocappellazzo/CL_SLU?style=flat)](https://github.com/umbertocappellazzo/CL_SLU) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cappellazzo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.08161-b31b1b.svg)](https://arxiv.org/abs/2211.08161) | -| 1740 | Enhancing New Intent Discovery via Robust Neighbor-based Contrastive Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23h_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://chenmengdx.github.io/papers/IS23-NID.pdf) | -| 211 | Personalized Predictive ASR for Latency Reduction in Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/schwarz23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13794-b31b1b.svg)](https://arxiv.org/abs/2305.13794) | -| 1419 | Compositional Generalization in Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ray23_interspeech.pdf) | -| 2314 | Sampling Bias in NLU Models: Impact and Mitigation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23ha_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/sampling-bias-in-nlu-models-impact-and-mitigation) | -| 1038 | 5IDER: Unified Query Rewriting for Steering, Intent Carryover, Disfluencies, Entity Carryover and Repair | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lu23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01855-b31b1b.svg)](https://arxiv.org/abs/2306.01855) | -| 1236 | Emotion Awareness in Multi-utterance Turn for Improving Emotion Prediction in Multi-Speaker Conversation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23e_interspeech.pdf) | -| 1505 | WhiSLU: End-to-End Spoken Language Understanding with Whisper | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ga_interspeech.pdf) | -| 1947 | Relationship between Auditory and Semantic Entrainment using Deep Neural Networks (DNN) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kejriwal23b_interspeech.pdf) | -| 1929 | Unsupervised Auditory and Semantic Entrainment Models with Deep Neural Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kejriwal23_interspeech.pdf) | -| 952 | Prosodic Features Improve Sentence Segmentation and Parsing in English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nielsen23_interspeech.pdf) | -| 320 | Estimation of Listening Response Timing by Generative Model and Parameter Control of Response Substantialness using Dynamic-Prompt-Tune | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/muromachi23_interspeech.pdf) | -| 1885 | Parameter Selection for Analyzing Conversations with Autism Spectrum Disorder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chowdhury23_interspeech.pdf) | -| 2341 | Efficient Multimodal Neural Networks for Trigger-Less Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/buddi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12063-b31b1b.svg)](https://arxiv.org/abs/2305.12063) | -| 2332 | Rapid Lexical Alignment to a Conversational Agent | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ostrand23_interspeech.pdf) | -| 578 | Multimodal Turn-Taking Model using Visual cues for End-of-Utterance Prediction in Spoken Dialogue Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kurata23_interspeech.pdf) | -| 1464 | Audio-Visual Praise Estimation for Conversational Video based on Synchronization-Guided Multimodal Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hojo23_interspeech.pdf) | -| 1618 | Improving the Response Timing Estimation for Spoken Dialogue Systems by Reducing the Effect of Speech Recognition Delay | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sakuma23_interspeech.pdf) | -| 555 | Focus-Attention-Enhanced Cross-Modal Transformer with Metric Learning for Multimodal Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23c_interspeech.pdf) | -| 1717 | A Multiple-Teacher Pruning based Self-Distillation (MT-PSD) Approach to Model Compression for Audio-Visual Wake Word Spotting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23la_interspeech.pdf) | -| 789 | Abusive Speech Detection in Indic Languages using Acoustic Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/spiesberger23_interspeech.pdf) | -| 1791 | Listening to Silences In Contact Center Conversations using Textual cues | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ingle23_interspeech.pdf) | -| 2475 | I Learned Error, I Can Fix It!: A Detector-Corrector Structure for ASR Error Calibration | [![GitHub](https://img.shields.io/github/stars/yeonheuiyeon/Detector_Corrector_SLU?style=flat)](https://github.com/yeonheuiyeon/Detector_Corrector_SLU) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yeen23_interspeech.pdf) | -| 1074 | Verbal and Nonverbal Feedback Signals in Response to Increasing Levels of Miscommunication | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/garnier23b_interspeech.pdf) | -| 76 | Speech-based Classification of Defensive Communication: A Novel Dataset and Results | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/amiriparian23_interspeech.pdf) | -| 1951 | Quantifying the Perceptual Value of Lexical and Non-Lexical Channels in Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sarenne.github.io/is-2023/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wallbridge23_interspeech.pdf) | -| 1267 | Relationships between Gender, Personality Traits and Features of Multi-Modal Data to Responses to Spoken Dialog Systems Breakdown | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tsubokura23_interspeech.pdf) | -| 1650 | Speaker-aware Cross-Modal Fusion Architecture for Conversational Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhao23e_interspeech.pdf) | +| 46 | FC-MTLF: A Fine- and Coarse-grained Multi-task Learning Framework for Cross-Lingual Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cheng23b_interspeech.pdf) | +| 93 | C2A-SLU: Cross and Contrastive Attention for Improving ASR Robustness in Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cheng23c_interspeech.pdf) | +| 2300 | Tri-level Joint Natural Language Understanding for Multi-turn Conversational Datasets | [![GitHub](https://img.shields.io/github/stars/adlnlp/Tri-NLU?style=flat)](https://github.com/adlnlp/Tri-NLU) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/weld23_interspeech.pdf)<br>
[![arXiv](https://img.shields.io/badge/arXiv-2305.17729-b31b1b.svg)](https://arxiv.org/abs/2305.17729) | +| 2234 | Semantic Enrichment Towards Efficient Speech Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/laperriere23_interspeech.pdf) | +| 1299 | Tensor Decomposition for Minimization of E2E SLU Model Toward On-Device Processing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kashiwagi23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01247-b31b1b.svg)](https://arxiv.org/abs/2306.01247) | +| 699 | DiffSLU: Knowledge Distillation based Diffusion Model for Cross-Lingual Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mao23_interspeech.pdf) | +| 1962 | Integrating Pretrained ASR and LM to perform Sequence Generation for Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/arora23_interspeech.pdf) | +| 644 | Contrastive Learning based ASR Robust Knowledge Selection for Spoken Dialogue System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhu23e_interspeech.pdf) | +| 1859 | Unsupervised Dialogue Topic Segmentation in Hyperdimensional Space | [![GitHub](https://img.shields.io/github/stars/seongminp/hyperseg?style=flat)](https://github.com/seongminp/hyperseg) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/park23f_interspeech.pdf) | +| 198 | An Investigation of the Combination of Rehearsal and Knowledge Distillation in Continual Learning for Spoken Language Understanding | [![GitHub](https://img.shields.io/github/stars/umbertocappellazzo/CL_SLU?style=flat)](https://github.com/umbertocappellazzo/CL_SLU) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cappellazzo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.08161-b31b1b.svg)](https://arxiv.org/abs/2211.08161) | +| 1740 | Enhancing New Intent Discovery via Robust Neighbor-based Contrastive Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23h_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://chenmengdx.github.io/papers/IS23-NID.pdf) | +| 211 | Personalized Predictive ASR for Latency Reduction in Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/schwarz23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13794-b31b1b.svg)](https://arxiv.org/abs/2305.13794) | +| 1419 | Compositional Generalization in Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ray23_interspeech.pdf) | +| 2314 | Sampling Bias in NLU Models: Impact and Mitigation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23ha_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/sampling-bias-in-nlu-models-impact-and-mitigation) | +| 1038 | 5IDER: Unified Query Rewriting for Steering, Intent Carryover, Disfluencies, Entity Carryover and Repair | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lu23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01855-b31b1b.svg)](https://arxiv.org/abs/2306.01855) | +| 1236 | Emotion Awareness in Multi-utterance Turn for Improving Emotion Prediction in Multi-Speaker Conversation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23e_interspeech.pdf) | +| 1505 | WhiSLU: End-to-End Spoken Language Understanding with Whisper | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ga_interspeech.pdf) | +| 1947 | Relationship between Auditory and Semantic Entrainment using Deep Neural Networks (DNN) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kejriwal23b_interspeech.pdf) | +| 1929 | Unsupervised Auditory and Semantic Entrainment Models with Deep Neural Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kejriwal23_interspeech.pdf) | +| 952 | Prosodic Features Improve Sentence Segmentation and Parsing in English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nielsen23_interspeech.pdf) | +| 320 | Estimation of Listening Response Timing by Generative Model and Parameter Control of Response Substantialness using Dynamic-Prompt-Tune | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/muromachi23_interspeech.pdf) | +| 1885 | Parameter Selection for Analyzing Conversations with Autism Spectrum Disorder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chowdhury23_interspeech.pdf) | +| 2341 | Efficient Multimodal Neural Networks for Trigger-Less Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/buddi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12063-b31b1b.svg)](https://arxiv.org/abs/2305.12063) | +| 2332 | Rapid Lexical Alignment to a Conversational Agent | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ostrand23_interspeech.pdf) | +| 578 | Multimodal Turn-Taking Model using Visual cues for End-of-Utterance Prediction in Spoken Dialogue Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kurata23_interspeech.pdf) | +| 1464 | Audio-Visual Praise Estimation for Conversational Video based on Synchronization-Guided Multimodal Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hojo23_interspeech.pdf) | +| 1618 | Improving the Response Timing Estimation for Spoken Dialogue Systems by Reducing the Effect of Speech Recognition Delay | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sakuma23_interspeech.pdf) | +| 555 | Focus-Attention-Enhanced Cross-Modal Transformer with Metric Learning for Multimodal Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23c_interspeech.pdf) | +| 1717 | A Multiple-Teacher Pruning based Self-Distillation (MT-PSD) Approach to Model Compression for Audio-Visual Wake Word Spotting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23la_interspeech.pdf) | +| 789 | Abusive Speech Detection in Indic Languages using Acoustic Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/spiesberger23_interspeech.pdf) | +| 1791 | Listening to Silences In Contact Center Conversations using Textual cues | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ingle23_interspeech.pdf) | +| 2475 | I Learned Error, I Can Fix It!: A Detector-Corrector Structure for ASR Error Calibration | [![GitHub](https://img.shields.io/github/stars/yeonheuiyeon/Detector_Corrector_SLU?style=flat)](https://github.com/yeonheuiyeon/Detector_Corrector_SLU) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yeen23_interspeech.pdf) | +| 1074 | Verbal and Nonverbal Feedback Signals in Response to Increasing Levels of Miscommunication | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/garnier23b_interspeech.pdf) | +| 76 | Speech-based Classification of Defensive Communication: A Novel Dataset and Results | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/amiriparian23_interspeech.pdf) | +| 1951 | Quantifying the Perceptual Value of Lexical and Non-Lexical Channels in Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sarenne.github.io/is-2023/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wallbridge23_interspeech.pdf) | +| 1267 | Relationships between Gender, Personality Traits and Features of Multi-Modal Data to Responses to Spoken Dialog Systems Breakdown | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tsubokura23_interspeech.pdf) | +| 1650 | Speaker-aware Cross-Modal Fusion Architecture for Conversational Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhao23e_interspeech.pdf) |
@@ -891,64 +891,64 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 936 | Biophysically-Inspired Single-Channel Speech Enhancement in the Time Domain | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wen23b_interspeech.pdf) | -| 1902 | On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jalal23_interspeech.pdf) | -| 1901 | How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shim23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00044-b31b1b.svg)](https://arxiv.org/abs/2306.00044) | -| 1287 | CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://cleanunet2.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kong23c_interspeech.pdf) | -| 521 | A Two-Stage Progressive Neural Network for Acoustic echo Cancellation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23e_interspeech.pdf)
[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/371040399_A_Two-stage_Progressive_Neural_Network_for_Acoustic_Echo_Cancellation) | -| 537 | An Intra-BRNN and GB-RVQ based End-to-End Neural Audio Codec | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23_interspeech.pdf) | -| 1066 | Real-Time Personalised Speech Enhancement Transformers with Dynamic Cross-Attended Speaker Representations | [![GitHub](https://img.shields.io/github/stars/shucongzhang/CrossAttnPse?style=flat)](https://github.com/shucongzhang/CrossAttnPse) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23r_interspeech.pdf) | -| 280 | CFTNet: Complex-Valued Frequency Transformation Network for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mamun23_interspeech.pdf) | -| 623 | Feature Normalization for Fine-Tuning Self-Supervised Models in Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08406-b31b1b.svg)](https://arxiv.org/abs/2306.08406) | -| 1490 | Multi-Mode Neural Speech Coding based on Deep Generative Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xiao23c_interspeech.pdf) | -| 751 | Streaming Dual-Path Transformer for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bae23_interspeech.pdf) | -| 1848 | Sequence-to-Sequence Multi-Modal Speech In-Painting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kadkhodaeielyaderani23_interspeech.pdf) | -| 984 | Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23q_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.02583-b31b1b.svg)](https://arxiv.org/abs/2305.02583) | -| 551 | Differentially Private Adapters for Parameter Efficient Acoustic Modeling | [![GitHub](https://img.shields.io/github/stars/Chun-wei-Ho/Private-Speech-Adapter?style=flat)](https://github.com/Chun-wei-Ho/Private-Speech-Adapter) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ho23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11360-b31b1b.svg)](https://arxiv.org/abs/2305.11360) | -| 780 | Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://zhengrachel.github.io/UTIforAVSE-demo/)
[![GitHub](https://img.shields.io/github/stars/ZhengRachel/UTIforAVSE-demo?style=flat)](https://github.com/ZhengRachel/UTIforAVSE-demo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zheng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14933-b31b1b.svg)](https://arxiv.org/abs/2305.14933) | -| 2568 | Consonant-Emphasis Method Incorporating Robust Consonant-Section Detection to Improve Intelligibility of Bone-Conducted Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/uezu23_interspeech.pdf) | -| 1578 | Downstream Task-Agnostic Speech Enhancement with Self-Supervised Representation Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sato23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14723-b31b1b.svg)](https://arxiv.org/abs/2305.14723) | -| 2305 | Perceptual Improvement of Deep Neural Network (DNN) Speech Coder using Parametric and Nonparametric Density Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/byun23_interspeech.pdf) | -| 2437 | DeFT-AN RT: Real-Time Multichannel Speech Enhancement using Dense Frequency-Time Attentive Network and Non-overlapping Synthesis Window | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23j_interspeech.pdf) | -| 1376 | PCNN: A Lightweight Parallel Conformer Neural Network for Efficient Monaural Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23f_interspeech.pdf) | -| 1364 | Exploring the Interactions between Target Positive and Negative Information for Acoustic Echo Cancellation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/han23b_interspeech.pdf) | -| 365 | Iterative Autoregression: A Novel Trick to Improve your Low-Latency Speech Enhancement Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/andreev23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.01751-b31b1b.svg)](https://arxiv.org/abs/2211.01751) | -| 1084 | A Multi-Dimensional Deep Structured State Space Approach to Speech Enhancement using Small-Footprint Models | [![GitHub](https://img.shields.io/github/stars/Kuray107/S4ND-U-Net_speech_enhancement?style=flat)](https://github.com/Kuray107/S4ND-U-Net_speech_enhancement) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ku23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00331-b31b1b.svg)](https://arxiv.org/abs/2306.00331) | -| 705 | Domain Adaptation for Speech Enhancement in a Large Domain Gap | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/frenkel23_interspeech.pdf) | -| 456 | SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zadorozhnyy23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.14474-b31b1b.svg)](https://arxiv.org/abs/2210.14474) | -| 339 | A Mask Free Neural Network for Monaural Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/ioyy900205/MFNet?style=flat)](https://github.com/ioyy900205/MFNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04286-b31b1b.svg)](https://arxiv.org/abs/2306.04286) | -| 1548 | A Training and Inference Strategy using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech | [![GitHub](https://img.shields.io/github/stars/Sinica-SLAM/Ny-EnhTT?style=flat)](https://github.com/Sinica-SLAM/Ny-EnhTT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15368-b31b1b.svg)](https://arxiv.org/abs/2210.15368) | -| 2418 | A Simple RNN Model for Lightweight, Low-Compute and Low-Latency Multichannel Speech Enhancement in the Time Domain | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pandey23b_interspeech.pdf) | -| 1433 | High Fidelity Speech Enhancement with Band-Split RNN | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.00406-b31b1b.svg)](https://arxiv.org/abs/2212.00406) | -| 218 | Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://thuhcsi.github.io/interspeech2023-NS-Extractor/)
[![GitHub](https://img.shields.io/github/stars/thuhcsi/interspeech2023-NS-Extractor?style=flat)](https://github.com/thuhcsi/interspeech2023-NS-Extractor) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16241-b31b1b.svg)](https://arxiv.org/abs/2306.16241) | -| 882 | DFSNet: A Steerable Neural Beamformer Invariant to Microphone Array Configuration for Real-Time, Low-Latency Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kovalyov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.13407-b31b1b.svg)](https://arxiv.org/abs/2302.13407) | -| 1323 | Speaker-Aware Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.01126-b31b1b.svg)](https://arxiv.org/abs/2303.01126) | -| 1116 | Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/araki23_interspeech.pdf) | -| 799 | EffCRN: An Efficient Convolutional Recurrent Network for High-Performance Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sach23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02778-b31b1b.svg)](https://arxiv.org/abs/2306.02778) | -| 1795 | HAD-ANC: A Hybrid System Comprising an Adaptive Filter and Deep Neural Networks for Active Noise Control | [![GitHub](https://img.shields.io/github/stars/wndvlf96/HAD-ANC?style=flat)](https://github.com/wndvlf96/HAD-ANC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/park23e_interspeech.pdf) | -| 886 | MSAF: A Multiple Self-Attention Field Method for Speech Enhancement | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://mmf-sasegan.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chu23_interspeech.pdf) | -| 2302 | Ultra Dual-Path Compression for Joint echo Cancellation and Noise Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23t_interspeech.pdf) | -| 971 | ABC-KD: Attention-based-Compression Knowledge Distillation for Deep Learning-based Noise Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16665-b31b1b.svg)](https://arxiv.org/abs/2305.16665) | -| 1532 | PLCMOS – a Data-Driven Non-Intrusive Metric for the Evaluation of Packet Loss Concealment Algorithms | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/microsoft/PLC-Challenge/tree/main/PLCMOS)
[![PyPI](https://img.shields.io/pypi/v/speechmos)](https://pypi.org/project/speechmos/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/diener23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15127-b31b1b.svg)](https://arxiv.org/abs/2305.15127) | -| 1910 | Multi-Dataset Co-training with Sharpness-aware Optimization for Audio Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shim23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19953-b31b1b.svg)](https://arxiv.org/abs/2305.19953) | -| 1445 | Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/sp-uhh/sgmse-bbed?style=flat)](https://github.com/sp-uhh/sgmse-bbed) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lay23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14748-b31b1b.svg)](https://arxiv.org/abs/2302.14748) | -| 901 | Complex-valued Neural Networks for Voice Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/muller23_interspeech.pdf) | -| 1028 | DeepVQE: Real Time Deep Voice Quality Enhancement for Joint Acoustic echo Cancellation, Noise Suppression and Dereverberation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ristea.github.io/deep-vqe/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ristea23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03177-b31b1b.svg)](https://arxiv.org/abs/2306.03177) | -| 1547 | Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/sony/diffiner?style=flat)](https://github.com/sony/diffiner) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sawata23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17287-b31b1b.svg)](https://arxiv.org/abs/2210.17287) | -| 1642 | HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01411-b31b1b.svg)](https://arxiv.org/abs/2306.01411) | -| 1441 | MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yxlu-0102.github.io/mpsenet-demo/)
[![GitHub](https://img.shields.io/github/stars/yxlu-0102/MP-SENet?style=flat)](https://github.com/yxlu-0102/MP-SENet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lu23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13686-b31b1b.svg)](https://arxiv.org/abs/2305.13686) | -| 565 | TRIDENTSE: Guiding Speech Enhancement with 32 Global Tokens | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yin23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.12995-b31b1b.svg)](https://arxiv.org/abs/2210.12995) | -| 1254 | Detection of Cross-Dataset Fake Audio based on Prosodic and Pronunciation Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23x_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13700-b31b1b.svg)](https://arxiv.org/abs/2305.13700) | -| 1890 | Self-Supervised Learning with Diffusion based Multichannel Speech Enhancement for Speaker Verification under Noisy Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/dowerah23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.02244-b31b1b.svg)](https://arxiv.org/abs/2307.02244) | -| 1341 | Two-Stage Voice Anonymization for Enhanced Privacy | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nespoli23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16069-b31b1b.svg)](https://arxiv.org/abs/2306.16069) | -| 2055 | Personalized Dereverberation of Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://dereverb.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23h_interspeech.pdf) | -| 580 | Weighted Von Mises Distribution-based Loss Function for Real-Time STFT Phase Reconstruction using DNN | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/binhthien23_interspeech.pdf) | -| 272 | Deep Multi-Frame Filtering for Hearing Aids | [![GitHub](https://img.shields.io/github/stars/rikorose/deepfilternet?style=flat)](https://github.com/rikorose/deepfilternet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/schroter23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.08225-b31b1b.svg)](https://arxiv.org/abs/2305.08225) | -| 1232 | Aligning Speech Enhancement for Improving Downstream Classification Performance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xiong23_interspeech.pdf) | -| 420 | DNN-based Parameter Estimation for MVDR Beamforming and Post-Filtering | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23b_interspeech.pdf) | -| 675 | FRA-RIR: Fast Random Approximation of the Image-Source | [![GitHub](https://img.shields.io/github/stars/tencent-ailab/FRA-RIR?style=flat)](https://github.com/tencent-ailab/FRA-RIR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/luo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2208.04101-b31b1b.svg)](https://arxiv.org/abs/2208.04101) | -| 686 | Rethinking Complex-Valued Deep Neural Networks for Monaural Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.04320-b31b1b.svg)](https://arxiv.org/abs/2301.04320) | -| 186 | Harmonic Enhancement using Learnable Comb Filter for Light-Weight Full-band Speech Enhancement Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/le23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00812-b31b1b.svg)](https://arxiv.org/abs/2306.00812) | +| 936 | Biophysically-Inspired Single-Channel Speech Enhancement in the Time Domain | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wen23b_interspeech.pdf) | +| 1902 | On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jalal23_interspeech.pdf) | +| 1901 | How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shim23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00044-b31b1b.svg)](https://arxiv.org/abs/2306.00044) | +| 1287 | CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://cleanunet2.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kong23c_interspeech.pdf) | +| 521 | A Two-Stage Progressive Neural Network for Acoustic Echo Cancellation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23e_interspeech.pdf)
[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/371040399_A_Two-stage_Progressive_Neural_Network_for_Acoustic_Echo_Cancellation) | +| 537 | An Intra-BRNN and GB-RVQ based End-to-End Neural Audio Codec | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23_interspeech.pdf) | +| 1066 | Real-Time Personalised Speech Enhancement Transformers with Dynamic Cross-Attended Speaker Representations | [![GitHub](https://img.shields.io/github/stars/shucongzhang/CrossAttnPse?style=flat)](https://github.com/shucongzhang/CrossAttnPse) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23r_interspeech.pdf) | +| 280 | CFTNet: Complex-Valued Frequency Transformation Network for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mamun23_interspeech.pdf) | +| 623 | Feature Normalization for Fine-Tuning Self-Supervised Models in Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08406-b31b1b.svg)](https://arxiv.org/abs/2306.08406) | +| 1490 | Multi-Mode Neural Speech Coding based on Deep Generative Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xiao23c_interspeech.pdf) | +| 751 | Streaming Dual-Path Transformer for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bae23_interspeech.pdf) | +| 1848 | Sequence-to-Sequence Multi-Modal Speech In-Painting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kadkhodaeielyaderani23_interspeech.pdf) | +| 984 | Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23q_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.02583-b31b1b.svg)](https://arxiv.org/abs/2305.02583) | +| 551 | Differentially Private Adapters for Parameter Efficient Acoustic Modeling | [![GitHub](https://img.shields.io/github/stars/Chun-wei-Ho/Private-Speech-Adapter?style=flat)](https://github.com/Chun-wei-Ho/Private-Speech-Adapter) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ho23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11360-b31b1b.svg)](https://arxiv.org/abs/2305.11360) | +| 780 | Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://zhengrachel.github.io/UTIforAVSE-demo/)
[![GitHub](https://img.shields.io/github/stars/ZhengRachel/UTIforAVSE-demo?style=flat)](https://github.com/ZhengRachel/UTIforAVSE-demo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zheng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14933-b31b1b.svg)](https://arxiv.org/abs/2305.14933) | +| 2568 | Consonant-Emphasis Method Incorporating Robust Consonant-Section Detection to Improve Intelligibility of Bone-Conducted Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/uezu23_interspeech.pdf) | +| 1578 | Downstream Task-Agnostic Speech Enhancement with Self-Supervised Representation Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sato23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14723-b31b1b.svg)](https://arxiv.org/abs/2305.14723) | +| 2305 | Perceptual Improvement of Deep Neural Network (DNN) Speech Coder using Parametric and Nonparametric Density Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/byun23_interspeech.pdf) | +| 2437 | DeFT-AN RT: Real-Time Multichannel Speech Enhancement using Dense Frequency-Time Attentive Network and Non-overlapping Synthesis Window | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23j_interspeech.pdf) | +| 1376 | PCNN: A Lightweight Parallel Conformer Neural Network for Efficient Monaural Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23f_interspeech.pdf) | +| 1364 | Exploring the Interactions between Target Positive and Negative Information for Acoustic Echo Cancellation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/han23b_interspeech.pdf) | +| 365 | Iterative Autoregression: A Novel Trick to Improve your Low-Latency Speech Enhancement Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/andreev23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.01751-b31b1b.svg)](https://arxiv.org/abs/2211.01751) | +| 1084 | A Multi-Dimensional Deep Structured State Space Approach to Speech Enhancement using Small-Footprint Models | [![GitHub](https://img.shields.io/github/stars/Kuray107/S4ND-U-Net_speech_enhancement?style=flat)](https://github.com/Kuray107/S4ND-U-Net_speech_enhancement) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ku23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00331-b31b1b.svg)](https://arxiv.org/abs/2306.00331) | +| 705 | Domain Adaptation for Speech Enhancement in a Large Domain Gap | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/frenkel23_interspeech.pdf) | +| 456 | SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zadorozhnyy23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.14474-b31b1b.svg)](https://arxiv.org/abs/2210.14474) | +| 339 | A Mask Free Neural Network for Monaural Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/ioyy900205/MFNet?style=flat)](https://github.com/ioyy900205/MFNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04286-b31b1b.svg)](https://arxiv.org/abs/2306.04286) | +| 1548 | A Training and Inference Strategy using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech | [![GitHub](https://img.shields.io/github/stars/Sinica-SLAM/Ny-EnhTT?style=flat)](https://github.com/Sinica-SLAM/Ny-EnhTT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15368-b31b1b.svg)](https://arxiv.org/abs/2210.15368) | +| 2418 | A Simple RNN Model for Lightweight, Low-Compute and Low-Latency Multichannel Speech Enhancement in the Time Domain | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pandey23b_interspeech.pdf) | +| 1433 | High Fidelity Speech Enhancement with Band-Split RNN | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.00406-b31b1b.svg)](https://arxiv.org/abs/2212.00406) | +| 218 | Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://thuhcsi.github.io/interspeech2023-NS-Extractor/)
[![GitHub](https://img.shields.io/github/stars/thuhcsi/interspeech2023-NS-Extractor?style=flat)](https://github.com/thuhcsi/interspeech2023-NS-Extractor) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16241-b31b1b.svg)](https://arxiv.org/abs/2306.16241) | +| 882 | DFSNet: A Steerable Neural Beamformer Invariant to Microphone Array Configuration for Real-Time, Low-Latency Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kovalyov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.13407-b31b1b.svg)](https://arxiv.org/abs/2302.13407) | +| 1323 | Speaker-Aware Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.01126-b31b1b.svg)](https://arxiv.org/abs/2303.01126) | +| 1116 | Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/araki23_interspeech.pdf) | +| 799 | EffCRN: An Efficient Convolutional Recurrent Network for High-Performance Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sach23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02778-b31b1b.svg)](https://arxiv.org/abs/2306.02778) | +| 1795 | HAD-ANC: A Hybrid System Comprising an Adaptive Filter and Deep Neural Networks for Active Noise Control | [![GitHub](https://img.shields.io/github/stars/wndvlf96/HAD-ANC?style=flat)](https://github.com/wndvlf96/HAD-ANC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/park23e_interspeech.pdf) | +| 886 | MSAF: A Multiple Self-Attention Field Method for Speech Enhancement | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://mmf-sasegan.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chu23_interspeech.pdf) | +| 2302 | Ultra Dual-Path Compression for Joint Echo Cancellation and Noise Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23t_interspeech.pdf) | +| 971 | ABC-KD: Attention-based-Compression Knowledge Distillation for Deep Learning-based Noise Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16665-b31b1b.svg)](https://arxiv.org/abs/2305.16665) | +| 1532 | PLCMOS – a Data-Driven Non-Intrusive Metric for the Evaluation of Packet Loss Concealment Algorithms | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/microsoft/PLC-Challenge/tree/main/PLCMOS)
[![PyPI](https://img.shields.io/pypi/v/speechmos)](https://pypi.org/project/speechmos/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/diener23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15127-b31b1b.svg)](https://arxiv.org/abs/2305.15127) | +| 1910 | Multi-Dataset Co-training with Sharpness-aware Optimization for Audio Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shim23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19953-b31b1b.svg)](https://arxiv.org/abs/2305.19953) | +| 1445 | Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/sp-uhh/sgmse-bbed?style=flat)](https://github.com/sp-uhh/sgmse-bbed) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lay23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14748-b31b1b.svg)](https://arxiv.org/abs/2302.14748) | +| 901 | Complex-valued Neural Networks for Voice Anti-Spoofing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/muller23_interspeech.pdf) | +| 1028 | DeepVQE: Real Time Deep Voice Quality Enhancement for Joint Acoustic Echo Cancellation, Noise Suppression and Dereverberation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ristea.github.io/deep-vqe/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ristea23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03177-b31b1b.svg)](https://arxiv.org/abs/2306.03177) | +| 1547 | Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/sony/diffiner?style=flat)](https://github.com/sony/diffiner) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sawata23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17287-b31b1b.svg)](https://arxiv.org/abs/2210.17287) | +| 1642 | HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01411-b31b1b.svg)](https://arxiv.org/abs/2306.01411) | +| 1441 | MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yxlu-0102.github.io/mpsenet-demo/)
[![GitHub](https://img.shields.io/github/stars/yxlu-0102/MP-SENet?style=flat)](https://github.com/yxlu-0102/MP-SENet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lu23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13686-b31b1b.svg)](https://arxiv.org/abs/2305.13686) | +| 565 | TRIDENTSE: Guiding Speech Enhancement with 32 Global Tokens | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yin23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.12995-b31b1b.svg)](https://arxiv.org/abs/2210.12995) | +| 1254 | Detection of Cross-Dataset Fake Audio based on Prosodic and Pronunciation Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23x_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13700-b31b1b.svg)](https://arxiv.org/abs/2305.13700) | +| 1890 | Self-Supervised Learning with Diffusion based Multichannel Speech Enhancement for Speaker Verification under Noisy Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/dowerah23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.02244-b31b1b.svg)](https://arxiv.org/abs/2307.02244) | +| 1341 | Two-Stage Voice Anonymization for Enhanced Privacy | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nespoli23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16069-b31b1b.svg)](https://arxiv.org/abs/2306.16069) | +| 2055 | Personalized Dereverberation of Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://dereverb.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23h_interspeech.pdf) | +| 580 | Weighted Von Mises Distribution-based Loss Function for Real-Time STFT Phase Reconstruction using DNN | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/binhthien23_interspeech.pdf) | +| 272 | Deep Multi-Frame Filtering for Hearing Aids | [![GitHub](https://img.shields.io/github/stars/rikorose/deepfilternet?style=flat)](https://github.com/rikorose/deepfilternet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/schroter23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.08225-b31b1b.svg)](https://arxiv.org/abs/2305.08225) | +| 1232 | Aligning Speech Enhancement for Improving Downstream Classification Performance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xiong23_interspeech.pdf) | +| 420 | DNN-based Parameter Estimation for MVDR Beamforming and Post-Filtering | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23b_interspeech.pdf) | +| 675 | FRA-RIR: Fast Random Approximation of the Image-Source | [![GitHub](https://img.shields.io/github/stars/tencent-ailab/FRA-RIR?style=flat)](https://github.com/tencent-ailab/FRA-RIR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/luo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2208.04101-b31b1b.svg)](https://arxiv.org/abs/2208.04101) | +| 686 | Rethinking Complex-Valued Deep Neural Networks for Monaural Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.04320-b31b1b.svg)](https://arxiv.org/abs/2301.04320) | +| 186 | Harmonic Enhancement using Learnable Comb Filter for Light-Weight Full-band Speech Enhancement Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/le23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00812-b31b1b.svg)](https://arxiv.org/abs/2306.00812) |
@@ -960,28 +960,28 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1023 | Detection of Emotional Hotspots in Meetings using a Cross-Corpus Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/stemmer23_interspeech.pdf) | -| 1412 | Detection of Laughter and Screaming using the Attention and CTC Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/matsuda23_interspeech.pdf) | -| 1852 | Capturing Formality in Speech Across Domains and Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bhattacharya23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://www.cs.columbia.edu/speech/PaperFiles/2023/interspeech23_formality.pdf) | -| 460 | Towards Robust Family-Infant Audio Analysis based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/jialuli3/speechbrain/tree/infant-voc-classification/recipes/wav2vec_LittleBeats)
[![Hugging Face](https://img.shields.io/badge/🤗-lijialudew-FFD21F.svg)](https://huggingface.co/lijialudew/wav2vec_LittleBeats_LENA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12530-b31b1b.svg)](https://arxiv.org/abs/2305.12530) | -| 778 | Cues to Next-Speaker Projection in Conversational Swedish: Evidence from Reaction Times | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/feindt23_interspeech.pdf)
[![psyArXiv](https://img.shields.io/badge/psyArXiv-Preprints-226B79.svg)](https://psyarxiv.com/qasge/) | -| 1200 | Multiple Instance Learning for Inference of Child Attachment from Paralinguistic Aspects of Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/buker23_interspeech.pdf) | -| 2070 | Speaker Embeddings as Individuality Proxy for Voice Stress Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05915-b31b1b.svg)](https://arxiv.org/abs/2306.05915) | -| 2213 | From Interval to Ordinal: A HMM based Approach for Emotion Label Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23j_interspeech.pdf) | -| 661 | Turbo your Multi-Modal Classification with Contrastive Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23l_interspeech.pdf) | -| 497 | Towards Paralinguistic-Only Speech Representations for End-to-End Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ioannides23_interspeech.pdf) | -| 1360 | SOT: Self-Supervised Learning-Assisted Optimal Transport for Unsupervised Adaptive Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23y_interspeech.pdf) | -| 2464 | On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bansal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12540-b31b1b.svg)](https://arxiv.org/abs/2305.12540) | -| 830 | Speaking State Decoder with Transition Detection for Next Speaker Prediction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lu23b_interspeech.pdf) | -| 1507 | What are Differences? Comparing DNN and Human by their Performance and Characteristics in Speaker Age Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kitagishi23_interspeech.pdf) | -| 846 | Effects of Perceived Gender on the Perceived Social Function of Laughter | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/arts23_interspeech.pdf) | -| 1999 | Implicit Phonetic Information Modeling for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/purohit23_interspeech.pdf)
[![idiap](https://img.shields.io/badge/idiap.ch.5062-FF6A00.svg)](https://publications.idiap.ch/publications/show/5062) | -| 1034 | Computation and Memory Efficient Noise Adaptation of Wav2Vec2.0 for Noisy Speech Emotion Recognition with Skip Connection Adapters | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/leem23_interspeech.pdf) | -| 300 | Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23b_interspeech.pdf) | -| 1108 | Preference Learning Labels by Anchoring on Consecutive Annotations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/naini23_interspeech.pdf) | -| 2561 | Transforming the Embeddings: A Lightweight Technique for Speech Emotion Recognition Tasks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chetiaphukan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18640-b31b1b.svg)](https://arxiv.org/abs/2305.18640) | -| 543 | Learning Local to Global Feature Aggregation for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01491-b31b1b.svg)](https://arxiv.org/abs/2306.01491) | -| 842 | Supervised Contrastive Learning with Nearest Neighbor Search for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23q_interspeech.pdf) | +| 1023 | Detection of Emotional Hotspots in Meetings using a Cross-Corpus Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/stemmer23_interspeech.pdf) | +| 1412 | Detection of Laughter and Screaming using the Attention and CTC Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/matsuda23_interspeech.pdf) | +| 1852 | Capturing Formality in Speech Across Domains and Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bhattacharya23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://www.cs.columbia.edu/speech/PaperFiles/2023/interspeech23_formality.pdf) | +| 460 | Towards Robust Family-Infant Audio Analysis based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/jialuli3/speechbrain/tree/infant-voc-classification/recipes/wav2vec_LittleBeats)
[![Hugging Face](https://img.shields.io/badge/🤗-lijialudew-FFD21F.svg)](https://huggingface.co/lijialudew/wav2vec_LittleBeats_LENA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12530-b31b1b.svg)](https://arxiv.org/abs/2305.12530) | +| 778 | Cues to Next-Speaker Projection in Conversational Swedish: Evidence from Reaction Times | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/feindt23_interspeech.pdf)
[![psyArXiv](https://img.shields.io/badge/psyArXiv-Preprints-226B79.svg)](https://psyarxiv.com/qasge/) | +| 1200 | Multiple Instance Learning for Inference of Child Attachment from Paralinguistic Aspects of Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/buker23_interspeech.pdf) | +| 2070 | Speaker Embeddings as Individuality Proxy for Voice Stress Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05915-b31b1b.svg)](https://arxiv.org/abs/2306.05915) | +| 2213 | From Interval to Ordinal: A HMM based Approach for Emotion Label Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23j_interspeech.pdf) | +| 661 | Turbo your Multi-Modal Classification with Contrastive Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23l_interspeech.pdf) | +| 497 | Towards Paralinguistic-Only Speech Representations for End-to-End Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ioannides23_interspeech.pdf) | +| 1360 | SOT: Self-Supervised Learning-Assisted Optimal Transport for Unsupervised Adaptive Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23y_interspeech.pdf) | +| 2464 | On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bansal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12540-b31b1b.svg)](https://arxiv.org/abs/2305.12540) | +| 830 | Speaking State Decoder with Transition Detection for Next Speaker Prediction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lu23b_interspeech.pdf) | +| 1507 | What are Differences? Comparing DNN and Human by their Performance and Characteristics in Speaker Age Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kitagishi23_interspeech.pdf) | +| 846 | Effects of Perceived Gender on the Perceived Social Function of Laughter | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/arts23_interspeech.pdf) | +| 1999 | Implicit Phonetic Information Modeling for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/purohit23_interspeech.pdf)
[![idiap](https://img.shields.io/badge/idiap.ch.5062-FF6A00.svg)](https://publications.idiap.ch/publications/show/5062) | +| 1034 | Computation and Memory Efficient Noise Adaptation of Wav2Vec2.0 for Noisy Speech Emotion Recognition with Skip Connection Adapters | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/leem23_interspeech.pdf) | +| 300 | Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23b_interspeech.pdf) | +| 1108 | Preference Learning Labels by Anchoring on Consecutive Annotations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/naini23_interspeech.pdf) | +| 2561 | Transforming the Embeddings: A Lightweight Technique for Speech Emotion Recognition Tasks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chetiaphukan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18640-b31b1b.svg)](https://arxiv.org/abs/2305.18640) | +| 543 | Learning Local to Global Feature Aggregation for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01491-b31b1b.svg)](https://arxiv.org/abs/2306.01491) | +| 842 | Supervised Contrastive Learning with Nearest Neighbor Search for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23q_interspeech.pdf) |
@@ -993,12 +993,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1088 | Real-Time Joint Personalized Speech Enhancement and Acoustic echo Cancellation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/eskimez23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02773-b31b1b.svg)](https://arxiv.org/abs/2211.02773) | -| 514 | TaylorBeamixer: Learning Taylor-Inspired All-Neural Multi-Channel Speech Enhancement from Beam-Space Dictionary Perspective | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://andong-li-speech.github.io/TaylorBM-Demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.12024-b31b1b.svg)](https://arxiv.org/abs/2211.12024) | -| 865 | MFT-CRN:Multi-Scale Fourier Transform for Monaural Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23s_interspeech.pdf) | -| 1265 | Variance-Preserving-based Interpolation Diffusion Models for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/guo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08527-b31b1b.svg)](https://arxiv.org/abs/2306.08527) | -| 318 | Multi-Input Multi-Output Complex Spectral Mapping for Speaker Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/taherian23_interspeech.pdf) | -| 992 | Short-Term Extrapolation of Speech Signals using Recursive Neural Networks in the STFT Domain | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/oberhag23_interspeech.pdf) | +| 1088 | Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/eskimez23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02773-b31b1b.svg)](https://arxiv.org/abs/2211.02773) | +| 514 | TaylorBeamixer: Learning Taylor-Inspired All-Neural Multi-Channel Speech Enhancement from Beam-Space Dictionary Perspective | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://andong-li-speech.github.io/TaylorBM-Demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.12024-b31b1b.svg)](https://arxiv.org/abs/2211.12024) | +| 865 | MFT-CRN: Multi-Scale Fourier Transform for Monaural Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23s_interspeech.pdf) | +| 1265 | Variance-Preserving-based Interpolation Diffusion Models for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/guo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08527-b31b1b.svg)](https://arxiv.org/abs/2306.08527) | +| 318 | Multi-Input Multi-Output Complex Spectral Mapping for Speaker Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/taherian23_interspeech.pdf) | +| 992 | Short-Term Extrapolation of Speech Signals using Recursive Neural Networks in the STFT Domain | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/oberhag23_interspeech.pdf) |
@@ -1010,12 +1010,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1843 | Listener Sensitivity to Deviating Obstruents in WaveNet | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pandey23_interspeech.pdf) | -| 981 | How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/park23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00697-b31b1b.svg)](https://arxiv.org/abs/2306.00697) | -| 2014 | MOS vs. AB: Evaluating Text-to-Speech Systems Reliably using Clustered Standard Errors | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/camp23_interspeech.pdf) | -| 851 | RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23r_interspeech.pdf) | -| 2013 | Can Better Perception Become a Disadvantage? Synthetic Speech Perception in Congenitally Blind Users | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/melnikleroy23_interspeech.pdf) | -| 1076 | Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cooper23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10608-b31b1b.svg)](https://arxiv.org/abs/2305.10608) | +| 1843 | Listener Sensitivity to Deviating Obstruents in WaveNet | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pandey23_interspeech.pdf) | +| 981 | How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/park23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00697-b31b1b.svg)](https://arxiv.org/abs/2306.00697) | +| 2014 | MOS vs. AB: Evaluating Text-to-Speech Systems Reliably using Clustered Standard Errors | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/camp23_interspeech.pdf) | +| 851 | RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23r_interspeech.pdf) | +| 2013 | Can Better Perception Become a Disadvantage? Synthetic Speech Perception in Congenitally Blind Users | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/melnikleroy23_interspeech.pdf) | +| 1076 | Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cooper23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10608-b31b1b.svg)](https://arxiv.org/abs/2305.10608) |
@@ -1027,12 +1027,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1799 | Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://mutiann.github.io/papers/ChatGPT_SLU/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/he23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13512-b31b1b.svg)](https://arxiv.org/abs/2305.13512) | -| 1760 | Improving End-to-End SLU performance with Prosodic Attention and Distillation | [![GitHub](https://img.shields.io/github/stars/skit-ai/slu-prosody?style=flat)](https://github.com/skit-ai/slu-prosody) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rajaa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.08067-b31b1b.svg)](https://arxiv.org/abs/2305.08067) | -| 2575 | Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23n_interspeech.pdf) | -|758 | Cross-Modal Semantic Alignment before Fusion for Two-Pass End-to-End Spoken Language | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23c_interspeech.pdf) | -| 2018 | ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sunder23_interspeech.pdf) | -| 41 | GhostT5: Generate More Features with Cheap Operations to Improve Textless Spoken Question Answering | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cheng23_interspeech.pdf) | +| 1799 | Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://mutiann.github.io/papers/ChatGPT_SLU/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/he23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13512-b31b1b.svg)](https://arxiv.org/abs/2305.13512) | +| 1760 | Improving End-to-End SLU performance with Prosodic Attention and Distillation | [![GitHub](https://img.shields.io/github/stars/skit-ai/slu-prosody?style=flat)](https://github.com/skit-ai/slu-prosody) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rajaa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.08067-b31b1b.svg)](https://arxiv.org/abs/2305.08067) | +| 2575 | Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23n_interspeech.pdf) | +| 758 | Cross-Modal Semantic Alignment before Fusion for Two-Pass End-to-End Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23c_interspeech.pdf) | +| 2018 | ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sunder23_interspeech.pdf) | +| 41 | GhostT5: Generate More Features with Cheap Operations to Improve Textless Spoken Question Answering | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cheng23_interspeech.pdf) |
@@ -1044,16 +1044,16 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 278 | Obstructive Sleep Apnea Detection using Pretrained Speech Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23b_interspeech.pdf) | -| 620 | EEG-based Auditory Attention Detection with Spatiotemporal Graph and Graph Convolutional Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23f_interspeech.pdf) | -| 1966 | Silent Speech Recognition with Articulator Positions Estimated from Tongue Ultrasound and Lip Video | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/beeson23_interspeech.pdf) | -| 1377 | Auditory Attention Detection in Real-Life Scenarios using Common Spatial Patterns from EEG | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23r_interspeech.pdf) | -| 1381 | Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG | [![GitHub](https://img.shields.io/github/stars/yorgoon/DiffE?style=flat)](https://github.com/yorgoon/DiffE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23g_interspeech.pdf) | -| 40 | Towards Ultrasound Tongue Image Prediction from EEG During Speech Production | [![GitHub](https://img.shields.io/github/stars/BME-SmartLab/EEG-to-UTI?style=flat)](https://github.com/BME-SmartLab/EEG-to-UTI) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/csapo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05374-b31b1b.svg)](https://arxiv.org/abs/2306.05374) | -| 1607 | Adaptation of Tongue Ultrasound-based Silent Speech Interfaces using Spatial Transformer Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/toth23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19130-b31b1b.svg)](https://arxiv.org/abs/2305.19130) | -| 174 | STE-GAN: Speech-to-Electromyography Signal Conversion using Generative Adversarial Networks | [![GitHub](https://img.shields.io/github/stars/scheck-k/ste-gan?style=flat)](https://github.com/scheck-k/ste-gan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/scheck23_interspeech.pdf) | -| 1881 | Spanish Phone Confusion Analysis for EMG-based Silent Speech Interfaces | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/salomons23_interspeech.pdf) | -| 805 | Hybrid Silent Speech Interface through Fusion of Electroencephalography and Electromyography | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://stone-wave.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23l_interspeech.pdf) | +| 278 | Obstructive Sleep Apnea Detection using Pretrained Speech Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23b_interspeech.pdf) | +| 620 | EEG-based Auditory Attention Detection with Spatiotemporal Graph and Graph Convolutional Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23f_interspeech.pdf) | +| 1966 | Silent Speech Recognition with Articulator Positions Estimated from Tongue Ultrasound and Lip Video | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/beeson23_interspeech.pdf) | +| 1377 | Auditory Attention Detection in Real-Life Scenarios using Common Spatial Patterns from EEG | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23r_interspeech.pdf) | +| 1381 | Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG | [![GitHub](https://img.shields.io/github/stars/yorgoon/DiffE?style=flat)](https://github.com/yorgoon/DiffE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23g_interspeech.pdf) | +| 40 | Towards Ultrasound Tongue Image Prediction from EEG During Speech Production | [![GitHub](https://img.shields.io/github/stars/BME-SmartLab/EEG-to-UTI?style=flat)](https://github.com/BME-SmartLab/EEG-to-UTI) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/csapo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05374-b31b1b.svg)](https://arxiv.org/abs/2306.05374) | +| 1607 | Adaptation of Tongue Ultrasound-based Silent Speech Interfaces using Spatial Transformer Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/toth23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19130-b31b1b.svg)](https://arxiv.org/abs/2305.19130) | +| 174 | STE-GAN: Speech-to-Electromyography Signal Conversion using Generative Adversarial Networks | [![GitHub](https://img.shields.io/github/stars/scheck-k/ste-gan?style=flat)](https://github.com/scheck-k/ste-gan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/scheck23_interspeech.pdf) | +| 1881 | Spanish Phone Confusion Analysis for EMG-based Silent Speech Interfaces | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/salomons23_interspeech.pdf) | +| 805 | Hybrid Silent Speech Interface through Fusion of Electroencephalography and Electromyography | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://stone-wave.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23l_interspeech.pdf) |
@@ -1065,12 +1065,12 @@ Contributions to improve the completeness of this list are greatly appreciated.
| :id: | **Title** | **Repo** | **Paper** |
|------|-----------|:--------:|:---------:|
-| 1968 | Can Self-Supervised Neural Representations Pre-trained on Human Speech Distinguish Animal Callers? | [![GitHub](https://img.shields.io/github/stars/idiap/ssl-caller-detection?style=flat)](https://github.com/idiap/ssl-caller-detection) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sarkar23_interspeech.pdf)<br>
[![arXiv](https://img.shields.io/badge/arXiv-2305.14035-b31b1b.svg)](https://arxiv.org/abs/2305.14035) | -| 2342 | Discovering COVID-19 Coughing and Breathing Patterns from Unlabeled Data using Contrastive Learning with Varying Pre-Training Domains | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cai23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01864-b31b1b.svg)](https://arxiv.org/abs/2306.01864) | -| 330 | Background-aware Modeling for Weakly Supervised Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xin23_interspeech.pdf) | -| 1065 | How to (Virtually) Train Your Speaker Localizer | [![GitHub](https://img.shields.io/github/stars/prerak23/Dir_SrcMic_DOA?style=flat)](https://github.com/prerak23/Dir_SrcMic_DOA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/srivastava23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.16958-b31b1b.svg)](https://arxiv.org/abs/2211.16958) | -| 2271 | MMER: Multimodal Multi-task Learning for Speech Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/Sreyan88/MMER?style=flat)](https://github.com/Sreyan88/MMER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ghosh23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.16794-b31b1b.svg)](https://arxiv.org/abs/2203.16794) | -| 909 | A Multi-task Learning Framework for Sound Event Detection using High-Level Acoustic Characteristics of Sounds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/khandelwal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10729-b31b1b.svg)](https://arxiv.org/abs/2305.10729) | +| 1968 | Can Self-Supervised Neural Representations Pre-trained on Human Speech Distinguish Animal Callers? | [![GitHub](https://img.shields.io/github/stars/idiap/ssl-caller-detection?style=flat)](https://github.com/idiap/ssl-caller-detection) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sarkar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14035-b31b1b.svg)](https://arxiv.org/abs/2305.14035) | +| 2342 | Discovering COVID-19 Coughing and Breathing Patterns from Unlabeled Data using Contrastive Learning with Varying Pre-Training Domains | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cai23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01864-b31b1b.svg)](https://arxiv.org/abs/2306.01864) | +| 330 | Background-aware Modeling for Weakly Supervised Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xin23_interspeech.pdf) | +| 1065 | How to (Virtually) Train Your Speaker Localizer | [![GitHub](https://img.shields.io/github/stars/prerak23/Dir_SrcMic_DOA?style=flat)](https://github.com/prerak23/Dir_SrcMic_DOA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/srivastava23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.16958-b31b1b.svg)](https://arxiv.org/abs/2211.16958) | +| 2271 | MMER: Multimodal Multi-task Learning for Speech Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/Sreyan88/MMER?style=flat)](https://github.com/Sreyan88/MMER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ghosh23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.16794-b31b1b.svg)](https://arxiv.org/abs/2203.16794) | +| 909 | A Multi-task Learning Framework for Sound Event Detection using High-Level Acoustic Characteristics of Sounds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/khandelwal23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10729-b31b1b.svg)](https://arxiv.org/abs/2305.10729) |
@@ -1082,11 +1082,11 @@ Contributions to improve the completeness of this list are greatly appreciated.
| :id: | **Title** | **Repo** | **Paper** |
|------|-----------|:--------:|:---------:|
-| 2194 | A Multimodal Investigation of Speech, Text, Cognitive and Facial Video Features for Characterizing Depression with and without Medication | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/neumann23b_interspeech.pdf)<br>
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1n6ymnLKt21RDfawu9WHsd8tgmBPuz9SC/view) | -| 307 | Understanding Disrupted Sentences using Underspecified Abstract Meaning Representation | [![GitHub](https://img.shields.io/github/stars/amazon-science/disrupt-amr?style=flat)](https://github.com/amazon-science/disrupt-amr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/addlesee23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/understanding-disrupted-sentences-using-underspecified-abstract-meaning-representation) | -| 2109 | Developing Speech Processing Pipelines for Police Accountability | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/field23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06086-b31b1b.svg)](https://arxiv.org/abs/2306.06086) | -| 2086 | Prosody-Controllable Gender-Ambiguous Speech Synthesis: A Tool for Investigating Implicit Bias in Speech Perception | [![GitHub](https://img.shields.io/github/stars/evaszekely/ambiguous?style=flat)](https://github.com/evaszekely/ambiguous) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/szekely23_interspeech.pdf) | -| 848 | Affective Attributes of French Caregivers' Professional Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rouas23_interspeech.pdf) | +| 2194 | A Multimodal Investigation of Speech, Text, Cognitive and Facial Video Features for Characterizing Depression with and without Medication | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/neumann23b_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1n6ymnLKt21RDfawu9WHsd8tgmBPuz9SC/view) | +| 307 | Understanding Disrupted Sentences using Underspecified Abstract Meaning Representation | [![GitHub](https://img.shields.io/github/stars/amazon-science/disrupt-amr?style=flat)](https://github.com/amazon-science/disrupt-amr) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/addlesee23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/understanding-disrupted-sentences-using-underspecified-abstract-meaning-representation) | +| 2109 | Developing Speech Processing Pipelines for Police Accountability | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/field23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06086-b31b1b.svg)](https://arxiv.org/abs/2306.06086) | +| 2086 | Prosody-Controllable Gender-Ambiguous Speech Synthesis: A Tool for Investigating Implicit Bias in Speech Perception | [![GitHub](https://img.shields.io/github/stars/evaszekely/ambiguous?style=flat)](https://github.com/evaszekely/ambiguous) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/szekely23_interspeech.pdf) | +| 848 | Affective Attributes of French Caregivers' Professional Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rouas23_interspeech.pdf) |
@@ -1098,54 +1098,54 @@ Contributions to improve the completeness of this list are greatly appreciated.
| :id: | **Title** | **Repo** | **Paper** |
|------|-----------|:--------:|:---------:|
-| 180 | Pragmatic Pertinence: A Learnable Confidence Metric to Assess the Subjective Quality of LM-Generated Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bellegarda23_interspeech.pdf) |
-| 2078 | ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23ea_interspeech.pdf)<br>
[![arXiv](https://img.shields.io/badge/arXiv-2305.16065-b31b1b.svg)](https://arxiv.org/abs/2305.16065) | -| 916 | BASS: Block-wise Adaptation for Speech Summarization | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sharma23_interspeech.pdf) | -| 1258 | Speaker Tracking using Graph Attention Networks with Varying Duration Utterances in Multi-Channel Naturalistic Data: Fearless Steps Apollo 11 Audio Corpus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shekar23_interspeech.pdf) | -| 36 | Combining Language Corpora in a Japanese Electromagnetic Articulography Database for Acoustic-to-Articulatory Inversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yan23_interspeech.pdf) | -| 523 | A Dual Attention-based Modality-Collaborative Fusion Network for Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/zxiaohen/Speech-emotion-recognition-MCFN?style=flat)](https://github.com/zxiaohen/Speech-emotion-recognition-MCFN) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23g_interspeech.pdf) | -| 2174 | Large Dataset Generation of Synchronized Music Audio and Lyrics at Scale using Teacher-Student Paradigm | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chivriga23_interspeech.pdf) | -| 483 | Enc-Dec RNN Acoustic Word Embeddings Learned via Pairwise Prediction | [![GitHub](https://img.shields.io/github/stars/madhavlab/2023_adhiraj_encdecPairwisePred?style=flat)](https://github.com/madhavlab/2023_adhiraj_encdecPairwisePred) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/banerjee23_interspeech.pdf) | -| 864 | Query based Acoustic Summarization for Podcasts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kotey23_interspeech.pdf) | -| 1242 | Spot Keywords from Very Noisy and Mixed Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17706-b31b1b.svg)](https://arxiv.org/abs/2305.17706) | -| 891 | Knowledge Distillation on Joint Task End-to-End Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nayem23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/knowledge-distillation-on-joint-task-end-to-end-speech-translation) | -| 343 | Investigating Pre-trained Audio Encoders in the Low-Resource Condition | [![GitHub](https://img.shields.io/github/stars/YangHao97/investigateAudioEncoders?style=flat)](https://github.com/YangHao97/investigateAudioEncoders) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17733-b31b1b.svg)](https://arxiv.org/abs/2305.17733) | -| 1718 | Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18096-b31b1b.svg)](https://arxiv.org/abs/2305.18096) | -| 823 | MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information | [![GitHub](https://img.shields.io/github/stars/SpringHuo/MAVD?style=flat)](https://github.com/YangHao97/investigateAudioEncoders) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02263-b31b1b.svg)](https://arxiv.org/abs/2306.02263) | -| 1674 | CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://cnceleb.org/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23y_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16049-b31b1b.svg)](https://arxiv.org/abs/2305.16049) | -| 1762 | Improving Zero-Shot Cross-Domain Slot Filling via Transformer-based Slot Semantics Fusion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23ca_interspeech.pdf) | -| 619 | Rethinking Transfer and Auxiliary Learning for Improving Audio Captioning Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shin23_interspeech.pdf) | -| 1468 | Boosting Punctuation Restoration with Data Generation and Reinforcement Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lai23c_interspeech.pdf) | -| 695 | J-ToneNet: A Transformer-based Encoding Network for Improving Tone Classification in Continuous Speech via F0 Sequences | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23e_interspeech.pdf) | -| 1152 | Towards Cross-Language Prosody Transfer for Dialog | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://www.cs.utep.edu/nigel/abstracts/interspeech2023.html)
[![GitHub](https://img.shields.io/github/stars/joneavila/DRAL?style=flat)](https://github.com/joneavila/DRAL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/avila23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.cs.utep.edu/nigel/papers/interspeech2023.pdf) | -| 2506 | Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kesiraju23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00208-b31b1b.svg)](https://arxiv.org/abs/2306.00208) | -| 1980 | ITALIC: An Italian Intent Classification Dataset | [![GitHub](https://img.shields.io/github/stars/RiTA-nlp/ITALIC?style=flat)](https://github.com/RiTA-nlp/ITALIC)
[![Zenodo](https://img.shields.io/badge/Zenodo-dataset-FFD1BF.svg)](https://zenodo.org/record/8040649) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/koudounas23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08502-b31b1b.svg)](https://arxiv.org/abs/2306.08502) | -| 1778 | Perceptual and Task-Oriented Assessment of a Semantic Metric for ASR Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rugayan23_interspeech.pdf) | -| 1466 | How ChatGPT is Robust for Spoken Language Understanding? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23s_interspeech.pdf) | -| 1233 | GigaST: A 10,000-hour Pseudo Speech Translation Corpus | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://st-benchmark.github.io/resources/GigaST.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ye23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2204.03939-b31b1b.svg)](https://arxiv.org/abs/2204.03939) | -| 1570 | Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fan23b_interspeech.pdf) | -| 2473 | Crowdsource-based Validation of the Audio Cocktail as a Sound Browsing Tool | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fallgren23_interspeech.pdf) | -| 1675 | PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts | [![GitHub](https://img.shields.io/github/stars/cpii-cai/PunCantonese?style=flat)](https://github.com/cpii-cai/PunCantonese) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23z_interspeech.pdf) | -| 1358 | Speech-to-Face Conversion using Denoising Diffusion Probabilistic Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kato23_interspeech.pdf) | -| 2255 | Inter-Connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nishikawa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16897-b31b1b.svg)](https://arxiv.org/abs/2305.16897) | -| 1068 | How Does Pretraining Improve Discourse-aware Translation? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19847-b31b1b.svg)](https://arxiv.org/abs/2305.19847) | -| 1135 | PATCorrect: Non-Autoregressive Phoneme-Augmented Transformer for ASR Error Correction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.05040-b31b1b.svg)](https://arxiv.org/abs/2302.05040) | -| 161 | Model-assisted Lexical Tone Evaluation of Three-Year-Old Chinese-Speaking Children by also Considering Segment Production | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tseng23_interspeech.pdf) | -| 1392 | Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding | [![GitHub](https://img.shields.io/github/stars/declare-lab/segue?style=flat)](https://github.com/declare-lab/segue) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tan23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12301-b31b1b.svg)](https://arxiv.org/abs/2305.12301) | -| 1582 | Joint Time and Frequency Transformer for Chinese Opera Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23u_interspeech.pdf) | -| 116 | AdaMS: Deep Metric Learning with Adaptive Margin and Adaptive Scale for Acoustic Word Discrimination | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jung23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.14564-b31b1b.svg)](https://arxiv.org/abs/2210.14564) | -| 2252 | Investigating Reproducibility at Interspeech Conferences: A Longitudinal and Comparative Perspective | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/arvan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.10033-b31b1b.svg)](https://arxiv.org/abs/2306.10033) | -| 2250 | Combining Heterogeneous Structures for Event Causality Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pouranbenveyseh23_interspeech.pdf) | -| 1208 | An Efficient Approach for the Automated Segmentation and Transcription of the People's Speech Corpus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/biswas23_interspeech.pdf) | -| 1425 | Diverse Feature Mapping and Fusion via Multitask Learning for Multilingual Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23g_interspeech.pdf) | -| 903 | Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text | [![GitHub](https://img.shields.io/github/stars/apptek/ArabicDiacritizationInterspeech2023?style=flat)](https://github.com/apptek/ArabicDiacritizationInterspeech2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bahar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03557-b31b1b.svg)](https://arxiv.org/abs/2306.03557) | -| 466 | Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin | [![GitHub](https://img.shields.io/github/stars/muhammed-saeed/CLaT?style=flat)](https://github.com/muhammed-saeed/CLaT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00382-b31b1b.svg)](https://arxiv.org/abs/2307.00382) | -| 1878 | Efficient Adaptation of Spoken Language Understanding based on End-to-End Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23j_interspeech.pdf) | -| 597 | PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords | [![GitHub](https://img.shields.io/github/stars/ncsoft/PhonMatchNet?style=flat)](https://github.com/ncsoft/PhonMatchNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23d_interspeech.pdf) | -| 69 | Mix before Align: Towards Zero-Shot Cross-Lingual Sentiment Analysis via Soft-Mix and Multi-View Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhu23_interspeech.pdf) | -| 170 | AlignAtt: using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/papi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11408-b31b1b.svg)](https://arxiv.org/abs/2305.11408) | -| 2225 | Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/polak23_interspeech.pdf) | -| 1979 | Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages | [![GitHub](https://img.shields.io/github/stars/unza-speech-lab/zambezi-voice?style=flat)](https://github.com/unza-speech-lab/zambezi-voice) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sikasote23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04428-b31b1b.svg)](https://arxiv.org/abs/2306.04428) | +| 180 | Pragmatic Pertinence: A Learnable Confidence Metric to Assess the Subjective Quality of LM-Generated Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bellegarda23_interspeech.pdf) | +| 2078 | ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23ea_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16065-b31b1b.svg)](https://arxiv.org/abs/2305.16065) | +| 916 | BASS: Block-wise Adaptation for Speech Summarization | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sharma23_interspeech.pdf) | +| 1258 | Speaker Tracking using Graph Attention Networks with Varying Duration Utterances in Multi-Channel Naturalistic Data: Fearless Steps Apollo 11 Audio Corpus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shekar23_interspeech.pdf) | +| 36 | Combining Language Corpora in a Japanese Electromagnetic Articulography Database for Acoustic-to-Articulatory Inversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yan23_interspeech.pdf) | +| 523 | A Dual Attention-based Modality-Collaborative Fusion Network for Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/zxiaohen/Speech-emotion-recognition-MCFN?style=flat)](https://github.com/zxiaohen/Speech-emotion-recognition-MCFN) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23g_interspeech.pdf) | +| 2174 | Large Dataset Generation of Synchronized Music Audio and Lyrics at Scale using Teacher-Student Paradigm | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chivriga23_interspeech.pdf) | +| 483 | Enc-Dec RNN Acoustic Word Embeddings Learned via Pairwise Prediction | [![GitHub](https://img.shields.io/github/stars/madhavlab/2023_adhiraj_encdecPairwisePred?style=flat)](https://github.com/madhavlab/2023_adhiraj_encdecPairwisePred) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/banerjee23_interspeech.pdf) | +| 864 | Query based Acoustic Summarization for Podcasts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kotey23_interspeech.pdf) | +| 1242 | Spot Keywords from Very Noisy and Mixed Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17706-b31b1b.svg)](https://arxiv.org/abs/2305.17706) | +| 891 | Knowledge Distillation on Joint Task End-to-End Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nayem23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/knowledge-distillation-on-joint-task-end-to-end-speech-translation) | +| 343 | Investigating Pre-trained Audio Encoders in the Low-Resource Condition | [![GitHub](https://img.shields.io/github/stars/YangHao97/investigateAudioEncoders?style=flat)](https://github.com/YangHao97/investigateAudioEncoders) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17733-b31b1b.svg)](https://arxiv.org/abs/2305.17733) | +| 1718 | Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18096-b31b1b.svg)](https://arxiv.org/abs/2305.18096) |
+| 823 | MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information | [![GitHub](https://img.shields.io/github/stars/SpringHuo/MAVD?style=flat)](https://github.com/SpringHuo/MAVD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23o_interspeech.pdf)<br>
[![arXiv](https://img.shields.io/badge/arXiv-2306.02263-b31b1b.svg)](https://arxiv.org/abs/2306.02263) | +| 1674 | CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://cnceleb.org/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23y_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16049-b31b1b.svg)](https://arxiv.org/abs/2305.16049) | +| 1762 | Improving Zero-Shot Cross-Domain Slot Filling via Transformer-based Slot Semantics Fusion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23ca_interspeech.pdf) | +| 619 | Rethinking Transfer and Auxiliary Learning for Improving Audio Captioning Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shin23_interspeech.pdf) | +| 1468 | Boosting Punctuation Restoration with Data Generation and Reinforcement Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lai23c_interspeech.pdf) | +| 695 | J-ToneNet: A Transformer-based Encoding Network for Improving Tone Classification in Continuous Speech via F0 Sequences | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23e_interspeech.pdf) | +| 1152 | Towards Cross-Language Prosody Transfer for Dialog | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://www.cs.utep.edu/nigel/abstracts/interspeech2023.html)
[![GitHub](https://img.shields.io/github/stars/joneavila/DRAL?style=flat)](https://github.com/joneavila/DRAL) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/avila23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.cs.utep.edu/nigel/papers/interspeech2023.pdf) | +| 2506 | Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kesiraju23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00208-b31b1b.svg)](https://arxiv.org/abs/2306.00208) | +| 1980 | ITALIC: An Italian Intent Classification Dataset | [![GitHub](https://img.shields.io/github/stars/RiTA-nlp/ITALIC?style=flat)](https://github.com/RiTA-nlp/ITALIC)
[![Zenodo](https://img.shields.io/badge/Zenodo-dataset-FFD1BF.svg)](https://zenodo.org/record/8040649) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/koudounas23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08502-b31b1b.svg)](https://arxiv.org/abs/2306.08502) | +| 1778 | Perceptual and Task-Oriented Assessment of a Semantic Metric for ASR Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rugayan23_interspeech.pdf) | +| 1466 | How ChatGPT is Robust for Spoken Language Understanding? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23s_interspeech.pdf) | +| 1233 | GigaST: A 10,000-hour Pseudo Speech Translation Corpus | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://st-benchmark.github.io/resources/GigaST.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ye23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2204.03939-b31b1b.svg)](https://arxiv.org/abs/2204.03939) | +| 1570 | Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fan23b_interspeech.pdf) | +| 2473 | Crowdsource-based Validation of the Audio Cocktail as a Sound Browsing Tool | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fallgren23_interspeech.pdf) | +| 1675 | PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts | [![GitHub](https://img.shields.io/github/stars/cpii-cai/PunCantonese?style=flat)](https://github.com/cpii-cai/PunCantonese) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23z_interspeech.pdf) | +| 1358 | Speech-to-Face Conversion using Denoising Diffusion Probabilistic Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kato23_interspeech.pdf) | +| 2255 | Inter-Connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nishikawa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16897-b31b1b.svg)](https://arxiv.org/abs/2305.16897) | +| 1068 | How Does Pretraining Improve Discourse-aware Translation? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19847-b31b1b.svg)](https://arxiv.org/abs/2305.19847) | +| 1135 | PATCorrect: Non-Autoregressive Phoneme-Augmented Transformer for ASR Error Correction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.05040-b31b1b.svg)](https://arxiv.org/abs/2302.05040) | +| 161 | Model-assisted Lexical Tone Evaluation of Three-Year-Old Chinese-Speaking Children by also Considering Segment Production | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tseng23_interspeech.pdf) | +| 1392 | Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding | [![GitHub](https://img.shields.io/github/stars/declare-lab/segue?style=flat)](https://github.com/declare-lab/segue) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tan23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12301-b31b1b.svg)](https://arxiv.org/abs/2305.12301) | +| 1582 | Joint Time and Frequency Transformer for Chinese Opera Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23u_interspeech.pdf) | +| 116 | AdaMS: Deep Metric Learning with Adaptive Margin and Adaptive Scale for Acoustic Word Discrimination | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jung23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.14564-b31b1b.svg)](https://arxiv.org/abs/2210.14564) | +| 2252 | Investigating Reproducibility at Interspeech Conferences: A Longitudinal and Comparative Perspective | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/arvan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.10033-b31b1b.svg)](https://arxiv.org/abs/2306.10033) | +| 2250 | Combining Heterogeneous Structures for Event Causality Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pouranbenveyseh23_interspeech.pdf) | +| 1208 | An Efficient Approach for the Automated Segmentation and Transcription of the People's Speech Corpus | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/biswas23_interspeech.pdf) | +| 1425 | Diverse Feature Mapping and Fusion via Multitask Learning for Multilingual Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23g_interspeech.pdf) | +| 903 | Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text | [![GitHub](https://img.shields.io/github/stars/apptek/ArabicDiacritizationInterspeech2023?style=flat)](https://github.com/apptek/ArabicDiacritizationInterspeech2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bahar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03557-b31b1b.svg)](https://arxiv.org/abs/2306.03557) | +| 466 | Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin | [![GitHub](https://img.shields.io/github/stars/muhammed-saeed/CLaT?style=flat)](https://github.com/muhammed-saeed/CLaT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00382-b31b1b.svg)](https://arxiv.org/abs/2307.00382) | +| 1878 | Efficient Adaptation of Spoken Language Understanding based on End-to-End Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23j_interspeech.pdf) | +| 597 | PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords | [![GitHub](https://img.shields.io/github/stars/ncsoft/PhonMatchNet?style=flat)](https://github.com/ncsoft/PhonMatchNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23d_interspeech.pdf) | +| 69 | Mix before Align: Towards Zero-Shot Cross-Lingual Sentiment Analysis via Soft-Mix and Multi-View Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhu23_interspeech.pdf) | +| 170 | AlignAtt: using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/papi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11408-b31b1b.svg)](https://arxiv.org/abs/2305.11408) | +| 2225 | Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/polak23_interspeech.pdf) | +| 1979 | Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages | [![GitHub](https://img.shields.io/github/stars/unza-speech-lab/zambezi-voice?style=flat)](https://github.com/unza-speech-lab/zambezi-voice) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sikasote23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04428-b31b1b.svg)](https://arxiv.org/abs/2306.04428) |
@@ -1157,34 +1157,34 @@ Contributions to improve the completeness of this list are greatly appreciated.
| :id: | **Title** | **Repo** | **Paper** |
|------|-----------|:--------:|:---------:|
-| 2421 | Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23m_interspeech.pdf)<br>
[![arXiv](https://img.shields.io/badge/arXiv-2305.13108-b31b1b.svg)](https://arxiv.org/abs/2305.13108) | -| 2198 | Multimodal Locally Enhanced Transformer for Continuous Sign Language Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/papadimitriou23_interspeech.pdf) | -| 1759 | Towards Supporting an Early Diagnosis of Multiple Sclerosis using Vocal Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gonzalezmachorro23_interspeech.pdf) | -| 1891 | Whisper Features for Dysarthric Severity-Level Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rathod23_interspeech.pdf) | -| 2191 | A New Benchmark of Aphasia Speech Recognition and Detection based on E-Branchformer and Multi-task Learning | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet/tree/master/egs2/aphasiabank/asr1) [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13331-b31b1b.svg)](https://arxiv.org/abs/2305.13331) | -| 222 | Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yue23_interspeech.pdf) | -| 2026 | A Stutter Seldom Comes Alone - Cross-Corpus Stuttering Detection as a Multi-label Problem | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bayerl23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19255-b31b1b.svg)](https://arxiv.org/abs/2305.19255) | -| 1542 | Transfer Learning to Aid Dysarthria Severity Classification for Patients with Amyotrophic Lateral Sclerosis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bhattacharjee23_interspeech.pdf) | -| 2203 | DuTa-VC: A Duration-aware Typical-to-Atypical Voice Conversion Approach with Diffusion Probabilistic Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://wanghelin1997.github.io/DuTa-VC-Demo/)
[![GitHub](https://img.shields.io/github/stars/WangHelin1997/DuTa-VC?style=flat)](https://github.com/WangHelin1997/DuTa-VC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23qa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.10588-b31b1b.svg)](https://arxiv.org/abs/2306.10588) | -| 201 | CNVVE: Dataset and Benchmark for Classifying Non-verbal Voice | [![GitHub](https://img.shields.io/github/stars/hedeshy/CNVVE?style=flat)](https://github.com/hedeshy/CNVVE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hedeshy23_interspeech.pdf)
[![University of Southampton](https://img.shields.io/badge/soton-ac-015C84.svg)](https://eprints.soton.ac.uk/478344/) | -| 1541 | Arabic Dysarthric Speech Recognition using Adversarial and Signal-based Augmentation | [![GitHub](https://img.shields.io/github/stars/massabaali7/AR_Dysarthric?style=flat)](https://github.com/massabaali7/AR_Dysarthric) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/baali23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04368-b31b1b.svg)](https://arxiv.org/abs/2306.04368) | -| 1887 | Weakly-Supervised Forced Alignment of Disfluent Speech using Phoneme-level Modeling | [![GitHub](https://img.shields.io/github/stars/zelaki/WSFA?style=flat)](https://github.com/zelaki/WSFA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kouzelis23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00996-b31b1b.svg)](https://arxiv.org/abs/2306.00996) | -| 1998 | Glottal Source Analysis of Voice Deficits in Basal Ganglia Dysfunction: Evidence from de novo Parkinson's Disease and Huntington's Disease | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/novotny23_interspeech.pdf) | -| 2478 | An Analysis of Glottal Features of Chronic Kidney Disease Speech and its Application to CKD Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mun23b_interspeech.pdf) | -| 983 | Weakly Supervised Glottis Segmentation in High-Speed Video Endoscopy using Bounding Box Labels | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/belagali23_interspeech.pdf) | -| 1669 | Investigating the Dynamics of Hand and Lips in French Cued Speech using Attention Mechanisms and CTC-based Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sankar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08290-b31b1b.svg)](https://arxiv.org/abs/2306.08290) | -| 670 | Hearing Loss Affects Emotion Perception in Older Adults: Evidence from a Prosody-Semantics Stroop Task | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23h_interspeech.pdf) | -| 554 | Cochlear-Implant Listeners Listening to Cochlear-Implant Simulated Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kong23b_interspeech.pdf) | -| 2168 | Validation of a Task-Independent Cepstral Peak Prominence Measure with Voice Activity Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/murton23_interspeech.pdf) | -| 1679 | Score-balanced Loss for Multi-aspect Pronunciation Assessment | [![GitHub](https://img.shields.io/github/stars/doheejin/SB_loss_PA?style=flat)](https://github.com/doheejin/SB_loss_PA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/do23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16664-b31b1b.svg)](https://arxiv.org/abs/2305.16664) | -| 2108 | Federated Learning for Secure Development of AI Models for Parkinson's Disease Detection using Speech from Different Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tayebiarasteh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11284-b31b1b.svg)](https://arxiv.org/abs/2305.11284) | -| 652 | F0inTFS: A Lightweight Periodicity Enhancement Strategy for Cochlear Implants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhou23c_interspeech.pdf) | -| 1678 | Differentiating Acoustic and Physiological Features in Speech for Hypoxia Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/obrien23_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-04154914.svg)](https://hal.science/hal-04154914) | -| 786 | Mandarin Electrolaryngeal Speech Voice Conversion using Cross-Domain Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06653-b31b1b.svg)](https://arxiv.org/abs/2306.06653) | -| 866 | Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chien23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06652-b31b1b.svg)](https://arxiv.org/abs/2306.06652) | -| 1744 | Which Aspects of Motor Speech Disorder are Captured by Mel Frequency Cepstral Coefficients? Evidence from the Change in STN-DBS Conditions in Parkinson's Disease | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/illner23_interspeech.pdf) | -| 1096 | Detecting Manifest Huntington's Disease using Vocal Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/subramanian23_interspeech.pdf) | -| 1623 | Exploring Multi-Task Learning and Data Augmentation in Dementia Detection with Self-Supervised Pre-trained Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23q_interspeech.pdf) | +| 2421 | Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13108-b31b1b.svg)](https://arxiv.org/abs/2305.13108) | +| 2198 | Multimodal Locally Enhanced Transformer for Continuous Sign Language Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/papadimitriou23_interspeech.pdf) | +| 1759 | Towards Supporting an Early Diagnosis of Multiple Sclerosis using Vocal Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gonzalezmachorro23_interspeech.pdf) | +| 1891 | Whisper Features for Dysarthric Severity-Level Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rathod23_interspeech.pdf) | +| 2191 | A New Benchmark of Aphasia Speech Recognition and Detection based on E-Branchformer and Multi-task Learning | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet/tree/master/egs2/aphasiabank/asr1) [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13331-b31b1b.svg)](https://arxiv.org/abs/2305.13331) | +| 222 | Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yue23_interspeech.pdf) | +| 2026 | A Stutter Seldom Comes Alone - Cross-Corpus Stuttering Detection as a Multi-label Problem | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bayerl23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19255-b31b1b.svg)](https://arxiv.org/abs/2305.19255) | +| 1542 | Transfer Learning to Aid Dysarthria Severity Classification for Patients with Amyotrophic Lateral Sclerosis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bhattacharjee23_interspeech.pdf) | +| 2203 | DuTa-VC: A Duration-aware Typical-to-Atypical Voice Conversion Approach with Diffusion Probabilistic Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://wanghelin1997.github.io/DuTa-VC-Demo/)
[![GitHub](https://img.shields.io/github/stars/WangHelin1997/DuTa-VC?style=flat)](https://github.com/WangHelin1997/DuTa-VC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23qa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.10588-b31b1b.svg)](https://arxiv.org/abs/2306.10588) | +| 201 | CNVVE: Dataset and Benchmark for Classifying Non-verbal Voice | [![GitHub](https://img.shields.io/github/stars/hedeshy/CNVVE?style=flat)](https://github.com/hedeshy/CNVVE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hedeshy23_interspeech.pdf)
[![University of Southampton](https://img.shields.io/badge/soton-ac-015C84.svg)](https://eprints.soton.ac.uk/478344/) | +| 1541 | Arabic Dysarthric Speech Recognition using Adversarial and Signal-based Augmentation | [![GitHub](https://img.shields.io/github/stars/massabaali7/AR_Dysarthric?style=flat)](https://github.com/massabaali7/AR_Dysarthric) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/baali23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04368-b31b1b.svg)](https://arxiv.org/abs/2306.04368) | +| 1887 | Weakly-Supervised Forced Alignment of Disfluent Speech using Phoneme-level Modeling | [![GitHub](https://img.shields.io/github/stars/zelaki/WSFA?style=flat)](https://github.com/zelaki/WSFA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kouzelis23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00996-b31b1b.svg)](https://arxiv.org/abs/2306.00996) | +| 1998 | Glottal Source Analysis of Voice Deficits in Basal Ganglia Dysfunction: Evidence from de novo Parkinson's Disease and Huntington's Disease | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/novotny23_interspeech.pdf) | +| 2478 | An Analysis of Glottal Features of Chronic Kidney Disease Speech and its Application to CKD Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mun23b_interspeech.pdf) | +| 983 | Weakly Supervised Glottis Segmentation in High-Speed Video Endoscopy using Bounding Box Labels | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/belagali23_interspeech.pdf) | +| 1669 | Investigating the Dynamics of Hand and Lips in French Cued Speech using Attention Mechanisms and CTC-based Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sankar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08290-b31b1b.svg)](https://arxiv.org/abs/2306.08290) | +| 670 | Hearing Loss Affects Emotion Perception in Older Adults: Evidence from a Prosody-Semantics Stroop Task | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23h_interspeech.pdf) | +| 554 | Cochlear-Implant Listeners Listening to Cochlear-Implant Simulated Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kong23b_interspeech.pdf) | +| 2168 | Validation of a Task-Independent Cepstral Peak Prominence Measure with Voice Activity Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/murton23_interspeech.pdf) | +| 1679 | Score-balanced Loss for Multi-aspect Pronunciation Assessment | [![GitHub](https://img.shields.io/github/stars/doheejin/SB_loss_PA?style=flat)](https://github.com/doheejin/SB_loss_PA) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/do23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16664-b31b1b.svg)](https://arxiv.org/abs/2305.16664) | +| 2108 | Federated Learning for Secure Development of AI Models for Parkinson's Disease Detection using Speech from Different Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tayebiarasteh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11284-b31b1b.svg)](https://arxiv.org/abs/2305.11284) | +| 652 | F0inTFS: A Lightweight Periodicity Enhancement Strategy for Cochlear Implants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhou23c_interspeech.pdf) | +| 1678 | Differentiating Acoustic and Physiological Features in Speech for Hypoxia Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/obrien23_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-04154914.svg)](https://hal.science/hal-04154914) | +| 786 | Mandarin Electrolaryngeal Speech Voice Conversion using Cross-Domain Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23h_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06653-b31b1b.svg)](https://arxiv.org/abs/2306.06653) | +| 866 | Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chien23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06652-b31b1b.svg)](https://arxiv.org/abs/2306.06652) | +| 1744 | Which Aspects of Motor Speech Disorder are Captured by Mel Frequency Cepstral Coefficients? Evidence from the Change in STN-DBS Conditions in Parkinson's Disease | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/illner23_interspeech.pdf) | +| 1096 | Detecting Manifest Huntington's Disease using Vocal Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/subramanian23_interspeech.pdf) | +| 1623 | Exploring Multi-Task Learning and Data Augmentation in Dementia Detection with Self-Supervised Pre-trained Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23q_interspeech.pdf) |
@@ -1196,12 +1196,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 478 | Matching Latent Encoding for Audio-Text based Keyword Spotting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nishu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05245-b31b1b.svg)](https://arxiv.org/abs/2306.05245) | -| 1215 | Self-Paced Pattern Augmentation for Spoken Term Detection in Zero-Resource | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/p23_interspeech.pdf) | -| 2362 | On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23y_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/on-device-constrained-self-supervised-speech-representation-learning-for-keyword-spotting-via-knowledge-distillation) | -| 90 | Online Continual Learning in Keyword Spotting for Low-Resource Devices via Pooling High-Order Temporal Statistics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/michieli23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.12660-b31b1b.svg)](https://arxiv.org/abs/2307.12660) | -| 689 | Improving Small Footprint Few-Shot Keyword Spotting with Supervision on Auxiliary Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23j_interspeech.pdf) | -| 2222 | Robust Keyword Spotting for Noisy Environments by Leveraging Speech Enhancement and Speech Presence Probability | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23t_interspeech.pdf) | +| 478 | Matching Latent Encoding for Audio-Text based Keyword Spotting | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nishu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05245-b31b1b.svg)](https://arxiv.org/abs/2306.05245) | +| 1215 | Self-Paced Pattern Augmentation for Spoken Term Detection in Zero-Resource | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/p23_interspeech.pdf) | +| 2362 | On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23y_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/on-device-constrained-self-supervised-speech-representation-learning-for-keyword-spotting-via-knowledge-distillation) | +| 90 | Online Continual Learning in Keyword Spotting for Low-Resource Devices via Pooling High-Order Temporal Statistics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/michieli23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.12660-b31b1b.svg)](https://arxiv.org/abs/2307.12660) | +| 689 | Improving Small Footprint Few-Shot Keyword Spotting with Supervision on Auxiliary Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23j_interspeech.pdf) | +| 2222 | Robust Keyword Spotting for Noisy Environments by Leveraging Speech Enhancement and Speech Presence Probability | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23t_interspeech.pdf) |
@@ -1213,12 +1213,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 831 | Enhancing the Unified Streaming and Non-Streaming Model with Contrastive Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00755-b31b1b.svg)](https://arxiv.org/abs/2306.00755) | -| 1497 | ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/song23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10649-b31b1b.svg)](https://arxiv.org/abs/2305.10649) | -| 361 | Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01296-b31b1b.svg)](https://arxiv.org/abs/2306.01296) | -| 1129 | DCTX-Conformer: Dynamic Context Carry-over for Low Latency Unified Streaming and Non-Streaming Conformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huybrechts23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08175-b31b1b.svg)](https://arxiv.org/abs/2306.08175) | -| 1121 | Knowledge Distillation from Non-Streaming to Streaming ASR Encoder using Auxiliary Non-Streaming Layer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shim23_interspeech.pdf) | -| 884 | Adaptive Contextual Biasing for Transducer based Streaming Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00804-b31b1b.svg)](https://arxiv.org/abs/2306.00804) | +| 831 | Enhancing the Unified Streaming and Non-Streaming Model with Contrastive Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00755-b31b1b.svg)](https://arxiv.org/abs/2306.00755) | +| 1497 | ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/song23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10649-b31b1b.svg)](https://arxiv.org/abs/2305.10649) | +| 361 | Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01296-b31b1b.svg)](https://arxiv.org/abs/2306.01296) | +| 1129 | DCTX-Conformer: Dynamic Context Carry-over for Low Latency Unified Streaming and Non-Streaming Conformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huybrechts23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08175-b31b1b.svg)](https://arxiv.org/abs/2306.08175) | +| 1121 | Knowledge Distillation from Non-Streaming to Streaming ASR Encoder using Auxiliary Non-Streaming Layer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shim23_interspeech.pdf) | +| 884 | Adaptive Contextual Biasing for Transducer based Streaming Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00804-b31b1b.svg)](https://arxiv.org/abs/2306.00804) |
@@ -1230,12 +1230,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1753 | Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://avlit-interspeech.github.io/)
[![GitHub](https://img.shields.io/github/stars/hmartelb/avlit?style=flat)](https://github.com/hmartelb/avlit) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/martel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00160-b31b1b.svg)](https://arxiv.org/abs/2306.00160) | -| 1389 | Remixing-based Unsupervised Source Separation from Scratch | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/saijo23_interspeech.pdf) | -| 577 | CAPTDURE: Captioned Sound Dataset of Single Sources | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/okamoto23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17758-b31b1b.svg)](https://arxiv.org/abs/2305.17758) | -| 488 | Recursive Sound Source Separation with Deep Learning-based Beamforming for Unknown Number of Sources | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/munakata23_interspeech.pdf) | -| 2537 | Multi-Channel Speech Separation with Cross-Attention and Beamforming | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mosner23_interspeech.pdf) | -| 185 | Background-Sound Controllable Voice Source Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/eom23_interspeech.pdf) | +| 1753 | Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://avlit-interspeech.github.io/)
[![GitHub](https://img.shields.io/github/stars/hmartelb/avlit?style=flat)](https://github.com/hmartelb/avlit) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/martel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00160-b31b1b.svg)](https://arxiv.org/abs/2306.00160) | +| 1389 | Remixing-based Unsupervised Source Separation from Scratch | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/saijo23_interspeech.pdf) | +| 577 | CAPTDURE: Captioned Sound Dataset of Single Sources | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/okamoto23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17758-b31b1b.svg)](https://arxiv.org/abs/2305.17758) | +| 488 | Recursive Sound Source Separation with Deep Learning-based Beamforming for Unknown Number of Sources | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/munakata23_interspeech.pdf) | +| 2537 | Multi-Channel Speech Separation with Cross-Attention and Beamforming | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mosner23_interspeech.pdf) | +| 185 | Background-Sound Controllable Voice Source Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/eom23_interspeech.pdf) |
@@ -1247,12 +1247,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1922 | A Neural Architecture for Selective Attention to Speech Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jurov23_interspeech.pdf) | -| 1122 | Quantifying Informational Masking due to Masker Intelligibility in Same-Talker Speech-in-Speech Perception | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huo23_interspeech.pdf) | -| 1476 | On the Benefits of Self-Supervised Learned Speech Representations for Predicting Human Phonetic Misperceptions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cuervo23_interspeech.pdf) | -| 2154 | Predicting Perceptual Centers Located at Vowel Onset in German Speech using Long Short-Term Memory Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/schulz23_interspeech.pdf) | -| 63 | Exploring the Mutual Intelligibility Breakdown Caused by Sculpting Speech from a Competing Speech Signal | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cooke23_interspeech.pdf) | -| 2103 | Perception of Incomplete Voicing Neutralization of Obstruents in Tohoku Japanese | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kitahara23_interspeech.pdf) | +| 1922 | A Neural Architecture for Selective Attention to Speech Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jurov23_interspeech.pdf) | +| 1122 | Quantifying Informational Masking due to Masker Intelligibility in Same-Talker Speech-in-Speech Perception | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huo23_interspeech.pdf) | +| 1476 | On the Benefits of Self-Supervised Learned Speech Representations for Predicting Human Phonetic Misperceptions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cuervo23_interspeech.pdf) | +| 2154 | Predicting Perceptual Centers Located at Vowel Onset in German Speech using Long Short-Term Memory Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/schulz23_interspeech.pdf) | +| 63 | Exploring the Mutual Intelligibility Breakdown Caused by Sculpting Speech from a Competing Speech Signal | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cooke23_interspeech.pdf) | +| 2103 | Perception of Incomplete Voicing Neutralization of Obstruents in Tohoku Japanese | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kitahara23_interspeech.pdf) | @@ -1264,12 +1264,12 @@ Contributions to improve the completeness of this list are greatly 
appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1879 | The Emergence of Obstruent-Intrinsic f0 and VOT as Cues to the Fortis/Lenis Contrast in West Central Bavarian | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pohnlein23_interspeech.pdf) | -| 431 | 〈'〉 in Tsimane': A Preliminary Investigation | [![GIN](https://img.shields.io/badge/G-Node-2854A4.svg)](https://gin.g-node.org/William-N-Havard/tsimane-glottal-interspeech23) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/havard23_interspeech.pdf) | -| 2200 | Segmental Features of Brazilian (Santa Catarina) Hunsrik | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hoffmann23_interspeech.pdf) | -| 2337 | Opening or Closing? An Electroglottographic Analysis of Voiceless Coda Consonants in Australian English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ratko23_interspeech.pdf) | -| 295 | Increasing Aspiration of Word-Medial Fortis Plosives in Swiss Standard German | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zebe23_interspeech.pdf) | -| 1456 | Lexical Stress and Velar Palatalization in Italian: A Spatio-Temporal Interaction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shao23_interspeech.pdf) | +| 1879 | The Emergence of Obstruent-Intrinsic f0 and VOT as Cues to the Fortis/Lenis Contrast in West Central Bavarian | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pohnlein23_interspeech.pdf) | +| 431 | 〈'〉 in Tsimane': A Preliminary Investigation | [![GIN](https://img.shields.io/badge/G-Node-2854A4.svg)](https://gin.g-node.org/William-N-Havard/tsimane-glottal-interspeech23) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/havard23_interspeech.pdf) | +| 2200 | Segmental Features of Brazilian (Santa Catarina) Hunsrik | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hoffmann23_interspeech.pdf) | +| 2337 | Opening or Closing? An Electroglottographic Analysis of Voiceless Coda Consonants in Australian English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ratko23_interspeech.pdf) | +| 295 | Increasing Aspiration of Word-Medial Fortis Plosives in Swiss Standard German | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zebe23_interspeech.pdf) | +| 1456 | Lexical Stress and Velar Palatalization in Italian: A Spatio-Temporal Interaction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shao23_interspeech.pdf) | @@ -1281,65 +1281,65 @@ Contributions to improve the completeness of this list are greatly appreciated. 
| :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1989 | Vietnam-Celeb: A Large-Scale Dataset for Vietnamese Speaker Recognition | [![GitHub](https://img.shields.io/github/stars/thanhpv2102/Vietnam-Celeb.Interspeech?style=flat)](https://github.com/thanhpv2102/Vietnam-Celeb.Interspeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pham23b_interspeech.pdf) | -| 2254 | What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://is23-2254.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06524-b31b1b.svg)](https://arxiv.org/abs/2306.06524) | -| 241 | The 2022 NIST Language Recognition Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14624-b31b1b.svg)](https://arxiv.org/abs/2302.14624) | -| 155 | Description and Analysis of the KPT system for NIST Language Recognition Evaluation 2022 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sarni23_interspeech.pdf) | -| 1725 | ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention | [![GitHub](https://img.shields.io/github/stars/Yip-Jia-Qi/ACA-Net?style=flat)](https://github.com/Yip-Jia-Qi/ACA-Net) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yip23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12121-b31b1b.svg)](https://arxiv.org/abs/2305.12121) | -| 402 | Branch-ECAPA-TDNN: A Parallel Branch Architecture to Capture Local and Global Features for Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yao23_interspeech.pdf) | -| 2052 | Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/singh23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07501-b31b1b.svg)](https://arxiv.org/abs/2306.07501)| -| 2569 | Wavelet Scattering Transform for Improving Generalization in Low-Resourced Spoken Language Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/dey23_interspeech.pdf) | -| 1407 | A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-trained General Purpose Speech Model | [![GitHub](https://img.shields.io/github/stars/Srijith-rkr/KAUST-Whisper-Adapter?style=flat)](https://github.com/Srijith-rkr/KAUST-Whisper-Adapter) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/radhakrishnan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11244-b31b1b.svg)](https://arxiv.org/abs/2305.11244)| -| 2272 | HABLA: A Dataset of Latin American Spanish Accents for Voice Anti-Spoofing | [![Zenodo](https://img.shields.io/badge/Zenodo-dataset-FFD1BF.svg)](https://zenodo.org/record/7370805) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tamayoflorez23_interspeech.pdf) | -| 1702 | Self-Supervised Learning Representation based Accent Recognition with Persistent Accent Memory | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23aa_interspeech.pdf) | -| 800 | Extremely Low Bit Quantization for Mobile Speaker Verification Systems Under 1MB Memory | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23g_interspeech.pdf) | -| 1974 | Unsupervised Out-of-Distribution Dialect Detection with Mahalanobis Distance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/das23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.10326-b31b1b.svg)](https://arxiv.org/abs/2302.10326) | -| 105 | Pyannote.Audio 2.1 Speaker Diarization Pipeline: Principle, Benchmark and Recipe | [![GitHub](https://img.shields.io/github/stars/pyannote/pyannote-audio?style=flat)](https://github.com/pyannote/pyannote-audio) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bredin23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://huggingface.co/bhuvanesh25/pyannote-diar-copy/resolve/main/technical_report_2.1.pdf) | -| 1524 | Model Compression for DNN-based Speaker Verification using Weight Quantization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17326-b31b1b.svg)](https://arxiv.org/abs/2210.17326) | -| 1354 | Multi-Resolution Approach to Identification of Spoken Languages and to Improve Overall Language Diarization System using Whisper Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vachhani23_interspeech.pdf) | -| 125 | Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zeng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10940-b31b1b.svg)](https://arxiv.org/abs/2305.10940) | -| 849 | Dynamic Fully-Connected Layer for Large-Scale Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/song23b_interspeech.pdf) | -| 844 | Reversible Neural Networks for Memory-Efficient Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23i_interspeech.pdf) | -| 777 | ECAPA++: Fine-grained Deep Embedding Learning for TDNN based Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23f_interspeech.pdf) | -| 1206 | TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13701-b31b1b.svg)](https://arxiv.org/abs/2305.13701) | -| 100 | Fooling Speaker Identification Systems with Adversarial Background Music | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zuo23_interspeech.pdf) | -| 1314 | Mutual Information-based Embedding Decoupling for Generalizable Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23r_interspeech.pdf) | -| 574 | Target Active Speaker Detection with Audio-Visual Cues | [![GitHub](https://img.shields.io/github/stars/Jiang-Yidi/TS-TalkNet?style=flat)](https://github.com/Jiang-Yidi/TS-TalkNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jiang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12831-b31b1b.svg)](https://arxiv.org/abs/2305.12831) | -| 2401 | Improving End-to-End Neural Diarization using Conversational Summary Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/broughton23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.13863-b31b1b.svg)](https://arxiv.org/abs/2306.13863) | -| 2039 | Phase Perturbation Improves Channel Robustness for Speech Spoofing Countermeasures | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yongyi.dev/phase-antispoofing/)
[![GitHub](https://img.shields.io/github/stars/yongyizang/PhaseAntispoofing_INTERSPEECH?style=flat)](https://github.com/yongyizang/PhaseAntispoofing_INTERSPEECH) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03389-b31b1b.svg)](https://arxiv.org/abs/2306.03389) | -| 210 | Improving Training Datasets for Resource-constrained Speaker Recognition Neural Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bousquet23_interspeech.pdf) | -| 1498 | Instance-based Temporal Normalization for Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lertpetchpun23_interspeech.pdf) | -| 881 | On the Robustness of Wav2Vec 2.0 based Speaker Recognition Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/novoselov23_interspeech.pdf) | -| 697 | P-Vectors: A Parallel-coupled TDNN/Transformer Network for Speaker Verification | [![GitHub](https://img.shields.io/github/stars/xyw7/pvector?style=flat)](https://github.com/xyw7/pvector) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14778-b31b1b.svg)](https://arxiv.org/abs/2305.14778) | -| 1249 | Group GMM-ResNet for Detection of Synthetic Speech Attacks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lei23_interspeech.pdf) | -| 452 | Robust Training for Speaker Verification against Noisy Labels | [![GitHub](https://img.shields.io/github/stars/PunkMale/OR-Gate?style=flat)](https://github.com/PunkMale/OR-Gate) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.12080-b31b1b.svg)](https://arxiv.org/abs/2211.12080) | -| 1404 | Self-Distillation into Self-Attention Heads for Improving Transformer-based End-to-End Neural Speaker Diarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jeoung23_interspeech.pdf) | -| 1217 | Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022 | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://www.cnceleb.org/competition) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00815-b31b1b.svg)](https://arxiv.org/abs/2211.00815) | -| 1648 | Describing the Phonetics in the Underlying Speech Attributes for Deep and Interpretable Speaker Recognition | [![GitHub](https://img.shields.io/github/stars/LIAvignon/BA-LR?style=flat)](https://github.com/LIAvignon/BA-LR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/benamor23_interspeech.pdf) | -| 1214 | Range-based Equal Error Rate for Spoof Localization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17739-b31b1b.svg)](https://arxiv.org/abs/2305.17739) | -| 1888 | Exploring the English Accent-Independent Features for Speech Emotion Recognition using Filter and Wrapper-based Methods for Feature Selection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tabassum23_interspeech.pdf) | -| 205 | Powerset Multi-Class Cross Entropy Loss for Neural Speaker Diarization | [![GitHub](https://img.shields.io/github/stars/FrenchKrab/IS2023-powerset-diarization?style=flat)](https://github.com/FrenchKrab/IS2023-powerset-diarization) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/plaquet23_interspeech.pdf) | -| 394 | A Method of Audio-Visual Person Verification by Mining Connections between Time Series | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sun23_interspeech.pdf) | -| 605 | One-Step Knowledge Distillation and Fine-Tuning in using Large Pre-trained Self-Supervised Learning Models for Speaker Verification | [![GitHub](https://img.shields.io/github/stars/jungwoo4021/OS-KDFT?style=flat)](https://github.com/jungwoo4021/OS-KDFT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/heo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17394-b31b1b.svg)](https://arxiv.org/abs/2305.17394) | -| 409 | Defense Against Adversarial Attacks on Audio DeepFake Detection | [![GitHub](https://img.shields.io/github/stars/piotrkawa/audio-deepfake-adversarial-attacks?style=flat)](https://github.com/piotrkawa/audio-deepfake-adversarial-attacks) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kawa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.14597-b31b1b.svg)](https://arxiv.org/abs/2212.14597) | -| 1820 | A Conformer-based Classifier for Variable-Length Utterance Processing in Anti-Spoofing | [![GitHub](https://img.shields.io/github/stars/ErosRos/conformer-based-classifier-for-anti-spoofing?style=flat)](https://github.com/ErosRos/conformer-based-classifier-for-anti-spoofing) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rosello23_interspeech.pdf) | -| 1557 | Conformer-based Language Embedding with Self-Knowledge Distillation for Spoken Language Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ia_interspeech.pdf) | -| 2419 | CommonAccent: Exploring Large Acoustic Pre-trained Models for Accent Classification based on Common Voice | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zuluagagomez23_interspeech.pdf)
[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/371136510_CommonAccent_Exploring_Large_Acoustic_Pretrained_Models_for_Accent_Classification_Based_on_Common_Voice) | -| 266 | From Adaptive Score Normalization to Adaptive Data Normalization for Speaker Verification Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cumani23_interspeech.pdf) | -| 1513 | CAM++: A Fast and Efficient Network for Speaker Verification using Context-aware Masking | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/3D-Speaker?style=flat)](https://github.com/alibaba-damo-academy/3D-Speaker) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ha_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00332-b31b1b.svg)](https://arxiv.org/abs/2303.00332) | -| 1928 | North Sámi Dialect Identification with Self-Supervised Speech Models | [![GitHub](https://img.shields.io/github/stars/skakouros/sami_dialects?style=flat)](https://github.com/skakouros/sami_dialects) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kakouros23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11864-b31b1b.svg)](https://arxiv.org/abs/2305.11864) | -| 2289 | Encoder-Decoder Multimodal Speaker Change Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jung23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00680-b31b1b.svg)](https://arxiv.org/abs/2306.00680) | -| 1603 | Disentangled Representation Learning for Multilingual Speaker Recognition | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://mm.kaist.ac.kr/projects/voxceleb1-b/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nam23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00437-b31b1b.svg)](https://arxiv.org/abs/2211.00437) | -| 2310 | A Compact End-to-End Model with Local and Global Context for Spoken Language Identification | [![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jia23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15781-b31b1b.svg)](https://arxiv.org/abs/2210.15781) | -| 1005 | On the Robustness of Arabic Speech Dialect Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sullivan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03789-b31b1b.svg)](https://arxiv.org/abs/2306.03789) | -| 927 | Adaptive Neural Network Quantization for Lightweight Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23u_interspeech.pdf) | -| 1205 | Adversarial Diffusion Probability Model For Cross-Domain Speaker Verification Integrating Contrastive Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/su23_interspeech.pdf) | -| 1554 | Chinese Dialect Recognition based on Transfer Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jiang23f_interspeech.pdf) | -| 270 | Spoofing Attacker also Benefits from Self-Supervised Pretrained Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ito23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15518-b31b1b.svg)](https://arxiv.org/abs/2305.15518) | -| 854 | Label aware Speech Representation Learning for Language Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/vashishth23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04374-b31b1b.svg)](https://arxiv.org/abs/2306.04374) | -| 1761 | Exploring the Impact of Back-end Network on Wav2vec 2.0 for Dialect Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/luo23c_interspeech.pdf) | -| 453 | Improving Speaker Verification with Self-pretrained Transformer Models | [![GitHub](https://img.shields.io/github/stars/JunyiPeng00/Interspeech23_SelfPretraining?style=flat)](https://github.com/JunyiPeng00/Interspeech23_SelfPretraining) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10517-b31b1b.svg)](https://arxiv.org/abs/2305.10517) | -| 372 | Handling the Alignment for Wake Word Detection: A Comparison Between Alignment-based, Alignment-Free and Hybrid Approaches | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ribeiro23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.08950-b31b1b.svg)](https://arxiv.org/abs/2302.08950) | +| 1989 | Vietnam-Celeb: A Large-Scale Dataset for Vietnamese Speaker Recognition | [![GitHub](https://img.shields.io/github/stars/thanhpv2102/Vietnam-Celeb.Interspeech?style=flat)](https://github.com/thanhpv2102/Vietnam-Celeb.Interspeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pham23b_interspeech.pdf) | +| 2254 | What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://is23-2254.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.06524-b31b1b.svg)](https://arxiv.org/abs/2306.06524) | +| 241 | The 2022 NIST Language Recognition Evaluation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14624-b31b1b.svg)](https://arxiv.org/abs/2302.14624) | +| 155 | Description and Analysis of the KPT system for NIST Language Recognition Evaluation 2022 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sarni23_interspeech.pdf) | +| 1725 | ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention | [![GitHub](https://img.shields.io/github/stars/Yip-Jia-Qi/ACA-Net?style=flat)](https://github.com/Yip-Jia-Qi/ACA-Net) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yip23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12121-b31b1b.svg)](https://arxiv.org/abs/2305.12121) | +| 402 | Branch-ECAPA-TDNN: A Parallel Branch Architecture to Capture Local and Global Features for Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yao23_interspeech.pdf) | +| 2052 | Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/singh23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07501-b31b1b.svg)](https://arxiv.org/abs/2306.07501)| +| 2569 | Wavelet Scattering Transform for Improving Generalization in Low-Resourced Spoken Language Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/dey23_interspeech.pdf) | +| 1407 | A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-trained General Purpose Speech Model | [![GitHub](https://img.shields.io/github/stars/Srijith-rkr/KAUST-Whisper-Adapter?style=flat)](https://github.com/Srijith-rkr/KAUST-Whisper-Adapter) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/radhakrishnan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11244-b31b1b.svg)](https://arxiv.org/abs/2305.11244)| +| 2272 | HABLA: A Dataset of Latin American Spanish Accents for Voice Anti-Spoofing | [![Zenodo](https://img.shields.io/badge/Zenodo-dataset-FFD1BF.svg)](https://zenodo.org/record/7370805) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tamayoflorez23_interspeech.pdf) | +| 1702 | Self-Supervised Learning Representation based Accent Recognition with Persistent Accent Memory | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23aa_interspeech.pdf) | +| 800 | Extremely Low Bit Quantization for Mobile Speaker Verification Systems Under 1MB Memory | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23g_interspeech.pdf) | +| 1974 | Unsupervised Out-of-Distribution Dialect Detection with Mahalanobis Distance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/das23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.10326-b31b1b.svg)](https://arxiv.org/abs/2302.10326) | +| 105 | Pyannote.Audio 2.1 Speaker Diarization Pipeline: Principle, Benchmark and Recipe | [![GitHub](https://img.shields.io/github/stars/pyannote/pyannote-audio?style=flat)](https://github.com/pyannote/pyannote-audio) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bredin23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://huggingface.co/bhuvanesh25/pyannote-diar-copy/resolve/main/technical_report_2.1.pdf) | +| 1524 | Model Compression for DNN-based Speaker Verification using Weight Quantization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.17326-b31b1b.svg)](https://arxiv.org/abs/2210.17326) | +| 1354 | Multi-Resolution Approach to Identification of Spoken Languages and to Improve Overall Language Diarization System using Whisper Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vachhani23_interspeech.pdf) | +| 125 | Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zeng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10940-b31b1b.svg)](https://arxiv.org/abs/2305.10940) | +| 849 | Dynamic Fully-Connected Layer for Large-Scale Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/song23b_interspeech.pdf) | +| 844 | Reversible Neural Networks for Memory-Efficient Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23i_interspeech.pdf) | +| 777 | ECAPA++: Fine-grained Deep Embedding Learning for TDNN based Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23f_interspeech.pdf) | +| 1206 | TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13701-b31b1b.svg)](https://arxiv.org/abs/2305.13701) | +| 100 | Fooling Speaker Identification Systems with Adversarial Background Music | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zuo23_interspeech.pdf) | +| 1314 | Mutual Information-based Embedding Decoupling for Generalizable Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23r_interspeech.pdf) | +| 574 | Target Active Speaker Detection with Audio-Visual Cues | [![GitHub](https://img.shields.io/github/stars/Jiang-Yidi/TS-TalkNet?style=flat)](https://github.com/Jiang-Yidi/TS-TalkNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jiang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12831-b31b1b.svg)](https://arxiv.org/abs/2305.12831) | +| 2401 | Improving End-to-End Neural Diarization using Conversational Summary Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/broughton23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.13863-b31b1b.svg)](https://arxiv.org/abs/2306.13863) | +| 2039 | Phase Perturbation Improves Channel Robustness for Speech Spoofing Countermeasures | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yongyi.dev/phase-antispoofing/)
[![GitHub](https://img.shields.io/github/stars/yongyizang/PhaseAntispoofing_INTERSPEECH?style=flat)](https://github.com/yongyizang/PhaseAntispoofing_INTERSPEECH) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03389-b31b1b.svg)](https://arxiv.org/abs/2306.03389) | +| 210 | Improving Training Datasets for Resource-constrained Speaker Recognition Neural Networks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bousquet23_interspeech.pdf) | +| 1498 | Instance-based Temporal Normalization for Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lertpetchpun23_interspeech.pdf) | +| 881 | On the Robustness of Wav2Vec 2.0 based Speaker Recognition Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/novoselov23_interspeech.pdf) | +| 697 | P-Vectors: A Parallel-coupled TDNN/Transformer Network for Speaker Verification | [![GitHub](https://img.shields.io/github/stars/xyw7/pvector?style=flat)](https://github.com/xyw7/pvector) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14778-b31b1b.svg)](https://arxiv.org/abs/2305.14778) | +| 1249 | Group GMM-ResNet for Detection of Synthetic Speech Attacks | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lei23_interspeech.pdf) | +| 452 | Robust Training for Speaker Verification against Noisy Labels | [![GitHub](https://img.shields.io/github/stars/PunkMale/OR-Gate?style=flat)](https://github.com/PunkMale/OR-Gate) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.12080-b31b1b.svg)](https://arxiv.org/abs/2211.12080) | +| 1404 | Self-Distillation into Self-Attention Heads for Improving Transformer-based End-to-End Neural Speaker Diarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jeoung23_interspeech.pdf) | +| 1217 | Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022 | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://www.cnceleb.org/competition) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00815-b31b1b.svg)](https://arxiv.org/abs/2211.00815) | +| 1648 | Describing the Phonetics in the Underlying Speech Attributes for Deep and Interpretable Speaker Recognition | [![GitHub](https://img.shields.io/github/stars/LIAvignon/BA-LR?style=flat)](https://github.com/LIAvignon/BA-LR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/benamor23_interspeech.pdf) | +| 1214 | Range-based Equal Error Rate for Spoof Localization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17739-b31b1b.svg)](https://arxiv.org/abs/2305.17739) | +| 1888 | Exploring the English Accent-Independent Features for Speech Emotion Recognition using Filter and Wrapper-based Methods for Feature Selection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tabassum23_interspeech.pdf) | +| 205 | Powerset Multi-Class Cross Entropy Loss for Neural Speaker Diarization | [![GitHub](https://img.shields.io/github/stars/FrenchKrab/IS2023-powerset-diarization?style=flat)](https://github.com/FrenchKrab/IS2023-powerset-diarization) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/plaquet23_interspeech.pdf) | +| 394 | A Method of Audio-Visual Person Verification by Mining Connections between Time Series | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sun23_interspeech.pdf) | +| 605 | One-Step Knowledge Distillation and Fine-Tuning in using Large Pre-trained Self-Supervised Learning Models for Speaker Verification | [![GitHub](https://img.shields.io/github/stars/jungwoo4021/OS-KDFT?style=flat)](https://github.com/jungwoo4021/OS-KDFT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/heo23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17394-b31b1b.svg)](https://arxiv.org/abs/2305.17394) | +| 409 | Defense Against Adversarial Attacks on Audio DeepFake Detection | [![GitHub](https://img.shields.io/github/stars/piotrkawa/audio-deepfake-adversarial-attacks?style=flat)](https://github.com/piotrkawa/audio-deepfake-adversarial-attacks) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kawa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.14597-b31b1b.svg)](https://arxiv.org/abs/2212.14597) | +| 1820 | A Conformer-based Classifier for Variable-Length Utterance Processing in Anti-Spoofing | [![GitHub](https://img.shields.io/github/stars/ErosRos/conformer-based-classifier-for-anti-spoofing?style=flat)](https://github.com/ErosRos/conformer-based-classifier-for-anti-spoofing) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rosello23_interspeech.pdf) | +| 1557 | Conformer-based Language Embedding with Self-Knowledge Distillation for Spoken Language Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ia_interspeech.pdf) | +| 2419 | CommonAccent: Exploring Large Acoustic Pre-trained Models for Accent Classification based on Common Voice | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zuluagagomez23_interspeech.pdf)
[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/371136510_CommonAccent_Exploring_Large_Acoustic_Pretrained_Models_for_Accent_Classification_Based_on_Common_Voice) | +| 266 | From Adaptive Score Normalization to Adaptive Data Normalization for Speaker Verification Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cumani23_interspeech.pdf) | +| 1513 | CAM++: A Fast and Efficient Network for Speaker Verification using Context-aware Masking | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/3D-Speaker?style=flat)](https://github.com/alibaba-damo-academy/3D-Speaker) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ha_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00332-b31b1b.svg)](https://arxiv.org/abs/2303.00332) | +| 1928 | North Sámi Dialect Identification with Self-Supervised Speech Models | [![GitHub](https://img.shields.io/github/stars/skakouros/sami_dialects?style=flat)](https://github.com/skakouros/sami_dialects) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kakouros23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11864-b31b1b.svg)](https://arxiv.org/abs/2305.11864) | +| 2289 | Encoder-Decoder Multimodal Speaker Change Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jung23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00680-b31b1b.svg)](https://arxiv.org/abs/2306.00680) | +| 1603 | Disentangled Representation Learning for Multilingual Speaker Recognition | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://mm.kaist.ac.kr/projects/voxceleb1-b/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nam23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00437-b31b1b.svg)](https://arxiv.org/abs/2211.00437) | +| 2310 | A Compact End-to-End Model with Local and Global Context for Spoken Language Identification | [![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jia23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15781-b31b1b.svg)](https://arxiv.org/abs/2210.15781) | +| 1005 | On the Robustness of Arabic Speech Dialect Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sullivan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03789-b31b1b.svg)](https://arxiv.org/abs/2306.03789) | +| 927 | Adaptive Neural Network Quantization for Lightweight Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23u_interspeech.pdf) | +| 1205 | Adversarial Diffusion Probability Model For Cross-Domain Speaker Verification Integrating Contrastive Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/su23_interspeech.pdf) | +| 1554 | Chinese Dialect Recognition based on Transfer Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jiang23f_interspeech.pdf) | +| 270 | Spoofing Attacker also Benefits from Self-Supervised Pretrained Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ito23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15518-b31b1b.svg)](https://arxiv.org/abs/2305.15518) | +| 854 | Label aware Speech Representation Learning for Language Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/vashishth23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04374-b31b1b.svg)](https://arxiv.org/abs/2306.04374) | +| 1761 | Exploring the Impact of Back-end Network on Wav2vec 2.0 for Dialect Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/luo23c_interspeech.pdf) | +| 453 | Improving Speaker Verification with Self-pretrained Transformer Models | [![GitHub](https://img.shields.io/github/stars/JunyiPeng00/Interspeech23_SelfPretraining?style=flat)](https://github.com/JunyiPeng00/Interspeech23_SelfPretraining) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/peng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10517-b31b1b.svg)](https://arxiv.org/abs/2305.10517) | +| 372 | Handling the Alignment for Wake Word Detection: A Comparison Between Alignment-based, Alignment-Free and Hybrid Approaches | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ribeiro23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.08950-b31b1b.svg)](https://arxiv.org/abs/2302.08950) |
@@ -1351,23 +1351,23 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2336 | Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yoon23d_interspeech.pdf) | -| 160 | Streaming Parrotron for On-Device Speech-to-Speech Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rybakov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.13761-b31b1b.svg)](https://arxiv.org/abs/2210.13761) | -| 2407 | Exploiting Emotion Information in Speaker Embeddings for Expressive Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://controllable-tts.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shaheen23_interspeech.pdf) | -| 2518 | E2E-S2S-VC: End-to-End Sequence-to-Sequence Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ast-astrec.nict.go.jp/demo_samples/e2e-s2s-vc/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/okamoto23b_interspeech.pdf) | -| 2403 | DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer | [![GitHub](https://img.shields.io/github/stars/lakahaga/dc-comix-tts?style=flat)](https://github.com/lakahaga/dc-comix-tts) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/choi23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19567-b31b1b.svg)](https://arxiv.org/abs/2305.19567) | -| 419 | Voice Conversion with Just Nearest Neighbors | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bshall.github.io/knn-vc/)
[![GitHub](https://img.shields.io/github/stars/bshall/knn-vc?style=flat)](https://github.com/bshall/knn-vc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/baas23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18975-b31b1b.svg)](https://arxiv.org/abs/2305.18975) | -| 1193 | CFVC: Conditional Filtering for Controllable Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](http://www.kecl.ntt.co.jp/people/tanaka.ko/projects/cfvc/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tanaka23_interspeech.pdf) | -| 1157 | DualVC: Dual-mode Voice Conversion using Intra-Model Knowledge Distillation and Hybrid Predictive Coding | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://dualvc.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ning23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12425-b31b1b.svg)](https://arxiv.org/abs/2305.12425) | -| 39 | Attention-based Interactive Disentangling Network for Instance-Level Emotional Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ainn-evc.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23_interspeech.pdf) | -| 836 | ALO-VC: Any-to-Any Low-Latency One-Shot Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bohan7.github.io/ALO-VC-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01100-b31b1b.svg)](https://arxiv.org/abs/2306.01100) | -| 1978 | Evaluating and Reducing the Distance between Synthetic and Real Speech Distributions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/minixhofer23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.16049-b31b1b.svg)](https://arxiv.org/abs/2211.16049) | -| 2202 | Decoupling Segmental and Prosodic cues of Non-Native Speech through Vector Quantization | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://anonymousis23.github.io/demos/prosody-accent-conversion/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/quamer23_interspeech.pdf) | -| 2383 | VC-T: Streaming Voice Conversion based on Neural Transducer | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ntt-hilab-gensp.github.io/is2023vct/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kanagawa23_interspeech.pdf) | -| 191 | Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion Preserving Voice Conversion | [![GitHub](https://img.shields.io/github/stars/suhitaghosh10/emo-stargan?style=flat)](https://github.com/suhitaghosh10/emo-stargan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ghosh23_interspeech.pdf) | -| 1788 | ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://melissachen15.notion.site/melissachen15/ControlVC-Audio-Demo-dd0ea58c5b7f434a81af9cbcd67f56f6) [![GitHub](https://img.shields.io/github/stars/MelissaChen15/control-vc?style=flat)](https://github.com/MelissaChen15/control-vc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23r_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2209.11866-b31b1b.svg)](https://arxiv.org/abs/2209.11866) | -| 1356 | Reverberation-Controllable Voice Conversion using Reverberation Time Estimator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/choi23e_interspeech.pdf) | -| 2558 | Cross-Utterance Conditioned Coherent Speech Editing | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://speechediting-8gdxbpso7cc72014-1307012619.tcloudbaseapp.com/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yu23d_interspeech.pdf) | +| 2336 | Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yoon23d_interspeech.pdf) | +| 160 | Streaming Parrotron for On-Device Speech-to-Speech Conversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rybakov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.13761-b31b1b.svg)](https://arxiv.org/abs/2210.13761) | +| 2407 | Exploiting Emotion Information in Speaker Embeddings for Expressive Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://controllable-tts.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shaheen23_interspeech.pdf) | +| 2518 | E2E-S2S-VC: End-to-End Sequence-to-Sequence Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ast-astrec.nict.go.jp/demo_samples/e2e-s2s-vc/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/okamoto23b_interspeech.pdf) | +| 2403 | DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer | [![GitHub](https://img.shields.io/github/stars/lakahaga/dc-comix-tts?style=flat)](https://github.com/lakahaga/dc-comix-tts) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/choi23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19567-b31b1b.svg)](https://arxiv.org/abs/2305.19567) | +| 419 | Voice Conversion with Just Nearest Neighbors | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bshall.github.io/knn-vc/)
[![GitHub](https://img.shields.io/github/stars/bshall/knn-vc?style=flat)](https://github.com/bshall/knn-vc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/baas23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18975-b31b1b.svg)](https://arxiv.org/abs/2305.18975) | +| 1193 | CFVC: Conditional Filtering for Controllable Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](http://www.kecl.ntt.co.jp/people/tanaka.ko/projects/cfvc/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tanaka23_interspeech.pdf) | +| 1157 | DualVC: Dual-mode Voice Conversion using Intra-Model Knowledge Distillation and Hybrid Predictive Coding | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://dualvc.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ning23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12425-b31b1b.svg)](https://arxiv.org/abs/2305.12425) | +| 39 | Attention-based Interactive Disentangling Network for Instance-Level Emotional Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ainn-evc.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23_interspeech.pdf) | +| 836 | ALO-VC: Any-to-Any Low-Latency One-Shot Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bohan7.github.io/ALO-VC-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01100-b31b1b.svg)](https://arxiv.org/abs/2306.01100) | +| 1978 | Evaluating and Reducing the Distance between Synthetic and Real Speech Distributions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/minixhofer23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.16049-b31b1b.svg)](https://arxiv.org/abs/2211.16049) | +| 2202 | Decoupling Segmental and Prosodic cues of Non-Native Speech through Vector Quantization | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://anonymousis23.github.io/demos/prosody-accent-conversion/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/quamer23_interspeech.pdf) | +| 2383 | VC-T: Streaming Voice Conversion based on Neural Transducer | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ntt-hilab-gensp.github.io/is2023vct/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kanagawa23_interspeech.pdf) | +| 191 | Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion Preserving Voice Conversion | [![GitHub](https://img.shields.io/github/stars/suhitaghosh10/emo-stargan?style=flat)](https://github.com/suhitaghosh10/emo-stargan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ghosh23_interspeech.pdf) | +| 1788 | ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://melissachen15.notion.site/melissachen15/ControlVC-Audio-Demo-dd0ea58c5b7f434a81af9cbcd67f56f6) [![GitHub](https://img.shields.io/github/stars/MelissaChen15/control-vc?style=flat)](https://github.com/MelissaChen15/control-vc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23r_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2209.11866-b31b1b.svg)](https://arxiv.org/abs/2209.11866) | +| 1356 | Reverberation-Controllable Voice Conversion using Reverberation Time Estimator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/choi23e_interspeech.pdf) | +| 2558 | Cross-Utterance Conditioned Coherent Speech Editing | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://speechediting-8gdxbpso7cc72014-1307012619.tcloudbaseapp.com/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yu23d_interspeech.pdf) |
@@ -1379,35 +1379,35 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2287 | An Automatic Multimodal Approach to Analyze Linguistic and Acoustic Cues on Parkinson's Disease Patients | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/escobargrisales23_interspeech.pdf) | -| 1332 | Personalization for Robust Voice Pathology Detection in Sound Waves | [![GitHub](https://img.shields.io/github/stars/Fsoft-AIC/RoPADet?style=flat)](https://github.com/Fsoft-AIC/RoPADet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tran23_interspeech.pdf) | -| 2249 | Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/meng23d_interspeech.pdf) | -| 1990 | Capturing Mismatch between Textual and Acoustic Emotion Expressions for Mood Identification in Bipolar Disorder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/niu23b_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://emp.engin.umich.edu/wp-content/uploads/sites/67/2023/06/Capturing_Mismatch_between_Textual_and_Acoustic_Emotion_Expressions_for_Mood_Identification_in_Bipolar_Disorder-3.pdf) | -| 296 | FTA-Net: A Frequency and Time Attention Network for Speech Depression Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23d_interspeech.pdf) | -| 1709 | Bayesian Networks for the Robust and Unbiased Prediction of Depression and its Symptoms Utilizing Speech and Multimodal Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fara23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://readpaper.com/paper/4770892998779076609) | -| 1263 | Hyper-Parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23y_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15265-b31b1b.svg)](https://arxiv.org/abs/2306.15265) | -| 1721 | Classifying Depression Symptom Severity: Assessment of Speech Representations in Personalized and Generalized Machine Learning Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/campbell23_interspeech.pdf) | -| 1946 | Active Learning for Abnormal Lung Sound Data Curation and Detection in Asthma | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ghaffarzadegan23_interspeech.pdf) | -| 2079 | Automatic Assessment of Alzheimer's across Three Languages using Speech and Language Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pereztoro23_interspeech.pdf) | -| 301 | On-the-Fly Feature based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition | [![GitHub](https://img.shields.io/github/stars/timspeech/on_the_fly_adapt?style=flat)](https://github.com/timspeech/on_the_fly_adapt) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/geng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.14593-b31b1b.svg)](https://arxiv.org/abs/2203.14593) | -| 1722 | Relationship between LTAS-based Spectral Moments and Acoustic Parameters of Hypokinetic Dysarthria in Parkinson's Disease | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/svihlik23_interspeech.pdf) | -| 963 | Respiratory Distress Estimation in Human-Robot Interaction Scenario | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/alvarado23_interspeech.pdf) | -| 1771 | Prediction of the Gender-based Violence Victim Condition using Speech: What do Machine Learning Models rely on? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/reynerfuentes23_interspeech.pdf) | -| 1916 | Whisper Encoder features for Infant Cry Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/charola23_interspeech.pdf) | -| 1997 | Classifying Dementia in the Presence of Depression: A Cross-Corpus Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/braun23_interspeech.pdf) | -| 297 | Exploiting Cross-Domain and Cross-Lingual Ultrasound Tongue Imaging Features for Elderly and Dysarthric Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2206.07327-b31b1b.svg)](https://arxiv.org/abs/2206.07327) | -| 464 | Multi-Class Detection of Pathological Speech with Latent Features: How does It Perform on Unseen Data? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wagner23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15336-b31b1b.svg)](https://arxiv.org/abs/2210.15336) | -| 2002 | Responsiveness, Sensitivity and Clinical Utility of Timing-Related Speech Biomarkers for Remote Monitoring of ALS Disease Progression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kothare23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1-W1buG48sqQnd9uld2c-z-Ls0NSS-bNn/view) | -| 322 | Use of Speech Impairment Severity for Dysarthric Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/geng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10659-b31b1b.svg)](https://arxiv.org/abs/2305.10659) | -| 721 | MMLung: Moving Closer to Practical Lung Health Estimation using Smartphones | [![GitHub](https://img.shields.io/github/stars/MohammedMosuily/mmlung?style=flat)](https://github.com/MohammedMosuily/mmlung) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mosuily23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://mobiuk.org/2023/abstract/S5_P1_Mosuily_MMLung.pdf) | -| 913 | Investigating the Utility of Synthetic Data for Doctor-Patient Conversation Summarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23i_interspeech.pdf) | -| 2101 | Non-Uniform Speaker Disentanglement for Depression Detection from Raw Speech Signals | [![GitHub](https://img.shields.io/github/stars/kingformatty/NUSD?style=flat)](https://github.com/kingformatty/NUSD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23pa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01861-b31b1b.svg)](https://arxiv.org/abs/2306.01861) | -| 753 | PoCaPNet: A Novel Approach for Surgical Phase Recognition using Speech and X-Ray Images | [![GitHub](https://img.shields.io/github/stars/kubicndmr/PoCaPNet?style=flat)](https://github.com/kubicndmr/PoCaPNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/demir23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15993-b31b1b.svg)](https://arxiv.org/abs/2305.15993) | -| 2100 | Combining Multiple Multimodal Speech Features into an Interpretable Index Score for Capturing Disease Progression in Amyotrophic Lateral Sclerosis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/neumann23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1FfcQifvTL9bTD7SBU7y_A3APgX8N_Vd0/view) | -| 1438 | The MASCFLICHT Corpus: Face Mask Type and Coverage Area Recognition from Speech | [![Zenodo](https://img.shields.io/badge/Zenodo-dataset-FFD1BF.svg)](https://zenodo.org/record/7985457) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mallolragolta23_interspeech.pdf) | -| 1435 | Towards Reference Speech Characterization for Health Applications | [![GitHub](https://img.shields.io/github/stars/mcatarinatb/reference-speech-characterization?style=flat)](https://github.com/mcatarinatb/reference-speech-characterization) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/botelho23_interspeech.pdf) | -| 2146 | Automatic Classification of Hypokinetic and Hyperkinetic Dysarthria based on GMM-Supervectors | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/riosurrego23_interspeech.pdf) | -| 947 | Towards Robust Paralinguistic Assessment for Real-World Mobile Health (mHealth) Monitoring: an Initial Study of Reverberation Effects on Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/dineley23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12514-b31b1b.svg)](https://arxiv.org/abs/2305.12514) | +| 2287 | An Automatic Multimodal Approach to Analyze Linguistic and Acoustic Cues on Parkinson's Disease Patients | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/escobargrisales23_interspeech.pdf) | +| 1332 | Personalization for Robust Voice Pathology Detection in Sound Waves | [![GitHub](https://img.shields.io/github/stars/Fsoft-AIC/RoPADet?style=flat)](https://github.com/Fsoft-AIC/RoPADet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tran23_interspeech.pdf) | +| 2249 | Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/meng23d_interspeech.pdf) | +| 1990 | Capturing Mismatch between Textual and Acoustic Emotion Expressions for Mood Identification in Bipolar Disorder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/niu23b_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://emp.engin.umich.edu/wp-content/uploads/sites/67/2023/06/Capturing_Mismatch_between_Textual_and_Acoustic_Emotion_Expressions_for_Mood_Identification_in_Bipolar_Disorder-3.pdf) | +| 296 | FTA-Net: A Frequency and Time Attention Network for Speech Depression Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23d_interspeech.pdf) | +| 1709 | Bayesian Networks for the Robust and Unbiased Prediction of Depression and its Symptoms Utilizing Speech and Multimodal Data | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fara23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://readpaper.com/paper/4770892998779076609) | +| 1263 | Hyper-Parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23y_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15265-b31b1b.svg)](https://arxiv.org/abs/2306.15265) | +| 1721 | Classifying Depression Symptom Severity: Assessment of Speech Representations in Personalized and Generalized Machine Learning Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/campbell23_interspeech.pdf) | +| 1946 | Active Learning for Abnormal Lung Sound Data Curation and Detection in Asthma | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ghaffarzadegan23_interspeech.pdf) | +| 2079 | Automatic Assessment of Alzheimer's across Three Languages using Speech and Language Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pereztoro23_interspeech.pdf) | +| 301 | On-the-Fly Feature based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition | [![GitHub](https://img.shields.io/github/stars/timspeech/on_the_fly_adapt?style=flat)](https://github.com/timspeech/on_the_fly_adapt) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/geng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.14593-b31b1b.svg)](https://arxiv.org/abs/2203.14593) | +| 1722 | Relationship between LTAS-based Spectral Moments and Acoustic Parameters of Hypokinetic Dysarthria in Parkinson's Disease | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/svihlik23_interspeech.pdf) | +| 963 | Respiratory Distress Estimation in Human-Robot Interaction Scenario | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/alvarado23_interspeech.pdf) | +| 1771 | Prediction of the Gender-based Violence Victim Condition using Speech: What do Machine Learning Models rely on? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/reynerfuentes23_interspeech.pdf) | +| 1916 | Whisper Encoder features for Infant Cry Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/charola23_interspeech.pdf) | +| 1997 | Classifying Dementia in the Presence of Depression: A Cross-Corpus Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/braun23_interspeech.pdf) | +| 297 | Exploiting Cross-Domain and Cross-Lingual Ultrasound Tongue Imaging Features for Elderly and Dysarthric Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hu23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2206.07327-b31b1b.svg)](https://arxiv.org/abs/2206.07327) | +| 464 | Multi-Class Detection of Pathological Speech with Latent Features: How does It Perform on Unseen Data? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wagner23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15336-b31b1b.svg)](https://arxiv.org/abs/2210.15336) | +| 2002 | Responsiveness, Sensitivity and Clinical Utility of Timing-Related Speech Biomarkers for Remote Monitoring of ALS Disease Progression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kothare23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1-W1buG48sqQnd9uld2c-z-Ls0NSS-bNn/view) | +| 322 | Use of Speech Impairment Severity for Dysarthric Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/geng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10659-b31b1b.svg)](https://arxiv.org/abs/2305.10659) | +| 721 | MMLung: Moving Closer to Practical Lung Health Estimation using Smartphones | [![GitHub](https://img.shields.io/github/stars/MohammedMosuily/mmlung?style=flat)](https://github.com/MohammedMosuily/mmlung) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mosuily23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://mobiuk.org/2023/abstract/S5_P1_Mosuily_MMLung.pdf) | +| 913 | Investigating the Utility of Synthetic Data for Doctor-Patient Conversation Summarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23i_interspeech.pdf) | +| 2101 | Non-Uniform Speaker Disentanglement for Depression Detection from Raw Speech Signals | [![GitHub](https://img.shields.io/github/stars/kingformatty/NUSD?style=flat)](https://github.com/kingformatty/NUSD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23pa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01861-b31b1b.svg)](https://arxiv.org/abs/2306.01861) | +| 753 | PoCaPNet: A Novel Approach for Surgical Phase Recognition using Speech and X-Ray Images | [![GitHub](https://img.shields.io/github/stars/kubicndmr/PoCaPNet?style=flat)](https://github.com/kubicndmr/PoCaPNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/demir23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15993-b31b1b.svg)](https://arxiv.org/abs/2305.15993) | +| 2100 | Combining Multiple Multimodal Speech Features into an Interpretable Index Score for Capturing Disease Progression in Amyotrophic Lateral Sclerosis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/neumann23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1FfcQifvTL9bTD7SBU7y_A3APgX8N_Vd0/view) | +| 1438 | The MASCFLICHT Corpus: Face Mask Type and Coverage Area Recognition from Speech | [![Zenodo](https://img.shields.io/badge/Zenodo-dataset-FFD1BF.svg)](https://zenodo.org/record/7985457) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mallolragolta23_interspeech.pdf) | +| 1435 | Towards Reference Speech Characterization for Health Applications | [![GitHub](https://img.shields.io/github/stars/mcatarinatb/reference-speech-characterization?style=flat)](https://github.com/mcatarinatb/reference-speech-characterization) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/botelho23_interspeech.pdf) | +| 2146 | Automatic Classification of Hypokinetic and Hyperkinetic Dysarthria based on GMM-Supervectors | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/riosurrego23_interspeech.pdf) | +| 947 | Towards Robust Paralinguistic Assessment for Real-World Mobile Health (mHealth) Monitoring: an Initial Study of Reverberation Effects on Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/dineley23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12514-b31b1b.svg)](https://arxiv.org/abs/2305.12514) |
@@ -1419,12 +1419,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2228 | Conmer: Streaming Conformer without Self-Attention for Interactive Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/radfar23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/conmer-streaming-conformer-without-self-attention-for-interactive-voice-assistants) | -| 1255 | Intra-Ensemble: A New Method for Combining Intermediate Outputs in Transformer-based Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23e_interspeech.pdf) | -| 1194 | A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet) [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11073-b31b1b.svg)](https://arxiv.org/abs/2305.11073) | -| 1611 | HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mai23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18281-b31b1b.svg)](https://arxiv.org/abs/2305.18281) | -| 893 | Memory-Augmented Conformer for Improved End-To-End Long-form ASR | [![GitHub](https://img.shields.io/github/stars/Miamoto/Conformer-NTM?style=flat)](https://github.com/Miamoto/Conformer-NTM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/carvalho23_interspeech.pdf) | -| 552 | Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cui23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.13307-b31b1b.svg)](https://arxiv.org/abs/2306.13307) | +| 2228 | Conmer: Streaming Conformer without Self-Attention for Interactive Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/radfar23_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/conmer-streaming-conformer-without-self-attention-for-interactive-voice-assistants) | +| 1255 | Intra-Ensemble: A New Method for Combining Intermediate Outputs in Transformer-based Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23e_interspeech.pdf) | +| 1194 | A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks | [![GitHub](https://img.shields.io/github/stars/espnet/espnet?style=flat)](https://github.com/espnet/espnet) [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/FunASR?style=flat)](https://github.com/alibaba-damo-academy/FunASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/peng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11073-b31b1b.svg)](https://arxiv.org/abs/2305.11073) | +| 1611 | HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mai23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18281-b31b1b.svg)](https://arxiv.org/abs/2305.18281) | +| 893 | Memory-Augmented Conformer for Improved End-To-End Long-form ASR | [![GitHub](https://img.shields.io/github/stars/Miamoto/Conformer-NTM?style=flat)](https://github.com/Miamoto/Conformer-NTM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/carvalho23_interspeech.pdf) | +| 552 | Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cui23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.13307-b31b1b.svg)](https://arxiv.org/abs/2306.13307) |
@@ -1436,16 +1436,16 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1294 | An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/3D-Speaker?style=flat)](https://github.com/alibaba-damo-academy/3D-Speaker) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12838-b31b1b.svg)](https://arxiv.org/abs/2305.12838) | -| 1286 | A Study on Visualization of Voiceprint Feature | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23x_interspeech.pdf) | -| 1083 | VoxTube: A Multilingual Speaker Recognition Dataset | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://idrnd.github.io/VoxTube/)
[![GitHub](https://img.shields.io/github/stars/IDRnD/VoxTube?style=flat)](https://github.com/IDRnD/VoxTube) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yakovlev23_interspeech.pdf) | -| 1298 | Visualizing Data Augmentation in Deep Speaker Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16070-b31b1b.svg)](https://arxiv.org/abs/2305.16070) | -| 1565 | Ordered and Binary Speaker Embedding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ja_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16043-b31b1b.svg)](https://arxiv.org/abs/2305.16043) | -| 2031 | Self-FiLM: Conditioning GANs with Self-Supervised Representations for Bandwidth Extension based Speaker Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kataria23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.03657-b31b1b.svg)](https://arxiv.org/abs/2303.03657) | -| 1202 | Curriculum Learning for Self-Supervised Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/heo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.14525-b31b1b.svg)](https://arxiv.org/abs/2203.14525) | -| 1558 | Introducing Self-Supervised Phonetic Information for Text-Independent Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23aa_interspeech.pdf) | -| 1379 | A Teacher-Student Approach for Extracting Informative Speaker Embeddings from Speech Mixtures | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cordlandwehr23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00634-b31b1b.svg)](https://arxiv.org/abs/2306.00634) | -| 1479 | Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lepage23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03664-b31b1b.svg)](https://arxiv.org/abs/2306.03664) | +| 1294 | An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification | [![GitHub](https://img.shields.io/github/stars/alibaba-damo-academy/3D-Speaker?style=flat)](https://github.com/alibaba-damo-academy/3D-Speaker) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23o_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12838-b31b1b.svg)](https://arxiv.org/abs/2305.12838) | +| 1286 | A Study on Visualization of Voiceprint Feature | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23x_interspeech.pdf) | +| 1083 | VoxTube: A Multilingual Speaker Recognition Dataset | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://idrnd.github.io/VoxTube/)
[![GitHub](https://img.shields.io/github/stars/IDRnD/VoxTube?style=flat)](https://github.com/IDRnD/VoxTube) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yakovlev23_interspeech.pdf) | +| 1298 | Visualizing Data Augmentation in Deep Speaker Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16070-b31b1b.svg)](https://arxiv.org/abs/2305.16070) | +| 1565 | Ordered and Binary Speaker Embedding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ja_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16043-b31b1b.svg)](https://arxiv.org/abs/2305.16043) | +| 2031 | Self-FiLM: Conditioning GANs with Self-Supervised Representations for Bandwidth Extension based Speaker Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kataria23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.03657-b31b1b.svg)](https://arxiv.org/abs/2303.03657) | +| 1202 | Curriculum Learning for Self-Supervised Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/heo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.14525-b31b1b.svg)](https://arxiv.org/abs/2203.14525) | +| 1558 | Introducing Self-Supervised Phonetic Information for Text-Independent Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23aa_interspeech.pdf) | +| 1379 | A Teacher-Student Approach for Extracting Informative Speaker Embeddings from Speech Mixtures | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cordlandwehr23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00634-b31b1b.svg)](https://arxiv.org/abs/2306.00634) | +| 1479 | Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lepage23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03664-b31b1b.svg)](https://arxiv.org/abs/2306.03664) |
@@ -1457,12 +1457,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1630 | Fast and Efficient Multilingual Self-Supervised Pre-training for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23ba_interspeech.pdf) | -| 1338 | UniSplice: Universal Cross-Lingual Data Splicing for Low-Resource ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23z_interspeech.pdf) | -| 772 | Allophant: Cross-Lingual Phoneme Recognition with Articulatory Attributes | [![GitHub](https://img.shields.io/github/stars/kgnlp/allophant?style=flat)](https://github.com/kgnlp/allophant) [![GitHub](https://img.shields.io/github/stars/Aariciah/allophoible?style=flat)](https://github.com/Aariciah/allophoible) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/glocker23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04306-b31b1b.svg)](https://arxiv.org/abs/2306.04306) | -| 97 | Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.01571-b31b1b.svg)](https://arxiv.org/abs/2211.01571) | -| 1061 | Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-training for Adaptation to Unseen Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rouditchenko23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12606-b31b1b.svg)](https://arxiv.org/abs/2305.12606) | -| 1444 | DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model | [![GitHub](https://img.shields.io/github/stars/backspacetg/distilXLSR?style=flat)](https://github.com/backspacetg/distilXLSR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ea_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01303-b31b1b.svg)](https://arxiv.org/abs/2306.01303) | +| 1630 | Fast and Efficient Multilingual Self-Supervised Pre-training for Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23ba_interspeech.pdf) | +| 1338 | UniSplice: Universal Cross-Lingual Data Splicing for Low-Resource ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23z_interspeech.pdf) | +| 772 | Allophant: Cross-Lingual Phoneme Recognition with Articulatory Attributes | [![GitHub](https://img.shields.io/github/stars/kgnlp/allophant?style=flat)](https://github.com/kgnlp/allophant) [![GitHub](https://img.shields.io/github/stars/Aariciah/allophoible?style=flat)](https://github.com/Aariciah/allophoible) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/glocker23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04306-b31b1b.svg)](https://arxiv.org/abs/2306.04306) | +| 97 | Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.01571-b31b1b.svg)](https://arxiv.org/abs/2211.01571) | +| 1061 | Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-training for Adaptation to Unseen Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rouditchenko23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12606-b31b1b.svg)](https://arxiv.org/abs/2305.12606) | +| 1444 | DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model | [![GitHub](https://img.shields.io/github/stars/backspacetg/distilXLSR?style=flat)](https://github.com/backspacetg/distilXLSR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ea_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01303-b31b1b.svg)](https://arxiv.org/abs/2306.01303) |
@@ -1474,12 +1474,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 251 | Emotional Voice Conversion with Semi-Supervised Generative Modeling | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://haizhu1.github.io/sgevc/)
[![GitHub](https://img.shields.io/github/stars/haizhu1/sgevc?style=flat)](https://github.com/haizhu1/sgevc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhu23b_interspeech.pdf) | -| 817 | Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-Shot Speaker Adaptation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://diff-hiervc.github.io/)
[![GitHub](https://img.shields.io/github/stars/hayeong0/Diff-HierVC?style=flat)](https://github.com/hayeong0/Diff-HierVC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/choi23d_interspeech.pdf) | -| 215 | S2CD-VC: Self-Heuristic Speaker Content Disentanglement for Any-to-Any Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://wmaiga.github.io/S2CD/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wei23_interspeech.pdf) | -| 1508 | Flow-VAE VC: End-to-End Flow Framework with Contrastive Loss for Zero-Shot Voice Conversion | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://blog.frostmiku.com/Flow-VAE-VC/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23g_interspeech.pdf) | -| 1602 | Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech Augmentation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hhhuazi.github.io) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23s_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12259-b31b1b.svg)](https://arxiv.org/abs/2306.12259) | -| 2298 | End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://lvc-vc.github.io/lvc-vc-demo/)
[![GitHub](https://img.shields.io/github/stars/wonjune-kang/lvc-vc?style=flat)](https://github.com/wonjune-kang/lvc-vc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2205.09784-b31b1b.svg)](https://arxiv.org/abs/2205.09784) | +| 251 | Emotional Voice Conversion with Semi-Supervised Generative Modeling | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://haizhu1.github.io/sgevc/)
[![GitHub](https://img.shields.io/github/stars/haizhu1/sgevc?style=flat)](https://github.com/haizhu1/sgevc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhu23b_interspeech.pdf) | +| 817 | Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-Shot Speaker Adaptation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://diff-hiervc.github.io/)
[![GitHub](https://img.shields.io/github/stars/hayeong0/Diff-HierVC?style=flat)](https://github.com/hayeong0/Diff-HierVC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/choi23d_interspeech.pdf) | +| 215 | S2CD-VC: Self-Heuristic Speaker Content Disentanglement for Any-to-Any Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://wmaiga.github.io/S2CD/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wei23_interspeech.pdf) | +| 1508 | Flow-VAE VC: End-to-End Flow Framework with Contrastive Loss for Zero-Shot Voice Conversion | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://blog.frostmiku.com/Flow-VAE-VC/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23g_interspeech.pdf) | +| 1602 | Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech Augmentation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hhhuazi.github.io) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23s_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12259-b31b1b.svg)](https://arxiv.org/abs/2306.12259) | +| 2298 | End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://lvc-vc.github.io/lvc-vc-demo/)
[![GitHub](https://img.shields.io/github/stars/wonjune-kang/lvc-vc?style=flat)](https://github.com/wonjune-kang/lvc-vc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2205.09784-b31b1b.svg)](https://arxiv.org/abs/2205.09784) |
@@ -1491,18 +1491,18 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2093 | Multimodal Assessment of Bulbar Amyotrophic Lateral Sclerosis (ALS) using a Novel Remote Speech Assessment App | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/simmatis23_interspeech.pdf) | -| 2181 | On the use of High Frequency Information for Voice Pathology Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/martinez23_interspeech.pdf) | -| 1784 | Do Phonatory Features Display Robustness to Characterize Parkinsonian Speech Across Corpora? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/favaro23_interspeech.pdf) | -| 2531 | Severity Classification of Parkinson's Disease from Speech using Single Frequency Filtering-based Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kadiri23_interspeech.pdf) | -| 1915 | Comparison of Acoustic Measures of Dysphonia in Parkinson's Disease and Huntington's Disease: Effect of Sex and Speaking Task | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/simek23_interspeech.pdf) | -| 1734 | Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses | [![GitHub](https://img.shields.io/github/stars/LuciaGomZa/INTERSPEECH2023_AlzheimersDisease?style=flat)](https://github.com/LuciaGomZa/INTERSPEECH2023_AlzheimersDisease) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gomezzaragoza23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03443-b31b1b.svg)](https://arxiv.org/abs/2306.03443) | -| 1574 | A Pipeline to Evaluate the Effects of Noise on Machine Learning Detection of Laryngeal Cancer | [![GitHub](https://img.shields.io/github/stars/mary-paterson/Interspeech2023-EvaluationPipeline?style=flat)](https://github.com/mary-paterson/Interspeech2023-EvaluationPipeline) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/paterson23_interspeech.pdf) | -| 2474 | ReCLR: Reference-Enhanced Contrastive Learning of Audio Representation for Depression Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23ga_interspeech.pdf) | -| 234 | Automated Multiple Sclerosis Screening based on Encoded Speech Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/egaslopez23_interspeech.pdf) | -| 1934 | Cross-Lingual Features for Alzheimer's Dementia Detection from Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/melistas23_interspeech.pdf) | -| 1653 | Careful Whisper - Leveraging Advances in Automatic Speech Recognition for Robust and Interpretable Aphasia Subtype Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zusag23_interspeech.pdf) | -| 1868 | Behavioral Analysis of Pathological Speaker Embeddings of Patients During Oncological Treatment of Oral Cancer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/thienpondt23_interspeech.pdf) | +| 2093 | Multimodal Assessment of Bulbar Amyotrophic Lateral Sclerosis (ALS) using a Novel Remote Speech Assessment App | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/simmatis23_interspeech.pdf) | +| 2181 | On the use of High Frequency Information for Voice Pathology Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/martinez23_interspeech.pdf) | +| 1784 | Do Phonatory Features Display Robustness to Characterize Parkinsonian Speech Across Corpora? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/favaro23_interspeech.pdf) | +| 2531 | Severity Classification of Parkinson's Disease from Speech using Single Frequency Filtering-based Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kadiri23_interspeech.pdf) | +| 1915 | Comparison of Acoustic Measures of Dysphonia in Parkinson's Disease and Huntington's Disease: Effect of Sex and Speaking Task | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/simek23_interspeech.pdf) | +| 1734 | Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses | [![GitHub](https://img.shields.io/github/stars/LuciaGomZa/INTERSPEECH2023_AlzheimersDisease?style=flat)](https://github.com/LuciaGomZa/INTERSPEECH2023_AlzheimersDisease) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gomezzaragoza23_interspeech.pdf)&#13;
[![arXiv](https://img.shields.io/badge/arXiv-2306.03443-b31b1b.svg)](https://arxiv.org/abs/2306.03443) | +| 1574 | A Pipeline to Evaluate the Effects of Noise on Machine Learning Detection of Laryngeal Cancer | [![GitHub](https://img.shields.io/github/stars/mary-paterson/Interspeech2023-EvaluationPipeline?style=flat)](https://github.com/mary-paterson/Interspeech2023-EvaluationPipeline) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/paterson23_interspeech.pdf) | +| 2474 | ReCLR: Reference-Enhanced Contrastive Learning of Audio Representation for Depression Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23ga_interspeech.pdf) | +| 234 | Automated Multiple Sclerosis Screening based on Encoded Speech Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/egaslopez23_interspeech.pdf) | +| 1934 | Cross-Lingual Features for Alzheimer's Dementia Detection from Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/melistas23_interspeech.pdf) | +| 1653 | Careful Whisper - Leveraging Advances in Automatic Speech Recognition for Robust and Interpretable Aphasia Subtype Classification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zusag23_interspeech.pdf) | +| 1868 | Behavioral Analysis of Pathological Speaker Embeddings of Patients During Oncological Treatment of Oral Cancer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/thienpondt23_interspeech.pdf) |
@@ -1514,12 +1514,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1832 | LanSER: Language-Model Supported Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gong23c_interspeech.pdf) | -| 463 | Fine-tuned RoBERTa Model with a CNN-LSTM Network for Conversational Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/luo23_interspeech.pdf) | -| 1591 | Emotion Label Encoding using Word Embeddings for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/stanley23_interspeech.pdf) | -| 2444 | Discrimination of the Different Intents Carried by the Same Text through Integrating Multimodal Information | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23ia_interspeech.pdf) | -| 510 | Meta-Domain Adversarial Contrastive Learning for Alleviating Individual Bias in Self-Sentiment Predictions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23f_interspeech.pdf) | -| 413 | SWRR: Feature Map Classifier based on Sliding Window Attention and High-Response Feature Reuse for Multimodal Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhao23b_interspeech.pdf) | +| 1832 | LanSER: Language-Model Supported Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gong23c_interspeech.pdf) | +| 463 | Fine-tuned RoBERTa Model with a CNN-LSTM Network for Conversational Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/luo23_interspeech.pdf) | +| 1591 | Emotion Label Encoding using Word Embeddings for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/stanley23_interspeech.pdf) | +| 2444 | Discrimination of the Different Intents Carried by the Same Text through Integrating Multimodal Information | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23ia_interspeech.pdf) | +| 510 | Meta-Domain Adversarial Contrastive Learning for Alleviating Individual Bias in Self-Sentiment Predictions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23f_interspeech.pdf) | +| 413 | SWRR: Feature Map Classifier based on Sliding Window Attention and High-Response Feature Reuse for Multimodal Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhao23b_interspeech.pdf) | @@ -1531,38 +1531,38 @@ Contributions to improve the completeness of this list are greatly appreciated. 
| :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1443 | Effects of Meter, Genre and Experience on Pausing, Lengthening and Prosodic Phrasing in German Poetry Reading | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wagner23b_interspeech.pdf) | -| 1142 | Comparing First Spectral Moment of Australian English /s/ between Straight and Gay Voices using Three Analysis Window Sizes | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/szalay23_interspeech.pdf) | -| 2584 | Universal Automatic Phonetic Transcription into the International Phonetic Alphabet | [![GitHub](https://img.shields.io/github/stars/ctaguchi/multipa?style=flat)](https://github.com/ctaguchi/multipa) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/taguchi23_interspeech.pdf) | -| 2134 | Voice Twins: Discovering Extremely Similar-Sounding, Unrelated Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gerlach23_interspeech.pdf) | -| 1042 | Filling the Population Statistics Gap: Swiss German Reference Data on F0 and Speech Tempo for Forensic Contexts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hedegard23_interspeech.pdf) | -| 1619 | Investigating the Syntax-Discourse Interface in the Phonetic Implementation of Discourse Markers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hutin23_interspeech.pdf) | -| 2214 | Evaluation of a Forensic Automatic Speaker Recognition System with Emotional Speech Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/essery23_interspeech.pdf) | -| 1052 | An Outlier Analysis of Vowel Formants from a Corpus Phonetics Pipeline | [![GitHub](https://img.shields.io/github/stars/emilyahn/outliers?style=flat)](https://github.com/emilyahn/outliers) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ahn23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.eleanorchodroff.com/articles/AhnLevowWrightChodroff_Outliers_Interspeech_2023.pdf) | -| 340 | The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features | [![GitHub](https://img.shields.io/github/stars/Oscarwasoccupied/Interspeech2023_The_Hidden_Dance_of_Phonemes_and_Visage?style=flat)](https://github.com/Oscarwasoccupied/Interspeech2023_The_Hidden_Dance_of_Phonemes_and_Visage) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/qu23_interspeech.pdf) | -| 1880 | Beatboxing Kick Drum Kinematics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/blaylock23_interspeech.pdf) | -| 536 | Effects of Hearing Loss and Amplification on Mandarin Consonant Perception | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhou23b_interspeech.pdf) | -| 2020 | An Acoustic Analysis of Fricative Variation in Three Accents of English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/adams23_interspeech.pdf) | -| 109 | Acoustic Cues to Stress Perception in Spanish – a Mismatch Negativity Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bros23_interspeech.pdf) | -| 976 | Bulgarian Unstressed Vowel Reduction: Received Views vs Corpus Findings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sabev23_interspeech.pdf) | -| 1764 | An Investigation of Indian Native Language Phonemic Influences on L2 English Pronunciations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jain23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.09284-b31b1b.svg)](https://arxiv.org/abs/2212.09284) | -| 498 | Identifying Stable Sections for Formant Frequency Extraction of French Nasal Vowels based on Difference Thresholds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/park23b_interspeech.pdf) | -| 1903 | Evaluation of Delexicalization Methods for Research on Emotional Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/audibert23_interspeech.pdf) | -| 1772 | Nonbinary American English Speakers Encode Gender in Vowel Acoustics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hope23_interspeech.pdf) | -| 44 | Coarticulation of Sibe Vowels and Dorsal Fricatives in Spontaneous Speech: An Acoustic Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sharp23_interspeech.pdf) | -| 1013 | Using Speech Synthesis to Explain Automatic Speaker Recognition: A New Application of Synthetic Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/brown23_interspeech.pdf) | -| 2534 | Same F0, Different Tones: A Multidimensional Investigation of Zhangzhou Tones | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23i_interspeech.pdf) | -| 1985 | Discovering Phonetic Feature Event Patterns in Transformer Embeddings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/english23_interspeech.pdf) | -| 2204 | A System for Generating Voice Source Signals that Implements the Transformed LF-Model Parameter Control | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ra_interspeech.pdf) | -| 2352 | Speaker-Independent Speech Inversion for Estimation of Nasalance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/siriwardena23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00203-b31b1b.svg)](https://arxiv.org/abs/2306.00203) | -| 1359 | Effects of Tonal Coarticulation and Prosodic Positions on Tonal Contours of Low Rising Tones: In the Case of Xiamen Dialect | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hu23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02251-b31b1b.svg)](https://arxiv.org/abs/2306.02251) | -| 2187 | Durational and Non-Durational Correlates of Lexical and Derived Geminates in Arabic | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/issa23_interspeech.pdf) | -| 68 | Mapping Phonemes to Acoustic Symbols and Codes using Synchrony in Speech Modulation Vectors Estimated by the Travellingwave Filter Bank | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rao23_interspeech.pdf) | -| 1480 | Rhythmic Characteristics of L2 German Speech by Advanced Chinese Learners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ge23_interspeech.pdf) | -| 1538 | (Dis)agreement and Preference Structure are Reflected in Matching Along Distinct Acoustic-Prosodic Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kelterer23_interspeech.pdf) | -| 995 | Vowel Reduction by Greek-Speaking Children: The Effect of Stress and Word Length | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/christodoulidou23_interspeech.pdf) | -| 1822 | Pitch Distributions in a Very Large Corpus of Spontaneous Finnish Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lennes23_interspeech.pdf) | -| 828 | Speech Enhancement Patterns in Human-Robot Interaction: A Cross-Linguistic Perspective | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://osf.io/qwyzv/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kudera23_interspeech.pdf) | +| 1443 | Effects of Meter, Genre and Experience on Pausing, Lengthening and Prosodic Phrasing in German Poetry Reading | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wagner23b_interspeech.pdf) | +| 1142 | Comparing First Spectral Moment of Australian English /s/ between Straight and Gay Voices using Three Analysis Window Sizes | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/szalay23_interspeech.pdf) | +| 2584 | Universal Automatic Phonetic Transcription into the International Phonetic Alphabet | [![GitHub](https://img.shields.io/github/stars/ctaguchi/multipa?style=flat)](https://github.com/ctaguchi/multipa) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/taguchi23_interspeech.pdf) | +| 2134 | Voice Twins: Discovering Extremely Similar-Sounding, Unrelated Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gerlach23_interspeech.pdf) | +| 1042 | Filling the Population Statistics Gap: Swiss German Reference Data on F0 and Speech Tempo for Forensic Contexts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hedegard23_interspeech.pdf) | +| 1619 | Investigating the Syntax-Discourse Interface in the Phonetic Implementation of Discourse Markers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hutin23_interspeech.pdf) | +| 2214 | Evaluation of a Forensic Automatic Speaker Recognition System with Emotional Speech Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/essery23_interspeech.pdf) | +| 1052 | An Outlier Analysis of Vowel Formants from a Corpus Phonetics Pipeline | [![GitHub](https://img.shields.io/github/stars/emilyahn/outliers?style=flat)](https://github.com/emilyahn/outliers) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ahn23_interspeech.pdf)&#13;
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.eleanorchodroff.com/articles/AhnLevowWrightChodroff_Outliers_Interspeech_2023.pdf) | +| 340 | The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features | [![GitHub](https://img.shields.io/github/stars/Oscarwasoccupied/Interspeech2023_The_Hidden_Dance_of_Phonemes_and_Visage?style=flat)](https://github.com/Oscarwasoccupied/Interspeech2023_The_Hidden_Dance_of_Phonemes_and_Visage) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/qu23_interspeech.pdf) | +| 1880 | Beatboxing Kick Drum Kinematics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/blaylock23_interspeech.pdf) | +| 536 | Effects of Hearing Loss and Amplification on Mandarin Consonant Perception | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhou23b_interspeech.pdf) | +| 2020 | An Acoustic Analysis of Fricative Variation in Three Accents of English | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/adams23_interspeech.pdf) | +| 109 | Acoustic Cues to Stress Perception in Spanish – a Mismatch Negativity Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bros23_interspeech.pdf) | +| 976 | Bulgarian Unstressed Vowel Reduction: Received Views vs Corpus Findings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sabev23_interspeech.pdf) | +| 1764 | An Investigation of Indian Native Language Phonemic Influences on L2 English Pronunciations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jain23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.09284-b31b1b.svg)](https://arxiv.org/abs/2212.09284) | +| 498 | Identifying Stable Sections for Formant Frequency Extraction of French Nasal Vowels based on Difference Thresholds | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/park23b_interspeech.pdf) | +| 1903 | Evaluation of Delexicalization Methods for Research on Emotional Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/audibert23_interspeech.pdf) | +| 1772 | Nonbinary American English Speakers Encode Gender in Vowel Acoustics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hope23_interspeech.pdf) | +| 44 | Coarticulation of Sibe Vowels and Dorsal Fricatives in Spontaneous Speech: An Acoustic Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sharp23_interspeech.pdf) | +| 1013 | Using Speech Synthesis to Explain Automatic Speaker Recognition: A New Application of Synthetic Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/brown23_interspeech.pdf) | +| 2534 | Same F0, Different Tones: A Multidimensional Investigation of Zhangzhou Tones | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23i_interspeech.pdf) | +| 1985 | Discovering Phonetic Feature Event Patterns in Transformer Embeddings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/english23_interspeech.pdf) | +| 2204 | A System for Generating Voice Source Signals that Implements the Transformed LF-Model Parameter Control | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ra_interspeech.pdf) | +| 2352 | Speaker-Independent Speech Inversion for Estimation of Nasalance | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/siriwardena23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00203-b31b1b.svg)](https://arxiv.org/abs/2306.00203) | +| 1359 | Effects of Tonal Coarticulation and Prosodic Positions on Tonal Contours of Low Rising Tones: In the Case of Xiamen Dialect | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hu23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02251-b31b1b.svg)](https://arxiv.org/abs/2306.02251) | +| 2187 | Durational and Non-Durational Correlates of Lexical and Derived Geminates in Arabic | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/issa23_interspeech.pdf) | +| 68 | Mapping Phonemes to Acoustic Symbols and Codes using Synchrony in Speech Modulation Vectors Estimated by the Travellingwave Filter Bank | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rao23_interspeech.pdf) | +| 1480 | Rhythmic Characteristics of L2 German Speech by Advanced Chinese Learners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ge23_interspeech.pdf) | +| 1538 | (Dis)agreement and Preference Structure are Reflected in Matching Along Distinct Acoustic-Prosodic Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kelterer23_interspeech.pdf) | +| 995 | Vowel Reduction by Greek-Speaking Children: The Effect of Stress and Word Length | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/christodoulidou23_interspeech.pdf) | +| 1822 | Pitch Distributions in a Very Large Corpus of Spontaneous Finnish Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lennes23_interspeech.pdf) | +| 828 | Speech Enhancement Patterns in Human-Robot Interaction: A Cross-Linguistic Perspective | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://osf.io/qwyzv/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kudera23_interspeech.pdf) |
@@ -1574,12 +1574,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1026 | Masking Kernel for Learning Energy-Efficient Representations for Speaker Recognition and Mobile Health | [![GitHub](https://img.shields.io/github/stars/aditthapron/windowMasking?style=flat)](https://github.com/aditthapron/windowMasking) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ditthapron23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.04161-b31b1b.svg)](https://arxiv.org/abs/2302.04161) | -| 727 | eSTImate: A Real-Time Speech Transmission Index Estimator with Speech Enhancement Auxiliary Task using Self-Attention Feature Pyramid Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xiang23_interspeech.pdf) | -| 815 | Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive Feature Learning in Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23n_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05861-b31b1b.svg)](https://arxiv.org/abs/2306.05861) | -| 2138 | Privacy-Preserving Representation Learning for Speech Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tran23b_interspeech.pdf) | -| 448 | Vocoder Drift in X-Vector–based Speaker Anonymization | [![GitHub](https://img.shields.io/github/stars/eurecom-asp/vocoder-drift?style=flat)](https://github.com/eurecom-asp/vocoder-drift) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/panariello23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02892-b31b1b.svg)](https://arxiv.org/abs/2306.02892) | -| 703 | Malafide: A Novel Adversarial Convolutive Noise Attack Against Deepfake and Spoofing Detection Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/panariello23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07655-b31b1b.svg)](https://arxiv.org/abs/2306.07655) | +| 1026 | Masking Kernel for Learning Energy-Efficient Representations for Speaker Recognition and Mobile Health | [![GitHub](https://img.shields.io/github/stars/aditthapron/windowMasking?style=flat)](https://github.com/aditthapron/windowMasking) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ditthapron23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.04161-b31b1b.svg)](https://arxiv.org/abs/2302.04161) | +| 727 | eSTImate: A Real-Time Speech Transmission Index Estimator with Speech Enhancement Auxiliary Task using Self-Attention Feature Pyramid Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xiang23_interspeech.pdf) | +| 815 | Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive Feature Learning in Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23n_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05861-b31b1b.svg)](https://arxiv.org/abs/2306.05861) | +| 2138 | Privacy-Preserving Representation Learning for Speech Understanding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tran23b_interspeech.pdf) | +| 448 | Vocoder Drift in X-Vector–based Speaker Anonymization | [![GitHub](https://img.shields.io/github/stars/eurecom-asp/vocoder-drift?style=flat)](https://github.com/eurecom-asp/vocoder-drift) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/panariello23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02892-b31b1b.svg)](https://arxiv.org/abs/2306.02892) | +| 703 | Malafide: A Novel Adversarial Convolutive Noise Attack Against Deepfake and Spoofing Detection Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/panariello23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07655-b31b1b.svg)](https://arxiv.org/abs/2306.07655) |
@@ -1591,12 +1591,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1087 | Speech Self-Supervised Representation Benchmarking: Are We Doing it Right? | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/salah-zaiem/speechbrain-2/tree/develop/recipes/SSL_benchmark) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zaiem23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00452-b31b1b.svg)](https://arxiv.org/abs/2306.00452) | -| 383 | An Extension of Disentanglement Metrics and its Application to Voice | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23d_interspeech.pdf) | -| 2131 | An Information-Theoretic Analysis of Self-Supervised Discrete Representations of Speech | [![GitHub](https://img.shields.io/github/stars/uds-lsv/phone2unit?style=flat)](https://github.com/uds-lsv/phone2unit) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/abdullah23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02405-b31b1b.svg)](https://arxiv.org/abs/2306.02405) | -| 1823 | SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? | [![GitHub](https://img.shields.io/github/stars/ashi-ta/speechGLUE?style=flat)](https://github.com/ashi-ta/speechGLUE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ashihara23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08374-b31b1b.svg)](https://arxiv.org/abs/2306.08374) | -| 1418 | Comparison of GIF- and SSL-based Features in Pathological Voice Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sasou23_interspeech.pdf) | -| 1617 | What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions | [![GitHub](https://img.shields.io/github/stars/Hanyu-Meng/Adapting-LEAF?style=flat)](https://github.com/Hanyu-Meng/Adapting-LEAF) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/meng23c_interspeech.pdf) | +| 1087 | Speech Self-Supervised Representation Benchmarking: Are We Doing it Right? | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/salah-zaiem/speechbrain-2/tree/develop/recipes/SSL_benchmark) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zaiem23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00452-b31b1b.svg)](https://arxiv.org/abs/2306.00452) | +| 383 | An Extension of Disentanglement Metrics and its Application to Voice | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23d_interspeech.pdf) | +| 2131 | An Information-Theoretic Analysis of Self-Supervised Discrete Representations of Speech | [![GitHub](https://img.shields.io/github/stars/uds-lsv/phone2unit?style=flat)](https://github.com/uds-lsv/phone2unit) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/abdullah23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02405-b31b1b.svg)](https://arxiv.org/abs/2306.02405) | +| 1823 | SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? | [![GitHub](https://img.shields.io/github/stars/ashi-ta/speechGLUE?style=flat)](https://github.com/ashi-ta/speechGLUE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ashihara23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08374-b31b1b.svg)](https://arxiv.org/abs/2306.08374) | +| 1418 | Comparison of GIF- and SSL-based Features in Pathological Voice Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sasou23_interspeech.pdf) | +| 1617 | What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions | [![GitHub](https://img.shields.io/github/stars/Hanyu-Meng/Adapting-LEAF?style=flat)](https://github.com/Hanyu-Meng/Adapting-LEAF) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/meng23c_interspeech.pdf) |
@@ -1608,12 +1608,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1640 | End-to-End Joint Target and Non-Target Speakers ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/masumura23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02273-b31b1b.svg)](https://arxiv.org/abs/2306.02273) | -| 144 | Improving Frame-Level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07949-b31b1b.svg)](https://arxiv.org/abs/2306.07949) | -| 564 | Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-Level Timestamp Prediction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/makishima23_interspeech.pdf) | -| 101 | Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition | [![GitHub](https://img.shields.io/github/stars/YUCHEN005/DPSL-ASR?style=flat)](https://github.com/YUCHEN005/DPSL-ASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.14838-b31b1b.svg)](https://arxiv.org/abs/2203.14838) | -| 142 | Multi-Pass Training and Cross-Information Fusion for Low-Resource End-to-End Accented Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11309-b31b1b.svg)](https://arxiv.org/abs/2306.11309) | -| 906 | Text-Only Domain Adaptation for End-to-End ASR using Integrated Text-to-Mel-Spectrogram Generator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bataev23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14036-b31b1b.svg)](https://arxiv.org/abs/2302.14036) | +| 1640 | End-to-End Joint Target and Non-Target Speakers ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/masumura23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02273-b31b1b.svg)](https://arxiv.org/abs/2306.02273) | +| 144 | Improving Frame-Level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07949-b31b1b.svg)](https://arxiv.org/abs/2306.07949) | +| 564 | Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-Level Timestamp Prediction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/makishima23_interspeech.pdf) | +| 101 | Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition | [![GitHub](https://img.shields.io/github/stars/YUCHEN005/DPSL-ASR?style=flat)](https://github.com/YUCHEN005/DPSL-ASR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hu23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.14838-b31b1b.svg)](https://arxiv.org/abs/2203.14838) | +| 142 | Multi-Pass Training and Cross-Information Fusion for Low-Resource End-to-End Accented Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11309-b31b1b.svg)](https://arxiv.org/abs/2306.11309) | +| 906 | Text-Only Domain Adaptation for End-to-End ASR using Integrated Text-to-Mel-Spectrogram Generator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bataev23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14036-b31b1b.svg)](https://arxiv.org/abs/2302.14036) |
@@ -1625,12 +1625,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 461 | Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/NVIDIA/NeMo/tree/main/examples/slu/speech_intent_slot)
[![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huang23_interspeech.pdf) | -| 277 | Relation-based Counterfactual Data Augmentation and Contrastive Learning for Robustifying Natural Language Inference Models | [![GitHub](https://img.shields.io/github/stars/hryang06/rda-rcl?style=flat)](https://github.com/hryang06/rda-rcl) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23b_interspeech.pdf) | -| 1307 | Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/matsuura23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04233-b31b1b.svg)](https://arxiv.org/abs/2306.04233) | -| 1136 | Audio Retrieval with WavText5K and CLAP Training | [![GitHub](https://img.shields.io/github/stars/microsoft/WavText5K?style=flat)](https://github.com/microsoft/WavText5K) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/deshmukh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2209.14275-b31b1b.svg)](https://arxiv.org/abs/2209.14275) | -| 242 | Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding | [![GitHub](https://img.shields.io/github/stars/umbertocappellazzo/slurp-seqkd?style=flat)](https://github.com/umbertocappellazzo/slurp-seqkd) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cappellazzo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13899-b31b1b.svg)](https://arxiv.org/abs/2305.13899) | -| 1652 | Contrastive Disentangled Learning for Memory-Augmented Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chien23b_interspeech.pdf) | +| 461 | Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/NVIDIA/NeMo/tree/main/examples/slu/speech_intent_slot)
[![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huang23_interspeech.pdf) | +| 277 | Relation-based Counterfactual Data Augmentation and Contrastive Learning for Robustifying Natural Language Inference Models | [![GitHub](https://img.shields.io/github/stars/hryang06/rda-rcl?style=flat)](https://github.com/hryang06/rda-rcl) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23b_interspeech.pdf) | +| 1307 | Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/matsuura23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04233-b31b1b.svg)](https://arxiv.org/abs/2306.04233) | +| 1136 | Audio Retrieval with WavText5K and CLAP Training | [![GitHub](https://img.shields.io/github/stars/microsoft/WavText5K?style=flat)](https://github.com/microsoft/WavText5K) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/deshmukh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2209.14275-b31b1b.svg)](https://arxiv.org/abs/2209.14275) | +| 242 | Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding | [![GitHub](https://img.shields.io/github/stars/umbertocappellazzo/slurp-seqkd?style=flat)](https://github.com/umbertocappellazzo/slurp-seqkd) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cappellazzo23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13899-b31b1b.svg)](https://arxiv.org/abs/2305.13899) | +| 1652 | Contrastive Disentangled Learning for Memory-Augmented Transformer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chien23b_interspeech.pdf) |
@@ -1642,12 +1642,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 438 | ProsAudit, a Prosodic Benchmark for Self-Supervised Speech Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/deseyssel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12057-b31b1b.svg)](https://arxiv.org/abs/2302.12057) | -| 871 | Self-Supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12464-b31b1b.svg)](https://arxiv.org/abs/2305.12464) | -| 1862 | Evaluating Context-Invariance in Unsupervised Speech Representations | [![GitHub](https://img.shields.io/github/stars/perceptimatic/irpam2023?style=flat)](https://github.com/perceptimatic/irpam2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hallap23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15775-b31b1b.svg)](https://arxiv.org/abs/2210.15775) | -| 1390 | CoBERT: Self-Supervised Speech Representation Learning through Code Representation Learning | [![GitHub](https://img.shields.io/github/stars/mct10/CoBERT?style=flat)](https://github.com/mct10/CoBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/meng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.04062-b31b1b.svg)](https://arxiv.org/abs/2210.04062) | -| 847 | Self-Supervised Fine-tuning for Improved Content Representations by Speaker-Invariant Clustering | [![GitHub](https://img.shields.io/github/stars/vectominist/spin?style=flat)](https://github.com/vectominist/spin) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11072-b31b1b.svg)](https://arxiv.org/abs/2305.11072) | -| 359 | Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23d_interspeech.pdf) | +| 438 | ProsAudit, a Prosodic Benchmark for Self-Supervised Speech Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/deseyssel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.12057-b31b1b.svg)](https://arxiv.org/abs/2302.12057) | +| 871 | Self-Supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12464-b31b1b.svg)](https://arxiv.org/abs/2305.12464) | +| 1862 | Evaluating Context-Invariance in Unsupervised Speech Representations | [![GitHub](https://img.shields.io/github/stars/perceptimatic/irpam2023?style=flat)](https://github.com/perceptimatic/irpam2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hallap23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.15775-b31b1b.svg)](https://arxiv.org/abs/2210.15775) | +| 1390 | CoBERT: Self-Supervised Speech Representation Learning through Code Representation Learning | [![GitHub](https://img.shields.io/github/stars/mct10/CoBERT?style=flat)](https://github.com/mct10/CoBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/meng23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.04062-b31b1b.svg)](https://arxiv.org/abs/2210.04062) | +| 847 | Self-Supervised Fine-tuning for Improved Content Representations by Speaker-Invariant Clustering | [![GitHub](https://img.shields.io/github/stars/vectominist/spin?style=flat)](https://github.com/vectominist/spin) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11072-b31b1b.svg)](https://arxiv.org/abs/2305.11072) | +| 359 | Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23d_interspeech.pdf) |
@@ -1659,12 +1659,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1571 | Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hcy71o.github.io/AILTTS_demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yoon23b_interspeech.pdf) | -| 2313 | Adapter-based Extension of Multi-Speaker Text-To-Speech Model for New Speakers | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hsiehjackson.github.io/adapter-tts-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hsieh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00585-b31b1b.svg)](https://arxiv.org/abs/2211.00585) | -| 2574 | SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sivaguru23_interspeech.pdf) | -| 2326 | UnitSpeech: Speaker-Adaptive Speech Synthesis with Untranscribed Data | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://unitspeech.github.io)
[![GitHub](https://img.shields.io/github/stars/gmltmd789/UnitSpeech?style=flat)](https://github.com/gmltmd789/UnitSpeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16083-b31b1b.svg)](https://arxiv.org/abs/2306.16083) | -| 677 | LightVoc: an Upsampling-Free GAN Vocoder based on Conformer and Inverse Short-time Fourier Transform | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://lightvoc.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/dang23b_interspeech.pdf) | -| 1095 | ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sarulab-speech.github.io/demo_ChatGPT_EDSS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/saito23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13724-b31b1b.svg)](https://arxiv.org/abs/2305.13724) | +| 1571 | Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hcy71o.github.io/AILTTS_demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yoon23b_interspeech.pdf) | +| 2313 | Adapter-based Extension of Multi-Speaker Text-To-Speech Model for New Speakers | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hsiehjackson.github.io/adapter-tts-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hsieh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.00585-b31b1b.svg)](https://arxiv.org/abs/2211.00585) | +| 2574 | SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sivaguru23_interspeech.pdf) | +| 2326 | UnitSpeech: Speaker-Adaptive Speech Synthesis with Untranscribed Data | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://unitspeech.github.io)
[![GitHub](https://img.shields.io/github/stars/gmltmd789/UnitSpeech?style=flat)](https://github.com/gmltmd789/UnitSpeech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16083-b31b1b.svg)](https://arxiv.org/abs/2306.16083) | +| 677 | LightVoc: an Upsampling-Free GAN Vocoder based on Conformer and Inverse Short-time Fourier Transform | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://lightvoc.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/dang23b_interspeech.pdf) | +| 1095 | ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sarulab-speech.github.io/demo_ChatGPT_EDSS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/saito23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13724-b31b1b.svg)](https://arxiv.org/abs/2305.13724) |
@@ -1676,39 +1676,39 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1330 | Human Transcription Quality Improvement | [![GitHub](https://img.shields.io/github/stars/GenerateAI/TransAudioUI?style=flat)](https://github.com/GenerateAI/TransAudioUI)
[![GitHub](https://img.shields.io/github/stars/GenerateAI/LibriCrowd?style=flat)](https://github.com/GenerateAI/LibriCrowd) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23f_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/human-transcription-quality-improvement) | -| 1604 | The Effect of Masking Noise on Listeners' Spectral Tilt Preferences | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/simantiraki23_interspeech.pdf) | -| 1967 | The Effect of Whistled Vowels on Whistled Word Categorization for Naive Listeners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tranngoc23_interspeech.pdf) | -| 1481 | Automatic Deep Neural Network-based Segmental Pronunciation Error Detection of L2 English Speech (L1 Bengali) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bharati23_interspeech.pdf) | -| 1662 | The Effect of Stress on Mandarin Tonal Perception in Continuous Speech for Spanish-Speaking Learners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hao23_interspeech.pdf) | -| 1918 | Combining Acoustic and Aerodynamic Data Collection: A Perceptual Evaluation of Acoustic Distortions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/elmerich23_interspeech.pdf) | -| 953 | Estimating Virtual Targets for Lingual Stop Consonants using General Tau Theory | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/elie23b_interspeech.pdf) | -| 1931 | Using Random Forests to Classify Language as a Function of Syllable Timing in Two Groups: Children with Cochlear Implants and with Normal Hearing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gibson23_interspeech.pdf) | -| 2256 | An Improved End-to-End Audio-Visual Speech Recognition Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23w_interspeech.pdf) | -| 1954 | What Influences the Foreign Accent Strength? 
Phonological and Grammatical Errors in the Perception of Accentedness | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://osf.io/k2mta/?view_only=f65bdededa9c4ad0b81c43c380ae5b3b) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wesoek23_interspeech.pdf) | -| 2077 | Investigating the Perception Production Link through Perceptual Adaptation and Phonetic Convergence | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/huttner23_interspeech.pdf) | -| 1385 | Emotion Prompting for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhou23f_interspeech.pdf) | -| 1196 | Speech-in-Speech Recognition is Modulated by Familiarity to Dialect | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chin23_interspeech.pdf) | -| 673 | BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-Talker Conditions | [![GitHub](https://img.shields.io/github/stars/jzhangU/Basen?style=flat)](https://github.com/jzhangU/Basen) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.09994-b31b1b.svg)](https://arxiv.org/abs/2305.09994) | -| 2046 | Are Retroflex-to-Dental Sibilant Substitutions in Polish Children's Speech an Example of a Covert Contrast? A Preliminary Acoustic Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/miodonska23_interspeech.pdf) | -| 1123 | First Language Effects on Second Language Perception: Evidence from English Low-Vowel Nasal Sequences Perceived by L1 Mandarin Chinese Listeners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23s_interspeech.pdf) | -| 2247 | Motor Control Similarity between Speakers Saying "a Souk" using Inverse Atlas Tongue Modeling | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/maity23_interspeech.pdf) | -| 910 | Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04980-b31b1b.svg)](https://arxiv.org/abs/2306.04980) | -| 317 | A Relationship between Vocal Fold Vibration and Droplet Production | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yoshinaga23_interspeech.pdf) | -| 803 | Audio, Visual and Audiovisual Intelligibility of Vowels Produced in Noise | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/garnier23_interspeech.pdf) | -| 172 | Optimal Control of Speech with Context-Dependent Articulatory Targets | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/elie23_interspeech.pdf) | -| 593 | Computational Modeling of Auditory Brainstem Responses Derived from Modified Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cheng23d_interspeech.pdf) | -| 1732 | Leveraging Label Information for Multimodal Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/Digimonseeker/LE-MER?style=flat)](https://github.com/Digimonseeker/LE-MER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ma_interspeech.pdf) | -| 1465 | Improving End-to-End Modeling for Mandarin-English Code-Switching using Lightweight Switch-Routing Mixture-of-Experts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tan23c_interspeech.pdf) | -| 1803 | Frequency Patterns of Individual Speaker Characteristics at Higher and Lower Spectral Ranges | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23ea_interspeech.pdf) | -| 1818 | Adaptation to Predictive Prosodic cues in Non-Native Standard Dialect | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gosselkeberthelsen23_interspeech.pdf) | -| 1007 | Head Movements in Two- and Four-Person Inter-Active Conversational Tasks in Noisy and Moderately Reverberant Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/archerboyd23_interspeech.pdf) | -| 334 | Second Language Identification of Vietnamese Tones by Native Mandarin Learners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23d_interspeech.pdf) | -| 203 | Nasal Vowel Production and Grammatical Processing in French-Speaking Children with Cochlear Implants and Normal-Hearing Peers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/fagniart23_interspeech.pdf) | -| 412 | Emotion Classification with EEG Responses Evoked by Emotional Prosody of Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23f_interspeech.pdf) | -| 145 | L2-Mandarin Regional Accent Variability During Mandarin Tone-Word Training 
Facilitates English listeners' Subsequent tone Categorizations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23b_interspeech.pdf) | -| 1680 | HumanDiffusion: Diffusion Model using Perceptual Gradients | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ueda23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12169-b31b1b.svg)](https://arxiv.org/abs/2306.12169) | -| 2087 | Queer Events, Relationships, and Sports: Does Topic Influence Speakers' Acoustic Expression of Sexual Orientation? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kachel23_interspeech.pdf) | +| 1330 | Human Transcription Quality Improvement | [![GitHub](https://img.shields.io/github/stars/GenerateAI/TransAudioUI?style=flat)](https://github.com/GenerateAI/TransAudioUI)
[![GitHub](https://img.shields.io/github/stars/GenerateAI/LibriCrowd?style=flat)](https://github.com/GenerateAI/LibriCrowd) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23f_interspeech.pdf)
[![Amazon Science](https://img.shields.io/badge/amazon-science-FE9901.svg)](https://www.amazon.science/publications/human-transcription-quality-improvement) | +| 1604 | The Effect of Masking Noise on Listeners' Spectral Tilt Preferences | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/simantiraki23_interspeech.pdf) | +| 1967 | The Effect of Whistled Vowels on Whistled Word Categorization for Naive Listeners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tranngoc23_interspeech.pdf) | +| 1481 | Automatic Deep Neural Network-based Segmental Pronunciation Error Detection of L2 English Speech (L1 Bengali) | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bharati23_interspeech.pdf) | +| 1662 | The Effect of Stress on Mandarin Tonal Perception in Continuous Speech for Spanish-Speaking Learners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hao23_interspeech.pdf) | +| 1918 | Combining Acoustic and Aerodynamic Data Collection: A Perceptual Evaluation of Acoustic Distortions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/elmerich23_interspeech.pdf) | +| 953 | Estimating Virtual Targets for Lingual Stop Consonants using General Tau Theory | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/elie23b_interspeech.pdf) | +| 1931 | Using Random Forests to Classify Language as a Function of Syllable Timing in Two Groups: Children with Cochlear Implants and with Normal Hearing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gibson23_interspeech.pdf) | +| 2256 | An Improved End-to-End Audio-Visual Speech Recognition Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23w_interspeech.pdf) | +| 1954 | What Influences the Foreign Accent Strength? 
Phonological and Grammatical Errors in the Perception of Accentedness | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://osf.io/k2mta/?view_only=f65bdededa9c4ad0b81c43c380ae5b3b) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wesoek23_interspeech.pdf) | +| 2077 | Investigating the Perception Production Link through Perceptual Adaptation and Phonetic Convergence | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/huttner23_interspeech.pdf) | +| 1385 | Emotion Prompting for Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhou23f_interspeech.pdf) | +| 1196 | Speech-in-Speech Recognition is Modulated by Familiarity to Dialect | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chin23_interspeech.pdf) | +| 673 | BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-Talker Conditions | [![GitHub](https://img.shields.io/github/stars/jzhangU/Basen?style=flat)](https://github.com/jzhangU/Basen) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23m_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.09994-b31b1b.svg)](https://arxiv.org/abs/2305.09994) | +| 2046 | Are Retroflex-to-Dental Sibilant Substitutions in Polish Children's Speech an Example of a Covert Contrast? A Preliminary Acoustic Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/miodonska23_interspeech.pdf) | +| 1123 | First Language Effects on Second Language Perception: Evidence from English Low-Vowel Nasal Sequences Perceived by L1 Mandarin Chinese Listeners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23s_interspeech.pdf) | +| 2247 | Motor Control Similarity between Speakers Saying "a Souk" using Inverse Atlas Tongue Modeling | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/maity23_interspeech.pdf) | +| 910 | Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04980-b31b1b.svg)](https://arxiv.org/abs/2306.04980) | +| 317 | A Relationship between Vocal Fold Vibration and Droplet Production | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yoshinaga23_interspeech.pdf) | +| 803 | Audio, Visual and Audiovisual Intelligibility of Vowels Produced in Noise | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/garnier23_interspeech.pdf) | +| 172 | Optimal Control of Speech with Context-Dependent Articulatory Targets | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/elie23_interspeech.pdf) | +| 593 | Computational Modeling of Auditory Brainstem Responses Derived from Modified Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cheng23d_interspeech.pdf) | +| 1732 | Leveraging Label Information for Multimodal Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/Digimonseeker/LE-MER?style=flat)](https://github.com/Digimonseeker/LE-MER) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ma_interspeech.pdf) | +| 1465 | Improving End-to-End Modeling for Mandarin-English Code-Switching using Lightweight Switch-Routing Mixture-of-Experts | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tan23c_interspeech.pdf) | +| 1803 | Frequency Patterns of Individual Speaker Characteristics at Higher and Lower Spectral Ranges | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23ea_interspeech.pdf) | +| 1818 | Adaptation to Predictive Prosodic cues in Non-Native Standard Dialect | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gosselkeberthelsen23_interspeech.pdf) | +| 1007 | Head Movements in Two- and Four-Person Inter-Active Conversational Tasks in Noisy and Moderately Reverberant Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/archerboyd23_interspeech.pdf) | +| 334 | Second Language Identification of Vietnamese Tones by Native Mandarin Learners | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23d_interspeech.pdf) | +| 203 | Nasal Vowel Production and Grammatical Processing in French-Speaking Children with Cochlear Implants and Normal-Hearing Peers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/fagniart23_interspeech.pdf) | +| 412 | Emotion Classification with EEG Responses Evoked by Emotional Prosody of Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23f_interspeech.pdf) | +| 145 | L2-Mandarin Regional Accent Variability During Mandarin Tone-Word Training Facilitates English listeners' Subsequent tone Categorizations | :heavy_minus_sign: | 
[![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23b_interspeech.pdf) | +| 1680 | HumanDiffusion: Diffusion Model using Perceptual Gradients | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ueda23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12169-b31b1b.svg)](https://arxiv.org/abs/2306.12169) | +| 2087 | Queer Events, Relationships, and Sports: Does Topic Influence Speakers' Acoustic Expression of Sexual Orientation? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kachel23_interspeech.pdf) |
@@ -1720,12 +1720,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 583 | Factorised Speaker-Environment Adaptive Training of Conformer Speech Recognition Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/deng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14608-b31b1b.svg)](https://arxiv.org/abs/2306.14608) | -| 1349 | Text Only Domain Adaptation with Phoneme Guided Data Splicing for End-to-End Speech Recognition | [![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23aa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14036-b31b1b.svg)](https://arxiv.org/abs/2302.14036) | - | 327 | Cross-Lingual Cross-Age Adaptation for Low-Resource Elderly Speech Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/hltchkust/elderly_ser?style=flat)](https://github.com/hltchkust/elderly_ser) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cahyawijaya23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14517-b31b1b.svg)](https://arxiv.org/abs/2306.14517) | - | 2215 | Modular Domain Adaptation for Conformer-based Streaming ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23fa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13408-b31b1b.svg)](https://arxiv.org/abs/2305.13408) | - | 2192 | Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bhatia23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00453-b31b1b.svg)](https://arxiv.org/abs/2307.00453) | -| 1282 | SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization | [![GitHub](https://img.shields.io/github/stars/drumpt/SGEM?style=flat)](https://github.com/drumpt/SGEM/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kim23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01981-b31b1b.svg)](https://arxiv.org/abs/2306.01981) | +| 583 | Factorised Speaker-Environment Adaptive Training of Conformer Speech Recognition Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/deng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14608-b31b1b.svg)](https://arxiv.org/abs/2306.14608) | +| 1349 | Text Only Domain Adaptation with Phoneme Guided Data Splicing for End-to-End Speech Recognition | [![GitHub](https://img.shields.io/github/stars/NVIDIA/NeMo?style=flat)](https://github.com/NVIDIA/NeMo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23aa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14036-b31b1b.svg)](https://arxiv.org/abs/2302.14036) | + | 327 | Cross-Lingual Cross-Age Adaptation for Low-Resource Elderly Speech Emotion Recognition | [![GitHub](https://img.shields.io/github/stars/hltchkust/elderly_ser?style=flat)](https://github.com/hltchkust/elderly_ser) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cahyawijaya23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.14517-b31b1b.svg)](https://arxiv.org/abs/2306.14517) | + | 2215 | Modular Domain Adaptation for Conformer-based Streaming ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23fa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13408-b31b1b.svg)](https://arxiv.org/abs/2305.13408) | + | 2192 | Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bhatia23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00453-b31b1b.svg)](https://arxiv.org/abs/2307.00453) | +| 1282 | SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization | [![GitHub](https://img.shields.io/github/stars/drumpt/SGEM?style=flat)](https://github.com/drumpt/SGEM/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kim23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01981-b31b1b.svg)](https://arxiv.org/abs/2306.01981) |
@@ -1737,32 +1737,32 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 858 | Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions | [![GitHub](https://img.shields.io/github/stars/DigitalPhonetics/IMS-Toucan?style=flat)](https://github.com/DigitalPhonetics/IMS-Toucan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lux23_interspeech.pdf) | -| 2242 | Dual Audio Encoders based Mandarin Prosodic Boundary Prediction by using Multi-Granularity Prosodic Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23ga_interspeech.pdf) | -| 645 | NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://anonymousdemo.fun/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02448-b31b1b.svg)](https://arxiv.org/abs/2211.02448) | -| 782 | MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://speech11.github.io/MaskedSpeech/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23n_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.06170-b31b1b.svg)](https://arxiv.org/abs/2211.06170) | -| 2469 | Narrator or Character: Voice Modulation in an Expressive Multi-Speaker TTS | [![GitHub](https://img.shields.io/github/stars/tpavankalyan/Storynory?style=flat)](https://github.com/tpavankalyan/Storynory) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pavankalyan23_interspeech.pdf) | -| 843 | CASEIN: Cascading Explicit and Implicit Control for Fine-grained Emotion Intensity Regulation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cui23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00020-b31b1b.svg)](https://arxiv.org/abs/2307.00020) | -| 1405 | Semi-Supervised Learning for Continuous Emotional Intensity Controllable Speech Synthesis with Disentangled Representations | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://tinyurl.com/2p8vdcnd) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/oh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.06160-b31b1b.svg)](https://arxiv.org/abs/2211.06160) | -| 1905 | Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://speechbot.github.io/expresso/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nguyen23_interspeech.pdf) | -| 1460 | ComedicSpeech: Adaptive Text to Speech For Stand-up Comedy in Low-Resource Scenario | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://xh621.github.io/stand-up-comedy-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23fa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12200-b31b1b.svg)](https://arxiv.org/abs/2305.12200) | -| 1552 | Neural Speech Synthesis with Enriched Phrase Boundaries | [![GitHub](https://img.shields.io/github/stars/mkunes/w2v2_audioFrameClassification?style=flat)](https://github.com/mkunes/w2v2_audioFrameClassification) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kunesova23_interspeech.pdf) | -| 437 | Cross-Lingual Prosody Transfer for Expressive Machine Dubbing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/swiatkowski23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11658-b31b1b.svg)](https://arxiv.org/abs/2306.11658) | -| 2178 | Synthesis after a couple PINTs: Investigating the Role of Pause-Internal Phonetic Particles in Speech Synthesis and Perception | [![GitHub](https://img.shields.io/github/stars/MikeyElmers/paper_interspeech23?style=flat)](https://github.com/MikeyElmers/paper_interspeech23) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/elmers23_interspeech.pdf) | -| 433 | Accentor: An Explicit Lexical Stress Model for TTS Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/geneva23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://lml.bas.bg/~stoyan/interspeech2023.pdf) | -| 1032 | A Neural TTS System with Parallel Prosody Transfer from Unseen Speakers | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://ibm.biz/IS23-TBE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shechtman23_interspeech.pdf) | -| 715 | Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://diffvar.github.io/DDPM-prosody-predictor/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16749-b31b1b.svg)](https://arxiv.org/abs/2305.16749) | -| 289 | Prosody Modeling with 3D Visual Information for Expressive Video Dubbing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23c_interspeech.pdf) | -| 1528 | LightClone: Speaker-Guided Parallel Subnet Selection for Few-Shot Voice Cloning | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://lightclone2023.github.io/INTERSPEECH2023-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23f_interspeech.pdf) | -| 1671 | EE-TTS: Emphatic Expressive TTS with Linguistic Information | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://expressive-emphatic-ttsdemo.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhong23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12107-b31b1b.svg)](https://arxiv.org/abs/2305.12107) | -| 1673 | Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ogun23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17724-b31b1b.svg)](https://arxiv.org/abs/2305.17724) | -| 122 | ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://contextspeech.github.io/demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xiao23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00782-b31b1b.svg)](https://arxiv.org/abs/2307.00782) | -| 1779 | PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://promptstyle.github.io/PromptStyle) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19522-b31b1b.svg)](https://arxiv.org/abs/2305.19522) -| 1639 | Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://diffcorrect.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tian23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17436-b31b1b.svg)](https://arxiv.org/abs/2305.17436) | -| 2453 | A Generative Framework for Conversational Laughter: Its "Language Model" and Laughter Sound Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mori23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03465-b31b1b.svg)](https://arxiv.org/abs/2306.03465) | -| 1754 | Towards Spontaneous Style Modeling with Semi-Supervised Pre-training for Conversational Text-to-Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://thuhcsi.github.io/interspeech2023-spontaneousTTS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23ba_interspeech.pdf) | -| 2072 | Beyond Style: Synthesizing Speech with Pragmatic Functions | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://www.speech.kth.se/tts-demos/beyond_style/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lameris23_interspeech.pdf) | -| 965 | eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/abbas23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11327-b31b1b.svg)](https://arxiv.org/abs/2306.11327) | +| 858 | Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions | [![GitHub](https://img.shields.io/github/stars/DigitalPhonetics/IMS-Toucan?style=flat)](https://github.com/DigitalPhonetics/IMS-Toucan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lux23_interspeech.pdf) | +| 2242 | Dual Audio Encoders based Mandarin Prosodic Boundary Prediction by using Multi-Granularity Prosodic Representations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23ga_interspeech.pdf) | +| 645 | NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](http://anonymousdemo.fun/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23i_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02448-b31b1b.svg)](https://arxiv.org/abs/2211.02448) | +| 782 | MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://speech11.github.io/MaskedSpeech/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23n_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.06170-b31b1b.svg)](https://arxiv.org/abs/2211.06170) | +| 2469 | Narrator or Character: Voice Modulation in an Expressive Multi-Speaker TTS | [![GitHub](https://img.shields.io/github/stars/tpavankalyan/Storynory?style=flat)](https://github.com/tpavankalyan/Storynory) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pavankalyan23_interspeech.pdf) | +| 843 | CASEIN: Cascading Explicit and Implicit Control for Fine-grained Emotion Intensity Regulation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cui23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00020-b31b1b.svg)](https://arxiv.org/abs/2307.00020) | +| 1405 | Semi-Supervised Learning for Continuous Emotional Intensity Controllable Speech Synthesis with Disentangled Representations | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://tinyurl.com/2p8vdcnd) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/oh23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.06160-b31b1b.svg)](https://arxiv.org/abs/2211.06160) | +| 1905 | Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://speechbot.github.io/expresso/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nguyen23_interspeech.pdf) | +| 1460 | ComedicSpeech: Adaptive Text to Speech For Stand-up Comedy in Low-Resource Scenario | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://xh621.github.io/stand-up-comedy-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23fa_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12200-b31b1b.svg)](https://arxiv.org/abs/2305.12200) | +| 1552 | Neural Speech Synthesis with Enriched Phrase Boundaries | [![GitHub](https://img.shields.io/github/stars/mkunes/w2v2_audioFrameClassification?style=flat)](https://github.com/mkunes/w2v2_audioFrameClassification) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kunesova23_interspeech.pdf) | +| 437 | Cross-Lingual Prosody Transfer for Expressive Machine Dubbing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/swiatkowski23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11658-b31b1b.svg)](https://arxiv.org/abs/2306.11658) | +| 2178 | Synthesis after a couple PINTs: Investigating the Role of Pause-Internal Phonetic Particles in Speech Synthesis and Perception | [![GitHub](https://img.shields.io/github/stars/MikeyElmers/paper_interspeech23?style=flat)](https://github.com/MikeyElmers/paper_interspeech23) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/elmers23_interspeech.pdf) | +| 433 | Accentor: An Explicit Lexical Stress Model for TTS Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/geneva23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://lml.bas.bg/~stoyan/interspeech2023.pdf) | +| 1032 | A Neural TTS System with Parallel Prosody Transfer from Unseen Speakers | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://ibm.biz/IS23-TBE) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shechtman23_interspeech.pdf) | +| 715 | Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://diffvar.github.io/DDPM-prosody-predictor/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23j_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16749-b31b1b.svg)](https://arxiv.org/abs/2305.16749) | +| 289 | Prosody Modeling with 3D Visual Information for Expressive Video Dubbing | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23c_interspeech.pdf) | +| 1528 | LightClone: Speaker-Guided Parallel Subnet Selection for Few-Shot Voice Cloning | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://lightclone2023.github.io/INTERSPEECH2023-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23f_interspeech.pdf) | +| 1671 | EE-TTS: Emphatic Expressive TTS with Linguistic Information | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://expressive-emphatic-ttsdemo.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhong23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12107-b31b1b.svg)](https://arxiv.org/abs/2305.12107) | +| 1673 | Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ogun23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17724-b31b1b.svg)](https://arxiv.org/abs/2305.17724) | +| 122 | ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://contextspeech.github.io/demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xiao23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00782-b31b1b.svg)](https://arxiv.org/abs/2307.00782) | +| 1779 | PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://promptstyle.github.io/PromptStyle) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23t_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19522-b31b1b.svg)](https://arxiv.org/abs/2305.19522) | +| 1639 | Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://diffcorrect.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tian23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17436-b31b1b.svg)](https://arxiv.org/abs/2305.17436) | +| 2453 | A Generative Framework for Conversational Laughter: Its "Language Model" and Laughter Sound Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mori23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03465-b31b1b.svg)](https://arxiv.org/abs/2306.03465) | +| 1754 | Towards Spontaneous Style Modeling with Semi-Supervised Pre-training for Conversational Text-to-Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://thuhcsi.github.io/interspeech2023-spontaneousTTS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23ba_interspeech.pdf) | +| 2072 | Beyond Style: Synthesizing Speech with Pragmatic Functions | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://www.speech.kth.se/tts-demos/beyond_style/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lameris23_interspeech.pdf) | +| 965 | eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/abbas23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11327-b31b1b.svg)](https://arxiv.org/abs/2306.11327) |
@@ -1774,12 +1774,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1146 | BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://soumitri2001.github.io/BeAts) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/deb23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02680-b31b1b.svg)](https://arxiv.org/abs/2306.02680) | -| 370 | Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech based on Metric Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kashiwagi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14203-b31b1b.svg)](https://arxiv.org/abs/2305.14203) | -| 989 | Whistle-to-Text: Automatic Recognition of the Silbo Gomero Whistled Language | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jakubiak23_interspeech.pdf) | -| 663 | A Novel Interpretable and Generalizable Re-Synchronization Model for Cued Speech based on a Multi-Cuer Corpus | [![GitHub](https://img.shields.io/github/stars/lufei321/ReSync-CS?style=flat)](https://github.com/lufei321/ReSync-CS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02596-b31b1b.svg)](https://arxiv.org/abs/2306.02596) | -| 668 | Visually Grounded Few-Shot Word Acquisition with Fewer Shots | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nortje23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15937-b31b1b.svg)](https://arxiv.org/abs/2305.15937) | -| 183 | JAMFN: Joint Attention Multi-Scale Fusion Network for Depression Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhou23_interspeech.pdf) | +| 1146 | BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://soumitri2001.github.io/BeAts) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/deb23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02680-b31b1b.svg)](https://arxiv.org/abs/2306.02680) | +| 370 | Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech based on Metric Learning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kashiwagi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14203-b31b1b.svg)](https://arxiv.org/abs/2305.14203) | +| 989 | Whistle-to-Text: Automatic Recognition of the Silbo Gomero Whistled Language | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jakubiak23_interspeech.pdf) | +| 663 | A Novel Interpretable and Generalizable Re-Synchronization Model for Cued Speech based on a Multi-Cuer Corpus | [![GitHub](https://img.shields.io/github/stars/lufei321/ReSync-CS?style=flat)](https://github.com/lufei321/ReSync-CS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02596-b31b1b.svg)](https://arxiv.org/abs/2306.02596) | +| 668 | Visually Grounded Few-Shot Word Acquisition with Fewer Shots | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nortje23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15937-b31b1b.svg)](https://arxiv.org/abs/2305.15937) | +| 183 | JAMFN: Joint Attention Multi-Scale Fusion Network for Depression Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhou23_interspeech.pdf) |
@@ -1791,12 +1791,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1485 | Prompt Guided Copy Mechanism for Conversational Question Answering | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23z_interspeech.pdf) | -| 1240 | Composing Spoken Hints for Follow-on Question Suggestion in Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/faustini23_interspeech.pdf) | -| 1391 | On Monotonic Aggregation for Open-Domain QA | [![GitHub](https://img.shields.io/github/stars/YeonseokJeong/Judge-Specialist?style=flat)](https://github.com/YeonseokJeong/Judge-Specialist) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/han23c_interspeech.pdf) | -| 2240 | Question-Context Alignment and Answer-Context Dependencies for Effective Answer Sentence Selection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nguyen23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02196-b31b1b.svg)](https://arxiv.org/abs/2306.02196) | -| 1606 | Multi-Scale Attention for Audio Question Answering | [![GitHub](https://img.shields.io/github/stars/GeWu-Lab/MWAFM?style=flat)](https://github.com/GeWu-Lab/MWAFM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17993-b31b1b.svg)](https://arxiv.org/abs/2305.17993) | -| 539 | Enhancing Visual Question Answering via Deconstructing Questions and Explicating Answers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23f_interspeech.pdf) | +| 1485 | Prompt Guided Copy Mechanism for Conversational Question Answering | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23z_interspeech.pdf) | +| 1240 | Composing Spoken Hints for Follow-on Question Suggestion in Voice Assistants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/faustini23_interspeech.pdf) | +| 1391 | On Monotonic Aggregation for Open-Domain QA | [![GitHub](https://img.shields.io/github/stars/YeonseokJeong/Judge-Specialist?style=flat)](https://github.com/YeonseokJeong/Judge-Specialist) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/han23c_interspeech.pdf) | +| 2240 | Question-Context Alignment and Answer-Context Dependencies for Effective Answer Sentence Selection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nguyen23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02196-b31b1b.svg)](https://arxiv.org/abs/2306.02196) | +| 1606 | Multi-Scale Attention for Audio Question Answering | [![GitHub](https://img.shields.io/github/stars/GeWu-Lab/MWAFM?style=flat)](https://github.com/GeWu-Lab/MWAFM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17993-b31b1b.svg)](https://arxiv.org/abs/2305.17993) | +| 539 | Enhancing Visual Question Answering via Deconstructing Questions and Explicating Answers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23f_interspeech.pdf) |
@@ -1808,22 +1808,22 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1749 | SEF-Net: Speaker Embedding Free Target Spekaer Extraction Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zeng23c_interspeech.pdf) | -| 1530 | Overlap aware Continuous Speech Separation without Permutation Invariant Training Linfeng | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yu23c_interspeech.pdf) | -| 1952 | Cascaded Encoders for Fine-Tuning ASR Models on Overlapped Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rose23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16398-b31b1b.svg)](https://arxiv.org/abs/2306.16398) | -| 2069 | TokenSplit: using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/erdogan23_interspeech.pdf) | -| 1422 | Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/meng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16263-b31b1b.svg)](https://arxiv.org/abs/2305.16263) | -| 2098 | Time-Domain Transformer-based Audiovisual Speaker Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ahmadikalkhorani23_interspeech.pdf) | -| 628 | Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/delcroix23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13580-b31b1b.svg)](https://arxiv.org/abs/2305.13580) | -| 1502 | Unsupervised Adaptation with Quality-aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/niu23_interspeech.pdf) | -| 1521 | BA-SOT: Boundary-aware Serialized Output Training for Multi-Talker ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13716-b31b1b.svg)](https://arxiv.org/abs/2305.13716) | -| 1172 | Improving Label Assignments Learning by Dynamic Sample Dropout Combined with Layer-wise Optimization in Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gao23e_interspeech.pdf) | -| 975 | Joint Compensation of Multi-Talker Noise and Reverberation for Speech Enhancement with Cochlear Implants using One or More Microphones | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gaultier23_interspeech.pdf) | -| 494 | Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yousefi23_interspeech.pdf) | -| 42 | GPU-accelerated Guided Source Separation for Meeting Transcription | [![GitHub](https://img.shields.io/github/stars/desh2608/gss?style=flat)](https://github.com/desh2608/gss) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/raj23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.05271-b31b1b.svg)](https://arxiv.org/abs/2212.05271) | -| 1280 | Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/Emrys365/fairseq/tree/wavlm/examples/tshubert) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16286-b31b1b.svg)](https://arxiv.org/abs/2305.16286) | -| 2076 | Directional Speech Recognition for Speaker Disambiguation and Cross-talk Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23j_interspeech.pdf) | -| 1815 | Mixture Encoder for Joint Speech Separation and Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/berger23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12173-b31b1b.svg)](https://arxiv.org/abs/2306.12173) | +| 1749 | SEF-Net: Speaker Embedding Free Target Spekaer Extraction Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zeng23c_interspeech.pdf) | +| 1530 | Overlap aware Continuous Speech Separation without Permutation Invariant Training Linfeng | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yu23c_interspeech.pdf) | +| 1952 | Cascaded Encoders for Fine-Tuning ASR Models on Overlapped Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rose23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16398-b31b1b.svg)](https://arxiv.org/abs/2306.16398) | +| 2069 | TokenSplit: using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/erdogan23_interspeech.pdf) | +| 1422 | Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/meng23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16263-b31b1b.svg)](https://arxiv.org/abs/2305.16263) | +| 2098 | Time-Domain Transformer-based Audiovisual Speaker Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ahmadikalkhorani23_interspeech.pdf) | +| 628 | Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/delcroix23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13580-b31b1b.svg)](https://arxiv.org/abs/2305.13580) | +| 1502 | Unsupervised Adaptation with Quality-aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/niu23_interspeech.pdf) | +| 1521 | BA-SOT: Boundary-aware Serialized Output Training for Multi-Talker ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13716-b31b1b.svg)](https://arxiv.org/abs/2305.13716) | +| 1172 | Improving Label Assignments Learning by Dynamic Sample Dropout Combined with Layer-wise Optimization in Speech Separation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gao23e_interspeech.pdf) | +| 975 | Joint Compensation of Multi-Talker Noise and Reverberation for Speech Enhancement with Cochlear Implants using One or More Microphones | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gaultier23_interspeech.pdf) | +| 494 | Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yousefi23_interspeech.pdf) | +| 42 | GPU-accelerated Guided Source Separation for Meeting Transcription | [![GitHub](https://img.shields.io/github/stars/desh2608/gss?style=flat)](https://github.com/desh2608/gss) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/raj23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.05271-b31b1b.svg)](https://arxiv.org/abs/2212.05271) | +| 1280 | Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/Emrys365/fairseq/tree/wavlm/examples/tshubert) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16286-b31b1b.svg)](https://arxiv.org/abs/2305.16286) | +| 2076 | Directional Speech Recognition for Speaker Disambiguation and Cross-talk Suppression | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23j_interspeech.pdf) | +| 1815 | Mixture Encoder for Joint Speech Separation and Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/berger23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.12173-b31b1b.svg)](https://arxiv.org/abs/2306.12173) |
@@ -1835,10 +1835,10 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 206 | Aberystwyth English Pre-Aspiration in Apparent Time |:heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hejna23_interspeech.pdf) | -| 1154 | Speech Entrainment in Chinese Story-Style Talk Shows: The Interaction Between Gender and Role | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sun23c_interspeech.pdf) | -| 1414 | Sociodemographic and Attitudinal Effects on Dialect Speakers' Articulation of the Standard Language: Evidence from German-Speaking Switzerland | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/steiner23_interspeech.pdf) | -| 1704 | Vowel Normalisation in Latent Space for Sociolinguistics |:heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/burridge23_interspeech.pdf) | +| 206 | Aberystwyth English Pre-Aspiration in Apparent Time |:heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hejna23_interspeech.pdf) | +| 1154 | Speech Entrainment in Chinese Story-Style Talk Shows: The Interaction Between Gender and Role | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sun23c_interspeech.pdf) | +| 1414 | Sociodemographic and Attitudinal Effects on Dialect Speakers' Articulation of the Standard Language: Evidence from German-Speaking Switzerland | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/steiner23_interspeech.pdf) | +| 1704 | Vowel Normalisation in Latent Space for Sociolinguistics |:heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/burridge23_interspeech.pdf) | @@ -1850,12 +1850,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1228 | Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23n_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10704-b31b1b.svg)](https://arxiv.org/abs/2305.10704) | -| 1447 | Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lahiri23_interspeech.pdf) | -| 2367 | The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://displace2023.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/baghel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00830-b31b1b.svg)](https://arxiv.org/abs/2303.00830) | -| 1982 | Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/paturi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.09313-b31b1b.svg)](https://arxiv.org/abs/2306.09313) | -| 1839 | The SpeeD-ZevoTech Submission at DISPLACE 2023 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pirlogeanu23_interspeech.pdf) | -| 656 | End-to-End Neural Speaker Diarization with Absolute Speaker Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23g_interspeech.pdf) | +| 1228 | Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23n_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10704-b31b1b.svg)](https://arxiv.org/abs/2305.10704) | +| 1447 | Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lahiri23_interspeech.pdf) | +| 2367 | The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://displace2023.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/baghel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00830-b31b1b.svg)](https://arxiv.org/abs/2303.00830) | +| 1982 | Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/paturi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.09313-b31b1b.svg)](https://arxiv.org/abs/2306.09313) | +| 1839 | The SpeeD-ZevoTech Submission at DISPLACE 2023 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pirlogeanu23_interspeech.pdf) | +| 656 | End-to-End Neural Speaker Diarization with Absolute Speaker Loss | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23g_interspeech.pdf) |
@@ -1867,12 +1867,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1402 | Towards Single Integrated Spoofing-aware Speaker Verification Embeddings | [![GitHub](https://img.shields.io/github/stars/sasv-challenge/ASVSpoof5-SASVBaseline?style=flat)](https://github.com/sasv-challenge/ASVSpoof5-SASVBaseline) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mun23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19051-b31b1b.svg)](https://arxiv.org/abs/2305.19051) | -| 1352 | Pseudo-Siamese Network based Timbre-Reserved Black-Box Adversarial Attack in Speaker Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23ba_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19020-b31b1b.svg)](https://arxiv.org/abs/2305.19020) | -| 2335 | Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion | [![GitHub](https://img.shields.io/github/stars/ttslr/M2S-ADD?style=flat)](https://github.com/ttslr/M2S-ADD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16353-b31b1b.svg)](https://arxiv.org/abs/2305.16353) | -| 1166 | Robust Audio Anti-Spoofing Countermeasure with Joint Training of Front-end and Back-end and Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23v_interspeech.pdf) | -| 1537 | Improved DeepFake Detection using Whisper Features | [![GitHub](https://img.shields.io/github/stars/piotrkawa/deepfake-whisper-features?style=flat)](https://github.com/piotrkawa/deepfake-whisper-features) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kawa23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01428-b31b1b.svg)](https://arxiv.org/abs/2306.01428) | -| 371 | DoubleDeceiver: Deceiving the Speaker Verification System Protected by Spoofing Countermeasures | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23c_interspeech.pdf) | +| 1402 | Towards Single Integrated Spoofing-aware Speaker Verification Embeddings | [![GitHub](https://img.shields.io/github/stars/sasv-challenge/ASVSpoof5-SASVBaseline?style=flat)](https://github.com/sasv-challenge/ASVSpoof5-SASVBaseline) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mun23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19051-b31b1b.svg)](https://arxiv.org/abs/2305.19051) | +| 1352 | Pseudo-Siamese Network based Timbre-Reserved Black-Box Adversarial Attack in Speaker Identification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23ba_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19020-b31b1b.svg)](https://arxiv.org/abs/2305.19020) | +| 2335 | Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion | [![GitHub](https://img.shields.io/github/stars/ttslr/M2S-ADD?style=flat)](https://github.com/ttslr/M2S-ADD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23v_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16353-b31b1b.svg)](https://arxiv.org/abs/2305.16353) | +| 1166 | Robust Audio Anti-Spoofing Countermeasure with Joint Training of Front-end and Back-end and Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23v_interspeech.pdf) | +| 1537 | Improved DeepFake Detection using Whisper Features | [![GitHub](https://img.shields.io/github/stars/piotrkawa/deepfake-whisper-features?style=flat)](https://github.com/piotrkawa/deepfake-whisper-features) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kawa23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01428-b31b1b.svg)](https://arxiv.org/abs/2306.01428) | +| 371 | DoubleDeceiver: Deceiving the Speaker Verification System Protected by Spoofing Countermeasures | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23c_interspeech.pdf) |
@@ -1884,12 +1884,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2209 | On Training a Neural Residual Acoustic echo Suppressor for Improved ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/panchapagesan23_interspeech.pdf) | -| 1429 | Extending DNN-based Multiplicative Masking to Deep Subband Filtering for Improved Dereverberation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://jmlemercier.github.io/2023/05/30/interspeech2023.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lemercier23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00529-b31b1b.svg)](https://arxiv.org/abs/2303.00529) | -| 378 | UnSE: Unsupervised Speech Enhancement using Optimal Transport | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://jiang-wenbin.github.io/UnSE/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jiang23b_interspeech.pdf) | -| 1130 | MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://rookiejunchen.github.io/MC-SpEx_demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chen23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16250-b31b1b.svg)](https://arxiv.org/abs/2306.16250) | -| 2177 | Causal Signal-based DCCRN with Overlapped-Frame Prediction for Online Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bartolewska23_interspeech.pdf) | -| 1511 | Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23q_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08454-b31b1b.svg)](https://arxiv.org/abs/2306.08454) | +| 2209 | On Training a Neural Residual Acoustic echo Suppressor for Improved ASR | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/panchapagesan23_interspeech.pdf) | +| 1429 | Extending DNN-based Multiplicative Masking to Deep Subband Filtering for Improved Dereverberation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://jmlemercier.github.io/2023/05/30/interspeech2023.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lemercier23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00529-b31b1b.svg)](https://arxiv.org/abs/2303.00529) | +| 378 | UnSE: Unsupervised Speech Enhancement using Optimal Transport | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://jiang-wenbin.github.io/UnSE/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jiang23b_interspeech.pdf) | +| 1130 | MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://rookiejunchen.github.io/MC-SpEx_demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chen23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.16250-b31b1b.svg)](https://arxiv.org/abs/2306.16250) | +| 2177 | Causal Signal-based DCCRN with Overlapped-Frame Prediction for Online Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bartolewska23_interspeech.pdf) | +| 1511 | Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23q_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08454-b31b1b.svg)](https://arxiv.org/abs/2306.08454) |
@@ -1901,12 +1901,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2183 | A Metric-Driven Approach to Conformer Layer Pruning for Efficient ASR Inference | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bekal23_interspeech.pdf) | -| 1981 | Distillation Strategies for Discriminative Speech Recognition Rescoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gurunathshivakumar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.09452-b31b1b.svg)](https://arxiv.org/abs/2306.09452) | -| 969 | Another Point of View on Visual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pouthier23_interspeech.pdf) | -| 1062 | RASR2: The RWTH ASR Toolkit for Generic Sequence-to-Sequence Speech Recognition | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/rwth-i6/rasr/tree/generic-seq2seq-decoder) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhou23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17782-b31b1b.svg)](https://arxiv.org/abs/2305.17782) | -| 486 | Streaming Speech-to-Confusion Network Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/filimonov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03778-b31b1b.svg)](https://arxiv.org/abs/2306.03778) | -| 809 | Accurate and Structured Pruning for Efficient Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jiang23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19549-b31b1b.svg)](https://arxiv.org/abs/2305.19549) | +| 2183 | A Metric-Driven Approach to Conformer Layer Pruning for Efficient ASR Inference | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bekal23_interspeech.pdf) | +| 1981 | Distillation Strategies for Discriminative Speech Recognition Rescoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gurunathshivakumar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.09452-b31b1b.svg)](https://arxiv.org/abs/2306.09452) | +| 969 | Another Point of View on Visual Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pouthier23_interspeech.pdf) | +| 1062 | RASR2: The RWTH ASR Toolkit for Generic Sequence-to-Sequence Speech Recognition | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/rwth-i6/rasr/tree/generic-seq2seq-decoder) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhou23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17782-b31b1b.svg)](https://arxiv.org/abs/2305.17782) | +| 486 | Streaming Speech-to-Confusion Network Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/filimonov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03778-b31b1b.svg)](https://arxiv.org/abs/2306.03778) | +| 809 | Accurate and Structured Pruning for Efficient Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jiang23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19549-b31b1b.svg)](https://arxiv.org/abs/2305.19549) |
@@ -1918,11 +1918,11 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1446 | MERLIon CCS Challenge: A English-Mandarin Code-Switching Child-directed Speech Corpus for Language Identification and Diarization | [![GitHub](https://img.shields.io/github/stars/MERLIon-Challenge/merlion-ccs-2023?style=flat)](https://github.com/MERLIon-Challenge/merlion-ccs-2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chua23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18881-b31b1b.svg)](https://arxiv.org/abs/2305.18881) | -| 1335 | Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech | [![GitHub](https://img.shields.io/github/stars/shashikg/LID-Code-Switching?style=flat)](https://github.com/shashikg/LID-Code-Switching) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gupta23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00736-b31b1b.svg)](https://arxiv.org/abs/2306.00736) | -| 1707 | Investigating Model Performance in Language Identification: beyond Simple Error Statistics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/styles23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18925-b31b1b.svg)](https://arxiv.org/abs/2305.18925) | -| 2533 | Improving Wav2vec2-based Spoken Language Identification by Learning Phonological Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shahin23_interspeech.pdf) | -| 2047 | Language Identification Networks for Multilingual Everyday Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/praveen23_interspeech.pdf) | +| 1446 | MERLIon CCS Challenge: A English-Mandarin Code-Switching Child-directed Speech Corpus for Language Identification and Diarization | [![GitHub](https://img.shields.io/github/stars/MERLIon-Challenge/merlion-ccs-2023?style=flat)](https://github.com/MERLIon-Challenge/merlion-ccs-2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chua23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18881-b31b1b.svg)](https://arxiv.org/abs/2305.18881) | +| 1335 | Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech | [![GitHub](https://img.shields.io/github/stars/shashikg/LID-Code-Switching?style=flat)](https://github.com/shashikg/LID-Code-Switching) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gupta23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00736-b31b1b.svg)](https://arxiv.org/abs/2306.00736) | +| 1707 | Investigating Model Performance in Language Identification: beyond Simple Error Statistics | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/styles23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18925-b31b1b.svg)](https://arxiv.org/abs/2305.18925) | +| 2533 | Improving Wav2vec2-based Spoken Language Identification by Learning Phonological Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shahin23_interspeech.pdf) | +| 2047 | Language Identification Networks for Multilingual Everyday Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/praveen23_interspeech.pdf) |
@@ -1934,12 +1934,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2038 | Classification of Vocal Intensity Category from Speech using the Wav2vec2 and Whisper Embeddings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kodali23_interspeech.pdf) | -| 1668 | The Effect of Clinical Intervention on the Speech of Individuals with PTSD: Features and Recognition Performances | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kathan23_interspeech.pdf) | -| 470 | Analysis and Automatic Prediction of Exertion from Speech: Contrasting Objective and Subjective Measures Collected while Running | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/triantafyllopoulos23_interspeech.pdf) | -| 894 | The Androids Corpus: A New Publicly Available Benchmark for Speech based Depression Detection | [![GitHub](https://img.shields.io/github/stars/androidscorpus/data?style=flat)](https://github.com/androidscorpus/data) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tao23_interspeech.pdf) | -| 658 | Comparing Hand-Crafted Features to Spectrograms for Autism Severity Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/eni23_interspeech.pdf) | -| 839 | Acoustic Characteristics of Depression in Older Adults' Speech: the Role of Covariates | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mijnders23_interspeech.pdf) | +| 2038 | Classification of Vocal Intensity Category from Speech using the Wav2vec2 and Whisper Embeddings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kodali23_interspeech.pdf) | +| 1668 | The Effect of Clinical Intervention on the Speech of Individuals with PTSD: Features and Recognition Performances | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kathan23_interspeech.pdf) | +| 470 | Analysis and Automatic Prediction of Exertion from Speech: Contrasting Objective and Subjective Measures Collected while Running | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/triantafyllopoulos23_interspeech.pdf) | +| 894 | The Androids Corpus: A New Publicly Available Benchmark for Speech based Depression Detection | [![GitHub](https://img.shields.io/github/stars/androidscorpus/data?style=flat)](https://github.com/androidscorpus/data) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tao23_interspeech.pdf) | +| 658 | Comparing Hand-Crafted Features to Spectrograms for Autism Severity Estimation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/eni23_interspeech.pdf) | +| 839 | Acoustic Characteristics of Depression in Older Adults' Speech: the Role of Covariates | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mijnders23_interspeech.pdf) |
@@ -1951,10 +1951,10 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 943 | Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sun23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18753-b31b1b.svg)](https://arxiv.org/abs/2305.18753) | -| 1564 | Adapting a ConvNeXt Model to Audio Classification on AudioSet | [![GitHub](https://img.shields.io/github/stars/topel/audioset-convnext-inf?style=flat)](https://github.com/topel/audioset-convnext-inf) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pellegrini23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00830-b31b1b.svg)](https://arxiv.org/abs/2306.00830) | -| 1610 | Few-Shot Class-Incremental Audio Classification using Stochastic Classifier | [![GitHub](https://img.shields.io/github/stars/vinceasvp/meta-sc?style=flat)](https://github.com/vinceasvp/meta-sc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02053-b31b1b.svg)](https://arxiv.org/abs/2306.02053) | -| 1614 | Enhance Temporal Relations in Audio Captioning with Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xie23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01533-b31b1b.svg)](https://arxiv.org/abs/2306.01533) | +| 943 | Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sun23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18753-b31b1b.svg)](https://arxiv.org/abs/2305.18753) | +| 1564 | Adapting a ConvNeXt Model to Audio Classification on AudioSet | [![GitHub](https://img.shields.io/github/stars/topel/audioset-convnext-inf?style=flat)](https://github.com/topel/audioset-convnext-inf) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pellegrini23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00830-b31b1b.svg)](https://arxiv.org/abs/2306.00830) | +| 1610 | Few-Shot Class-Incremental Audio Classification using Stochastic Classifier | [![GitHub](https://img.shields.io/github/stars/vinceasvp/meta-sc?style=flat)](https://github.com/vinceasvp/meta-sc) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23w_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02053-b31b1b.svg)](https://arxiv.org/abs/2306.02053) | +| 1614 | Enhance Temporal Relations in Audio Captioning with Sound Event Detection | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xie23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01533-b31b1b.svg)](https://arxiv.org/abs/2306.01533) |
@@ -1966,28 +1966,28 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 407 | Epoch-based Spectrum Estimation for Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/cadia-lvl/ebs/tree/interspeech2023/)
[![GitHub](https://img.shields.io/github/stars/cadia-lvl/ebs?style=flat)](https://github.com/cadia-lvl/ebs/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gunason23_interspeech.pdf) | -| 1996 | OverFlow: Putting Flows on Top of Neural Transducers for Better TTS | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://shivammehta25.github.io/OverFlow/)
[![GitHub](https://img.shields.io/github/stars/shivammehta25/OverFlow?style=flat)](https://github.com/shivammehta25/OverFlow) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mehta23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.06892-b31b1b.svg)](https://arxiv.org/abs/2211.06892) | -| 1568 | AdapterMix: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation | [![GitHub](https://img.shields.io/github/stars/declare-lab/adapter-mix?style=flat)](https://github.com/declare-lab/adapter-mix) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mehrish23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18028-b31b1b.svg)](https://arxiv.org/abs/2305.18028) | -| 506 | Prior-Free Guided TTS: An Improved and Efficient Diffusion-based Text-Guided Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/choi23c_interspeech.pdf) | -| 367 | UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/iashchenko23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00721-b31b1b.svg)](https://arxiv.org/abs/2306.00721) | -| 1301 | Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hcy71o.github.io/SparseTTS-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yoon23_interspeech.pdf) | -| 1151 | Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://gwh22.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/guan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04301-b31b1b.svg)](https://arxiv.org/abs/2306.04301) | -| 879 | Towards Robust FastSpeech 2 by Modelling Residual Multimodality | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sony.github.io/ai-research-code/tvcgmm/project_page/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kogel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01442-b31b1b.svg)](https://arxiv.org/abs/2306.01442) | -| 1137 | Real Time Spectrogram Inversion on Mobile Phone | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/google-research/google-research/tree/master/specinvert) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rybakov23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.00756-b31b1b.svg)](https://arxiv.org/abs/2203.00756) | -| 58 | Automatic Tuning of Loss Trade-offs without Hyper-Parameter Search in End-to-End Zero-Shot Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://cnaigithub.github.io/Auto_Tuning_Zeroshot_TTS_and_VC/)
[![GitHub](https://img.shields.io/github/stars/cnaigithub/Auto_Tuning_Zeroshot_TTS_and_VC?style=flat)](https://github.com/cnaigithub/Auto_Tuning_Zeroshot_TTS_and_VC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/park23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16699-b31b1b.svg)](https://arxiv.org/abs/2305.16699) | -| 2056 | A Low-Resource Pipeline for Text-to-Speech from Found Data With Application to Scottish Gaelic | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/dan-wells/kiss-aligner/tree/main/egs/learngaelic_litir) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wells23_interspeech.pdf) | -| 2173 | Self-Supervised Solution to the Control Problem of Articulatory Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://tensortract.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/krug23_interspeech.pdf) | -| 1128 | Hierarchical Timbre-Cadence Speaker Encoder for Zero-Shot Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://srtts.github.io/tc-zstts/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23f_interspeech.pdf) | -| 754 | ZET-Speech: Zero-Shot adaptive Emotion-Controllable Text-to-Speech Synthesis with Diffusion and Style-based Models | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://zet-speech.github.io/ZET-Speech-Demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13831-b31b1b.svg)](https://arxiv.org/abs/2305.13831) | -| 690 | Improving WaveRNN with Heuristic Dynamic Blending for Fast and High-Quality GPU Vocoding | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://muyangdu.github.io/WaveRNN-Heuristic-Dynamic-Blending/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/du23_interspeech.pdf) | -| 194 | Intelligible Lip-to-Speech Synthesis with Speech Units | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://choijeongsoo.github.io/lip2speech-unit/)
[![GitHub](https://img.shields.io/github/stars/choijeongsoo/lip2speech-unit?style=flat)](https://github.com/choijeongsoo/lip2speech-unit) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/choi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19603-b31b1b.svg)](https://arxiv.org/abs/2305.19603) | -| 1212 | Parameter-Efficient Learning for Text-to-Speech Accent Adaptation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://tts-research.github.io)
[![GitHub](https://img.shields.io/github/stars/TTS-Research/PEL-TTS?style=flat)](https://github.com/TTS-Research/PEL-TTS)
[![GitHub](https://img.shields.io/github/stars/Li-JEN/PEL-accent-adaptaion?style=flat)](https://github.com/Li-JEN/PEL-accent-adaptaion) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11320-b31b1b.svg)](https://arxiv.org/abs/2305.11320) | -| 820 | Controlling Formant Frequencies with Neural Text-to-Speech for the Manipulation of Perceived Speaker Age | [![GitHub](https://img.shields.io/github/stars/ziafkhan/FastPitch?style=flat)](https://github.com/ziafkhan/FastPitch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/khan23_interspeech.pdf) | -| 2379 | FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder with Multiple STFTs | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://kallavinka8045.github.io/is2023/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10823-b31b1b.svg)](https://arxiv.org/abs/2305.10823) | -| 1726 | iSTFTNet2: Faster and more Lightweight iSTFT-based Neural Vocoder using 1D-2D CNN | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/istftnet2/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kaneko23_interspeech.pdf) | -| 534 | VITS2: Improving Quality and Efficiency of Single Stage Text to Speech with Adversarial Learning and Architecture Design | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://vits-2.github.io/demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kong23_interspeech.pdf) | -| 1175 | Controlling Multi-Class Human Vocalization Generation via a Simple Segment-based Labeling Scheme | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/luong23_interspeech.pdf) | +| 407 | Epoch-based Spectrum Estimation for Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/cadia-lvl/ebs/tree/interspeech2023/)
[![GitHub](https://img.shields.io/github/stars/cadia-lvl/ebs?style=flat)](https://github.com/cadia-lvl/ebs/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gunason23_interspeech.pdf) | +| 1996 | OverFlow: Putting Flows on Top of Neural Transducers for Better TTS | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://shivammehta25.github.io/OverFlow/)
[![GitHub](https://img.shields.io/github/stars/shivammehta25/OverFlow?style=flat)](https://github.com/shivammehta25/OverFlow) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mehta23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.06892-b31b1b.svg)](https://arxiv.org/abs/2211.06892) | +| 1568 | AdapterMix: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation | [![GitHub](https://img.shields.io/github/stars/declare-lab/adapter-mix?style=flat)](https://github.com/declare-lab/adapter-mix) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mehrish23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18028-b31b1b.svg)](https://arxiv.org/abs/2305.18028) | +| 506 | Prior-Free Guided TTS: An Improved and Efficient Diffusion-based Text-Guided Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/choi23c_interspeech.pdf) | +| 367 | UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/iashchenko23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00721-b31b1b.svg)](https://arxiv.org/abs/2306.00721) | +| 1301 | Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hcy71o.github.io/SparseTTS-demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yoon23_interspeech.pdf) | +| 1151 | Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://gwh22.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/guan23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.04301-b31b1b.svg)](https://arxiv.org/abs/2306.04301) | +| 879 | Towards Robust FastSpeech 2 by Modelling Residual Multimodality | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sony.github.io/ai-research-code/tvcgmm/project_page/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kogel23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01442-b31b1b.svg)](https://arxiv.org/abs/2306.01442) | +| 1137 | Real Time Spectrogram Inversion on Mobile Phone | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/google-research/google-research/tree/master/specinvert) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rybakov23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2203.00756-b31b1b.svg)](https://arxiv.org/abs/2203.00756) | +| 58 | Automatic Tuning of Loss Trade-offs without Hyper-Parameter Search in End-to-End Zero-Shot Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://cnaigithub.github.io/Auto_Tuning_Zeroshot_TTS_and_VC/)
[![GitHub](https://img.shields.io/github/stars/cnaigithub/Auto_Tuning_Zeroshot_TTS_and_VC?style=flat)](https://github.com/cnaigithub/Auto_Tuning_Zeroshot_TTS_and_VC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/park23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16699-b31b1b.svg)](https://arxiv.org/abs/2305.16699) | +| 2056 | A Low-Resource Pipeline for Text-to-Speech from Found Data With Application to Scottish Gaelic | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg?style=flat)](https://github.com/dan-wells/kiss-aligner/tree/main/egs/learngaelic_litir) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wells23_interspeech.pdf) | +| 2173 | Self-Supervised Solution to the Control Problem of Articulatory Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://tensortract.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/krug23_interspeech.pdf) | +| 1128 | Hierarchical Timbre-Cadence Speaker Encoder for Zero-Shot Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://srtts.github.io/tc-zstts/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23f_interspeech.pdf) | +| 754 | ZET-Speech: Zero-Shot adaptive Emotion-Controllable Text-to-Speech Synthesis with Diffusion and Style-based Models | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://zet-speech.github.io/ZET-Speech-Demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13831-b31b1b.svg)](https://arxiv.org/abs/2305.13831) | +| 690 | Improving WaveRNN with Heuristic Dynamic Blending for Fast and High-Quality GPU Vocoding | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://muyangdu.github.io/WaveRNN-Heuristic-Dynamic-Blending/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/du23_interspeech.pdf) | +| 194 | Intelligible Lip-to-Speech Synthesis with Speech Units | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://choijeongsoo.github.io/lip2speech-unit/)
[![GitHub](https://img.shields.io/github/stars/choijeongsoo/lip2speech-unit?style=flat)](https://github.com/choijeongsoo/lip2speech-unit) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/choi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19603-b31b1b.svg)](https://arxiv.org/abs/2305.19603) | +| 1212 | Parameter-Efficient Learning for Text-to-Speech Accent Adaptation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://tts-research.github.io)
[![GitHub](https://img.shields.io/github/stars/TTS-Research/PEL-TTS?style=flat)](https://github.com/TTS-Research/PEL-TTS)
[![GitHub](https://img.shields.io/github/stars/Li-JEN/PEL-accent-adaptaion?style=flat)](https://github.com/Li-JEN/PEL-accent-adaptaion) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23p_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11320-b31b1b.svg)](https://arxiv.org/abs/2305.11320) | +| 820 | Controlling Formant Frequencies with Neural Text-to-Speech for the Manipulation of Perceived Speaker Age | [![GitHub](https://img.shields.io/github/stars/ziafkhan/FastPitch?style=flat)](https://github.com/ziafkhan/FastPitch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/khan23_interspeech.pdf) | +| 2379 | FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder with Multiple STFTs | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://kallavinka8045.github.io/is2023/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jang23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10823-b31b1b.svg)](https://arxiv.org/abs/2305.10823) | +| 1726 | iSTFTNet2: Faster and more Lightweight iSTFT-based Neural Vocoder using 1D-2D CNN | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/istftnet2/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kaneko23_interspeech.pdf) | +| 534 | VITS2: Improving Quality and Efficiency of Single Stage Text to Speech with Adversarial Learning and Architecture Design | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://vits-2.github.io/demo/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kong23_interspeech.pdf) | +| 1175 | Controlling Multi-Class Human Vocalization Generation via a Simple Segment-based Labeling Scheme | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/luong23_interspeech.pdf) |
@@ -1999,12 +1999,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1608 | HierVST: Hierarchical Adaptive Zero-Shot Voice Style Transfer | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hiervst.github.io) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23i_interspeech.pdf) | -| 391 | VISinger2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://zhangyongmao.github.io/VISinger2/)
[![GitHub](https://img.shields.io/github/stars/zhangyongmao/VISinger2?style=flat)](https://github.com/zhangyongmao/VISinger2) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02903-b31b1b.svg)](https://arxiv.org/abs/2211.02903) | -| 700 | EdenTTS: A Simple and Efficient Parallel Text-to-Speech Architecture with Collaborative Duration-Alignment Learning | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://edenynm.github.io/edentts-demo/)
[![GitHub](https://img.shields.io/github/stars/younengma/eden-tts?style=flat)](https://github.com/younengma/eden-tts)| [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ma23c_interspeech.pdf) | -| 368 | Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://gzs-tv.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23c_interspeech.pdf) | -| 1020 | Speech Inpainting: Context-based Speech Synthesis Guided by Video | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ipcv.github.io/avsi/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/montesinos23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00489-b31b1b.svg)](https://arxiv.org/abs/2306.00489) | -| 2243 | STEN-TTS: Improving Zero-Shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tran23d_interspeech.pdf) | +| 1608 | HierVST: Hierarchical Adaptive Zero-Shot Voice Style Transfer | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://hiervst.github.io) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23i_interspeech.pdf) | +| 391 | VISinger2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://zhangyongmao.github.io/VISinger2/)
[![GitHub](https://img.shields.io/github/stars/zhangyongmao/VISinger2?style=flat)](https://github.com/zhangyongmao/VISinger2) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.02903-b31b1b.svg)](https://arxiv.org/abs/2211.02903) | +| 700 | EdenTTS: A Simple and Efficient Parallel Text-to-Speech Architecture with Collaborative Duration-Alignment Learning | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://edenynm.github.io/edentts-demo/)
[![GitHub](https://img.shields.io/github/stars/younengma/eden-tts?style=flat)](https://github.com/younengma/eden-tts)| [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ma23c_interspeech.pdf) | +| 368 | Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://gzs-tv.github.io/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23c_interspeech.pdf) | +| 1020 | Speech Inpainting: Context-based Speech Synthesis Guided by Video | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ipcv.github.io/avsi/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/montesinos23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00489-b31b1b.svg)](https://arxiv.org/abs/2306.00489) | +| 2243 | STEN-TTS: Improving Zero-Shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tran23d_interspeech.pdf) |
@@ -2016,12 +2016,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 933 | Average Token Delay: A Latency Metric for Simultaneous Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kano23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.13173-b31b1b.svg)](https://arxiv.org/abs/2211.13173) | -| 1450 | Automatic Speech Recognition Transformer with Global Contextual Information Decoder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/qian23_interspeech.pdf) | -| 1333 | Time-Synchronous One-Pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sudo23c_interspeech.pdf) | -| 2065 | Prefix Search Decoding for RNN Transducers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/praveen23b_interspeech.pdf) | -| 78 | WhisperX: Time-Accurate Speech Transcription of Long-Form Audio | [![GitHub](https://img.shields.io/github/stars/m-bain/whisperX?style=flat)](https://github.com/m-bain/whisperX) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bain23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00747-b31b1b.svg)](https://arxiv.org/abs/2303.00747) | -| 2449 | Implementing Contextual Biasing in GPU Decoder for Online ASR | [![GitHub](https://img.shields.io/github/stars/idiap/contextual-biasing-on-gpus?style=flat)](https://github.com/idiap/contextual-biasing-on-gpus) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nigmatulina23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15685-b31b1b.svg)](https://arxiv.org/abs/2306.15685) | +| 933 | Average Token Delay: A Latency Metric for Simultaneous Translation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kano23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2211.13173-b31b1b.svg)](https://arxiv.org/abs/2211.13173) | +| 1450 | Automatic Speech Recognition Transformer with Global Contextual Information Decoder | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/qian23_interspeech.pdf) | +| 1333 | Time-Synchronous One-Pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sudo23c_interspeech.pdf) | +| 2065 | Prefix Search Decoding for RNN Transducers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/praveen23b_interspeech.pdf) | +| 78 | WhisperX: Time-Accurate Speech Transcription of Long-Form Audio | [![GitHub](https://img.shields.io/github/stars/m-bain/whisperX?style=flat)](https://github.com/m-bain/whisperX) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bain23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00747-b31b1b.svg)](https://arxiv.org/abs/2303.00747) | +| 2449 | Implementing Contextual Biasing in GPU Decoder for Online ASR | [![GitHub](https://img.shields.io/github/stars/idiap/contextual-biasing-on-gpus?style=flat)](https://github.com/idiap/contextual-biasing-on-gpus) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nigmatulina23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15685-b31b1b.svg)](https://arxiv.org/abs/2306.15685) |
@@ -2033,12 +2033,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2487 | MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-Level Feature Fusion | [![GitHub](https://img.shields.io/github/stars/Woo-jin-Chung/MF-PAM_mfpam_pitch_estimation_pytorch?style=flat)](https://github.com/Woo-jin-Chung/MF-PAM_mfpam_pitch_estimation_pytorch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chung23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.09640-b31b1b.svg)](https://arxiv.org/abs/2306.09640) | -| 2211 | Enhancing Speech Articulation Analysis using A Geometric Transformation of the X-ray Microbeam Dataset | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/attia23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10775-b31b1b.svg)](https://arxiv.org/abs/2305.10775) | -| 1729 | Matching Acoustic and Perceptual Measures of Phonation Assessment in Disordered Speech - A Case Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jouaiti23_interspeech.pdf) | -| 283 | Improved Contextualized Speech Representations for Tonal Analysis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yuan23_interspeech.pdf) | -| 1738 | A Study on the Importance of Formant Transitions for Stop-Consonant Classification in VCV Sequence | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chandrasekar23_interspeech.pdf)
[![idiap](https://img.shields.io/badge/idiap.ch.5064-FF6A00.svg)](https://publications.idiap.ch/index.php/publications/show/5064) | -| 2229 | FusedF0: Improving DNN-based F0 Estimation by Fusion of Summary-Correlograms and Raw Waveform Representations of Speech Signals | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/eren23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://www.seas.ucla.edu/spapl/paper/Eray_IS_2023.pdf) | +| 2487 | MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-Level Feature Fusion | [![GitHub](https://img.shields.io/github/stars/Woo-jin-Chung/MF-PAM_mfpam_pitch_estimation_pytorch?style=flat)](https://github.com/Woo-jin-Chung/MF-PAM_mfpam_pitch_estimation_pytorch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chung23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.09640-b31b1b.svg)](https://arxiv.org/abs/2306.09640) | +| 2211 | Enhancing Speech Articulation Analysis using A Geometric Transformation of the X-ray Microbeam Dataset | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/attia23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10775-b31b1b.svg)](https://arxiv.org/abs/2305.10775) | +| 1729 | Matching Acoustic and Perceptual Measures of Phonation Assessment in Disordered Speech - A Case Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jouaiti23_interspeech.pdf) | +| 283 | Improved Contextualized Speech Representations for Tonal Analysis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yuan23_interspeech.pdf) | +| 1738 | A Study on the Importance of Formant Transitions for Stop-Consonant Classification in VCV Sequence | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chandrasekar23_interspeech.pdf)
[![idiap](https://img.shields.io/badge/idiap.ch.5064-FF6A00.svg)](https://publications.idiap.ch/index.php/publications/show/5064) | +| 2229 | FusedF0: Improving DNN-based F0 Estimation by Fusion of Summary-Correlograms and Raw Waveform Representations of Speech Signals | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/eren23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](http://www.seas.ucla.edu/spapl/paper/Eray_IS_2023.pdf) |
@@ -2050,25 +2050,25 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 928 | Using Commercial ASR Solutions to Assess Reading Skills in Children: A Case Report | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/piton23_interspeech.pdf) | -| 907 | Uncertainty Estimation for Connectionist Temporal Classification based Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rumberg23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.tnt.uni-hannover.de/papers/data/1678/2023-Rumberg-Uncertainty_Estimation_for_Connectionist_Temporal_Classification_Based_Speech_Recognition.pdf) | -| 2185 | Speech Breathing Behavior During Pauses in Children | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/charuau23_interspeech.pdf) | -| 926 | Exploiting Diversity of Automatic Transcripts from Distinct Speech Recognition Techniques for Children's Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gebauer23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.tnt.uni-hannover.de/papers/data/1679/gebauer_interspeech23_childspeechdiversity.pdf) | -| 1924 | Acoustic-to-Articulatory Speech Inversion Features for Mispronunciation Detection of /r/ in Child Speech Sound Disorders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/benway23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16085-b31b1b.svg)](https://arxiv.org/abs/2305.16085) | -| 978 | BabySLM: Language-Acquisition-Friendly Benchmark of Self-Supervised Spoken Language Models | [![GitHub](https://img.shields.io/github/stars/MarvinLvn/BabySLM?style=flat)](https://github.com/MarvinLvn/BabySLM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lavechin23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01506-b31b1b.svg)](https://arxiv.org/abs/2306.01506) | -| 702 | Data Augmentation for Children ASR and Child-adult Speaker Classification using Voice Conversion Methods | [![GitHub](https://img.shields.io/github/stars/zhao-shuyang/childrenize?style=flat)](https://github.com/zhao-shuyang/childrenize) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhao23c_interspeech.pdf) | -| 2236 | Developmental Articulatory and Acoustic Features for Six to Ten Year Old Children | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shetty23_interspeech.pdf) | -| 2251 | Automatically Predicting Perceived Conversation Quality in a Pediatric Sample Enriched for Autism | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23u_interspeech.pdf) | -| 1257 | An Equitable Framework for Automatically Assessing Children's Oral Narrative Language Abilities | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/johnson23_interspeech.pdf) | -| 743 | An Analysis of Goodness of Pronunciation for Child Speech | [![GitHub](https://img.shields.io/github/stars/frank613/GOPs?style=flat)](https://github.com/frank613/GOPs) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cao23_interspeech.pdf) | -| 1569 | Measuring Language Development from Child-centered Recordings | [![GitHub](https://img.shields.io/github/stars/yaya-sy/EntropyBasedCLDMetrics?style=flat)](https://github.com/yaya-sy/EntropyBasedCLDMetrics) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sy23_interspeech.pdf) | -| 2057 | Speaking Clearly, Understanding Better: Predicting the L2 Narrative Comprehension of Chinese Bilingual Kindergarten Children based on Speech Intelligibility using a Machine Learning Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hung23_interspeech.pdf) | -| 312 | Classifying Rhoticity of /r/ in Speech Sound Disorder using Age-and-Sex Normalized Formants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/benway23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16111-b31b1b.svg)](https://arxiv.org/abs/2305.16111) | -| 1273 | Understanding Spoken Language Development of Children with ASD using Pre-trained Speech Embeddings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14117-b31b1b.svg)](https://arxiv.org/abs/2305.14117) | -| 2099 | Measuring Phonological Precision in Children with Cleft Lip and Palate | [![GitHub](https://img.shields.io/github/stars/TAriasVergara/PhonoQ?style=flat)](https://github.com/TAriasVergara/PhonoQ) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ariasvergara23_interspeech.pdf) | -| 937 | A Study on Using Duration and Formant Features in Automatic Detection of Speech Sound Disorder in Children | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ng23_interspeech.pdf) | -| 1873 | Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://clpclf.github.io/clp-clf/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/baumann23_interspeech.pdf)| -| 1882 | Prospective Validation of Motor-based Intervention with Automated Mispronunciation Detection of Rhotics in Residual Speech Sound Disorders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/benway23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19090-b31b1b.svg)](https://arxiv.org/abs/2305.19090) | +| 928 | Using Commercial ASR Solutions to Assess Reading Skills in Children: A Case Report | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/piton23_interspeech.pdf) | +| 907 | Uncertainty Estimation for Connectionist Temporal Classification based Automatic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rumberg23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.tnt.uni-hannover.de/papers/data/1678/2023-Rumberg-Uncertainty_Estimation_for_Connectionist_Temporal_Classification_Based_Speech_Recognition.pdf) | +| 2185 | Speech Breathing Behavior During Pauses in Children | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/charuau23_interspeech.pdf) | +| 926 | Exploiting Diversity of Automatic Transcripts from Distinct Speech Recognition Techniques for Children's Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gebauer23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://www.tnt.uni-hannover.de/papers/data/1679/gebauer_interspeech23_childspeechdiversity.pdf) | +| 1924 | Acoustic-to-Articulatory Speech Inversion Features for Mispronunciation Detection of /r/ in Child Speech Sound Disorders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/benway23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16085-b31b1b.svg)](https://arxiv.org/abs/2305.16085) | +| 978 | BabySLM: Language-Acquisition-Friendly Benchmark of Self-Supervised Spoken Language Models | [![GitHub](https://img.shields.io/github/stars/MarvinLvn/BabySLM?style=flat)](https://github.com/MarvinLvn/BabySLM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lavechin23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01506-b31b1b.svg)](https://arxiv.org/abs/2306.01506) | +| 702 | Data Augmentation for Children ASR and Child-adult Speaker Classification using Voice Conversion Methods | [![GitHub](https://img.shields.io/github/stars/zhao-shuyang/childrenize?style=flat)](https://github.com/zhao-shuyang/childrenize) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhao23c_interspeech.pdf) | +| 2236 | Developmental Articulatory and Acoustic Features for Six to Ten Year Old Children | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shetty23_interspeech.pdf) | +| 2251 | Automatically Predicting Perceived Conversation Quality in a Pediatric Sample Enriched for Autism | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23u_interspeech.pdf) | +| 1257 | An Equitable Framework for Automatically Assessing Children's Oral Narrative Language Abilities | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/johnson23_interspeech.pdf) | +| 743 | An Analysis of Goodness of Pronunciation for Child Speech | [![GitHub](https://img.shields.io/github/stars/frank613/GOPs?style=flat)](https://github.com/frank613/GOPs) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cao23_interspeech.pdf) | +| 1569 | Measuring Language Development from Child-centered Recordings | [![GitHub](https://img.shields.io/github/stars/yaya-sy/EntropyBasedCLDMetrics?style=flat)](https://github.com/yaya-sy/EntropyBasedCLDMetrics) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sy23_interspeech.pdf) | +| 2057 | Speaking Clearly, Understanding Better: Predicting the L2 Narrative Comprehension of Chinese Bilingual Kindergarten Children based on Speech Intelligibility using a Machine Learning Approach | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hung23_interspeech.pdf) | +| 312 | Classifying Rhoticity of /r/ in Speech Sound Disorder using Age-and-Sex Normalized Formants | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/benway23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16111-b31b1b.svg)](https://arxiv.org/abs/2305.16111) | +| 1273 | Understanding Spoken Language Development of Children with ASD using Pre-trained Speech Embeddings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23e_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.14117-b31b1b.svg)](https://arxiv.org/abs/2305.14117) | +| 2099 | Measuring Phonological Precision in Children with Cleft Lip and Palate | [![GitHub](https://img.shields.io/github/stars/TAriasVergara/PhonoQ?style=flat)](https://github.com/TAriasVergara/PhonoQ) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ariasvergara23_interspeech.pdf) | +| 937 | A Study on Using Duration and Formant Features in Automatic Detection of Speech Sound Disorder in Children | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ng23_interspeech.pdf) | +| 1873 | Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://clpclf.github.io/clp-clf/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/baumann23_interspeech.pdf)| +| 1882 | Prospective Validation of Motor-based Intervention with Automated Mispronunciation Detection of Rhotics in Residual Speech Sound Disorders | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/benway23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19090-b31b1b.svg)](https://arxiv.org/abs/2305.19090) |
@@ -2080,12 +2080,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2238 | Parameter-Efficient Low-Resource Dialogue State Tracking by Prompt Tuning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ma23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.10915-b31b1b.svg)](https://arxiv.org/abs/2301.10915) | -| 2525 | An Autoregressive Conversational Dynamics Model for Dialogue Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mcneill23_interspeech.pdf) | -| 1983 | Style-Transfer based Speech and Audio-Visual Scene Understanding for Robot Action Sequence Acquisition from Videos | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hori23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15644-b31b1b.svg)](https://arxiv.org/abs/2306.15644) | -| 1037 | Speech aware Dialog System Technology Challenge (DSTC11) | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://dstc11.dstc.community/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/soltau23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.08704-b31b1b.svg)](https://arxiv.org/abs/2212.08704) | -| 1397 | Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision | [![GitHub](https://img.shields.io/github/stars/thu-spmi/JSA-KRTOD?style=flat)](https://github.com/thu-spmi/JSA-KRTOD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cai23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13199-b31b1b.svg)](https://arxiv.org/abs/2305.13199) | -| 2513 | Tracking Must Go On: Dialogue State Tracking with Verified Self-Training | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23k_interspeech.pdf) | +| 2238 | Parameter-Efficient Low-Resource Dialogue State Tracking by Prompt Tuning | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ma23g_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2301.10915-b31b1b.svg)](https://arxiv.org/abs/2301.10915) | +| 2525 | An Autoregressive Conversational Dynamics Model for Dialogue Systems | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mcneill23_interspeech.pdf) | +| 1983 | Style-Transfer based Speech and Audio-Visual Scene Understanding for Robot Action Sequence Acquisition from Videos | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hori23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15644-b31b1b.svg)](https://arxiv.org/abs/2306.15644) | +| 1037 | Speech aware Dialog System Technology Challenge (DSTC11) | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://dstc11.dstc.community/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/soltau23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.08704-b31b1b.svg)](https://arxiv.org/abs/2212.08704) | +| 1397 | Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision | [![GitHub](https://img.shields.io/github/stars/thu-spmi/JSA-KRTOD?style=flat)](https://github.com/thu-spmi/JSA-KRTOD) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cai23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13199-b31b1b.svg)](https://arxiv.org/abs/2305.13199) | +| 2513 | Tracking Must Go On: Dialogue State Tracking with Verified Self-Training | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23k_interspeech.pdf) |
@@ -2097,12 +2097,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 558 | GL-SSD: Global and Local Speech Style Disentanglement by Vector Quantization for Robust Sentence Boundary Detection in Speech Stream | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23i_interspeech.pdf) | -| 598 | Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shi23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12450-b31b1b.svg)](https://arxiv.org/abs/2305.12450) | -| 2466 | Dynamic Encoder RNN for Online Voice Activity Detection in Adverse Noise Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gudepu23_interspeech.pdf)| -| 996 | Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/moussa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.05641-b31b1b.svg)](https://arxiv.org/abs/2307.05641) | -| 716 | Real-Time Causal Spectro-Temporal Voice Activity Detection based on Convolutional Encoding and Residual Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wang23k_interspeech.pdf) | -| 2413 | SVVAD: Personal Voice Activity Detection for Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19581-b31b1b.svg)](https://arxiv.org/abs/2305.19581) | +| 558 | GL-SSD: Global and Local Speech Style Disentanglement by Vector Quantization for Robust Sentence Boundary Detection in Speech Stream | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23i_interspeech.pdf) | +| 598 | Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shi23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.12450-b31b1b.svg)](https://arxiv.org/abs/2305.12450) | +| 2466 | Dynamic Encoder RNN for Online Voice Activity Detection in Adverse Noise Conditions | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gudepu23_interspeech.pdf)| +| 996 | Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/moussa23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.05641-b31b1b.svg)](https://arxiv.org/abs/2307.05641) | +| 716 | Real-Time Causal Spectro-Temporal Voice Activity Detection based on Convolutional Encoding and Residual Decoding | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wang23k_interspeech.pdf) | +| 2413 | SVVAD: Personal Voice Activity Detection for Speaker Verification | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kang23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19581-b31b1b.svg)](https://arxiv.org/abs/2305.19581) |
@@ -2114,12 +2114,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1613 | Learning Cross-Lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/farooq23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08577-b31b1b.svg)](https://arxiv.org/abs/2306.08577) | -| 2122 | AfriNames: Most ASR models "butcher" African Names | [![Hugging Face](https://img.shields.io/badge/🤗-tobiolatunji-FFD21F.svg)](https://huggingface.co/datasets/tobiolatunji/afrispeech-200) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/olatunji23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00253-b31b1b.svg)](https://arxiv.org/abs/2306.00253) | -| 2528 | Towards Dialect-Inclusive Recognition in a Low-Resource Language: are Balanced Corpora the Answer? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lonergan23_interspeech.pdf) | -| 2588 | Svarah: Evaluating English ASR Systems on Indian Accents | [![GitHub](https://img.shields.io/github/stars/AI4Bharat/Svarah?style=flat)](https://github.com/AI4Bharat/Svarah) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/javed23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15760-b31b1b.svg)](https://arxiv.org/abs/2305.15760) | -| 1044 | N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/talafha23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02902-b31b1b.svg)](https://arxiv.org/abs/2306.02902) | -| 1014 | The MALACH Corpus: Results with End-to-End Architectures and Pretraining | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/picheny23_interspeech.pdf) | +| 1613 | Learning Cross-Lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/farooq23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.08577-b31b1b.svg)](https://arxiv.org/abs/2306.08577) | +| 2122 | AfriNames: Most ASR models "butcher" African Names | [![Hugging Face](https://img.shields.io/badge/🤗-tobiolatunji-FFD21F.svg)](https://huggingface.co/datasets/tobiolatunji/afrispeech-200) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/olatunji23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00253-b31b1b.svg)](https://arxiv.org/abs/2306.00253) | +| 2528 | Towards Dialect-Inclusive Recognition in a Low-Resource Language: are Balanced Corpora the Answer? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lonergan23_interspeech.pdf) | +| 2588 | Svarah: Evaluating English ASR Systems on Indian Accents | [![GitHub](https://img.shields.io/github/stars/AI4Bharat/Svarah?style=flat)](https://github.com/AI4Bharat/Svarah) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/javed23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15760-b31b1b.svg)](https://arxiv.org/abs/2305.15760) | +| 1044 | N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/talafha23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02902-b31b1b.svg)](https://arxiv.org/abs/2306.02902) | +| 1014 | The MALACH Corpus: Results with End-to-End Architectures and Pretraining | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/picheny23_interspeech.pdf) |
@@ -2131,12 +2131,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 232 | Unsupervised Speech Enhancement with Deep Dynamical Generative Speech and Noise Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07820-b31b1b.svg)](https://arxiv.org/abs/2306.07820) | -| 857 | Noise-Robust Bandwidth Expansion for 8K Speech Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lin23f_interspeech.pdf) | -| 113 | mdctGAN: Taming Transformer-based GAN for Speech Super-Resolution with Modified DCT Spectra | [![GitHub](https://img.shields.io/github/stars/neoncloud/mdctgan?style=flat)](https://github.com/neoncloud/mdctgan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/shuai23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11104-b31b1b.svg)](https://arxiv.org/abs/2305.11104) | -| 625 | Zoneformer: On-Device Neural Beamformer for In-Car Multi-Zone Speech Separation, Enhancement and echo Cancellation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yongxuustc.github.io/zf/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23b_interspeech.pdf) | -| 634 | Low-Complexity Broadband Beampattern Synthesis using Array Response Control | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/xu23c_interspeech.pdf) | -| 904 | A GAN Speech Inpainting Model for Audio Editing Software | [![GitHub](https://img.shields.io/github/stars/HXZhao1/GSIM?style=flat)](https://github.com/HXZhao1/GSIM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhao23d_interspeech.pdf) | +| 232 | Unsupervised Speech Enhancement with Deep Dynamical Generative Speech and Noise Models | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.07820-b31b1b.svg)](https://arxiv.org/abs/2306.07820) | +| 857 | Noise-Robust Bandwidth Expansion for 8K Speech Recordings | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lin23f_interspeech.pdf) | +| 113 | mdctGAN: Taming Transformer-based GAN for Speech Super-Resolution with Modified DCT Spectra | [![GitHub](https://img.shields.io/github/stars/neoncloud/mdctgan?style=flat)](https://github.com/neoncloud/mdctgan) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/shuai23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.11104-b31b1b.svg)](https://arxiv.org/abs/2305.11104) | +| 625 | Zoneformer: On-Device Neural Beamformer for In-Car Multi-Zone Speech Separation, Enhancement and echo Cancellation | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yongxuustc.github.io/zf/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23b_interspeech.pdf) | +| 634 | Low-Complexity Broadband Beampattern Synthesis using Array Response Control | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/xu23c_interspeech.pdf) | +| 904 | A GAN Speech Inpainting Model for Audio Editing Software | [![GitHub](https://img.shields.io/github/stars/HXZhao1/GSIM?style=flat)](https://github.com/HXZhao1/GSIM) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhao23d_interspeech.pdf) |
@@ -2148,10 +2148,10 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2316 | Deep Speech Synthesis from MRI-based Articulatory Representations | [![GitHub](https://img.shields.io/github/stars/articulatory/articulatory?style=flat)](https://github.com/articulatory/articulatory) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/wu23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.02471-b31b1b.svg)](https://arxiv.org/abs/2307.02471) | -| 562 | Learning to Compute the Articulatory Representations of Speech with the MIRRORNET | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yashish92.github.io/MirrorNet-for-speech/)
[![GitHub](https://img.shields.io/github/stars/Yashish92/MirrorNet-for-speech?style=flat)](https://github.com/Yashish92/MirrorNet-for-speech)| [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/siriwardena23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.16454-b31b1b.svg)](https://arxiv.org/abs/2210.16454) | -| 804 | Generating High-Resolution 3D Real-Time MRI of the Vocal Tract | [![GitHub](https://img.shields.io/github/stars/tonioser/supplementary-material-Interspeech2023-paper804?style=flat)](https://github.com/tonioser/supplementary-material-Interspeech2023-paper804) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/strauch23_interspeech.pdf) | -| 1593 | Exploring a Classification Approach using Quantised Articulatory Movements for Acoustic to Articulatory Inversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bandekar23_interspeech.pdf) | +| 2316 | Deep Speech Synthesis from MRI-based Articulatory Representations | [![GitHub](https://img.shields.io/github/stars/articulatory/articulatory?style=flat)](https://github.com/articulatory/articulatory) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/wu23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.02471-b31b1b.svg)](https://arxiv.org/abs/2307.02471) | +| 562 | Learning to Compute the Articulatory Representations of Speech with the MIRRORNET | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://yashish92.github.io/MirrorNet-for-speech/)
[![GitHub](https://img.shields.io/github/stars/Yashish92/MirrorNet-for-speech?style=flat)](https://github.com/Yashish92/MirrorNet-for-speech)| [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/siriwardena23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2210.16454-b31b1b.svg)](https://arxiv.org/abs/2210.16454) | +| 804 | Generating High-Resolution 3D Real-Time MRI of the Vocal Tract | [![GitHub](https://img.shields.io/github/stars/tonioser/supplementary-material-Interspeech2023-paper804?style=flat)](https://github.com/tonioser/supplementary-material-Interspeech2023-paper804) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/strauch23_interspeech.pdf) | +| 1593 | Exploring a Classification Approach using Quantised Articulatory Movements for Acoustic to Articulatory Inversion | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bandekar23_interspeech.pdf) |
@@ -2163,15 +2163,15 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 633 | Coherence Estimation Tracks Auditory Attention in Listeners with Hearing Impairment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/keding23_interspeech.pdf) | -| 2378 | Enhancing the EEG Speech Match Mismatch Tasks with Word Boundaries | [![GitHub](https://img.shields.io/github/stars/iiscleap/EEGspeech-MatchMismatch?style=flat)](https://github.com/iiscleap/EEGspeech-MatchMismatch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/soman23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00366-b31b1b.svg)](https://arxiv.org/abs/2307.00366) | -| 1347 | Similar Hierarchical Representation of Speech and Other Complex Sounds in the Brain and Deep Residual Networks: an MEG Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cheng23e_interspeech.pdf) | -| 121 | Speech Taskonomy: Which Speech Tasks are the most Predictive of fMRI Brain Activity? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/oota23_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-040060.svg)](https://hal.science/hal-04131475) | -| 282 | MEG Encoding using Word Context Semantics in Listening Stories | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/oota23b_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-040060.svg)](https://hal.science/hal-04148324) | -| 1949 | Investigating the Cortical Tracking of Speech and Music with Sung Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cantisani23_interspeech.pdf) | -| 414 | Exploring Auditory Attention Decoding using Speaker Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/qiu23_interspeech.pdf)| -| 1776 | Effects of Spectral Degradation on the Cortical Tracking of the Speech Envelope | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/macintyre23_interspeech.pdf) | -| 964 | Effects of Spectral and Temporal Modulation Degradation on Intelligibility and Cortical Tracking of Speech Signals | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/calderondepalma23_interspeech.pdf) | +| 633 | Coherence Estimation Tracks Auditory Attention in Listeners with Hearing Impairment | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/keding23_interspeech.pdf) | +| 2378 | Enhancing the EEG Speech Match Mismatch Tasks with Word Boundaries | [![GitHub](https://img.shields.io/github/stars/iiscleap/EEGspeech-MatchMismatch?style=flat)](https://github.com/iiscleap/EEGspeech-MatchMismatch) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/soman23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2307.00366-b31b1b.svg)](https://arxiv.org/abs/2307.00366) | +| 1347 | Similar Hierarchical Representation of Speech and Other Complex Sounds in the Brain and Deep Residual Networks: an MEG Study | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cheng23e_interspeech.pdf) | +| 121 | Speech Taskonomy: Which Speech Tasks are the most Predictive of fMRI Brain Activity? | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/oota23_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-040060.svg)](https://hal.science/hal-04131475) | +| 282 | MEG Encoding using Word Context Semantics in Listening Stories | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/oota23b_interspeech.pdf)
[![HAL Science](https://img.shields.io/badge/hal-science-040060.svg)](https://hal.science/hal-04148324) | +| 1949 | Investigating the Cortical Tracking of Speech and Music with Sung Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cantisani23_interspeech.pdf) | +| 414 | Exploring Auditory Attention Decoding using Speaker Features | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/qiu23_interspeech.pdf)| +| 1776 | Effects of Spectral Degradation on the Cortical Tracking of the Speech Envelope | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/macintyre23_interspeech.pdf) | +| 964 | Effects of Spectral and Temporal Modulation Degradation on Intelligibility and Cortical Tracking of Speech Signals | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/calderondepalma23_interspeech.pdf) |
@@ -2183,12 +2183,12 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2061 | Transfer Learning for Personality Perception via Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/li23da_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16076-b31b1b.svg)](https://arxiv.org/abs/2305.16076) | -| 1131 | A Stimulus-Organism-Response Model of Willingness to Buy from Advertising Speech using Voice Quality | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://ntt-hilab-gensp.github.io/is2023-SOR-VQ/)| [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nagano23_interspeech.pdf) | -| 1835 | Voice Passing: A Non-Binary Voice Gender Prediction System for evaluating Transgender | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/doukhan23_interspeech.pdf) | -| 1139 | Influence of Personal Traits on Impressions of One's Own Voice | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yanagida23_interspeech.pdf) | -| 887 | Pardon my Disfluency: The Impact of Disfluency Effects on the Perception of Speaker Competence and Confidence | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kirkland23_interspeech.pdf) | -| 711 | Cross-Linguistic Emotion Perception in Human and TTS Voices | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://michelledcohn.com/2023/05/19/interspeech-2023-paper-on-cross-cultural-emotion-perception/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gessinger23_interspeech.pdf)| +| 2061 | Transfer Learning for Personality Perception via Speech Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/li23da_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.16076-b31b1b.svg)](https://arxiv.org/abs/2305.16076) | +| 1131 | A Stimulus-Organism-Response Model of Willingness to Buy from Advertising Speech using Voice Quality | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://ntt-hilab-gensp.github.io/is2023-SOR-VQ/)| [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nagano23_interspeech.pdf) | +| 1835 | Voice Passing: A Non-Binary Voice Gender Prediction System for evaluating Transgender | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/doukhan23_interspeech.pdf) | +| 1139 | Influence of Personal Traits on Impressions of One's Own Voice | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yanagida23_interspeech.pdf) | +| 887 | Pardon my Disfluency: The Impact of Disfluency Effects on the Perception of Speaker Competence and Confidence | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kirkland23_interspeech.pdf) | +| 711 | Cross-Linguistic Emotion Perception in Human and TTS Voices | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://michelledcohn.com/2023/05/19/interspeech-2023-paper-on-cross-cultural-emotion-perception/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gessinger23_interspeech.pdf)|
@@ -2200,10 +2200,10 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 1302 | Joint Learning Feature and Model Adaptation for Unsupervised Acoustic Modelling of Child Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/duan23_interspeech.pdf) | -| 1681 | Automatic Assessment of Oral Reading Accuracy for Reading Diagnostics | [![GitHub](https://img.shields.io/github/stars/bomolenaar/jasmin_data_prep?style=flat)](https://github.com/bomolenaar/jasmin_data_prep)
[![GitHub](https://img.shields.io/github/stars/cristiantg/kaldi_egs_CGN?style=flat)](https://github.com/cristiantg/kaldi_egs_CGN/tree/onPonyLand) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/molenaar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03444-b31b1b.svg)](https://arxiv.org/abs/2306.03444) | -| 2084 | An ASR-enabled Reading Tutor: Investigating Feedback to Optimize Interaction for Learning to Read | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bai23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://aichildinteraction.github.io/preprint/AIAIC23_paper_7671.pdf) | -| 935 | Adaptation of Whisper Models to Child Speech Recognition | [![GitHub](https://img.shields.io/github/stars/C3Imaging/whisper_child_asr?style=flat)](https://github.com/C3Imaging/whisper_child_asr)
[![Hugging Face](https://img.shields.io/badge/🤗-rishabhjain16-FFD21F.svg)](https://huggingface.co/rishabhjain16) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jain23_interspeech.pdf) | +| 1302 | Joint Learning Feature and Model Adaptation for Unsupervised Acoustic Modelling of Child Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/duan23_interspeech.pdf) | +| 1681 | Automatic Assessment of Oral Reading Accuracy for Reading Diagnostics | [![GitHub](https://img.shields.io/github/stars/bomolenaar/jasmin_data_prep?style=flat)](https://github.com/bomolenaar/jasmin_data_prep)
[![GitHub](https://img.shields.io/github/stars/cristiantg/kaldi_egs_CGN?style=flat)](https://github.com/cristiantg/kaldi_egs_CGN/tree/onPonyLand) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/molenaar23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.03444-b31b1b.svg)](https://arxiv.org/abs/2306.03444) | +| 2084 | An ASR-enabled Reading Tutor: Investigating Feedback to Optimize Interaction for Learning to Read | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bai23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://aichildinteraction.github.io/preprint/AIAIC23_paper_7671.pdf) | +| 935 | Adaptation of Whisper Models to Child Speech Recognition | [![GitHub](https://img.shields.io/github/stars/C3Imaging/whisper_child_asr?style=flat)](https://github.com/C3Imaging/whisper_child_asr)
[![Hugging Face](https://img.shields.io/badge/🤗-rishabhjain16-FFD21F.svg)](https://huggingface.co/rishabhjain16) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jain23_interspeech.pdf) |
@@ -2215,28 +2215,28 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2064 | Automatic Evaluation of Turn-Taking Cues in Conversational Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://erikekstedt.github.io/vap_tts/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ekstedt23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17971-b31b1b.svg)](https://arxiv.org/abs/2305.17971) | -| 441 | Expressive Machine Dubbing through Phrase-Level Cross-Lingual Prosody Transfer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/swiatkowski23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11662-b31b1b.svg)](https://arxiv.org/abs/2306.11662) | -| 1691 | Robust Feature Decoupling in Voice Conversion by using Locality-based Instance Normalization | [![GitHub](https://img.shields.io/github/stars/BrightGu/LoINVC?style=flat)](https://github.com/BrightGu/LoINVC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gu23b_interspeech.pdf) | -| 612 | Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/jia23_interspeech.pdf) | -| 2148 | The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://phat-do.github.io/nodict-IS23/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/do23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00535-b31b1b.svg)](https://arxiv.org/abs/2306.00535) | -| 1727 | GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bytecong.github.io/GenerTTS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cong23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15304-b31b1b.svg)](https://arxiv.org/abs/2306.15304) | -| 1285 | Analysis of Mean Opinion Scores in Subjective Evaluation of Synthetic Speech based on Tail Probabilities | [![GitHub](https://img.shields.io/github/stars/todalab/mos-analysis-interspeech2023?style=flat)](https://github.com/todalab/mos-analysis-interspeech2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yasuda23_interspeech.pdf) | -| 1584 | LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://google.github.io/df-conformer/)
[![Openslr](https://img.shields.io/badge/OpenSLR-dataset-FFD1BF.svg)](http://www.openslr.org/141/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/koizumi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18802-b31b1b.svg)](https://arxiv.org/abs/2305.18802) | -| 1067 | UniFLG: Unified Facial Landmark Generator from Text or Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://rinnakk.github.io/research/publications/UniFLG/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mitsui23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14337-b31b1b.svg)](https://arxiv.org/abs/2302.14337) | -| 444 | XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech | [![GitHub](https://img.shields.io/github/stars/VinAIResearch/XPhoneBERT?style=flat)](https://github.com/VinAIResearch/XPhoneBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/thenguyen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19709-b31b1b.svg)](https://arxiv.org/abs/2305.19709) | -| 2224 | ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus | [![ClArTTS](https://img.shields.io/badge/ClArTTS-dataset-CBB2FF.svg)](https://www.clartts.com) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/kulkarni23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00069-b31b1b.svg)](https://arxiv.org/abs/2303.00069) | -| 154 | Diffusion-based Accent Modelling in Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/deja23_interspeech.pdf) | -| 249 | Multilingual Text-to-Speech Synthesis for Turkic Languages using Transliteration | [![GitHub](https://img.shields.io/github/stars/IS2AI/TurkicTTS?style=flat)](https://github.com/IS2AI/TurkicTTS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yeshpanov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15749-b31b1b.svg)](https://arxiv.org/abs/2305.15749) | -| 553 | CVTE-Poly: A New Benchmark for Chinese Polyphone Disambiguation | [![GitHub](https://img.shields.io/github/stars/NewZsh/polyphone?style=flat)](https://github.com/NewZsh/polyphone) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zhang23h_interspeech.pdf) | -| 709 | Improve Bilingual TTS using Language and Phonology Embedding with Embedding Strength Modulator | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://fyyang1996.github.io/esm/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yang23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.03435-b31b1b.svg)](https://arxiv.org/abs/2212.03435) | -| 2179 | High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ranacm.github.io/DSU-AVO/)
[![GitHub](https://img.shields.io/github/stars/RanaCM/DSU-AVO?style=flat)](https://github.com/RanaCM/DSU-AVO) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lu23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.17005-b31b1b.svg)](https://arxiv.org/abs/2306.17005) | -| 1097 | PronScribe: Highly Accurate Multimodal Phonemic Transcription From Speech and Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yu23_interspeech.pdf) | -| 2158 | Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](phat-do.github.io/sigul22) |[![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/do23d_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19396-b31b1b.svg)](https://arxiv.org/abs/2305.19396) | -| 416 | Why We Should Report the Details in Subjective Evaluation of TTS More Rigorously | [![GitHub](https://img.shields.io/github/stars/d223302/SubjectiveEvaluation?style=flat)](https://github.com/d223302/SubjectiveEvaluation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chiang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02044-b31b1b.svg)](https://arxiv.org/abs/2306.02044) | -| 1622 | Speaker-Independent Neural Formant Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://perezpoz.github.io/neuralformants) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/perezzarazaga23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01957-b31b1b.svg)](https://arxiv.org/abs/2306.01957) | -| 1098 | CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sython.org/Corpus/STUDIES-2/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/saito23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13713-b31b1b.svg)](https://arxiv.org/abs/2305.13713) | -| 430 | SASPEECH: A Hebrew Single Speaker Dataset for Text to Speech and Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://anonymous19283746.github.io/saspeech/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sharoni23_interspeech.pdf) | +| 2064 | Automatic Evaluation of Turn-Taking Cues in Conversational Speech Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://erikekstedt.github.io/vap_tts/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ekstedt23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.17971-b31b1b.svg)](https://arxiv.org/abs/2305.17971) | +| 441 | Expressive Machine Dubbing through Phrase-Level Cross-Lingual Prosody Transfer | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/swiatkowski23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.11662-b31b1b.svg)](https://arxiv.org/abs/2306.11662) | +| 1691 | Robust Feature Decoupling in Voice Conversion by using Locality-based Instance Normalization | [![GitHub](https://img.shields.io/github/stars/BrightGu/LoINVC?style=flat)](https://github.com/BrightGu/LoINVC) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gu23b_interspeech.pdf) | +| 612 | Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/jia23_interspeech.pdf) | +| 2148 | The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://phat-do.github.io/nodict-IS23/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/do23c_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.00535-b31b1b.svg)](https://arxiv.org/abs/2306.00535) | +| 1727 | GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://bytecong.github.io/GenerTTS/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cong23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.15304-b31b1b.svg)](https://arxiv.org/abs/2306.15304) | +| 1285 | Analysis of Mean Opinion Scores in Subjective Evaluation of Synthetic Speech based on Tail Probabilities | [![GitHub](https://img.shields.io/github/stars/todalab/mos-analysis-interspeech2023?style=flat)](https://github.com/todalab/mos-analysis-interspeech2023) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yasuda23_interspeech.pdf) | +| 1584 | LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://google.github.io/df-conformer/)
[![Openslr](https://img.shields.io/badge/OpenSLR-dataset-FFD1BF.svg)](http://www.openslr.org/141/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/koizumi23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.18802-b31b1b.svg)](https://arxiv.org/abs/2305.18802) | +| 1067 | UniFLG: Unified Facial Landmark Generator from Text or Speech | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://rinnakk.github.io/research/publications/UniFLG/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mitsui23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2302.14337-b31b1b.svg)](https://arxiv.org/abs/2302.14337) | +| 444 | XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech | [![GitHub](https://img.shields.io/github/stars/VinAIResearch/XPhoneBERT?style=flat)](https://github.com/VinAIResearch/XPhoneBERT) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/thenguyen23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.19709-b31b1b.svg)](https://arxiv.org/abs/2305.19709) | +| 2224 | ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus | [![ClArTTS](https://img.shields.io/badge/ClArTTS-dataset-CBB2FF.svg)](https://www.clartts.com) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/kulkarni23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2303.00069-b31b1b.svg)](https://arxiv.org/abs/2303.00069) | +| 154 | Diffusion-based Accent Modelling in Speech Synthesis | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/deja23_interspeech.pdf) | +| 249 | Multilingual Text-to-Speech Synthesis for Turkic Languages using Transliteration | [![GitHub](https://img.shields.io/github/stars/IS2AI/TurkicTTS?style=flat)](https://github.com/IS2AI/TurkicTTS) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yeshpanov23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15749-b31b1b.svg)](https://arxiv.org/abs/2305.15749) | +| 553 | CVTE-Poly: A New Benchmark for Chinese Polyphone Disambiguation | [![GitHub](https://img.shields.io/github/stars/NewZsh/polyphone?style=flat)](https://github.com/NewZsh/polyphone) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zhang23h_interspeech.pdf) | +| 709 | Improve Bilingual TTS using Language and Phonology Embedding with Embedding Strength Modulator | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://fyyang1996.github.io/esm/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yang23k_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2212.03435-b31b1b.svg)](https://arxiv.org/abs/2212.03435) | +| 2179 | High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://ranacm.github.io/DSU-AVO/)
[![GitHub](https://img.shields.io/github/stars/RanaCM/DSU-AVO?style=flat)](https://github.com/RanaCM/DSU-AVO) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lu23f_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.17005-b31b1b.svg)](https://arxiv.org/abs/2306.17005) | +| 1097 | PronScribe: Highly Accurate Multimodal Phonemic Transcription From Speech and Text | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yu23_interspeech.pdf) | +| 2158 | Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://phat-do.github.io/sigul22) |[![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/do23d_interspeech.pdf)<br>
[![arXiv](https://img.shields.io/badge/arXiv-2305.19396-b31b1b.svg)](https://arxiv.org/abs/2305.19396) | +| 416 | Why We Should Report the Details in Subjective Evaluation of TTS More Rigorously | [![GitHub](https://img.shields.io/github/stars/d223302/SubjectiveEvaluation?style=flat)](https://github.com/d223302/SubjectiveEvaluation) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chiang23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.02044-b31b1b.svg)](https://arxiv.org/abs/2306.02044) | +| 1622 | Speaker-Independent Neural Formant Synthesis | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://perezpoz.github.io/neuralformants) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/perezzarazaga23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2306.01957-b31b1b.svg)](https://arxiv.org/abs/2306.01957) | +| 1098 | CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://sython.org/Corpus/STUDIES-2/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/saito23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.13713-b31b1b.svg)](https://arxiv.org/abs/2305.13713) | +| 430 | SASPEECH: A Hebrew Single Speaker Dataset for Text to Speech and Voice Conversion | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://anonymous19283746.github.io/saspeech/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sharoni23_interspeech.pdf) |
@@ -2248,18 +2248,18 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2618 | A Personalised Speech Communication Application for Dysarthric Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gibson23b_interspeech.pdf) | -| 2624 | Video Multimodal Emotion Recognition System for Real World Applications | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lee23l_interspeech.pdf) | -| 2626 | Promoting Mental Self-Disclosure in a Spoken Dialogue System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rohmatillah23_interspeech.pdf) | -| 2632 | "Select Language, Modality or Put on a Mask!" Experiments with Multimodal Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bujnowski23_interspeech.pdf) | -| 2635 | My Vowels Matter: Formant Automation Tools for Diverse Child Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/valentine23_interspeech.pdf) | -| 2636 | NEMA: An Ecologically Valid Tool for Assessing Hearing Devices, Advanced Algorithms, and Communication in Diverse Listening Environments | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/chongwhite23_interspeech.pdf) | -| 2644 | When Words Speak Just as Loudly as Actions: Virtual Agent based Remote Health Assessment Integrating What Patients Say with What They Do | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ramanarayanan23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1wxkBg7fqSi0yV6uLjNO4FyhT3cEKoDhF/view) | -| 2648 | Stuttering Detection Application | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/motepalli23_interspeech.pdf)| -| 2649 | Providing Interpretable Insights for Neurological Speech and Cognitive Disorders from Interactive Serious Games | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/zusag23b_interspeech.pdf) | -| 2651 | Automated Neural Nursing Assistant (ANNA): An Over-the-Phone System for Cognitive Monitoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/solinsky23_interspeech.pdf) | -| 2656 | 5G-IoT Cloud based Demonstration of Real-Time Audio-Visual Speech Enhancement for Multimodal Hearing-aids | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://cogmhear.org/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gupta23b_interspeech.pdf) | -| 2671 | Towards Two-Point Neuron-Inspired Energy-Efficient Multimodal Open Master Hearing aid | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/raza23_interspeech.pdf) | +| 2618 | A Personalised Speech Communication Application for Dysarthric Speakers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gibson23b_interspeech.pdf) | +| 2624 | Video Multimodal Emotion Recognition System for Real World Applications | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lee23l_interspeech.pdf) | +| 2626 | Promoting Mental Self-Disclosure in a Spoken Dialogue System | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rohmatillah23_interspeech.pdf) | +| 2632 | "Select Language, Modality or Put on a Mask!" Experiments with Multimodal Emotion Recognition | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bujnowski23_interspeech.pdf) | +| 2635 | My Vowels Matter: Formant Automation Tools for Diverse Child Speech | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/valentine23_interspeech.pdf) | +| 2636 | NEMA: An Ecologically Valid Tool for Assessing Hearing Devices, Advanced Algorithms, and Communication in Diverse Listening Environments | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/chongwhite23_interspeech.pdf) | +| 2644 | When Words Speak Just as Loudly as Actions: Virtual Agent based Remote Health Assessment Integrating What Patients Say with What They Do | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ramanarayanan23_interspeech.pdf)
[![Pdf](https://img.shields.io/badge/pdf-version-003B10.svg)](https://drive.google.com/file/d/1wxkBg7fqSi0yV6uLjNO4FyhT3cEKoDhF/view) | +| 2648 | Stuttering Detection Application | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/motepalli23_interspeech.pdf)| +| 2649 | Providing Interpretable Insights for Neurological Speech and Cognitive Disorders from Interactive Serious Games | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/zusag23b_interspeech.pdf) | +| 2651 | Automated Neural Nursing Assistant (ANNA): An Over-the-Phone System for Cognitive Monitoring | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/solinsky23_interspeech.pdf) | +| 2656 | 5G-IoT Cloud based Demonstration of Real-Time Audio-Visual Speech Enhancement for Multimodal Hearing-aids | [![WEB Page](https://img.shields.io/badge/WEB-Page-159957.svg)](https://cogmhear.org/index.html) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gupta23b_interspeech.pdf) | +| 2671 | Towards Two-Point Neuron-Inspired Energy-Efficient Multimodal Open Master Hearing aid | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/raza23_interspeech.pdf) |
@@ -2271,16 +2271,16 @@ Contributions to improve the completeness of this list are greatly appreciated. | :id: | **Title** | **Repo** | **Paper** | |------|-----------|:--------:|:---------:| -| 2614 | DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/Rikorose/DeepFilterNet?style=flat)](https://github.com/Rikorose/DeepFilterNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/schroter23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.08227-b31b1b.svg)](https://arxiv.org/abs/2305.08227) | -| 2615 | Nkululeko: Machine Learning Experiments on Speaker Characteristics without Programming | [![GitHub](https://img.shields.io/github/stars/felixbur/nkululeko?style=flat)](https://github.com/felixbur/nkululeko) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/burkhardt23_interspeech.pdf) | -| 2625 | Sp1NY: A Quick and Flexible Python Speech Visualization Tool | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/lemaguer23_interspeech.pdf) | -| 2629 | Intonation Control for Neural Text-to-Speech Synthesis with Polynomial Models of F0 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/corkey23_interspeech.pdf) | -| 2634 | So-to-Speak: an Exploratory Platform for Investigating the Interplay between Style and Prosody in TTS | [![GitHub](https://img.shields.io/github/stars/evaszekely/So_To_Speak?style=flat)](https://github.com/evaszekely/So_To_Speak) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/szekely23b_interspeech.pdf) | -| 2638 | Comparing /b/ and /d/ with a Single Physical Model of the Human Vocal Tract to Visualize Droplets Produced while Speaking | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/arai23_interspeech.pdf) | -| 2640 | Show & Tell: Voice Activity Projection and Turn-taking | [![GitHub](https://img.shields.io/github/stars/ErikEkstedt/VoiceActivityProjection?style=flat)](https://github.com/ErikEkstedt/VoiceActivityProjection) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/ekstedt23b_interspeech.pdf)| -| 2652 | Real-Time Detection of Soft Voice for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cordourier23_interspeech.pdf) | -| 2655 | Data Augmentation for Diverse Voice Conversion in Noisy Environments | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tanna23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10684-b31b1b.svg)](https://arxiv.org/abs/2305.10684) | -| 2667 | Application for Real-Time Audio-Visual Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/gogate23_interspeech.pdf)| +| 2614 | DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement | [![GitHub](https://img.shields.io/github/stars/Rikorose/DeepFilterNet?style=flat)](https://github.com/Rikorose/DeepFilterNet) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/schroter23b_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.08227-b31b1b.svg)](https://arxiv.org/abs/2305.08227) | +| 2615 | Nkululeko: Machine Learning Experiments on Speaker Characteristics without Programming | [![GitHub](https://img.shields.io/github/stars/felixbur/nkululeko?style=flat)](https://github.com/felixbur/nkululeko) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/burkhardt23_interspeech.pdf) | +| 2625 | Sp1NY: A Quick and Flexible Python Speech Visualization Tool | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/lemaguer23_interspeech.pdf) | +| 2629 | Intonation Control for Neural Text-to-Speech Synthesis with Polynomial Models of F0 | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/corkey23_interspeech.pdf) | +| 2634 | So-to-Speak: an Exploratory Platform for Investigating the Interplay between Style and Prosody in TTS | [![GitHub](https://img.shields.io/github/stars/evaszekely/So_To_Speak?style=flat)](https://github.com/evaszekely/So_To_Speak) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/szekely23b_interspeech.pdf) | +| 2638 | Comparing /b/ and /d/ with a Single Physical Model of the Human Vocal Tract to Visualize Droplets Produced while Speaking | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/arai23_interspeech.pdf) | +| 2640 | Show & Tell: Voice Activity Projection and Turn-taking | [![GitHub](https://img.shields.io/github/stars/ErikEkstedt/VoiceActivityProjection?style=flat)](https://github.com/ErikEkstedt/VoiceActivityProjection) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/ekstedt23b_interspeech.pdf)| +| 2652 | Real-Time Detection of Soft Voice for Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cordourier23_interspeech.pdf) | +| 2655 | Data Augmentation for Diverse Voice Conversion in Noisy Environments | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tanna23_interspeech.pdf)
[![arXiv](https://img.shields.io/badge/arXiv-2305.10684-b31b1b.svg)](https://arxiv.org/abs/2305.10684) | +| 2667 | Application for Real-Time Audio-Visual Speech Enhancement | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/gogate23_interspeech.pdf)|
@@ -2292,17 +2292,17 @@ Contributions to improve the completeness of this list are greatly appreciated.
| :id: | **Title** | **Repo** | **Paper** |
|------|-----------|:--------:|:---------:|
-| 2623 | A Unified Framework to Improve Learners' Skills of Perception and Production based on Speech Shadowing and Overlapping | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/minematsu23_interspeech.pdf) |
-| 2633 | Speak & Improve: L2 English Speaking Practice Tool | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nicholls23_interspeech.pdf) |
-| 2641 | Measuring Prosody in Child Speech using SoapBox Fluency API | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nicolao23_interspeech.pdf) |
-| 2650 | Teaching Non-native Sound Contrasts using Visual Biofeedback | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nissen23_interspeech.pdf) |
-| 2654 | Large-Scale Automatic Audiobook Creation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/walsh23_interspeech.pdf) |
-| 2658 | QVoice: Arabic Speech Pronunciation Learning Application | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/elkheir23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2305.07445-b31b1b.svg)](https://arxiv.org/abs/2305.07445) |
-| 2659 | Asking Questions: an Innovative Way to Interact with Oral History Archives | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/svec23_interspeech.pdf) |
-| 2660 | DisfluencyFixer: A Tool to Enhance Language Learning through Speech to Speech Disfluency Correction | [![React](https://img.shields.io/badge/react-%2320232a.svg?style=for-the-badge&logo=react&logoColor=%2361DAFB)](https://www.cfilt.iitb.ac.in/speech2text/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/bhat23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2305.16957-b31b1b.svg)](https://arxiv.org/abs/2305.16957) |
-| 2661 | Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/prakash23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2211.01338-b31b1b.svg)](https://arxiv.org/abs/2211.01338) |
-| 2668 | MyVoice: Arabic Speech Resource Collaboration Platform | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/elshahawy23_interspeech.pdf)|
-| 2669 | Personal Primer Prototype 1: Invitation to Make Your Own Embooked Speech-based Educational Artifact | [![GitHub](https://img.shields.io/github/stars/hromi/lesen-mikroserver?style=flat)](https://github.com/hromi/lesen-mikroserver) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/hromada23_interspeech.pdf)<br>[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/371491906_Personal_Primer_Prototype_1_Invitation_to_Make_Your_Own_Embooked_Speech-Based_Educational_Artifact) |
+| 2623 | A Unified Framework to Improve Learners' Skills of Perception and Production based on Speech Shadowing and Overlapping | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/minematsu23_interspeech.pdf) |
+| 2633 | Speak & Improve: L2 English Speaking Practice Tool | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nicholls23_interspeech.pdf) |
+| 2641 | Measuring Prosody in Child Speech using SoapBox Fluency API | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nicolao23_interspeech.pdf) |
+| 2650 | Teaching Non-native Sound Contrasts using Visual Biofeedback | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nissen23_interspeech.pdf) |
+| 2654 | Large-Scale Automatic Audiobook Creation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/walsh23_interspeech.pdf) |
+| 2658 | QVoice: Arabic Speech Pronunciation Learning Application | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/elkheir23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2305.07445-b31b1b.svg)](https://arxiv.org/abs/2305.07445) |
+| 2659 | Asking Questions: an Innovative Way to Interact with Oral History Archives | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/svec23_interspeech.pdf) |
+| 2660 | DisfluencyFixer: A Tool to Enhance Language Learning through Speech to Speech Disfluency Correction | [![React](https://img.shields.io/badge/react-%2320232a.svg?style=for-the-badge&logo=react&logoColor=%2361DAFB)](https://www.cfilt.iitb.ac.in/speech2text/) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/bhat23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2305.16957-b31b1b.svg)](https://arxiv.org/abs/2305.16957) |
+| 2661 | Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/prakash23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2211.01338-b31b1b.svg)](https://arxiv.org/abs/2211.01338) |
+| 2668 | MyVoice: Arabic Speech Resource Collaboration Platform | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/elshahawy23_interspeech.pdf)|
+| 2669 | Personal Primer Prototype 1: Invitation to Make Your Own Embooked Speech-based Educational Artifact | [![GitHub](https://img.shields.io/github/stars/hromi/lesen-mikroserver?style=flat)](https://github.com/hromi/lesen-mikroserver) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/hromada23_interspeech.pdf)<br>[![ResearchGate](https://img.shields.io/badge/Research-Gate-D7E7F5.svg)](https://www.researchgate.net/publication/371491906_Personal_Primer_Prototype_1_Invitation_to_Make_Your_Own_Embooked_Speech-Based_Educational_Artifact) |
@@ -2314,18 +2314,18 @@ Contributions to improve the completeness of this list are greatly appreciated.
| :id: | **Title** | **Repo** | **Paper** |
|------|-----------|:--------:|:---------:|
-| 2621 | Let's Give a Voice to Conversational Agents in Virtual Reality | [![GitHub](https://img.shields.io/github/stars/sislab-unitn/Let-s-Give-a-Voice-to-Conversational-Agents-in-VR?style=flat)](https://github.com/sislab-unitn/Let-s-Give-a-Voice-to-Conversational-Agents-in-VR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/yin23b_interspeech.pdf) |
-| 2622 | FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator | :heavy_minus_sign: |[![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/baali23b_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2306.07936-b31b1b.svg)](https://arxiv.org/abs/2306.07936) |
-| 2637 | Video Summarization Leveraging Multimodal Information for Presentations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/liu23x_interspeech.pdf) |
-| 2645 | What Questions are My Customers Asking?: Towards Actionable Insights from Customer Questions in Contact Center Calls | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/nathan23_interspeech.pdf) |
-| 2646 | COnVoy: A Contact Center Operated Pipeline for Voice of Customer Discovery | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/tripathi23_interspeech.pdf) |
-| 2653 | NeMo Forced Aligner and its Application to Word Alignment for Subtitle Generation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/rastorgueva23_interspeech.pdf) |
-| 2662 | CauSE: Causal Search Engine for Understanding Contact-Center Conversations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/pattnaik23_interspeech.pdf)|
-| 2663 | Tailored Real-Time Call Summarization System for Contact Centers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/sachdeva23_interspeech.pdf) |
-| 2647 | Federated Learning Toolkit with Voice-based User Verification Demo | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/mandke23_interspeech.pdf) |
-| 2657 | Learning when to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models | [![GitHub](https://img.shields.io/github/stars/liamdugan/speech-to-speech?style=flat)](https://github.com/liamdugan/speech-to-speech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/dugan23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2306.01201-b31b1b.svg)](https://arxiv.org/abs/2306.01201) |
-| 2628 | Fast Enrollable Streaming Keyword Spotting System: Training and Inference using a Web Browser | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/cho23b_interspeech.pdf) |
-| 2665 | Cross-Lingual/Cross-Channel Intent Detection in Contact-Center Conversations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-speech.org/archive/pdfs/interspeech_2023/agrawal23b_interspeech.pdf) |
+| 2621 | Let's Give a Voice to Conversational Agents in Virtual Reality | [![GitHub](https://img.shields.io/github/stars/sislab-unitn/Let-s-Give-a-Voice-to-Conversational-Agents-in-VR?style=flat)](https://github.com/sislab-unitn/Let-s-Give-a-Voice-to-Conversational-Agents-in-VR) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/yin23b_interspeech.pdf) |
+| 2622 | FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator | :heavy_minus_sign: |[![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/baali23b_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2306.07936-b31b1b.svg)](https://arxiv.org/abs/2306.07936) |
+| 2637 | Video Summarization Leveraging Multimodal Information for Presentations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/liu23x_interspeech.pdf) |
+| 2645 | What Questions are My Customers Asking?: Towards Actionable Insights from Customer Questions in Contact Center Calls | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/nathan23_interspeech.pdf) |
+| 2646 | COnVoy: A Contact Center Operated Pipeline for Voice of Customer Discovery | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/tripathi23_interspeech.pdf) |
+| 2653 | NeMo Forced Aligner and its Application to Word Alignment for Subtitle Generation | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/rastorgueva23_interspeech.pdf) |
+| 2662 | CauSE: Causal Search Engine for Understanding Contact-Center Conversations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/pattnaik23_interspeech.pdf)|
+| 2663 | Tailored Real-Time Call Summarization System for Contact Centers | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/sachdeva23_interspeech.pdf) |
+| 2647 | Federated Learning Toolkit with Voice-based User Verification Demo | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/mandke23_interspeech.pdf) |
+| 2657 | Learning when to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models | [![GitHub](https://img.shields.io/github/stars/liamdugan/speech-to-speech?style=flat)](https://github.com/liamdugan/speech-to-speech) | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/dugan23_interspeech.pdf)<br>[![arXiv](https://img.shields.io/badge/arXiv-2306.01201-b31b1b.svg)](https://arxiv.org/abs/2306.01201) |
+| 2628 | Fast Enrollable Streaming Keyword Spotting System: Training and Inference using a Web Browser | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/cho23b_interspeech.pdf) |
+| 2665 | Cross-Lingual/Cross-Channel Intent Detection in Contact-Center Conversations | :heavy_minus_sign: | [![ISCA](https://img.shields.io/badge/isca-version-355778.svg)](https://www.isca-archive.org/interspeech_2023/agrawal23b_interspeech.pdf) |
---