INTERSPEECH-2023-Papers


Speech Recognition: Technologies and Systems for New Applications


| 🆔 | Title | Repo | Paper |
|---|---|---|---|
| 2044 | Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model | GitHub | ISCA, arXiv |
| 2032 | Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization | GitHub | ISCA, arXiv |
| 235 | Progress and Prospects for Spoken Language Technology: Results from Five Sexennial Surveys | | ISCA |
| 268 | Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling | | ISCA, arXiv |
| 601 | CASA-ASR: Context-Aware Speaker-Attributed ASR | | ISCA, arXiv |
| 1321 | Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams | | ISCA |
| 1167 | AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in the SUPERB Benchmark | GitHub | ISCA |
| 190 | Distilling Knowledge from Gaussian Process Teacher to Neural Network Student | | ISCA |
| 135 | Segmental SpeechCLIP: Utilizing Pretrained Image-Text Models for Audio-Visual Learning | | ISCA |
| 421 | Towards Hate Speech Detection in Low-Resource Languages: Comparing ASR to Acoustic Word Embeddings on Wolof and Swahili | | ISCA, arXiv |
| 385 | Mitigating Catastrophic Forgetting for Few-Shot Spoken Word Classification through Meta-Learning | GitHub | ISCA, arXiv |
| 664 | Online Punctuation Restoration using ELECTRA Model for Streaming ASR Systems | | ISCA |
| 2066 | Language Agnostic Data-Driven Inverse Text Normalization | | ISCA, arXiv |
| 1079 | How to Estimate Model Transferability of Pre-trained Speech Models? | | ISCA, arXiv |
| 1655 | Transcribing Speech as Spoken and Written Dual Text using an Autoregressive Model | | ISCA |
| 587 | Phonetic and Prosody-aware Self-Supervised Learning Approach for Non-Native Fluency Scoring | | ISCA, arXiv |
| 380 | Disentangling the Contribution of Non-Native Speech in Automated Pronunciation Assessment | | ISCA |
| 337 | A Joint Model for Pronunciation Assessment and Mispronunciation Detection and Diagnosis with Multi-task Learning | | ISCA |
| 1635 | Assessing Intelligibility in Non-Native Speech: Comparing Measures Obtained at Different Levels | | ISCA |
| 585 | End-to-End Word-Level Pronunciation Assessment with MASK Pre-training | GitHub | ISCA, arXiv |
| 550 | A Hierarchical Context-aware Modeling Approach for Multi-Aspect and Multi-Granular Pronunciation Assessment | | ISCA, arXiv |
| 2541 | Automatic Prediction of Language Learners' Listenability using Speech and Text Features Extracted from Listening Drills | | ISCA |
| 2371 | Assessment of Non-Native Speech Intelligibility using Wav2vec2-based Mispronunciation Detection and Multi-Level Goodness of Pronunciation Transformer | | ISCA |
| 1899 | Adapting an Unadaptable ASR System | | ISCA, arXiv |
| 533 | Addressing Cold Start Problem for End-to-End Automatic Speech Scoring | | ISCA, arXiv |
| 816 | Improving Grapheme-to-Phoneme Conversion by Learning Pronunciations from Speech Recordings | | ISCA, Amazon Science |
| 2577 | Orthography-based Pronunciation Scoring for Better CAPT Feedback | | ISCA, PDF |
| 1592 | Zero-Shot Automatic Pronunciation Assessment | | ISCA, arXiv |
| 364 | Mispronunciation Detection and Diagnosis Model for Tonal Language, Applied to Vietnamese | GitHub | ISCA |
| 793 | An Efficient and Noise-Robust Audiovisual Encoder for Audiovisual Speech Recognition | | ISCA |
| 540 | A Novel Self-training Approach for Low-Resource Speech Recognition | | ISCA |
| 1428 | FunASR: A Fundamental End-to-End Speech Recognition Toolkit | GitHub | ISCA, arXiv |
| 487 | Streaming Audio-Visual Speech Recognition with Alignment Regularization | | ISCA, arXiv |
| 462 | SparseVSR: Lightweight and Noise Robust Visual Speech Recognition | | ISCA, arXiv |
| 2262 | Multimodal Speech Recognition for Language-Guided Embodied Agents | GitHub | ISCA, arXiv |