2044 | Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model |
2032 | Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization |
235 | Progress and Prospects for Spoken Language Technology: Results from Five Sexennial Surveys | ➖ |
268 | Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling | ➖ |
601 | CASA-ASR: Context-Aware Speaker-Attributed ASR | ➖ |
1321 | Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams | ➖ |
1167 | AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in the SUPERB Benchmark |
190 | Distilling Knowledge from Gaussian Process Teacher to Neural Network Student | ➖ |
135 | Segmental SpeechCLIP: Utilizing Pretrained Image-Text Models for Audio-Visual Learning | ➖ |
421 | Towards Hate Speech Detection in Low-Resource Languages: Comparing ASR to Acoustic Word Embeddings on Wolof and Swahili | ➖ |
385 | Mitigating Catastrophic Forgetting for Few-Shot Spoken Word Classification through Meta-Learning |
664 | Online Punctuation Restoration using ELECTRA Model for Streaming ASR Systems | ➖ |
2066 | Language Agnostic Data-Driven Inverse Text Normalization | ➖ |
1079 | How to Estimate Model Transferability of Pre-trained Speech Models? | ➖ |
1655 | Transcribing Speech as Spoken and Written Dual Text using an Autoregressive Model | ➖ |
587 | Phonetic and Prosody-aware Self-Supervised Learning Approach for Non-Native Fluency Scoring | ➖ |
380 | Disentangling the Contribution of Non-Native Speech in Automated Pronunciation Assessment | ➖ |
337 | A Joint Model for Pronunciation Assessment and Mispronunciation Detection and Diagnosis with Multi-task Learning | ➖ |
1635 | Assessing Intelligibility in Non-Native Speech: Comparing Measures Obtained at Different Levels | ➖ |
585 | End-to-End Word-Level Pronunciation Assessment with MASK Pre-training |
550 | A Hierarchical Context-aware Modeling Approach for Multi-Aspect and Multi-Granular Pronunciation Assessment | ➖ |
2541 | Automatic Prediction of Language Learners' Listenability using Speech and Text Features Extracted from Listening Drills | ➖ |
2371 | Assessment of Non-Native Speech Intelligibility using Wav2vec2-based Mispronunciation Detection and Multi-Level Goodness of Pronunciation Transformer | ➖ |
1899 | Adapting an Unadaptable ASR System | ➖ |
533 | Addressing Cold Start Problem for End-to-End Automatic Speech Scoring | ➖ |
816 | Improving Grapheme-to-Phoneme Conversion by Learning Pronunciations from Speech Recordings | ➖ |
2577 | Orthography-based Pronunciation Scoring for Better CAPT Feedback | ➖ |
1592 | Zero-Shot Automatic Pronunciation Assessment | ➖ |
364 | Mispronunciation Detection and Diagnosis Model for Tonal Language, Applied to Vietnamese |
793 | An Efficient and Noise-Robust Audiovisual Encoder for Audiovisual Speech Recognition | ➖ |
540 | A Novel Self-training Approach for Low-Resource Speech Recognition | ➖ |
1428 | FunASR: A Fundamental End-to-End Speech Recognition Toolkit |
487 | Streaming Audio-Visual Speech Recognition with Alignment Regularization | ➖ |
462 | SparseVSR: Lightweight and Noise Robust Visual Speech Recognition | ➖ |
2262 |
Multimodal Speech Recognition for Language-Guided Embodied Agents |