Audio-Visual Speech Recognition using Sequence to Sequence Models
Updated Jul 10, 2020 - Python
Audio-visual corruption modeling from our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR 2023)
The official implementation of OpenSR (ACL 2023 Oral)
Official repo for "Audio-Visual Speech Recognition In-the-Wild: Multi-Angle Vehicle Cabin Corpus and Attention-based Method" in ICASSP 2024
Kaldi-based audio-visual speech recognition
This repository contains the development of SynthAVSR, the first Audiovisual Speech Recognition (AVSR) system tailored for the Spanish and Catalan languages. Based on the AV-HuBERT (Audio-Visual Hidden Unit BERT) model, SynthAVSR leverages synthetic audiovisual data to bridge the gap in speech recognition technology for these languages.