
Research Article
A Review on Deep Learning-Based Automatic Lipreading
@INPROCEEDINGS{10.1007/978-3-031-32029-3_17,
  author={Carlos Santos and Ant\'{o}nio Cunha and Paulo Coelho},
  title={A Review on Deep Learning-Based Automatic Lipreading},
  proceedings={Wireless Mobile Communication and Healthcare. 11th EAI International Conference, MobiHealth 2022, Virtual Event, November 30 -- December 2, 2022, Proceedings},
  proceedings_a={MOBIHEALTH},
  year={2023},
  month={5},
  keywords={Automatic Lip-reading; Deep Learning; Audio-visual; Automatic Speech Recognition},
  doi={10.1007/978-3-031-32029-3_17}
}
Carlos Santos
António Cunha
Paulo Coelho
Year: 2023
A Review on Deep Learning-Based Automatic Lipreading
MOBIHEALTH
Springer
DOI: 10.1007/978-3-031-32029-3_17
Abstract
Automatic Lip-Reading (ALR), also known as Visual Speech Recognition (VSR), is the process of extracting and recognizing speech content based solely on the visual analysis of the speaker’s lip movements. Besides hearing-impaired people, people with normal hearing also resort to visual cues to disambiguate words whenever they are in a noisy environment. Due to the increasing interest in developing ALR systems, a considerable number of research articles is being published. This article selects, analyses, and summarizes the main papers from 2018 to early 2022, covering approaches that range from traditional methods with handcrafted feature extraction algorithms to end-to-end deep learning-based ALR systems, which take full advantage of learned features and of the ever-growing set of publicly available databases. By providing a recent state-of-the-art overview, identifying trends, and presenting a conclusion on what is to be expected in future work, this article offers an efficient way to stay up to date on the most relevant ALR techniques.
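To make the end-to-end approach mentioned in the abstract concrete, the following is a minimal sketch of the generic pipeline such systems share: a spatiotemporal convolutional frontend that learns visual features directly from mouth-region crops (replacing handcrafted descriptors), a recurrent temporal backend, and a per-frame classifier, typically trained with a CTC loss. All layer sizes, the 64x64 grayscale input, and the 28-class character vocabulary are illustrative assumptions for this sketch (in PyTorch), not the architecture of any specific paper covered by the review.

import torch
import torch.nn as nn

class LipReadingNet(nn.Module):
    def __init__(self, num_classes=28):  # e.g. 26 letters + space + CTC blank (assumed vocabulary)
        super().__init__()
        # Spatiotemporal frontend: learns visual features directly from
        # grayscale mouth-region frames instead of handcrafted descriptors.
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        # Temporal backend: models lip-movement dynamics across frames.
        self.backend = nn.GRU(input_size=32 * 16 * 16, hidden_size=256,
                              num_layers=2, batch_first=True, bidirectional=True)
        # Per-frame character logits, e.g. for CTC decoding.
        self.classifier = nn.Linear(2 * 256, num_classes)

    def forward(self, x):
        # x: (batch, 1, time, 64, 64) grayscale mouth crops
        feats = self.frontend(x)                      # -> (B, 32, T, 16, 16)
        b, c, t, h, w = feats.shape
        feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        out, _ = self.backend(feats)                  # -> (B, T, 512)
        return self.classifier(out)                   # -> (B, T, num_classes)

model = LipReadingNet()
logits = model(torch.randn(2, 1, 75, 64, 64))  # e.g. two 75-frame clips
print(logits.shape)  # torch.Size([2, 75, 28])

The appeal of this end-to-end design, as surveyed in the review, is that the frontend and backend are optimized jointly on large audio-visual corpora, so the "best" visual features are learned rather than engineered.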