
Research Article
A Review on Deep Learning-Based Automatic Lipreading
@INPROCEEDINGS{10.1007/978-3-031-32029-3_17,
  author={Carlos Santos and Ant\'{o}nio Cunha and Paulo Coelho},
  title={A Review on Deep Learning-Based Automatic Lipreading},
  proceedings={Wireless Mobile Communication and Healthcare. 11th EAI International Conference, MobiHealth 2022, Virtual Event, November 30 -- December 2, 2022, Proceedings},
  proceedings_a={MOBIHEALTH},
  year={2023},
  month={5},
  keywords={Automatic Lip-reading; Deep Learning; Audio-visual; Automatic Speech Recognition},
  doi={10.1007/978-3-031-32029-3_17}
}
Carlos Santos
António Cunha
Paulo Coelho
Year: 2023
A Review on Deep Learning-Based Automatic Lipreading
MOBIHEALTH
Springer
DOI: 10.1007/978-3-031-32029-3_17
Abstract
Automatic Lip-Reading (ALR), also known as Visual Speech Recognition (VSR), is the process of extracting and recognizing speech content based solely on the visual analysis of the speaker’s lip movements. Besides hearing-impaired people, people with normal hearing also resort to visual cues to disambiguate words whenever they are in a noisy environment. Due to the increasing interest in developing ALR systems, a considerable number of research articles is being published. This article selects, analyses, and summarizes the main papers from 2018 to early 2022, covering approaches that range from traditional methods with handcrafted feature extraction algorithms to end-to-end deep learning-based ALR systems, which take full advantage of learned features and of the ever-growing set of publicly available databases. By providing a recent state-of-the-art overview, identifying trends, and presenting a conclusion on what is to be expected in future work, this article offers an efficient way to stay up to date on the most relevant ALR techniques.
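To make the end-to-end approach mentioned in the abstract concrete, the following is a minimal sketch of the generic pipeline such systems share: a spatiotemporal convolutional frontend that learns visual features directly from mouth-region crops (replacing handcrafted descriptors), a recurrent temporal backend, and a per-frame classifier, typically trained with a CTC loss. All layer sizes, the 64x64 grayscale input, and the 28-class character vocabulary are illustrative assumptions for this sketch (in PyTorch), not the architecture of any specific paper covered by the review.

import torch
import torch.nn as nn

class LipReadingNet(nn.Module):
    def __init__(self, num_classes=28):  # e.g. 26 letters + space + CTC blank (assumed vocabulary)
        super().__init__()
        # Spatiotemporal frontend: learns visual features directly from
        # grayscale mouth-region frames instead of handcrafted descriptors.
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        # Temporal backend: models lip-movement dynamics across frames.
        self.backend = nn.GRU(input_size=32 * 16 * 16, hidden_size=256,
                              num_layers=2, batch_first=True, bidirectional=True)
        # Per-frame character logits, e.g. for CTC decoding.
        self.classifier = nn.Linear(2 * 256, num_classes)

    def forward(self, x):
        # x: (batch, 1, time, 64, 64) grayscale mouth crops
        feats = self.frontend(x)                      # -> (B, 32, T, 16, 16)
        b, c, t, h, w = feats.shape
        feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        out, _ = self.backend(feats)                  # -> (B, T, 512)
        return self.classifier(out)                   # -> (B, T, num_classes)

model = LipReadingNet()
logits = model(torch.randn(2, 1, 75, 64, 64))  # e.g. two 75-frame clips
print(logits.shape)  # torch.Size([2, 75, 28])

The appeal of this end-to-end design, as surveyed in the review, is that the frontend and backend are optimized jointly on large audio-visual corpora, so the "best" visual features are learned rather than engineered.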