
Research Article
Deepfake Face Recognition using Pretrained Vision Transformers and LSTMs
@INPROCEEDINGS{10.4108/eai.28-4-2025.2357797,
  author={M. Sowmya and D. Bandhavi and P. Vinaya Padma Sri Harshitha and S. Bhargav Rama Raju and M. Kapil Raj and Arul Elango},
  title={Deepfake Face Recognition using Pretrained Vision Transformers and LSTMs},
  proceedings={Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part I},
  publisher={EAI},
  proceedings_a={ICITSM PART I},
  year={2025},
  month={10},
  keywords={deepfake detection, vision transformers (vit), long short-term memory (lstm), cybersecurity, identity fraud, misinformation},
  doi={10.4108/eai.28-4-2025.2357797}
}
M. Sowmya
D. Bandhavi
P. Vinaya Padma Sri Harshitha
S. Bhargav Rama Raju
M. Kapil Raj
Arul Elango
Year: 2025
ICITSM PART I
EAI
DOI: 10.4108/eai.28-4-2025.2357797
Abstract
Advances in deepfake technology raise significant concerns about the authenticity and reliability of digital media, because deep learning models can now produce hyper-realistic artificial facial images and videos. Deepfake technologies have extensive uses in entertainment and virtual reality, but they also enable misinformation and identity theft and pose risks to cybersecurity. This paper presents an in-depth analysis of deepfake face detection methods that leverage state-of-the-art deep learning techniques. Specifically, we use Vision Transformers (ViT) and Long Short-Term Memory (LSTM) networks for both image- and video-based detection, leveraging self-attention mechanisms to learn spatial and temporal dependencies. Our method incorporates advanced feature extraction techniques, such as frequency-domain analysis and attention-based representations, to enhance detection accuracy. We tested these models on benchmark datasets, assessing their adversarial robustness and their generalizability across various deepfake generation methods. The system demonstrates the ability to accurately detect manipulated facial videos and images and offers real-time classification results via an interactive interface.
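The temporal half of the pipeline described in the abstract can be illustrated with a minimal sketch. It assumes each video frame has already been reduced to a fixed-length embedding (in the paper this role is played by a pretrained ViT; here random vectors stand in for those embeddings) and passes the sequence through a single LSTM cell followed by a sigmoid read-out. The class name `TinyLSTM`, the method `fake_probability`, the dimensions, and the randomly initialised, untrained weights are all hypothetical choices for illustration; the abstract does not specify the authors' actual architecture or hyperparameters.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TinyLSTM:
    """Single LSTM cell with untrained random weights (illustrative only)."""

    GATES = ("input", "forget", "output", "cand")

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.W = {n: mat(hidden_dim, input_dim + hidden_dim) for n in self.GATES}
        self.b = {n: [0.0] * hidden_dim for n in self.GATES}
        # Linear read-out from the final hidden state to a single logit.
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]

    def step(self, x, h, c):
        z = x + h  # list concatenation of current input and previous hidden state
        pre = {n: [a + b for a, b in zip(matvec(self.W[n], z), self.b[n])]
               for n in self.GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        g = [math.tanh(v) for v in pre["cand"]]
        # Standard LSTM update: c_t = f * c_{t-1} + i * g, h_t = o * tanh(c_t).
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def fake_probability(self, frame_embeddings):
        # Run the cell over the frame sequence, then squash the read-out logit.
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in frame_embeddings:
            h, c = self.step(list(x), h, c)
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)))


# Stand-in for ViT outputs: 5 frames, each an 8-dimensional embedding.
rng = random.Random(1)
clip = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
model = TinyLSTM(input_dim=8, hidden_dim=4)
p = model.fake_probability(clip)
print(f"score for this clip (untrained weights): {p:.3f}")
```

Because the weights are untrained, the printed score carries no meaning; the sketch only shows how per-frame features could be aggregated over time into a single per-clip score. A practical system would more likely use a deep learning framework with learned weights.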