
Research Article
Enhancing Video Based Emotion Recognition with Multi-Head Attention and Modality Dropout
@INPROCEEDINGS{10.4108/eai.21-11-2024.2354608,
  author={Xu Li},
  title={Enhancing Video Based Emotion Recognition with Multi-Head Attention and Modality Dropout},
  proceedings={Proceedings of the 2nd International Conference on Machine Learning and Automation, CONF-MLA 2024, November 21, 2024, Adana, Turkey},
  publisher={EAI},
  proceedings_a={CONF-MLA},
  year={2025},
  month={3},
  keywords={multimodal model; emotion recognition; modality dropout},
  doi={10.4108/eai.21-11-2024.2354608}
}
- Author: Xu Li
- Year: 2025
- Conference: CONF-MLA
- Publisher: EAI
- DOI: 10.4108/eai.21-11-2024.2354608
Abstract
Multimodal emotion recognition has become a critical component of human-computer interaction systems because of its capacity to integrate complementary modalities. In this paper, we propose CFNSR-MSAFNet, a novel cross-modal fusion model that combines a multi-head attention (MHA) mechanism with modality dropout to improve the accuracy of emotion recognition. The MHA mechanism allows the model to attend to multiple representation subspaces of the audio and video inputs, capturing complex interactions between the two modalities. Additionally, modality dropout is applied during training, forcing the model to learn representations that remain robust to missing or noisy inputs. The proposed model achieves 78.33% accuracy on the RAVDESS dataset. Our results demonstrate the effectiveness of MHA and modality dropout in improving the performance of multimodal emotion recognition systems by enhancing cross-modal alignment and generalization.
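The abstract names two techniques, cross-modal multi-head attention and modality dropout, without specifying their implementation. The following PyTorch sketch illustrates how the two can be combined; the module name `CrossModalFusion`, the feature dimensions, the dropout probability, and the pooling/classifier choices are all illustrative assumptions, not the authors' CFNSR-MSAFNet architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Illustrative MHA-based audio-video fusion with modality dropout.

    Each modality's features query the other modality through multi-head
    attention; during training, one modality is occasionally zeroed out so
    the model cannot over-rely on either stream.
    """

    def __init__(self, dim=256, num_heads=8, p_drop_modality=0.2, num_classes=8):
        super().__init__()
        # Audio features attend over video features, and vice versa.
        self.audio_attends_video = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.video_attends_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.p_drop_modality = p_drop_modality
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, audio, video):
        # audio: (B, T_a, dim), video: (B, T_v, dim) -- pre-extracted features.
        if self.training and torch.rand(1).item() < self.p_drop_modality:
            # Modality dropout: zero out exactly one modality (never both),
            # forcing robustness to a missing or unreliable stream.
            if torch.rand(1).item() < 0.5:
                audio = torch.zeros_like(audio)
            else:
                video = torch.zeros_like(video)
        # Cross-modal multi-head attention in both directions:
        # audio queries attend over video keys/values, and vice versa.
        a_fused, _ = self.audio_attends_video(audio, video, video)
        v_fused, _ = self.video_attends_audio(video, audio, audio)
        # Mean-pool over time, concatenate the two streams, and classify.
        pooled = torch.cat([a_fused.mean(dim=1), v_fused.mean(dim=1)], dim=1)
        return self.classifier(pooled)

# Usage with random stand-in features (RAVDESS labels 8 emotion classes):
model = CrossModalFusion()
audio = torch.randn(4, 50, 256)  # batch of 4 clips, 50 audio frames each
video = torch.randn(4, 30, 256)  # batch of 4 clips, 30 video frames each
logits = model(audio, video)     # shape: (4, 8)
```

Zeroing out at most one modality per step keeps at least one informative stream in every training example, which is what lets the cross-modal attention layers learn usable unimodal fallbacks alongside the fused representation.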