phat 21(27): e2

Research Article

Fusion of Attentional and Traditional Convolutional Networks for Facial Expression Recognition

  • @ARTICLE{10.4108/eai.17-3-2021.169033,
        author={Tin Trung Nguyen and Thai Hoang Le},
        title={Fusion of Attentional and Traditional Convolutional Networks for Facial Expression Recognition},
        journal={EAI Endorsed Transactions on Pervasive Health and Technology},
        volume={7},
        number={27},
        publisher={EAI},
        journal_a={PHAT},
        year={2021},
        month={3},
        keywords={Facial Expression Recognition, Convolutional Network, Ensemble Learning, Attentional Convolutional Network},
        doi={10.4108/eai.17-3-2021.169033}
    }
    
  • Tin Trung Nguyen
    Thai Hoang Le
    Year: 2021
    Fusion of Attentional and Traditional Convolutional Networks for Facial Expression Recognition
    PHAT
    EAI
    DOI: 10.4108/eai.17-3-2021.169033
Tin Trung Nguyen1,2, Thai Hoang Le1,2,*
  • 1: Faculty of Information Technology, University of Science, Ho Chi Minh City, Vietnam
  • 2: Vietnam National University, Ho Chi Minh City, Vietnam
*Contact email: lhthai@fit.hcmus.edu.vn

Abstract

INTRODUCTION: Facial expression classification has been studied by many researchers. However, classifying facial expressions effectively on highly challenging datasets remains difficult. In recent years, the Squeeze-and-Excitation block (SE-block), a self-weighting technique that evaluates the importance of each feature map output by a convolution layer in a Convolutional Neural Network (CNN), has shown high efficiency in many practical applications.
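As an illustration of the self-weighting idea, the following is a minimal pure-Python sketch of a generic Squeeze-and-Excitation block, not the authors' implementation. The function names (`squeeze`, `dense`, `se_block`) and the weight arguments are hypothetical, and the two-layer ReLU/sigmoid gating is assumed from the commonly described SE design rather than taken from this paper.

```python
import math

def squeeze(feature_maps):
    """Squeeze step: global average pooling, one scalar per channel."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]

def dense(weights, bias, vector):
    """Fully connected layer: weights has one row per output unit."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]

def se_block(feature_maps, w1, b1, w2, b2):
    """Reweight each channel of feature_maps (C x H x W nested lists).

    w1/b1 reduce C channels to a smaller hidden size (assumed design),
    w2/b2 expand back to C gating values in (0, 1).
    """
    z = squeeze(feature_maps)
    hidden = [max(0.0, h) for h in dense(w1, b1, z)]        # ReLU
    scales = [1.0 / (1.0 + math.exp(-e))                    # sigmoid
              for e in dense(w2, b2, hidden)]
    # Scale step: multiply every value in a channel by that channel's weight.
    scaled = [
        [[s * v for v in row] for row in fmap]
        for fmap, s in zip(feature_maps, scales)
    ]
    return scaled, scales
```

In a trained network the weights are learned, so channels that carry more useful information receive gating values closer to 1.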

OBJECTIVES: To balance speed and accuracy in facial expression classification, we propose two novel model architectures.

METHODS: The two models proposed in this paper are: (1) a SqueezeNet model combined with a Squeeze-and-Excitation block, and (2) a SqueezeNet with Complex Bypass combined with a Squeeze-and-Excitation block. These models are evaluated on complex facial expression datasets. Furthermore, ensemble learning has been shown to be effective for combining models. Therefore, to improve classification performance and to compare with state-of-the-art methods, we additionally use the Inception-ResNet V1 model (3). We then combine the three models (1), (2), and (3) to classify facial expressions.
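The abstract does not state how the three models are fused. As a hedged sketch only, the snippet below assumes a simple (optionally weighted) average of the per-class probabilities produced by each model; the function name `fuse_predictions` and its parameters are hypothetical and may differ from the authors' fusion rule.

```python
def fuse_predictions(prob_vectors, weights=None):
    """Combine per-model class probabilities by (weighted) averaging.

    prob_vectors: one probability list per model, all of equal length.
    weights: optional per-model weights; defaults to a uniform average
             (an assumption, not taken from the paper).
    Returns the index of the winning class and the fused probabilities.
    """
    n_models = len(prob_vectors)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_classes = len(prob_vectors[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, prob_vectors))
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=fused.__getitem__)
    return label, fused

# Example with made-up softmax outputs from three models over two classes.
label, fused = fuse_predictions([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
```

Averaging probabilities lets a model that is confident about a class outweigh two models that are only weakly inclined the other way, which is one common motivation for this kind of ensemble.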

RESULTS: The proposed model achieves high accuracy on three datasets: 99.10% on the Extended Cohn-Kanade (CK+) dataset with seven basic emotions (using the last 3 frames), 94.20% on the Oulu-CASIA dataset with six basic emotions (from the 7th frame), and 74.89% on FER2013.

CONCLUSION: Experimental results on highly challenging datasets (the Extended Cohn-Kanade, FER2013, and Oulu-CASIA) show the effectiveness of the two proposed models and of the combination of the three models.