Facial emotion recognition via stationary wavelet entropy and Biogeography-based optimization

INTRODUCTION: As one of the important research directions in computer vision, facial emotion recognition plays an important role in people's daily life. How to make a computer accurately read facial emotion is an important research topic. OBJECTIVES: Current research on facial emotion recognition suffers from problems such as the poor generalization ability of network models and the low robustness of recognition systems. To solve these problems, we propose a novel facial emotion recognition method. METHODS: Our method uses stationary wavelet entropy for feature extraction and combines a single hidden layer feedforward neural network with biogeography-based optimization for facial emotion recognition. RESULTS: The simulation results show that the overall accuracy of our method is 93.79±1.24%. CONCLUSION: This model is superior to current mainstream facial emotion recognition models in recognition performance. In future research, we will try deep learning and other training methods.


Introduction
With the rapid development of computer technology and neural network technology, people demand an ever higher degree of automation. We yearn for communication between people and computers to be the same as that between people. With the development of convolutional neural networks and their characteristics, many scholars tend to use them to extract image features. Ali, et al. [2] proposed to use the support vector machine (SVM) method. Evans [3] used the Haar wavelet transform (HWT) method. Ivanovsky, et al. [4] proposed to use a convolutional neural network on GPU for feature extraction in facial smile emotion recognition. Hasani and Mahoor [5] proposed a converged network architecture that uses three sets of Inception-Resnet modules and combines conditional random fields (CRF) to greatly improve the accuracy of facial emotion recognition. Yang [6] introduced cat swarm optimization (CSO) and achieved good results in facial emotion recognition. Lucy, et al. [7] proposed a double-channel convolutional neural network in which the first channel takes extracted eye features as input and the second takes extracted mouth features. The two feature sets are fused and fed into the fully connected layer to improve the accuracy of facial emotion recognition.
From the above analysis, we find that the facial emotion features extracted by these methods are not stable under translation and easily lose the original emotional information. Moreover, the network models have poor generalization and weak robustness. When the operating environment of the network model changes slightly, the model may get stuck while running, which directly affects the speed and accuracy of facial emotion recognition.
To solve the above problems, we propose an improved facial emotion recognition model. It introduces stationary wavelet entropy (SWE) for feature extraction, takes a single hidden layer feedforward neural network (SHLFNN) as the classifier, and uses biogeography-based optimization (BBO) as the training method. The simulation experiments below show that the recognition performance of the proposed method is better than that of state-of-the-art approaches.
The rest of the paper is organized as follows. The second part introduces the subjects and data set. The third part introduces the concepts and basic principles of the method. The fourth part presents and analyzes the experimental results. The fifth part summarizes the paper and discusses future work.

Dataset
To make the experiment easier to carry out and the results easier to compare, this paper adopts the data set from [8]. That data set was collected by an experienced photographer; see Table 1 and Figure 1. In total, we have 700 images.

Table 1. Dataset of face models
Let D be the distance between the centers of the two eyes; we then perform the following modification.

Stationary Wavelet Entropy
The concept of wavelet entropy (WE) was put forward by Rosso, et al. [10] when analyzing the electrical signals of the human brain. Here, a wavelet is a short, localized oscillating function, and entropy is a quantity that describes the complexity or disorder of a system: the greater the entropy, the greater the complexity or disorder. Entropy is often used as a reference in evaluating time-domain physiological signals and diagnosing diseases [11][12][13]. Traditional WE is based on the wavelet transform (WT). Since WT is sensitive to changes in signal detail, it can extract the transient, local detail features of non-stationary signals. Thus WE is often used for physiological signal analysis [14][15][16] and can also extract texture features from facial emotion images.
WE is calculated as $S = -\sum_{j} p_j \ln p_j$, where $j$ represents the index of state characteristics of a signal and $p_j$ is the probability of state $j$.
Spatial normalization of images is a prerequisite for WE to have the above advantages in image feature extraction. Otherwise, even a slight change to the image directly affects the recognition result. In recent years, scholars have proposed SWE in response to this problem, which uses the stationary wavelet transform (SWT) instead of the WT. According to the experiments in [17][18][19][20], SWE performs better than WE in hearing loss detection, gene expression analysis and facial emotion recognition. In this study, the decomposition level is set to 4. Some other advanced wavelet descriptors [21][22][23][24][25] will be used in our future studies.
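To illustrate why replacing the WT with the SWT helps under translation, the following is a minimal sketch and not the implementation used in this paper. It assumes a Haar wavelet, periodic boundary extension, a 1D signal in place of a 2D image, and a Shannon entropy over normalized squared coefficients; the function names `haar_swt`, `shannon_entropy` and `swe_features` are hypothetical.

```python
import math

def haar_swt(signal, levels):
    """Undecimated (stationary) Haar transform with periodic extension.

    Returns (approximation, [detail_1, ..., detail_levels]). Every band keeps
    the input length because no downsampling is performed.
    """
    n = len(signal)
    approx = list(signal)
    details = []
    for level in range(levels):
        step = 2 ** level  # filter taps are spread apart instead of downsampling
        low = [(approx[i] + approx[(i + step) % n]) / math.sqrt(2) for i in range(n)]
        high = [(approx[i] - approx[(i + step) % n]) / math.sqrt(2) for i in range(n)]
        details.append(high)
        approx = low
    return approx, details

def shannon_entropy(coeffs):
    """Shannon entropy of the normalized energy distribution of one subband."""
    energy = [c * c for c in coeffs]
    total = sum(energy)
    if total == 0.0:
        return 0.0
    probs = [e / total for e in energy]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def swe_features(signal, levels=4):
    """One entropy value per detail band plus one for the final approximation."""
    approx, details = haar_swt(signal, levels)
    return [shannon_entropy(band) for band in details + [approx]]

# A circular shift of the input leaves the entropy features unchanged
# (up to floating-point rounding), which is the translation stability
# that motivates SWE.
signal = [math.sin(0.3 * i) + 0.05 * i for i in range(64)]
shifted = signal[5:] + signal[:5]
print(swe_features(signal))
print(swe_features(shifted))
```

With a decimated WT, the same shift would generally change the coefficients and hence the entropies, because downsampling discards samples at shift-dependent positions.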

Single-hidden-layer Feedforward Neural Network
According to the definition in Wikipedia, a neural network, in full an artificial neural network (ANN), is a computational model that simulates the structure and function of the biological nervous system and is used to estimate or approximate functions.
Neural networks belong to the fields of machine learning and cognitive science. A network has many neurons that can change their internal structure based on external information and thereby acquire the ability to learn. SHLFNN is one of the structures of artificial neural networks.
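As a rough illustration of such a network, the following sketch computes the forward pass of a fully connected network with one hidden layer. The sigmoid hidden activation, the linear output layer, and the names `shlfnn_forward`, `w_hidden`, `b_hidden`, `w_out` and `b_out` are assumptions made for illustration; the paper's own output expression is the one cited from [31].

```python
import math

def sigmoid(z):
    """Logistic activation, assumed here for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))

def shlfnn_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: input -> fully connected hidden layer -> fully connected output.

    w_hidden has one row of input weights per hidden neuron;
    w_out has one row of hidden weights per output neuron.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]

# Two inputs, two hidden neurons, one output neuron.
output = shlfnn_forward(
    x=[0.0, 0.0],
    w_hidden=[[0.0, 0.0], [0.0, 0.0]],
    b_hidden=[0.0, 0.0],
    w_out=[[1.0, 1.0]],
    b_out=[0.5],
)
print(output)
```

When such a network is trained by a population-based optimizer, one plausible encoding (again an assumption) is to flatten all weights and biases into a single vector that serves as one candidate solution.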
The structure of SHLFNN consists of three parts: the input layer, the hidden layer and the output layer. Neurons in adjacent layers are fully connected. Figure 2 is the diagram of the SHLFNN network structure. According to the universal approximation theorem [26], SHLFNN has a strong approximation ability and can approximate the expected output arbitrarily closely. Owing to its simplicity and practicality, difficult nonlinear mappings can be accomplished at low cost [27][28][29][30]. Therefore, SHLFNN is one of the most important models among feedforward neural networks and has been widely used. An expression for the output of SHLFNN is given in [31].

Biogeography-based Optimization
In BBO, a better suitability index variable (SIV) produces a higher habitat suitability index (HSI). The basic idea of BBO is as follows: for a problem to be solved, some candidate solutions (i.e. biological habitats) are proposed. Each candidate is scored by analyzing its fitness, and the candidates are ranked by score. According to the ranking, the top-ranked candidate is regarded as the optimal solution of the problem. At the same time, to better preserve the optimal solution, the algorithm usually introduces an elitist preservation strategy.
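The ranking and elitist preservation steps described above can be sketched as follows. This is an illustrative outline under stated assumptions and not the paper's implementation: the helper names `rank_habitats` and `apply_elitism`, the `n_elite` parameter, and the toy fitness function are all hypothetical.

```python
def rank_habitats(habitats, hsi):
    """Return the habitats sorted by HSI (fitness), best first."""
    return sorted(habitats, key=hsi, reverse=True)

def apply_elitism(previous_ranked, new_population, hsi, n_elite=2):
    """Carry the best habitats of the previous generation into the new one.

    The n_elite worst habitats of the new population are dropped so that
    the population size stays constant.
    """
    elites = previous_ranked[:n_elite]
    keep = len(new_population) - n_elite
    survivors = rank_habitats(new_population, hsi)[:keep]
    return rank_habitats(elites + survivors, hsi)

# Toy fitness: habitats closer to the origin have a higher HSI.
def toy_hsi(habitat):
    return -sum(v * v for v in habitat)

ranked = rank_habitats([[3.0], [1.0], [2.0]], toy_hsi)
next_gen = apply_elitism(ranked, [[5.0], [4.0], [6.0]], toy_hsi, n_elite=1)
print(ranked)
print(next_gen)
```

Keeping the elites outside the migration and mutation steps guarantees that the best HSI found so far never decreases from one generation to the next.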

Migration operation
The optimization effect of the BBO algorithm mainly depends on the migration operator, which realizes information sharing and interaction between habitats. The HSI value corresponds to the number of species in the habitat.
The higher the HSI value, the larger the number of species in the habitat, the higher the emigration rate and the lower the immigration rate. Conversely, the lower the HSI value, the smaller the number of species, the lower the emigration rate and the higher the immigration rate. The immigration and emigration rates can therefore be expressed as functions of the number of species in the habitat.
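One common way to express this relationship is the linear migration model from the BBO literature, in which a habitat with k of n possible species has immigration rate I(1 - k/n) and emigration rate E·k/n. The sketch below assumes that model, assigns species counts by rank, and uses hypothetical names (`migration_rates`, `max_immigration`, `max_emigration`); it is not necessarily the exact formula used in this paper.

```python
def migration_rates(n_habitats, max_immigration=1.0, max_emigration=1.0):
    """Linear migration model for habitats sorted best (highest HSI) first.

    Returns a list of (immigration_rate, emigration_rate) pairs, one per rank.
    """
    rates = []
    for rank in range(n_habitats):
        species = n_habitats - rank  # the best habitat holds the most species
        emigration = max_emigration * species / n_habitats
        immigration = max_immigration * (1.0 - species / n_habitats)
        rates.append((immigration, emigration))
    return rates

for rank, (imm, emi) in enumerate(migration_rates(4)):
    print(rank, imm, emi)
```

Under this model the best habitat mostly shares its features with others, while the worst habitat mostly accepts features, which matches the qualitative behaviour described in the text.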

Mutation operation
As mentioned above, the BBO algorithm selects habitats with low HSI values and performs the mutation operation on them by changing a certain SIV. An elitist preservation strategy is introduced to preserve the best solutions according to a user-defined elitism parameter. The mutation probability of a habitat mainly depends on its species number probability; that is, the species number of a habitat is closely related to its mutation probability.
From this, the probability of the mutation operation can be obtained as a function of the number of species. The detailed description of the BBO algorithm process is shown in Table 2. The overall flow diagram of the BBO algorithm is shown in Figure 5.
During execution, BBO computes the HSI of each habitat, and the habitats are sorted by HSI from largest to smallest, which gives each habitat a rank. The HSI represents the fitness of the habitats (candidate solutions).
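In the BBO literature, the dependence of mutation on the species number probability is commonly written as m_s = m_max (1 - P_s / P_max), where P_s is the probability that a habitat holds s species. The sketch below assumes that form; the function name `mutation_rates`, the parameter `max_mutation`, and the example probabilities are hypothetical and are not taken from this paper.

```python
def mutation_rates(species_probabilities, max_mutation=0.1):
    """Mutation rate per species count: m_s = m_max * (1 - P_s / P_max).

    Habitats whose species count is improbable receive a higher mutation rate;
    the most probable species count receives a rate of zero.
    """
    p_max = max(species_probabilities)
    return [max_mutation * (1.0 - p / p_max) for p in species_probabilities]

# Illustrative probabilities for four possible species counts.
print(mutation_rates([0.1, 0.4, 0.4, 0.1]))
```

Whether a habitat is actually mutated is then decided by comparing a uniform random number with its rate, with elite habitats exempted.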

Measure
To avoid overfitting, we chose the 10-fold cross validation technique [47]. Each group contained 10 images of each of the seven emotions: happy, sadness, fear, anger, surprise, disgust and neutral. In each trial of the 10-fold cross validation, eight groups were used for training, one for validation, and the remaining one for testing. For a more concise representation, we introduced the confusion matrix (CM).
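The assignment of groups to roles in each trial can be sketched as follows. Which group serves as the validation set in a given trial is not stated in the text, so taking the group after the test group is an assumption, and the function name `fold_roles` is hypothetical.

```python
def fold_roles(n_folds=10):
    """For each trial: one test group, one validation group, the rest for training."""
    trials = []
    for test_fold in range(n_folds):
        val_fold = (test_fold + 1) % n_folds  # assumed choice of validation group
        train = [f for f in range(n_folds) if f not in (test_fold, val_fold)]
        trials.append({"train": train, "val": val_fold, "test": test_fold})
    return trials

for trial in fold_roles():
    print(trial)
```

Each group is used for testing exactly once, so summing the test confusion matrices over the 10 trials covers every image once per run.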
With 10 images per emotion in each test group, the ideal CM of a single fold in a single run is:
$$C^{\mathrm{ideal}}(r=1, f=1) = \mathrm{diag}(10, 10, 10, 10, 10, 10, 10) \qquad (10)$$
where $C$ is the confusion matrix, $r$ is the number of runs, and $f$ is the number of folds. The above matrix is the ideal confusion matrix of one group in one run. Summing over the test sets of the 10 groups, the ideal CM of one run is:
$$C^{\mathrm{ideal}}(r=1, f=10) = \mathrm{diag}(100, 100, 100, 100, 100, 100, 100) \qquad (11)$$
where the elements on the diagonal are the sums over the test sets of the 10 experiment groups. To improve the accuracy of the experiment and reduce random error, we performed 10 runs of 10-fold cross validation and summed the CMs.
Thus, the ideal CM after $r = 10$ runs of $f = 10$ folds can be obtained as:
$$C^{\mathrm{ideal}}(r=10, f=10) = \mathrm{diag}(1000, 1000, 1000, 1000, 1000, 1000, 1000) \qquad (12)$$
For the network after the 10 runs of 10-fold cross validation, the sensitivity of each class and the overall accuracy (OA) are defined as:
$$S_i = \frac{C_{ii}}{\sum_{j} C_{ij}}, \qquad \mathrm{OA} = \frac{\sum_{i} C_{ii}}{\sum_{i} \sum_{j} C_{ij}}$$
where $C_{ij}$ is the number of images of true class $i$ that are predicted as class $j$.

Confusion matrix of proposed method
Table 3 is the confusion matrix produced by our model, which presents the experimental data of seven emotions: anger, disgust, fear, happy, neutral, sadness and surprise. Figure 7 is the bar chart corresponding to Table 3.

Figure 7. The bar chart corresponding to Table 3

EAI Endorsed Transactions on e-Learning | 10 2018 - 07 2020 | Volume 6 | Issue 19 | e4

Statistical results
Table 4 shows the sensitivity of the seven emotions over 10 runs. According to the data from Table 4 and Figure 8, the sensitivity of each emotion is as follows: 94.90±1.52% (anger), 93.70±2.00% (disgust), 94.80±0.92% (fear), 92.90±2.13% (happy), 92.40±2.22% (neutral), 94.10±1.52% (sadness), and 93.70±2.36% (surprise). Thus anger has the highest sensitivity and is the easiest to recognize, followed by fear and then sadness.
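Per-class sensitivity and overall accuracy of this kind are computed from a confusion matrix as in the sketch below. It assumes that rows are true classes and columns are predicted classes; the function names are hypothetical, and the small 3-class matrix is invented for illustration and is not data from this paper.

```python
def sensitivities(cm):
    """Per-class sensitivity: correct predictions divided by the row total."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]

def overall_accuracy(cm):
    """Sum of the diagonal divided by the sum of all entries."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Illustrative 3-class confusion matrix (rows: true class, columns: predicted).
example_cm = [
    [9, 1, 0],
    [0, 8, 2],
    [1, 0, 9],
]
print(sensitivities(example_cm))
print(overall_accuracy(example_cm))

# An ideal 7-class matrix with 1000 test images per class gives 100% everywhere.
ideal_cm = [[1000 if i == j else 0 for j in range(7)] for i in range(7)]
print(sensitivities(ideal_cm), overall_accuracy(ideal_cm))
```

The mean and standard deviation reported for each emotion would then be taken over the per-run values of these quantities.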
According to Table 5 and Figure 8, the overall average accuracy of the system after 10 runs is 93.79±1.24%.
A summarized comparison is shown in Table 7 and Figure 10. As can be seen from Table 7, the overall accuracy of the model increases with the decomposition level within the tested range, which indicates that, within this range, a higher decomposition level gives better emotion recognition (see also Table 3 and Table 4). In addition, Figure 10 shows this result more vividly.

Comparison to State-of-the-art Approaches
The OA of the "SWE+BBO" method used in this experiment is compared with that of three other methods: SVM [2], HWT [3] and CSO [6]. The results, including the OA of each method, are shown in Table 8. The "SWE+BBO" method has the highest accuracy (93.79±1.24%), followed by CSO; SVM is third, and HWT is the lowest.
We should note that there are currently several variants of the BBO algorithm, such as parallel hybrid BBO, alternated chaotic BBO and adaptive option-based BBO. In future research, we will test their performance. We will also attempt to use deep learning approaches [49][50][51][52][53] to realize facial emotion recognition.

Conclusions
In this paper, we proposed an improved facial emotion recognition system. We use stationary wavelet entropy for feature extraction and the BBO algorithm to train a single hidden layer feedforward neural network. The model achieved an overall accuracy of 93.79±1.24%. However, some aspects of our method remain to be explored, such as the influence of different orders of wavelet entropy on feature extraction.
In future research, we will continue to focus on facial emotion recognition and try to collect more emotional images than were used in this paper. We also plan to put forward a better algorithm to optimize the parameters of the single hidden layer feedforward neural network, such as its weights and biases. We will also try evolutionary and swarm-intelligence optimization algorithms, such as gray wolf optimization and ant colony optimization, to improve the performance of the single hidden layer feedforward neural network.