Multiscale fuzzy entropy based on local mean decomposition and Fisher rule for EEG feature extraction in human motion analysis

Electroencephalogram (EEG) is a nonlinear, non-stationary, and random weak signal generated by a large number of neurons. It has great research value and practical significance in artificial intelligence, biomedical engineering and other fields. EEG feature extraction is an important step that directly affects the processing results. Commonly used methods for EEG feature extraction include frequency-domain analysis, time-domain analysis and combined time-frequency analysis. Because EEG is nonlinear, these methods have certain limitations. Therefore, this paper proposes a multiscale fuzzy entropy method based on local mean decomposition (LMD) and the Fisher rule for EEG feature extraction in human motion analysis. First, the EEG signal is decomposed adaptively into a series of product function (PF) components. Then the effective PF components are selected and their multiscale fuzzy entropy is calculated as the feature. The Fisher rule is used to rank the classification ability of the fuzzy entropy at different scales, and the highest-ranked multiscale fuzzy entropy values are selected to form the optimal feature vector, which reduces the feature dimension. Experimental results show that the proposed method can extract the features of EEG signals effectively, which verifies its validity and feasibility.


Introduction
Electroencephalogram (EEG) reflects the functional state of the brain and the electrical activity of brain tissues [1][2][3].
When performing motor tasks or motor imagery, the μ rhythm (8-12 Hz) and the β rhythm (18-25 Hz), which are responsible for motor perception in the brain, change: an event-related desynchronization (ERD) or an event-related synchronization (ERS) occurs. Motor imagery EEG signals are widely used to control brain-computer interfaces (BCI) [4], so feature extraction is a key part of BCI technology.
Common algorithms for feature extraction from EEG signals include common spatial pattern filtering, autoregressive modeling and the wavelet transform. The common spatial pattern (CSP) algorithm filters the EEG signals in the spatial domain so as to extract the EEG characteristics under different motion modes [5,6]. CSP has achieved good results for two-class EEG signals, but it must be targeted at a specific frequency band and requires a large number of electrodes. Autoregressive (AR) modeling [7] reflects the time-varying characteristics of EEG signals through AR model coefficients or AR spectrum features. However, the method is suited to stationary signal analysis, whereas the EEG signal is a typical non-stationary nonlinear signal. The wavelet transform (WT) method [8,9] uses variable time-frequency windows to decompose signals step by step and then selects specific wavelet coefficients as features according to prior information. However, for EEG signals with complex mechanisms, accurate prior information usually cannot be obtained.
Huili He
The local mean decomposition (LMD) algorithm [10,11] adaptively decomposes EEG signals into several amplitude-modulated product function (PF) components, which can reflect the time-frequency changes of the signals [12]. Commonly used entropy analysis methods include approximate entropy (AE), sample entropy (SE), fuzzy entropy (FE) and multi-scale fuzzy entropy (MFE). The fuzzy entropy [13,14] feature has clear physical significance and can measure the probability of generating new patterns. It replaces the binary step function in sample entropy with a continuous exponential function as the similarity measure, which overcomes the abrupt-change problem in entropy calculation. In view of this, Torrents-Barrena et al. [15] used fuzzy entropy to diagnose Alzheimer's disease and achieved good accuracy.
In reference [16], normalized brainwave power gain and fuzzy entropy were used as features to construct a prediction model for drunk driving accidents, and the experimental results showed that the model's estimates were consistent with the real values. Therefore, fuzzy entropy can be used to analyze nonlinear and non-stationary EEG signals. To address feature extraction from nonlinear, non-stationary EEG under different motor imagery modes, this research combines LMD and multi-scale fuzzy entropy (MFE) for motor imagery EEG feature extraction: the original EEG signals are decomposed by LMD, the multi-scale fuzzy entropy of the effective product function (PF) components is calculated as the feature vector, and a support vector machine (SVM) is then used for classification and recognition.

Proposed EEG feature extraction method
The local mean decomposition (LMD) method was proposed in 2005 [17]. It was first applied to feature extraction of EEG signals and was then widely used in other fields, such as mechanical fault diagnosis. In signal feature extraction, time-domain statistics of the PF components of LMD are often extracted as features; these are easily affected by noise in the motor EEG signal, so effective feature vectors cannot be extracted.

Local mean decomposition (LMD)
Local mean decomposition is an adaptive time-frequency analysis method for non-stationary nonlinear signals. The essence of LMD is to adaptively decompose the EEG signal into multiple product function (PF) components and a residual $R$. Each PF component is the product of a pure frequency-modulated (FM) signal and an envelope signal. The instantaneous frequency of the PF component can be calculated from the FM signal, and the instantaneous amplitude of the PF component can be obtained from the envelope signal. For the EEG signal $x(t)$, the decomposition proceeds as follows: $PF_1(t)$ is separated from the original signal, and the new signal $u_1(t)$ is used as the source signal to repeat the above steps $q$ times until $u_q(t)$ is a monotone function. The iteration equation is

$$u_1(t) = x(t) - PF_1(t), \quad u_2(t) = u_1(t) - PF_2(t), \quad \ldots, \quad u_q(t) = u_{q-1}(t) - PF_q(t),$$

so that $x(t) = \sum_{p=1}^{q} PF_p(t) + u_q(t)$.

After the decomposition of the EEG signal by LMD, the characteristic information of the original signal is distributed over different time characteristic scales, which amplifies the hidden characteristic information of the original signal. Multi-scale fuzzy entropy can then be used to evaluate the PF components quantitatively more easily.
EAI Endorsed Transactions Scalable Information Systems 01 2022 -03 2022 | Volume 9 | Issue 35 | e12
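The decomposition loop described above can be illustrated with a deliberately simplified sketch. It is not the implementation used in this paper: the extrema detection, the moving-average smoothing span, the stopping tolerance and all function names (`find_extrema`, `smooth`, `local_mean_and_envelope`, `lmd`) are assumptions made for illustration. What it does show is the structure of the method: each PF is the product of an envelope and a normalized FM-like signal, and each PF is subtracted from the current signal until the residual has almost no extrema left.

```python
import math


def find_extrema(x):
    """Indices of strict interior local extrema, bracketed by both endpoints."""
    interior = [i for i in range(1, len(x) - 1)
                if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0]
    return [0] + interior + [len(x) - 1]


def smooth(y, width):
    """Centered moving average with a window that shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def local_mean_and_envelope(x):
    """Smoothed local mean and local envelope built from successive extrema."""
    idx = find_extrema(x)
    mean, env = [0.0] * len(x), [0.0] * len(x)
    for a, b in zip(idx[:-1], idx[1:]):
        for i in range(a, b + 1):
            mean[i] = (x[a] + x[b]) / 2
            env[i] = abs(x[a] - x[b]) / 2
    # Assumed smoothing span: one third of the longest gap between extrema.
    width = max(3, max(b - a for a, b in zip(idx[:-1], idx[1:])) // 3)
    return smooth(mean, width), smooth(env, width)


def lmd(x, max_pf=4, max_sift=10, tol=0.01, eps=1e-12):
    """Return (pfs, residual) such that x equals the sum of the PFs plus the residual."""
    residual, pfs = list(x), []
    # Stop when the residual has fewer than two interior extrema (nearly monotone).
    while len(pfs) < max_pf and len(find_extrema(residual)) > 3:
        s, envelope = residual, [1.0] * len(x)
        for _ in range(max_sift):
            mean, env = local_mean_and_envelope(s)
            env = [max(e, eps) for e in env]  # guard against division by zero
            s = [(v - m) / e for v, m, e in zip(s, mean, env)]
            envelope = [p * e for p, e in zip(envelope, env)]
            if all(abs(e - 1.0) < tol for e in env):
                break  # s is approximately a pure FM signal
        pf = [e * v for e, v in zip(envelope, s)]
        pfs.append(pf)
        residual = [r - p for r, p in zip(residual, pf)]
    return pfs, residual


fs = 128  # sampling rate in Hz, matching the data used in the experiments
t = [i / fs for i in range(2 * fs)]
signal = [(1 + 0.5 * math.sin(2 * math.pi * ti)) * math.sin(2 * math.pi * 10 * ti)
          + 0.5 * math.sin(2 * math.pi * 3 * ti) for ti in t]
pfs, residual = lmd(signal)
rebuilt = [sum(pf[i] for pf in pfs) + residual[i] for i in range(len(signal))]
print(len(pfs), max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

The synthetic test signal is an amplitude-modulated 10 Hz tone plus a 3 Hz tone, chosen only for illustration. Because each PF is subtracted from the running residual, the PFs and the final residual always sum back to the input, which is the property stated by the iteration equation.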

Moving mean filtering coarsening
The traditional coarse-graining process coarse-grains the original time series $x_1, x_2, \ldots, x_N$ of length $N$ at various scales [21]. For scale factor $k$, the coarse-grained sequence is:

$$y_j^{(k)} = \frac{1}{k} \sum_{i=(j-1)k+1}^{jk} x_i, \quad 1 \le j \le \lfloor N/k \rfloor \tag{1}$$

According to formula (1), the sequence length after traditional coarse-graining becomes 1/k of the original sequence length. When k is large, the data length cannot meet the requirements of fuzzy entropy calculation, and different scale factors shrink the data by different amounts. The simple averaging in equation (1) also easily causes information loss. In addition, when the sequence length is not a multiple of k, the trailing samples are ignored, which affects the accuracy and stability of the multi-scale entropy algorithm. In view of these shortcomings, this paper adopts a moving mean filtering coarse-graining algorithm. Taking scale 3 as an example, its calculation process is shown in figure 1.
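The shortcomings of traditional coarse-graining are easy to see in a few lines of Python. The function name `coarse_grain` and the toy series are illustrative assumptions, not part of the original method description.

```python
def coarse_grain(x, k):
    """Non-overlapping coarse-graining: mean of consecutive blocks of length k."""
    if k < 1:
        raise ValueError("scale factor k must be >= 1")
    n_blocks = len(x) // k  # trailing samples that do not fill a block are dropped
    return [sum(x[j * k:(j + 1) * k]) / k for j in range(n_blocks)]


series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(coarse_grain(series, 3))  # [2.0, 5.0, 8.0]; the last sample (10) is discarded
```

For a 768-point segment and k=20, this procedure leaves only 38 points, which illustrates why long scales become problematic for entropy estimation.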

Figure 1. Coarse-graining process of the moving average
The sequence length after coarse-graining by the moving mean filter is $N-k+1$. For an original series $x_1, x_2, \ldots, x_N$ and scale factor $k$, the coarse-grained time series is expressed as:

$$y_j^{(k)} = \frac{1}{k} \sum_{i=j}^{j+k-1} x_i, \quad 1 \le j \le N-k+1$$

The moving average method reduces the dependence on the length of the original time series, reduces information loss and improves the accuracy of feature extraction.
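A minimal sketch of the moving mean coarse-graining follows; the function name `moving_mean_coarse_grain` and the toy series are assumptions for illustration.

```python
def moving_mean_coarse_grain(x, k):
    """Overlapping coarse-graining: mean of every window of k consecutive samples."""
    if k < 1 or k > len(x):
        raise ValueError("scale factor k must satisfy 1 <= k <= len(x)")
    return [sum(x[j:j + k]) / k for j in range(len(x) - k + 1)]


series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smoothed = moving_mean_coarse_grain(series, 3)
print(len(smoothed))   # 8, i.e. N - k + 1
print(smoothed[:3])    # [2.0, 3.0, 4.0]
```

Unlike block averaging, every sample contributes to at least one window, and the output length shrinks by only k-1 points regardless of whether N is a multiple of k.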

Fuzzy entropy
The similarity measure of fuzzy entropy uses an exponential function as the fuzzy membership function, and the continuity of the exponential function makes fuzzy entropy change smoothly [22,23]. In addition, fuzzy entropy introduces the concept of fuzzy sets to measure the similarity of two vectors. Fuzzy entropy is defined as follows: 1) Suppose there is a time series [...]
It can be seen from equation (7) and the coarse-graining process that the preset parameters required for multi-scale fuzzy entropy calculation are the embedding dimension m, the similarity tolerance r and the scale factor k. Based on extensive experiments, m=2, r=0.2SD (SD is the standard deviation of the original sequence) and k=20 are selected in this paper.
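To make the roles of m, r and k concrete, the following sketch computes fuzzy entropy and multi-scale fuzzy entropy in plain Python. It follows one commonly used formulation (templates with their local mean removed, Chebyshev distance, membership exp(-(d/r)^n) with n=2), which is an assumption here and may differ in detail from the equations of this paper. The function names and the synthetic test signals are also illustrative.

```python
import math
import random
import statistics


def _phi(x, dim, count, r, n):
    """Mean fuzzy similarity over all pairs of the first `count` templates."""
    templates = []
    for i in range(count):
        window = x[i:i + dim]
        baseline = sum(window) / dim
        templates.append([v - baseline for v in window])  # remove the local mean
    total = 0.0
    for i in range(count):
        for j in range(i + 1, count):
            d = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
            total += 2.0 * math.exp(-((d / r) ** n))  # counts (i, j) and (j, i)
    return total / (count * (count - 1))


def fuzzy_entropy(x, m=2, r=None, n=2):
    """Fuzzy entropy with embedding dimension m and tolerance r (default 0.2 SD)."""
    if r is None:
        r = 0.2 * statistics.pstdev(x)
    count = len(x) - m  # same number of templates for dimensions m and m + 1
    if count < 2 or r <= 0:
        raise ValueError("series is too short or constant")
    return math.log(_phi(x, m, count, r, n)) - math.log(_phi(x, m + 1, count, r, n))


def multiscale_fuzzy_entropy(x, max_scale, m=2, r_factor=0.2, n=2):
    """Fuzzy entropy of moving-mean coarse-grained series for scales 1..max_scale."""
    r = r_factor * statistics.pstdev(x)  # tolerance fixed from the original series
    values = []
    for k in range(1, max_scale + 1):
        coarse = [sum(x[j:j + k]) / k for j in range(len(x) - k + 1)]
        values.append(fuzzy_entropy(coarse, m, r, n))
    return values


random.seed(0)
sine = [math.sin(2 * math.pi * i / 25) for i in range(200)]
noise = [random.gauss(0.0, 1.0) for _ in range(200)]
sine_fe, noise_fe = fuzzy_entropy(sine), fuzzy_entropy(noise)
print(sine_fe, noise_fe)  # the regular sine is expected to score lower than noise
mfe = multiscale_fuzzy_entropy(noise, max_scale=3)
print(mfe)
```

The tolerance is computed once from the original sequence and reused at every scale, in line with the definition of SD given above. The pairwise loop is quadratic in the series length, so a vectorized implementation would be preferable for full datasets.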

Feature selection based on Fisher score
After feature extraction by the multi-scale fuzzy entropy algorithm, multiple features are obtained. However, not all of them are closely related to task classification: the features also contain redundant information and even noise, which affects the accuracy of classification. To reduce the feature dimension and improve the classification accuracy, the Fisher score is used to screen the features. The basic idea of the Fisher score [24,25] is to calculate, for each feature, the ratio of inter-class variance to intra-class variance according to the Fisher criterion. Fisher scores for the two types of samples are defined as follows. For a training sample set, the Fisher score of the n-th feature is the ratio $F(n) = S_B^{(n)} / S_W^{(n)}$, where $S_B^{(n)}$ represents the inter-class variance of the n-th feature and $S_W^{(n)}$ represents the intra-class variance of the n-th feature on the training sample set, which describes the distance between samples of the same class. A larger Fisher score means that the intra-class distance is small and the inter-class distance is large, which indicates that the feature discriminates well between categories and has good classification ability.
As shown in reference [26], the seven features with the highest scores were selected to distinguish patients with Alzheimer's disease from healthy people. In reference [27], the optimal features were selected by adding features in order of score and evaluating them with a support vector machine, so that the optimal feature subset was searched for among all feature vectors. Although this method could obtain the globally optimal feature subset, the ranking of classification ability provided by the Fisher score was not fully exploited. Therefore, this paper selects the five features with the highest score in each channel and combines them with a support vector machine to select the optimal feature vector.
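The ranking step can be sketched as follows. The exact variance definitions of this paper are not reproduced here; the sketch assumes a common two-class form in which the numerator sums the squared distances of the class means from the overall mean and the denominator sums the per-class variances. The function names and the toy feature values are hypothetical.

```python
def fisher_scores(class_a, class_b):
    """Fisher score per feature; each argument is a list of feature vectors."""
    scores = []
    for f in range(len(class_a[0])):
        a = [row[f] for row in class_a]
        b = [row[f] for row in class_b]
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        mean_all = (sum(a) + sum(b)) / (len(a) + len(b))
        # Inter-class term: spread of the class means around the overall mean.
        between = (mean_a - mean_all) ** 2 + (mean_b - mean_all) ** 2
        # Intra-class term: sum of the per-class variances.
        within = (sum((v - mean_a) ** 2 for v in a) / len(a)
                  + sum((v - mean_b) ** 2 for v in b) / len(b))
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def top_features(scores, count):
    """Indices of the `count` highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:count]


# Toy data: feature 0 separates the classes, feature 1 does not.
left = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0]]
right = [[3.0, 4.5], [3.2, 3.5], [2.8, 4.0]]
scores = fisher_scores(left, right)
print(scores)
print(top_features(scores, 1))  # [0]
```

In the setting of this paper, the rows would hold the multi-scale fuzzy entropy values of one channel, and `top_features(scores, 5)` would return the five scales kept for that channel.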

Combining LMD and MFE
Different motor imagery patterns trigger electrical activity in different areas of the cerebral cortex, and the distributed electrodes of the EEG acquisition system record the changes in the brain's electrical signals. When imagining unilateral limb movement, the amplitude of the spectral oscillation of the μ and β rhythms in the contralateral sensorimotor region of the brain is reduced or blocked, which is called event-related desynchronization (ERD). An increase in the amplitude of the μ and β rhythms is called event-related synchronization (ERS). The μ rhythm is concentrated in 8-12 Hz and the β rhythm in 18-25 Hz. Generally, EEG signals in the 8-30 Hz band are used for feature extraction of these two rhythms.
To address the nonlinear and non-stationary characteristics of motor imagery EEG signals, a feature extraction method based on LMD and MFE is proposed in this paper. In the LMD process, the signals are decomposed step by step while retaining the essential features of the original signal, so the characteristic information of the original signal is displayed at different resolutions and is easier to extract through multiple PF components. MFE is an effective way to describe information features and can describe EEG signals quantitatively by calculating the multi-scale entropy of the PF components. The combination of the two methods can effectively analyze and extract the features of motor imagery EEG signals, which helps in recognizing motor imagery categories.
The feature extraction process for motor imagery EEG signals is shown in figure 2. The specific steps of the feature extraction method for motor imagery EEG based on LMD and MFE are as follows.

Figure 2. EEG feature extraction process
(1) Input the EEG signals and perform LMD on each sample to obtain a series of PF components.
(2) Because the EEG signals are decomposed successively, the feature information is mainly distributed in the first few PF components, so the first few PF components are selected to extract the feature information.

Feature recognition by support vector machine
Support vector machine (SVM) is a classification algorithm based mainly on the structural risk minimization principle of statistical learning theory and on the VC (Vapnik-Chervonenkis) dimension [28][29][30]. The support vector machine transforms the input samples into a higher-dimensional space through a mapping and then finds the optimal classification surface in this space, thus separating the samples.
Suppose the training set contains n samples, each of which is a d-dimensional vector. The category of each sample is expressed as [...]

Experimental analysis
This paper uses dataset III [31] to verify the proposed algorithm. The experimental data record the EEG signals of a 25-year-old healthy female imagining left- and right-hand movements. The experiment was conducted on a single day and comprises seven groups of 40 trials each. The EEG data of channels C3, C4 and Cz are recorded with a sampling frequency of 128 Hz and band-pass filtered at 0.5-30 Hz. Each trial lasts 9 s, of which 0-2 s is the resting state. At 2 s a sound cue marks the start of the trial, and at the same time a "+" is displayed on the screen for 1 s. From the third second, an arrow on the screen prompts the subject to imagine the movement until the trial ends at 9 s.
The data of the C3 and C4 channels are selected for feature extraction. Motor imagery takes place during 3-9 s, so the data between 3 s and 9 s of each channel are extracted, giving 128×6=768 data points per channel in each trial, and the data of each channel are then decomposed adaptively by LMD. Figure 3 is the time-frequency diagram of the LMD decomposition of the C4 channel when imagining right hand motion. [...] event-related synchronization/event-related desynchronization (ERS/ERD) phenomenon [32,33]. The experimental analysis shows that the μ rhythm and β rhythm information of left-right motor imagery is mainly distributed in the first three PF components. Therefore, the first three PF components are selected as the signals for feature extraction.
The motor imagery period lasts 6 s. Considering processing time and classification accuracy, the time period used to extract the optimal features is selected with a sliding time window of 2 s and a sliding step of 1 s. The results show that the classification effect is better in 4-6 s and 5-7 s, so 4-7 s is selected as the feature extraction data segment.
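The window search can be made concrete with a short sketch that converts the 2 s window and 1 s step into sample indices at the 128 Hz sampling rate. The function name `sliding_windows` is an assumption, and indices are counted from the start of the 9 s trial.

```python
def sliding_windows(start_s, end_s, window_s, step_s, fs):
    """Return (first_sample, end_sample) pairs; end_sample is exclusive."""
    windows = []
    t = start_s
    while t + window_s <= end_s:
        windows.append((t * fs, (t + window_s) * fs))
        t += step_s
    return windows


windows = sliding_windows(start_s=3, end_s=9, window_s=2, step_s=1, fs=128)
print(windows)
# [(384, 640), (512, 768), (640, 896), (768, 1024), (896, 1152)]

segment = (4 * 128, 7 * 128)  # the selected 4-7 s segment
print(segment, segment[1] - segment[0])  # (512, 896) 384
```

The five candidate windows cover 3-5 s through 7-9 s, and the selected 4-7 s segment corresponds to 384 samples per channel.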
The similarity tolerance r=0.35SD (SD is the standard deviation of the original signal) is selected according to several experiments, and the entropy values of multiple EEG signals at different scales are calculated in order to select the parameters of the multi-scale entropy. As shown in figure 4, the entropy of PF1 and PF2 first increases and then decreases as the scale factor increases, and the entropy of PF3 increases with the scale factor. When the scale factor is 9, the entropy values of the components are well separated and the classification accuracy is high. Therefore, a scale factor of 9 is selected. Table 1 shows the average multi-scale entropy values of the C3 and C4 channels for the different categories of imagined hand motion when r=0.35SD and the scale factor is 9. [...] figure 5. As can be seen from figure 5, although the multi-scale fuzzy entropy values of EEG signals for imagined left and right movement partially overlap, they can still be distinguished to a degree. Therefore, multi-scale fuzzy entropy can reflect the characteristics of EEG signals during left-right motor imagery. The multi-scale fuzzy entropy of the first three PF components contributes the most to the classification, so combining the multi-scale fuzzy entropy of these three components into a feature vector is expected to improve the classification accuracy.
Next, a support vector machine classifier is used to classify the extracted feature vectors. Support vector machine (SVM) is a machine learning method based on statistical learning theory. It has many advantages in small-sample classification, nonlinear problems and high-dimensional pattern recognition. To speed up the convergence of training, the training set and test set are first normalized so that the features share a uniform statistical distribution. Finally, the feature vectors of the test set are input for classification, and the motor imagery category is output.
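The normalization step is not specified in detail in the text. One common choice, assumed here for illustration, is min-max scaling in which the minimum and maximum of each feature are estimated on the training set only and then applied to both sets. The function names and the toy values are hypothetical.

```python
def fit_min_max(train):
    """Per-feature minimum and maximum, estimated on the training set only."""
    n_features = len(train[0])
    lows = [min(row[f] for row in train) for f in range(n_features)]
    highs = [max(row[f] for row in train) for f in range(n_features)]
    return lows, highs


def apply_min_max(rows, lows, highs):
    """Scale each feature so that the training range maps onto [0, 1]."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


train = [[1.0, 10.0], [3.0, 20.0], [2.0, 30.0]]
test = [[2.0, 25.0]]
lows, highs = fit_min_max(train)
train_scaled = apply_min_max(train, lows, highs)
test_scaled = apply_min_max(test, lows, highs)
print(train_scaled)  # [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]]
print(test_scaled)   # [[0.5, 0.75]]
```

Estimating the scaling parameters on the training set alone keeps information about the test set out of the classifier, and test values may fall slightly outside [0, 1] as a result.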
The experimental dataset consists of 280 trials, which are randomly divided into two groups of 140 trials each. The 6-dimensional feature vectors extracted by multi-scale entropy are input into the SVM classifier for training and classification, and the best classification recognition rate is 85.21%. As can be seen from table 2, the classification accuracy of the feature extraction algorithm in this paper is 0.92%-4.13% higher than that of SDA [34], EPCA [35] and NVDNN [36]. In this paper, 6-dimensional feature vectors are used. Reducing the feature dimension makes the classifier model simpler and reduces the classification time. Compared with the existing literature, the number of features is significantly reduced; over 100 test runs, the average time is 73 ms. Thus, the new algorithm in this paper reduces the number of features and improves the classification accuracy.
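A reproducible version of the random 140/140 division can be sketched as follows. The function name `split_trials` and the fixed seed are assumptions; the text does not state how the random division was generated.

```python
import random


def split_trials(n_trials, n_train, seed=0):
    """Shuffle trial indices with a fixed seed and split them into two groups."""
    indices = list(range(n_trials))
    random.Random(seed).shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, test_idx = split_trials(280, 140)
print(len(train_idx), len(test_idx))  # 140 140
```

Fixing the seed makes the partition repeatable, which matters when accuracy figures from several runs are compared.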

Conclusion
A feature extraction method based on LMD and multiscale fuzzy entropy is proposed in this paper. LMD is used to decompose the EEG signals of left and right motor imagery adaptively, and the multi-scale fuzzy entropy of the decomposed PF components is extracted as the feature vector, which is input into an SVM for classification and recognition, thus realizing the classification of motor imagery EEG signals. Experimental results show that the combination of LMD and multi-scale fuzzy entropy outperforms traditional feature extraction methods in the recognition of EEG signals and can effectively extract features from motor imagery EEG signals.