Improved ECG based Stress Prediction using Optimization and Machine Learning Techniques

INTRODUCTION: ECG has emerged as the most accepted and widely used technique for inferring mental health status from cardiac signals, thereby resolving a major challenge of mental health assessment protocols. OBJECTIVES: The authors aimed to identify stressed signals in order to distinguish subjects exhibiting stress in their ECG signals. METHODS: The authors have taken advantage of three optimization techniques, namely Genetic Algorithm (GA), Artificial Bee Colony (ABC) and improved Particle Swarm Optimization (PSO), which further improve the classification accuracy of the multi-kernel SVM. RESULTS: The simulation analysis confirms that the proposed work outperforms the existing works, demonstrating an average accuracy, precision, recall and specificity of 98.93%, 96.83%, 96.83% and 96.72%, respectively, when evaluated against a dataset comprising 1000 ECG samples. CONCLUSION: It is observed that the proposed stress prediction based on improved VMD and improved SVM outperformed the existing work that comprised traditional VMD and SVM.


Introduction
In everyday life, any event leading to a pronounced emotional reaction is called stress. A disorder in humans represents a state of impairment that interferes with and changes the functioning of the body's organs. Medical records contain a long list of human disorders. Among them, bone diseases, mental diseases, neurological diseases, genetic diseases, skin diseases, infectious diseases, heart-related diseases, and digestive system diseases are the main human disease categories [1].
Today, many people all over the world live with a mental illness. According to the World Health Organization (WHO), about 300 million people worldwide suffer from depression. Bipolar disorder, dementia, and schizophrenia affect 60 million, 47.5 million, and 21 million people, respectively [2]. Besides, the disability rate of these diseases is very high. Therefore, in this study, the diagnosis of stress from heart rate has been analyzed. A psychological disorder is a mental disorder that represents a continuous dysfunction of thoughts, emotions and behaviors, causing great distress. This kind of disorder is not well accepted in our society [3]. Stress causes problems related to the brain. Stress, anxiety, depression, schizophrenia, and intellectual disability may affect human physical and mental health [4]. However, it is difficult to distinguish between normal and abnormal behavioral states. Cognitive dysfunction, cultural accidents, and personal distress are the three main representatives of these human diseases. Stress is seen as a hurdle on the path to a healthy life. Stress indicates a state of overload or of being under pressure. There are two major classes of stress: stress that proves beneficial for an individual is known as eustress, while the painful or hurtful kind is called distress [5]. The sympathetic nervous system is affected by stress during a fight-or-flight reaction. Due to stress, several changes occur within the body that can cause an increase in heart rate [6]. In other words, stress affects the digestive as well as the immune system while increasing the blood circulation rate. There is nothing that could be more dangerous than stress. It also leads to low back pain, erectile dysfunction, headaches, hypertension, upset stomach, and even reduced immunity of the body [7]. Overall, it weakens the individual not only mentally but also physically and emotionally.
Stress management lacks any proper medical treatment, and stress is therefore considered more dangerous than any other disease. The first step in preventing stress-related problems is identifying the root cause of stress and trying to reduce it. There are numerous ways to identify mental stress. These include physical symptoms such as sleeplessness, increased appetite, anger, memory loss, and psychological disturbance. Taelman and co-workers tried to understand the heart's physiology, as the heart is the main body organ used to distinguish the stressed state from the normal state in an individual [8]. Over time, improvements in machine learning techniques have led to the development of computer-based techniques that assist users with an intelligent decision system and have shown efficient implementation in various fields. Intelligent health systems have also attracted wide interest due to their ease of use, which allows information sharing [10]. Researchers from varied fields, especially those focusing on application development, dedicate themselves to image and signal analysis in addition to biomedical analysis work. Electrocardiogram (ECG) signals are the most popular signals for revealing the heart's working condition [11]. Therefore, the classification of ECG signals has become a significant topic in the field of biomedical research.
In recent years, adaptive studies of ECG signals have generally been divided into two parts: the first is detection and the second is classification. Detection studies focus on determining the heart rate within the ECG data obtained over a certain time span [12][13]. The detected peaks are recorded and considered as the raw data. The recorded data are then used to monitor the heartbeats, and on that basis the heart is judged to be normal or abnormal [14][15]. Different techniques such as threshold-based methods [16][17], digital filter-based methods [18], wavelet transforms [19][20], and machine learning [21][22] have been applied to detect the heartbeat. ECG signal classification is another essential step in ECG biomedical analysis. This process provides automatic identification of the heartbeat signal. Generally, ECG signals have unique morphological features, such as the peaks of the P-QRS-T complex and their different time intervals. Many heart diseases can be diagnosed by visually analyzing these morphological changes [23].
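As an illustration of the threshold-based detection idea mentioned above, the following minimal sketch marks R peaks as local maxima above a fixed fraction of the signal maximum and converts the R-R intervals into a mean heart rate. The function names, the 0.6 threshold ratio, the 0.2 s refractory period, and the synthetic spike train are assumptions made for this example; they are not taken from the cited detectors.

```python
def detect_r_peaks(signal, fs, threshold_ratio=0.6, refractory_s=0.2):
    """Return sample indices of local maxima above a fraction of the global maximum."""
    threshold = threshold_ratio * max(signal)
    refractory = int(refractory_s * fs)  # minimum spacing between accepted peaks
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if signal[i] > threshold and is_local_max:
            if not peaks or i - peaks[-1] > refractory:
                peaks.append(i)
    return peaks


def heart_rate_bpm(peaks, fs):
    """Mean heart rate computed from successive R-R intervals."""
    if len(peaks) < 2:
        return 0.0
    rr = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return 60.0 / (sum(rr) / len(rr))


# Synthetic trace (assumption): one unit-height spike every 0.8 s on a flat baseline
fs = 250
ecg = [0.0] * (fs * 8)
for start in range(100, len(ecg), 200):
    ecg[start] = 1.0

peaks = detect_r_peaks(ecg, fs)
rate = heart_rate_bpm(peaks, fs)
print(peaks[:3], round(rate, 1))
```

A fixed threshold is the simplest choice; real recordings with baseline drift or varying amplitude usually need an adaptive threshold or prior filtering.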
However, noise in the ECG signals has a serious impact on both visual diagnosis and computer-aided diagnosis. Therefore, removing these unwanted signals becomes an essential task, which is possible by applying pre-processing steps to the recorded ECG data. Several researchers have used the Infinite Impulse Response (IIR) filter as a pre-processing technique for noise removal. The source of noise might be the power line or the electrode interface through which the signals are collected. It has been stated that noise filtration through a low-order IIR filter is simple and performs well compared to using a higher-order IIR filter. However, alongside these advantages there are some disadvantages, such as an increase in filtering time and a failure to filter non-linear signals. At present, to overcome the problem of filtering power line interference, an adaptive filtering scheme is used. This filter removes the noise with minimum error and with a fast filtering response [24]. Per day, ECG records contain billions of heartbeat signals. Therefore, it is very difficult for medical professionals to monitor all signals to detect possible disease.
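The adaptive filtering idea for power line interference can be sketched with a two-weight least-mean-squares (LMS) noise canceller. This is a minimal sketch under stated assumptions, not the scheme of [24]: the 50 Hz mains frequency, the step size `mu`, the sampling rate, and the synthetic 1 Hz test signal are all chosen only for illustration.

```python
import math


def lms_powerline_canceller(signal, fs, mains_hz=50.0, mu=0.01):
    """Subtract an adaptively scaled sine/cosine pair at the mains frequency."""
    w_sin, w_cos = 0.0, 0.0
    cleaned = []
    for n, sample in enumerate(signal):
        phase = 2.0 * math.pi * mains_hz * n / fs
        ref_sin, ref_cos = math.sin(phase), math.cos(phase)
        estimate = w_sin * ref_sin + w_cos * ref_cos  # current interference estimate
        error = sample - estimate                     # error doubles as the filter output
        w_sin += 2.0 * mu * error * ref_sin           # LMS weight updates
        w_cos += 2.0 * mu * error * ref_cos
        cleaned.append(error)
    return cleaned


# Synthetic demo (assumption): a slow 1 Hz wave corrupted by 50 Hz hum
fs = 500
n_samples = 3000
clean = [math.sin(2.0 * math.pi * 1.0 * n / fs) for n in range(n_samples)]
hum = [0.5 * math.sin(2.0 * math.pi * 50.0 * n / fs + 0.7) for n in range(n_samples)]
noisy = [c + h for c, h in zip(clean, hum)]
cleaned = lms_powerline_canceller(noisy, fs)

# Compare the error against the clean wave over the last 500 samples
tail = range(n_samples - 500, n_samples)
mse_before = sum((noisy[n] - clean[n]) ** 2 for n in tail) / 500
mse_after = sum((cleaned[n] - clean[n]) ** 2 for n in tail) / 500
print(mse_before, mse_after)
```

The step size trades convergence speed against the width of the notch around the mains frequency: a larger `mu` adapts faster but removes more of the neighbouring signal content.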
Consequently, it is essential to design an automatic monitoring system to detect disease with minimal computation time. Machine learning approaches, both supervised and unsupervised, have been studied to classify heart disease. SVM is one of the most effective tools used for automatic detection of heartbeats. However, designing an accurate ECG monitoring system is a challenging task, as signals vary with gender, age, and many other factors. Therefore, due to the dynamic nature of the ECG signal, a static design cannot be utilized [25].
... sympathetic system, which helps to minimize the cardiac workload. Under stressful conditions, the heart rate observed in the ECG signal is affected. The stress analysis of the cardiovascular system is shown in Figure 2. The interaction of obesity and physical activities in the cardiovascular system is shown in Figure 1. The activation function of the heart is represented by the solid line, whereas the dotted lines indicate the suppression of the indicated factor [26]. The paper is organised in five sections, including the above introductory discussion. Section 2 covers the state of the art of prior work in the related field. Section 3 presents the experimental steps carried out to design an automatic ECG signal classification (normal and stress). Section 4 presents the computed results used to evaluate the designed system, and the conclusion of the paper is given in Section 5, followed by the references.

Related Work
This section covers the research efforts pertaining to the classification of ECG signals using various state-of-the-art techniques. In 2012, Karthikeyan et al. identified mental stress through ECG signals using a DWT-based approach in addition to a thresholding approach. Using this approach, the noise induced by the power line and other sources was removed [27]. Poungponsri and Yu (2013) used DWT in addition to an Artificial Neural Network (ANN) for the reduction of noise signals. Here, two complementary properties were exploited, the multi-resolution nature of DWT and the human-like learning capability of ANN, which together minimized the error [28]. Later, Karthikeyan et al. (2014) used the Stroop color-word test to induce stress, and the ECG signals recorded under it were then used to identify the stress level. The features were extracted from the ECG signals using the DWT approach. The choice of a proper wavelet function plays a vital role in feature extraction. In order to fully cover the ECG signals in the feature extraction stage, an optimal wavelet function was needed, which was obtained through the selection of the mother wavelet. The signals were classified as stress or normal with an accuracy of 94.5% [29]. In the same year, Vinaya et al. (2014) focused their research on detecting arrhythmia arising from human stress. The features of stress were extracted using the DWT approach, which played an important role in de-noising the ECG signal without affecting the relevant information of the input ECG signal. Subsequently, a Hidden Markov Model was implemented for the classification of the ECG signal [30]. Zhang et al. (2014) proposed a classification method that employed a Genetic Algorithm (GA) and kernel SVM. The method consists of three main modules, namely lead-fall detection followed by feature extraction and classification.
The method demonstrated 92% true positive detection with 5.68% false positives and 94% classification accuracy for a training dataset consisting of 1000 records [31]. Munla et al. (2015) introduced a stress detection model for drivers while driving. The performance was analysed using Heart Rate Variability (HRV), which can be obtained from the ECG signal. The designed automatic stress level model warns the driver at an early stage and hence helps to save his or her life. The decision is taken by considering attributes such as mood variation, fatigue, disease and bio-rhythm. The data pass through different stages, namely pre-processing, feature extraction, and then classification using the radial basis function (RBF) kernel of SVM and the KNN approach, with an accuracy of 83% [32]. Li et al. (2016) presented a non-linear feature extraction of ECG signals using Principal Component Analysis (PCA). Initially, the dimension of the dataset was reduced using PCA, and then features were extracted using Kernel Independent Component Analysis (KICA). The extracted features were optimized using GA and then classified using the SVM approach with an accuracy of 97.78% [33]. Li et al. (2017) proposed an ECG feature extraction and classification design implementing GA with back-propagation neural networks and the wavelet packet decomposition method. The design achieved classification accuracy in the range of 97% for smaller groups of datasets while evaluating 25 ECG features [34]. Widasari et al. (2019) presented a quick way to classify ECG signals through automatic sleep stage detection. A feature extraction technique was used for the detection of sleep stages by collecting data from 51 subjects, comprising healthy persons and persons affected by insomnia, sleep-disordered breathing and REM behaviour disorder [35]. ... electrocardiogram classification method that was inspired by an ensemble of SVMs.
The work was evaluated against a single SVM using the MIT-BIH database. The authors involved a jk-index evaluation in their study, which combines the j-index, reflecting the sensitivity and positive predictivity, with the k-index, also known as Cohen's Kappa, reflecting the degree of agreement in the judgement. The method demonstrated a 10% better jk-index with the involvement of wavelets and morphological descriptors when combined with the SVM product rule [37]. Kaur and Sharma (2019) presented a detailed review of psychological disorder detection using supervised and nature-inspired algorithms. Three groups of techniques, for disorder diagnosis, disease detection, and classification, were introduced. Initially, the researchers presented the racial influence on psychological disorders in humans [38]. Dhal et al. (2019) presented a detailed survey in which nature-inspired algorithms are used in the field of image enhancement [39]. Sharma and Singh (2020) detected cardiac arrhythmia as a heart disorder by using a Swarm Intelligence (SI) scheme. Using the concept of SI, an optimal set of features responsible for the diagnosis of cardiac arrhythmia was detected. From the results, it was observed that the Satin bird Optimization approach performed better [40]. Revathi et al. (2020) analyzed temporal, spectral and statistical features of ECG signals using a feed-forward back-propagation network in addition to GA. The performance was examined against the SVM and KNN approaches. The authors modified the signal using the crossover operator of GA, which again helps to enhance the training and then the classification of the feed-forward back-propagation neural network [41]. Reddy et al. (2020) used GA in addition to a fuzzy logic classifier for the prediction of heart disease at an early stage. The features related to heart disease were selected using GA, and the classification of disease was performed using a fuzzy rule set.
The fuzzy rules were optimized using GA, and the essential features were selected using rough set theory. For prediction, GA with the fuzzy rule set was used [42]. Chui et al. (2020) used a multi-objective GA in addition to multiple-kernel-learning SVM for the detection of stress as well as drowsiness. The results show that an area under the Receiver Operating Characteristic (ROC) curve of 97.1% and 96.9% was attained while detecting driver drowsiness and stress, respectively [43]. Zhou et al. (2021) performed their work in three main phases. In the first phase, HRV features were extracted from the ECG signals. In the second phase, based on the HRV features, a Gaussian mixture model (GMM) was used to determine the mental state. Then, to reduce the stress classification coefficient (SCC), a clustering approach was used to process the HRV features. A Convolutional Neural Network was used to extract features automatically. Finally, the manual and automatic features were combined and passed to an SVM. The result provides an average recognition rate of about 95% [44]. Zhao and Liu (2021) detected the mental pressure on sports athletes using GA with a back-propagation neural network. Here, GA is used to optimize the weights of the back-propagation neural network. The researchers mainly focused on tennis players [45].

Material and Methods
This section presents the detailed methodology of the proposed work. First, the type and sources of the datasets employed in the study are discussed.

ECG signal Database
This article presents an automatic ECG signal classification model based on signal data accessed from a publicly available dataset and on extracted ECG signal data.

PhysioNet Database
The dataset employed in the current study was collected from PhysioNet, a freely accessible critical care dataset intended to motivate researchers and investigational studies pertaining to complex physiological signal analysis. It was generated at the laboratory of Boston's Beth Israel Hospital and validated by researchers of the Massachusetts Institute of Technology (MIT) for the analysis of arrhythmia and arrhythmia-related diseases. The collected database is known as the "MIT-BIH Arrhythmia Database" and contains 48 half-hour ECG recordings from 47 subjects [46][47].

Extracted ECG signal data
ECG data has also been extracted from 10 subjects in stress and normal states using the AD-624 precision instrumentation amplifier. A Mental Arithmetic Task (MAT) was performed to induce stress so that ECG signals could be recorded under stress conditions. The AD-624 is an instrumentation amplifier that provides low-noise, high-precision output signals.

Upload ECG signal data (Both normal and Stress Condition)
In the present research, the accessed ECG dataset provides digitized signals labelled as normal or stress conditions, which are further classified into normal, medium and high stress ECG signals. The original ECG waveform uploaded for training and testing is shown in Figure 3. The waveform contains noise, visible as small peaks, alongside the high peaks of the original signal. Therefore, it becomes necessary to de-noise the signal before it goes to a further processing unit.

... [51] have used Cuckoo Search as an optimization approach. Among all, GA has been proved able to converge to near-optimal solutions for various difficult problems. It is a powerful stochastic tool that works on the principle of natural evolution. The working of GA in the proposed work is as follows:

Figure 4. Genetic Algorithm for Row Selection
The entire work is divided into two stages: training and testing. About 70% of the available dataset is used for training, and the remainder is used to validate the automatic stress detection performance using ECG signals. Initially, GA is applied for row selection of the data. For example, if the dataset contains 1000 rows, then 700 rows are used for training and the rest are used for testing. A novel fitness function has been designed, on the basis of which the row selection is performed. GA also helps to reduce the dataset size without affecting the actual information.
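The 70/30 division described above can be sketched as a reproducible shuffle followed by a cut. The function name `split_rows`, the fixed seed, and the placeholder rows are assumptions for illustration; the source does not state how the rows were ordered before splitting.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle row indices reproducibly and cut them into training and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * len(rows)))
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


# Hypothetical (id, label) rows standing in for 1000 ECG records
rows = [("sample_%d" % i, i % 2) for i in range(1000)]
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))  # prints: 700 300
```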
Here, GA is used as a feature selection algorithm that selects features of the uploaded data by selecting appropriate rows. To search the feature set in the available population and to encode all candidate features within a chromosome, an accurate representation of the features is required. Initialize Q feature subsets from the P-dimensional dataset. The precision value of each candidate feature in one of the total N chromosomes is n. The steps shown in Figure 4 are followed to minimize the error in the fitness value so that an optimized value can be determined. The offspring obtained from the selection, crossover, and mutation steps are treated as parents and are responsible for the next generation.
The process of generating the best offspring continues until the desired row features have been obtained. The GA process is terminated when a stopping criterion is met, for example when the maximum number of iterations is reached or when the chromosome values converge. The steps followed for GA are given in Algorithm 1, which returns the optimized selected data.
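The selection, crossover, mutation, and termination loop described above can be sketched as follows. This is a minimal illustration, not Algorithm 1: the paper's novel fitness function is not reproduced in the text, so the fitness used here (total score of the kept rows, penalized when the kept count drifts from a target) is a hypothetical placeholder, as are `row_scores`, `keep`, and all parameter values.

```python
import random


def ga_select_rows(row_scores, keep, generations=40, pop_size=30,
                   mutation_rate=0.02, seed=1):
    """Evolve a 0/1 mask over rows; `row_scores` and the fitness are placeholders."""
    rng = random.Random(seed)
    n = len(row_scores)

    def fitness(mask):
        # Hypothetical fitness: score of kept rows minus a penalty for
        # deviating from the requested number of rows.
        kept = sum(mask)
        return sum(s for s, m in zip(row_scores, mask) if m) - abs(kept - keep)

    def tournament(population):
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):            # stop after a fixed number of generations
        best = max(population, key=fitness)
        history.append(fitness(best))
        children = [best[:]]                # elitism: the best chromosome survives
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            point = rng.randrange(1, n)     # single-point crossover
            child = p1[:point] + p2[point:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


rng = random.Random(0)
scores = [rng.random() for _ in range(50)]  # stand-in per-row quality scores
mask, history = ga_select_rows(scores, keep=35)
print(sum(mask), round(history[0], 3), round(history[-1], 3))
```

Because of elitism, the best fitness recorded in `history` never decreases from one generation to the next, which gives a simple check that the loop is behaving as intended.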

Variational Mode Decomposition (VMD) for data decomposition
VMD is used to decompose the selected rows from the ECG dataset into K discrete sub-signals (modes), each having a restricted bandwidth. Therefore, the VMD output depends on the bandwidth and on the number K of discrete modes. The aim of the signal decomposition is to reduce the signal complexity.
It adaptively determines the relevant frequency bands and simultaneously estimates the corresponding modes to minimize the error. It decomposes the input signal into its principal modes (VMFs), which can reproduce the input signal with different sparsity characteristics. Each VMF has a limited bandwidth and is assumed to be compact around a center pulsation ($\omega_k$) determined during the decomposition process. The VMD method uses the alternating direction method of multipliers (ADMM) to perform the reconstruction instead of the sifting process of traditional decomposition methods. The bandwidth of a 1D signal can be determined as follows: for every mode $u_k$, (i) an analytic signal is computed using the Hilbert transform, (ii) the frequency spectrum of the mode is shifted to baseband by mixing with an exponential tuned to the estimated center frequency, and (iii) the bandwidth is estimated through the Gaussian smoothness of the demodulated signal. Mathematically, the steps followed by VMD are as follows.

The values of the parameters $u_k$, $\omega_k$, and $\lambda$ are updated using equation (2), equation (3), and equation (4), respectively.
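For reference, the standard VMD formulation is written out below. This is a sketch under assumptions: the mapping of these expressions to equation numbers (1) to (4) is assumed, and the symbols $f$ (input signal), $\alpha$ (bandwidth penalty), and $\tau$ (dual ascent step) come from the standard formulation rather than from this text. The improved VMD used in this work may differ in detail.

```latex
% (1) Constrained variational problem: K modes with minimal total bandwidth
\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K}
  \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right]
  e^{-j \omega_k t} \right\|_2^2 \right\}
\quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)

% (2) Mode update in the frequency domain
\hat{u}_k^{\,n+1}(\omega) =
  \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}
       {1 + 2\alpha\,(\omega - \omega_k)^2}

% (3) Center-frequency update: centroid of the mode's power spectrum
\omega_k^{\,n+1} =
  \frac{\int_0^{\infty} \omega \, |\hat{u}_k(\omega)|^2 \, d\omega}
       {\int_0^{\infty} |\hat{u}_k(\omega)|^2 \, d\omega}

% (4) Lagrange multiplier update (dual ascent)
\hat{\lambda}^{\,n+1}(\omega) = \hat{\lambda}^{\,n}(\omega)
  + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{\,n+1}(\omega) \right)
```

The three terms inside the norm in (1) correspond to the three bandwidth-estimation steps: the Hilbert transform builds the analytic signal, the exponential shifts the mode to baseband, and the squared norm of the derivative measures its smoothness.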

Apply Improved VMD using ABC for Dynamic window threshold
ABC is a well-known swarm-based optimization approach, used here to extract features from the decomposed signal. Using the ABC approach, the best features of the normal and stress signals are selected through a novel fitness function. The ABC algorithm is inspired by the foraging process of bees. In ABC, the bees in the population are divided into three categories: employed bees, onlooker bees, and scouts. The employed bees' role is to search for food sources and pass the collected information about the route to the food sources on to other bees, known as onlooker bees. The role of the onlooker bees is to analyze the food and determine the best quality food. A food source is a potential solution, and its fitness is evaluated based on the amount of nectar. Let the available food sources searched by the employed bees be $X_i = (x_{i,1}, x_{i,2}, x_{i,3}, \ldots, x_{i,D})$, where $i = 1, 2, 3, \ldots, N$; among these lies an optimal solution. The aim of each employed bee is therefore to select a random solution from the available ones ($x_{i,j}$) and continue to search the neighbourhood to obtain an improved solution ($v_{i,j}$), which can be represented by equation (5).
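For reference, the neighbourhood search of the standard ABC algorithm has the form below. This is a sketch assuming the standard formulation applies here; the symbol names and the constraint $k \neq i$ come from that formulation.

```latex
% (5) Candidate solution generated around food source i in dimension j
v_{i,j} = x_{i,j} + \phi_{i,j} \, \left( x_{i,j} - x_{k,j} \right),
\qquad k \neq i
```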
Where $j$ is a dimensional index selected randomly from [1, D], and D is the dimension of the decomposed signal. $x_{k,j}$ is a randomly selected solution, where $k$ must belong to [1, N]. $\phi_{i,j}$ is a randomly generated weight in [-1, 1]. After completing the search, the employed bees share their search information with the onlooker bees. Depending upon the quality of the food sources collected by the employed bees, the best food source is selected using the designed fitness function. The best feature of the ECG signal is selected using the fitness function given by equation (6).
Where the fitness value of the $i$-th solution is represented by $fit_i$. If the quality or fitness of a food source does not improve within the defined limit, the scout bee replaces the old solution with a randomly generated food source.
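For reference, the scout-bee re-initialization of the standard ABC algorithm draws a new food source uniformly inside the search bounds. This is a sketch assuming that standard form; the bound symbols $x_j^{\min}$ and $x_j^{\max}$ are notation chosen here.

```latex
% (7) Random re-initialization of an exhausted food source
x_{i,j} = x_j^{\min} + \mathrm{rand}(0,1) \, \left( x_j^{\max} - x_j^{\min} \right)
```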
Where the search box is defined by $[x_j^{\min}, x_j^{\max}]$ and random values between 0 and 1 are represented by $\mathrm{rand}(0,1)$. Steps of the employed ABC strategy for determining the dynamic window threshold are as follows:

The above algorithm is responsible for identifying an optimal threshold value for the decomposed ECG samples. The employed bee searches for the best feature among the decomposed feature population and forwards the best feature information to the next step. A greedy selection mode is adopted, and the mean value of the filtered features is extracted by the onlooker bee. The probability function is applied to identify the most suitable, optimized threshold value for the features of the decomposed signals. Once the optimal threshold value is reached, the iteration cycle is terminated and the optimized threshold value is returned.
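The employed, onlooker, and scout phases described above can be sketched as a small minimizer. This is a minimal illustration of a standard ABC loop, not the algorithm of this work: the designed fitness function is not given in the text, so the common $1/(1+\text{cost})$ fitness is assumed, and the demo cost (squared distance to a hypothetical ideal threshold of 0.3) exists only to give the search a known optimum.

```python
import random


def abc_minimize(cost, lower, upper, dim=1, n_sources=20, limit=20,
                 cycles=200, seed=0):
    """Minimal artificial bee colony search for a non-negative `cost`."""
    rng = random.Random(seed)

    def random_source():
        # Scout move: uniform re-initialization inside the search box
        return [lower + rng.random() * (upper - lower) for _ in range(dim)]

    def neighbour(i):
        # Employed/onlooker move: perturb one coordinate relative to a random peer
        k = rng.choice([s for s in range(n_sources) if s != i])
        j = rng.randrange(dim)
        candidate = sources[i][:]
        phi = rng.uniform(-1.0, 1.0)
        candidate[j] += phi * (sources[i][j] - sources[k][j])
        candidate[j] = min(max(candidate[j], lower), upper)
        return candidate

    def try_improve(i):
        candidate = neighbour(i)
        candidate_cost = cost(candidate)
        if candidate_cost < costs[i]:          # greedy selection
            sources[i], costs[i], trials[i] = candidate, candidate_cost, 0
        else:
            trials[i] += 1

    sources = [random_source() for _ in range(n_sources)]
    costs = [cost(s) for s in sources]
    trials = [0] * n_sources
    best_cost, best = min(zip(costs, sources))

    for _ in range(cycles):
        for i in range(n_sources):             # employed bees
            try_improve(i)
        fitness = [1.0 / (1.0 + c) for c in costs]
        for _ in range(n_sources):             # onlookers choose sources by fitness
            i = rng.choices(range(n_sources), weights=fitness)[0]
            try_improve(i)
        for i in range(n_sources):             # scouts replace exhausted sources
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i], trials[i] = cost(sources[i]), 0
        cycle_cost, cycle_best = min(zip(costs, sources))
        if cycle_cost < best_cost:
            best_cost, best = cycle_cost, cycle_best[:]
    return best, best_cost


target = 0.3  # hypothetical ideal threshold, used only to give the demo a known optimum
best, best_cost = abc_minimize(lambda x: (x[0] - target) ** 2, lower=0.0, upper=1.0)
print(best, best_cost)
```

The best solution is tracked separately from the food sources so that a scout re-initialization can never discard it.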

Feature Extraction
PSO is applied for the selection of ECG signal features. If the fitness function of PSO is satisfied, the data are passed to the semi-supervised learning algorithm, which is SVM. If the optimized features do not satisfy the fitness function of PSO, the signal is passed to the PSO algorithm again.
PSO was first proposed by Kennedy and Eberhart (1995). PSO is a random search over real numbers in a D-dimensional space. In PSO, a solution in the D-dimensional space is represented by a participating particle in the search space. The best solution is determined by combining the best result obtained by each individual particle with that of the swarm itself. The main working elements of PSO are the velocity and position of every participating particle. In every iteration, the particle positions are updated as per the designed fitness function. Equation (8) and equation (9) are used to update the moving particles' velocity and position in the search space.
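The velocity and position updates described above can be sketched for feature selection as follows. This is a minimal illustration under stated assumptions, not the improved PSO of this work: the fitness function (summed relevance of the kept features minus a size penalty), the `relevance` scores, the 0.5 cut-off that turns a position into a feature mask, the velocity clamp `v_max`, and the coefficients `c1` and `c2` are all hypothetical choices.

```python
import random


def pso_select_features(relevance, penalty=0.5, n_particles=20, iterations=60,
                        c1=1.5, c2=1.5, v_max=0.5, seed=0):
    """Continuous PSO on [0, 1]^D; a feature is kept when its coordinate exceeds 0.5."""
    rng = random.Random(seed)
    dim = len(relevance)

    def to_mask(position):
        return [1 if p > 0.5 else 0 for p in position]

    def fitness(position):
        # Hypothetical fitness: summed relevance of kept features minus a size penalty
        return sum(r - penalty for r, m in zip(relevance, to_mask(position)) if m)

    x = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_fit = [fitness(p) for p in x]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: pull towards personal best and swarm best
                v[i][d] += (c1 * r1 * (pbest[i][d] - x[i][d])
                            + c2 * r2 * (gbest[d] - x[i][d]))
                v[i][d] = max(-v_max, min(v_max, v[i][d]))       # velocity clamp
                x[i][d] = min(1.0, max(0.0, x[i][d] + v[i][d]))  # position update
            fit = fitness(x[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = x[i][:], fit
        history.append(gbest_fit)
    return to_mask(gbest), gbest_fit, history


relevance = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]  # stand-in per-feature relevance scores
mask, best_fit, history = pso_select_features(relevance)
print(mask, round(best_fit, 2))
```

The swarm best is only ever replaced by a strictly better position, so the fitness recorded in `history` is non-decreasing across iterations.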
$$v_i(t+1) = v_i(t) + c_1 r_1 \left( p_i^{best} - x_i(t) \right) + c_2 r_2 \left( g^{best} - x_i(t) \right) \qquad (8)$$

$$x_i(t+1) = x_i(t) + v_i(t+1) \qquad (9)$$

Where $v_i(t)$ and $v_i(t+1)$ are the present and the next velocity, which control the direction as well as the magnitude of the moving particles. $x_i(t)$ and $x_i(t+1)$ are the current and the next position of particle $i$. $c_1$ and $c_2$ are the controlling coefficients used to control the movement of the particles, and $r_1$ and $r_2$ are random variables. The particle's best position and the swarm's best position are represented by $p_i^{best}$ and $g^{best}$, respectively. The algorithmic steps employed for feature selection using PSO are as follows:

Algorithm 3: PSO for feature selection
1. Create particle swarm vector for each ECG signal // particle swarm vector for ECG signal, number of iterations, upper boundary, lower boundary
... // initialize signal
13. Return the optimal particle features of the ECG signal

Multi-kernel SVM for training of ECG signal data
At this stage, the data undergo semi-supervised learning using the multi-kernel SVM. The SVM implementation at this stage helps to overcome the curse of dimensionality in machine learning. Through quadratic programming, it finds the best hyperplane separating the input data into the two categories. The trained data are stored in the form of a training dataset computed from the optimized signal, with the training data obtained for two categories representing normal and stress ECG signals, using the multi-kernel SVM function employed in the current prediction work. The trained signal data are further used as a comparison database when predicting the class of unknown test ECG signal data.
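The idea of a multi-kernel function can be illustrated as a weighted sum of base kernels evaluated over the optimized feature vectors. This is a sketch under assumptions: the text does not state which base kernels or weights are used, so the RBF and polynomial kernels, their parameters, the 0.6/0.4 weights, and the sample feature vectors are all hypothetical.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def poly_kernel(a, b, degree=2, coef0=1.0):
    """Polynomial kernel between two feature vectors."""
    return (sum(p * q for p, q in zip(a, b)) + coef0) ** degree


def multi_kernel(a, b, weights=(0.6, 0.4)):
    """Convex combination of base kernels; the weights are illustrative."""
    return weights[0] * rbf_kernel(a, b) + weights[1] * poly_kernel(a, b)


def gram_matrix(samples, kernel=multi_kernel):
    """Pairwise kernel values for all samples."""
    return [[kernel(a, b) for b in samples] for a in samples]


features = [[0.1, 0.2], [0.4, 0.1], [0.9, 0.8]]  # hypothetical optimized feature vectors
K = gram_matrix(features)
print(K[0][0], K[0][1])
```

A non-negative weighted sum of valid kernels is itself a valid kernel, so a Gram matrix built this way can be handed to any SVM solver that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.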

ECG signal classification using Multi-Kernel SVM
The uploaded test ECG signals pass through the same optimization stages using GA and PSO. Finally, the fully optimized signal data are passed to the SVM for comparison against the training dataset. In the process, the uploaded signals are classified into normal and stress signals using the following algorithm.

ECG signal classification using Multi-Kernel SVM
This section evaluates the quality of prediction of the proposed work in terms of the performance parameters precision, accuracy, specificity and recall. These parameters are calculated from the confusion matrix parameters using the following formulas:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}, \qquad \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

In the above equations, true positive, true negative, false positive and false negative values are represented by $TP$, $TN$, $FP$ and $FN$, respectively. Here, $TP$ indicates the ECG signals that are correctly classified as positive, $TN$ indicates the ECG signals that are correctly classified as negative, while $FP$ and $FN$ values indicate the instances of erroneous classification. In the adopted methodology, a higher positive predictive value reflects the quality of ECG signal prediction.
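The four performance parameters follow directly from the confusion matrix counts, as the short sketch below shows. The counts in the usage example are hypothetical values for a 300-signal test split and are not results from this work.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, specificity and accuracy from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical counts for a 300-signal test split (not the paper's results)
metrics = classification_metrics(tp=140, tn=150, fp=6, fn=4)
for name, value in metrics.items():
    print(name, round(100.0 * value, 2))
```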

Result
The prediction results are obtained in the form of confusion matrix parameters, expressed in terms of true positives, true negatives, false positives and false negatives. These parameters are further expressed in terms of the performance parameters recall, specificity, precision, and accuracy.

Recall Comparison
The present section compares the observed prediction results against the popular methods proposed by various researchers.

Precision comparison
Precision is the ratio of true positives to all positive predictions made by the implemented technique. Table 3 and Figure 7 compare the precision of the proposed work with that of two existing works. Li ...

Accuracy comparison
Accuracy is another important parameter that decides the quality of prediction or classification in terms of the exactness of the observed results. The accuracy comparison is given in Table 4.

Conclusion
The idea behind the current work is to develop a classification model to precisely classify large-scale ECG data using an optimization algorithm in addition to machine learning. In the current methodology, three-fold feature optimization is involved at three different stages using GA, ABC, and PSO. The respective fitness functions are employed at each stage to select the optimized features for the next stage. It has dramatically refined the ECG signal features. To reduce the overall computation time, VMD is involved in minimizing the ECG signal complexity for better SVM-based classification. Overall, it enhanced the performance of the proposed ECG signal prediction method compared to the existing works. The work has achieved ECG signal prediction with an average accuracy of 98.93% with a precision of 96.83%, recall of 96.83%, and a specificity of 96.72% when a larger ECG sample size of 1000 ECG signals is employed in the performance evaluation. It is also established that the improved VMD and improved SVM-based ECG signal classification outperformed the existing VMD and SVM-based ECG classification works.