An Improved Approach for Stress Detection Using Physiological Signals

Stress is a major problem in society. Prolonged stress can lead to ill health and a decrease in self-confidence. It is necessary to detect stress at an early stage to prevent its adverse effects on physical and psychological health. This paper presents a stress detection model using physiological signals. The WESAD (Wearable Stress and Affect Detection) dataset is used, which consists of physiological data recorded from both the chest and the wrist. A Long Short-Term Memory (LSTM) based model is used to detect stress. The simulation results indicate that Electrocardiogram (ECG), Electromyogram (EMG), and Respiration (RESP) signals may not be necessary for identifying stress. A three-way validation is carried out with an accuracy of up to 98%. The novelty of the paper is the way time-series data is handled to make it closer to real-time data captured from sensors. The work can be used widely in clinical practice to detect stress at an early stage.


Introduction
Nowadays, pandemics, natural disasters, and economic recession are affecting our mental health negatively. They lead to stress (Kaur et al., 2019; Gautam et al., 2020; Sharma et al., 2020; Ausin et al., 2021; Khalaf, 2020; Daniels et al., 2020), which causes various psychological illnesses such as anxiety, depression, eating disorders, and attention deficit disorder (Olff et al., 2020) and makes existing problems even worse. Moreover, stress affects our physical health as well.
Stress can be defined as a response to situations that involve a high workload or demand intense psychological effort. In the current scenario, a large share of people experience stress. If this problem is not dealt with, it may lead to serious health issues at a later stage and, in the worst cases, even to death. Recently, researchers observed a strong relationship between the physical health and the psychological health of an individual (Picard, 2016; Liapis et al., 2020). They observed that physiological signals are a good indicator of emotional well-being. Hence, these signals can be used for periodic monitoring of mental health, including stress, anxiety, and depression. Stress detection systems have applications in various domains, for example among employees, students, drivers, passengers, doctors, patients, and job applicants.
The methods proposed in past research to detect stress depend on a self-reported diary, filled in by the subjects, as ground truth. The drawback is that stressed people do not fill in diaries attentively or accurately. Thus, unobtrusive and accurate systems are required to identify stress in people and improve their health. Researchers have proposed different approaches to recognize stress automatically (Carneiro et al., 2012; Cho et al., 2017; Gjoreski et al., 2020; Lin et al., 2017; Kurniawan et al., 2013; Vizer et al., 2009).
Though a lot of research has been carried out in the field of automatic stress detection, it is still far from achieving good accuracy while staying close to real scenarios (Can et al., 2019). In this paper, a physiological signal based dataset (WESAD) is used along with LSTM to design a stress detection model. To reduce the complexity of the model, rigorous preprocessing is first applied to clean the dataset and pick only the most relevant features.
The paper is organized as follows: Section 2 discusses the state of the art. The proposed work is explained in detail in Section 3, followed by implementation and results in Section 4. Finally, the work is discussed and concluded with future directions in Sections 5 and 6.

Related Work
A considerable amount of research on detecting stress in people has already been done. Earlier work approached stress detection during day and night using smart devices (Muaremi et al., 2013). The subjects' data was recorded using mobile phones. Using smartphone sensors, variations in people's self-perception of stress were recorded. Subjects were asked to fill in questionnaires daily to provide data. Voice recordings were also analyzed to measure stress. After the experiments, HRV features were considered more important. However, many other measures, such as readings from biological sensors, contribute significantly to recognizing the stress level. Considering this, several interesting works address recognizing stress autonomously. Lin et al. (2019) used a deep fusion network for recognizing stress. LDA and AdaBoost classifiers have also been used. Fusion schemes (early and late fusion) were framed, and the results show that late fusion was superior to early fusion.
Some have used sensors available in commercial smartwatches to identify stress (Siirtola, 2019). A comparison was made between sensors in devices running different operating systems. Tizen OS was better than WearOS and WatchOS. The maximum accuracy achieved was 84% with a window size of 120 s and an SVM classifier. Singh & Queyam (2013) proposed a novel method for detecting stress in automobile drivers where a stress function was used as a classifier. They took ECG, EMG, RR, and GSR physiological data of 10 drivers for building the system. The problem was treated as a 3-class classification with the classes high, moderate, and low stress. The highest accuracy of 88% was achieved when the ANN classifier was employed for the task. Eye blinks and brain activity (Haak et al., 2009) have also been used in determining stress. Eye blinks were recorded while moving on streets with straight roads, curved roads, and crashes on a curved track. It was concluded that eye blinks and spontaneous activity in certain events point towards stress. Authors (Padmaja et al., 2018) ...

For the data collection in WESAD, both a chest-worn device (RespiBAN Professional) and a wrist-worn device (Empatica E4) were used. The RespiBAN is equipped with sensors to measure electrocardiogram (ECG), electrodermal activity (EDA), electromyogram (EMG), respiration (RESP), body temperature (TEMP), and three-axis acceleration (ACCx, ACCy, ACCz). All signals are sampled at 700 Hz. The RespiBAN was placed around the subject's chest. The RESP signal is recorded via a respiration inductive plethysmograph sensor. The ECG data was recorded via a standard three-point ECG. To allow the subject to move as freely as possible, the EDA signal was recorded on the rectus and the TEMP sensor was placed on the sternum. The EMG data was recorded on the upper trapezius muscle on both sides of the spine. To avoid wireless packet loss, the recorded data was stored locally and transferred to a computer for further processing after the experiment. All subjects wore the Empatica E4 on their non-dominant hand. The E4 records blood volume pulse (BVP, 64 Hz), electrodermal activity (EDA, 4 Hz), body temperature (4 Hz), and three-axis acceleration (32 Hz).

Data Cleaning
Data cleaning is the process of identifying and correcting (or removing) inaccurate records from a dataset. It includes recognizing incomplete, wrong, erroneous, or irrelevant parts of the data and deleting them. Here, a two-level data cleaning process is carried out. In the literature, it is found that physiological signals obtained from a chest-worn device are more accurate than those obtained from a wrist-worn device (Shi et al., 2010). Therefore, at the first level, only the data captured from the chest-worn device, i.e., RespiBAN, is considered in the proposed model. Hence, the dataset with 6 modalities (EMG, ECG, RESP, TEMP, EDA, 3-axis ACC) is used.
Moreover, there are four classes in the WESAD dataset, namely baseline, stress, amusement, and meditation. Since the objective of the proposed work is to detect stress, we used two-class data from the dataset, i.e., baseline (neutral) and stress. Hence, at the second level, all records that correspond to amusement and meditation have been eliminated.
Then, the data is segregated into 2 parts:
• Records of each person when he/she is in stress.
• Records of each person when he/she is not in stress.
It is found that, for each subject, there are more records for the state in which the person is not in stress (baseline) than for the state in which he/she is in stress.
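The label-based filtering and segregation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each subject's chest data is available as a NumPy array with one label per record, using the label codes given in this paper (1 = baseline, 2 = stress, 3 = amusement, 4 = meditation). The function name and the toy arrays are hypothetical.

```python
import numpy as np

# Label codes as used in this paper: 1 = baseline, 2 = stress,
# 3 = amusement, 4 = meditation.
BASELINE, STRESS = 1, 2

def keep_baseline_and_stress(signals, labels):
    """Drop every record that is neither baseline nor stress.

    signals: array of shape (n_records, n_channels)
    labels:  array of shape (n_records,)
    Returns (baseline_records, stress_records), each in original time order.
    """
    signals = np.asarray(signals)
    labels = np.asarray(labels)
    return signals[labels == BASELINE], signals[labels == STRESS]

# Toy stand-in for one subject's recording (hypothetical values).
labels = np.array([1, 1, 1, 1, 3, 3, 2, 2, 4, 4])
signals = np.arange(20, dtype=float).reshape(10, 2)

baseline, stress = keep_baseline_and_stress(signals, labels)
print(baseline.shape, stress.shape)
```

Boolean masking keeps the records of each class in their original order, which matters later because the model consumes the data sequentially.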

Feature Selection
Feature selection is the process of selecting the most relevant subset of features for use in further processing. Having superfluous features in the data can reduce the accuracy of the models and cause them to learn from irrelevant features.
The plot (Figure 4) shows the class label, that is, baseline (1), stress (2), amusement (3), and meditation (4), with respect to time (Y-axis). It is necessary to know which subject belongs to which version, as this is not stated in [16], so that when a cluster of subjects is taken for training and testing, not all subjects belong to a single version. If that were the case, the testing accuracy could be affected.

Feature Generation
Feature generation deals with dimensionality reduction, by which raw data is reduced to more manageable segments for further processing. Large datasets contain enormous amounts of data and require techniques that make computation on them feasible. Aggregating the features computed over signal segments and combining them for classification serves as an initial pre-processing step. Extracted features can be grouped in several ways, such as time-domain or frequency-domain features, linear or non-linear features, and uni-modal or multi-modal features. Statistical features (such as dynamic range, mean, and standard deviation) are considered here, as more complex features depend on them.
In the proposed model, the following features are used for ACC, as there are 3 axes X, Y, and Z:
• maxX - maximum of the X-axis
• mean - the combined mean of the x, y, and z axes, that is, (mean x + mean y + mean z)/3
• std - standard deviation
For TEMP and EDA, we include the following features:
• max - maximum of the feature
• min - minimum of the feature
• mean - mean of the feature
• range - dynamic range of the feature (max - min)
• std - standard deviation
This results in a dataset with 5 attributes each for EDA, TEMP, and ACC. The samples come from 15 subjects and are labeled with two classes: stress and baseline (neutral).
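The statistical features listed above can be computed per window along the following lines. This is a sketch under assumptions: the function names are hypothetical, only the features actually named in the text are computed, and the ACC standard deviation is assumed to be taken over all three axes pooled together, which the text does not specify.

```python
import numpy as np

def stat_features(window):
    """max, min, mean, dynamic range and standard deviation of a 1-D window
    (the five features listed for TEMP and EDA)."""
    x = np.asarray(window, dtype=float)
    return {
        "max": x.max(),
        "min": x.min(),
        "mean": x.mean(),
        "range": x.max() - x.min(),
        "std": x.std(),
    }

def acc_features(window):
    """ACC features for a window of shape (n_samples, 3), columns X, Y, Z."""
    acc = np.asarray(window, dtype=float)
    return {
        "maxX": acc[:, 0].max(),
        # Combined mean as defined in the text: (mean x + mean y + mean z) / 3.
        "mean": acc.mean(axis=0).mean(),
        # Assumption: standard deviation over all three axes pooled together.
        "std": acc.std(),
    }

temp = stat_features([1.0, 2.0, 3.0, 4.0])
acc = acc_features([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
print(temp)
print(acc)
```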

EAI Endorsed Transactions on Scalable Information Systems
Online First

The concept of window size and window shift is depicted in Figure 5. Since the two classes have an unequal number of records, keeping the window size and the shift the same for both classes would yield more feature records for baseline than for stress, which would unbalance the data further and may bias the model. To balance the classes, a window size of 42000 samples (all signals are recorded at 700 Hz, hence (1/700) * 42000 = 60 seconds, i.e., a window of 1 minute) with a window shift of 175 samples (0.25 seconds) is used for computing the features of the baseline records. Similarly, for the stress class, a window size of 42000 samples and a window shift of 105 samples (0.15 seconds) is used. The window shifts differ so that an almost equal number of samples is obtained from the baseline and stress data. As a result, the number of samples for each subject, for class baseline as well as class stress, is approximately 4500; in practice, it lies in the range 3500-4800. Hence, feature sampling was applied so that each of the 15 subjects contributes an equal number of samples.
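The window arithmetic above can be checked with a short sketch. The constants are taken from the text; the helper names are hypothetical, and the 20-minute and 10-minute durations are the nominal protocol lengths, so the counts are only indicative of what a real subject would yield.

```python
FS = 700               # chest-device sampling rate in Hz
WINDOW = 42_000        # samples per window
SHIFT_BASELINE = 175   # window shift for baseline records
SHIFT_STRESS = 105     # window shift for stress records

def window_starts(n_records, window, shift):
    """Start indices of all full windows that fit into n_records samples."""
    return range(0, n_records - window + 1, shift)

def n_windows(n_records, window, shift):
    """Number of full windows, i.e. number of feature rows produced."""
    return len(window_starts(n_records, window, shift))

# Durations in seconds implied by the sample counts.
print(WINDOW / FS, SHIFT_BASELINE / FS, SHIFT_STRESS / FS)

# Nominal durations: 20 min of baseline and 10 min of stress at 700 Hz.
n_base = n_windows(20 * 60 * FS, WINDOW, SHIFT_BASELINE)
n_stress = n_windows(10 * 60 * FS, WINDOW, SHIFT_STRESS)
# Same shift for both classes, for comparison.
n_stress_same_shift = n_windows(10 * 60 * FS, WINDOW, SHIFT_BASELINE)
print(n_base, n_stress, n_stress_same_shift)
```

With the smaller shift, the stress class yields a count much closer to the baseline count than it would with a shared shift of 175, which is the balancing effect the text describes.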

Feature Sampling
As mentioned in the previous section, the number of samples obtained per subject varies, with a maximum of approximately 4800. Hence, feature sampling is used: the values are up-sampled to a common length. A target of 4800 samples was chosen as it is closest to the number of records computed for each subject. If a subject has fewer than 4800 samples, the data is up-sampled and the missing rows are substituted by the mean of each column. This brings uniformity in the number of records per subject, which allows better interpretation of the data points.
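A minimal sketch of this up-sampling step is shown below. It assumes that "substituted by mean of each column" means appending rows filled with the per-column mean until the target length is reached; the function name is hypothetical, and the truncation branch is an added safeguard that the text does not describe.

```python
import numpy as np

TARGET_ROWS = 4800  # target number of samples per subject and class

def pad_with_column_mean(features, target_rows=TARGET_ROWS):
    """Extend a (n_rows, n_cols) feature block to target_rows rows by
    appending rows that hold the mean of each column."""
    features = np.asarray(features, dtype=float)
    n_rows = features.shape[0]
    if n_rows >= target_rows:
        # Assumption: blocks that are already long enough are cut to size.
        return features[:target_rows]
    filler = np.tile(features.mean(axis=0), (target_rows - n_rows, 1))
    return np.vstack([features, filler])

block = np.arange(6, dtype=float).reshape(3, 2)   # toy 3 x 2 feature block
padded = pad_with_column_mean(block, target_rows=5)
print(padded)
```

Padding with the column mean leaves the mean of each feature unchanged, although it does lower the variance of short blocks.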
After feature sampling, the data is ready for analysis. Figure 6 depicts the procedure followed for training and testing the model.


Implementation and Results
After preprocessing, we created a single data-frame consisting of 72,000 (4800 * 15) baseline records followed by 72,000 (4800 * 15) stress records. This gives a total of 144,000 records. The baseline and stress labels are assigned 0 and 1, respectively.

Recurrent Neural Networks
Recurrent Neural Networks (RNNs) are applied to sequential data. They have nodes, similar to neurons, organized into a network of layers. They retain what has been processed before, so they are said to possess 'memory'. The input to the current state includes the output from the previous state in the network. They are called recurrent because they process each item in the sequence of data in turn, reusing the same network.
They suffer from the problem of exploding and vanishing gradients, which LSTMs address by introducing some modifications to the basic structure of the RNN.
The Problem of Long-Term Dependencies: In an RNN, past information is connected to current information. Sometimes, recent information must be examined to perform the present task, which is exactly our case.
Therefore, Long Short-Term Memory (LSTM) networks are suitable for training on our dataset.

LSTM Networks
LSTM networks are built to overcome the problem of vanishing and exploding gradients posed by RNNs during training. They mitigate the problem by allowing the gradients to flow through the cell state largely unchanged. This type of neural network architecture can process not only single data points but also long sequences of data. A single LSTM unit includes a forget gate, an input gate, an output gate, and a cell, as shown in Figure 7. Together, these gates regulate the flow of data into and out of the cell. The input gate controls what information enters the cell, i.e., how much of the new values pass into the cell; the forget gate decides which information from the previous cell state is not to be kept; and the output gate decides what data is passed on to the next state. The forget gate uses a sigmoid activation function.
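To make the role of the three gates concrete, a single time step of a standard LSTM cell can be written in NumPy as below. This follows the common textbook formulation rather than any specific implementation used in this paper; the weight layout (forget, input, candidate, output stacked row-wise) and all names are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x: input vector (D,); h_prev, c_prev: previous hidden and cell state (H,).
    W: (4*H, D), U: (4*H, H), b: (4*H,), with the four blocks stacked in the
    order [forget, input, candidate, output].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: how much of c_prev to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new content to admit
    g = np.tanh(z[2 * H:3 * H])    # candidate cell content
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell to expose
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

# Toy dimensions: 15 input attributes, 3 hidden units, all-zero parameters.
D, H = 15, 3
h, c = lstm_step(np.ones(D), np.zeros(H), np.ones(H),
                 np.zeros((4 * H, D)), np.zeros((4 * H, H)), np.zeros(4 * H))
print(h, c)
```

The additive update `c = f * c_prev + i * g` is the path along which gradients can pass with little change when the forget gate is close to 1.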

Training and testing of the model
For training, we designed a sequential LSTM network with 1 hidden layer and 15 neurons (as there are 15 attributes in our dataset). We applied 3 types of classification. The first type is the traditional approach used in (Carneiro et al., 2012; Vizer et al., 2009; Cho et al., 2017; Lin et al., 2017; Kurniawan et al., 2013). In this approach, samples are picked randomly from the dataset. The training and testing accuracy can be high in such a scenario; however, each sample is treated as independent data. Though the accuracy is high, this behavior does not match the real-world scenario.
The second and third types of classification are novel and necessary. Here, instead of picking the samples randomly, the samples are considered subject-wise. All the samples of a subject are fed into the model in sequence. As a result, the model is first trained with the sequence of data of one subject. Then, the next subject is considered for training, and so on. In the second type, 14 subjects are used for training and the remaining one is used for testing. In the third type, 13 subjects are used for training and the remaining two are used for testing. The two test subjects are selected such that one belongs to version A and the other belongs to version B.
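The subject-wise splits of type 2 and type 3 can be enumerated as sketched below. The subject identifiers and their assignment to version A or B are placeholders, since the text does not list which subject belongs to which version; the generator names are hypothetical.

```python
def leave_one_subject_out(subjects):
    """Type 2: train on all subjects but one, test on the held-out subject."""
    for test_subject in subjects:
        train = [s for s in subjects if s != test_subject]
        yield train, [test_subject]

def leave_two_subjects_out(version_a, version_b):
    """Type 3: hold out one subject from version A and one from version B."""
    everyone = list(version_a) + list(version_b)
    for a in version_a:
        for b in version_b:
            train = [s for s in everyone if s not in (a, b)]
            yield train, [a, b]

# Placeholder IDs 1..15; the A/B assignment below is purely hypothetical.
version_a = list(range(1, 9))
version_b = list(range(9, 16))

type2_splits = list(leave_one_subject_out(version_a + version_b))
type3_splits = list(leave_two_subjects_out(version_a, version_b))
print(len(type2_splits), len(type3_splits))
```

Because whole subjects are held out, no window from a test subject can leak into training, which is the main difference from the random sampling of type 1.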
Further, experiments are performed by varying hyper-parameters such as batch size and number of epochs to observe their effect on training. This is carried out for all 3 types of classification. A batch size of 32 or 4 is chosen because each is a factor of 4800, the number of stress and baseline samples for each subject. With a batch size of 32, the algorithm takes the first 32 samples (1st to 32nd) from the training dataset and trains the network. Next, it takes the second 32 samples (33rd to 64th) and trains the network again. The same procedure is repeated until all samples have been propagated through the network. This helps in avoiding the mixing of samples of different subjects within a batch. Further, the sigmoid activation function is used for the output layer, with binary cross-entropy as the loss function.
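The reason a batch size that divides 4800 keeps subjects separate can be verified with a small sketch, which also spells out the binary cross-entropy loss. It assumes the records are consumed in order without shuffling; the helper names are hypothetical, and the batch size of 128 is included only as a counterexample that does not divide 4800.

```python
import numpy as np

SAMPLES_PER_BLOCK = 4800     # records per subject and class
TOTAL_RECORDS = 144_000      # 2 classes * 15 subjects * 4800

def batch_slices(n_samples, batch_size):
    """Consecutive (start, stop) index pairs, taken in order without shuffling."""
    return [(s, min(s + batch_size, n_samples))
            for s in range(0, n_samples, batch_size)]

def crosses_block_boundary(start, stop, block=SAMPLES_PER_BLOCK):
    """True if a batch contains records from two different subject blocks."""
    return start // block != (stop - 1) // block

def any_mixed_batch(batch_size):
    return any(crosses_block_boundary(a, b)
               for a, b in batch_slices(TOTAL_RECORDS, batch_size))

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy between labels and sigmoid outputs."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

print(any_mixed_batch(32), any_mixed_batch(4), any_mixed_batch(128))
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```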

Results
A detailed evaluation of testing is presented in Table 1. All the types of classification discussed in the previous sub-section give promising results. To the best of our knowledge, this is the best accuracy achieved so far in the binary classification of stress vs. baseline. The best training accuracies are 98%, 99%, and 99% for type 1, type 2, and type 3, respectively.
• The best testing accuracy for type 1 classification is 98% with a batch size of 32 and the number of epochs equal to 31 (early stopped).
• The best testing accuracy for type 2 classification is 98% with a batch size of 32 and the number of epochs equal to 8 (early stopped).
• The best testing accuracy for type 3 classification is 93% with a batch size of 32 and the number of epochs equal to 40 (early stopped).
The plots of training and testing accuracy vs. the number of epochs, and of training and testing loss vs. the number of epochs, for type 1, type 2, and type 3 classification are depicted in Figure 8, Figure 9, and Figure 10, respectively. It can be observed that type 2 (Figure 9) is the best in terms of accuracy, loss, and number of epochs when compared to type 1 (Figure 8) and type 3 (Figure 10). Moreover, it is closer to the human pattern of stress. A person cannot be declared to be in stress just by looking at one instance of the values of different parameters. Rather, monitoring must be done over a period of time, and the collective analysis of all those values helps in concluding whether a person is in stress or not. This is exactly what happens in type 2 and type 3 classification.

Discussion
In this section, we compare the existing models with the proposed one (Table 2). It is observed that in the base paper (Schmidt et al., 2018)

Conclusion
Nowadays, stress is a part of our lives. Dealing with it at an early stage may avoid many further problems. In this paper, we propose an automatic stress detection model using LSTM. The model uses the WESAD dataset, containing physiological signals of 15 subjects, for training and testing. Utmost care has been taken to preserve the pattern of the human body, and hence the sequence of the given dataset is preserved. Further, thorough and rigorous preprocessing is applied, followed by 3 types of classification. The LSTM model offers a strong capability for capturing temporal dynamic behavior. All 3 types of classification using LSTM have shown promising results. Furthermore, type 2 is the best one, with a testing accuracy of 98% after only 8 epochs. Although the results are promising, the LSTM model only categorizes into stress and non-stress states using binary classification.
Future Directives: Extending the model to 4-class classification would be a major advantage in terms of a better understanding of various emotions. Predicting Body Mass Index and differentiating the stress levels of men and women would add depth to the project. Working on different transition states of emotion would broaden the perspective of the study. Real-time testing of models with the available wrist sensors is still a challenge, but currently, biological sensors are proving to be suitable for analysis.
This paper proposes an LSTM-based stress detection model using physiological signals. As discussed in the previous section, stress cannot be detected based upon the values of physiological signals captured at a single instant of time. Rather, continuous monitoring of these values is required to detect stress accurately. Hence, the proposed model is a continuous model that takes sequential physiological signals as input while detecting stress, and LSTM is used for this purpose. The steps followed to make the data fit for training are depicted in Figure 1.

Fig. 1. Steps to get data fit for training.

Fig. 4. Baseline (1), Stress (2), Amusement (3) and Meditation (4) vs Time plot for each subject.

According to Schmidt et al. (2018), each subject undergoes 20 minutes of baseline and 10 minutes of stress tasks. Therefore:
• Number of records for class Baseline: approximately 800 thousand.
• Number of records for class Stress: approximately 400 thousand.

Fig. 5. Concept of window size and window shift.

Fig. 6. Steps for model training.

The data-frame is divided into two parts, X and Y, where X contains an array of data points and Y contains a one-dimensional array of 0 (Baseline) and 1 (Stress). X is further divided into two parts, X Train and X Test, for training and testing, respectively. Similarly, Y is divided into Y Train and Y Test. Emphasis is placed on considering the samples sequentially, which preserves the dependency of the current value on the previous value. To ensure this, special care has been taken to train the model subject-wise, i.e., to feed the data of one subject (4800 samples) at once for training. Training the model subject-wise means that the model is first trained on subject 1, then carried forward and trained on the second subject, and so on for all the subjects.

• Batch size = 4, Number of Epochs = 1
• Batch size = 32, Number of Epochs = 5
• Batch size = 32, Early stopping used for epochs

Early stopping is used to avoid over-training of the neural network. Following a stopping criterion, it limits the number of epochs. The criterion used for early stopping during model training is to minimize the validation loss, which is monitored in each epoch.
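Early stopping on validation loss can be sketched in plain Python as follows. The text only states that validation loss is the monitored quantity; the patience of 3 epochs, the `min_delta` threshold, and the function name are assumptions made for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # The loss kept improving often enough: all epochs were used.
    return len(val_losses), best_epoch

# Hypothetical validation-loss curve, not taken from the paper's experiments.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)
```

In practice, the weights from `best_epoch` would usually be restored after stopping, so that the reported model is the one with the lowest validation loss.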

Fig. 8. Accuracy vs Epochs and Loss vs Epochs for training and testing of type 1.

Fig. 9. Accuracy vs Epochs and Loss vs Epochs for training and testing of type 2.

Fig. 10. Accuracy vs Epochs and Loss vs Epochs for training and testing of type 3.
, a number of experiments were performed and the maximum accuracy achieved was 92.83% with LDA. Later, in 2019, another experiment (Gjoreski et al., 2019) was performed, which reported an accuracy of 97.55% with EMI-Fast GRNN. Then, Indikawati and Winiarti (2020) designed a stress detection model and

Table 2. Comparison of proposed model with existing models