A comparative analysis of classification techniques for human activity recognition using wearable sensors and smart-phones

INTRODUCTION: In recent years, the usage of smart-phones and wearable sensors has increased at an exceptional rate. These smart devices are equipped with sensors such as a gyroscope, an accelerometer and GPS. By using these sensors to analyse the activity of the end-user, behavioural characteristics of the user can be captured. OBJECTIVES: Although smart-phones and wearable devices provide a platform for conducting social, psychological and physical studies, they still face several limitations and challenges. METHODS: This paper provides a comparative analysis of different classical machine learning and deep learning algorithms and discusses their accuracy and efficiency for human activity recognition (HAR). RESULTS and CONCLUSION: The paper primarily uses data captured by wireless sensor devices placed on different parts of the human body, and compares the results of the different classifiers. The conclusion is that deep learning schemes are markedly more accurate and efficient than classical machine learning techniques.


Introduction
In the mobile and pervasive computing domain, recognizing human activity from data collected by sensors has become a promising field. Different sensing techniques are used to accumulate and classify users' activities, and are applied in diverse areas such as home automation and assisted living, sports, and medical applications.
During the last few years, smart-phones have become ubiquitous and gained prominence. There has been a continuous rise in computing power and sensing capability, together with the emergence of novel inter-connectivity mediums. This has enabled smartphones to perform physical human activity recognition, which in turn has empowered many context-aware applications in different domains; for instance, it has been used in several data mining and artificial intelligence applications. Human activity recognition is a primary and core building block for these types of applications: it receives raw sensor data as input and predicts the user's activity, as shown in Figure 1.

EAI Endorsed Transactions on Pervasive Health and Technology
Research Article

Modern smartphones are equipped with several sensors, including temperature, barometer, GPS, gyroscope, light and accelerometer sensors. These sensors are a rich data source for measuring various aspects of a user's daily life. Human activities such as walking, running, jogging and sitting can be identified from their data. Owing to this rich data source, along with their usability, cost-effective installation and self-efficacy, smartphones have become a dominant platform for human activity recognition. To process the raw data, pervasive computing and machine learning techniques have been employed, providing context-aware and ubiquitous services to people. A host of researchers have developed models designed for the identification of activities performed by humans.
Extracting appropriate activity information from sensor data is a significant task and poses many challenges for traditional machine learning algorithms. These challenges include spatiotemporal variations in activity patterns, sparse occurrences of some activities, and sensor data that does not fall into any predefined activity class. Recognizing human activities automatically is a vital prerequisite for efficiently monitoring the health of an intelligent-home resident and enabling their independent functioning. Applications including interactive game interfaces, mobile services, smart homes, on-demand information systems, and healthcare systems for both inpatient and outpatient treatment have captured the growing interest of researchers in the human activity recognition field.
Human activity recognition is primarily targeted towards the design and development of intelligent healthcare systems. Health issues are ultimately the crucial concern that encourages researchers to work on human activity recognition. Studies applying it to healthcare can be found in [1,2,3,4,51]. For example, [2] proposed MEDIC, a medical diagnosis and patient monitoring system designed using a body-worn physiological sensor network and wireless contextual sensors. Another of these studies quantified daily energy expenditure in daily-life and sport activities based on physical activity classification.
In this paper, a comparative study of classical machine learning and deep learning classifiers for human activity recognition is provided. The objective of the research is to investigate the performance of these classifiers on various datasets and identify the classifiers that perform well across multiple scenarios. The rest of the paper is organized as follows: the next section presents the literature review; this is followed by the methodology and the research design used for experimentation; the results are then presented, followed by the conclusion and future work.

Related Work
A host of approaches has been proposed in the literature for human activity recognition, and Table 1 summarizes some of them. [5] proposed an approach using smart-phone inertial sensors. In their approach, features are first extracted from the raw data; autoregressive coefficients, the median and the mean are included in the feature set. To make the system more robust, the feature set is further processed via linear discriminant analysis (LDA) and kernel principal component analysis (KPCA). Finally, a deep belief network (DBN) was trained with the selected feature set for activity recognition. The proposed methodology was compared with traditional activity recognition methods such as the classic multiclass support vector machine (SVM) and artificial neural network (ANN). [6] proposed an online human activity classification approach based on deep learning, which is claimed to be user independent. To extract the local feature set, convolutional neural networks are used along with simple statistical features that preserve information about the global form of the time series. Additionally, the influence of time series length on recognition accuracy is investigated. The proposed method provides real-time activity classification, and two popular datasets, WISDM and UCI, were used to evaluate its accuracy. [7] presented a systematic performance analysis of motion-sensor behaviour for HAR using smart-phones. Sensory data was collected using smart phones while the participants performed daily human activities; to discriminate five different human activities, three multi-class classifiers were implemented: nearest neighbours, random forests and support vector machines. [8] proposed two implementations for human activity recognition that differ in their prediction technique, dealing with transitions either by directly learning them or by considering them as unknown activities.
This is achieved by combining the output of a support vector machine (SVM) with a heuristic filtering method. The proposed design is validated on data gathered from people performing different kinds of activities (up to 33) using wearable sensors or smart-phones. [9] proposed an approach for human activity recognition using time series data collected from the triaxial accelerometer of a smart-phone. To recognize human activities, k-nearest neighbour and neural network classifiers were used, and their accuracy was evaluated on the WISDM dataset. [10] proposed an approach for human activity recognition using a wrist-worn device (smart-watch) and a smart-phone. Three classifiers were used to recognize 13 different activities, and to make the work reproducible the dataset used to validate the proposed methods was made publicly available. The results show that the combination of a smart-phone and a smart-watch identifies complex activities with reasonable precision. [11] proposed a methodology based on a deep learning algorithm, convolutional neural networks, with data collected from smart-phone sensors for HAR. Their experiments show that increasing the number of convolutional layers raises performance; the publicly available HAR smart-phone dataset from the UCI repository was used. [12] proposed a fast human activity recognition system robust to orientation, placement and subject variations, using coordinate transformation and principal component analysis (CT-PCA) along with an online support vector machine (OSVM). [13] proposed a methodology combining deep neural networks and a classical single-layer feed-forward neural network (SLFN): feature selection is performed by the deep neural network, and the selected features are used by the SLFN. The approach is termed a distilling strategy to maximize performance.
[14] proposed a new adaptive and interactive method with general personal model training components; the data used in the experiments is shared on the cloud. Three classifiers, decision tree (J48), logistic regression and multi-layer perceptron neural network, were used with different feature selections. [15] proposed one-dimensional convolutional neural networks (1D-CNNs) for human behaviour recognition, and [16] applied LSTM to classify human activities. A critical review of the existing literature establishes that there is limited work comparing some of the modern deep learning techniques for human activity recognition. Therefore, in this work, a comparison of 11 machine learning and deep learning classifiers on five different datasets was performed. The objective of the research is to investigate their performance and identify which of these classifiers perform well for human activity recognition.

Methodology
Classification problems and techniques are a vital part of machine learning, and in the last few years a massive number of applications has been published. In supervised classification, a class label is predicted on the basis of predictor features. After pre-processing the datasets, we trained our machine learning and deep learning models on the training split. This section discusses the pre-processing of the datasets and the different classification techniques used in this study.

Dataset
This study used five different datasets to evaluate the classifiers. The first is the Activity Recognition based on Multisensor data fusion (AReM) dataset, a real-life benchmark in the area of HAR applications [17]. RSS data was collected using IRIS nodes embedded in a Chipcon AT86RF230 radio subsystem implementing the IEEE 802.15.4 standard and programmed with TinyOS firmware. Three nodes were placed on the user's ankles and chest, and one was placed on an object in the environment representing a meaningful location for a particular activity (e.g. a stationary bike for recognizing the cycling activity).
The second dataset, HAPT, is a smart-phone-based dataset for human activity recognition [18]. To obtain the dataset, a group of 30 people was selected, and each person was instructed to follow a series of activities while wearing a waist-mounted Samsung Galaxy S2 smartphone. Angular velocity and triaxial linear acceleration signals were collected using the phone's gyroscope and accelerometer at a 50 Hz sampling rate.
The third dataset, OLDPPL, is a HAR dataset based on a battery-less wearable sensor worn by healthy older people [19]. It contains the activity data of 14 healthy older people performing activities while wearing the battery-less sensor. The volunteers resided in two clinical rooms: Room 1 used 4 RFID reader antennas and Room 2 used 3 RFID reader antennas to collect the activity data.
The fourth dataset, PAMAP2, is an activity monitoring dataset containing data for 18 different activities (e.g. playing soccer, cycling and walking). A group of 9 people, wearing a heart-rate monitor and 3 inertial measurement units, performed the activities. This benchmark dataset can be used for HAR with classification algorithms [20], [21].
The fifth dataset, WISDM, is a smart-phone-based activity recognition dataset [22]. A group of twenty-nine volunteers was asked to perform a specific set of activities while carrying a smart phone in their trousers' front leg pocket. They were instructed to jog, sit, walk, stand, and ascend and descend stairs for specific durations of time. Table 2 lists the dataset feature sets with their abbreviations.
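Before classification, raw signals such as the 50 Hz accelerometer streams above are typically segmented into fixed-length windows. The sketch below illustrates this common step in plain Python; the window length (128 samples, i.e. 2.56 s at 50 Hz) and 50% overlap are illustrative choices, not parameters taken from the datasets.

```python
# Segment a 1-D sensor signal sampled at 50 Hz into fixed-length windows
# with 50% overlap -- the usual first step before feature extraction.

def segment(signal, window=128, overlap=0.5):
    step = int(window * (1 - overlap))   # 64 samples between window starts
    return [signal[i:i + window]
            for i in range(0, len(signal) - window + 1, step)]

# 10 seconds of a 50 Hz signal -> 500 samples (values stand in for readings)
samples = list(range(500))
windows = segment(samples)
```

Each window is then reduced to a feature vector (mean, median, autoregressive coefficients, and so on) or fed directly to a deep model.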

Pre-processing
In all datasets the missing values were fixed by replacing each with the average value of the instances of the corresponding class label. After fixing the missing values, we took an equal number of instances for each class label to keep the class distribution balanced, and encoded the class labels accordingly. A huge number of continuous feature values can cause issues and slow down machine learning algorithms, so discretization was used to decrease the number of continuous feature values. We also used normalization to scale the predictor features.
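The two main pre-processing steps described above can be sketched in plain Python. This is an illustrative re-implementation, not the KNIME workflow actually used; the `(features, label)` row layout and the min-max form of normalization are assumptions.

```python
# Sketch of the pre-processing described above: fill missing feature values
# with the per-class mean, then min-max scale each feature into [0, 1].

def impute_class_mean(rows):
    # rows: list of ([f1, f2, ...], label); None marks a missing value
    sums, counts = {}, {}
    for feats, label in rows:
        for j, v in enumerate(feats):
            if v is not None:
                sums[(label, j)] = sums.get((label, j), 0.0) + v
                counts[(label, j)] = counts.get((label, j), 0) + 1
    return [([v if v is not None else sums[(label, j)] / counts[(label, j)]
              for j, v in enumerate(feats)], label)
            for feats, label in rows]

def min_max_normalize(rows):
    # assumes every feature takes at least two distinct values
    n = len(rows[0][0])
    lo = [min(f[j] for f, _ in rows) for j in range(n)]
    hi = [max(f[j] for f, _ in rows) for j in range(n)]
    return [([(f[j] - lo[j]) / (hi[j] - lo[j]) for j in range(n)], label)
            for f, label in rows]

data = [([1.0, 10.0], "walk"), ([3.0, None], "walk"), ([5.0, 30.0], "jog")]
clean = min_max_normalize(impute_class_mean(data))
```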

Training Process
For the training process, each dataset was divided into training and test sets in a 70:30 ratio. The models were trained separately on each dataset for every classifier.

Classical Machine Learning Classifiers
Naive Bayes, decision tree and random forest classifiers have been used as base learners in AdaBoost (AB) classification [23,24]. In the boosting setting, random forest provided better accuracy than the decision tree and naive Bayes classifiers. In our study, we trained our artificial neural network (ANN) [25,26] with different numbers of hidden layers and different numbers of nodes per hidden layer, starting from 5 hidden layers and going up to 50. For the five datasets we trained the ANN with different numbers of hidden layers to obtain better results, and we selected a different number of input neurons for each dataset.
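To make the boosting scheme concrete, here is a compact AdaBoost sketch in plain Python. It uses single-feature threshold "stumps" as the weak learner purely for brevity; the experiments above used naive Bayes, decision tree and random forest base classifiers in KNIME, so this is an illustration of the algorithm, not the authors' configuration.

```python
# AdaBoost for binary labels in {-1, +1}: each round fits a weak learner on
# reweighted data, then increases the weight of the examples it got wrong.
import math

def stump_fit(X, y, w):
    # Pick the (feature, threshold, polarity) minimizing weighted error.
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for pol in (1, -1):
                pred = [pol if x[j] >= t else -pol for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best

def adaboost_fit(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n                    # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, j, t, pol = stump_fit(X, y, w)
        err = max(err, 1e-10)            # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, t, pol))
        for i in range(n):               # up-weight misclassified examples
            p = pol if X[i][j] >= t else -pol
            w[i] *= math.exp(-alpha * y[i] * p)
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    score = sum(a * (pol if x[j] >= t else -pol) for a, j, t, pol in ensemble)
    return 1 if score >= 0 else -1

X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost_fit(X, y)
```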
Decision trees (DT) [27,28] have been trained with the Gini index and the gain ratio respectively. Minimum numbers of records per node of 3, 5 and 7 have been tried, and no pruning method was used during training. Better results were achieved with the Gini index. In our study, we also used mixed fuzzy rules [29,30]. The logistic regression model [33] has been trained with stochastic average gradient; the maximum number of epochs was set to 100 with an epsilon of 1.0 × 10−5. The learning-rate strategy was fixed, with a step size of 0.1. For regularization, a uniform distribution has been used as the prior probability, with 0.01 variance.
The naive Bayes (NB) [34,35] model was trained with a 0.5 default probability, and the maximum number of unique nominal values per attribute was fixed at 20. The random forest (RF) [36] model has been trained with different split criteria for all datasets: information gain, information gain ratio and the Gini index have been tried on AReM, HAPT, OLDPPL, PAMAP2 and WISDM. The best results were found with the Gini index. The tree depth was set to 10, the minimum node size to 1, and the number of models to 100.
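The Gini index that served as the best split criterion above can be written out directly. This is the standard definition, shown here on toy label lists to make the comparison of candidate splits concrete.

```python
# Gini impurity of a node, and the weighted impurity of a candidate split.
# A split is better when its weighted impurity is lower.

def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gini(left, right):
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

pure = ["walk"] * 4                      # one class -> impurity 0
mixed = ["walk", "walk", "jog", "jog"]   # 50/50 -> impurity 0.5
```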
For support vector machine (SVM) [37,38,39,40,41] model training, different kernels have been used for the different datasets. The overlap penalty was fixed at 1.0. For the polynomial kernel, the power parameter was set to 3.0, and the bias and gamma values were tuned to 1.0 and 0.5 respectively. The kappa and delta values were set to 0.1 and 0.5 respectively for the hyper tangent kernel. The RBF kernel was used with a sigma value of 0.1.
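The kernels named above can be written out with the stated parameter values. The exact parameterizations (in particular how sigma enters the RBF exponent) vary between toolkits, so the formulas below are one common convention, not necessarily the implementation used in the experiments.

```python
# The three SVM kernels discussed above, for two feature vectors a and b.
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def poly_kernel(a, b, power=3.0, bias=1.0, gamma=0.5):
    return (gamma * dot(a, b) + bias) ** power

def rbf_kernel(a, b, sigma=0.1):
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2 * sigma ** 2))

def hyper_tangent_kernel(a, b, kappa=0.1, delta=0.5):
    return math.tanh(kappa * dot(a, b) - delta)

a, b = [1.0, 0.0], [1.0, 0.0]            # identical vectors for illustration
```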

Deep Learning Classifiers
A convolutional neural network (CNN) builds on three significant ideas: parameter sharing, sparse interactions and equivariant representations [42]. After the convolution, there are generally pooling and fully-connected layers, which carry out the classification task. [43] put forward the structural model and [44] proposed recognition under image transformations. [45,46] proposed and used an algorithm based on the error gradient to train CNNs, and demonstrated their prominent performance compared with other approaches in several pattern recognition tasks.
Our proposed convolutional neural network (CNN) model for each dataset contains an input layer with a two-dimensional input size, where the input dimensions are based on the dataset's features and instances. A 1D convolutional layer has been added with 64 kernels of size 4 × 1, using the ReLU activation function. A max pooling layer follows with kernel size 2 × 1, and to reduce overfitting a dropout layer has been added with a 0.5 dropout ratio. Another 1D convolutional layer with ReLU activation and 64 kernels of size 4 × 1 follows, then another max pooling layer with kernel size 2 × 1 and another dropout layer with a 0.5 dropout ratio. A fully connected layer has been added with 54 units and ReLU activation. The output layer uses the softmax function and contains as many units as there are labels in the respective dataset. The learning rate has been set to 0.01 with the categorical cross-entropy loss function, and the number of epochs has been set to 100.
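The forward pass of the model's building blocks can be sketched in plain Python: a 1-D convolution with a size-4 kernel, ReLU activation, and 2-to-1 max pooling, mirroring the layer shapes above. This shows only what the operations compute on one channel; the kernel weights and input values are illustrative, not trained parameters.

```python
# Minimal forward pass through the CNN building blocks described above.

def conv1d(signal, kernel):
    # Valid (no-padding) 1-D convolution: output length = len(signal) - k + 1
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    return [max(0.0, x) for x in xs]

def max_pool(xs, size=2):
    # Non-overlapping pooling: keep the max of every `size` values
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

signal = [0.0, 1.0, -1.0, 2.0, 0.5, -0.5, 1.5, 0.0]
kernel = [0.25, 0.25, 0.25, 0.25]        # one size-4 kernel, as in the model
out = max_pool(relu(conv1d(signal, kernel)))
```

In the actual model, 64 such kernels run in parallel per convolutional layer, and the pooled outputs feed the fully connected and softmax layers.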
Long short-term memory networks (LSTM) extend feedforward neural networks with the ability to carry information across time steps. A few works have used LSTM for HAR tasks [46,47,48,49,50], where learning speed and resource utilization are the key issues. [49] explored numerous model parameters first and then suggested a comparatively good model which can achieve HAR with high throughput. [50] suggested a binarized-BLSTM-RNN model, in which the input, the weight parameters and the output of all hidden layers are all binary values.
Our long short-term memory network contains an input layer sized according to the dataset's features and instances. A recurrent layer has been added with 100 units and the return-sequences parameter set to true. To reduce overfitting, a dropout layer has been added with a 0.3 dropout ratio. Another recurrent layer follows with 50 units and return sequences set to true, and another dropout layer with a 0.2 dropout ratio. A fully connected layer has been added with 25 units and ReLU activation. The output layer uses the softmax function and contains as many units as there are labels in the respective dataset. The learning rate has been set to 0.01, and the 'adam' optimizer has been used with the categorical cross-entropy loss function. Training ran for 100 epochs with a batch size of 32.
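What each recurrent unit in those layers computes can be illustrated with a single LSTM cell step: gated updates that let the cell state persist information across time steps, which is what makes LSTM suited to sequences of sub-activities. The scalar weights below (one unit, one input feature) are illustrative, not trained values.

```python
# One step of a single-unit LSTM cell, following the standard formulation.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    # W holds (input weight, recurrent weight, bias) per gate: i, f, o, g
    i = sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])   # input gate
    f = sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])   # forget gate
    o = sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])   # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2]) # candidate
    c = f * c_prev + i * g          # cell state carries long-term memory
    h = o * math.tanh(c)            # hidden state is this step's output
    return h, c

W = {k: (0.5, 0.5, 0.0) for k in ("i", "f", "o", "g")}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:          # run the cell over a short sequence
    h, c = lstm_step(x, h, c, W)
```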

Result and Discussion
This section presents the results acquired from the classifiers employed over the five different datasets. Activities can be categorized into two groups, static and dynamic: for instance, standing still is a static activity, while walking and running are dynamic activities. Figure 2 shows the overall accuracy of all the classifiers used in this study, as a graphical representation of the accuracy of each classifier on each dataset. From Figure 2 it can be clearly seen that the deep learning techniques, CNN and LSTM, provide better results than the classical machine learning algorithms. On closer analysis, CNN and LSTM model sequential data, in which the observations depend on a specific order. In human activity recognition, a set of sub-activities performed in a specific sequence constitutes a human activity, i.e. a sequence of small sub-actions comprises a complete activity; the good performance of CNN and LSTM is therefore to be expected, and on all five datasets these deep learning techniques are much better. CNN provides accuracy rates of up to 80%, 80%, 99%, 99% and 90% for the AReM, HAPT, OLDPPL, PAMAP2 and WISDM datasets respectively. Random forest and support vector machine provide better results than the other classical machine learning techniques: random forest achieves 80%, 79%, 99%, 99% and 90% accuracy, and SVM achieves 80%, 70%, 91%, 95% and 78% accuracy, on the same datasets respectively. Overall, the random forest, CNN and LSTM classifiers provide accuracy rates of at least 80% on all datasets; Table 3 reports these results. Figure 4(b) shows that LSTM, CNN, random forest, support vector machine and K-nearest neighbour performed very well compared with the other algorithms. Overall, the precision of CNN, LSTM and random forest is the benchmark for all five datasets. The K-nearest neighbour and logistic regression classifiers provide good sensitivity and specificity rates after the CNN, LSTM, random forest and SVM classifiers.
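For reference, the evaluation measures discussed in this section (accuracy, precision, sensitivity and specificity) follow directly from a binary confusion matrix; the counts below are illustrative, not taken from the experiments.

```python
# Accuracy, precision, sensitivity (recall) and specificity from the four
# cells of a binary confusion matrix.

def metrics(tp, fp, fn, tn):
    return {
        "accuracy":    (tp + tn) / (tp + fp + fn + tn),
        "precision":   tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # a.k.a. recall / true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

m = metrics(tp=90, fp=10, fn=5, tn=95)
```

For the multi-class activity labels used here, these measures are computed per class (one-vs-rest) and then averaged.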

Conclusion
In today's world, sensor- and smart-phone-based human activity recognition is proving vital in assisting human beings in different areas of life. In this paper, we applied multiclass classification techniques to five benchmark datasets, using the GUI-based analytics platform KNIME and MATLAB. We used AdaBoost, artificial neural network, decision tree, K-NN, logistic regression, naive Bayes, random forest, support vector machine, convolutional neural network and long short-term memory network classifiers. The methods applied in this study produced some outstanding results: the convolutional neural network and long short-term memory network classifiers produced over 90% overall accuracy, while random forest and support vector machine produced up to 80% accuracy. The primary reason for CNN and LSTM performing better is their ability to work on sequential data. Another finding of this paper concerns the patterns of activity recognition: for example, random forest and KNN are best for static activities, while the convolutional neural network and long short-term memory networks are best for almost every activity, especially cycling and jumping forward and backward. Random forest is the other classifier that produced notably good results. We conclude that the deep learning models (CNN, LSTM), together with random forest from classical machine learning, are the most stable and best-performing classification algorithms for human activity recognition.

Future work
Despite this comprehensive comparative study, we have not tested the human activity recognition datasets against some other deep learning models, such as deep belief networks. In addition, feature transformation techniques such as restricted Boltzmann machines and auto-encoders can be utilized to further optimize the classification and improve the accuracy and other metrics. In future work, we also plan to build a real-time HAR system, anticipated to be a cloud-based solution that communicates with different IoT devices. In the proposed system, we will use the deep learning techniques CNN and LSTM for human activity recognition.