A Novel Random Split Point Procedure Using Extremely Randomized (Extra) Trees Ensemble Method for Human Activity Recognition

INTRODUCTION: Automatic detection and recognition of human physical movements during daily life activities such as walking, jogging, running, sitting and standing is usually referred to as Activity Recognition (AR). AR is a prominent research area with many applications, such as elderly care, security and surveillance, smart homes, and health and fitness. The Extremely Randomized Trees classifier (ET classifier) is an ensemble learning technique used in Activity Recognition that combines several different decision trees, built from a single learning set, into a forest and gives the classification result. However, it suffers from high variance and over-fitting because of the high inter-dependency among hyperparameters during model building.
OBJECTIVES: The primary objective of this paper is to propose a novel Random_Split_Point procedure for the Extra Trees classifier that makes the existing approach more robust, with lower variance, less computational time in obtaining optimal split points, and faster model building.
METHODS: To improve the randomization and accuracy of the AR system, a novel random split-point procedure for the ET classifier is proposed. This approach reduces the bias-variance problem induced by the three hyperparameters K, nmin and M used in the split-point procedure of the existing ET classifier (K: number of randomly selected attributes at each node; nmin: minimum sample size for splitting a node; M: number of decision trees in the ensemble). It generates K random split points from all the candidate features of the dataset and selects the best split point based on the maximum score obtained by the information gain measure.
RESULTS: The proposed approach is evaluated on two public AR datasets, HAR and HAPT (UCI Machine Learning Repository), containing 6 and 12 activities respectively. The HAR dataset contains smartphone sensor signals of 3 static and 3 dynamic daily human activities, whereas the HAPT dataset additionally contains data for 6 postural transitions. Experimental results and comparative analysis show that the proposed method outperforms other existing techniques, with an accuracy of 94.16% on the HAR dataset and 92.63% on the HAPT dataset. It also takes less computational time to find optimal split points and less model building time.
CONCLUSION: AR systems can be used as intelligent systems in healthcare to monitor the behaviour of healthy people by recognizing their daily activities. These systems also help in the early detection of some chronic diseases and improve quality of life. In this paper, an attempt is made to improve the accuracy of Activity Recognition over some existing methods.


Introduction
Human Activity Recognition (HAR) primarily focuses on surveying the characteristics of physical and psychological human behaviour. Individual activities are identified through various built-in sensors in wearable devices and smartphones. An activity sensor is a device used to identify and trace body movements. Human activity recognition uses several sensors, such as accelerometers, gyroscopes, and heart rate monitors, and employs machine learning procedures to translate low-level sensor data into rich contextual information about real-life situations. People continually interact with their low-cost, small-sized smartphones in their everyday activities, which has prompted a rise in research on extracting useful knowledge from data acquired by the ubiquitous sensors in smartphones [1,2]. HAR has numerous applications and benefits in everyday life, such as lifestyle improvement, health and fitness, smart homes, and elderly care. While HAR aims to incorporate motion and physical movement data, human behaviour analysis focuses on both the physical movements and the psychological states of the person.

Motivation and Contribution Highlights
Human activity recognition technology exploits distinctive multi-modal data produced by different devices to identify human posture, physical activity status, and behavioural actions. Interest in understanding human activities has grown in the healthcare sector, particularly in elderly care support, rehabilitation assistance, diabetes, and cognitive disorders. Strong evidence shows that regular monitoring and recognition of physical activities can help to manage and reduce the risk of numerous diseases, for example obesity, cardiovascular disease and diabetes. The accelerometer and the gyroscope are the most widely used smartphone sensors for observing human activity. The contributions of this paper are as follows:
• A robust Random_Split_Point procedure is proposed to reduce the effect of variance on the ensemble model built with the existing Extremely Randomized (Extra) Trees classifier.
• Experimental analysis is carried out on two standard, publicly available activity recognition datasets, HAR and HAPT, which contain six and twelve activities respectively.
• A comparative study is then performed against some significant existing techniques, based on the various statistical performance measures obtained.

Organization of the Paper
The remainder of the paper is structured as follows: Section 2 provides an extensive summary of the literature in this domain. The proposed method, along with its algorithmic steps, is given in Section 3. Experimental analysis, discussion of results and comparisons with other state-of-the-art approaches are presented in Section 4. Finally, Section 5 presents the conclusion of the paper.

Related Work
Several studies have been conducted over the past years to make the collected sensor data more accurate and precise. While accelerometers and gyroscopes are the most frequently used sensors for tracing human activities, smartphones have also become a prominent choice for improving the accuracy of human activity recognition. The work in [13] aimed to identify activities from simple inertial sensors using the decision tree ensemble algorithm XGBoost. This approach gives an accuracy of 94.6%, but suffers from over-fitting. Kwon et al. (2018) [14] proposed an activity recognition system that collects data from a smart watch and uses an ANN classifier for activity recognition. However, this approach also suffers from over-fitting. The work in [15] focused on the process employed to generate data samples for activity recognition, because many traditional approaches are susceptible to bias, leading to skewed results. The author observed that the accuracy on many datasets, such as MHealth, is low due to imbalance in the data and missing values. Ku Nurhanim et al. (2018) [16] used a semi-non-overlapping mechanism and 10-fold cross-validation for sample generation from smartphone sensor data for activity recognition. This approach used an ensemble method with classifiers such as Bagging, AdaBoost, Rotation Forest, Ensemble Nested Dichotomy and Random Subspace, and achieved an accuracy of 94.22%. Cho and Yoon (2018) [17] proposed a 1D CNN model that employs divide-and-conquer-based classifier learning coupled with test data sharpening. The authors experimented with two standard datasets, UCI HAR and Opportunity, and obtained an accuracy of 91.62%. Münzner et al. (2017) [18] used CNNs on the RBK and PAMAP2 datasets. Nurhanim et al. (2017) [19] studied the performance of different SVM classification kernels for classifying various daily activities. Test subjects performed various physical activities, such as sitting, climbing stairs, and lying down, which were tracked and measured using inertial sensor signals.
The collected data were processed using signal processing methods, and multiple time-domain and frequency-domain features were extracted. Luštrek et al. (2015) [20] made use of smartphones to aid in better tracking of the daily lifestyle activities of diabetes patients, which could be beneficial for physicians as well as for the patients themselves. Ole M. B. et al. (2017) [21] demonstrated that the state-of-the-art decision tree ensemble algorithm XGBoost gives an accuracy of 94.6%, validated on an independent test set. Kaur et al. (2016) [22] applied human activity prediction with the help of different machine learning models and data mining tools. Cross-validation was performed to check the consistency of the ensemble model, and an accuracy of over 85% was obtained. Padmaja et al. proposed a distributed and parallel decision forest approach for human activity recognition and also experimented on human stress behaviour using socio-mobile data [23,24,25]. Table 1 shows the state-of-the-art existing literature on activity recognition.

An accelerometer is used to determine the acceleration of the device. Values along the X, Y and Z axes are used to identify motions such as swinging, tilting and vibration. A gyroscope, on the other hand, uses angular velocity to calculate the rotation or twist of a smartphone. While an accelerometer detects directional movement, a gyroscope detects the lateral orientation of the device. The authors of the dataset captured the sensor signals at a constant rate of 50 Hz. The signals were then preprocessed for noise reduction with a median filter and a 3rd-order low-pass Butterworth filter with a 20 Hz cutoff frequency. A Butterworth filter was also employed to separate the acceleration signal into body acceleration and gravitational acceleration. The processed signals were sampled into fixed windows of length 2.56 seconds with 50% overlap.
Each window had 128 data points for each of the original signals recorded, which are body acceleration, body gyroscope and gravity acceleration along the X, Y and Z axes. Features were then engineered from the inertial signals, with several time-domain and frequency-domain features extracted from each window. Feature engineering yielded a feature vector of 561 attributes. The authors randomly split the dataset in a 70:30 ratio, which gave a distribution of 21 subjects for training and 9 subjects for testing. In the HAPT dataset, the data collection process is the same as for HAR, but the dataset contains 12 human activities: three static, three dynamic and six transitions between activities, such as sit_to_stand and stand_to_sit.
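The windowing step described above can be illustrated with a short sketch. It is not the dataset authors' code: the function name `sliding_windows` and its parameter names are hypothetical, and only the 50 Hz sampling rate, the 2.56-second window and the 50% overlap are taken from the text. Filtering is left out, so the sketch covers segmentation only.

```python
def sliding_windows(signal, sampling_rate_hz=50, window_seconds=2.56, overlap=0.5):
    """Split a 1-D signal into fixed-length windows with fractional overlap.

    Hypothetical helper: the defaults mirror the values quoted in the text.
    """
    # 50 Hz * 2.56 s = 128 samples per window.
    window_size = round(sampling_rate_hz * window_seconds)
    # 50% overlap means consecutive windows start 64 samples apart.
    step = round(window_size * (1.0 - overlap))
    if step <= 0:
        raise ValueError("overlap must be smaller than 1.0")

    windows = []
    # Only full windows are kept; a trailing partial window is discarded.
    for start in range(0, len(signal) - window_size + 1, step):
        windows.append(signal[start:start + window_size])
    return windows


if __name__ == "__main__":
    samples = list(range(320))          # 6.4 seconds of dummy data at 50 Hz
    segments = sliding_windows(samples)
    print(len(segments), len(segments[0]), segments[1][0])
```

With 320 input samples the windows start at samples 0, 64, 128 and 192, so four windows of 128 points each are produced.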

Proposed Work:
This section presents the proposed framework along with its algorithmic steps. The Extra Trees classifier builds multiple trees with bootstrap set to False, which means that it samples without replacement. At every node, the classifier then draws a split point at random for each of the K randomly chosen features, rather than searching for the optimal one, and keeps the best of these candidates. The existing Extra Trees classifier takes a subset S and selects an attribute a for the random split. The split value of a is drawn at random between the minimum and maximum values of a in S, [amin, amax]. The procedure followed in the existing Extra Trees classifier is given in Table 2. This algorithm draws samples without replacement, and the chance of getting the same split point is high because it selects a split point randomly from within the range between the minimum and maximum values.

Table 2. Split-point procedure of the existing Extra Trees classifier
- Return a split s* based on the maximum score, where $\mathrm{Score}(s^*, S) = \max_{i = 1, 2, \ldots, K} \mathrm{Score}(s_i, S)$.
Pick_a_Random_Split (S, a)
Input: Sample subset S and a feature/attribute a

In this work, a novel procedure for the random split point is proposed. The overall block diagram of this work is shown in Figure 1. The raw sensor data are first pre-processed using various noise filters, and then 561 handcrafted features are generated. The training and testing sets are then prepared by taking 70% and 30% of the dataset respectively. The training set data (M) are the input to the Build_Extra trees procedure, which uses a sub-procedure called Random_Split_Procedure and outputs location indexing. It then generates sub-models ti, i ∈ [1, S], and computes the local misclassification rate for each sub-model ti. In the ensemble procedure, the statistical mode of the set of predicted values is taken. Finally, the procedure returns T = {t1, …, tS}. The learned model is deployed to predict the labels of the test data. The performance of the framework is judged by measures such as precision, recall, F1-score and accuracy. Table 3 contains the procedure for building the tree ensemble with the ET classifier through Random_Split_Procedure. In the proposed approach, the accuracy and misclassification rate are computed for each sub-model, and the initial weights are updated.
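The split selection discussed in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: it draws one random cut point between the minimum and maximum of each of K randomly chosen non-constant features and keeps the candidate with the highest information gain. The function names (`entropy`, `information_gain`, `pick_random_split`) and the list-of-rows data layout are assumptions made for the example.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature, cut):
    """Entropy reduction obtained by splitting on rows[i][feature] < cut."""
    left = [y for x, y in zip(rows, labels) if x[feature] < cut]
    right = [y for x, y in zip(rows, labels) if x[feature] >= cut]
    if not left or not right:
        return 0.0  # a split that sends every sample one way separates nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def pick_random_split(rows, labels, k, rng):
    """Draw up to k random (feature, cut) candidates; return the best by gain.

    Hypothetical helper: returns (feature_index, cut_value, gain) or None.
    """
    n_features = len(rows[0])
    # Constant features cannot split the node, so they are not candidates.
    candidates = [f for f in range(n_features)
                  if min(r[f] for r in rows) < max(r[f] for r in rows)]
    if not candidates:
        return None

    best = None
    for feature in rng.sample(candidates, min(k, len(candidates))):
        values = [r[feature] for r in rows]
        cut = rng.uniform(min(values), max(values))  # uniform draw in [a_min, a_max]
        gain = information_gain(rows, labels, feature, cut)
        if best is None or gain > best[2]:
            best = (feature, cut, gain)
    return best


if __name__ == "__main__":
    rows = [[0.0, 5.0], [1.0, 5.0], [10.0, 5.0], [11.0, 5.0]]
    labels = ["sit", "sit", "walk", "walk"]
    print(pick_random_split(rows, labels, k=2, rng=random.Random(0)))
```

Because the cut points are random, the returned gain varies with the seed. In the toy data only feature 0 is non-constant, so it is always the feature selected.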
For each misclassification rate, a stage value is calculated and the weights of every misclassified instance are updated accordingly. Finally, M sub-models are generated and the output of each model is predicted.

Table 3. Procedure for building the tree ensemble with the ET classifier using Random_Split_Procedure
for i = 1 to M
- Create a tree ti = Build_ET (TS).
Output: A tree t.
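The statistical-mode step of the ensemble procedure can be illustrated with a small sketch. It assumes that each tree has already produced one predicted label per test sample; the function name `majority_vote` is hypothetical, and the weighting scheme mentioned above is not modelled.

```python
from collections import Counter


def majority_vote(per_tree_predictions):
    """Combine per-tree label lists into one label per sample by statistical mode.

    per_tree_predictions[t][i] is the label that tree t assigns to sample i.
    """
    ensemble = []
    # zip(*...) regroups the predictions so each tuple holds all votes for one sample.
    for votes in zip(*per_tree_predictions):
        # most_common(1) returns the most frequent label; when counts are tied,
        # the label encountered first among the votes is returned.
        ensemble.append(Counter(votes).most_common(1)[0][0])
    return ensemble


if __name__ == "__main__":
    predictions = [
        ["walk", "sit"],    # tree 1
        ["walk", "stand"],  # tree 2
        ["sit", "stand"],   # tree 3
    ]
    print(majority_vote(predictions))
```

For the three toy trees above, the ensemble output is `['walk', 'stand']`.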

Experimental Results
The experimental setup consists of the following system environment (hardware and software specifications): the OS is Ubuntu 16.04 LTS, 64-bit, and Python 3.7 is used for the implementation. The performance measures considered for activity recognition are precision, recall, F1-score and accuracy. Recall, the ability of a model to find all the relevant cases within the dataset, is the measure the AR system seeks to maximize. Precision is defined as the number of true positives divided by the number of true positives plus the number of false positives. While recall reflects the ability to find all the relevant instances in the dataset, precision is the proportion of data points the model labels as relevant that are actually relevant. Figures 2 and 3 show the confusion matrices of the proposed system for activity recognition on the HAR and HAPT datasets, where

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN},$$

$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}.$$

The proposed approach is compared with state-of-the-art classical machine learning algorithms popularly used for activity recognition. In the CART algorithm, trees are grown from the learning sample and pruned by estimating the errors using 10-fold cross-validation. In the KNN algorithm, K = 7 gave the best accuracy on the AR datasets. The Bayes algorithm proved unsuitable for these datasets because of the poor performance obtained. The RF algorithm performs significantly better, with good accuracy. Each time, it builds a tree using a bootstrap copy of the learning sample. At each test node, the optimal split is obtained by searching a random subset of K candidate attributes. This algorithm performs well in terms of degree of randomization if K is small compared to the number of attributes n; otherwise, the RF algorithm suffers from over-fitting.
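The measures defined above can be computed directly from a confusion matrix. The sketch below assumes a one-vs-rest reading for the multi-class case: TP, FP and FN are counted per class, and accuracy is taken as the fraction of correctly classified samples. The function name `per_class_metrics` and the toy matrix are illustrative and are not values from the paper.

```python
def per_class_metrics(confusion, labels):
    """Per-class precision, recall and F1, plus overall accuracy.

    confusion[i][j] counts samples of true class i that were predicted as class j.
    """
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))

    report = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fp = sum(confusion[r][i] for r in range(n)) - tp  # predicted i, true class differs
        fn = sum(confusion[i]) - tp                       # true i, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report, correct / total


if __name__ == "__main__":
    toy_confusion = [[8, 2],   # true "sitting"
                     [1, 9]]   # true "walking"
    report, accuracy = per_class_metrics(toy_confusion, ["sitting", "walking"])
    print(report["sitting"], accuracy)
```

For the toy matrix, the class "sitting" has precision 8/9, recall 0.8 and F1-score 16/19, and the overall accuracy is 0.85.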
The existing ET algorithm performs well in terms of randomization compared to RF, but is prone to the bias-variance problem. Tables 5 and 6 show the comparative analysis of the recognition accuracies and Cohen's Kappa scores of the proposed approach and the existing classifiers. The proposed approach predicts the class labels of each activity with a reasonable accuracy of 94.16% on the HAR dataset and 92.63% on the HAPT dataset. The results show that the proposed approach achieves better accuracy than the existing classifiers used for activity recognition.
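Cohen's Kappa, reported alongside accuracy in the comparison, corrects the observed agreement for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below uses the standard definition and is not code from the paper; the function name `cohen_kappa` and the toy confusion matrix are illustrative.

```python
def cohen_kappa(confusion):
    """Cohen's Kappa from a square confusion matrix.

    confusion[i][j] counts samples of true class i that were predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)

    # Observed agreement p_o: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total

    # Chance agreement p_e: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(confusion[r][i] for r in range(n))
        for i in range(n)
    ) / (total * total)

    if expected == 1.0:
        return float("nan")  # undefined when all samples fall in a single class
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    toy_confusion = [[8, 2],
                     [1, 9]]
    print(cohen_kappa(toy_confusion))
```

For the toy matrix, the observed agreement is 0.85 and the chance agreement is 0.5, giving a Kappa of about 0.7.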

Conclusion
Activity recognition systems play a vital role in many applications, such as virtual education, gaming, entertainment, sports injury detection, elderly care and rehabilitation, and smart home environment monitoring. AR systems can be used as intelligent systems in healthcare to monitor the behaviour of healthy people by recognizing their daily activities. Such systems also support long-term analysis for the early detection of some chronic diseases and help to improve quality of life. In this work, a random_split_point procedure is devised for human activity recognition and evaluated on two public datasets, HAR and HAPT, from the UCI Machine Learning Repository. The approach uses the existing Extremely Randomized (Extra) Trees ensemble method with a new procedure for random_split_point selection when building trees. Experimental results and comparative analysis show that the proposed method outperforms other existing techniques, with an accuracy of 94.16% on the HAR dataset and 92.63% on the HAPT dataset. The proposed approach takes less computational time to find optimal split points and less model building time. In this work, an attempt is made to improve activity recognition accuracy over some significant methods available in the literature.