Distance Based Method for Outlier Detection of Body Sensor Networks

We propose a distance based method for outlier detection in body sensor networks. First, we use Kernel Density Estimation (KDE) to calculate the probability of the distance from a diagnosed reading to its k nearest neighbors. If this probability is less than a threshold and the distances from the reading to its left and right neighbors are greater than a pre-defined value, the reading is classified as an outlier. Further, we formalize a sliding window based method to improve outlier detection performance. Finally, to estimate the KDE from training sensor readings that contain errors, we introduce a Hidden Markov Model (HMM) based method that estimates the most probable ground truth values, i.e., those with the maximum probability of producing the training data. Simulation results show that the proposed method achieves good detection accuracy with a low false alarm rate.

Received on 19 September 2015; accepted on 24 November 2015; published on 19 January 2016


Introduction
The improvement of living standards, unhealthy diets, excess energy intake, environmental pollution and other factors are accelerating the development of chronic diseases. This leads to a shortage of qualified healthcare professionals and equipment to treat sick and needy people. The wireless body sensor network (BSN) is one solution to this problem. BSNs use wireless devices attached to or implanted in the body to collect vital signs such as heart rate (HR), oxygen saturation (SpO2) and blood pressure (BP), and transmit the collected data to a central device for processing. This allows real-time monitoring, early detection of clinical deterioration, and greater freedom and mobility for patients while maintaining the quality of medical care [9].
Wireless devices are resource-constrained. In addition, they are frequently susceptible to environmental effects and vulnerable to malicious attacks, both of which make sensor data unreliable. Medical applications, however, have strict reliability requirements to avoid false alarms, so outlier detection is essential to ensure the reliability and accuracy of sensor data before any decision is made [9].
Outlier detection in wireless sensor networks has been studied for many years [1][2][3][4]. These works estimate sensor readings, or the probabilities of sensor readings, from the spatial correlation among measurements at different sensors. They assume that a large number of sensors observe the same event, so that one sensor's value can be deduced from the values of the others. Such methods are not suitable for body sensor networks, because it is difficult to place many sensors on the body, and different sensors usually collect different vital signs.
Kim et al. [5] proposed an approach for motion outlier detection in body sensor networks. They used historical data to train a Gaussian Mixture Model that groups data into clusters of similar motions, and used these clusters to estimate a Gaussian distribution from which the fault probability of a new node reading is computed. Such methods can detect faults that deviate greatly from normal data, but they may perform poorly on faults whose readings lie within the normal range and therefore do not receive a low probability.
Chen et al. [6] diagnosed abnormal data in multivariate time series in two steps. First, they assume that the variables are related through an expression. Then a data instance is diagnosed as faulty if its deviation from the value estimated by this expression is larger than a threshold.
Similarly, Salem et al. [8] used linear regression to estimate the reading of one sensor, e.g., HR, from the values of its neighbouring sensors. However, such a relationship between variables may not exist, and these methods cannot decide which data is faulty when a fault occurs.
Salem et al. [9] used a kernel function to estimate the distribution of the distance between a sensor reading and the mean of the training data. A data instance is diagnosed as faulty if the probability of its distance is very low. Rajasegarar et al. [13], [12] used a naive Bayes based method for outlier detection, which simply counts the frequency of each attribute to estimate probabilities. However, the naive Bayes model rests on the assumption that the attributes of the data are independent. This is not always true and may affect the classification results.
In this paper, we propose a distance based outlier detection method for BSNs. First, we calculate the average distance from each training data instance to its k nearest neighbors and use these distances to estimate a KDE. For any diagnosed data instance, we use the KDE to calculate the probability of its distance to its k nearest neighbors; if this probability is less than a threshold, the instance may be an outlier. We then check the distances from the diagnosed instance to its left and right neighbors; if both distances are large, the instance is classified as an outlier.
When the sensor readings contain consecutive outliers, this method may perform poorly. We introduce a sliding window to address this issue. Similarly, we use the estimated KDE to calculate the probability of the distance from a diagnosed sliding window to its k nearest neighbors; if this probability is less than a threshold, we decide that the window contains outliers. We then check the distances to the left and right neighbors to locate them.
Estimating the KDE from training data is the key issue of the proposed outlier detection method. However, historical training data that contain errors can distort the estimated KDE. We therefore use a Hidden Markov Model (HMM) to estimate the most probable ground truth values, i.e., those with the maximum probability of producing the training sensor readings.
The rest of this paper is organized as follows. Section II introduces the system models and some definitions. Section III formalizes the distance based outlier detection method. Section IV presents experiments that test the performance of the proposed method. Section V concludes the paper.

System Models and Definitions
Fig. 1 shows the network architecture of the medical deployment scenario we consider. We use three sensors to monitor heart activity, blood pressure, respiration rate and arterial oxygen saturation. These sensors monitor vital signs and periodically transmit the collected data, at every discrete time instance, to a nearby personal server device such as a smartphone. The data are then streamed over wireless and wired connections to a medical doctor's site for real-time diagnosis, to a medical database for record keeping, or to equipment that issues emergency alerts.
Time series of vital sign data: a sequence of vital sign data arranged in time order, $X = (X_a, X_{a+1}, \dots, X_c)$, where $X_i = (x_{ih}, x_{ib}, x_{is})$ is the set of sensor readings of HR, BP and SpO2 at time $i$. The main purpose of time series analysis is to diagnose the current sensor readings based on existing historical data. At any time $t$, suppose the ground truth values of HR, BP and SpO2 are $G_t = (g_{th}, g_{tb}, g_{ts})$ and the measured values transmitted by the sensors are $X_t = (x_{th}, x_{tb}, x_{ts})$. The outlier detection process decides whether $X_t$ agrees with $G_t$. The difficulty is that the detection process has no way of knowing the ground truth values $G_t$.

Kernel Density Estimation
Kernel density estimation is a non-parametric way to estimate the probability density function of a random variable. Let $(y_1, y_2, \dots, y_n)$ be an independent and identically distributed sample drawn from some distribution with an unknown density function $f$. The shape of $f$ can be estimated as

$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - y_i) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - y_i}{h}\right),$$

where $K(\cdot)$ is a non-negative function, called the kernel, that integrates to one and has mean zero, and $h > 0$ is the bandwidth. $K_h$ is the scaled kernel with subscript $h$, given by

$$K_h(x) = \frac{1}{h} K\left(\frac{x}{h}\right).$$
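The estimator above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a Gaussian kernel and, as a default bandwidth, Silverman's rule of thumb $h = 1.06\,\sigma\, n^{-1/5}$; the paper does not state which kernel or bandwidth it uses, and the names `gaussian_kernel`, `kde` and `silverman_bandwidth` are hypothetical.

```python
import math
from typing import Callable, Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density: non-negative, integrates to one, mean zero."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def silverman_bandwidth(sample: Sequence[float]) -> float:
    """Rule-of-thumb bandwidth h = 1.06 * sigma * n^(-1/5) (an assumed choice)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(sample) / n
    sigma = math.sqrt(sum((y - mean) ** 2 for y in sample) / (n - 1))
    return 1.06 * sigma * n ** -0.2


def kde(sample: Sequence[float], h: float,
        kernel: Callable[[float], float] = gaussian_kernel) -> Callable[[float], float]:
    """Return the estimate f_h(x) = 1/(n*h) * sum_i K((x - y_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if not sample:
        raise ValueError("sample must be non-empty")
    data = list(sample)
    n = len(data)

    def f_hat(x: float) -> float:
        return sum(kernel((x - y) / h) for y in data) / (n * h)

    return f_hat


# Illustrative distances (not data from the paper).
distances = [1.0, 1.2, 0.9, 1.5, 1.1, 4.0]
f = kde(distances, silverman_bandwidth(distances))
print(f(1.1) > f(4.0))
```

The function returns a closure so that the training sample is fixed once and the density can then be evaluated at any new distance.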

Hidden Markov Model
A hidden Markov model is a 5-tuple $\lambda = (Q, V, \Pi, A, B)$, where $Q = \{q_1, \dots, q_n\}$ is a set of hidden states, with $s_t$ denoting the state at time $t$; $V = \{v_1, \dots, v_m\}$ is a set of observation symbols, with $o_t$ denoting the symbol at time $t$; $\Pi = \{\pi_1, \dots, \pi_n\}$ is a vector of initial probabilities with $\pi_i = P(s_1 = q_i)$; $A = (a_{ij})_{n \times n}$ is a matrix of transition probabilities with $a_{ij} = P(s_{t+1} = q_j \mid s_t = q_i)$, $1 \le i, j \le n$; and $B = (b_{ij})_{n \times m}$ is a matrix of observation probabilities with $b_{ij} = P(o_t = v_j \mid s_t = q_i)$. We also use $\pi_{s_1}$, $a_{s_t s_{t+1}}$ and $b_{s_t o_t}$ to denote $\pi_i$, $a_{ij}$ and $b_{ij}$, respectively.
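The 5-tuple maps directly onto a small data structure. The Python sketch below uses a hypothetical `HMM` class with states and symbols referred to by index; it checks that $\Pi$ and every row of $A$ and $B$ are probability distributions, and evaluates the joint probability $P(S, O \mid \lambda) = \pi_{s_1} b_{s_1 o_1} \prod_{t} a_{s_t s_{t+1}} b_{s_{t+1} o_{t+1}}$ using the shorthand just introduced. The two-state example values are illustrative, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class HMM:
    """lambda = (Q, V, Pi, A, B); states and symbols are referred to by index."""
    states: List[str]        # Q = {q_1, ..., q_n}
    symbols: List[str]       # V = {v_1, ..., v_m}
    pi: List[float]          # pi_i = P(s_1 = q_i)
    A: List[List[float]]     # a_ij = P(s_{t+1} = q_j | s_t = q_i), n x n
    B: List[List[float]]     # b_ij = P(o_t = v_j | s_t = q_i), n x m

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError unless Pi and every row of A and B sum to one."""
        n, m = len(self.states), len(self.symbols)
        if len(self.A) != n or len(self.B) != n:
            raise ValueError("A and B must have one row per state")
        rows = [(self.pi, n)] + [(r, n) for r in self.A] + [(r, m) for r in self.B]
        for row, width in rows:
            if len(row) != width or abs(sum(row) - 1.0) > tol or min(row) < 0:
                raise ValueError(f"invalid probability row: {row}")

    def joint_probability(self, s: Sequence[int], o: Sequence[int]) -> float:
        """P(S, O | lambda) = pi_{s1} b_{s1 o1} * prod_t a_{s_t s_{t+1}} b_{s_{t+1} o_{t+1}}."""
        if not s or len(s) != len(o):
            raise ValueError("sequences must be non-empty and of equal length")
        p = self.pi[s[0]] * self.B[s[0]][o[0]]
        for t in range(1, len(s)):
            p *= self.A[s[t - 1]][s[t]] * self.B[s[t]][o[t]]
        return p


# Two hypothetical ground-truth levels observed through a noisy sensor.
model = HMM(states=["low", "high"], symbols=["L", "H"],
            pi=[0.6, 0.4],
            A=[[0.7, 0.3], [0.4, 0.6]],
            B=[[0.9, 0.1], [0.2, 0.8]])
model.validate()
print(model.joint_probability([0, 0, 1], [0, 0, 1]))
```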

Simple Distance to Neighbors
We use the Euclidean distance to measure the distance between two multivariate readings. Let $X_i = (x_{ih}, x_{ib}, x_{is})$ and $X_j = (x_{jh}, x_{jb}, x_{js})$ be the sensor readings at times $i$ and $j$. The Euclidean distance between $X_i$ and $X_j$ is

$$d_{ij} = \sqrt{(x_{ih} - x_{jh})^2 + (x_{ib} - x_{jb})^2 + (x_{is} - x_{js})^2}.$$

The k nearest distance of $X_i$ is calculated as

$$d^k_i = \sum_{j=1}^{k} w_j\, d_{i,(j)},$$

where $d_{i,(j)}$ is the distance from $X_i$ to its $j$-th nearest neighbor and $w_j$ is a weight. The distance of $X_i$ to its left and right neighbors (the nearest distance for short) is

$$d^n_i = \min\{d_{i(i-1)},\, d_{i(i+1)}\},$$

so that $d^n_i$ is large only when both neighbor distances are large.

Given a time series of historical training sensor readings, we first calculate the k nearest distance of each reading $X_i$, then use a univariate KDE to estimate the probability distribution of these k nearest distances. For a newly produced sensor reading $X_t$, we compute its k nearest distance $d^k_t$ and the probability $p$ of $d^k_t$ under the KDE obtained from the historical training data. If $p$ is less than a threshold and the nearest distance $d^n_t$ is greater than a pre-defined value, $X_t$ is diagnosed as an outlier.
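The two distances and the combined decision rule can be sketched as follows. This is a minimal Python illustration under stated assumptions: readings are (HR, BP, SpO2) tuples; the k nearest neighbors of a reading are searched in a reference set of historical readings; the KDE "probability" of $d^k_t$ is taken to be the KDE density value at $d^k_t$, supplied as a callable; and the names `k_nearest_distance`, `nearest_distance` and `is_outlier` are hypothetical. The thresholds used later in the simulations ($\delta = 0.001$ and a nearest distance of 4) would be passed as `delta` and `max_nearest`.

```python
import math
from typing import Callable, Sequence

Reading = Sequence[float]  # (HR, BP, SpO2)


def euclidean(a: Reading, b: Reading) -> float:
    """d_ij between two multivariate readings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest_distance(x: Reading, reference: Sequence[Reading],
                       weights: Sequence[float]) -> float:
    """d^k = sum_j w_j * (distance to j-th nearest reference), k = len(weights)."""
    nearest = sorted(euclidean(x, r) for r in reference)[:len(weights)]
    if len(nearest) < len(weights):
        raise ValueError("reference set holds fewer than k readings")
    return sum(w * d for w, d in zip(weights, nearest))


def nearest_distance(series: Sequence[Reading], t: int) -> float:
    """d^n_t = min(d(X_t, X_{t-1}), d(X_t, X_{t+1})); needs both neighbours."""
    if not 0 < t < len(series) - 1:
        raise ValueError("X_t needs a left and a right neighbour")
    return min(euclidean(series[t], series[t - 1]),
               euclidean(series[t], series[t + 1]))


def is_outlier(series: Sequence[Reading], t: int, reference: Sequence[Reading],
               weights: Sequence[float], density: Callable[[float], float],
               delta: float, max_nearest: float) -> bool:
    """Combined rule: KDE value of d^k_t below delta AND d^n_t above max_nearest."""
    d_k = k_nearest_distance(series[t], reference, weights)
    return density(d_k) < delta and nearest_distance(series, t) > max_nearest


def density(d: float) -> float:
    """Hypothetical stand-in for a KDE fitted on training d^k values."""
    return math.exp(-d)


training = [(70, 120, 98), (71, 121, 97), (72, 119, 98), (70, 120, 97), (71, 120, 98)]
spike = [(71, 120, 98), (120, 60, 80), (70, 121, 97)]
print(is_outlier(spike, 1, training, [0.5, 0.5], density, 0.001, 4.0))
```

When computing $d^k_i$ for the training readings themselves, $X_i$ should be left out of its own reference set; otherwise its nearest distance is always zero.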
If an error occurs at time $t$, the k nearest distances of readings at nearby times, e.g., $t-1$ or $t+1$, may also have small probabilities, and a similar effect exists for the nearest distance. As a result, using either the k nearest distance or the nearest distance alone introduces new outliers (false alarms) at a high average rate. We therefore combine the two criteria for outlier detection in this paper.
Fig. 4 shows a simulation result of the proposed outlier detection method on 5000 HR readings from a real medical dataset. We use the first 4000 readings to estimate the KDE. Fig. 2 shows the k nearest distance of the 4000 training readings, with $k = 2$, and Fig. 3 depicts the kernel density estimate of these distances. We then inject 5% errors into the remaining 1000 readings. The original data, the data with injected errors, and the data diagnosed as outliers are all marked in Fig. 4.

Distance to Neighbors of Sliding Window
When the time series of sensor readings contains many consecutive outliers, the detection method in the previous subsection may perform poorly. The reason is that the k nearest neighbors of an outlier $X_t$ may themselves be among the consecutive outliers. The k nearest distance of $X_t$ then has a normal probability, and $X_t$ is diagnosed as normal. To address this problem, we extend the distance based outlier detection method to sliding windows.
A sliding window of width $m$ at time $t$ contains $X_t$ and its $m - 1$ nearest left neighbors:

$$B_t = (X_{t-m+1}, \dots, X_{t-1}, X_t).$$

We define the distance $D_{ij}$ between sliding windows $B_i$ and $B_j$ as

$$D_{ij} = \frac{1}{m}\sum_{l=0}^{m-1} d_{(i-l)(j-l)},$$

where $d_{(i-l)(j-l)}$ is the Euclidean distance between $X_{i-l}$ and $X_{j-l}$. The weighted average distance $D^k_t$ of sliding window $B_t$ to its $k$ nearest neighbors is

$$D^k_t = \sum_{l=1}^{k} w_l\, D_{tl}, \qquad (6)$$

where $D_{tl}$ is the distance from $B_t$ to its $l$-th nearest window and $w_l$ is a weight. Thus $D^k_t$ averages, over the readings $X_i$ in $B_t$, their weighted distances to the corresponding readings in the $k$ nearest windows.
Similarly, given a time series of historical training sensor readings, we first calculate the k nearest distance of each sliding window and use these distances to estimate a univariate KDE. For a newly produced window of sensor readings $B_t$, we calculate its k nearest distance $D^k_t$. If the probability of $D^k_t$ under the KDE is less than a pre-defined threshold, we decide that the window $B_t$ contains outliers. The next step is to locate the outlying sensor readings. The sliding window $B_t$ contains three time series: $B_{th}$ for HR, $B_{tb}$ for BP and $B_{ts}$ for SpO2. For each sensor reading $y_i$ in $B_{th}$, $B_{tb}$ or $B_{ts}$, if the nearest distance of $y_i$ is greater than a pre-defined value $M$, $y_i$ is diagnosed as an outlier.
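The window distances and the localization step can be sketched as below. This Python illustration rests on assumptions: $D_{ij}$ averages the aligned Euclidean distances over the $m$ positions, the univariate nearest distance of $y_i$ is the smaller of its gaps to the previous and next readings of the same attribute, readings at either end of the series use the single neighbor they have, and the KDE test on $D^k_t$ is the same as in the point-wise case and is left out. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Reading = Sequence[float]  # (HR, BP, SpO2)


def euclidean(a: Reading, b: Reading) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def windows(series: Sequence[Reading], m: int) -> List[List[Reading]]:
    """All windows B_t = (X_{t-m+1}, ..., X_t), for t = m-1, ..., len(series)-1."""
    return [list(series[t - m + 1:t + 1]) for t in range(m - 1, len(series))]


def window_distance(bi: Sequence[Reading], bj: Sequence[Reading]) -> float:
    """D_ij = (1/m) * sum_l d_{(i-l)(j-l)}: mean distance of aligned readings."""
    if len(bi) != len(bj):
        raise ValueError("windows must have the same width m")
    return sum(euclidean(x, y) for x, y in zip(bi, bj)) / len(bi)


def window_k_distance(bt: Sequence[Reading], reference: Sequence[Sequence[Reading]],
                      weights: Sequence[float]) -> float:
    """D^k_t = sum_l w_l * D_{t(l)} over the k = len(weights) nearest windows."""
    nearest = sorted(window_distance(bt, b) for b in reference)[:len(weights)]
    if len(nearest) < len(weights):
        raise ValueError("fewer than k reference windows")
    return sum(w * d for w, d in zip(weights, nearest))


def locate_outliers(series: Sequence[Reading], t: int, m: int,
                    max_nearest: float) -> List[Tuple[int, int]]:
    """(time, attribute) pairs in B_t whose univariate nearest distance exceeds M."""
    found = []
    for i in range(t - m + 1, t + 1):
        for a in range(len(series[i])):
            gaps = [abs(series[i][a] - series[j][a])
                    for j in (i - 1, i + 1) if 0 <= j < len(series)]
            if gaps and min(gaps) > max_nearest:
                found.append((i, a))
    return found


# Illustrative training data and a diagnosed series with a BP spike at index 3.
training = [(70 + i % 3, 120 + i % 2, 97 + i % 2) for i in range(30)]
reference = windows(training, 3)
diagnosed = [(71, 120, 98), (70, 121, 97), (72, 120, 98), (71, 160, 98), (70, 121, 97)]
normal_score = window_k_distance(diagnosed[0:3], reference, [0.5, 0.5])
spike_score = window_k_distance(diagnosed[1:4], reference, [0.5, 0.5])
print(normal_score, spike_score, locate_outliers(diagnosed, 3, 3, 4.0))
```

The window containing the spike receives a much larger $D^k_t$ than the normal window, and localization then reports the BP reading at time 3 (attribute index 1).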

Handling Error Training Data With HMMs
It is impossible to guarantee that the historical training data are all correct. Outliers in the training data can distort the estimated KDE and degrade the performance of the proposed outlier detection method. To address this issue, we use an HMM to estimate the most probable ground truth values, i.e., those with the maximum probability of producing the training sensor readings.
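Starting from random initial conditions, the Baum-Welch algorithm finds a local maximum $\zeta^* = \arg\max_{\zeta} P(O \mid \zeta)$ for the sequence $O$. However, our experiments show that the HMM performs poorly when its parameters are estimated by the Baum-Welch algorithm directly. We therefore modify the Baum-Welch algorithm as follows.

Forward procedure: let $\alpha_1(i) = \pi_i\, b_{i o_1}$ and

$$\alpha_{t+1}(i) = b_{i o_{t+1}} \sum_{j=1}^{n} \alpha_t(j)\, a_{ji}. \qquad (10)$$

Backward procedure: let $\beta_T(i) = 1$ and

$$\beta_t(i) = \sum_{j=1}^{n} a_{ij}\, b_{j o_{t+1}}\, \beta_{t+1}(j).$$

Update: we can now calculate the temporary variables

$$\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{n}\alpha_t(j)\,\beta_t(j)}, \qquad \xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_{j o_{t+1}}\, \beta_{t+1}(j)}{\sum_{k=1}^{n}\sum_{l=1}^{n}\alpha_t(k)\, a_{kl}\, b_{l o_{t+1}}\, \beta_{t+1}(l)}.$$

$\zeta$ can now be updated:

$$\pi_i^* = \gamma_1(i), \qquad a_{ij}^* = \frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)}, \qquad b_{ij}^* = \frac{\sum_{t=1}^{T} \mathbf{1}[o_t = v_j]\,\gamma_t(i)}{\sum_{t=1}^{T}\gamma_t(i)}.$$

Finally, the updated parameters are further adjusted by the modification step [equation missing in source], where $0 < \varepsilon \le 1$ is a weighting and $\Theta$ is a preselected threshold.

These steps are repeated iteratively until the desired level of convergence is reached.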
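The forward, backward and update steps above are the standard Baum-Welch recursions, and they can be sketched in plain Python. This sketch implements only those textbook steps: the authors' additional adjustment involving $\varepsilon$ and $\Theta$ is not reproduced, no numerical scaling is applied (so it is suitable only for short sequences), symbols are integer indices into $V$, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def forward(pi: Sequence[float], A: Matrix, B: Matrix, obs: Sequence[int]) -> Matrix:
    """alpha[t][i]; alpha_{t+1}(i) = b_{i o_{t+1}} * sum_j alpha_t(j) a_ji."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[i][o] * sum(prev[j] * A[j][i] for j in range(n))
                      for i in range(n)])
    return alpha


def backward(A: Matrix, B: Matrix, obs: Sequence[int]) -> Matrix:
    """beta[t][i]; beta_T(i) = 1, beta_t(i) = sum_j a_ij b_{j o_{t+1}} beta_{t+1}(j)."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[-1]
        beta.append([sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                     for i in range(n)])
    return beta[::-1]


def likelihood(pi: Sequence[float], A: Matrix, B: Matrix, obs: Sequence[int]) -> float:
    """P(O | zeta) = sum_i alpha_T(i)."""
    return sum(forward(pi, A, B, obs)[-1])


def baum_welch_step(pi: Sequence[float], A: Matrix, B: Matrix,
                    obs: Sequence[int]) -> Tuple[List[float], Matrix, Matrix]:
    """One textbook re-estimation of (Pi, A, B); requires len(obs) >= 2."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    p_obs = sum(alpha[-1])  # equals sum_j alpha_t(j) beta_t(j) for every t
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(n)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = list(gamma[0])
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(n)]
             for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T)) for k in range(m)]
             for i in range(n)]
    return new_pi, new_A, new_B


# Illustrative parameters and observation sequence (not from the paper).
pi, A, B = [0.5, 0.5], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0]
before = likelihood(pi, A, B, obs)
for _ in range(10):
    pi, A, B = baum_welch_step(pi, A, B, obs)
after = likelihood(pi, A, B, obs)
print(before, after)
```

Each unmodified Baum-Welch iteration cannot decrease $P(O \mid \zeta)$, which is why the loop above ends at a likelihood at least as high as the starting one; for long training sequences a scaled or log-space version would be needed.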

Simulation Results
To examine the performance of the proposed outlier detection method, we carry out experiments on medical datasets from the PhysioNet database [? ]. The dataset contains 7 attributes: BPmean, systolic BP, diastolic BP, HR, pulse, respiration rate and SpO2. We focus on three of them: BPmean, HR and SpO2. We use a sequence of 5000 readings, of which the first 4000 serve as training data and the last 1000, with injected faults, serve as diagnosed data.

Performance Without Sliding Window
Since no faults are injected into the 4000 training readings, we can estimate the KDE of the k nearest distance directly from them. Given a sensor error probability, we inject faults into the 1000 diagnosed readings, choosing both the position and the value of each injected error at random. The reported results are averaged over repeated randomized experiments.
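The fault injection and the reported rates can be sketched as follows. This Python illustration makes several assumptions that the text does not fix: each reading is independently replaced with probability $p$ by a value drawn uniformly from a given range; OD is the fraction of injected faults that are flagged; FA is the fraction of normal readings that are flagged; and EP counts the errors left after detection (missed faults plus false alarms) over all readings. The names `inject_faults` and `detection_rates` are hypothetical, and the oracle detector in the example is only a stand-in.

```python
import random
from typing import List, Sequence, Set, Tuple


def inject_faults(series: Sequence[float], p: float, low: float, high: float,
                  rng: random.Random) -> Tuple[List[float], Set[int]]:
    """With probability p per reading, replace it by a uniform value in [low, high]."""
    noisy, faulty = list(series), set()
    for i in range(len(noisy)):
        if rng.random() < p:
            noisy[i] = rng.uniform(low, high)
            faulty.add(i)
    return noisy, faulty


def detection_rates(n: int, faulty: Set[int],
                    flagged: Set[int]) -> Tuple[float, float, float]:
    """(OD, FA, EP) under assumed definitions:
    OD = detected faults / injected faults,
    FA = flagged normal readings / normal readings,
    EP = (missed faults + false alarms) / n, i.e. errors left after detection."""
    missed = len(faulty - flagged)
    false_alarms = len(flagged - faulty)
    od = len(faulty & flagged) / len(faulty) if faulty else 1.0
    fa = false_alarms / (n - len(faulty)) if n > len(faulty) else 0.0
    return od, fa, (missed + false_alarms) / n


rng = random.Random(42)
clean = [72.0 + (i % 5) for i in range(1000)]  # illustrative HR-like series
noisy, faulty = inject_faults(clean, 0.05, 30.0, 200.0, rng)
# Oracle stand-in for a detector: flag readings that moved by more than 4.
flagged = {i for i, (x, y) in enumerate(zip(clean, noisy)) if abs(x - y) > 4}
print(detection_rates(len(clean), faulty, flagged))
```

Passing a seeded `random.Random` keeps each randomized run reproducible, so averages over repeated runs can be regenerated exactly.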
Table I shows the performance of the k nearest distance based outlier detection method. In this simulation, if the probability of the k nearest distance under the estimated KDE is less than a threshold $\delta = 0.001$, the diagnosed data is classified as an outlier. Table I shows that this method has a good outlier detection rate but a high false alarm rate, so the error rate after executing the method is much greater than the original error probability.
Table II shows the performance of the nearest distance based outlier detection method, in which a diagnosed reading is classified as an outlier if its nearest distance is greater than 4. Table II shows that this method reduces the number of errors by approximately 50%.
Table III shows the performance of the combination of the k nearest distance and the nearest distance based methods. This table shows that the combination performs better than either the k nearest distance or the nearest distance based method alone.

Table IV reports, on the same dataset, the Mahalanobis distance (MD) and KDE based approach of [9]. In that approach, when new data arrive, the Mahalanobis distance between the current data and the mean of the training data is calculated, and a KDE is used to estimate the probability of this distance; if the probability is less than a threshold, the current data is diagnosed as an outlier. Table IV shows that the proposed method does not outperform the method in [9] in this setting. The reason is that outlier detection performance depends not only on the sensor error rate but also on the range in which the injected errors lie: the closer an outlier is to the ground truth value, the harder it is to detect. In the previous experiments, the range of injected errors was wider than the normal range of the vital signs. Figs. 5-7 compare the proposed method with the method in [9] when the range of injected errors is set equal to the normal value range. These figures show that the proposed method performs better when the faults fall within the range where most normal vital sign data occur.

With the sliding window (Figs. 8-10), the distance based method has a high outlier detection rate. Although its false alarm rate is slightly higher, its error rate after executing the outlier detection algorithm is clearly lower.
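The MD and KDE baseline of [9], as described above, can be sketched as follows. This is a minimal Python illustration under assumptions not stated in the text: the sample covariance of the training readings is used, the KDE uses a Gaussian kernel with a fixed bandwidth `h`, the KDE density value at the new reading's MD plays the role of the "probability", and the training statistics are recomputed on every call for brevity. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(data: Sequence[Vector]) -> List[float]:
    n = len(data)
    return [sum(x[c] for x in data) / n for c in range(len(data[0]))]


def covariance(data: Sequence[Vector], mu: Vector) -> List[List[float]]:
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(data), len(mu)
    return [[sum((x[r] - mu[r]) * (x[c] - mu[c]) for x in data) / (n - 1)
             for c in range(d)] for r in range(d)]


def solve(M: Sequence[Vector], b: Vector) -> List[float]:
    """Solve M y = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * q for a, q in zip(aug[r], aug[c])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def mahalanobis(x: Vector, mu: Vector, cov: Sequence[Vector]) -> float:
    """sqrt((x - mu)^T cov^{-1} (x - mu))."""
    diff = [a - b for a, b in zip(x, mu)]
    return math.sqrt(max(0.0, sum(d * y for d, y in zip(diff, solve(cov, diff)))))


def gaussian_kde_value(sample: Sequence[float], h: float, x: float) -> float:
    """Gaussian-kernel KDE of sample, evaluated at x."""
    return sum(math.exp(-0.5 * ((x - y) / h) ** 2) for y in sample) / (
        len(sample) * h * math.sqrt(2 * math.pi))


def md_kde_outlier(x: Vector, training: Sequence[Vector], h: float, delta: float) -> bool:
    """Flag x if the KDE of training MDs, evaluated at MD(x), is below delta."""
    mu = mean_vector(training)
    cov = covariance(training, mu)
    train_md = [mahalanobis(t, mu, cov) for t in training]
    return gaussian_kde_value(train_md, h, mahalanobis(x, mu, cov)) < delta


# Illustrative (HR, BP, SpO2) training readings, not data from the paper.
training = [(70, 120, 97), (72, 118, 98), (71, 121, 96), (69, 119, 98),
            (73, 122, 97), (70, 117, 99), (72, 120, 96), (71, 119, 98)]
print(md_kde_outlier((120, 60, 80), training, 0.5, 0.001))
```

Unlike the proposed method, this baseline measures each reading only against the global training mean, which is consistent with the observation above that faults lying inside the normal range are harder for it to detect.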

Conclusion
Outlier detection is very important for BSNs to avoid false medical diagnoses and false alarms. In this paper, we formalize a distance based method for outlier detection in BSNs. The method considers both the distance to the k nearest neighbors and the distances to the left and right neighbors. To handle conditions such as consecutive errors, we formalize a sliding window based method that improves outlier detection performance. To handle errors in the training data, we introduce a Hidden Markov Model based method to estimate the most probable ground truth values, i.e., those with the maximum probability of producing the training data. Simulation results show that the proposed method performs well.

Figure 2. k Nearest Distance of Training Data.

Figure 3. Probability Density Distribution of The k Nearest Distance.

Given a time series $\{y_1, \dots, y_n\}$ of training sensor readings and a sensor error probability $p$, we select the first half of the data, $O = \{y_1, \dots, y_T\}$ with $T = n/2$, to estimate the parameters of an HMM $\lambda$ with an improved Baum-Welch algorithm [? ]. For the HMM $\lambda$ and the remaining training readings $O' = \{y_{T+1}, \dots, y_n\}$, we use the Viterbi algorithm [? ] to find, among all possible ground truth sequences, the most likely ground truth vital sign values $G$ that produce $O'$. Given the sensor reading sequence $O$, the Baum-Welch algorithm finds a local maximum $\zeta^* = (\Pi^*, A^*, B^*) = \arg\max_{\zeta} P(O \mid \zeta)$.
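The Viterbi step can be sketched in Python as below. The sketch assumes that hidden states stand for discretised ground truth levels and observations for discretised sensor readings; the paper does not say how continuous readings are quantised, so the `levels` mapping is hypothetical, as are the function names. The parameter values are a common textbook two-state example, not values from the paper. Log probabilities are used to avoid underflow on long sequences.

```python
import math
from typing import List, Sequence


def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def viterbi(pi: Sequence[float], A: Sequence[Sequence[float]],
            B: Sequence[Sequence[float]], obs: Sequence[int]) -> List[int]:
    """Most likely hidden state sequence for obs, computed in log space."""
    n = len(pi)
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(n)]
    backpointers: List[List[int]] = []
    for o in obs[1:]:
        ptr, new_delta = [], []
        for j in range(n):
            # Best predecessor of state j at this step.
            best = max(range(n), key=lambda i: delta[i] + _log(A[i][j]))
            ptr.append(best)
            new_delta.append(delta[best] + _log(A[best][j]) + _log(B[j][o]))
        backpointers.append(ptr)
        delta = new_delta
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical quantisation: state k stands for ground-truth level levels[k].
levels = [70.0, 110.0]
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
states = viterbi(pi, A, B, [0, 1, 2])
ground_truth = [levels[s] for s in states]
print(states, ground_truth)
```

Mapping the decoded states back through `levels` yields the estimated ground truth sequence, which could then replace the raw training readings when the KDE is estimated.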

Figure 5. Comparison of Proposed Method and The Method in [9] On Probability of Errors Corrected.

Figure 6. Comparison of Proposed Method and The Method in [9] On Probability of Errors Introduced.

Figure 7. Comparison of Proposed Method and The Method in [9] On Probability of Errors after Executing the Detection Algorithm.

Figure 8. Comparison of With and Without Sliding Window On Probability of Errors Corrected.

Figs. 8-10 show the comparison of the distance based method with and without the sliding window.

Figure 9. Comparison of With and Without Sliding Window On Probability of Errors Introduced.

Figure 10. Comparison of With and Without Sliding Window On Probability of Errors after Executing the Detection Algorithm.

Table 1. k Nearest Distance Based Simulation Result (p indicates prior error probability, OD denotes outlier detection, FA denotes false alarm and EP denotes error probability).

Table 2. The Nearest Distance Based Simulation Result (p indicates prior error probability, OD denotes outlier detection, FA denotes false alarm and EP denotes error probability).

Table 3. The Combination Method Simulation Result (p indicates prior error probability, OD denotes outlier detection, FA denotes false alarm and EP denotes error probability).

Table 4. MD in [9] Based Simulation Result (p indicates prior error probability, OD denotes outlier detection, FA denotes false alarm and EP denotes error probability).

As a comparison, Table IV gives the performance of the Mahalanobis distance (MD) and KDE based approach in [9] on the same dataset.

EAI European Alliance for Innovation EAI Endorsed Transactions on Wireless Spectrum 12 2015 - 01 2016 | Volume 2 | Issue 7 | e4