Human Motion Enhancement via Joint Optimization of Kinematic and Anthropometric Constraints

INTRODUCTION: Recently, low-cost RGB-D depth sensors have emerged as a promising alternative for motion capture in clinical gait assessment. However, depth-sensor-based Mocap (D-Mocap) suffers from low accuracy and poor stability in 3D joint estimation due to noise, self-occlusion, interference, and other technical limitations, which prevents it from being widely used in health-related applications. OBJECTIVES: The primary objective of this study is to integrate non-linear Kalman filters (KFs) and evolutionary algorithms to enhance the quality of D-Mocap data by jointly considering kinematic and anthropometric constraints at the joint level and skeleton level, respectively. METHODS: We propose a hybrid approach, referred to as TKF-DE, that synergistically integrates the Tobit Kalman filter (TKF) and the Differential Evolution (DE) algorithm for human motion enhancement. Specifically, the joint-level TKF provides the predictive distribution of each joint, which is kinematically admissible in time and probabilistically amenable in space, for skeleton-level DE optimization in terms of all bone lengths. Two predictive distributions of the TKF, i.e., the Gaussian and the uniform, are tested and compared in terms of their effectiveness in generating the initial DE population. RESULTS: Two sets of motion capture data are used to validate the proposed TKF-DE methods, one simulated and one real-world. The first dataset is from the Carnegie Mellon University Database (CMU), which contains a wide variety of motions; it is simplified to a 21-joint skeleton and corrupted with additive white Gaussian noise (AWGN). The second dataset was collected at two labs at Oklahoma State University (OSU). The Orbbec depth sensor and the Nuitrack SDK were used for D-Mocap data acquisition, along with an optical Mocap system that was time-synchronized and skeleton-matched with the D-Mocap system as a reference for evaluation.
The results confirm that the proposed TKF-DE algorithms significantly outperform other nonlinear KFs, including the extended KF (EKF), the unscented KF (UKF), and the TKF alone, with improved accuracy and stability of the estimation of joint positions, bone lengths, and joint angles. It is also shown that the Gaussian-based predictive distribution is better than the uniform one, further validating the efficacy and synergy of the two key components in the TKF-DE algorithm. CONCLUSION: Our research synergistically integrates the TKF and DE algorithms in one framework referred to as the TKF-DE, that takes advantage of kinematic and anthropometric constraints for human motion enhancement. The experimental results on two D-Mocap datasets show that the proposed TKF-DE method significantly improves the quality of D-Mocap data in terms of joint positions, bone lengths, and joint angles. This study takes one step closer to bringing the D-Mocap technology to possible health-related applications. Received on 18 February 2021; accepted on 03 April 2021; published on 09 April 2021


Introduction
Over the last decade, low-cost RGB-D or depth sensors have become popular and prevalent in many computer vision applications. There are numerous works in the fields of surveillance and entertainment where depth sensors are used for human motion capture (Mocap). In medicine and health care in particular, depth sensors have been used for gait analysis as an assessment, diagnostic, or even predictive tool [1][2][3][4][5]. Traditionally, kinematic gait measurements are obtained by optical Mocap systems, which are considered the gold standard but are impractical in a clinical setting. There is a practical need for low-cost and user-friendly Mocap tools [6]. Recent progress has been made towards the development of markerless Mocap systems using an Inertial Measurement Unit (IMU), which integrates an accelerometer and a gyroscope to measure the specific force and angular velocity of the human body segments. IMU sensors have gained popularity in rehabilitation and other motion tracking applications [6,7]. On the other hand, as a more affordable and practical alternative, depth-based Mocap (D-Mocap) has some advantages for markerless motion capture in a normal setting [5,8]. However, D-Mocap suffers from low accuracy due to several technical limitations: the limited imaging range [9], self-occlusions during motion [10], and possible interference [11].
The intent of D-Mocap refinement is to improve the accuracy of the human motion data so that it can be used in a broad range of applications. This allows for the use of depth sensors, a much more convenient and affordable technology than optical Mocap. There are several computational approaches to the enhancement of D-Mocap. Kalman filters (KF) and particle filters (PF) are recursive algorithms that use a kinematic model of human joint motion to smooth data and remove irregularities [12]. However, human motion is a nonlinear system, especially in the case of D-Mocap, which suffers from distortion and outliers. Researchers have explored the use of the extended KF (EKF) [13] and the unscented KF (UKF) [14] to compensate for the nonlinear nature of D-Mocap data. However, these nonlinear methods are still limited in handling the non-Gaussian and biased noise inherent in D-Mocap data. The Tobit KF (TKF), which involves an observation model for censored data [15], was applied to D-Mocap with noticeable improvement in [16]. In addition, learning-based methods like evolutionary algorithms (EA) can be used to search for the candidate with the best fitness in an iterative process. Examples of learning methods in D-Mocap enhancement include Differential Evolution (DE) used in [18], Genetic Algorithms (GA) used in [17], and, among deep learning-based methods, the improvement of D-Mocap data through a learned motion manifold in [19]. There have been a few attempts to integrate kinematic and anthropometric constraints with the purpose of improving joint trajectories over time and the skeleton structure in space [17,18,20].
Our work aims to develop a new hybrid method, namely TKF-DE, that integrates TKF-based filtering and DE-based learning into one computational flow. Specifically, our approach provides a harmonious way to incorporate the kinematic and anthropometric constraints for D-Mocap enhancement. The TKF takes advantage of a kinematic model in the dynamic system and provides a probabilistic predictive distribution for population initialization in the DE. The DE is then applied to reduce the bone-length variation in space and make the bone lengths anthropometrically consistent. To demonstrate the validity of our approach, several nonlinear Kalman filtering methods (EKF, UKF, and TKF) are applied to the same D-Mocap data for quantitative evaluation and comparison. For a more comprehensive comparison, we also implement a method that combines the TKF and DE directly, which we refer to as TKF+DE and which is similar in spirit to the weighted average method used in [18]. To illustrate the robustness of our proposed method, two datasets were used in our experiments. The first is simulated D-Mocap data created from the Carnegie Mellon University Mocap data [21]; the second is real-world D-Mocap data collected by two laboratories at Oklahoma State University (OSU) side-by-side with an optical Mocap system that serves as the ground-truth reference.

Related Work
We briefly review the related work on improving D-Mocap data from three methodological perspectives, filter-based, learning-based and hybrid approaches, followed by a short summary about how the proposed approach is different from previous ones.
In filter-based approaches, researchers provide a B-spline wavelet-based filtering method in [22] to remove noise from motion data and improve the smoothness of the motion. The KF is also widely used in human motion capture and analysis for denoising [12]. To achieve higher accuracy in the D-Mocap data, in [23] five Kinect sensors are placed around the tracking target, covering a 180-degree range of its perimeter and forming an average angle of approximately 45 degrees between two adjacent Kinect sensors. In addition, non-linear KFs offer another way to address the low accuracy of D-Mocap. In [13], researchers use the sound localization capabilities of the Kinect sensor and the joints extracted from the SDK to obtain two sets of position data for the subject's head joint, and an EKF is applied to smooth and correct the noisy data. Furthermore, the application of the TKF to human motion enhancement is explored in [16], where the TKF is applied to reduce noise and recover the censored data caused by occlusions. Compared to traditional KFs, the TKF can achieve higher accuracy.

Le Zhou et al., EAI Endorsed Transactions on Bioengineering and Bioinformatics, Online First

Learning-based approaches have also shown promise in human motion enhancement of D-Mocap data. For instance, a deep recurrent neural network was used in [24], trained on D-Mocap data obtained from the Microsoft Kinect skeleton tracker. Different networks are used to refine the velocity and position separately, producing two sets of results, and three methods of integrating the results of the two networks are explored in [24] to improve the joint positions of D-Mocap data. An autoencoder-based deep learning framework was proposed in [19] and [25] for human motion enhancement, synthesis, and style transfer. The key component is the motion manifold represented in a latent space learned from the high-quality and diverse CMU Mocap dataset [21], which can be used to improve the quality of D-Mocap by projecting erroneous or erratic data onto the latent space. In [26], researchers used a perceptual-based 3D skeleton motion data refinement network (BRA-P) to refine the motion data, further suppressing the bone-length variation and smoothing the joint trajectories.
Hybrid approaches combine both learning-based and filtering-based ideas. In [24], researchers used the KF as one method to integrate the results of two separate neural networks. The recurrent neural networks (RNN) were trained separately on joint positions and joint velocities, and the results were used as the measurement model and process model for a Kalman filter. In [17], a multi-objective GA is embedded into a PF to support importance sampling, with the aim of improving the accuracy of both bone lengths and joint positions. A constraint KF method was developed in [18], which combines DE and the KF to reduce bone-length fluctuations and improve the smoothness of joint trajectories. In addition, in [20], traditional filtering is used to create a target for optimization over the latent space of a convolutional autoencoder. The authors use the KF and deep learning to recover corrupted Mocap or low-quality D-Mocap data.
Inspired by previous studies, our approach is unique in several ways. First, we combine the probabilistic prediction of the TKF and the iterative correction of DE in a synergistic way. Specifically, two different predictive distributions based on the TKF outcome are tested to show their effectiveness and robustness. Furthermore, various approaches are employed for comparison to show that our proposed method outperforms several filtering and hybrid methods mentioned in this paper by taking advantage of the kinematic and anthropometric constraints from the two techniques. Second, we explore a process for quantitatively evaluating the D-Mocap data with regard to joint trajectories and bone lengths. To do so, we collected optical Mocap and D-Mocap data simultaneously. With software tools implemented to overcome the mismatch between Mocap markers and D-Mocap joints, the quantitative comparison becomes more accurate and reliable. Third, we investigate several lower-body joint angles of both legs, including hip angles and knee angles, which are critical in gait assessment for rehabilitation [27,28]. We are able to reduce the joint angle error from 8°-10° to 4°-5°. Our long-term goal is to achieve accuracy comparable to that of the IMU technology, the gold standard in clinical applications, where the angle errors range from 1.4° to 4.3° in rehabilitation applications [6].

Kalman Filters for human kinematics
The Kalman filter (KF) is one of the most popular algorithms applied in human motion capture applications, such as human motion synthesis [29] and real-time human motion tracking. The system model involves a dynamical model and a Gaussian noise model for denoising, where the observation of the current state is corrected by the prediction from the previous state through a recursive formula [30]. In the KF, the state transition model F predicts the state at each time step k from the previous instant. The state transition is contaminated by the process noise $w_k$:

$$X_k = F X_{k-1} + w_k, \tag{1}$$

where k is the frame index, and $w_k$ is the process noise, which is assumed to follow a multivariate normal distribution with zero mean and covariance $Q_X$. Since the filter is applied to all joint positions separately, the joint index (j) is omitted in this section. The state of the Kalman filter applied to human kinematics at frame k is given by:

$$X_k = [x_k,\; y_k,\; z_k,\; v^{(x)}_k,\; v^{(y)}_k,\; v^{(z)}_k]^T, \tag{2}$$

where $x_k$, $y_k$, and $z_k$ represent the position of the joint in the Cartesian coordinate system, and $v^{(x)}_k$, $v^{(y)}_k$, and $v^{(z)}_k$ are the corresponding velocities at the same frame for each axis. From (1), the update of the state depends on the state-transition model F. In the human motion system, this evolution is represented by the physical equation of motion with a time step of $\Delta t = 1/f_s$, where $f_s$ is the frame rate of the depth sensor. We assume that the speed of a joint remains constant along the direction of the joint's movement, so the constant velocity model is used:

$$F = \begin{bmatrix} I_3 & \Delta t\, I_3 \\ 0_3 & I_3 \end{bmatrix}, \tag{3}$$

where $I_3$ and $0_3$ are the 3×3 identity and zero matrices. The measurement $Y_k$ in the KF model is the observed 3D joint position, which is generated from the state $X_k$ by:

$$Y_k = H X_k + u_k, \tag{5}$$

where $u_k$ is the observation noise with zero mean and covariance $Q_Y$, and H is the observation model, which is defined by:

$$H = \begin{bmatrix} I_3 & 0_3 \end{bmatrix}. \tag{6}$$

The statistical property of the KF is based on a linear system. However, most of the systems in human motion capture are nonlinear. Therefore, non-linear KFs such as the EKF and UKF are employed to achieve a higher accuracy of D-Mocap data.
The EKF uses a Taylor expansion to linearize a nonlinear system [23]. However, the EKF provides only a sub-optimal solution: if the linearization is inaccurate, its performance is undermined. The UKF estimates state values by using the unscented transformation (UT) [31,32]. Theoretically, the UKF performs better than the EKF, with accuracy reaching the second order or higher [33]. Gaussian-distributed samples are selected to estimate the true state in the UKF [34]. However, real-world D-Mocap data are more irregular, with biased or censored measurements and non-Gaussian noise. Censored measurements may be caused by occlusion or device limitations. These problems are difficult to solve with the EKF and UKF. To overcome them, we employ the TKF to build a novel observation model for human kinematics.
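As a rough illustration of the constant velocity model described above, the following sketch filters a single coordinate of a single joint. It assumes that the process and measurement noise are independent across axes, so that each axis can be filtered on its own with a two-element state [position, velocity]; the function name `kalman_cv_1d` and the noise variances `q` and `r` are hypothetical choices for this example, not values from the paper.

```python
def kalman_cv_1d(measurements, dt, q=1e-2, r=1e-1):
    """Constant-velocity Kalman filter for one coordinate of one joint.

    The state is [position, velocity] and only the position is observed.
    q and r are assumed process and measurement noise variances.
    """
    # Start from the first measurement with zero velocity and unit covariance.
    x, v = measurements[0], 0.0
    p00, p01, p11 = 1.0, 0.0, 1.0  # symmetric 2x2 covariance [[p00, p01], [p01, p11]]
    filtered = [x]
    for y in measurements[1:]:
        # Prediction with F = [[1, dt], [0, 1]] and process noise q on both states.
        x = x + dt * v
        p00, p01, p11 = (p00 + 2.0 * dt * p01 + dt * dt * p11 + q,
                         p01 + dt * p11,
                         p11 + q)
        # Update with H = [1, 0]: innovation, innovation variance, Kalman gain.
        innovation = y - x
        s = p00 + r
        k0, k1 = p00 / s, p01 / s
        x, v = x + k0 * innovation, v + k1 * innovation
        p00, p01, p11 = ((1.0 - k0) * p00,
                         (1.0 - k0) * p01,
                         p11 - k1 * p01)
        filtered.append(x)
    return filtered
```

Calling `kalman_cv_1d(x_track, dt=1/30)` on the x-coordinates of one joint recorded at an assumed 30 frames per second returns a list of the same length; the y and z coordinates would be filtered the same way.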

Tobit Kalman Filter (TKF)
The TKF is a novel non-linear KF that integrates the Tobit model for censored data estimation with the Kalman filter to estimate the actual state of a dynamic system in which parameters are partially known [15,35]. The Tobit model is a censored regression model that is widely used in economics because it can compensate for the impact of personal privacy or government secrecy policies on economic statistics [36,37]. The Tobit model employed in the TKF as applied to the D-Mocap data is depicted in Fig. 1.

Figure 1. Tobit model
Based on the left censoring and right censoring of the Tobit model [34], two thresholds are defined in the TKF: the lower threshold $T_L$ for left censoring and the upper threshold $T_H$ for right censoring. For left censoring, values no greater than the lower threshold are censored, so the censoring takes place from below [38]. Likewise, for right censoring, values no less than the upper threshold are censored, so the censoring takes place from above. In real-world D-Mocap data, joint positions might be missing due to device limitations or occlusion. Some of the true values are therefore hidden in latent variables, which matches the assumption of the TKF. The Tobit model helps to compensate for the impact of censored joint data caused by occlusion or device limitations, while the dynamic model provides a prediction for correcting noisy data. To apply the TKF to the D-Mocap system, the upper and lower thresholds are selected along each axis of the current state depending on the previous state value and the absolute values of the maximum velocities in a short sequence (e.g., 50 frames), denoted by $|v^{(x)}_{\max}|$, $|v^{(y)}_{\max}|$, and $|v^{(z)}_{\max}|$. The thresholds for the k th joint position are therefore based on $|v_{\max}|\Delta t$ and the (k − 1) th joint position, with the time window centered at the (k − 1) th step. For the x-axis, the thresholds are expressed by:

$$T^{(x)}_L = x_{k-1} - |v^{(x)}_{\max}|\,\Delta t, \tag{7}$$

$$T^{(x)}_H = x_{k-1} + |v^{(x)}_{\max}|\,\Delta t, \tag{8}$$

where $T^{(x)}_L$ and $T^{(x)}_H$ are the lower and upper thresholds for the x-axis at frame k, respectively. The same logic applies to the other two axes (y and z). Hence, the measurement model for the x-axis is represented by:

$$\rho^{(x)}_k = \begin{cases} T^{(x)}_L, & Y^{*(x)}_k \le T^{(x)}_L, \\ Y^{*(x)}_k, & T^{(x)}_L < Y^{*(x)}_k < T^{(x)}_H, \\ T^{(x)}_H, & Y^{*(x)}_k \ge T^{(x)}_H, \end{cases} \tag{9}$$

where $Y^{*(x)}_k$ is a latent variable for the x-axis [15]. $\rho^{(y)}_k$ and $\rho^{(z)}_k$ are defined in a similar way. Therefore, the 3D upper threshold vector is represented by:

$$T^H_k = [T^{(x)}_H,\; T^{(y)}_H,\; T^{(z)}_H]^T. \tag{10}$$

This logic is equally applicable to the lower threshold vector $T^L_k$ and the 3D latent vector $Y^*_k = [Y^{*(x)}_k,\; Y^{*(y)}_k,\; Y^{*(z)}_k]^T$. After being divided by the two thresholds, the measurement data fall into three parts: a portion censored at or above the upper threshold, a portion censored at or below the lower threshold, and an uncensored portion. The expectation of the measurements is used as the observation value, as expressed in (11), where ⊙ is the Hadamard product and $p^{(uc)}_k$ is the probability vector of uncensored measurement, which equals one minus the probability vectors of the three coordinates being censored from above and below. These probabilities are estimated based on the difference between the latent measured variables in (9) and the thresholds in (10), as detailed in [15]. The Tobit Kalman gain is defined by:

$$K_k = R^{XY}_k \left(R^{YY}_k\right)^{-1}, \tag{13}$$

where $R^{XY}_k$ is the cross-covariance between the observation and the state, and $R^{YY}_k$ is the variance of the observation. For the details about $R^{XY}_k$ and $R^{YY}_k$, refer to [15]. The complete TKF process applied to human kinematics is then represented by the recursive prediction and update equations, which yield the estimated state (16) and its covariance matrix (17). To make it easier to relate the results in context, we re-add the joint index to the final output of the TKF. Thus, the filtered result in the k th frame consists of the resultant joint positions $\chi_{j,k}$, $j = 1, \ldots, J$, where J is the number of joints. The TKF differs from traditional non-linear KFs like the EKF and UKF in that it not only smooths the joint trajectories, but also utilizes a regression model for estimating censored measurements, which makes state updating more reliable in the D-Mocap system. By selecting thresholds for censoring above and below, the measurements are divided into three parts.
This partition helps the TKF achieve an accurate estimate of the true state while the measurement is replaced by its expectation. Using the expected value compensates for censored or biased data and brings the estimate close to the latent variables, so that the TKF yields a better result in human motion capture. Although the joint positions are improved by the denoising of nonlinear KFs, smoothing each joint trajectory may cause changes in the skeleton structure. To constrain the bone lengths during motion, we employ a differential evolution algorithm.
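The threshold selection and the three-way split of the measurements can be sketched for a single axis as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the censoring probabilities are computed from a Gaussian predicted measurement with an assumed mean and standard deviation, which is the usual Tobit construction rather than a transcription of the equations in [15].

```python
import math


def censoring_thresholds(prev_position, window_velocities, dt):
    """Lower and upper thresholds for one axis.

    Built from the previous position and the largest absolute velocity
    observed in a short window (e.g., 50 frames).
    """
    v_max = max(abs(v) for v in window_velocities)
    return prev_position - v_max * dt, prev_position + v_max * dt


def censor(latent, t_low, t_high):
    """Tobit measurement: the latent value clamped to [t_low, t_high]."""
    return min(max(latent, t_low), t_high)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censoring_probabilities(mean, sigma, t_low, t_high):
    """Probabilities of censoring from below, no censoring, censoring from above.

    Assumes the latent measurement is Gaussian with the given mean and sigma.
    """
    p_low = normal_cdf((t_low - mean) / sigma)
    p_high = 1.0 - normal_cdf((t_high - mean) / sigma)
    p_uncensored = 1.0 - p_low - p_high
    return p_low, p_uncensored, p_high
```

With a previous position of 1.0, a maximum speed of 2.0 in the window, and a time step of 0.1, the thresholds sit 0.2 on either side of the previous position, and any latent value outside that band is reported at the nearest threshold.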

Differential Evolutionary (DE) Algorithm
Differential evolution (DE) is a population-based metaheuristic search algorithm that optimizes a problem by iteratively driving candidate solutions toward better fitness with respect to a given quality metric. It maintains a population of candidate solutions and creates new candidates by mutation and crossover operations between existing candidates, keeping the candidate with the best fitness. To reduce the bone length variation, a DE algorithm is employed to optimize the joint positions. For the m th bone, the bone length is defined by:

$$L_m = \left\lVert \chi_{m_i} - \chi_{m_j} \right\rVert_2,$$

where $m_i$ and $m_j$ are the indices of the two joints connected by the bone, $m_i, m_j \in [1, J]$, and $\chi_{m_i}$ and $\chi_{m_j}$ are their 3D positions.
Because DE is used to optimize the joint positions in each frame, the frame index (k) is omitted in this section. Joint positions are sampled in Cartesian coordinates (Fig. 2) to form a skeleton candidate, and multidimensional DE is used to keep the optimization efficient. The joints are thus concatenated into a skeleton candidate $C_n$, where n is the index in the initial population. The search space of each parameter in the population is defined by a lower limit and an upper limit. The value of a parameter in the initial population is usually selected uniformly at random in the interval between the two limits. For example, given that $C_{n,\min}$ and $C_{n,\max}$ represent the lower and upper limits of each parameter, the generation of other candidates in the initial population is represented by:

$$C_n = C_{n,\min} + \tau\,(C_{n,\max} - C_{n,\min}),$$

where τ is a number randomly selected from [0, 1]. Three parameters can be tuned in the mutation and crossover steps. First, the scaling factor F is used in the mutation step to generate a new skeleton candidate:

$$C^{(c)}_n = C^{(p)}_{n_1} + F\,\left(C^{(p)}_{n_2} - C^{(p)}_{n_3}\right),$$

where n, $n_1$, $n_2$, and $n_3$ are four positive integers randomly selected from numbers not larger than the population size, and the values of $n_1$, $n_2$, and $n_3$ are different. $C^{(p)}$ and $C^{(c)}$ are the skeleton candidates for parent and offspring, respectively. The scale factor F ∈ (0, 2] controls the generation of the new population in the mutation step. The other two parameters, the mutation rate and the crossover rate, prevent the algorithm from stalling in a local optimum. In order to constrain bone lengths in D-Mocap data, we minimize the deviation of each bone length $L_m$, measured from the m th bone, from the corresponding reference bone length $\bar{L}_m$ obtained from an initial application of the TKF, over all M bones, where M is the total number of bones. Normalization is applied to the objective function to improve convergence speed and accuracy, and the resulting value of the objective function is the fitness value FV. The fitness function ∆FV is then computed from $FV^{(p)}$ and $FV^{(c)}$, the fitness values of the parent and child generations, respectively [18]. The iteration of the DE stops once ∆FV reaches a value no larger than η, where η is a very small value.
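The mutation, crossover, and selection steps described above can be sketched as below. Several details are assumptions made only for this example: the fitness is taken to be the mean relative deviation from the reference bone lengths, crossover is applied joint by joint, the search-space limits are not enforced, and the loop runs for a fixed number of generations instead of using the ∆FV stopping rule. The names `fitness` and `differential_evolution` and the default values of `f` and `cr` are hypothetical.

```python
import math
import random


def bone_lengths(candidate, bones):
    """Euclidean length of each bone; candidate is a list of (x, y, z) joints."""
    return [math.dist(candidate[i], candidate[j]) for i, j in bones]


def fitness(candidate, bones, ref_lengths):
    """Mean relative deviation from the reference bone lengths (assumed form)."""
    lengths = bone_lengths(candidate, bones)
    return sum(abs(l - r) / r for l, r in zip(lengths, ref_lengths)) / len(bones)


def differential_evolution(population, bones, ref_lengths,
                           f=0.5, cr=0.9, generations=200, seed=0):
    """Return the best skeleton candidate and its fitness value."""
    rng = random.Random(seed)
    pop = [[tuple(joint) for joint in cand] for cand in population]
    scores = [fitness(cand, bones, ref_lengths) for cand in pop]
    for _ in range(generations):
        for n in range(len(pop)):
            # Pick three distinct candidates, all different from the parent n.
            others = [i for i in range(len(pop)) if i != n]
            n1, n2, n3 = rng.sample(others, 3)
            child = []
            for j, parent_joint in enumerate(pop[n]):
                if rng.random() < cr:
                    # Mutation: base joint plus scaled difference of two others.
                    child.append(tuple(a + f * (b - c) for a, b, c in
                                       zip(pop[n1][j], pop[n2][j], pop[n3][j])))
                else:
                    # Crossover: keep the parent's joint unchanged.
                    child.append(parent_joint)
            # Selection: the child replaces the parent only if it is no worse.
            score = fitness(child, bones, ref_lengths)
            if score <= scores[n]:
                pop[n], scores[n] = child, score
    best = min(range(len(pop)), key=scores.__getitem__)
    return pop[best], scores[best]
```

Because selection is greedy, the best fitness in the population never gets worse from one generation to the next, which is the property the bone-length refinement relies on. The population needs at least four candidates so that three distinct partners can be drawn for every parent.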

Integration of TKF and DE
To further improve the accuracy of D-Mocap data and reduce bone length variation, a hybrid method is proposed that integrates the TKF and DE. We employ two different methods to integrate the TKF and DE, which differ in initialization: TKF_U-DE and TKF_G-DE. In the following sections we refer to these two methods collectively as TKF-DE. In TKF-DE, the results of the TKF, including the output states and their covariance matrices, are used to generate the initial population in DE. Furthermore, the constraint of the search space in mutation and crossover also depends on the initial TKF results. The algorithm flow of TKF-DE is depicted in Fig. 3(b). For comparison, we also employ a method called TKF+DE that directly combines the TKF and DE outputs, similar to the one in [18]. The flow chart of TKF+DE is depicted in Fig. 3(a).

TKF+DE.
The TKF and DE are operated separately in this method, and the final output is a weighted sum of the results of the TKF and DE, as shown in Fig. 3(a). The upper and lower bounds of the search space are defined by the standard deviation (STD) of the joint trajectories and the velocity of each joint, as detailed in [18]. Candidates in the initial population are randomly generated from the original D-Mocap data, confined by these bounds [18], so that the initial population can effectively span all the parameters of the algorithm. After optimization by mutation and crossover, the candidate with the minimum bone length fluctuation is selected as $x^{(DE)}_k$ at the k th frame. The combination of the TKF and DE in TKF+DE is a weighted sum of $x^{(TKF)}_k$, the filtering result of the TKF at the k th frame, and $x^{(DE)}_k$, the result of DE, where ζ is the weight value, which depends on the speed of the joints. When the speed is small, the value of ζ is larger; otherwise, it is smaller [18].

TKF_G-DE.
In this method, the TKF and DE are integrated sequentially and probabilistically, as shown in Fig. 3(b). To improve the quality of initialization and to refine the search space, the initial population of DE is generated from the Gaussian-based probabilistic distribution of the TKF output, which is represented by the estimated joint position and its corresponding covariance matrix. This allows the initial population to take advantage of both the mean and the uncertainty of the TKF filtering result. The initial population is created joint-by-joint. For the j th joint, a set of random 3D vectors, $X_j$, is sampled from a 3D Gaussian distribution $N_3(\mu, \Sigma)$ as:

$$X_j \sim N_3(\chi_j, P_j),$$

where $P_j$ is the covariance matrix of the j th joint obtained from (17). Likewise, $\chi_j$ is the estimated joint position of the j th joint obtained from the TKF (16), where the frame index k is omitted for simplicity. $X_j$ has the same size as the initial population and is reshaped into row vectors that represent the population for a single joint. This process is repeated for each joint one-by-one, after which the initial populations of the whole skeleton are accumulated to be used in the DE (Fig. 4), where the front layer denotes the initial population generated by the TKF output for all joints in the first frame. Then mutation, crossover, and selection are iteratively operated frame-by-frame to drive the population toward better candidates in the search space.
For comparison, we also introduce an alternative TKF-DE integration method called TKF_U-DE that uses a uniform distribution to create the initial population for the 3D position of each joint. Sharing the same algorithm flow as TKF_G-DE (Fig. 3(b)), TKF_U-DE creates the initial population for each joint by drawing each coordinate (x, y, and z) uniformly within its interval using a number τ randomly selected from [0, 1]. In the interest of simplicity, the joint index (j) is left out in this section. Mutation, crossover, and selection are executed after initialization, and the process repeats until the candidate with the best fitness value is found.
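The two initialization strategies can be contrasted with a short sketch for a single joint. The Gaussian variant draws from a 3D normal distribution whose mean and covariance stand in for the TKF estimate and its position covariance. For the uniform variant, the interval of mean ± 3 standard deviations per axis is purely an assumption made for this example, since the exact limits are not restated here; all function names are hypothetical.

```python
import math
import random


def cholesky3(cov):
    """Lower-triangular Cholesky factor of a 3x3 symmetric positive-definite matrix."""
    low = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - partial)
            else:
                low[i][j] = (cov[i][j] - partial) / low[j][j]
    return low


def gaussian_population(mean, cov, size, rng):
    """Draw `size` 3D positions from a normal distribution N(mean, cov)."""
    low = cholesky3(cov)
    samples = []
    for _ in range(size):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        samples.append(tuple(mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                             for i in range(3)))
    return samples


def uniform_population(mean, cov, size, rng, width=3.0):
    """Draw `size` 3D positions uniformly in mean +/- width * sigma per axis.

    The interval is an assumed choice for illustration only.
    """
    sigma = [math.sqrt(cov[i][i]) for i in range(3)]
    lower = [mean[i] - width * sigma[i] for i in range(3)]
    upper = [mean[i] + width * sigma[i] for i in range(3)]
    samples = []
    for _ in range(size):
        # tau in [0, 1] picks a point between the lower and upper limits.
        samples.append(tuple(lower[i] + rng.random() * (upper[i] - lower[i])
                             for i in range(3)))
    return samples
```

Repeating either function for every joint and stacking the results candidate by candidate gives a full skeleton population that a DE routine can then refine.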

Experiment Results
In order to test our methods, we used two datasets. The first dataset is from the Carnegie Mellon University Database (CMU) [21], which contains a multitude of different motions such as jumping and shaking hands. The CMU data is simplified to a 21-joint skeleton and corrupted with additive white Gaussian noise (AWGN). The AWGN has zero mean and a standard deviation of 7 cm, and the corrupted data together with the original CMU data forms our set of simulated data. An example of motion in the CMU simulated dataset is presented in Fig. 5. The second dataset, collected at OSU, includes the motion shown in Fig. 6, an elongated stepping motion depicted in Fig. 7, and the motion of stepping in place shown in Fig. 8. In addition, an optical Mocap system was time-synchronized with the D-Mocap system as a reference for evaluation. An OptiTrack marker-based Mocap system is used in the data acquisition, and the marker set for the lower body is depicted in Fig. 9(a). The Mocap system and the D-Mocap system are executed simultaneously: Mocap markers are attached to specific positions on the lower half of the body, and the trajectories of these markers are recorded by the Mocap system. The recorded Mocap data is then imported into OpenSim, where the Mocap markers from a single frame are depicted as magenta dots attached to the lower body in Fig. 9(b). The reference joint data is generated by forming a rigid body from selected markers and calculating the rigid body's centroid. The generated reference joints in a single frame are depicted as red dots in Fig. 9(c). We see that the generated reference joints correspond more accurately with the skeleton model of the Nuitrack SDK (Fig. 9(d)). In order to validate the accuracy of the generated reference data, we calculated the bone lengths from the reference joints generated by our method and the bone lengths obtained from the markers defined in Fig. 9(a). Here, markers LASIS and RASIS are used as the left and right hips, markers LTT and RTT as the left and right knees, and markers LLM and RLM as the left and right ankles. The STD of the bone length obtained from the Mocap markers is 10.34 cm, while that of the reference data is 1.02 cm. The resultant bone lengths are depicted in Fig. 10, which shows the consistency and robustness of the reference bone lengths for future algorithm evaluation.
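The noise corruption used for the simulated data and the bone-length STD used as a consistency check can be sketched as follows. The data layout (a list of frames, each a list of (x, y, z) joints in centimeters) and the function names are assumptions made for this example.

```python
import math
import random
import statistics


def add_awgn(frames, sigma, seed=0):
    """Add zero-mean white Gaussian noise with standard deviation sigma to every coordinate."""
    rng = random.Random(seed)
    return [[tuple(c + rng.gauss(0.0, sigma) for c in joint) for joint in frame]
            for frame in frames]


def bone_length_std(frames, joint_a, joint_b):
    """Standard deviation over frames of the distance between two joints."""
    lengths = [math.dist(frame[joint_a], frame[joint_b]) for frame in frames]
    return statistics.pstdev(lengths)
```

For a rigid segment the clean STD is zero, so any nonzero value after calling `add_awgn(frames, 7.0)` comes from the injected noise alone; the same function applied to marker-based and centroid-based joints would give the kind of comparison reported for Fig. 10.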

Quantitative Evaluation
We present results comparing the performance of the different approaches applied to both simulated data and real-world D-Mocap data in this section. The accuracy of the joint positions for all the different methods is assessed using the root mean square error (RMSE) metric, which is represented as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{JK}\sum_{k=1}^{K}\sum_{j=1}^{J}\left\lVert X_{j,k} - \hat{X}_{j,k}\right\rVert_2^2},$$

where $X_{j,k}$ and $\hat{X}_{j,k}$ represent the j th joint positions of the D-Mocap data and the reference data at the k th frame, J represents the number of joints, and K is re-defined as the number of frames. The overall performance on bone lengths and joint angles is evaluated by using the same metric. Figs [...] Fig. 15. The hip extension angle and hip flexion angle are calculated together for simplicity, where the result is referred to as LHF/RHF in Table 3 and Fig. 16. The hip abduction angle and [...] [6]), D-Mocap does offer advantages in low-cost and efficient motion capture. In addition, there is still much potential for further enhancement with advanced deep learning approaches [26].

Figure 9. The process of reference data generation of OSU data
Figure 10. Comparison between the bone length obtained from markers and generated reference
Figure 11. The RMSE of joint positions of CMU simulated data.
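The RMSE over all joints and frames can be computed with a few lines. The sketch below assumes that the estimated and reference data are given as lists of frames, each a list of (x, y, z) joint positions in the same order; the function name is hypothetical.

```python
import math


def joint_position_rmse(estimated, reference):
    """RMSE of 3D joint positions over all joints and all frames."""
    total, count = 0.0, 0
    for est_frame, ref_frame in zip(estimated, reference):
        for est_joint, ref_joint in zip(est_frame, ref_frame):
            # Squared Euclidean distance between estimate and reference.
            total += math.dist(est_joint, ref_joint) ** 2
            count += 1
    return math.sqrt(total / count)
```

The same routine can be reused for bone lengths or joint angles by replacing the squared distance with the squared difference of the scalar quantities.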

Evaluation of Nonlinear KFs
From all the results above, we confirm that the EKF and UKF significantly improve the accuracy of joint positions and joint angles for both simulated and real-world motion data. They also demonstrate moderate improvements in the accuracy of the bone length estimation. Moreover, we note that the TKF outperforms the EKF and UKF by a considerable margin in all three metrics, showing the usefulness and robustness of the Tobit model for the censored measurements in the D-Mocap data.

Between the two TKF-DE variants, TKF_G-DE is superior to TKF_U-DE with respect to the accuracy of joint positions and angles, with similar performance on bone length estimation. This is because the Gaussian predictive distribution from the TKF is more informative and useful than the uniform one for initializing the DE population. More importantly, as shown in Tables 1, 2, and 3, all three aspects of D-Mocap have been significantly improved by TKF_G-DE_TKF and TKF_G-DE_ref over the TKF alone. It is also understandable that a more accurate bone length constraint leads to better overall performance.

Conclusion
The relevance of our work lies in the fact that optical Mocap systems, although highly accurate, are expensive and inconvenient to use. If the accuracy of D-Mocap can be improved, depth sensor systems could provide an economical and convenient alternative to optical Mocap. We have introduced a novel hybrid approach to D-Mocap refinement that synergistically combines the TKF and DE algorithms. Through the use of the TKF, we capitalize on kinematic prediction and a model for censored data. TKF joint trajectories are used to initialize DE, which optimizes this data to maintain accurate and consistent anthropometric measurements. This combined approach proves harmonious and improves upon other traditional approaches. To validate our work, we utilized two sets of data: one set of real-world D-Mocap data, and one set of Mocap data corrupted with AWGN. Using the metrics of joint position, bone length, and joint angle, we find that our methods show more improvement than the TKF alone, as well as the EKF and UKF. Our future work will involve deep learning approaches to further enhance the accuracy and sensitivity of D-Mocap data for clinical and rehabilitation applications.