Wavelet and kernel dimensional reduction on arrhythmia classification of ECG signals

Electrocardiogram (ECG) monitoring is continuously required to detect cardiac ailments. At times, interpreting the differences in the P-QRS-T curve is challenging. The proposed approach aims to demonstrate the kernel capabilities of Kernel Principal Component Analysis (KPCA) and Kernel Independent Component Analysis (KICA) in the wavelet domain. In this work, experiments are performed on five categories of cardiac beats. Supervised classifiers, namely the feed-forward neural network (FNN), backpropagation neural network (BPNN), and K-nearest neighbor (KNN), statistically evaluate the impact of the discrete wavelet transform combined with KPCA and KICA on the extracted beats. The performance evaluation also compares the outcomes with existing techniques. The obtained results justify the supremacy of the combined wavelet, kernel, and KNN approach, which yields a 99.7% classification success rate. A five-fold cross-validation scheme is used to measure the efficacy of the classifiers.


Introduction
The intervention of computer-aided techniques in health care has given significant diagnostic support to resolve medical issues. Cardiac arrhythmias and cancer are the two major concerns prevailing in developing countries. About 31% of deaths worldwide are due to heart diseases [1]. The constant rise in cardiovascular diseases (CVD) has made ECG analysis a paramount task nowadays. Unhealthy diet, pollutants, and a sedentary style of living are global causes of CVD in the young and old generations alike. The anticipation of arrhythmia can help to a great extent to take precautions and prevent suffering [2]. The dysrhythmia condition of the heart urgently requires measurement of the ECG morphology and its parameters.
Abnormality detection [19] in ECG in the wavelet domain yields promising results in both accuracy and computation time. Multi-resolution wavelet analysis has played a vital role in biomedical applications such as cardiology, fetal echocardiography, and neurology. The foremost step is to recover the original ECG signal from the human body. This step is attained through an ECG enhancement process, also known as pre-processing [3]. The kernel filtering concept has also shown well-suited results for noise removal while preserving the low-frequency information needed for ECG diagnosis [4]. Another effective technique is the Empirical Wavelet Transform (EWT) [5].
The QRS complex is the fundamental part of the ECG, as the maximum information, or energy, is confined in it, and its accurate segmentation and detection are a must [6], [7]. Using the PTB and MIT-BIH databases, the maximum-amplitude R-peak and other ECG features have been extracted [8]. An intelligent approach derives sample entropy from decomposed ECG modes and successfully detects irregularities with the help of a Support Vector Machine with an RBF kernel [9].
ECG wave feature extraction and arrhythmia classification have become a substantial interest of researchers, given their urgency, requirement, and future scope. An impressive survey features deep learning networks, such as convolutional, deep belief, recurrent [10], [11], and multi-perspective convolutional neural networks [12], for biological datasets. This is followed by the successful validation of a probabilistic NN with PCA and LDA to accomplish eight-beat classification with improved accuracy rates [13]. Recently, various heart conditions with different parameters have been classified using the kernel Extreme Learning Machine (KELM) [14].
DWT provides a high time-frequency resolution and also separates signal discontinuities. This attribute helps in the critical feature extraction of cardiac parameters. However, the selection of these parameters can lead to a loss of important information about the electrical activity of the human heart, so a maximum number of data-point samples is required. This extraction enlarges the feature dataset, and to reduce its dimensionality, algorithms like KPCA and KICA are needed, which suit different classifiers. The literature shows that KPCA uses nonlinear Gaussian processes, whereas KICA uses nonlinear non-Gaussian processes [15]. This fact is advantageous for KPCA over KICA, as the former can deal with uncertainty in the unknown process by averaging rather than minimization. Its Gaussian process can learn the kernel parameters automatically from the data without cross-validation, and this Gaussian nature helps incorporate learning with automatic feature selection.
In the proposed scheme, the ECG beat is enhanced by denoising using the multiresolution characteristics of the wavelet, and features are extracted through signal processing and the wavelet transform. KPCA and KICA then treat the extracted features for nonlinearity and dimensionality reduction. The effectiveness of the current method is assessed using neural networks (BPNN, FNN) and the KNN classifier, along with a comparison with recent existing techniques under a similar environment. In our past work [19], two-class classification, i.e., detection of normal and abnormal ECG, was successfully achieved using different orthogonal wavelet families and neural classifiers, with training and testing performed using random data partitioning. In the current method, multi-class classification is achieved by applying the kernel approaches of KPCA and KICA together with multiresolution analysis in the discrete wavelet domain. Training and testing employ five-fold cross-validation. The prime contributions of the proposed effort are summarized as:
• The potential of DWT multiresolution analysis is applied to obtain the maximum information-bearing data points from each individual cardiac beat. This step minimizes the scope for misclassification.
• The two kernel tricks, KPCA and KICA, are successfully implemented, showing robust feature extraction and dimensionality reduction for non-linear signals such as ECG. Both can capture higher-order statistics of the original data and can analyze non-linear relationships in the input feature space in a linear way, making computation easy; there is no need to compute the mapped feature vectors explicitly in the defined feature space.
• Experimental outcomes of supervised classifiers using a five-fold cross-validation data partitioning scheme have given the desired accuracy for the multi-class classification of ECG abnormalities.
The remainder of the paper is structured as follows: the materials and methods, including the proposed KPCA- and KICA-based ECG classification method using DWT, are given in Sect. 2; ECG arrhythmia detection results are illustrated, compared, and discussed in Sects. 3 and 4, respectively; and finally, Sect. 5 concludes the proposed research with future directions.

ECG Data Set Acquisitions
In the current approach, the ECG arrhythmia database from the MIT-BIH (Massachusetts Institute of Technology-Beth Israel Hospital) laboratories is utilized. The ECG data comprise either whole records or only the required beats, depending on the objective of the research. The database contains 48 ECG records, and each record has specified beats. The analog ECG signal is sampled at 360 Hz. For this study, 6000 ECG beats are used and classified into normal (1200) and four types of abnormal (4800) beats. Table 1 displays the material used in this work. Figure 2(b) displays the five types of cardiac beats extracted in the proposed study.
A brief characterization of the used ECG heartbeats is described below [17]. The proposed scheme for the five categories of ECG heartbeat analysis is presented in Fig. 1 and proceeds as follows:
1. Input the MIT-BIH ECG record in .dat format.
2. Normalize the signal to zero mean and unit standard deviation to reduce amplitude variation.
3. Filter noise from the normalized ECG signal using the multiresolution characteristics of DWT. Perform DWT decomposition up to nine levels using the Daubechies db6 wavelet family.
4. Execute the R-peak detection algorithm (explained in Section 2.3) on the decomposed and denoised signal.
5. Compose one cardiac cycle from 130 samples (data points) around the R-peak, i.e., create a window block using 70 samples to the left and 59 data points to the right of the R-peak. A window length of 130 retains the maximum information about the ECG beat.
6. Consider 6000 ECG beats from the arrhythmia dataset. Reduce the feature dataset (6000 x 130) using the dimension reduction techniques (KPCA and KICA) to obtain the twelve most significant features of each beat.
7. Perform supervised classification with the KNN, FNN, and BPNN classifiers using the compact ECG feature dataset (6000 x 12).
The detailed workflow of the respective sub-blocks is described in the subsequent sections.
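Steps 2 and 5 above can be sketched in a few lines. The following is a minimal, self-contained illustration of z-score normalization and fixed-width beat windowing; the window geometry (70 samples left, 59 right, 130 total) follows the text, while the function names and the toy signal are illustrative assumptions.

```python
def normalize(signal):
    """Step 2: zero-mean, unit-standard-deviation normalization."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((s - mean) ** 2 for s in signal) / n
    std = var ** 0.5 or 1.0
    return [(s - mean) / std for s in signal]

def extract_beats(signal, r_locs, left=70, right=59):
    """Step 5: cut one 130-sample window per R-peak; skip peaks
    too close to the record boundaries to form a full window."""
    beats = []
    for r in r_locs:
        if r - left >= 0 and r + right < len(signal):
            beats.append(signal[r - left : r + right + 1])
    return beats

# Toy record of 1000 samples with three assumed R-peak positions.
ecg = normalize([float((i * 7) % 13) for i in range(1000)])
beats = extract_beats(ecg, r_locs=[100, 400, 980])
```

The last R-peak is rejected because a full 130-sample window would run past the end of the record, mirroring the boundary handling any beat-segmentation step needs.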

ECG Data Processing and implemented Feature Extraction Techniques
Discrete Wavelet Transform (DWT) for ECG noise filtering and 'R'-peak detection
The discrete wavelet transform has become a robust technique for biomedical signals, which are non-linear and non-stationary. Discrete wavelet analysis has an edge over other methods through its potential for varying window length, the availability of a wide range of frequencies for broad and narrow spectra, and its compactness. Significant applications of DWT include noise removal, feature extraction, segmentation and pattern classification in data mining, biomedical signal processing, image enhancement, etc.
DWT works by progressively decomposing the signal into two components: the approximation (low-frequency) coefficients obtained through 'H(n)' and the detail (high-frequency) coefficients obtained through 'G(n)', as shown in Fig. 1(b). For ECG arrhythmia findings and analysis, the shape and morphological attributes of cardiac beats are the key factors; they play a major role in ECG data analysis. Therefore, the selection of the wavelet-transform basis function is a vital part of denoising and feature extraction. The Daubechies (db6) wavelet function has a shape similar to the ECG signal, most of whose information is contained at low frequencies [18], [19]. In the present study, the db6 wavelet function has been deployed for noise removal and feature extraction.
The discrete wavelet function is defined as

ψ_{u,v}(t) = a0^(-u/2) ψ(a0^(-u) t - v y0)        (1)

where 'u' represents the dilation factor and 'a0' is the fixed dilation parametric value, which is supposed to be greater than 1. Similarly, 'v' represents the location factor, and 'y0' is the position parameter, which is to be greater than 0. The DWT expression is then represented as

W(u, v) = Σ_n s(n) ψ_{u,v}(n)        (2)

whose outputs are the db6 wavelet coefficients, with A as the approximate and D as the detail component. The primary objective is to decompose an ECG signal and then reconstruct it, eliminating the unwanted frequency components that cause disturbances; this is done by down-sampling and up-sampling, respectively. 'J' represents the desired filter dimension (length). The low-pass filter H(n) and high-pass filter G(n) form a quadrature mirror filter pair. The subsampled DWT filtering can be evaluated by

A_{j+1}(n) = Σ_k H(k - 2n) A_j(k),   D_{j+1}(n) = Σ_k G(k - 2n) A_j(k)        (3)

where 's(n)' is the original ECG signal and A_0(k) = s(k); the output of each stage is decimated by two. The low-pass filtering with H(n) is applied recursively until the desired frequency level is reached, as shown in Fig. 1(b), giving the detail and approximate coefficients of the required WT [18]. For noise filtering of the ECG signal s(n), the Daubechies wavelet (db6) has been employed. The sampling frequency, as mentioned in the arrhythmia database, is 360 Hz. The contaminating baseline-wander noise is low frequency (0-0.5 Hz), and power-line interference (50 Hz or 60 Hz) is high frequency. Therefore, the wavelet decomposition depth depends on the frequency content to be eliminated from the original signal. In the present work, the input ECG signal is decomposed up to 9 levels, yielding nine detail and nine approximate coefficient sets of db6. While reconstructing the noise-free ECG signal, the 9th-level approximation A9 (0-0.351 Hz) and the first-level (90-180 Hz) and second-level (45-90 Hz) detail coefficients are excluded.
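The decompose / zero-unwanted-bands / reconstruct flow above can be sketched as follows. The paper uses the db6 wavelet at 9 levels; to keep the example self-contained, the orthogonal Haar filter pair is substituted (an assumption for brevity), but the band-elimination logic is the same.

```python
R2 = 2 ** 0.5

def haar_step(x):
    """One analysis step: approximation and detail, decimated by 2."""
    a = [(x[2*i] + x[2*i+1]) / R2 for i in range(len(x) // 2)]
    d = [(x[2*i] - x[2*i+1]) / R2 for i in range(len(x) // 2)]
    return a, d

def haar_inverse(a, d):
    """One synthesis step: rebuild the parent band from (a, d)."""
    x = []
    for ai, di in zip(a, d):
        x += [(ai + di) / R2, (ai - di) / R2]
    return x

def denoise(signal, levels, drop_details=(1, 2), drop_approx=True):
    """Decompose 'levels' times, zero the selected detail levels
    (high-frequency interference) and optionally the final
    approximation (baseline wander), then reconstruct."""
    details, a = [], list(signal)
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)
    if drop_approx:
        a = [0.0] * len(a)
    for lvl in range(levels, 0, -1):
        d = details[lvl - 1]
        if lvl in drop_details:
            d = [0.0] * len(d)
        a = haar_inverse(a, d)
    return a

clean = denoise([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], levels=3)
```

Dropping the deepest approximation together with detail levels 1 and 2 mirrors the paper's exclusion of A9, D1, and D2; if nothing is dropped, the orthogonal filter bank reconstructs the input exactly.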
The R-peak detection of the QRS complex is done by decomposing the signal with the Daubechies (db6) wavelet up to four levels. The second-level approximation coefficient (A2) is selected, and 0.60 of its maximum value is set as the threshold 'Th'. In the ECG sample, values that are at least ten samples apart (an experimental observation) and higher than 'Th' are taken under consideration; these selected values are the R-peaks in the A2 component. Next, the detected R positions are mapped to the original signal: a window of +/-20 samples is searched for R-peaks in the denoised and down-sampled ECG signal. The detected R-peaks are stored with their amplitudes in the Ramp array and their positions in the Rloc array. In a previous work [19], this algorithm was successfully implemented using the db4 wavelet. R-peak detection is followed by single-beat extraction for the five mentioned categories of cardiac beats. As the maximum energy of the ECG signal lies within the QRS complex, i.e., between 3 Hz and 40 Hz, the single beat is confined to 130 samples, including the R-peak, 30 data points before it, and 99 data points after it. The resulting ECG dataset comprises 6000 ECG beats (1200 of each category) with 130-dimensional features.

Kernel Principal Component Analysis (KPCA)
Vapnik's theory states that any classification done in a higher-dimensional space than the input space results in more exceptional performance [20]. Both PCA and KPCA work in higher dimensions. PCA performs a linear mapping to convert a large set of correlated data into a compact set of non-correlated data; the uncorrelated principal components help in pattern classification. However, a feature set like the ECG dataset is non-linear and unstructured, so it needs a non-linear transformation technique. Kernel PCA is PCA with a kernel trick for non-linear dimension reduction [21]. Figure 3 presents the mapping to the kernel domain.
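Returning to the beat-segmentation step, the R-peak thresholding rule described above (candidates exceeding 0.60 of the maximum of A2, at least ten samples apart) can be sketched as below. The toy waveform and names are illustrative, not the paper's code, and the greedy first-crossing rule stands in for the paper's full peak search with +/-20-sample refinement.

```python
def detect_r_peaks(a2, frac=0.60, min_gap=10):
    """Return indices of candidate R-peaks in the A2 component:
    samples above frac * max(a2), at least min_gap apart."""
    th = frac * max(a2)
    peaks = []
    for i, v in enumerate(a2):
        if v > th and (not peaks or i - peaks[-1] >= min_gap):
            peaks.append(i)
    return peaks

# Toy 'A2' trace with three spikes at indices 5, 50, and 95.
a2 = [0.1] * 100
for c in (5, 50, 95):
    a2[c] = 1.0
peaks = detect_r_peaks(a2)
```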
KPCA is an extension of PCA, choosing a specific kernel in place of inner products. This implicitly projects the data into a high-dimensional space where all related operations are done. The ECG dataset is transformed from an 'L'-dimensional space to an 'H'-dimensional feature space, where 'H' is much higher than 'L' ('H' >> 'L'). The steps of KPCA are described below:
Step 1. Map the data x_i to the feature space through Φ(x_i) and form the kernel matrix from the inner products, K_ij = k(x_i, x_j) = <Φ(x_i), Φ(x_j)>.
Step 2. Center the kernel matrix; combining the centering terms and using matrix form, we get K~ = K - 1_n K - K 1_n + 1_n K 1_n, where 1_n represents a matrix (n x n) having all entries equal to 1/n.
Step 3. Calculate the kernel principal components by solving the eigenvalue problem n λ α = K~ α; a sample is projected onto the k-th component using y_k(x) = Σ_i α_i^(k) k(x_i, x).
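The three KPCA steps above can be sketched with numpy as follows. Matrix names mirror the text (K, the 1/n matrix, the centered kernel); the Gaussian kernel width sigma and the toy data are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Step 1: Gaussian kernel matrix K_ij = exp(-||xi-xj||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def kpca(X, n_components=2, sigma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, sigma)
    one_n = np.full((n, n), 1.0 / n)                      # the 1_n matrix
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n    # Step 2: centering
    vals, vecs = np.linalg.eigh(Kc)                       # Step 3: eigenproblem
    order = np.argsort(vals)[::-1][:n_components]
    alphas = vecs[:, order] / np.sqrt(np.maximum(vals[order], 1e-12))
    return Kc @ alphas          # projections of the training samples

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Z = kpca(X, n_components=2)
```

Because the kernel matrix is double-centered, the projected components are zero-mean, which is the kernel-space analogue of mean-centering in ordinary PCA.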
The Gaussian kernel has been employed in the present paper, k(x, y) = exp(-||x - y||^2 / (2σ^2)), taking 'σ' as a parameter. KPCA is applied to each arrhythmia, and twelve kernel principal components are chosen, giving a feature dataset of 1200 x 12 dimensions per class. Non-linear kernel methods do not involve any nonlinear optimization technique, which helps to simplify the mathematical operations.

Kernel Independent Component Analysis (KICA)
Kernel ICA implements ICA in the kernel-activated vector space; KICA is known as the non-linear extension combining whitened KPCA and ICA [22]. There are several ways to obtain independent components. In this paper, the approach used is the kurtosis-maximization ICA algorithm, i.e., maximization of non-Gaussianity.
The signal is transformed from low dimensions to high dimensions, and within this new reproducing-kernel Hilbert space the minimum of the kernel contrast function is searched. The steps of KICA are provided below. The training data are represented as x_ECG^tr and are mapped non-linearly into a feature space F; in this non-linear mapping, Mercer's kernel plays its role in the calculation of the dot products without explicitly knowing the nonlinear operations, as k(x_i, x_j) = <Φ(x_i), Φ(x_j)>.
Step 1. Centering of the mapped ECG data in F: obtain the data in the new mapping as Φ~(x_i) = Φ(x_i) - (1/n) Σ_j Φ(x_j).
Step 2. Whitening of the centered ECG data in F: the objective is to find the transformation matrix Q such that the whitened data z_i = Q Φ~(x_i) have an identity covariance matrix.
Step 3. Apply ICA to the whitened data: s_i = W_ECG* z_i, where W_ECG* is an orthogonal transformation and the Q matrix is the result of the kernel operation.
In the present work, a non-linear, non-parametric Gaussian-kernel (RBF) approach is used.
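The whiten-then-rotate idea behind KICA can be illustrated in miniature. The sketch below whitens in the input space rather than the kernel feature space (a simplifying assumption) and grid-searches a 2-D rotation that maximizes the absolute kurtosis, i.e., non-Gaussianity, as in Step 3; the mixing matrix and source distribution are toy choices.

```python
import numpy as np

def whiten(X):
    """Step 2 analogue: transform centered data to identity covariance."""
    Xc = X - X.mean(0)
    cov = np.cov(Xc.T)
    vals, vecs = np.linalg.eigh(cov)
    Q = vecs @ np.diag(vals ** -0.5) @ vecs.T   # whitening matrix Q
    return Xc @ Q

def kurtosis(z):
    return np.mean(z ** 4) - 3 * np.mean(z ** 2) ** 2

def ica_rotation(Z, steps=360):
    """Step 3 analogue: search the orthogonal 2-D rotation that
    maximizes total |kurtosis| of the components."""
    best, best_w = -np.inf, None
    for t in np.linspace(0, np.pi, steps):
        W = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
        S = Z @ W
        score = abs(kurtosis(S[:, 0])) + abs(kurtosis(S[:, 1]))
        if score > best:
            best, best_w = score, W
    return Z @ best_w

rng = np.random.default_rng(0)
# Two independent non-Gaussian (Laplacian) sources, linearly mixed.
S = rng.laplace(size=(2000, 2))
X = S @ np.array([[1.0, 0.4], [0.3, 1.0]])
recovered = ica_rotation(whiten(X))
```

After whitening, any remaining ambiguity is an orthogonal rotation, which is exactly what the kurtosis criterion resolves; KICA applies the same two stages in the kernel-induced space.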

ECG Classification Techniques
K-Nearest neighbor (KNN) classifier
KNN is a supervised, instance-based learning method that classifies new testing samples on the basis of similarity. The applications of KNN are in computer vision, medical diagnosis, pattern recognition, handwriting matching, image processing, etc. KNN gives a compelling performance by minimizing the misclassification error for large datasets. It has shown an added advantage over other classifiers, like NN, SVM, and decision trees, for multi-class classification [23].
It works in two phases (training and classification): the class labels of the training features help identify class labels for the testing features.

Factors for tuning KNN
Only two parameters, namely the distance metric and 'K', are required for the working of KNN.

K factor
This is a prime significant factor, as it decides the classification rate. If its value is small, it can lead to overfitting and misclassification of testing data points. A larger K value diminishes the noise effect, which helps classification, at the cost of less distinct decision boundaries. In the present work, the value of K is taken as 3.

Distance measures
The distance metric between the training and the testing samples is an essential criterion for the desired classification, with or without noise. The Euclidean distance metric (EU) is the most common and best-preferred choice. It is also called the ruler distance, an extension of the Pythagorean theorem, and the L2 norm. The steps used for the execution of the KNN algorithm are presented in Fig. 4.
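The KNN rule with the Euclidean metric and K = 3, as described above, can be sketched compactly. The data and labels below are toy values, not the ECG feature set; "N" and "V" simply stand for two beat classes.

```python
from collections import Counter

def euclidean(a, b):
    """L2 (ruler) distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def knn_predict(train_x, train_y, query, k=3):
    """Label the query by majority vote of its k nearest neighbors."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda p: euclidean(p[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train_x = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
           [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
train_y = ["N", "N", "N", "V", "V", "V"]
label = knn_predict(train_x, train_y, [4.8, 5.0])
```

Because there is no training beyond storing the samples, all the cost falls on classification time, which is why KNN pairs well with the twelve-dimensional reduced feature set rather than the raw 130-sample beats.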

Neural Network (NN) classifier
The neural network is made of small, biologically inspired units called neurons, which are interlinked to perform complex tasks [24]. The NN, or artificial neural network (ANN), has an extensive spectrum of applications in pattern recognition, classification, optimization, reasoning, and approximation [25]. These networks show remarkable results in medical diagnosis, data mining, face recognition, fault detection, etc.
In this study, the supervised classifiers FNN and BPNN are used for ECG arrhythmia classification. Both are fully connected and have the structure shown in Fig. 4. This NN structure has input and output layers and two hidden layers with 20 and 5 neurons, respectively. The number of neurons in the hidden layers (20, 5) was selected by the trial-and-error method.
For BPNN, a gradient-descent algorithm minimizes the Mean Square Error (MSE) between the network targets and the actual outcomes; its task is to minimize the difference, or error, for a useful and accurate result. This is depicted by the objective function

E = (1/1200) Σ_{n=1}^{1200} Σ_{i=1}^{5} (t_i(n) - y_i(x_n))^2

where y_i(x_n) is the actual output of the i-th output neuron for the n-th pattern (1 to 1200 observations), and the output layer consists of 5 neurons according to the number of classes. This process continues until the total MSE falls below a threshold. This is the training phase, after which the trained network is used for testing. For FNN, the output decisions are based only on the current input; no past or future data is interlinked. It is based on the perceptron model, the machine algorithm described by Frank Rosenblatt in 1957 building on the neuron model of McCulloch and Pitts [26].
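The error-minimization loop above can be shown on the smallest possible case. In this sketch, a single linear neuron stands in for the 20-5 network, which is enough to demonstrate the MSE criterion and the batch gradient-descent update; the data and learning rate are toy assumptions.

```python
def mse(targets, outputs):
    """Mean-square error over all patterns."""
    n = len(targets)
    return sum((t - y) ** 2 for t, y in zip(targets, outputs)) / n

def train_neuron(xs, ts, w=0.0, b=0.0, lr=0.1, epochs=200):
    """Minimize the MSE of y = w*x + b by batch gradient descent."""
    n = len(xs)
    for _ in range(epochs):
        ys = [w * x + b for x in xs]
        dw = sum(-2 * (t - y) * x for x, t, y in zip(xs, ts, ys)) / n
        db = sum(-2 * (t - y) for t, y in zip(ts, ys)) / n
        w, b = w - lr * dw, b - lr * db
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 3.0, 5.0, 7.0]          # generated by t = 2x + 1
w, b = train_neuron(xs, ts)
final_err = mse(ts, [w * x + b for x in xs])
```

Backpropagation extends exactly this update through the hidden layers by the chain rule; the stopping rule is the same MSE-below-threshold criterion described in the text.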

Experimental Results
DWT gives the desired outcomes in the frequency decomposition range, which further aids the digital signal processing steps. The detailed noise filtering of an ECG record and the ECG beats extracted from the records are displayed in Fig. 2(a) and Fig. 2(b), respectively. The R-peaks identified using the db6 wavelet family are presented in Fig. 5. The algorithm of the proposed work is implemented in the MATLAB (R2016b) working environment.

KPCA and KICA extracted ECG attributes
The mean and standard deviation of the ECG attributes extracted using KPCA are arranged in Table 2, and those using KICA are tabulated in Table 3. All these coefficients have low p-values (<0.0001), computed by a two-way ANOVA (anova2) test. The low p-values signify the statistical stability of all extracted features. The statistical measures, namely the accuracy rate (Acc_R), sensitivity rate (Se_R), positive predictivity rate (PP_R), and specificity rate (Sp_R), are calculated as formulated in Eqs. 23-26. The present method uses k-fold cross-validation for classifier training and testing to assess the authenticity of the proposed algorithm.
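Since Eqs. 23-26 are not reproduced in this excerpt, the sketch below assumes the standard confusion-matrix definitions of these four rates: Acc = (TP+TN)/(TP+TN+FP+FN), Se = TP/(TP+FN), PP = TP/(TP+FP), and Sp = TN/(TN+FP). The counts are toy values for one beat class evaluated one-vs-rest.

```python
def rates(tp, tn, fp, fn):
    """Accuracy, sensitivity, positive predictivity, and specificity
    from the confusion counts of a one-vs-rest evaluation."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    se = tp / (tp + fn)
    pp = tp / (tp + fp)
    sp = tn / (tn + fp)
    return acc, se, pp, sp

# Illustrative counts for one class out of 6000 beats.
acc, se, pp, sp = rates(tp=1190, tn=4780, fp=20, fn=10)
```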

ECG classification outcomes
In the k-fold scheme, the feature set (6000 beats) is divided into the required number of folds, i.e., five subsets with an almost equal allocation of beats from each labeled class. In each round, four subsets (4800 beats) are used for training and the remaining subset (1200 beats) for testing the classifiers. The value of k gives the number of repetitions of this process, and the performance outcome is obtained by averaging over the five folds.
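The partitioning above can be sketched by dealing each class's beats round-robin into five folds, so every fold receives an almost equal share of every label; fold i then serves as the test set while the other four train the classifier. The labels below are toy values.

```python
def five_fold_indices(labels, k=5):
    """Return k index lists, stratified by label via round-robin."""
    folds = [[] for _ in range(k)]
    counters = {}
    for idx, lab in enumerate(labels):
        f = counters.get(lab, 0)
        folds[f].append(idx)
        counters[lab] = (f + 1) % k
    return folds

labels = ["N"] * 10 + ["V"] * 10 + ["S"] * 10
folds = five_fold_indices(labels)
train_0 = [i for f in folds[1:] for i in f]   # fold 1 tests, folds 2-5 train
```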
Ritu Singh, Navin Rajpal and Rajesh Mehta
Fig. 6(a-c) displays the sensitivity, specificity, and accuracy of all three classifiers during each fold using KPCA. Five-fold cross-validation has also been implemented on the different classifiers using KICA. KNN gave the highest performance: 98.5% average sensitivity, 98.6% average specificity, 98.4% average positive predictivity, and 98.5% average accuracy. Fig. 6(d-f) displays the sensitivity, specificity, and accuracy of all three classifiers during each fold using KICA.
The Receiver Operating Characteristic (ROC), a compelling evaluation tool for neural networks, is also implemented together with a confusion matrix. The ROC helps in the quality check of a classifier by applying thresholds to the output values. The area under the ROC curve measures the accuracy of multi-class separation: the more the curve lies toward the upper-left corner, above and away from the random line, the more accurate the classifier. The random line represents 0.5, and a perfect curve represents 1 in the accuracy test. Figure 7(a) displays the ROC curve and confusion matrix of BPNN using KPCA, and Figure 7(b) represents the same for FNN using KICA. It can easily be interpreted that BPNN with KPCA shows higher classification accuracy than FNN with KICA.

Comparison analysis and discussion
As stated in the introduction, the objective of the present paper is to achieve higher performance measures using multiresolution wavelet analysis and the dimensionality reduction techniques KPCA and KICA, which give a concise representation of the data feature set. The multiresolution analysis (MRA) of DWT, combined with KPCA or KICA and different classifiers, yielded the desired results. Table 5 tabulates a performance comparison of ECG heartbeat classification studies using the same MIT-BIH ECG database.
The literature shows that multi-class analyses leave researchers broad scope for improvement. For this criterion, in 2012, Llamedo and Martinez [27] achieved an accuracy of 98.0% for normal, supraventricular (SV), and ventricular (V) beats using a linear discriminant and RR features. Zadeh et al. [28] classified ECG signals composed of timing-interval-based features. The method of [31] achieved higher results of 99.6% accuracy using DWT and RBFNN for five-beat classification. Another method used PCA for dimensional reduction and KICA for nonlinear feature extraction with the LIBSVM classifier to achieve an accuracy of 97.78% [33].
The proposed work provides a framework showcasing the kernel capabilities of KPCA and KICA, which yield outstanding results in higher dimensions with KNN in comparison with FNN and BPNN.

Conclusion
The technical advancements in biomedical sciences have made the ECG signal a decisive measure for cardiac arrhythmias. In this study, KPCA and KICA are efficiently implemented as a non-intrusive mechanism for the analysis of five different cardiac beats. The experimental outcomes of the present method are two-fold. First, the discrete Daubechies wavelet multiresolution analysis helps in noise filtering, beat segmentation, and feature extraction, and the dimensionality reduction and non-linear mapping characteristics of KPCA and KICA lead to the successful construction of the ECG feature dataset. Second, this reduced-dimension dataset is subjected to the supervised classifiers (KNN, FNN, and BPNN) using five-fold cross-validation data partitioning.
The main innovation of the proposed approach is that the dimensional reduction methods used map the features to higher kernel dimensions, boosting the classification capability of classifiers such as KNN. As shown in Table 4, KPCA combined with the KNN classifier performed best, with an average accuracy, specificity, and sensitivity of 99.7% each, compared with FNN (average accuracy, sensitivity, and specificity of 99.2%, 99.1%, and 99.2%) and BPNN (99.2%, 99.2%, and 99.3%, respectively).
The classified dataset comprised 4800 irregular and 1200 normal ECG beats. Table 5 shows that the proposed combination gives improved performance compared with previous works, and the five-fold cross-validation adds to the stability of the results. Moreover, patient interaction and manual interpretation are reduced to a great extent, so this fully automated system is an asset to heart-disorder analysts.
Dimensionality reduction and wavelet-based non-linear feature extraction give a positive edge in analyzing and investigating large datasets, contributing to the health-care management system. Indeed, the prospective system applies not only to self-monitoring, leading to minimal delay in precautionary action, but also to other applications such as fetal heart-sound analysis and respiratory measurement.