Diabetes Correlated Renal Fault Prediction through Deep Learning

INTRODUCTION: Diabetic nephropathy is a complication of diabetes that damages the kidneys. Deep learning techniques are widely used to predict different diseases. OBJECTIVES: The main aim of this work is to develop an effective prediction model using deep learning. To this end, a suitable dataset comprising features related to diabetic nephropathy is considered. METHODS: A deep belief network (DBN), which is composed of Restricted Boltzmann Machines (RBM), is the proposed deep learning technique. It is compared with naive bayes, CART decision tree, logistic regression and support vector machine. The algorithms are analysed using the evaluation measures area under the PR curve (AUPR), area under the ROC curve (AUROC), gini coefficient and jaccard index. RESULTS: The comparison showed that DBN performed best in terms of AUROC, gini coefficient and jaccard index, with values of 0.8203, 0.6406 and 0.7777 respectively. CART obtained the best value only for AUPR, at 0.9039. CONCLUSION: The proposed technique outperformed the other techniques on three of the four metrics and is identified as the best performing algorithm. Hence, DBN is suggested for predicting diabetic nephropathy.


Introduction
Diabetic nephropathy is a kidney-related ailment and a predominant complication of diabetes. It is caused by long-term diabetes and can occur in patients with either type-1 or type-2 diabetes. In type-1 diabetes, insulin is not produced by the cells in the pancreas. In type-2 diabetes, a small amount of insulin is produced, but it is not sufficient for the body. Both types result in high blood sugar levels [1]. Elevated blood sugar damages the kidneys, which filter the blood. Damaged kidneys leak the protein albumin into the urine, a condition known as proteinuria. In some cases diabetic nephropathy may lead to kidney failure. The symptoms of severe kidney damage are weight loss, sickness, muscle cramps, swollen ankles and feet, frequent urination, puffiness around the eyes, dry and itchy skin, tiredness and loss of appetite [2]. If a person has had health complications such as high blood pressure, cholesterol and diabetes for a long time and also smokes, there is a high risk of developing diabetic nephropathy. Preventive measures that reduce this risk include managing high blood pressure, treating diabetes, maintaining a healthy weight and avoiding smoking [3]. If the disease is left untreated it may lead to severe complications such as pulmonary edema, hyperkalemia, diabetic retinopathy, cardiovascular disease, anemia and damage to nerves and blood vessels. Pulmonary edema is fluid retention, which may lead to fluid in the lungs, high blood pressure and swelling in the legs and arms. Hyperkalemia is an increase in blood potassium levels. Anemia is a red blood cell (RBC) count below the normal range [3]. About 40% of diabetic patients are affected by diabetic nephropathy [4].

EAI Endorsed Transactions on Pervasive Health and Technology
Research Article | 09 2020 - 12 2020 | Volume 6 | Issue 24 | e4 | S.S. Reddy, S. Nilambar and R. Rajender

The Glomerular Filtration Rate (GFR) is a test that assesses how well the kidneys are functioning, that is, how well they clean the blood. Tiny filters in the kidney called glomeruli filter and purify the blood. This test tells the doctor how much blood passes through the glomeruli each minute [5].
The GFR value is estimated with a formula that takes parameters such as the creatinine level in a blood sample, age, gender, height and weight. The five stages of kidney disease are as follows.
• A GFR value greater than 90% is stage 1, which indicates normal or high kidney function.
• A GFR value of 60-89% is stage 2, in which kidney function is mildly decreased.
• A GFR value of 30-59% is stage 3, in which kidney function is moderately decreased.
• A GFR value of 15-29% is stage 4, which indicates a severe drop in kidney function.
• A GFR value less than 15% is stage 5, which indicates kidney failure [6].
Diabetes has many complications. Diabetic nephropathy is one of them, and it should be identified in order to avoid severe health complications. This work therefore predicts diabetic nephropathy in diabetic patients. The dataset considered is used to predict whether a person has diabetic nephropathy or not, not to predict the stage of nephropathy.
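The five-stage mapping listed above can be written as a small lookup. This is only an illustrative sketch: the function name `gfr_stage` is hypothetical, and the handling of a value of exactly 90 and of fractional values between the listed bands is an assumption, since the text does not specify it.

```python
def gfr_stage(gfr: float) -> int:
    """Map a GFR value to one of the five kidney disease stages listed above."""
    if gfr < 0:
        raise ValueError("GFR cannot be negative")
    # Assumption: exactly 90 counts as stage 1, and fractional values
    # (for example 89.5) fall into the band of the next lower bound.
    if gfr >= 90:
        return 1  # normal or high kidney function
    if gfr >= 60:
        return 2  # mildly decreased
    if gfr >= 30:
        return 3  # moderately decreased
    if gfr >= 15:
        return 4  # severe drop
    return 5      # kidney failure
```

Note that this work predicts only the presence or absence of diabetic nephropathy, so a staging function like this is not part of the proposed model.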

Literature survey
Zarkogianni et al. [7] assessed four predictive models for type-1 diabetic patients. The dataset was collected from sensors that monitor blood glucose concentration and physical activity, and covers 10 type-1 diabetes patients over a 6-day observation period. Four techniques were chosen for developing the predictive models: feed forward neural network (FFNN), neuro-fuzzy network with wavelets as activation functions, self-organizing map (SOM) and linear regression. The models were evaluated at prediction horizons of 30, 60 and 120 min using mathematical and clinical evaluation criteria. The comparative study showed that SOM performed better on both criteria.
Han Wu et al. [8] concentrated on improving accuracy and producing an adaptive model for predicting type-2 diabetes. The Pima Indian diabetes dataset was used to develop the model, which has two levels. In the first level, k-means clustering eliminates the data that have been clustered incorrectly. The remaining data are given as input to the second level, a logistic regression algorithm. Accuracy, precision, recall, area under ROC curve and kappa statistic were chosen as evaluation metrics. Compared with the models of published work, the proposed model obtained better accuracy. The model was further applied to two other relevant datasets.
Farzi et al. [9] focused on predicting complications of diabetes. The chosen dataset mainly concerns complications of type-2 diabetes: retinopathy, neuropathy, nephropathy, diabetic foot and heart disease. The classification algorithms Logistic model tree, NB tree, J48, random forest, SMO, MLP, Bayes Net, Naive Bayes and RBF were implemented for each complication.
The performance metrics accuracy, kappa, f-measure, recall, precision, FP rate, TP rate, MSE and MAE were used to evaluate and compare the models for each complication. Among the nine algorithms, random forest performed better for most of the complications.
Dilip Singh Sisodia and Akanksha Verma [10] used individual and ensemble classifiers to predict kidney diseases. Their dataset is the chronic kidney disease dataset from the UCI repository. The individual classifiers are naive bayes (NB), sequential minimal optimization (SMO) and J48, and the ensemble techniques are random forest, bagging and AdaBoost. Accuracy, precision, sensitivity, f-score and AUROC were used to assess the algorithms. The authors concluded that J48 performed best among the three individual classifiers and random forest among the three ensemble classifiers.
Uma Dulhare and Mohammad Ayesha [11] focused on extracting action rules based on stages of kidney disease using the naive bayes classification technique. They used the kidney ailment dataset from the UCI machine learning repository and implemented naive bayes with the oneR feature selection method. The stages of kidney disease are predicted by calculating the GFR value, and these are used to generate action rules. They showed that the oneR method reduced the features in the dataset by 80% and increased overall accuracy by 12.5% compared with the plain naive bayes classifier.
Ramya and Radha [12] presented work on diagnosing kidney ailments using machine learning techniques. They used a kidney disease dataset collected from different laboratories in Coimbatore, consisting of 15 attributes and 1000 instances. The algorithms considered are random forest, neural network with back propagation and radial basis function. Evaluation metrics such as accuracy, sensitivity, specificity and kappa statistic were considered for comparison.
From the analysis it was concluded that radial basis function performed well, with 85.3% accuracy.
Hanyu Zhang et al. [13] explored neural network (NN) techniques for predicting the survivability of chronic kidney ailment patients. Their dataset was taken from a hospital in Taiwan and consists of 35 attributes and 5617 instances. They used artificial neural networks in two ways: multilayer perceptron (MLP) and MLP with lasso feature selection. The two techniques were compared using accuracy, sensitivity, specificity, precision, f-measure and recall. In the experimental results, MLP with lasso feature selection performed better than plain MLP.
Ahmed Aljaaf et al. [14] highlighted their work on early diagnosis of kidney disease. Their kidney disease dataset consists of 25 attributes and 400 instances. The algorithms used are multilayer perceptron, SVM, logistic regression and CART decision tree, compared using accuracy, recall, specificity, precision, f-score, overall error and area under ROC curve. CART obtained better specificity and precision values, and MLP obtained better values for the remaining performance measures.
Soltanpour Gharibdousti et al. [15] used mining algorithms to predict kidney disease. The dataset is the chronic kidney disease (CKD) data from the UCI ML repository. The algorithms considered are logistic regression, DT, naive bayes, SVM and NN, implemented on both the original and the normalized dataset. Accuracy, sensitivity, specificity and area under ROC curve were used to compare the algorithms. Regression and SVM performed better on the original data, while SVM, NN and Regression performed better on the normalized data.
Nusrat Tazin et al. [16] used attribute selection and classification techniques for diagnosing prolonged kidney disease. The dataset is the CKD dataset from the UCI ML repository. A ranking algorithm was applied as the feature selection technique. They considered four classification algorithms: SVM, DT, NB and k-nearest neighbor. Accuracy, kappa statistic, mean absolute error, RMS error and ROC curve were used for comparison. They summarized that decision tree performed better, obtaining 99% accuracy.
Shahariar Azad et al. [17] assessed different ML methods for predicting prolonged kidney diseases. The dataset is the CKD dataset from the UCI ML repository. They considered ten ML algorithms: KNN, SVM, random forest, naive bayes, AdaBoost, linear discriminant analysis, decision tree, logistic regression, gradient boosting and artificial neural networks. They concluded that decision tree and naive bayes obtained better accuracy values than the remaining algorithms.
Devika et al. [18] compared three classification techniques for predicting kidney ailment: KNN, naive bayes and random forest. The dataset is the CKD dataset from the UCI ML repository. The algorithms were compared using four evaluation parameters: accuracy, precision, recall and f-measure. Random forest performed better than the other two algorithms.
Himanshu Kriplani et al. [19] used a deep neural network to predict kidney disease. The CKD dataset from the UCI ML repository, consisting of 25 attributes and 400 instances, was used. Gradient descent is the optimization technique for the deep neural network, which is the proposed algorithm. Existing algorithms such as NB, logistic regression, random forest, SVM and AdaBoost were compared with the proposed algorithm.
They concluded that the deep neural network performed better than the other algorithms, obtaining an accuracy of 97%.
El-Houssainy and Anwar [20] used data mining techniques for predicting different stages of kidney ailments. The CKD dataset was obtained from the UCI repository. The algorithms considered are probabilistic neural network, multilayer perceptron, radial basis function and SVM. They concluded that the probabilistic neural network performed best among these algorithms, with 96.7% accuracy.
Abdullah Almansour et al. [21] compared the performance of neural networks and SVM for predicting kidney ailment. They used the CKD dataset from the UCI machine learning repository. Missing values in the data were filled with the average value of the respective attribute. The SVM algorithm was implemented with four kernels: linear, radial basis, polynomial and sigmoid. Among these, the SVM with the linear kernel performed better. The artificial neural network obtained a better accuracy, 99.75%, than the SVM with the linear kernel.
Veenita Kunwar et al. [22] explored mining algorithms for analysing prolonged kidney ailment. The dataset is the CKD dataset from the UCI ML repository. The algorithms considered were artificial neural networks (ANN) and naive bayes, and the evaluation metrics were accuracy and kappa statistic. From the results it was concluded that naive bayes performed better than ANN.
Parul Sinha and Poonam Sinha [23] aimed to predict CKD by applying classification. The CKD dataset was taken from the UCI repository. The classification algorithms NB classifier, SVM and k-nearest neighbour (KNN) were used for prediction of CKD and compared using precision, accuracy and execution time. From the result analysis they concluded that the NB algorithm performed best among the three algorithms.
Pratibha Devishri et al. [24] compared classification techniques for predicting prolonged kidney ailments. The dataset is the CKD dataset from the UCI repository. They used principal component analysis (PCA) for feature selection and then applied six classification algorithms: decision stump, rep tree, IBK, k-star, stochastic gradient descent (SGD) and sequential minimal optimization (SMO). The evaluation measures used were recall, f-measure, precision, kappa statistic, ROC curve, RMS and mean absolute error (MAE).
After comparing all the algorithms they concluded that decision stump and rep tree performed better than the other classifiers.
This section has reviewed existing work related to nephropathy. Many of the previous works used basic ML algorithms and did not explore deep learning techniques. In this work a deep learning technique, the deep belief network, is explored for effectively predicting diabetic nephropathy. One advantage of DBN is that the time required to train the model is comparatively low. Another important advantage is that it can be trained on a small dataset. The dataset considered for this work is also small, with 84 records, which is sufficient to exploit the advantages of DBN.
The following section presents the methodology of the work, with detailed information on the dataset and system architecture. Section 4 presents the proposed work, with a detailed explanation of the proposed algorithm and a brief explanation of the existing algorithms. Section 5 presents the result analysis and the comparison of algorithms. Section 6 presents the conclusion of the work and the best algorithm among all.

Objectives of work
Kidneys are among the most important organs in the human body, and diabetic nephropathy is a prominent side effect of diabetes. So, the problem of effective prediction of diabetic nephropathy is considered in this work. The problem is handled using an effective deep learning algorithm, namely DBN. The main objectives of this work are
• To extract and refine an appropriate and suitable dataset.
• To build an effective model using deep belief network.
• To ascertain the performance of the model using precise performance metrics.
The target attribute takes two values, 0 and 1: 0 represents a person not suffering from diabetic nephropathy and 1 represents a person suffering from diabetic nephropathy.

Dataset
The dataset considered is a kidney disease dataset that consists of 21 attributes and 84 instances. Of these attributes, 20 are predictor variables and one is the target variable. It is a binary classification dataset: it is used to predict whether a person has diabetic nephropathy or not, not the stage of diabetic nephropathy the person is suffering from. The predictor variables are relative density, albumin, sugar, pus cell, pus cell clumps, red cells, bacteria, creatinine, potassium, haemoglobin, red blood cells count, high BP, diabetes, diabetes duration, coronary artery disease, appetite, pedal edema, anemia, high blood cholesterol and smoking. The target variable is classification, which divides the dataset into two categories, positive and negative. Positive means the person has diabetic nephropathy. Negative means the person does not have diabetic nephropathy.

System architecture
The work flow of the proposed methodology, or system architecture, is provided in figure 6. The dataset is first loaded, then data pre-processing is done to identify any missing values. The dataset is then divided into training and test data. The training dataset contains 68 instances, which is approximately 80% of the original data. The test dataset contains the remaining 16 instances, approximately 20% of the dataset. Predictive models are then trained using the algorithms SVM, naive bayes (NB), CART decision tree, logistic regression (LR) and one deep learning algorithm, deep belief network (DBN). The trained models are evaluated using the test dataset. For the test instances the presence or absence of diabetic nephropathy is predicted, not the stage of nephropathy. Results are then obtained for each algorithm in terms of the evaluation measures gini coefficient, jaccard index, AUPR and AUROC. Finally, the effectiveness of the trained models is compared and analyzed to obtain the best algorithm.
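The division into 68 training and 16 test instances can be sketched as follows. The helper name `holdout_split`, the random shuffling and the seed are assumptions made for illustration; the text does not state how the instances were assigned to each part.

```python
import random

def holdout_split(records, n_test, seed=0):
    """Shuffle a copy of the records and hold out n_test of them for testing."""
    if not 0 < n_test < len(records):
        raise ValueError("n_test must be between 1 and len(records) - 1")
    shuffled = list(records)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for the 84 dataset rows; real rows would hold the 21 attributes.
records = list(range(84))
train, test = holdout_split(records, n_test=16)
```

With 84 instances and 16 held out, the training part has 68 instances, matching the sizes reported above.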

Proposed work
This section gives a detailed explanation of the proposed algorithm, deep belief network, and a brief explanation of the existing algorithms: naive bayes, CART decision tree, logistic regression and SVM.

Naive bayes
Naive bayes is a probabilistic classification algorithm. For each input instance it determines the posterior probability, on which the prediction is based. The formula given below is used to compute the posterior probability P(C|F) of class C, where F is the set of independent features.
$$P(C \mid F) = \frac{P(F \mid C)\,P(C)}{P(F)}$$
In the formula, P(F|C) is the likelihood, P(C) is the prior probability of class C and P(F) is the probability of the features. The class with the highest posterior probability is the predicted output for the given instance [25].
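The posterior computation described above can be illustrated with a short sketch. The function name `naive_bayes_posteriors` and all probability values are hypothetical and are not taken from the dataset; the per-feature likelihoods are multiplied under the independence assumption.

```python
from math import prod

def naive_bayes_posteriors(priors, likelihoods):
    """Return P(C | F) for every class C.

    priors:      {class: P(C)}
    likelihoods: {class: [P(f_i | C) for each observed feature f_i]}
    """
    # Independence assumption: P(F | C) is the product of the P(f_i | C).
    joint = {c: priors[c] * prod(likelihoods[c]) for c in priors}
    evidence = sum(joint.values())  # P(F), the normalising constant
    return {c: joint[c] / evidence for c in joint}

# Hypothetical numbers for two observed features of one instance.
priors = {"positive": 0.6, "negative": 0.4}
likelihoods = {"positive": [0.8, 0.7], "negative": [0.3, 0.2]}
posteriors = naive_bayes_posteriors(priors, likelihoods)
predicted = max(posteriors, key=posteriors.get)  # class with highest posterior
```

In practice the priors and likelihoods would be estimated from the training instances rather than supplied by hand.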

CART
Classification and Regression Tree (CART) is a decision-tree-based algorithm. It constructs the decision tree using an impurity measure called the gini index. The tree is split on the attribute that obtains the lowest gini index value. The formula given below is used to calculate the gini index of each attribute 'a' for the given instances D.
$$Gini_a(D) = \frac{|D_1|}{|D|}\,Gini(D_1) + \frac{|D_2|}{|D|}\,Gini(D_2), \qquad Gini(D) = 1 - \sum_{t=1}^{c} P_t^2$$
In the formula, D1 and D2 are the subsets of instances belonging to different categories of 'a', c is the number of classes and Pt is the probability of an instance in D belonging to class t. In the constructed decision tree each leaf node represents a target value. The output for an instance is predicted using this decision tree [26].
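A minimal sketch of the gini index calculation for one candidate split is given here. It assumes the standard CART form of the measure (one minus the sum of squared class probabilities, weighted by subset size); the function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_split(d1, d2):
    """Size-weighted gini index of a binary split into subsets d1 and d2."""
    n = len(d1) + len(d2)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)
```

A split that separates the classes perfectly scores 0, so the attribute with the lowest value of `gini_split` would be chosen for the split.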

Logistic regression
Logistic regression is an algorithm used for binary classification problems. It creates a decision boundary that classifies the input instances.
The sigmoid function σ(z) given below produces an S-shaped curve whose value lies between 0 and 1.
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
A threshold value in [0, 1] is fixed, based on which the output value is predicted. The cost function given below is used to reduce the error.
$$J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log \sigma(z_i) + (1 - y_i) \log\left(1 - \sigma(z_i)\right) \right]$$
In the cost function, y_i is the actual value of instance i and m is the number of instances [27].
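The sigmoid, the threshold rule and the cost can be sketched in a few lines. The cost is assumed to be the usual cross-entropy (log loss) of logistic regression, and the default threshold of 0.5 is an assumption, since the text only says that a threshold in [0, 1] is fixed.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function with values in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)

def predict(z, threshold=0.5):
    """Return class 1 if the sigmoid output reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0

def log_loss(y_true, z_values):
    """Mean cross-entropy between actual labels and sigmoid outputs."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for y, z in zip(y_true, z_values):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

Here `z` stands for the linear combination of the features and the learned coefficients for one instance.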

Support Vector Machine (SVM)
SVM is an ML classification algorithm. Its major principle is to train a model by separating the input instances into two groups using a hyperplane. Of the two groups, one represents tested positive and the other represents tested negative. The selected hyperplane must have the maximum margin among all possible hyperplanes. When an instance is given as input to the trained model, the output is predicted based on the hyperplane. If the instance lies on the side of the positive group it is predicted as positive, otherwise as negative [28].
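The prediction step of an already trained linear SVM can be sketched as below. The weight vector `w` and offset `b` are hypothetical values standing in for a trained model; finding them (the maximum margin optimisation itself) is not shown.

```python
import math

def decision_value(x, w, b):
    """Signed score of instance x relative to the hyperplane w . x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(x, w, b):
    """Positive if x lies on the positive side of the hyperplane, else negative."""
    return "positive" if decision_value(x, w, b) >= 0 else "negative"

def margin_width(w):
    """Margin 2 / ||w||, valid when support vectors satisfy |w . x + b| = 1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical trained parameters for a two-feature problem.
w, b = [1.0, 1.0], -3.0
```

For example, `predict([2.0, 2.0], w, b)` falls on the positive side of this hyperplane, while `predict([1.0, 1.0], w, b)` falls on the negative side.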

Deep belief network (DBN)
DBN is a deep neural network that can be used for classification problems. It is composed of unsupervised networks such as the restricted Boltzmann machine (RBM). Each layer in an RBM network communicates with the previous and next layers. A DBN has multiple hidden layers between the input (visible) layer and the output layer. The hidden layers are connected with each other, but the neurons within a hidden layer are not. A hidden layer acts as the visible layer for the neurons in the subsequent hidden layer. The technique has two phases: pre-training and fine tuning. In the pre-training phase the model is trained using RBMs. In the fine tuning phase the weights and bias of each layer in the trained model are adjusted to reduce the cross entropy error and predict the optimal output [29]. Steps 2 to 10 form a loop that is repeated for each instance in the dataset. Steps 3-9 represent the pre-training phase. In step 3 the values are given to the units in the visible layer. In step 4 the bias and weights in the network are initialized. In step 5 the conditional probability of the hidden layer units is calculated. In step 6 the probability of the visible layer units is calculated. The conditional probabilities from steps 5 and 6 are used to calculate new weights in step 7. In step 8 the process is repeated for the next hidden layers. In step 9 a logical condition decides whether the process should be repeated for further layers or not. In step 11 the trained model is obtained for prediction. Steps 12-14 together represent the fine tuning phase. In step 12 the error is calculated, based on which the weights and bias are adjusted using the back propagation technique in step 13. In step 14 the final trained model is obtained, which gives the optimal solution [30].

Algorithm: Deep belief network (DBN)
Input: Instances in dataset and learning rate.
Output: Predicted target values and error.
Assumptions: Ht is unit t in hidden layer H and Vs is unit s in visible layer V. B is the bias given for all units in the hidden layer and A is the bias given for all units in the visible layer. Est is the edge between unit s in the visible layer and unit t in the hidden layer, and Wst is the weight of the edge Est. σ is the sigmoid function, computed as $\sigma(x) = \frac{1}{1 + e^{-x}}$. L is the learning rate. m and n are the numbers of units in V and H respectively.
Step 1: Start.
Step 2: For each instance in dataset.
Step 3: Set the values of the visible layer units to the attribute values belonging to the instance.
Step 4: Initialize all weights in the network with random values in the range [0, 1] and the bias with value 0.
Step 5: Calculate the conditional probability of each unit 't' in hidden layer H, for the given visible layer V. This is called the positive phase.
$$Pos(E_{st}) = P(H_t = 1 \mid V) = \sigma\left(B + \sum_{s=1}^{m} W_{st} V_s\right)$$
Step 6: Calculate the conditional probability of each unit 's' in visible layer V, for the given hidden layer H. This is called the negative phase.
$$Neg(E_{st}) = P(V_s = 1 \mid H) = \sigma\left(A + \sum_{t=1}^{n} W_{st} H_t\right)$$
Step 7: Calculate the new weights Wst from the values of Pos(Est) and Neg(Est) obtained in steps 5 and 6, using the learning rate L.
Step 8: Set hidden layer H as the visible layer for the subsequent hidden layer H1, i.e. V=H and H=H1.
Step 9: If there are further layers, repeat steps 5 to 8. Otherwise repeat steps 5 to 7 and go to step 10.
Step 10: End of the for loop started in step 2.
Step 11: The trained model is obtained, and the neuron in the output layer with the highest probability is the predicted value for the given input.
Step 12: Calculate the error, which is the difference between the actual and predicted values.
Step 13: Adjust the weights and bias in the trained model by performing back propagation from the output layer to the input layer. This is called fine tuning and reduces the error in predictions.
Step 14: After back propagation, the neuron in the output layer with the highest probability is the predicted value, which is the optimal solution.
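The pre-training of a single RBM layer (steps 4 to 7) can be sketched as follows. The weight update shown is the standard one-step contrastive divergence (CD-1) rule, which is an assumption swapped in here because the exact update used in step 7 is not stated. The biases A and B are kept at their initial value of 0 for brevity, and the layer sizes, input values and learning rate are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W, B):
    """Step 5: P(H_t = 1 | V) for every hidden unit t."""
    return [sigmoid(B + sum(W[s][t] * v[s] for s in range(len(v))))
            for t in range(len(W[0]))]

def visible_probs(h, W, A):
    """Step 6: P(V_s = 1 | H) for every visible unit s."""
    return [sigmoid(A + sum(W[s][t] * h[t] for t in range(len(h))))
            for s in range(len(W))]

def cd1_update(v0, W, A, B, L, rng):
    """One CD-1 weight update for one instance; modifies W in place."""
    h0_prob = hidden_probs(v0, W, B)                        # positive phase
    h0 = [1.0 if rng.random() < p else 0.0 for p in h0_prob]
    v1_prob = visible_probs(h0, W, A)                       # reconstruction
    h1_prob = hidden_probs(v1_prob, W, B)                   # negative phase
    for s in range(len(W)):
        for t in range(len(W[0])):
            # Assumed step 7: data correlation minus reconstruction correlation.
            W[s][t] += L * (v0[s] * h0_prob[t] - v1_prob[s] * h1_prob[t])
    return h0_prob  # these activations feed the next layer (step 8)

rng = random.Random(0)
n_visible, n_hidden = 4, 3
# Step 4: random weights in [0, 1], biases set to 0.
W = [[rng.random() for _ in range(n_hidden)] for _ in range(n_visible)]
features = cd1_update([1.0, 0.0, 1.0, 1.0], W, A=0.0, B=0.0, L=0.1, rng=rng)
```

Stacking layers then amounts to passing `features` as the visible input of the next RBM. The supervised fine tuning with back propagation (steps 12 to 14) is not covered by this sketch.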

Gini coefficient
The gini coefficient measures how much better the developed model predicts the output than random predictions. Its value is computed from AUROC using the formula given below.
$$Gini = 2 \times AUROC - 1$$
A higher gini coefficient indicates a better model.
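The relation between AUROC and the gini coefficient can be checked with a short sketch. The pairwise AUROC computation below is one common way to obtain the area under the ROC curve and is an assumption about the method, not a description of how the reported figures were produced; the function names are hypothetical.

```python
def auroc(y_true, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def gini_coefficient(auroc_value):
    """Gini coefficient derived from AUROC: 2 * AUROC - 1."""
    return 2.0 * auroc_value - 1.0
```

Applied to the AUROC of 0.8203 reported for DBN, `gini_coefficient` gives 0.6406, which agrees with the reported gini coefficient.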

Jaccard index
Jaccard index is an evaluation metric that measures the similarity between the actual and predicted values. Its value lies between 0 and 1, and a higher value indicates a better model. It is defined by the formula below.
$$J(V, P) = \frac{|V \cap P|}{|V \cup P|} = \frac{|V \cap P|}{|V| + |P| - |V \cap P|}$$
Here V and P are two sets representing the actual and predicted target values of the test dataset respectively. The intersection is calculated by considering each pair of actual and corresponding predicted target values, so it indicates the number of correctly predicted values. In table 2, test instances 3 and 9 out of the 16 instances are not predicted correctly. Therefore the intersection of V and P is equal to 14.
Figure 7 represents the PR curve obtained for the naive bayes algorithm, with recall as abscissa and precision as ordinate. The AUC value in the illustration is the AUPR value. Figure 8 illustrates the ROC curve plotted for the naive bayes algorithm, with FPR and TPR as abscissa and ordinate respectively. The AUROC value in the figure is the AUROC for NB.
Figure 9. Area under PR curve for CART
Figure 10. Area under ROC curve for CART
Figure 9 represents the PR curve obtained for the CART decision tree algorithm, with recall and precision as abscissa and ordinate respectively. The AUC value in the illustration is the AUPR value. Figure 10 shows the ROC curve plotted for the CART decision tree algorithm, with FPR and TPR as abscissa and ordinate respectively. The AUROC value is highlighted in the figure.
Table 5
Figure 11 represents the PR curve obtained for the logistic regression algorithm, with recall and precision as abscissa and ordinate respectively. The AUC value in the illustration is the AUPR value. Figure 12 illustrates the ROC curve obtained for the logistic regression algorithm, with FPR and TPR as abscissa and ordinate respectively. The AUROC value is highlighted in the figure.
Table 6 comprises the values of the evaluation measures obtained for the support vector machine algorithm. The values obtained for AUPR, AUROC, gini coefficient and jaccard index are 0.8873, 0.6562, 0.3125 and 0.6 respectively.
Figure 13 represents the PR curve obtained for the SVM algorithm, with recall and precision as abscissa and ordinate respectively. The AUC value in the illustration is the AUPR value.
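Returning to the jaccard index described earlier in this section, the computation can be sketched with pairwise matching of actual and predicted values. The label values below are hypothetical; only the count of 16 test instances with instances 3 and 9 mispredicted follows the text.

```python
def jaccard_index(actual, predicted):
    """Jaccard index with the intersection counted as matching pairs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    intersection = sum(1 for a, p in zip(actual, predicted) if a == p)
    union = len(actual) + len(predicted) - intersection
    return intersection / union

# Hypothetical labels for 16 test instances.
actual = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
predicted = list(actual)
for i in (2, 8):                   # instances 3 and 9 (1-based) are mispredicted
    predicted[i] = 1 - predicted[i]

score = jaccard_index(actual, predicted)  # 14 / (16 + 16 - 14) = 14 / 18
```

With 14 correct predictions out of 16 the index is 14/18, approximately 0.7778, in line with the value of 0.7777 reported for DBN. With 12 correct predictions it is 12/20 = 0.6, in line with the value reported for SVM.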

Deep Belief Network (DBN)
The DBN is trained for 300 epochs. The best model was found after the 91st epoch, with a cross entropy error of 0.628 and a classification error of 10.29% on the training dataset. The cross entropy error is used for evaluating the performance of the neural network.
Figure 15 represents the PR curve obtained for the deep belief network algorithm, with recall and precision as abscissa and ordinate respectively. The AUC value in the illustration is the AUPR value. Figure 16 shows the ROC curve obtained for the deep belief network algorithm, with FPR and TPR as abscissa and ordinate respectively. The AUROC value is highlighted in the figure.
Figure 17 represents the comparison graph. All the evaluation measures obtained for the algorithms are compared, using one colour for each metric. The figure shows that AUROC, gini coefficient and jaccard index are highest for DBN, while AUPR is highest for CART. The values of AUROC, gini coefficient and jaccard index obtained for DBN are 0.8203, 0.6406 and 0.7777 respectively. The value of AUPR obtained for CART is 0.9039. Although CART obtained a better AUPR value, the proposed algorithm DBN performed better on the remaining three evaluation measures. Thus, DBN was identified as the best algorithm among all.
The comparative analysis shows that the proposed algorithm, deep belief network, performed better than the remaining algorithms. The dataset considered in this work has 84 instances, so deep belief network is suggested for predicting diabetic nephropathy when datasets with few instances are considered. Although the instances are few, the dataset has 21 attributes, which helps to give a better predictive model. Hence, in this work the deep learning technique gave better results than the machine learning techniques, and deep learning techniques are advised for further works.
Table 9 presents an analysis of the proposed work and the other classifiers used in the kidney disease works from the literature survey. These works apply different machine learning classification techniques for predicting the disease. The best performing algorithms in the works [15], [16], [17], [22] and [23] from the literature survey are the existing algorithms considered in this work: naive bayes, CART (a decision tree), logistic regression and SVM. These are the most commonly used machine learning classification techniques. Similarly, the works [12], [14], [19], [20], [21] and [22] ... in this work the most effective and advanced measures, namely AUPR, AUROC, gini coefficient and jaccard index, were used.
The AUPR for DBN is 0.8873, and a value between 0.5 and 1 is considered good. For this metric, however, the CART algorithm obtained the higher value of 0.9039. DBN performed better in terms of AUROC, gini coefficient and jaccard index, as follows. The AUROC is 0.8203, and a value between 0.8 and 0.9 is considered good. The gini coefficient reflects how well the considered model minimizes misclassification. The proposed DBN obtained a gini coefficient of 0.6406, and a value above 0.6 is considered good. The jaccard index indicates the accuracy of the model, so the accuracy of the considered DBN algorithm based on the jaccard index is 0.7777 or 77.77%. After comparison, DBN was identified as the best among all the considered techniques.
The limitation of this work is that the stage of diabetic nephropathy is not predicted. Instead, prediction is done by classifying the dataset based on the target attribute "Classification". The target attribute has two classes: positive, for the presence of diabetic nephropathy in any one of the five stages, and negative, for no diabetic nephropathy.
As the proposed deep learning technique DBN outperformed the machine learning techniques, other deep learning techniques are recommended for future works on diabetic nephropathy.

Conclusion
Diabetes leads to several other chronic diseases. Diabetic nephropathy is one of these, and it affects the kidneys of the diabetic patient. In the context of predicting diabetic nephropathy, existing ML classification algorithms were compared with a deep learning technique, the deep belief network. The evaluation measures AUPR, AUROC, gini coefficient and jaccard index were selected for performance evaluation of the trained models. The existing ML classification algorithms considered for comparison are naive bayes, CART decision tree, SVM and logistic regression. The prediction is whether the patient has diabetic nephropathy or not. A patient in any one of the five stages of nephropathy is considered to have diabetic nephropathy. The result analysis shows that the CART decision tree obtained the better value only for AUPR, whereas DBN performed better in terms of AUROC, gini coefficient and jaccard index, with values of 0.8203, 0.6406 and 0.7777 respectively. As the deep learning technique DBN obtained better values for most of the evaluation metrics, it is suggested for prediction of diabetic nephropathy. Using other deep learning techniques that give better results than DBN for diabetic nephropathy prediction is considered as future work.