Predicting Diabetes Mellitus and Analysing Risk-Factors Correlation

INTRODUCTION: Diabetes mellitus is a common disease caused by a group of metabolic disorders in which blood sugar levels remain well above normal over a prolonged period. It affects different organs and harms many of the body's systems, in particular the blood vessels and nerves. OBJECTIVES: Early prediction of this disease can help to control it and to save lives. To that end, this work uses machine learning techniques to explore various risk factors related to the disease, such as kidney complications, blood pressure, hearing loss, and skin complications. METHODS: Machine learning techniques extract knowledge efficiently by constructing predictive models from diagnostic medical datasets. We use a dataset of 16 attributes collected from 200 diabetic patients at the Medical Centre Chittagong, Bangladesh. We apply four popular machine learning algorithms, Support Vector Machine (SVM), Naive Bayes (NB), K-Nearest Neighbour (KNN), and C4.5 Decision Tree (DT), to an adult population dataset to predict diabetes mellitus. RESULTS: The C4.5 Decision Tree performs better than the other algorithms, predicting diabetes with 73.5% accuracy, a 72% F-measure, and an AUC (area under the ROC curve) of 0.69. We also determine the correlation between different risk factors of diabetes mellitus. The highest correlation, 0.81, is between blood pressure (hypertension) complications and diabetes. CONCLUSION: This study establishes both positive and negative correlations between the various risk factors and diabetes. The correlation is positive for kidney complications (nephropathy) and blood pressure (hypertension) complications, and negative for hearing loss and skin complications (diabetic dermopathy) in diabetic patients. These findings help patients to be aware of the risk factors related to diabetes.


Introduction
Diabetes mellitus, commonly known as diabetes, is a disease that impairs the action of the hormone insulin, raises blood sugar levels, and causes abnormal carbohydrate metabolism. High blood sugar affects various organs and can cause complications in many bodily functions, in particular in the blood vessels and nerves. The causes of diabetes are not yet fully understood, and many researchers believe that both genetic and environmental factors are involved. In most cases, diabetes occurs in adults, which is why it is called 'adult-onset' diabetes. Notably, diabetes mellitus is closely associated with the aging process.
[...] from 2.5 million to around 3.7 million [1]. The global situation is no different. According to the International Diabetes Federation, 382 million people had diabetes mellitus in 2013 [2], which is 6.6% of the world's adult population. World healthcare statistics project that the number of diabetic patients will increase from 376 million to 490 million by 2030 [3]. Moreover, diabetes is a potentially independent risk factor for microvascular complications.
Diabetic patients face an elevated risk of microvascular damage. Long-term cardiovascular complications are the leading cause of death. This microvascular damage and premature cardiovascular disease eventually lead to retinopathy, nephropathy, and neuropathy [4].
One cause of chronic kidney disease is diabetes mellitus, which is characterized by high blood glucose (sugar) levels. Over time, high sugar levels in the blood vessels damage a large number of tiny filtering units in the kidney, which eventually leads to kidney failure. Around 20 to 30 percent of people with diabetes develop kidney disease (diabetic nephropathy), although not all of these cases progress to kidney failure. A person with diabetes is at risk of nephropathy whether or not they use insulin.
There is no cure for diabetic nephropathy, so treatment is life-long. People with diabetes are also at risk of other kidney problems, including narrowing of the arteries to the kidneys, called renal artery stenosis or renovascular disease.
Many people with diabetes also have hypertension, or high blood pressure. In a 2013 review, the American Diabetes Association (ADA) found that the combination of hypertension and diabetes mellitus is deadly and can substantially raise the risk of having a heart attack or stroke. A person with diabetes must make sure that their blood pressure is well controlled.
Diabetes increases the risk of developing sudden hearing loss. Patients with severe diabetes appear to be more susceptible to hearing loss. A few studies found that high blood sugar levels can damage the tiny blood vessels in the inner ear, which affects sound reception and makes it harder to hear. The risk is greater for people in certain age groups and for those living in noisy environments.
Diabetes can affect most parts of the body, including the skin. Fortunately, most skin conditions can be prevented or treated effectively if they are caught early. Some of these problems are skin conditions that anyone can have but that people with diabetes get more easily. These include bacterial infections, fungal infections, and itching. Other skin problems occur mostly or only in people with diabetes; these include diabetic dermopathy.
Many researchers have worked on the early prediction of diabetes by taking into account various risk factors related to the disease. For our analysis, we collect a diagnostic dataset of 200 diabetic patients with 16 attributes, such as age, diet, hypertension, vision problems, and genetic factors. We discuss these attributes and their corresponding values in a later section. Based on these attributes, we build a prediction model using various machine learning techniques to predict diabetes mellitus.
According to the Diabetes Atlas published by the International Diabetes Federation (IDF) in 2017 [54], there are around 424.9 million diabetes patients aged 20-79 years worldwide, of whom 95% suffer from Type 2 Diabetes Mellitus (T2DM), as shown in Table 1. The number is predicted to increase to 628.6 million by 2045 [55]. Table 2 shows the statistics of deaths from this disease in Bangladesh, according to the World Health Organization (WHO) report published in 2016. A correlation measures the relationship between two or more variables. To evaluate the correlation between different risk factors of diabetes mellitus, we apply statistical correlation [5] to the collected dataset attributes for chronic kidney disease and blood pressure disease against diabetes. We also use the CORREL function or the Analysis ToolPak add-in in Excel [6] to compute the correlation.
The contributions of this study are as follows:
• We collect a real diagnostic dataset covering various risk factors of 200 diabetes patients from the medical center.
• We compare the performance of different machine learning techniques and evaluate the prediction results.
• We determine the correlation between different risk factors of diabetes mellitus.
Section II reviews related work. Section III describes our methodology. Section IV analyzes our experimental results. Finally, Section V concludes.

Related Work
Numerous works have applied machine learning techniques to extract knowledge from available medical data on diabetes. [...] performance than other algorithms. Weifeng Xu et al. [22] applied different machine learning algorithms to the prediction of diabetes; among them, Random Forest (RF) gave better results than the other data mining techniques. Yunsheng et al. [23] used KNN and DISKR for the prediction of diabetes, where storage space was reduced by eliminating instances with fewer factors. Removing outliers increased both performance and accuracy. Sajida et al. [24] proposed an AdaBoost model, which provided better performance and accuracy. Naveen & Pradeep et al. [25] applied KNN, SVM, J48, and Random Forest to 215 instances with 7 attributes; before preprocessing, the J48 algorithm provided better performance and accuracy than the others. Santhanam and Padmavathi [26] applied a Genetic Algorithm, K-means, and SVM and increased the accuracy. Ramiro et al. [27] applied a fuzzy rule mechanism to decrease the possibility of wrong treatment; it serves doctors as a recommender system for giving the patient the correct treatment. Saba et al. [28] applied different data mining algorithms and observed that a meta classifier provided higher accuracy than a single classifier. To avoid the chronic complications of diabetes, patients should keep their blood glucose under control, with HbA1c (the 3-month cumulative blood glucose level) below 7% [29]. Kavi et al. [30] developed a new prediction model using machine learning techniques. Its main objective is to classify diabetic patients into two classes: under control (HbA1c < 7%) and out of control (HbA1c > 7%). Moreover, diabetes mellitus is a chronic, debilitating disease that is associated with a range of severe complications [31].
There is a strong correlation between diabetes mellitus and its risk factors [32], notably kidney disease and blood pressure disease. Early detection and meticulous management to prevent complications are the major challenges of diabetic care [31]. Renal failure, or kidney disease, is a common risk factor correlated with the chronic microvascular complications of diabetes mellitus [33]. Chronic complications vary markedly between individuals but generally increase with the duration of diabetes.

• Chronic Kidney Disease:
Diabetic nephropathy is the leading cause of end-stage kidney failure in the world [34]. A diabetic person with persistent albumin loss in urine and progressive renal insufficiency, with or without hypertension, is said to have diabetic nephropathy [34]. Ideally, however, the diagnosis rests on documentation of diabetes-specific changes in kidney biopsy material. Based on the investigation ranges in Table 3, around 30% of people with Type 1 diabetes eventually develop diabetic nephropathy. Nephropathy is less common in Type 2 diabetes (15-20%) than in Type 1, but because far more people have Type 2 diabetes, the majority of patients with kidney failure have Type 2 diabetes [35].
Early changes may be asymptomatic but may later lead to kidney failure.

Table 3. Investigation ranges of chronic kidney disease.
Gender    Serum Creatinine (mg/dL)
Man       0.7-1.3
Woman     0.6-1.1

Hypertension, or high blood pressure, is another common risk factor that, together with diabetes mellitus, contributes to the development of heart disease, stroke, blindness, and kidney failure [36].
• Blood Pressure Disease: High blood pressure damages large and small blood vessels in the body. When high blood pressure coexists with high blood cholesterol levels or diabetes, the risk of heart attack or stroke increases many times [37]. Hypertension contributes to the progression of the chronic complications of diabetes. In patients with Type 1 diabetes, hypertension is often a manifestation of diabetic nephropathy. In patients with Type 2 diabetes, hypertension is often a part of metabolic syndrome. Most prospective studies of hypertensive diabetic persons have documented that reduction of blood pressure is the single most significant factor in reducing both renal disease progression and cardiovascular events [37]. Table 4 shows the investigation ranges of blood pressure disease. All these risk factors should be taken into account during the prevention and treatment of chronic complications of diabetes mellitus [29].
• Hearing Loss: Hearing loss is a common health problem that influences work efficiency, functional status, social communication, personal satisfaction, and quality of life [38]. It is a particular concern for ageing people with diabetes mellitus and for people who work in noisy environments. Diabetes mellitus and exposure to loud noise are well-known risk factors for hearing loss. Research suggests that patients with diabetes may experience more significant hearing loss than those without the disease [39]. A hearing threshold of up to 20 decibels is considered normal hearing. Hearing loss can be described according to severity, as shown in Table 5.
Md. Faisal Faruque et al., EAI Endorsed Transactions on Pervasive Health and Technology, 08 2019 - 05 2020 | Volume 5 | Issue 20 | e7
• Skin Problem: Skin problems differ significantly in symptoms and severity. They can be temporary or permanent and may be painless or painful. Some have situational causes, while others may be hereditary. Some skin conditions are minor and avoidable, while others can be dangerous [40]. Diabetes can affect the small blood vessels that supply the skin with blood. A skin problem can be the first visible sign that a person has diabetes. Diabetic dermopathy is caused by changes to the blood vessels due to diabetes. Dermopathy appears as scaly patches that are light brown or red, often occurring on the shins. The patches are sometimes called shin spots. A higher rate of this condition is found in people with retinopathy, neuropathy, or kidney disease [41]. The diagnostic ranges for diabetic skin problem patients are categorized in Table 6.

Methodology
The methodology comprises several stages: collecting the diabetes dataset with the relevant patient attributes, pre-processing the attributes, applying various machine learning classifiers, and computing the corresponding performance metrics on the data. Finally, we analyze the risk factors correlated with diabetes in the dataset. We discuss these phases briefly below.

A. Dataset and Attributes
The dataset was obtained from the diagnostic section of Medical Centre Chittagong (MCC), Bangladesh. For the study, we collect a diagnostic dataset of 200 patients consisting of 16 attributes, or risk factors, of diabetes mellitus. The attributes and their corresponding values are shown in Table 7.
The training data is categorized differently for diabetic and non-diabetic patients. For diabetic patients, we consider only the last medical test results before their diagnosis of diabetes. For non-diabetic patients, we consider all test results throughout their medical history. For repeated diagnostic tests, we consider only the results of the first test. The attributes of the dataset are obtained from clinical tests. The chosen clinical test results are relevant to the diagnosis and onset of Type 2 diabetes and are part of diabetes prevention trial studies.

B. Data Pre-processing
Data pre-processing is one of the most significant phases in the data mining process. It prepares and transforms the initial dataset. Raw data is generally incomplete and inconsistent and can produce misleading results, so pre-processing methods are applied to it before running an analysis. For example, the exact numeric value of an attribute is not in itself meaningful for predicting diabetes. This dataset has both nominal and real-valued attributes. We transform the numeric attribute values into nominal ones to use the data in a meaningful way. The patient's age is categorized into three groups: Young (10-25 years), Adult (26-50 years), and Old (above 50 years).
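The age discretization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `age_group` and the sample ages are hypothetical and are not taken from the study's dataset or tooling, and the handling of ages below 10 is our own choice since the text gives no category for them.

```python
def age_group(age: int) -> str:
    """Map a numeric age to the nominal categories given in the text."""
    if age < 10:
        # The text defines no category below 10 years (assumed behaviour).
        raise ValueError("ages below 10 are outside the ranges given in the text")
    if age <= 25:
        return "Young"   # 10-25 years
    if age <= 50:
        return "Adult"   # 26-50 years
    return "Old"         # above 50 years


# Hypothetical ages, used only to exercise the mapping.
ages = [18, 34, 50, 51, 72]
groups = [age_group(a) for a in ages]
print(groups)
```

The same pattern would apply to any other real-valued attribute once its cut-off points are known.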

C. Apply Machine Learning Techniques
Machine learning is an integral part of modern computer science. It comprises algorithms that learn from data: methods that recognize patterns in the data and use those patterns to make predictions. Machine learning (ML) provides various techniques, methods, and tools that help to solve diagnostic and prognostic problems in a variety of medical data. ML can be used to evaluate the importance of clinical parameters and of their combinations for prognosis (e.g., prediction of disease progression), to extract medical knowledge for outcome research, for therapy planning and support, and above all for patient and clinical management. ML is also used in the intensive care unit for intelligent alarming and for effective, efficient monitoring, for example to detect regularities in the data, deal with incomplete data, and interpret continuous data.
Machine learning has great potential to improve the effectiveness and accuracy of decisions drawn by intelligent computer programs. It mainly comprises concept learning and classification learning. Classification learning is the most widely used machine learning technique; it separates the data into different, non-overlapping segments. Classification is thus the process of finding a set of models that describe and distinguish the class labels of data objects.
Machine learning techniques also extract knowledge efficiently by constructing predictive models from diagnostic medical datasets collected from diabetic patients. Furthermore, predicting the disease earlier makes it possible to treat patients before their condition becomes critical. Machine learning therefore plays a more significant role in diabetes research now than ever before. We employ four popular machine learning classification techniques, namely Support Vector Machine (SVM), Naive Bayes (NB), K-Nearest Neighbour (KNN), and C4.5 Decision Tree (DT), on adult population data to predict diabetes mellitus. We briefly describe these machine learning algorithms below.

Support Vector Machine (SVM)
Support Vector Machine (SVM) is a powerful classification technique proposed by J. Platt et al. [42]. An SVM is a discriminative classifier formally defined by a separating hyperplane. It separates instances into the specified classes and can also classify instances that were not seen in the training data. SVM makes no assumption about the distribution of the data in each class. Two extensions of the algorithm are in use: regression analysis, which produces a linear function, and learning to rank elements, which produces a classification for individual elements.
SVM is one of the supervised learning techniques used in medical diagnosis for classification and regression [43,45]. SVM simultaneously minimizes the empirical classification error and maximizes the geometric margin, so SVMs are called maximum margin classifiers. SVM is a general algorithm based on the guaranteed risk bounds of statistical learning theory, also called the structural risk minimization principle. SVMs can efficiently perform non-linear classification using the kernel trick, which implicitly maps inputs into high-dimensional feature spaces without explicitly constructing those spaces.
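To make the kernel trick concrete, the following minimal sketch checks numerically that a degree-2 polynomial kernel equals the dot product of an explicit feature map. The choice of kernel, the helper names (`poly_kernel`, `feature_map`, `dot`), and the sample points are assumptions made for illustration; the paper does not state which kernel its SVM used.

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit 3-D feature map phi such that phi(x) . phi(y) = (x . y)^2."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical 2-D points, not taken from the patient dataset.
x, y = (1.0, 2.0), (3.0, -1.0)
implicit = poly_kernel(x, y)                     # computed in the input space
explicit = dot(feature_map(x), feature_map(y))   # computed in the feature space
print(implicit, explicit)
```

Both values agree up to floating-point rounding, yet the kernel never builds the feature vectors. That saving is what makes very high-dimensional feature spaces practical.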
An SVM model represents points in space, mapped so that the different categories are divided by a clear gap [43,44]. For instance, given a set of points belonging to one of two classes, an SVM finds a hyperplane that leaves the largest possible fraction of points of the same class on the same side. This separating hyperplane, known as the Optimal Separating Hyperplane (OSH), maximizes the distance between the two parallel hyperplanes and minimizes the risk of misclassifying instances of the test dataset.
When points overlap, an SVM still seeks the hyperplane that keeps as many points of the same class as possible on the same side while maximizing the margin. Figure 1 shows an SVM model built from a set of training data from the sample dataset of diabetic patients.

Naive Bayes Algorithm
Naive Bayes is a popular probabilistic classification technique proposed by John et al. [46]. Naive Bayes, which is based on Bayes' theorem, is a simple, effective, and commonly used machine learning classifier that produces probabilistic results by counting the frequencies and combinations of values in the dataset. In applying Bayes' theorem, it assumes that all attributes are independent given the value of the class variable. In real-world applications this conditional independence assumption rarely holds, yet the classifier often performs comparably to more sophisticated classifiers. Bayes' theorem, formula (1), is [46]:

$$P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)} \qquad (1)$$

Here, P(H|E) is the posterior probability, the probability that a hypothesis (H) is true given some evidence (E). P(H) is the prior probability, i.e., the probability of the hypothesis being true. P(E) is the probability of the predictor, irrespective of the hypothesis. P(E|H) is the probability of the evidence when the hypothesis is true.
In the Naive Bayes classifier, it is assumed that the input variables (features) are independent of each other and that each feature contributes individually to the probability of the target variable. So the presence of one feature does not affect the others, which is why the classifier is called naive. In real datasets, however, the feature variables usually depend on each other, which is one of the drawbacks of the Naive Bayes classifier. Nevertheless, Naive Bayes works very well on large datasets and sometimes performs better than more complicated classifiers. There are a few distinct types of Naive Bayes classifiers; this model uses the Gaussian Naive Bayes classifier, which assumes that the feature values are continuous and that the values belonging to each class are normally distributed [47].
To simplify the calculation of prior and posterior probabilities, we consider 14 training records from the 200 real diagnostic records, with four attributes of the diabetic patients: blood pressure, kidney disease, skin disease, and hearing loss. For a patient who suffers from blood pressure and kidney disease but not from skin disease or hearing loss, the Naive Bayes classifier gives a probability value of 0.48 for 'Yes', which is higher than the value of 0.11 for 'No'. The patient is therefore classified as diabetic. The Naive Bayes algorithm is used for binary and multiclass classification and can be trained on a small dataset, which is a major advantage. It is also fast and scalable. Moreover, it mitigates the curse of dimensionality to some degree. However, as mentioned before, it makes the unrealistic assumption that the input variables are independent of one another. This is not the case in real-life datasets, where there can be many complex relationships between the feature variables.
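The kind of calculation described above can be sketched as follows. The ten records below are hypothetical and do not reproduce the 14 training records or the 0.48 and 0.11 values reported in the text. The sketch uses a categorical Naive Bayes with Laplace smoothing, which is our own simplifying assumption and not the Gaussian variant named earlier, and the function names are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical records: (blood pressure, kidney disease, skin disease,
# hearing loss) -> diabetic?  1 = present, 0 = absent.
records = [
    ((1, 1, 0, 0), "Yes"), ((1, 1, 0, 1), "Yes"), ((1, 0, 0, 0), "Yes"),
    ((1, 1, 1, 0), "Yes"), ((0, 1, 0, 0), "Yes"),
    ((0, 0, 1, 1), "No"), ((0, 0, 0, 1), "No"), ((1, 0, 1, 0), "No"),
    ((0, 0, 1, 0), "No"), ((0, 1, 0, 1), "No"),
]


def train(data):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(label for _, label in data)
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in data:
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1
    return class_counts, feature_counts


def class_scores(query, class_counts, feature_counts):
    """Return the unnormalized posterior P(H) * prod_i P(E_i | H) per class."""
    total = sum(class_counts.values())
    result = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(H)
        for i, value in enumerate(query):
            # Laplace smoothing for binary features (an added assumption).
            score *= (feature_counts[label][i][value] + 1) / (n + 2)
        result[label] = score
    return result


class_counts, feature_counts = train(records)
# Patient with blood pressure and kidney disease, no skin disease or hearing loss.
scores = class_scores((1, 1, 0, 0), class_counts, feature_counts)
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The scores are left unnormalized because the denominator P(E) is the same for both classes and does not change which class wins.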

K-Nearest Neighbour Algorithm (KNN)
K-nearest neighbour is a simple, non-parametric classification and regression algorithm proposed by Aha et al. [48]. The algorithm stores all available instances and classifies new instances based on a similarity measure. To determine the distance from the point of interest to points in the training dataset, it uses a tree-like data structure. The value of k, the number of nearest neighbours, is always a positive integer. KNN is one of the most basic and straightforward data mining techniques. It is called memory-based classification, as the training examples must be in memory at run-time [49]. For continuous attributes, the difference between instances is calculated using the Euclidean distance. Equation (2) gives the Euclidean distance between two points x and y, where n is the number of attributes:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (2)$$

KNN generally deals with continuous attributes; however, it can also handle discrete attributes. Studies also show that KNN is among the most widely used data mining techniques for classification problems. Its simplicity and generally high convergence speed make it a popular choice. However, the main disadvantage of KNN classifiers is the large memory required to store the entire sample. When the sample is large, the response time on a sequential computer is also significant. Separating the training data into smaller subsets, building a model for each subset, and then applying voting to classify the testing data can enhance the classifier's performance. Figure 2 shows a simple KNN classifier (with k=1) applied to the dataset of 200 diabetic patients obtained from data preprocessing.
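A minimal sketch of KNN classification with the Euclidean distance of equation (2) is given below. It uses a brute-force search over all training points rather than a tree-like index, and the training points, labels, and function names are hypothetical choices for illustration only.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric points."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_predict(train, query, k=1):
    """Return the majority label among the k training points nearest to query."""
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-attribute training points with class labels.
train_points = [
    ((1.0, 1.0), "No"), ((1.5, 2.0), "No"),
    ((5.0, 5.0), "Yes"), ((6.0, 5.5), "Yes"),
]

print(knn_predict(train_points, (5.5, 5.0), k=1))
print(knn_predict(train_points, (1.2, 1.1), k=3))
```

Sorting the whole training set for each query is what makes the memory and response-time costs mentioned above grow with the sample size.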

Decision Tree
Decision Tree is one of the most popular classification techniques and is easy to understand for knowledge and information systems [50]. It is a learning system that uses a 'divide and conquer' procedure to classify instances. In most cases real data contain noise, and the divide and conquer approach to building trees tends to incorporate such inconsistencies into the classifier, which leads to lower prediction accuracy on unseen test data. Decision trees with pruning are suitable for overcoming such over-fitting [59]. Pruning also makes the tree easier to understand and faster to apply to test data. A Decision Tree is a robust classification technique for predicting diabetes. Most of the information is represented in a limited set of discrete areas, called the 'classification', and each discrete area of the specific domain is called a class. In a Decision Tree, each internal node is labeled with an input feature, and each leaf node is labeled with a value of the class attribute, i.e., the target value. At each node of the tree, the information gain of all the attributes is evaluated and the attribute with the highest gain is selected. Numerous popular decision tree algorithms are available to classify diabetes data, including ID3, J48, C4.5, C5, CHAID, and CART. In this work, the C4.5 Decision Tree algorithm has been chosen for the performance analysis of the diabetes data. C4.5, proposed by Ross Quinlan et al. [51], extends the ID3 Decision Tree algorithm. This learning method can be used on diagnostic medical data to predict the value of the decision attribute based on information gain. C4.5 evaluates and selects the attribute that splits the data into subsets enriched in one class in each branch of the decision tree.
At each step, the attribute with the highest normalized information gain is picked to construct the tree. As a result, the significant risk factors among the diabetes mellitus attributes are arranged from the root node downwards to the child nodes, with comparatively weaker attributes lower in the tree. The tree structure is constructed in this way.
According to Figure 4, after defining the problem, we collect the relevant data from the Diagnostic Data Storage. We then pre-process the data to build the prediction model. After that, various machine learning techniques are applied to the training dataset. Finally, the test dataset is used to measure the performance of the techniques and to choose the best classifier for predicting diabetes mellitus.
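The normalized information gain that C4.5 uses to rank attributes is commonly known as the gain ratio: the information gain of a split divided by the entropy of the split itself. The sketch below computes it for nominal attributes. The attribute values, labels, and function names are hypothetical, and the sketch omits the other parts of C4.5 such as handling of continuous attributes, missing values, and pruning.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, normalized by split entropy."""
    total = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical nominal attributes for six patients.
diabetic = ["Yes", "Yes", "Yes", "No", "No", "No"]
hypertension = ["Y", "Y", "Y", "N", "N", "N"]   # separates the classes perfectly
sex = ["M", "F", "M", "F", "M", "F"]            # nearly uninformative

print(gain_ratio(hypertension, diabetic), gain_ratio(sex, diabetic))
```

In this toy example the hypertension attribute would be placed nearer the root than sex, because its gain ratio is higher.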

D. Risk Factors Correlation
The progression of diabetes mellitus (DM) is strongly correlated with several complications, chief among them chronic kidney disease and blood pressure disease. It is well known that DM covers a wide range of pathophysiological conditions. The most common complications are divided into microvascular and macrovascular disorders, including diabetic nephropathy, retinopathy, neuropathy, and cardiovascular disease. Because DM increases mortality and morbidity, its related complications need to be prevented. It is therefore essential to eliminate the risk factors related to long-term diabetes complications so that longevity can be increased. A correlation measures the relationship between two or more variables, here the risk factors of DM. To evaluate the correlation between different risk factors of DM, statistical correlation [5] can be applied to the dataset attributes that we have considered for chronic kidney disease, blood pressure, hearing loss, and skin problems against diabetes.
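As an illustration of the correlation measure, the sketch below computes the Pearson correlation coefficient, which is the coefficient that Excel's CORREL function returns. The binary indicator values are hypothetical and are not drawn from the study's dataset, so the resulting number has no bearing on the correlations reported in this paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical indicators for eight patients: 1 = condition present.
diabetic = [1, 1, 1, 0, 0, 1, 0, 0]
hypertension = [1, 1, 0, 0, 0, 1, 0, 1]

r = pearson(diabetic, hypertension)
print(r)
```

A value near +1 indicates that the two conditions tend to occur together, a value near -1 that one tends to occur without the other, and a value near 0 that there is no linear relationship.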

• Correlation between diabetes and Kidney disease:
One cause of chronic kidney disease is diabetes mellitus, which is characterized by high blood glucose (sugar) levels. Over time, the high amounts of sugar in the blood vessels harm millions of small filtering units in the kidney, which eventually leads to kidney failure. Approximately 20 to 30 percent of people with diabetes develop kidney disease (diabetic nephropathy), although not all of them progress to kidney failure. A person with diabetes is at risk of nephropathy whether or not they depend on insulin. The risk is correlated with the length of time the person has had diabetes. There is no cure for diabetic nephropathy, so treatment is life-long. People with diabetes are also at risk of other kidney problems, including narrowing of the arteries to the kidneys, called renal artery stenosis or renovascular disease.
To build a correlation between chronic kidney disease and diabetes, the attributes that we have considered are age, sex, blood pressure, itching, vomiting, trouble sleeping, chest pain, smoking, heart disease, loss of appetite, excessive urination, breathing problems, and family history. The sample clinical dataset of diabetic nephropathy patients is shown in Table 9.
Table 9. Sample dataset of diabetic nephropathy patients.

• Correlation between diabetes and blood pressure disease:
Many people with diabetes mellitus also have hypertension, or blood pressure disease. Blood pressure disease is known as a "silent killer" because it often has no clear symptoms, and many people are unaware that they have it. In a 2013 review [36], the American Diabetes Association (ADA) found that the combination of hypertension and diabetes mellitus is particularly deadly and can significantly raise the risk of having a heart attack or stroke. A person with diabetes must keep their blood pressure under control. To build a correlation between blood pressure disease and diabetes in our comparative study, the attributes that we have considered are age, sex, occupation, smoking, blood pressure (both systolic and diastolic), pulse rate, drinking, family member, salt in diet, murmur, and cholesterol. The sample clinical dataset of diabetic blood pressure patients is shown in Table 10.
Table 10. Sample dataset of diabetic blood pressure patients.

• Correlation between diabetes and hearing loss disease:
Diabetes is associated with a risk of hearing loss. Type 2 diabetes may be an independent risk factor for hearing loss, because the high blood sugar of hyperglycemia may damage the cochlea. Signs and symptoms that commonly occur in Type 2 diabetes can be related to the immediate effects of hyperglycemia or hypoglycemia (blurred vision and excessive thirst, for example). Many patients may not realize the relation between their hearing impairment and their diabetes. According to the National Institutes of Health [39], hearing loss is common in people with diabetes. To build a correlation between hearing loss and diabetes, the attributes that we have considered are age, sex, weight, diet, polyuria, water consumption, excessive thirst, blood pressure, hypertension, tiredness, vision problems, kidney problems, hearing loss, skin problems, genetic factors, and diabetic status. The sample clinical dataset of diabetic hearing loss patients is shown in Table 11.
 Correlation between diabetes and skin problem disease:
Long-term Type 2 diabetes with hyperglycemia (high blood glucose) tends to be associated with poor circulation, which decreases blood flow to the skin. It can also affect blood vessels and nerves. The capacity of white blood cells to fend off infections is also decreased when blood sugar is elevated. Diminished blood circulation can prompt changes in the skin's collagen, which alter the skin's texture, appearance, and ability to heal. Reduced sweating may cause dry skin in the feet and legs [52].
To build a correlation between skin problems and diabetes, the attributes that we have considered for skin problem patients are age, sex, weight, diet, polyuria, water consumption, excessive thirst, blood pressure, hypertension, tiredness, the problem in vision, kidney problem, hearing loss, skin problem, genetic and diabetic. The sample clinical dataset of diabetic skin problem patients is shown in Table 12.

Experimental Results and Discussion
After implementing the different machine learning models, the next step is to measure the performance of the classification techniques. This is done by running each model on data held out from training, using N-fold cross-validation:
a) The dataset is separated into N folds, where each fold serves in turn as testing data while the remaining folds serve as training data.
b) The procedure repeats N times, until every fold has been used once for testing.
c) We set N = 10, partitioning the data into 10 folds of nearly equal size.
d) In each iteration, nine folds form the training set used to fit the model, and the remaining fold, the testing set, is used to evaluate performance.
e) In the end, the learning scheme has been run 10 times on different training sets, and the prediction accuracy is averaged over the 10 runs.
The performance metrics, namely precision, recall, F-measure, and accuracy, are described in the following subsection.
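The fold procedure in steps a) to e) can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not the implementation used in the study: the helper names `k_fold_indices` and `cross_validate`, the fixed shuffle seed, and the majority-class stand-in classifier are all hypothetical, and any of the four classifiers could be plugged in through `train_fn` and `predict_fn`.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle the sample indices and split them into n_folds nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th index, so fold sizes differ by at most one.
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validate(features, labels, train_fn, predict_fn, n_folds=10, seed=0):
    """Return the accuracy averaged over n_folds train/test rounds."""
    folds = k_fold_indices(len(labels), n_folds, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        # Fit on the remaining folds, then score on the held-out fold.
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, features[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Hypothetical stand-in classifier: always predicts the majority training label.
def train_majority(train_features, train_labels):
    return max(set(train_labels), key=train_labels.count)


def predict_majority(model, sample):
    return model
```

With 200 samples and 10 folds, each round trains on 180 samples and tests on the remaining 20.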

A. Evaluation Metric
If TP denotes the number of true positives and FP the number of false positives, then according to [53] the formal definition of precision is given in equation (3):
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (3)$$
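As a hedged illustration, the four metrics named in this section can be computed from confusion-matrix counts as follows. The definitions are the standard ones. The symbols FN (false negatives) and TN (true negatives) and the function name `classification_metrics` are assumptions introduced here, since only TP and FP appear in the surviving text.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F-measure and accuracy from raw counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives and true negatives. Undefined ratios are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }
```

The function takes plain counts, so it can be applied to the confusion matrix of any of the four classifiers.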

B. Comparison Results
The performance of the different machine learning techniques, namely SVM, NB, KNN, and C4.5, is shown in Table 13 in terms of precision, recall, and F-measure. Information gain lets the tree be constructed from the attribute with the highest gain down to the lowest, which may explain why C4.5 achieves better results than the other classifiers in predicting diabetes mellitus. According to Figure 6, C4.5 achieves 72% precision, 74% recall, and 72% F-measure on this dataset, which is higher than the other learning techniques. This experimental result provides evidence that the C4.5 Decision Tree performs well on medical datasets for predicting diabetes mellitus based on the various risk factors discussed in the earlier section. In addition to precision, recall, and F-measure, we also calculate the accuracy of all these classifiers as a percentage, shown in Figure 7. If we observe Figure 5, we see that the C4.5 decision tree technique outperforms the other techniques in predicting diabetes mellitus. Based on Figure 6, it can be observed that the C4.5 Decision Tree achieves the best accuracy, 73.5%, in predicting diabetes mellitus on the given medical dataset.
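To make the information gain criterion mentioned above concrete, the sketch below computes the entropy of a label set and the gain obtained by splitting on one categorical attribute. It is only an illustration: the function names are hypothetical, and C4.5 as described by Quinlan normalises this quantity into a gain ratio, a step that is omitted here because the text refers to plain information gain.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Reduction in label entropy after splitting on a categorical attribute."""
    total = len(labels)
    # Group the class labels by the attribute value of each sample.
    partitions = {}
    for value, label in zip(attribute_values, labels):
        partitions.setdefault(value, []).append(label)
    # Weighted average entropy of the partitions produced by the split.
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder
```

An attribute that separates the classes perfectly yields a gain equal to the entropy of the labels, while an attribute unrelated to the class yields a gain of zero, so a tree builder would place the former nearer the root.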
The areas under the Receiver Operating Characteristic (ROC) curve for the SVM, NB, KNN, and C4.5 Decision Tree algorithms are 0.56, 0.65, 0.67, and 0.69 respectively, as shown in Table 14. In Figure 7, the confidence band of the curve is shown more clearly for the C4.5 Decision Tree than for the other techniques. The experimental results show that, for the diabetic dataset, the C4.5 algorithm with its information gain criterion achieves the highest area under the ROC curve of the four learning techniques, 0.69.
To determine the correlation between different risk factors of diabetes mellitus, we collect a dataset consisting of various attributes or risk factors of kidney disease and diabetes mellitus for 200 diabetic patients. In the diagnostic dataset results, blood glucose levels (after the meal) are significantly increased for diabetic patients. Serum creatinine levels are observed to be significantly low in nondiabetic patients and high in diabetic patients. A positive correlation (0.72) is found between serum creatinine and blood glucose level for diabetic patients, which is shown in Figure 8. Diabetes mellitus and blood pressure frequently coexist.
Levels of blood glucose (before and after the meal) and blood pressure (systole/diastole) have recently been reported as disease markers for diabetes and hypertension, respectively. This part of the study aims to find the correlation between diabetes mellitus and blood pressure. An equal number of people of different ages and sexes are selected for testing. Their blood glucose (before and after the meal), serum creatinine, pulse rate, and cholesterol levels are measured by spectrophotometer. The dataset attributes are correlated by statistical methods. Notably, we see that blood glucose (before and after the meal) levels, as well as blood pressure (systole/diastole) levels, are significantly high for diabetic hypertensive patients. A significant positive correlation (0.81) is found between blood glucose levels and BP levels in diabetic hypertensive patients, shown in Figure 9. These findings suggest that the combination of hypertension and diabetes can be deadly, and together they can increase the risk of a heart attack or stroke.
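The text does not state which correlation coefficient is used. Assuming the common Pearson coefficient, a value such as the 0.81 reported above could be obtained along the lines of the following sketch. The function name `pearson_r` is hypothetical, and the inputs would be two equally long lists of measurements, for example blood glucose and systolic blood pressure per patient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

The coefficient ranges from -1 to 1. Its sign gives the direction of the relationship and its absolute value the strength, so 0.81 and -0.76 would both indicate strong linear relationships, in opposite directions.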

Figure 9. Correlation between blood pressure and diabetic patients
We have analyzed the correlation between diabetes mellitus and hearing loss. A negative correlation (-0.72) is found between blood glucose levels and hearing loss in diabetic patients, shown in Figure 10. These results suggest that hearing loss and diabetes mellitus are inversely correlated in this dataset.

Figure 10. Correlation between hearing loss and diabetic patients
We also evaluate the correlation between diabetes mellitus and skin problems. A negative correlation (-0.76) is found between blood glucose levels and skin problems in diabetic patients, shown in Figure 11. These results suggest that skin problems and diabetes mellitus are likewise inversely correlated in this dataset.

Figure 11. Correlation between skin problem and diabetic patients
In the future, we can collect more data and make decisions based on their correlation with other diseases, separately for males and females, by considering the concept of recent pattern analysis [57] for building more effective models.

Conclusion
In this work, we have explained how machine learning can be adopted in clinical diagnostics to predict the probability of diabetes-induced complications, using different machine learning algorithms under various circumstances. Knowledge extraction from a real health care dataset can be useful for predicting diabetes mellitus. To predict diabetes mellitus effectively, we have performed our experiments on the adult population using four popular machine learning algorithms: Support Vector Machine (SVM), Naive Bayes (NB), K-Nearest Neighbour (KNN), and C4.5 Decision Tree. From the experimental results, we conclude that the C4.5 Decision Tree outperforms the other machine learning techniques on the diabetes data. We also find a positive correlation in predicting kidney complications (Nephropathy) and blood pressure (Hypertension) complications, and a negative correlation in predicting hearing loss and skin complications (diabetes dermopathy) in diabetic patients. For the study, we collected a diagnostic dataset of 200 diabetic patients with 16 attributes. The experimental results can assist health care centres in making better clinical decisions on diabetes. They can also help individuals to control diabetes.