Risk Assessment of Type 2 Diabetes Mellitus Prediction using an Improved Combination of NELM-PSO

Risk assessment of Type-II diabetes is crucial for preventing it and for reducing the risk of various comorbidities. Many existing machine learning models predict Type-II diabetes in the short-term future or over an unspecified horizon. However, obtaining a model with optimal performance and predicting diabetes risk in the long-term future remain the main problems. This work handles these problems by proposing a stacking-based integrated KELM model to predict the risk of Type-II diabetes for a person within five years after assessment. The Pima Indian Diabetic Dataset (PIDD) and a Diabetic Research Center dataset are used in this study. Min-Max normalization is used to pre-process the noisy datasets. The HAFPSO algorithm used in this work explores the best combination of base learners by increasing the Classification Accuracy (CA) and decreasing the kernel complexity of the optimal learners. Finally, the model is integrated by utilizing the KELM as a meta-classifier that combines the predictions of the twenty base learners. The proposed method is assessed with different measures such as accuracy, sensitivity, specificity, Matthews Correlation Coefficient, and Kappa statistic. The proposed KELM-HAFPSO approach obtained better values of the considered metrics, confirming its effectiveness in identifying Type-II diabetes. The proposed method helps clinicians identify patients who are at high risk of Type-II diabetes in the future, with a highest accuracy of 98.5%. The results obtained show that the KELM-HAFPSO approach is a promising new tool for identifying Type-II diabetes.


Introduction
Diabetes is a chronic disease affecting the world's general population. A diabetic patient has elevated blood glucose levels and is also at risk of developing life-threatening medical conditions in the future. Research [1] projects that 693 million people will be affected by diabetes by 2045. Diabetes is classified into Type-I, Type-II, Gestational Diabetes, Variation Diabetes, Maturity Onset Diabetes of the Young (MODY), etc. Although a variety of variants have been identified, the most common are Type-I and Type-II. Type-I diabetes occurs when the immune system attacks the beta cells that produce insulin. It occurs in the early stages of life and also affects the later stages. Type-II diabetes normally affects people in middle age and makes their bodies insulin resistant. Insulin resistance is a condition that prevents the body's cells from absorbing glucose. People who follow a poor diet and are physically inactive [2], [3] tend to develop Type-II diabetes more often than those who follow a healthy diet and exercise routine. The complications of Type-II diabetes include renal failure, blindness, bleeding disorder, hypertension, impaired wound healing, heart disease, stroke, and neurodegeneration [4].
Early prediction of type-II diabetes is essential for reducing the risk and delaying chronic complications throughout a person's lifetime. The assessment of risk factors for type-II diabetes assists in the monitoring of undiagnosed patients at risk. The identified factors linked to an increased risk of Type II diabetes [5] include adults with a BMI ≥25 kg/m², overweight women planning for pregnancy, individuals with a family history of diabetes, signs of insulin resistance, physical inactivity, etc. Glucose concentration in the blood plays a significant role in predicting diabetes. Diabetes is indicated when the Fasting Plasma Glucose (FPG) of the individual reaches 126 mg/dl or more. The FPG level is tested after eight to twelve hours of fasting, and the Postprandial Blood Glucose (PPG) level is tested after a meal. A two-hour plasma glucose level of 140 mg/dl or more indicates that patients have a greater potential for Type II diabetes [6]. A PPG value ≥160 mg/dl indicates the possibility of Type II diabetes and also cardiovascular disease. FPG has a higher degree of specificity than PPG in predicting diabetes. Glycosylated Haemoglobin or Haemoglobin A1C (HbA1c) [7] is used to evaluate the sugar bound to haemoglobin in the red blood cells. An HbA1c value ≥6.5% indicates Type-II diabetes. In recent years, machine-learning algorithms have addressed the possibility of identifying Type-II diabetes at an earlier stage and increasing classification accuracy [8]. A wide range of Machine Learning (ML) algorithms [9], [10] has been proposed for diagnosing Type-II diabetes. Machine learning leads to predictive modelling, an approach that develops a mathematical model to offer accurate predictions [11]. The vast amount of healthcare data generated daily worldwide makes this process achievable. Clinical data analysis offers better healthcare solutions to patients and also aids in financial and operational improvements. Integrating ML algorithms and data systems makes it possible to predict Type-II diabetes earlier.
M.R. Islam et al. [12] observed that social networks are repositories of communication between interested people and of communication attributes such as transferred images and videos and shared feelings and sentiments. They exploited this repository to investigate people's moods and attitudes and thereby to detect depression. They collected Facebook data and applied ML techniques including Decision Trees. After evaluating the considered algorithms, the Decision Tree was found to be more effective and accurate than the other machine learning algorithms for the detection of depression. S.S. Reddy et al. [13] discussed an important problem of diabetic patients, namely diabetic retinopathy. The main thrust of their paper is to develop an effective model for diabetic retinopathy prediction. Different features were considered from the selected UCI data set. They then developed different ML models, including a Support Vector Machine (SVM) with a Gaussian kernel, and applied them to the data set. The proposed SVM algorithm was evaluated along with the other considered algorithms using novel performance measures and was found to perform better than the other approaches. R. Sarki et al. [14] emphasized that diabetes is a major risk and that diabetes patients have a high probability of developing different eye complications. They highlighted different research works on the detection of diabetes-related eye ailments. The works they studied cover dataset resources and different detection methods, including image processing and machine learning. Their work also includes a study of suitable performance metrics for the detection of diabetes-related eye ailments.
The remainder of this paper is structured as follows. Section 2 presents related research work; Section 3 introduces the problem statement and new models of the current prediction framework; Section 4 demonstrates the formulation of the HAFPSO algorithm; Section 5 describes the proposed KELM methodology; Section 6 explains our study's methodological setup; Section 7 describes the dataset and various important aspects used in this study for experimental evaluation; Section 8 presents the experimental results and discussion; and Section 9 concludes this paper.

Related work
In recent years, a wide range of clinical studies has focused on predicting type-II diabetes in order to evaluate the associated risk factors effectively. R. DelshiHowsalya Devi et al. [15] used the Farthest First Clustering (FFC), SVM, and Sequential Minimal Optimization (SMO) algorithms to present a hybrid approach for Type II diabetes prediction. Bum Ju Lee et al. [16] argued that the hypertriglyceridemic waist (HW) and waist circumference (WC) are the risk factors most highly associated with type II diabetes. Triglyceride (TG) is not considered a strong factor when compared to these. Hang Lai et al. [17] developed an interactive computer program to help doctors predict the risk of diabetes in their patients and provide preventive measures. They suggested that their proposed model performs better in detecting Type-II diabetes than models using Random Forest and Decision Tree. Karim M. Orabi et al. [18] designed a predictive system for Type-II diabetes that predicts the age at which a person is prone to become diabetic by using a regression technique and random code mechanisms. However, the researchers did not consider the crucial risk factors associated with Type-II diabetes, and therefore the model suffered from low precision values. Namrata Singh et al. [19] developed a hybrid model utilizing an ensemble-based approach (XGBoost) to extract rules from an SVM to diagnose hypertension among diabetic patients. However, they did not handle the class imbalance problem present in the dataset. Researchers have integrated several ML techniques to predict diseases related to Type-II diabetes along with other diseases in healthcare [20]. Xiao-lu XIONG et al. [21] conducted a cross-sectional retrospective study in Chinese adults using different ML algorithms (AdaBoost (AD), Multilayer Perceptron (MLP), SVM, Trees Random Forest (TRF), and Gradient Tree Boosting (GTB)) to choose the most appropriate technique for the prediction of critical risk factors present in Type-II diabetic patients. This study evaluated eleven risk factors that are closely related to this disease. Bassam Farran et al. [22] combined four different ML techniques (Multifactor Dimensionality Reduction (MDR), k-nearest neighbours (k-NN), conventional Logistic Regression (LR), and SVM) to create predictive models for evaluating the likelihood of hypertension, type-II diabetes, and comorbidity using the Kuwaiti national health dataset. This work builds a predictive model using non-intrusive data to identify diabetic and hypertension patients at high risk. Here the risk factors associated with Type-II diabetes are BMI, ethnicity, and family history of diabetes. Shiva Shankar et al. [23] highlighted the importance of diabetes prediction. They considered the problem faced by doctors and hospital authorities of predicting the probability of a diabetic patient's early readmission to hospital with diabetic problems and complications. To this end, they considered appropriate ML methods and took a dataset covering over a hundred hospitals from the UCI repository. They applied the ML methods to the data set considering 46 features and then evaluated them based on different evaluation metrics. The Gradient Boosting method was found to be the most effective among all the algorithms they considered for the prediction. Reddy Shiva Shankar et al. [24] concentrated on detecting different kinds of diabetes, including type 1 and type 2.
They proposed a novel method for the detection using data mining algorithms. They applied different mining methods, including Naïve Bayes (NB) and SVM, to medical data of hospitalized patients. An expert voting scheme was also used to select the best algorithm for each record. The scheme proposed in this work was compared with its counterparts using tenfold cross-validation. It was found that the proposed scheme obtained good accuracy in detecting a particular type of diabetes for the considered dataset. R Shiva Shankar et al. [25] discussed different side effects of diabetes, especially diabetic nephropathy. The main aim of this work is to predict diabetic nephropathy with an effective deep learning method. They implemented different deep learning methods on a dataset consisting of 84 records with 21 features and evaluated the results. They then compared and contrasted the algorithms and concluded that the deep belief network is the most effective among the algorithms considered for the prediction of diabetic nephropathy. Bassam Farran et al. [26] proposed a prognostic model to predict future risk for Type II diabetes (within 3, 5 and 7 years) using LR, k-NN, and SVM from Kuwait health network data. They demonstrated that this model could identify subjects at higher risk for Type-II diabetes and provide a prognosis at an early stage. In an earlier advanced study, A. Sheik Abdullah et al. [27] used an enhanced combination of PSO and Decision Trees to examine the risk factors correlated with Type II diabetes. This study evaluated the risk factors associated with diabetes using a mathematical model named Fisher's Linear Discriminant Analysis (FLDA) for the discovered attributes. Han Wu et al. [28] used K-Means and LR algorithms to implement a data mining technique for the prediction of Type 2 diabetes mellitus. This model suffers from high time complexity in the pre-processing phase. Hamid R. Marateb et al. [29] applied an Expert-Based Fuzzy Micro Albuminuria (EBFMA) classifier, PSO, and Multiple LR techniques to identify MA in type-II diabetes patients without measuring urinary albumin. The limitations of this study are its small sample size and cross-sectional nature; the authors did not consider a larger sample for detailed investigation. Table 1 provides a summary of different methods proposed for detecting Type-II diabetes and its associated symptoms. Reddy Shiva Shankar et al. [30] reiterated that disease prediction, in particular diabetes prediction, using data mining and ML is a topic of research interest. They considered the problem of predicting the early readmission to hospital of patients suffering from diabetes and its complications. Their paper handles the problem by predicting the readmission time with Deep Belief Networks (DBN). They applied different algorithms, including DBN, to the Pima Indian Diabetes Dataset using R.
After evaluating and comparing the considered algorithms, they found DBN to be the optimal method for predicting a patient's hospital readmission.
The present study proposes an automatic Type-II diagnosis system utilizing KELM and a hybrid PSO-AFSO optimization algorithm, which can reveal the significant factors associated with Type-II diabetes. In the first phase, the input training dataset is divided into five sub-samples by a fivefold cross-validation approach where 90% of the input are used for training and 10% are used to test the model. In the second phase, a hybrid PSO-AFSO (HAFPSO) optimization algorithm is applied to improve the accuracy and optimize the base learners. In the final phase, the proposed KELM framework is deployed to automatically select the appropriate classifier based on the derived features. The chosen classifier determines whether or not a person is at risk of developing Type II diabetes. The experiments were conducted on the Pima Indian Diabetes Dataset (PIDD) and a Diabetes Research Centre dataset. Table 1 illustrates that most of the algorithms achieve less than 90% accuracy, while some of the works based on Particle Swarm Optimization achieve more than 90%. This observation motivates the proposal in this work of a PSO-based technique that could obtain better results.
The key goals of this research work are summed up as follows:
i. Utilizing two datasets (PIDD and Diabetes Research Centre) to classify important risk factors associated with Type II diabetes.
ii. Formation of the KELM-HAFPSO stacking method to assess Type II diabetes accurately.
iii. Analyzing the solution for the multi-objective selection problem using the HAFPSO algorithm.
In the problem formulation, PC and NC denote the actual positive and negative classes respectively. The exact label value, which indicates the actual presence of the disease, is denoted by ELj, and the label predicted by the proposed model is denoted by PLj. At any moment, the sum of the four random reference variables is equal to one. NCS is the number of classifiers selected by the HAFPSO stacking approach and TNC denotes the total number of classifiers. The main aim of this paper is to find the optimum number of base classifiers required to obtain the maximum CA value. NBC represents the kernel complexity for the optimum learners chosen.
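The definitions above are consistent with the following formulation. It is offered as a hedged reading of the surviving symbol definitions, not as the authors' exact equations: it assumes that CA is the fraction of samples whose predicted label PLj matches the exact label ELj, that NBC is the ratio of selected to available classifiers, and it introduces $N$ (not in the original text) for the number of evaluated samples.

```latex
\begin{align}
CA  &= \frac{1}{N}\sum_{j=1}^{N} \mathbb{1}\left[\,PL_j = EL_j\,\right], \\
NBC &= \frac{NCS}{TNC}, \\
\text{objectives:}\quad & \max\; CA, \qquad \min\; NBC .
\end{align}
```

Under this reading, a solution that selects 5 of 20 base classifiers has $NBC = 0.25$, and the search prefers it over a 10-classifier solution with the same CA.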

Formulation of HAFPSO
Swarm Intelligence is a collection of algorithms inspired by nature's way of solving problems. The emergent complexity of the diabetes prediction model can be handled by using swarm intelligence.

PSO for Global search
The PSO algorithm [20], [31], [32] is inspired by the social behavior of bird flocking, fish schooling and swarm theory, and solves continuous optimization problems. In addition, it provides fast convergence towards an optimal solution. The particles can be any entities that fly through the multi-dimensional search space to seek an optimal solution for the given problem. In the initialization phase, they are allotted a random initial position and velocity. The position of a particle represents a solution, whose quality is given by the value of the objective function. The particles memorize the best position they have found in the search space. Velocity is a weighted sum of three components, namely the old velocity, the velocity towards the previous best solution, and the velocity towards the neighbor's best solution.
The PSO swarm consists of a set of particles, also known as the initial population, $P = \{p_1, p_2, \dots, p_n\}$. Each candidate solution, represented by the position of a particle, is evaluated by the fitness function f. At any time step, each particle has a position and a velocity; the velocities are initialized to zero or to minimal values to prevent the particles from exiting the search space in the first iteration. When the algorithm enters its main loop, the velocity and the position of the particles are iteratively updated until convergence. The rules used to update the velocity and position are described below.
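$v_i(t+1) = w\,v_i(t) + c_1 r_1 \left(pbest_i - x_i(t)\right) + c_2 r_2 \left(gbest - x_i(t)\right)$ (13)
$x_i(t+1) = x_i(t) + v_i(t+1)$ (14)
where $x_i(t)$ and $v_i(t)$ are the position and velocity of particle $i$ at iteration $t$, $w$ represents the inertial weight, $c_1$ and $c_2$ are the acceleration constants, $r_1$ and $r_2$ are random numbers in [0, 1], $pbest_i$ is the Local-Best-Position of particle $i$, and $gbest$ is the Global-Best-Position. The values of $w$, $c_1$ and $c_2$ should be chosen appropriately to prevent the velocity from growing without bound. The acceleration constants usually take values between zero and four. The PSO algorithm pseudocode is shown in Algorithm 1. The velocity update rule follows three characteristics that generate the local behavior of the particles:
• Inertia: It helps the particle keep track of the previous flight direction and prevents it from drastically changing direction (velocity).
• Previous best: It draws the particle towards the best position it has found so far.
• Neighbor's best: It draws the particle towards the best position found by its neighbors.
Algorithm 1: PSO
Begin
  Initialize the position and velocity of each particle; t=0
  While the convergence condition is not met
    Choose the particle with the best fitness value of all as Global-Best-Position
    For each particle
      Compute the velocity of the particle using equation (13)
      Update the position of the particle using equation (14)
    End for
    t=t+1
  End while
End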
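The update rules (13) and (14) and the loop of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it minimises a toy `sphere` function rather than maximising CA, and the parameter values (`w=0.7`, `c1=c2=1.5`), the bounds, and all function names are hypothetical choices made for the example.

```python
import random

def pso(fitness, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=0):
    """Minimise `fitness` with a basic global-best PSO (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]  # zero initial velocity
    pbest = [p[:] for p in pos]                      # Local-Best-Position
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # Global-Best-Position

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Equation (13): inertia + previous-best + neighbour-best terms
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Equation (14), clipped so particles stay in the search space
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)

best, best_val = pso(sphere, dim=2)
```

For the diabetes problem, `fitness` would instead score a candidate set of base learners on the validation data, and the comparison signs would flip because CA is maximised.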

AFSO for local search
The AFSO algorithm mathematically models the collective movement and social behavior of fish [33]. The algorithm is highly convergent, fast, versatile and accurate. The AFSO algorithm imitates the behavior of the fish (preying, swarming, and local search) to reach the global optimum value. The environment where the Artificial fish (Af) lives is the solution space, and it also contains the states of every Af present. The next behavior of an Af relies on its current state as well as the state of its local environment. First, the algorithm generates potential solutions randomly, and then it performs the search to find the optimal solution. The Af analyzes the environment around it by using its vision.
The current state of the Af is represented as $C_s$, the visual distance as $V_d$, and the visual position as $P_v$. The Af moves to its next state ($N_{state}$) if it finds the state at $P_v$ better than its $C_s$. Let $C_s = (c_1, c_2, \dots, c_n)$ and $P_v = (p_1^v, p_2^v, \dots, p_n^v)$; then
$p_i^v = c_i + V_d \cdot rand()$
$N_{state} = C_s + \frac{P_v - C_s}{\lVert P_v - C_s \rVert} \cdot step \cdot rand()$
where rand() generates a random number between 0 and 1, n is the number of variables used, and step is the step length. The AFSO algorithm includes three behaviors, namely Af-prey, Af-swarm, and Af-follow.
a) Af-prey: This function is based on the biological behavior of the Af chasing its prey (food). The state selected at random within its visual distance is represented as $C_v = C_s + V_d \cdot rand()$, and $F$ represents the prey concentration value (objective function/fitness value). If $F(C_v) > F(C_s)$, the Af moves a step towards $C_v$. When the value of $V_d$ increases, the Af finds its global extreme value rapidly and converges sooner.
b) Af-swarm: The Af always moves in swarms (groups) to exist in its colony and avoid potential threats. Let $C_{centre}$ be the center position, $n_c$ be the number of companions, $\delta$ be the crowd factor, and $n$ be the total number of Af. If $F(C_{centre}) > F(C_s)$ and $n_c / n < \delta$, this indicates that the companion center has a major amount of food (higher fitness value) and is less crowded, so the Af moves towards the companion center.
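The Af-prey and Af-swarm behaviors can be sketched as follows. This is a hedged illustration that follows the common artificial fish swarm formulation rather than the paper's exact equations, which did not survive in the text: the function names, the number of prey attempts (`tries`), the symmetric random offsets, and the toy fitness `F` are all assumptions.

```python
import math
import random

def step_towards(cs, target, step, rng):
    """Move from state cs towards target by a random fraction of `step`."""
    dist = math.dist(cs, target)
    if dist == 0.0:
        return cs[:]
    r = rng.random()
    return [c + (t - c) / dist * step * r for c, t in zip(cs, target)]

def af_prey(cs, F, vd, step, rng, tries=5):
    """Af-prey: look at random states within the visual distance vd and
    step towards the first one with a higher fitness."""
    for _ in range(tries):
        cv = [c + vd * rng.uniform(-1.0, 1.0) for c in cs]
        if F(cv) > F(cs):
            return step_towards(cs, cv, step, rng)
    # No better state was seen: take a random step instead.
    return [c + step * rng.uniform(-1.0, 1.0) for c in cs]

def af_swarm(cs, others, F, vd, step, delta, rng):
    """Af-swarm: move towards the companions' centre if it has a higher
    fitness and is not too crowded; otherwise fall back to Af-prey."""
    companions = [o for o in others if math.dist(cs, o) < vd]
    nc, n = len(companions), len(others) + 1
    if nc > 0:
        centre = [sum(col) / nc for col in zip(*companions)]
        if F(centre) > F(cs) and nc / n < delta:
            return step_towards(cs, centre, step, rng)
    return af_prey(cs, F, vd, step, rng)

# Toy usage: fitness is highest at the origin.
F = lambda x: -sum(v * v for v in x)
rng = random.Random(0)
cs = [1.0, 1.0]
others = [[0.1, 0.1], [0.2, 0.0]]
new_state = af_swarm(cs, others, F, vd=2.0, step=0.5, delta=0.9, rng=rng)
```

In this toy call both companions lie within the visual distance, their centre is fitter than `cs`, and the crowding ratio 2/3 is below `delta`, so the fish steps towards the centre.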

HAFPSO optimization
A HAFPSO optimization algorithm, which makes full use of both algorithms, is used to train the KELM neural network. This algorithm combines the behavior of the artificial fish in the swarm and the particle information of the PSO into one [34]. First, the PSO algorithm is applied to the Type-II diabetes prediction problem to perform the global search. When the PSO algorithm terminates, the AFSO algorithm starts; it takes the final best population from PSO as its initial population and then conducts the local search. The AFSO performs the three functions of the artificial fish to obtain the final fitness value (best solution). The HAFPSO algorithm is summarized as follows, and its flowchart is shown in Figure 1.
Step 1: Initialize the population of PSO and set its iteration value b=0.
Step 2: Check the convergence condition of PSO: if b=bmax-PSO, go to Step 9; otherwise go to Step 3. Here bmax-PSO indicates the maximum iteration number of PSO.
Step 3: Calculate the fitness value of each particle in the population and sort the particles based on their fitness value.
Step 4: Update the Local-Best-Position and Global-Best-Position experienced by the j-th particle.
Step 5: Update the velocity of each particle using equation (13). This process prevents the particle from moving in the wrong direction in the search space.
Step 6: Update the position of each particle using equation (14). The position indicates the current best solution and fitness value.
Step 8: Set b=b+1 and go to Step 2.
Step 9: Initialize the parameters of AFSO and select the elite population of the PSO as the initial population of AFSO. The score-board is updated with the best particle from the elite population. Initially, the iteration b is set to 0.
Step 10: Check whether the convergence condition of AFSO is met. If b=bmax-AFSO, the algorithm converges and the result is updated to the score-board; otherwise go to Step 11. Here, bmax-AFSO is the maximum number of iterations of AFSO.
Step 11: Function selection: Each function represents one of the behaviors (Af-prey, Af-swarm and Af-follow) exhibited by the artificial fish. The artificial fish finds its food (best fitness value) by simulating these three behaviors. The behavior that yields the best fitness value is selected and performed; if none is available, the Af-prey function is selected.
Step 12: Update the score-board: Compare the best fitness value of each artificial fish with the score-board. If the fitness value of the artificial fish is superior to the fitness value on the score-board, update the score-board.
Step 13: Set b=b+1 and go to Step 10.
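The two-phase flow of the steps above (a PSO global search whose elite particles seed an AFSO local search with a score-board) can be sketched as below. This is a simplified illustration under assumptions, not the authors' code: only an Af-prey-like move is implemented in the AFSO phase, the fitness `F` is a toy function that is maximised, and the population sizes, iteration counts, and coefficients are hypothetical.

```python
import random

def hafpso(F, dim, n=15, pso_iters=60, afso_iters=40, elite=5,
           bounds=(-5.0, 5.0), vd=0.5, seed=1):
    """Maximise F: PSO global search, then an AFSO-style local search."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(x):
        return [min(hi, max(lo, v)) for v in x]

    # Steps 1-8: PSO phase
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    for _ in range(pso_iters):
        gbest = max(pbest, key=F)
        for i in range(n):
            for d in range(dim):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
            pos[i] = clip([p + v for p, v in zip(pos[i], vel[i])])
            if F(pos[i]) > F(pbest[i]):
                pbest[i] = pos[i][:]

    # Step 9: the elite PSO particles seed the AFSO population and score-board
    fish = sorted(pbest, key=F, reverse=True)[:elite]
    board = fish[0][:]

    # Steps 10-13: AFSO phase (only an Af-prey-like move is sketched here)
    for _ in range(afso_iters):
        for k, cs in enumerate(fish):
            cv = clip([c + vd * rng.uniform(-1.0, 1.0) for c in cs])
            if F(cv) > F(cs):            # better state within the visual range
                r = rng.random()
                fish[k] = [c + r * (t - c) for c, t in zip(cs, cv)]
            if F(fish[k]) > F(board):    # Step 12: update the score-board
                board = fish[k][:]
    return board, F(board)

# Toy usage: the maximum of F is 0 at the origin.
F = lambda x: -sum(v * v for v in x)
best, best_val = hafpso(F, dim=2)
```

Because the score-board is only overwritten by a strictly better fish, the value returned after the AFSO phase is never worse than the best solution handed over by PSO.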

Extreme Learning Machine (ELM)
ELM is a technique for classifying patterns and approximating functions. ELM is a feed-forward neural network with a single hidden layer [35]. The weights between the inputs and the hidden nodes are assigned randomly and remain constant throughout the training and prediction phases. The weights that link the hidden nodes to the outputs can be trained quite quickly. ELM improves the prediction accuracy, gives better generalization performance, reduces the risk of overfitting, and lowers the computational cost [36]. The ELM model first constructs a classification model from the input dataset $D = \{(x_l, y_l)\}$, where $l = 1, 2, \dots, n$, with $n$ samples and $f$ input features. The ELM network consists of $i$ input units, $h$ hidden neurons, and $o$ outputs, and its model output can be formulated as follows:
$y_k(x_l) = \sum_{g=1}^{h} \beta_{kg}\, f(w_g \cdot x_l + b_g), \quad k = 1, \dots, o$
where $\beta_{kg}$ is the weight which connects the $g$-th hidden neuron to the $k$-th output neuron, $w_g$ is the weight vector of the $g$-th hidden neuron, $b_g$ is its bias, and $f(\cdot)$ is the sigmoid activation function. Both the weights ($w_g$) and the biases ($b_g$) are generated randomly from a Gaussian distribution. Taking the weight and bias vectors as input, the next step generates the hidden layer output matrix HO. On account of this, the output weight matrix $W = [\beta_{kg}]$ is estimated by the Moore-Penrose pseudo-inverse approach as follows:
$W = HO^{\dagger} P$
where $P = [p(1), \dots, p(n)]^T$ denotes an $n \times o$ matrix whose $l$-th row is the actual target vector $p(l) \in R^o$. After the ELM network's parameters are identified, the class label for the Type-II diabetes prediction is obtained as follows:
$L = \arg\max_{k}\, y_k(x)$
Here, $L$ represents the predicted class label.
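The ELM training procedure described above (random Gaussian input weights and biases, a sigmoid hidden layer, and output weights from the Moore-Penrose pseudo-inverse) can be sketched with NumPy. This is an illustrative sketch only: the toy data set, the hidden layer size `h=20`, and the function names are assumptions, and the data is synthetic rather than the PIDD.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, P, h=20, seed=0):
    """X: (n, f) inputs; P: (n, o) one-hot targets. Returns (w, b, W)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(X.shape[1], h))  # random input weights, never trained
    b = rng.normal(size=h)                # random hidden biases
    HO = sigmoid(X @ w + b)               # hidden layer output matrix, (n, h)
    W = np.linalg.pinv(HO) @ P            # Moore-Penrose solution, (h, o)
    return w, b, W

def elm_predict(X, w, b, W):
    """Predicted class label L = argmax over the o network outputs."""
    return np.argmax(sigmoid(X @ w + b) @ W, axis=1)

# Toy, linearly separable data: class 1 when x0 + x1 > 0.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
P = np.eye(2)[y]                          # one-hot target matrix
w, b, W = elm_train(X, P)
train_acc = float(np.mean(elm_predict(X, w, b, W) == y))
```

The only fitted quantity is `W`; `w` and `b` stay at their random initial values, which is what makes ELM training a single linear solve.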

EAI Endorsed Transactions on Scalable Information Systems
Online First

KELM
KELM is a kernel representation of the ELM which can be derived as follows.
For $S$ arbitrary distinct samples $\{(a_j, o_j)\}_{j=1}^{S}$, the output function of the ELM with $X$ hidden neurons is derived as follows:
$f(a) = \sum_{i=1}^{X} \beta_i h_i(a) = h(a)\beta$
The output weight vector between the hidden layer of $X$ neurons and the output is denoted as $\beta = [\beta_1, \dots, \beta_X]^T$, and $h(a) = [h_1(a), \dots, h_X(a)]$ is the hidden layer output with respect to the input $a$, whose sole purpose is to map the data from the input space to the KELM's feature space. Reducing both the training error and the output weights improves the generalization performance of the KELM, i.e.,
Minimize: $\lVert A\beta - O \rVert^2$ and $\lVert \beta \rVert$ (26)
The least-squares solution of equation (26) is based on the Karush-Kuhn-Tucker theory [37] and can be derived as shown in equation (27):
$\beta = A^T \left( \frac{I}{C} + A A^T \right)^{-1} O$ (27)
Here $A$ indicates the hidden layer output matrix, $O$ is the expected sample output matrix, $I$ is the identity matrix, and $C$ is the regularization coefficient. The ELM learning algorithm's output function is derived as follows:
$f(a) = h(a)\beta = h(a) A^T \left( \frac{I}{C} + A A^T \right)^{-1} O$
In the above equation, the feature mapping $h(a)$ is initially unknown. In ELM, the unknown kernel matrix can be identified by Mercer's condition (2020) as follows:
$M = A A^T; \quad M_{ij} = h(a_i) \cdot h(a_j) = k(a_i, a_j)$
From the above equation, the output of the KELM can be derived as
$f(a) = \left[ k(a, a_1), \dots, k(a, a_S) \right] \left( \frac{I}{C} + M \right)^{-1} O$
where $M = AA^T$ and $k(a, b)$ is the kernel function of the hidden neurons present in the KELM. The feature selection approach utilized by KELM follows a Leave One Out Error (LOOE) scheme [38] for faster convergence and to find the optimal feature in the subsets. This paper uses four kernel functions as four Base Learners (BL): Linear-KELM (Lin-KELM), Polynomial-KELM (Pol-KELM), Sigmoid-KELM (Sig-KELM), and Gaussian-KELM (Gaus-KELM).
i. Lin-KELM: It offers the best performance for larger datasets by generating the best solution for the optimization problem and increasing the predictive performance simultaneously. It generates the results in less time.
ii. Pol-KELM: The Pol-KELM finds the similarities and features in the input samples, $k(a, b) = (a \cdot b + C)^d$. The exponent of the polynomial kernel is usually greater than one; when its value is less than one, the kernel is said to be a fractional polynomial. Here d indicates the polynomial degree.
iii. Sig-KELM: The Sig-KELM is similar to the sigmoid function used in Logistic Regression.
iv. Gaus-KELM: In Gaus-KELM, the input samples are mapped into a higher-dimensional space in a non-linear fashion, $k(a, b) = \exp(-\gamma \lVert a - b \rVert^2)$. No prior knowledge is used in determining the parameter γ.
γ and C are the kernel parameters used. C is a constant used to trade off the higher-order and lower-order features present in the input dataset. In this paper, the KELM and the cross-validation method are combined in the training phase to yield higher prediction accuracy and reduce the overtraining problem. A fivefold cross-validation scheme is constructed initially to find the fitting parameters from the training dataset. The cross-validation-based model selection used here automates the choice among the four kernel functions and reduces the overfitting problem [39]. The automated process can sometimes degrade the performance of the system when it does not consider the whole process of fitting the model. A HAFPSO optimization algorithm is therefore used to boost system performance.
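A kernel ELM with the four kernel families named above can be sketched with NumPy as follows. The kernel formulas are the standard textbook forms and are assumptions here, since the paper's own kernel equations did not survive in the text; the parameter names (`gamma`, `c`, `d`), the regularization value, and the XOR-style toy data are likewise hypothetical. The fit solves $(I/C + M)\,\alpha = O$, and predictions use $k(a, a_j)\,\alpha$.

```python
import numpy as np

def kernel(A, B, kind="gaussian", gamma=1.0, c=1.0, d=2):
    """Kernel matrix k(a, b) between rows of A and rows of B (assumed forms)."""
    G = A @ B.T
    if kind == "linear":
        return G
    if kind == "polynomial":
        return (G + c) ** d
    if kind == "sigmoid":
        return np.tanh(gamma * G + c)
    if kind == "gaussian":
        sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * G
        return np.exp(-gamma * np.maximum(sq, 0.0))
    raise ValueError(f"unknown kernel: {kind}")

def kelm_fit(X, O, C=10.0, **kw):
    """Solve (I/C + M) alpha = O, where M is the training kernel matrix."""
    M = kernel(X, X, **kw)
    return np.linalg.solve(np.eye(len(X)) / C + M, O)

def kelm_predict(Xnew, X, alpha, **kw):
    """Class label = argmax of the kernel expansion over the outputs."""
    return np.argmax(kernel(Xnew, X, **kw) @ alpha, axis=1)

# Toy XOR-style data, which a linear model cannot separate.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1, 1, 0])
O = np.eye(2)[y]                      # one-hot expected output matrix
alpha = kelm_fit(X, O, C=100.0, kind="gaussian", gamma=2.0)
pred = kelm_predict(X, X, alpha, kind="gaussian", gamma=2.0)
```

Switching `kind` yields the Lin-, Pol-, Sig-, and Gaus-KELM variants from one code path, which is one way twenty base learners could be produced from four kernels with different parameter settings.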

Proposed Approach
This work proposes the KELM-HAFPSO stacking method to construct a Type-II diabetes prediction model. The method is described in the subsections below.

The KELM stacking approach
This segment describes the new KELM method for the diagnosis of Type II diabetes from different data samples.
The whole PIDD and the physical examination dataset are each split into three parts: training, testing and validation. The training data set is used to obtain the model learners, and the prediction error is calculated on the validation data set. This study utilizes a learning process in which the HAFPSO algorithm performs model selection over the twenty BL constructed, followed by the stack-based integration approach shown in Figure 2. The KELM learning model comprises two core modules. The first module is used for base-level learning, which constructs the BL from the training dataset. The second module is the multi-objective generative module, which generates an optimal solution by optimizing the base learner count and the CA value. The HAFPSO algorithm uses the validation set to perform the model selection procedure. The selected models are integrated via a stacking method. The following paragraphs elaborate on the concepts behind the two modules.
Across the four base learners, base learner selection is considered a significant factor in evaluating the performance of the proposed model.

Multi-Objective Generative Module
The integrated model performs better than its individual counterparts. Parameter analysis and model construction are considered important aspects of the proposed model. The multi-objective generative module is the HAFPSO stacking approach, which incorporates both the selection techniques and the combination of models. In this generative process, the training data is used to provide the candidate solutions, and the validation data is used to determine the fitness value of the candidate solutions at each iteration.
i. HAFPSO for Parameter Analysis: This section discusses how the HAFPSO optimizes the number of base classifiers for the classification of both datasets and how it achieves a higher CA value. Model selection is a two-fold problem: reaching the highest accuracy while using the minimum number of base classifiers. This optimization process is necessary in order to find the right combination of base learners. A binary coding scheme is employed to represent a solution to the Type-II diabetes problem. The bit value '1' indicates that the corresponding model is selected, and the bit value '0' indicates that it is not selected. Therefore, the length of the solution matches the number of BL used. During model selection, the first benchmark value (CA) is increased and the second benchmark value (NBC) is reduced.
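The binary coding scheme can be illustrated with a short Python sketch that decodes a bit string into the selected base learners and scores it on the two benchmarks. This is a hedged example, not the paper's procedure: the validation predictions are made-up numbers, NBC is assumed to be the fraction of selected learners, and a majority vote stands in for the KELM meta-classifier purely to keep the example short.

```python
def decode(bits, learners):
    """Return the names of the base learners whose bit is 1."""
    return [name for bit, name in zip(bits, learners) if bit == 1]

def objectives(bits, val_preds, val_labels):
    """Return (CA, NBC) for one binary-coded solution.

    val_preds[k][j] is learner k's predicted label for validation sample j.
    A majority vote combines the selected learners here, as a placeholder
    for the KELM meta-classifier described in the text.
    """
    chosen = [p for bit, p in zip(bits, val_preds) if bit == 1]
    if not chosen:
        return 0.0, 0.0
    correct = 0
    for j, label in enumerate(val_labels):
        votes = sum(p[j] for p in chosen)
        pred = 1 if 2 * votes > len(chosen) else 0
        correct += int(pred == label)
    return correct / len(val_labels), len(chosen) / len(bits)

# Hypothetical validation labels and base-learner predictions.
learners = ["Lin-KELM", "Pol-KELM", "Sig-KELM", "Gaus-KELM"]
val_labels = [1, 0, 1, 1, 0]
val_preds = [
    [1, 0, 1, 0, 0],   # Lin-KELM
    [1, 1, 1, 1, 0],   # Pol-KELM
    [0, 0, 0, 1, 1],   # Sig-KELM
    [1, 0, 1, 1, 0],   # Gaus-KELM
]
bits = [1, 1, 0, 1]
selected = decode(bits, learners)
ca, nbc = objectives(bits, val_preds, val_labels)
```

In the full method the bit string would have twenty positions, one per base learner, and HAFPSO would search over such strings for high CA and low NBC.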
Evolutionary algorithms based on swarm intelligence are adopted to solve multi-objective optimization problems, and hence they are also known as multi-objective generalization algorithms. By investigating the candidate solutions generated, the algorithm selects the potential optimal solution by performing both global and local search. The HAFPSO algorithm combines the benefits of both algorithms, which results in improved accuracy, faster convergence, and enhanced global searching ability. The AFSO algorithm is coupled with the PSO during the iteration process. It not only avoids premature convergence in PSO but also improves the exploration and exploitation phases in the AFSO algorithm by increasing the diversity of the swarm. This study treats type-II diabetes prediction as a multi-objective problem whose objectives must be optimized simultaneously. Since these goals are inherently contradictory, progress towards one goal can only be made to the detriment of at least one other. We therefore aim for the best balance between the competing objectives. The HAFPSO algorithm in this analysis is used to optimize the accuracy and the number of base classifiers. This process is based on the concept of influence between two decision vectors in the objective space. A decision vector v influences another decision vector u (indicated by $v \prec u$) if and only if:
$\forall j \in \{1, \dots, N\}: f_j(v) \geq f_j(u) \quad \text{and} \quad \exists j \in \{1, \dots, N\}: f_j(v) > f_j(u)$
The above condition states that no objective of v is worse than the corresponding objective of u, and that at least one objective of v is strictly greater than the corresponding objective of u. Here N is the total number of objectives and $f_j(v)$ specifies the value of the j-th objective function for v.
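The influence (dominance) test between two decision vectors can be expressed directly in code. The sketch below assumes every objective is written in maximisation form, so the number of base classifiers enters as the negated NBC; the sample objective values are hypothetical.

```python
def dominates(fv, fu):
    """True if objective vector fv dominates fu (all objectives maximised):
    fv is no worse in every objective and strictly better in at least one."""
    return (all(a >= b for a, b in zip(fv, fu))
            and any(a > b for a, b in zip(fv, fu)))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (CA, -NBC) pairs for four candidate solutions.
candidates = [(0.95, -0.50), (0.95, -0.25), (0.97, -0.75), (0.90, -0.75)]
front = pareto_front(candidates)
```

Here the second candidate dominates the first (same CA, fewer classifiers) and the third dominates the fourth (same classifier ratio, higher CA), while the second and third are mutually non-dominated and form the trade-off front.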
EAI Endorsed Transactions on Scalable Information Systems, Online First

The AFSO algorithm leaves the current peak and searches for the best solution. In the initial phase, the algorithm has large velocities that favor exploration; in the later phase, when the velocity converges to zero, it shifts to exploitation. The balance between the two algorithms leads to a potential optimal solution. Once the HAFPSO algorithm is completed, a potential optimal solution is obtained. This solution leads to a kernel that reflects the trade-off between the complexity of learning with kernels and prediction accuracy. To build a precise KELM model, the final particle derived is taken as the potential optimal solution with high accuracy. The final kernel is therefore built from the base learners whose bits in the encoded particle have the value one.

iii. Stacking Based Model Integration:
Model integration plays a crucial role in developing a KELM model. The predictive behavior of the classifiers can be enhanced by using the correct model integration method. For model integration, the proposed hybrid PSO-AFSO uses the stacked generalization approach. The meta-data is combined by using KELM.
Experimental analysis was conducted on the KELM model to explore the combination of the selected BL and the KELM. KELM has gained wide popularity in various classification tasks due to the promising results obtained. The predicted output of the BL is used as the new attribute for the KELM classification function. The meta-data from the training set is loaded into the KELM meta-learner. The trained KELM classifier is then applied to the test set to obtain the final predictions. The output of the KELM labels each subject either as diagnosed with Type-II diabetes or as not diagnosed with Type-II diabetes.
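The stacking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `KELM` class uses the commonly cited closed-form kernel ELM solution with a Gaussian kernel, the data are synthetic, three base learners are used instead of the selected subset, and the base-learner outputs on the training set are fed directly to the meta-learner.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class KELM:
    """Minimal kernel ELM: solves (K + I/C) alpha = T in closed form."""

    def __init__(self, C=1.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        targets = np.where(np.asarray(y) == 1, 1.0, -1.0)  # classes encoded as +1 / -1
        K = rbf_kernel(self.X, self.X, self.gamma)
        self.alpha = np.linalg.solve(K + np.eye(len(K)) / self.C, targets)
        return self

    def decision(self, X):
        return rbf_kernel(np.asarray(X, dtype=float), self.X, self.gamma) @ self.alpha

    def predict(self, X):
        return (self.decision(X) >= 0.0).astype(int)

def make_toy_data(n_per_class, rng):
    """Two synthetic Gaussian blobs standing in for the two classes."""
    neg = rng.normal(-2.0, 1.0, size=(n_per_class, 2))
    pos = rng.normal(2.0, 1.0, size=(n_per_class, 2))
    X = np.vstack([neg, pos])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = make_toy_data(40, rng)
X_test, y_test = make_toy_data(20, rng)

# Level 0: base learners (here three Gaussian KELMs with different widths).
base_learners = [KELM(C=1.0, gamma=g).fit(X_train, y_train) for g in (0.1, 0.5, 1.0)]

# Meta-data: one column per base learner, holding its output for each sample.
meta_train = np.column_stack([bl.decision(X_train) for bl in base_learners])
meta_test = np.column_stack([bl.decision(X_test) for bl in base_learners])

# Level 1: the KELM meta-learner is trained on the meta-data.
meta_learner = KELM(C=1.0, gamma=1.0).fit(meta_train, y_train)
test_accuracy = float((meta_learner.predict(meta_test) == y_test).mean())
```

A stricter variant would build the training meta-data from out-of-fold predictions to limit leakage; whether the original work does so is not stated in this section.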

Dataset Description and Preprocessing
This study uses two datasets. The PIDD dataset originated from the National Institute of Diabetes and Digestive and Kidney Diseases. The dataset was originally created with the objective of predicting whether a patient has diabetes based on the diagnostic measurements it contains. The study was conducted on the Pima Indian women population near Phoenix, Arizona for a period of five years. This database is widely used by researchers to identify the onset of diabetes based on eight features from the 768 samples, as shown in Table 2. The eight features tend to be the significant risk factors when predicting Type-II diabetes. In Table 2, the ninth feature is a class label that identifies whether a patient has Type-II diabetes or not. The diabetes pedigree function is a likelihood value derived from the patient's family history of diabetes.
From the 768 samples obtained, 268 patients were actually diagnosed with diabetes within a one year period. These 268 samples are annotated with the value one while the remaining samples are labelled as zero. Since our proposed work focuses on identifying Type-II diabetes, the insulin measures and the number of times pregnant are not considered very significant risk factors.
From the 768 samples obtained, 376 samples were incomplete because a few attributes had missing values. The missing values occur due to errors and irregularities in the dataset. If the missing values are not replaced, the results become inaccurate. The pgc, tst, dbp, si, and bmi values cannot legitimately be zero; a zero indicates that the real value is missing. To replace the missing values, the zero values are substituted with the mean value of the corresponding attribute in the training data. The pre-processed data, which is free from these errors, is then passed to the disease prediction model for processing.
For our second dataset, physical examination data were collected from a total of 8700 samples at a Diabetes Research Centre, Tamilnadu, of which 5000 were not affected by Type-II diabetes and 3700 were affected by Type-II diabetes. There were a total of 230 indicators in the physical examination dataset, and some of them had no significant relationship with Type-II diabetes. For our study, we manually selected indicators that are associated with Type-II diabetes; they are presented in Table 3.
In both datasets some values were missing, and some of the samples had more than one feature missing. The missing values are replaced with the mean value here. The mean value is calculated separately for the samples affected by Type-II diabetes and for the samples not affected by Type-II diabetes. Since each feature has a different range, using the features on their original scales affects the prediction accuracy. This study therefore uses Min-Max Normalization to make sure that every feature takes a value between zero and one.
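The two preprocessing steps can be sketched as below. This is a minimal illustration on a made-up array; it assumes that the mean is computed over the non-zero (observed) training values only and that the Min-Max bounds are learned on the training data, neither of which is spelled out in the text. The function names are hypothetical.

```python
import numpy as np

def impute_zeros_with_train_mean(train, test, columns):
    """Replace zeros in the given columns with the mean of the observed
    (non-zero) training values of that column (assumption: zeros mark
    missing values, as described for pgc, tst, dbp, si and bmi)."""
    train = np.asarray(train, dtype=float).copy()
    test = np.asarray(test, dtype=float).copy()
    for c in columns:
        observed = train[train[:, c] != 0, c]
        mean = observed.mean()
        train[train[:, c] == 0, c] = mean
        test[test[:, c] == 0, c] = mean
    return train, test

def min_max_fit(train):
    """Learn per-feature minimum and maximum from the training data."""
    return train.min(axis=0), train.max(axis=0)

def min_max_apply(X, lo, hi):
    """Scale features to [0, 1]; constant features are left at zero."""
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

# Toy data: two features, zeros stand for missing measurements.
raw_train = np.array([[100.0, 0.0], [120.0, 30.0], [0.0, 20.0]])
raw_test = np.array([[0.0, 25.0]])

train_filled, test_filled = impute_zeros_with_train_mean(raw_train, raw_test, columns=[0, 1])
lo, hi = min_max_fit(train_filled)
train_scaled = min_max_apply(train_filled, lo, hi)
test_scaled = min_max_apply(test_filled, lo, hi)
```

The class-wise variant mentioned for the second dataset would compute `mean` separately for the affected and unaffected samples; test samples scaled with training bounds may fall slightly outside [0, 1].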

Performance Evaluation Metrics
The efficiency of the proposed work is measured using different evaluation metrics: Accuracy, Sensitivity, Specificity, and Matthews Correlation Coefficient (MCC). The ROC (Receiver Operating Characteristic) curve provides a graphical estimate of the predictive performance of the proposed model. The curve plots the true-positive rate against the false-positive rate. Accuracy indicates the percentage of correctly identified samples from the PIDD dataset. Sensitivity indicates the probability of accurately identifying diabetic patients. Specificity indicates the probability of accurately identifying non-diabetic patients. The quality of the binary classification in the Type-II diabetes prediction problem can be measured by using MCC. The MCC value lies in the range between -1 and 1. The value -1 indicates total disagreement between prediction and observation, the value 1 represents a perfectly accurate prediction, and the value 0 indicates that the model's prediction is no better than a random prediction.
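The four metrics can be computed from the confusion-matrix counts, as in the following minimal sketch. It uses the standard textbook definitions; the function names and the toy labels are illustrative only.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 means diabetic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    return tp / (tp + fn)      # share of diabetic patients correctly identified

def specificity(tn, fp):
    return tn / (tn + fp)      # share of non-diabetic patients correctly identified

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
scores = (accuracy(tp, tn, fp, fn), sensitivity(tp, fn), specificity(tn, fp), mcc(tp, tn, fp, fn))
```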
The Kappa Statistics (KS) is a critical factor used to check the stability of our proposed approach. It compares the agreement achieved by our proposed model with the agreement expected from a randomly generated classification. For a model that performs no worse than chance, the KS value ranges between 0 and 1. A value close to 1 indicates that the model performs well and a value close to 0 indicates that the model performs poorly. The KS is computed as in Equation (43).

This study utilized 150 combinations of the two parameters C and γ, and the KELM model used 45 hidden neurons and a sigmoid activation function. The dimension size D of the problem is taken as 20 and the population size is initialized to 50. The maximum number of iterations M is set to 1000 to increase the quality of the potential optimal solution. Rand() returns a random number in [0, 1], and the step size is set to 0.3. The value of the visual field Vd is set to 3.5, the try-number is set to 10, and the crowd factor is set to 1.
The inertia weight w is set to 0.7 and the acceleration coefficients are set to $c_1 = c_2 = 2$ for the PSO algorithm. For the Gaus-KELM, the parameter γ is tuned to 0.01, and for the Sig-KELM the parameters γ and C are tuned to 0.5 and 0.01 respectively. In Pol-KELM, the values for C, γ, and d are tuned to 0.5, 0.25, and 1 respectively. The parameters used for the model selection process in this work are listed in Table 4.
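To show how the PSO parameters enter the search, the following sketch applies one standard PSO velocity update followed by a sigmoid-based binarization. It is an assumption-laden illustration: the inertia weight of 0.7 comes from the text, the acceleration coefficients are assumed to be 2, and the sigmoid transfer step is the classical binary PSO rule, which may differ from the update actually used inside HAFPSO.

```python
import numpy as np

def pso_velocity(v, x, pbest, gbest, rng, w=0.7, c1=2.0, c2=2.0):
    """Standard PSO velocity update with inertia w and coefficients c1, c2."""
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    return w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)

def binarize(v, rng):
    """Map velocities to bits: each bit is 1 with probability sigmoid(v)."""
    prob = 1.0 / (1.0 + np.exp(-v))
    return (rng.random(v.shape) < prob).astype(int)

rng = np.random.default_rng(1)
dim = 20                                   # one bit per base learner
x = rng.integers(0, 2, size=dim)           # current binary position
v = np.zeros(dim)                          # current velocity
pbest = x.copy()                           # personal best (toy value)
gbest = rng.integers(0, 2, size=dim)       # global best (toy value)

v = pso_velocity(v, x, pbest, gbest, rng)
x = binarize(v, rng)
```

When a particle already sits on both its personal and the global best, the update reduces to `w * v`, which is why the inertia weight controls how quickly the velocity decays.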

Experimental Setup and Test Functions
The objective of this paper is to improve the classification accuracy of the proposed model. The performance of the proposed model is compared with various conventional ML techniques that exist in the literature for diagnosing Type-II diabetes. The evolutionary algorithms are coded in the Matlab 2018(a) environment, and the experiments were conducted on an Intel Core i9-9980HK processor with 5.00 GHz maximum turbo frequency and a 64-bit Windows 10 OS. In this study, nine statistical optimization problems are solved to verify the optimization performance of the HAFPSO algorithm. Nine conventional benchmark functions (F1-F9) are used to verify the effectiveness of the proposed algorithm by comparing it with PSO [31], AFSO [33], GWO [40], Nondominated Sorting Genetic Algorithm-II (NSGA-II) [41], and Pareto Archived Evolution Strategy (PAES) [42]. The functions F1-F3 are single-peak functions, which are used to measure the algorithm's exploitation capacity; the functions F4-F6 are multi-peak functions, which are used to evaluate the algorithm's exploration capacity; and the functions F7-F9 are fixed multi-peak functions, which are used to evaluate the algorithm's ability to escape from local minima. These nine functions have their own expressions and variable ranges, as shown in Table 5. In Table 5, the number of variables used is represented as v, the range of values used is represented as Range, and the optimal value is indicated as f_optimal. The effectiveness of the proposed approach is measured using three evaluation metrics, namely Average Fitness, Best Value, and Standard Deviation. Table 6 shows the three evaluation metrics used to compare the proposed methodology with the other five techniques.
Here $N_i$ indicates the number of iterations used, $F_j$ indicates the fitness value, and µ indicates the mean value of the population.
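The three fitness statistics can be computed as in the sketch below. The exact expressions of Table 6 are not recoverable from this section, so the sketch assumes minimization problems (best value is the minimum) and a population standard deviation around the mean µ; the function name and the sample values are illustrative.

```python
import statistics

def summarize_fitness(fitness_values):
    """Summarize fitness values F_j collected over the iterations or runs.

    Assumptions: lower fitness is better, and the spread is measured
    with the population standard deviation around the mean.
    """
    mean = statistics.fmean(fitness_values)
    return {
        "average_fitness": mean,
        "best_value": min(fitness_values),
        "standard_deviation": statistics.pstdev(fitness_values, mu=mean),
    }

summary = summarize_fitness([0.0, 2.0, 4.0])
```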

Best Value: minimum value calculated for the overall fitness. Standard Deviation: measures the reliability of the proposed model.

Computational Complexity of Kernels
This section discusses the computational complexity of the proposed approach, considering the worst case encountered in the KELM stacking approach. Here, n is the number of training samples used and M is the number of models selected. The overall computational complexity of the proposed solution is $O(TS \cdot (n^{3} + nf^{2} + nf + NS^{2}))$. Complexity depends primarily on three factors: the number of base learners, the KELM meta-learner, and the method of optimization.

Discussion
The purpose of this study is to identify individuals who are at risk of Type-II diabetes and who are potentially vulnerable to this disease. The proposed KELM-HAFPSO model has been assessed for Accuracy, Sensitivity, Specificity, MCC, and KS by means of 5-fold cross-validation on two related datasets. A comparative analysis was conducted between the proposed KELM-HAFPSO and seven competitive methods, namely ELM-GA [43], Decision Tree C4.5-PSO [27], k-NN [22], MLP [21], LR [17], SVM [38] and NB [16].

Test functions to evaluate the performance of HAFPSO algorithm
The accuracy of the novel hybrid algorithm has been verified by evaluating it with nine different benchmark functions. From the above study, it is clear that the proposed HAFPSO algorithm effectively improves the convergence speed and accuracy of the PSO and AFSO algorithms, respectively. In other words, the proposed HAFPSO algorithm statistically outperforms the other five standard algorithms in terms of applicability and practicality.

Figure 7 shows the potential optimal solutions obtained from the two input datasets in a single run of the HAFPSO algorithm. In Figure 7, B1 represents the classification accuracy (CA) and B2 represents the number of base classifiers used (NBC); the figure also shows the trade-off between accuracy and kernel complexity among the potential optimal solutions. Each kernel is a solution that balances accuracy and kernel complexity. From Figure 7 it is clear that solution 5 attains the best accuracy when the appropriate kernel is used. Figure 7 shows the best kernel for validation and test data with 5 BLs, and a B1 value of 0.9992 (D-I) and 0.9857 (D-II). Table 11 indicates that our proposed model is a promising tool to classify high-risk diabetic patients, with accuracies of 99.92% and 98.57% for datasets D-I and D-II respectively.

Here f is the mapping function of the KELM network, $\beta$ is its output weight, ps is the size of the feature subset, and $\|\cdot\|_0$ is the L0 norm. Each entry of the selection vector is a binary value that indicates whether the i-th feature in the subset is selected or not. The L0 norm is non-continuous, which complicates the optimization of the fitness function; this can be resolved by using a relaxed version based on the L1 norm. The relaxed selection vector is not binary and can take any real values. If its i-th entry is non-zero, the i-th feature is immediately selected. The features can be extracted via a multivariate LR technique [43]. The individual risk factors associated with Type-II diabetes can be identified by substituting the features derived from the KELM algorithm directly into the LR equations. The prediction class is denoted by a variable P, which takes the value 1 for the positive class and the value 0 for the negative class, and the predictor variables are $a_1, a_2, \ldots, a_n$.

The HAFPSO solution is evaluated by increasing the classification accuracy and decreasing the number of base classifiers used. The KELM model serves as a meta-classifier that accurately classifies the test samples linked with a higher risk of type-II diabetes based on the risk factors. The comparison of the novel hybrid HAFPSO algorithm with five competitive algorithms, evaluated using nine different benchmark functions, shows the superiority of our proposed algorithm in terms of accuracy and optimality. The experimental results reveal that the proposed approach outperforms the other seven competitive classifiers in terms of accuracy, sensitivity, specificity, MCC and Kappa Statistics on both datasets. In future, the side effects of diabetes could be explored.
The study addresses a two-class classification problem on the PIDD and Diabetic Research Center datasets, where N is the total number of samples present in the dataset. The N samples are classified into positive (p) and negative (n) using the following expression: $p = \text{TruePositive} + \text{FalseNegative} = \sum_{j} A_{j1} + \sum_{j} A_{j4}$. Model selection relies on two benchmark values, CA and NBC.

The global optimum solution (best solution) with respect to the fitness function is estimated as $b_j^{t}$ for a given time step t. The particle $p_j$ receives its information from its neighbors $N_j \subseteq P$. The PSO algorithm is initialized by generating random positions for the population within a region of the search space $R$. The velocity values are usually initialized within the same region, but sometimes they are also set to zero.

The output vector of the same is represented as $h(x) = [h_1(a), h_2(a), \ldots, h_x(a)]$.

This module constructs the base learners from the training dataset. To construct a KELM learning model with higher prediction accuracy, four kernel functions are employed as the base learners. The whole PIDD dataset and the physical examination dataset are each divided randomly into two parts in a ratio of 90 to 10. The training and validation dataset is constructed from 90% of each dataset. This dataset is used both for model construction and for estimation of the prediction error. The testing set is obtained from the remaining 10% of the data and is used to evaluate the generalization error. Five-fold cross-validation yields five distinct training datasets. The four learners are applied to each of these five training datasets, which gives twenty base learners. The four BL are Lin-KELM, Pol-KELM, Sig-KELM, and Gaus-KELM. The validation set estimates the fitness of each solution and identifies the optimal solution. The testing dataset evaluates the performance of the KELM model. The diversity among the four kernel functions is an essential component for building an efficient KELM model. The trained BL should be diverse and complementary at the same time to obtain the maximum information from the meta-data used for prediction. The crucial factor for learning at the base level is producing a fair number of diverse KELMs. The five training samples generated by the five-fold cross-validation give rise to distinct base learners. Consequently, the diversity is obtained by using the bias
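The construction of the twenty base learners can be sketched as follows. The kernel formulas are the common textbook forms and are an assumption, since the exact expressions are not given in this section; the default parameter values, the offset `coef0`, and the function names are likewise illustrative.

```python
import numpy as np

def linear_kernel(A, B):
    return A @ B.T

def polynomial_kernel(A, B, gamma=0.25, coef0=1.0, degree=1):
    return (gamma * A @ B.T + coef0) ** degree

def sigmoid_kernel(A, B, gamma=0.5, coef0=0.0):
    return np.tanh(gamma * A @ B.T + coef0)

def gaussian_kernel(A, B, gamma=0.01):
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

KERNELS = {
    "Lin-KELM": linear_kernel,
    "Pol-KELM": polynomial_kernel,
    "Sig-KELM": sigmoid_kernel,
    "Gaus-KELM": gaussian_kernel,
}

def split_90_10(n_samples, rng):
    """Random 90/10 split into (training + validation, testing) indices."""
    perm = rng.permutation(n_samples)
    cut = int(round(0.9 * n_samples))
    return perm[:cut], perm[cut:]

def five_fold_training_sets(indices, k=5):
    """Return the k training index sets of k-fold cross-validation."""
    folds = np.array_split(indices, k)
    return [np.concatenate([f for j, f in enumerate(folds) if j != i]) for i in range(k)]

rng = np.random.default_rng(0)
train_val_idx, test_idx = split_90_10(768, rng)      # 768 = size of PIDD
training_sets = five_fold_training_sets(train_val_idx)

# One base learner per (kernel, training fold) pair: 4 kernels x 5 folds.
base_learner_configs = [(name, fold) for name in KERNELS for fold in range(len(training_sets))]
```

Each configuration would then be trained as a KELM with its kernel on the corresponding training set; the count of twenty matches the dimension size D = 20 used for the binary solutions.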

$$KS = \frac{P(\text{observed}) - P(\text{chance})}{1 - P(\text{chance})} \qquad (43)$$

Here, TN represents the total number of observations, P(observed) indicates the actually observed agreement, P(chance) indicates the agreement expected by chance, $A_{j1}$ indicates the True-Positive, $A_{j2}$ indicates the True-Negative, $A_{j3}$ indicates the False-Positive, and $A_{j4}$ indicates the False-Negative values.
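The Kappa Statistics can be computed from the four confusion-matrix counts as in the sketch below. The expressions used for P(observed) and P(chance) are the standard Cohen's kappa definitions and are an assumption here, since the original expressions are not recoverable from this section; the function name is hypothetical.

```python
def kappa_statistic(tp, tn, fp, fn):
    """Cohen's kappa from true/false positive and negative counts."""
    total = tp + tn + fp + fn
    p_observed = (tp + tn) / total
    # Chance agreement: product of the marginal proportions, summed over classes.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (total ** 2)
    if p_chance == 1.0:
        return 0.0          # degenerate case: only one class present and predicted
    return (p_observed - p_chance) / (1.0 - p_chance)

ks_example = kappa_statistic(tp=3, tn=3, fp=1, fn=1)
ks_perfect = kappa_statistic(tp=4, tn=4, fp=0, fn=0)
```

For the first example the observed agreement is 0.75 and the chance agreement is 0.5, which gives a kappa of 0.5; a classifier that agrees with every label reaches 1.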

Table 4. Parameters used in the model selection process

This section gives a brief description of the various parameters used in this study. The regularization parameter C and the kernel function parameter γ are considered the most important parameters in KELM, and proper care should be taken while tuning them. A range of values is considered for both C and γ.

i. The training complexities of the four classifiers used are as follows: Lin-KELM has a complexity of $O(nf)$, and Pol-KELM, Sig-KELM, and Gaus-KELM have a complexity of $O(n^{3})$, where n represents the sample size and f represents the number of features present in the sample. The computational complexity of iterations 1-6 of the HAFPSO algorithm is $O(TS \cdot (nf + 2n^{3} + nf^{2}))$. Here TS is the number of training samples used. ii. The computational complexity of the KELM model for both accuracy and kernel complexity of the

Figures 4, 5, and 6 show the optimization curves obtained for the nine benchmark functions by the HAFPSO, PSO, AFSO, GWO, NSGA-II, and PAES algorithms in order to compare their convergence rates. Figure 4 consists of four panels for the four benchmark functions (F1-F4) evaluated by the six algorithms. The remaining panels for the other five benchmark functions (F5-F9) are shown in Figures 5 and 6.

Figure 4. Comparison of Optimization results with Benchmark Functions. (a)-(d): F1-F4

Figure 8 displays the bar graph of the five performance measures, namely accuracy, sensitivity, specificity, MCC, and Kappa Statistics, obtained from the different classifiers and the proposed KELM-HAFPSO. Figure 8 shows the higher accuracy of the proposed KELM-HAFPSO approach for both datasets D-I and D-II when compared with the other classifiers. The second highest accuracy value is obtained by C4.5-PSO. Figure 8 also demonstrates the high sensitivity values obtained by our proposed approach. Classifiers such as k-NN, MLP, LR, and SVM yield lower sensitivity values than the others. In Figure 8, our proposed model achieves higher specificity values than the other classifiers. The MCC scores obtained are shown in Figure 8, where our proposed approach achieves a higher MCC score (0.935) than the other classifiers. In contrast, MLP and SVM achieve the lowest MCC values. Figure 8 also presents the Kappa Statistics comparison of the proposed model with the other classifiers. The Kappa Statistics value of the proposed approach is relatively high when compared to the others, while that of k-NN is slightly low. Lastly, compared with the seven competitive classifiers, the performance of the KELM-HAFPSO on the datasets D-I and D-II tends to be the highest. The proposed algorithm is compared with ten classifiers from the literature as shown in Table 11.

Figure 7. Accuracy (B1) versus kernel complexity (B2) of the potential solution obtained in a single run (a) D-I (b) D-II

Figure 8. Performance measures for the proposed KELM-HAFPSO approach compared with other competitive classifiers

regularized coefficient and $\|\cdot\|_1$ denotes the L1

Algorithm 2:
$A_f$-follow: The following behavior represents the swarm moving towards a single fish or multiple fishes that have found food. If $F_v > F_s$, that is, the neighboring position has a larger amount of food (higher fitness value) and is less crowded, the $A_f$ moves towards it. Here $F_v$ denotes the fitness value of $C_v$, $F_{centre}$ denotes the fitness value of $C_{centre}$, and $F_s$ denotes the fitness value of $C_s$. The following pseudo-code is provided for the AFSO algorithm.
Pseudocode for the AFSO algorithm
Begin
Initialize the parameters of the AFSO algorithm such as population size N, Visual Distance, and iteration time t.

Table 2. Features present in PIDD dataset

Table 3. Features present in the Diabetes Research Centre dataset

Table 6. Evaluation Metric used for Fitness Estimation

multiobjective optimization problem is $O(NS^{2})$. Here, N denotes the number of objectives and S denotes the size of the population. This complexity is calculated for iterations 7 to 36. iii. The prediction time complexity is represented as $O(t)$ and is treated as constant. For iterations 36-43, the complexity is computed as $O(nMt)$. Here, n is

Table 7. Comparative Analysis for Optimization Results

Table 7 shows the test results of the six algorithms compared to evaluate their optimization performance. The test results shown in Table 7 indicate that the proposed hybrid HAFPSO algorithm has the best fitness capability for both local and global search and that it also avoids premature convergence. Additionally, in terms of convergence accuracy, HAFPSO converges faster and in fewer iterations than the other algorithms.

Table 8 displays the details of both the test accuracy and the validation accuracy obtained for the proposed KELM-HAFPSO model in a single run. Table 8 shows that during kernel formation Gaus-KELM and Sig-KELM are the most frequently selected base learners in the potential optimal solution, which helps to increase the prediction accuracy. The swarm optimization algorithms generate a random combination of BL, which leads to diverse classification. The detailed classification result of the proposed KELM-HAFPSO classification model is shown in Table 9. For the classification of Type-II diabetes in D-I, the Accuracy, Sensitivity, Specificity, MCC, and KS values are 0.999, 0.997, 0.869, 0.938, and 0.962 respectively. Similarly, in D-II the Accuracy, Sensitivity, Specificity, MCC, and KS values are 0.985, 0.987, 0.832, 0.938, and 0.967. The performance of the proposed approach is superior to that of the other classifiers compared. The minimum and maximum kernel complexity associated with the potential optimal solution are 4 and 11 over the twenty rounds of the algorithm. On average, 7 BL are selected to form a kernel. As shown in Table 10, the proposed KELM-HAFPSO model achieves the highest performance among all the competitive classifiers, with an average accuracy of 98.5%, Sensitivity of 98.2%, Specificity of 84.2%, MCC of 90.5%, and Kappa Statistics of 96.5%. In this study, we assessed the other seven competitive models on the same two datasets through 5-fold cross-validation. Table 10 lists the detailed classification results of the ELM-GA, C4.5-PSO, k-NN, MLP, LR, NB, and SVM models. From Table 10, the accuracy of SVM is much lower than that of the other models. The average classification accuracy of C4.5-PSO and NB is slightly lower than that of our proposed model, by 0.098 and 0.046 respectively. As seen from the table, the classification accuracy of SVM in D-I and D-II is 78.7% and 77.4%, which is lower than that of our proposed model by 21.2 and 21.1 percentage points.

The simulation results of the proposed model obtained from Tables 9 and 10 demonstrate the efficiency of our proposed approach. The competitive performance of the proposed model over the conventional classifiers shows promising results and superior performance of the model. It is therefore concluded that the proposed KELM-HAFPSO method achieves greater prediction accuracy than the seven competitive classifiers. In contrast, ELM-GA, k-NN, MLP, and LR provide slightly lower accuracy, sensitivity, specificity, MCC and Kappa Statistics values than the proposed model.

Table 8. Validation (B1) and Test Accuracy obtained for the potential optimal solution in a single run for datasets D-I and D-II

Table 9. Experimental Results of proposed KELM-HAFPSO with different runs on two different datasets

Table 10. Comparative Analysis of proposed Approach with individual Classifiers

Table 11. Comparative Analysis of Classification Accuracy of the proposed model with classifiers from literature