Efficient Diagnosis of Liver Disease using Support Vector Machine Optimized with Crows Search Algorithm

The early and accurate prediction of liver disease remains a challenging task for medical practitioners, even with the latest technologies. Support vector machines (SVMs) are widely used in the medical domain and have proved efficient in producing good diagnostic results. These results can be further improved by optimizing the hyperparameters of the SVM. The proposed work optimizes the support vector machine with the crow search algorithm. The optimized classifier (CSA-SVM) is used for accurate diagnosis on the Indian liver disease data. Several similar state-of-the-art algorithms are compared with the proposed approach to demonstrate its efficiency. CSA-SVM outperforms all the other approaches on every metric used for comparison, yielding a classification accuracy of 99.49%.


Introduction
The liver is the largest internal organ in the body. Its main functions include aiding digestion, removing toxins, fighting infection, balancing hormones and secreting bile. Liver diseases are caused by viral infection, excessive use of drugs, poisoning, alcohol, obesity and many other factors. These can lead to liver failure, which significantly damages the body because it disrupts normal body functions. This is a life-threatening condition. Some common liver problems include hepatitis, fatty liver disease and liver cancer. There are many tests to diagnose liver dysfunction, such as liver biopsy, viral hepatitis tests, comprehensive metabolic panel and transient elastography. The initial stage of liver disease is often difficult to diagnose, as the liver functions normally even when partially infected. This makes accurate prediction at an early stage a challenging task for doctors. Early detection and treatment allow the liver to heal rather than progress to a critical condition. Many machine learning algorithms, such as artificial neural networks, decision trees and support vector machine (SVM), are used in the literature for liver data classification. A few recent works are discussed below and their classification results are tabulated. Classification algorithms such as Naïve Bayes (NB), J48, Random tree (RT) and K-Star are implemented using the WEKA tool [1]. In [2], various algorithms, namely Logistic regression, SVM, RT and Bagging techniques, are compared for classification accuracy. A multi-layer feed-forward deep neural network (MLFFDNN) trained with a back-propagation network (BPN) is used in [3]. The XGBoost algorithm, with L1 and L2 regularization to improve efficiency, is used to predict liver disease on data collected from Andhra Pradesh, India [4]. The class imbalance in the Indian liver patient dataset (ILPD) is handled using the synthetic minority oversampling technique, and the classification performance is then evaluated for both the balanced and unbalanced datasets using K-Nearest neighbor (KNN) and SVM [5]. Particle swarm optimization (PSO) is combined with SVM for feature selection and applied to classify liver data [6].
Liver disease diagnosis using SVM has been found to produce good results. The algorithm works even more efficiently when it is combined with heuristic and nature-inspired meta-heuristic optimization algorithms (MHOAs). In [7], modifying the kernel and finding an optimal set of SVM hyperparameters with optimization methods such as random search, grid search and the Nelder-Mead method improved the classification accuracy on a DNA sequence recognition problem. The learning vector quantization neural network algorithm and the Fisher-SVM coupling algorithm are applied to predict hypertension risk in steel workers, and the efficiency of this combination is demonstrated for varying sample sizes [8].
MHOAs work jointly with SVM for tasks such as parameter tuning and feature selection. In many works, two key hyperparameters, namely the penalty parameter and the kernel function width, are tuned for better efficiency. For example, MHOAs such as Ant colony optimization and PSO [9], the Fruitfly optimization algorithm [10], accelerated PSO [11], the Multi-verse optimizer [12] and Simulated annealing [13] were adopted to find an optimal set of parameters for SVM. To improve the SVM classification accuracy on high-dimensional datasets, feature selection is applied with the help of MHOAs such as the Grasshopper optimization algorithm [14], and the Firefly algorithm [15] is used to train all the parameters of SVM. Many such MHOAs are used with SVM for specific applications. However, these algorithms are still found to have some limitations, and in many cases accurate results are not produced. Thus a robust algorithm that promises high diagnostic accuracy in early prediction is needed to address these issues.
In this work, the Crow search optimization algorithm (CSA) [19] is combined for the first time with a linear-kernel SVM to optimize its Lagrange values, in order to improve the diagnostic efficiency on the liver disease dataset. CSA is chosen among other MHOAs because it consists of simple and efficient optimization steps. It also maintains a good balance between exploration and exploitation. As it has only two tuning parameters, it is simple to apply and fast. It has also proved its efficiency in many similar applications. The calculation of the alpha and bias values is the critical task during the training of SVM. Many mathematical optimization algorithms, such as Quadratic programming, Least squares and SMO, have been used for it. In this paper, the usual procedure of optimizing the SVM Lagrange values using SMO during training is first discussed step by step. Then the use of CSA in place of SMO for optimizing these Lagrange values is illustrated. It is observed that the optimization steps of CSA-SVM are very simple and efficient.
The paper is organized as follows. Section 1 gives the introduction. Section 2 provides the details of the Indian liver disease dataset, the concept of SVM, the training of support vector machine parameters with Sequential minimal optimization (SMO), the details of CSA and the CSA-SVM methodology. Section 3 deals with the experimental details, results and performance analysis. Finally, Section 4 concludes the proposed work.

Dataset details
The publicly available Indian liver patient dataset from the University of California Irvine machine learning repository [16] is used for this work. The data was collected from patients in north-east Andhra Pradesh, India. It contains 583 samples, comprising 416 liver-disease samples and 167 non-liver-disease samples. The data is tabulated with 10 input attributes and one output class attribute. The attribute details of the dataset are given in Table 1.

Support vector machine
The SVM algorithm was first introduced by Vladimir N. Vapnik and Alexey Ya. Chervonenkis in 1963. The SVM classifier is a machine learning algorithm that attempts to find an optimal hyperplane with maximum margin [17]. This algorithm separates linearly separable data samples into two classes. If the data is not linearly separable, SVM maps the data into a high-dimensional feature space and performs the classification there. The equation of the separating hyperplane is given by Equation (1),
$W \cdot X + b = 0$ (1)
where 'W' is the normal vector that represents the orientation of the hyperplane in m-dimensional space (its norm determines the width of the margin), 'X' is the input vector and 'b' is the bias or threshold that represents the position of the hyperplane relative to the origin. The canonical hyperplanes are defined by Equation (2) for positive samples and Equation (3) for negative samples.
$W \cdot X + b = +1$ (2)
$W \cdot X + b = -1$ (3)
The data samples that lie above the hyperplane of Equation (2) belong to the positive class and the data samples that lie below the hyperplane of Equation (3) belong to the negative class. The data samples that lie on the hyperplanes of Equations (2) and (3) are the support vectors, and the margin between these two hyperplanes is given by Equation (4),
$\text{margin} = \frac{2}{\lVert W \rVert}$ (4)
where $\lVert W \rVert$ is the L-2 norm of W. The objective of SVM is to maximize the margin, which is carried out by minimizing the L-2 norm. This is mathematically expressed as the optimization problem given in Equation (5).
$\min_{W, b} \; \frac{1}{2} \lVert W \rVert^2 \quad \text{subject to} \quad Y_i (W \cdot X_i + b) \ge 1, \; i = 1, \ldots, n$ (5)
Equation (5) is a convex quadratic optimization problem subject to linear constraints, so it can be rewritten as Equation (6). Further, by using Lagrange multipliers, it is converted into an unconstrained optimization equation as given in Equation (7), where K is the kernel function value for the training data and c is the box constraint value, whose details are discussed in an upcoming section. Many kernel functions can be used, such as the Linear kernel, Quadratic kernel, Polynomial kernel, Gaussian Radial Basis function and Multilayer Perceptron kernel.
The objective function given in Equation (7) is solved using any one of the mathematical optimization algorithms such as SMO, Quadratic programming (QP) or Least squares (LS) [18]. The optimal values of alpha and bias are used for classifying the unknown data 'z' using Equation (8), where $K_{sv,z}$ is the kernel function value that gives the similarity or distance between the support vectors and the unknown data.
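For orientation, the block below writes out the textbook soft-margin dual problem and decision function that Equations (7) and (8) describe in words. It is a sketch under the assumption that the standard SVM formulation is intended, and is not a transcription of the authors' equations; the index set $SV$ and the symbol $sv_i$ for the i-th support vector are notation introduced here for illustration.

```latex
% Standard soft-margin dual (assumed form of the objective in Equation (7))
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j Y_i Y_j K(X_i, X_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le c, \qquad \sum_{i=1}^{n} \alpha_i Y_i = 0

% Standard decision function (assumed form of Equation (8))
f(z) = \operatorname{sign}\Big( \sum_{i \in SV} \alpha_i Y_i K(sv_i, z) + b \Big)
```

Only the samples with $\alpha_i > 0$ contribute to the sum in the decision function, which is why the alpha values determine the support vectors.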

Optimization of SVM using SMO
The training phase of SVM starts with loading the training data and separating it into input and target. The input data is shifted and scaled, and the kernel matrix is then calculated using a kernel function. Next, the box constraint values are calculated. After that, the alpha and bias values are calculated using the SMO algorithm, which yields the support vectors [18]. Using these support vectors, the testing phase is carried out to classify unknown data.
Steps for optimizing support vector machines with the SMO algorithm
Step 1 Load the data
The training data of size 'n' is loaded for training the support vector machine classifier.
Step 2 Separation of training data into input and target
The training data is separated into input and target. Let X = {X1, X2, …, Xn} represent the set of samples (records) in the data. X contains 'n1' records that belong to class 1 (positive class) and 'n2' records that belong to class 2 (negative class). Each Xi contains m attributes (features), i.e., Xi = {xi,1, xi,2, …, xi,m}. Yi is the actual output, which takes the value -1 for the negative class and +1 for the positive class. Let Y = {Y1, Y2, …, Yn} represent the output class labels of the records.
Step 3 Shifting the input data
To shift the input data, the shiftmean value is calculated first. The shiftmean value is the negative of the mean of each column (attribute) of the input data, and it is calculated using Equation (9) as,
$\mathrm{shiftmean}_j = -\frac{1}{n} \sum_{i=1}^{n} x_{i,j}, \quad j = 1, \ldots, m$ (9)
The input data is shifted by adding the shiftmean value of each column to the corresponding column values, using Equation (10) as,
$sh_{i,j} = x_{i,j} + \mathrm{shiftmean}_j$ (10)
This centres the data points at their mean. Here sh is the shifted data matrix.
Step 4 Scaling the input data
The scalefactor is calculated as one divided by the standard deviation of each column, as per Equation (11),
$\mathrm{scalefactor}_j = \frac{1}{\sigma_j}$ (11)
where $\sigma_j$ is the standard deviation of column j. The scalefactor of each column is multiplied with the corresponding column of the shifted data matrix using Equation (12) as,
$S_{i,j} = \mathrm{scalefactor}_j \times sh_{i,j}$ (12)
The scaled data matrix S is used for training the SVM classifier.
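The shifting and scaling steps can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation; the function name `shift_and_scale` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which variant is used.

```python
import math


def shift_and_scale(X):
    """Centre each column at its mean, then scale it by 1 / standard deviation.

    X is a list of n rows, each a list of m numeric attributes.
    Returns the scaled matrix plus the shiftmean and scalefactor vectors,
    which are needed again later to transform the test data.
    """
    n, m = len(X), len(X[0])
    # shiftmean: negative of the mean of each column
    shiftmean = [-sum(row[j] for row in X) / n for j in range(m)]
    # shifted data: add shiftmean to every value of the corresponding column
    sh = [[row[j] + shiftmean[j] for j in range(m)] for row in X]
    # scalefactor: 1 / std of each column (sample std is an assumption)
    scalefactor = []
    for j in range(m):
        variance = sum(sh[i][j] ** 2 for i in range(n)) / (n - 1)
        std = math.sqrt(variance)
        # Guard against constant columns, which have zero deviation
        scalefactor.append(1.0 / std if std > 0 else 1.0)
    # scaled data: multiply each shifted column by its scalefactor
    S = [[sh[i][j] * scalefactor[j] for j in range(m)] for i in range(n)]
    return S, shiftmean, scalefactor


S, shiftmean, scalefactor = shift_and_scale(
    [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
)
```

After this transformation both columns of the toy input hold the same values, which shows how scaling removes the effect of differing attribute ranges.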
Step 5 Selection of kernel function and calculation of the kernel matrix
The kernel function maps the shifted and scaled training data 'S' into the kernel space, or feature space. There are many kernel functions, such as the Linear kernel, Quadratic kernel, Polynomial kernel, Gaussian Radial Basis function (RBF) and Multilayer Perceptron kernel; some popular kernel functions are listed in Table 2. The kernel function is denoted by K(Si,Sj), where Si and Sj are the scaled input vectors. The calculation for the linear kernel is given in Equation (13),
$K(S_i, S_j) = S_i \cdot S_j$ (13)
The kernel matrix, Ki,j, is calculated by Equation (14).
$K_{i,j} = K(S_i, S_j), \quad i, j = 1, \ldots, n$ (14)
The kernel matrix represents the similarities between the input vectors. It is a symmetric and positive semi-definite matrix. Here x represents the input vector and K denotes the feature space representation obtained after the transformation.
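As an illustration of Step 5, the following sketch builds the kernel matrix for a linear kernel. The function names are hypothetical, and the code assumes the plain dot product as the linear kernel; it exploits the symmetry of the matrix by computing only the upper triangle.

```python
def linear_kernel(si, sj):
    """Linear kernel: the dot product of two scaled input vectors."""
    return sum(a * b for a, b in zip(si, sj))


def kernel_matrix(S, kernel=linear_kernel):
    """Return the n x n matrix with entries K[i][j] = kernel(S[i], S[j])."""
    n = len(S)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # The matrix is symmetric, so each pair is evaluated once
            K[i][j] = K[j][i] = kernel(S[i], S[j])
    return K


K = kernel_matrix([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# K == [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 2.0]]
```

Passing a different `kernel` callable is enough to switch to another kernel function without changing the matrix construction.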
Step 6 Retrieving the diagonal of the kernel matrix
The diagonal of the kernel matrix is retrieved using Equation (15) as,
$\mathrm{diag}(K)_i = K_{i,i}, \quad i = 1, \ldots, n$ (15)
Step 7 Calculation of box constraint values for classes
The box constraint (c) is a value used in the training process to handle the trade-off between the training error and the complexity of the model. This penalty parameter is also a boundary condition that decides the number of outliers accepted in the calculation of support vectors. It has the same length as the training data and is always initialized to 1 (c = 1). It is automatically rescaled if the two groups are unbalanced. The box constraint for each class is calculated using Equations (16) and (17). In general, a smaller value of 'c' makes the classifier flat, a larger value reduces the training error, and a very large value makes the classifier start to overfit. Hence an optimal c value is chosen so that the classifier retains its generalisation ability with low training error.
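The rescaling of the box constraint for unbalanced groups can be sketched as follows. The rule used here, multiplying c by n/(2·n1) for class 1 and by n/(2·n2) for class 2, is the one documented for MATLAB's legacy `svmtrain` function; it is an assumption that Equations (16) and (17) take this form. The function name `class_box_constraints` is hypothetical.

```python
def class_box_constraints(Y, c=1.0):
    """Return one box constraint per sample, rescaled by class frequency.

    Y is a list of labels in {+1, -1}. The rescaling rule
    c * n / (2 * n_class) is an assumed form of Equations (16) and (17).
    """
    n = len(Y)
    n1 = sum(1 for y in Y if y == 1)   # size of the positive class
    n2 = n - n1                        # size of the negative class
    if n1 == 0 or n2 == 0:
        raise ValueError("both classes must be present in Y")
    c1 = c * n / (2.0 * n1)
    c2 = c * n / (2.0 * n2)
    # The smaller class receives the larger penalty
    return [c1 if y == 1 else c2 for y in Y]


box = class_box_constraints([1, 1, 1, -1])
```

Under this assumed rule, the 416 diseased and 167 non-diseased samples of the dataset would receive box constraints of about 583/832 ≈ 0.70 and 583/334 ≈ 1.75 respectively, so errors on the minority class are penalized more heavily.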
Step 8 Calculation of alpha and bias values using the Sequential Minimal Optimization (SMO) algorithm
In this section, SMO is discussed for calculating the alpha and bias values. The control parameters of the SMO algorithm are initialized first.
In each iteration, the SMO algorithm chooses a pair of alpha values (α1 and α2), also known as Lagrange multipliers, and optimizes them analytically until convergence takes place. The equality constraint makes it impossible to optimize the variables individually, so the alpha values are optimized in pairs. In this way the alpha values are calculated for all the data points, two at a time, until the optimum values are obtained, that is, until condition 1 (the maximum number of iterations is reached) or condition 2 ((α1-α2)≤tolKKT) is satisfied. Then the bias or threshold value is calculated using Equation (18). Each data point is associated with one alpha value, which plays a vital role in qualifying the data points as support vectors. The post condition is that the alpha values should be greater than or equal to 0 and less than or equal to the box constraint value, i.e., $0 \le \alpha_i \le c_i$.
Step 8.2 Calculation of first alpha value
The first alpha value α1 is calculated using Equation (19). The index value of α1 is stored in id1.
Step 8.3 Now the gain value is calculated using Equation (21), where i=1,2,…n, gainNumerator gainDenominator≠ 0. The index of the gain value is the index of the second alpha value (id2), and α2 is then calculated. The alpha values are updated based on the stopping condition.
Step 8.4 Optimization process
If condition 2 is true, the bias value is calculated using Equation (18). If the condition fails, the Lagrange multipliers α1 and α2 are updated until convergence occurs. The alpha values are updated before the next iteration. The alpha calculation is stopped as soon as either or both of the conditions are satisfied, whichever occurs first.
Step 8.5 Updating the alpha values based on clip limits
The bound constraints cause the Lagrange multipliers (LM) to lie within a box, while the linear equality constraint makes the LM lie on a diagonal line segment. The ends of the diagonal line are computed with the help of the LM. This corresponds to the right orientation of the hyperplane. The clip limits are calculated using Equation (23). The second derivative of the objective function along the diagonal line is given in Equation (24). When the condition on this second derivative is met, the lambda value, λ, is calculated using Equation (25). The new second alpha value is calculated first using Equation (26). Next, the constrained minimum is found by clipping the unconstrained minimum to the ends of the line segment using Equation (27). The new first alpha value is then calculated using Equation (28) and clipped using Equation (29). The alpha values from Equations (27) and (28) are updated in the global array.
Step 8.6 Updating the training parameters
The relevant training parameters, such as those of Equations (19) and (20), are updated using Equation (30).
Proceed with the next iteration until conditions 1 and 2 are checked and one of them is satisfied.
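The pairwise update of Step 8.5 can be illustrated with the analytic step from Platt's original SMO formulation. This is a sketch of that standard technique, not a reconstruction of Equations (23) to (29), which may be parameterized differently (for example through λ). All names are hypothetical: E1 and E2 denote the prediction errors f(x) - y of the two chosen samples, K11, K12 and K22 are kernel matrix entries, and c1 and c2 are the box constraints of the two samples.

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, K11, K12, K22, c1, c2):
    """One analytic update of a pair of Lagrange multipliers (Platt's SMO).

    Returns the new (a1, a2). The pair is left unchanged when no progress
    is possible, which is a simplification of the full algorithm.
    """
    # Clip limits: ends of the diagonal line segment inside the box
    if y1 != y2:
        low = max(0.0, a2 - a1)
        high = min(c2, c1 + a2 - a1)
    else:
        low = max(0.0, a1 + a2 - c1)
        high = min(c2, a1 + a2)
    # Second derivative of the objective along the diagonal line
    eta = K11 + K22 - 2.0 * K12
    if eta <= 0.0 or low >= high:
        return a1, a2
    # Unconstrained minimum for the second multiplier, then clipping
    a2_new = a2 + y2 * (E1 - E2) / eta
    a2_new = min(high, max(low, a2_new))
    # The first multiplier follows from the linear equality constraint
    a1_new = a1 + y1 * y2 * (a2 - a2_new)
    return a1_new, a2_new


# Two 1-D samples x1 = 1 (class +1) and x2 = -1 (class -1), linear kernel,
# starting from zero multipliers and zero bias, so E = f(x) - y = -y.
a1, a2 = smo_pair_update(0.0, 0.0, 1, -1, -1.0, 1.0, 1.0, -1.0, 1.0, 1.0, 1.0)
# a1 == 0.5 and a2 == 0.5
```

In the toy call, a single step already gives the maximum-margin solution: the resulting weight 0.5·1·1 + 0.5·(-1)·(-1) = 1 places both samples exactly on the canonical hyperplanes.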
Step 9 Evaluation of the value of the objective function in Equation (7)
Step 10 Calculation of the support vectors
These are the final alpha values used for the testing process.
Testing phase
Step 12 Load the test data
The test data Z = {Z1, Z2, …, Zn}, with the same m attributes as the training data, is loaded for testing the SVM classifier.
Step 13 Shift the test data
The test data is shifted using Equation (33), with the shiftmean calculated from Equation (9).
Step 14 Scale the test data
The shifted test data is scaled using Equation (34), with the scaling factor derived in Equation (11).
Step 15 Classification of the test data
The classification function is evaluated, and its sign denotes the classification of the data into class 1 or class 2. In this function, K is the kernel matrix that gives the similarity or distance between the support vectors and the test data. If the sign of the output for a particular test sample is positive, it belongs to class 1; if it is negative, it belongs to class 2.
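The testing phase can be sketched as a single function that reuses the training statistics. This is an illustrative sketch assuming a linear kernel and the standard decision function; the function name `classify` and the argument `alpha_y` (the products of each support vector's alpha value and its class label) are hypothetical.

```python
def classify(Z, shiftmean, scalefactor, support_vectors, alpha_y, bias):
    """Shift, scale and classify test rows with a linear-kernel SVM.

    shiftmean and scalefactor must come from the training data, never from
    the test data. Returns +1 (class 1) or -1 (class 2) for each row.
    """
    labels = []
    for z in Z:
        # Steps 13 and 14: apply the training shift and scale to the test row
        zs = [(z[j] + shiftmean[j]) * scalefactor[j] for j in range(len(z))]
        # Step 15: weighted kernel similarities to the support vectors
        f = bias
        for sv, ay in zip(support_vectors, alpha_y):
            f += ay * sum(a * b for a, b in zip(sv, zs))
        # Treating f == 0 as class 1 is an arbitrary tie-breaking choice
        labels.append(1 if f >= 0 else -1)
    return labels


labels = classify(
    Z=[[7.0], [1.0]],
    shiftmean=[-5.0],
    scalefactor=[1.0],
    support_vectors=[[1.0], [-1.0]],
    alpha_y=[0.5, -0.5],
    bias=0.0,
)
# labels == [1, -1]
```

Reusing the training shiftmean and scalefactor keeps the test data in the same coordinate system as the support vectors.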

Crow search algorithm
CSA is a recently developed algorithm proposed by Alireza Askarzadeh in 2016 [19]. It is inspired by the intelligent stealing behaviour of crows. Crows hide extra food in hiding places and retrieve it when needed. A crow follows another crow that has a better food source in order to steal it. From its own experience of stealing, it also tries to avoid becoming a future victim. These behavioural characteristics of crows are simulated as a metaheuristic optimization algorithm. The flock of crows forms the population (N). Each crow Xi, i = 1, 2, …, N, is considered a search agent, the environment is the search space, and each hiding place is a position that corresponds to a feasible solution. The fitness function is based on the quality of the food source, where the best food source is the global best solution.
A d-dimensional environment is assumed, that is, each crow is considered as a d-dimensional vector. The position of crow i at iteration iter is given as $X^{i,iter} = [x_1^{i,iter}, x_2^{i,iter}, \ldots, x_d^{i,iter}]$, where iter = 1, 2, …, max_iter. Each crow has a memory in which it stores the information of its hiding place. The memory of the i-th crow at iteration iter is given as $m^{i,iter}$. This is the best position achieved so far, based on the fitness value calculated at each iteration. Crows have the habit of following one another to find hiding places and steal food. Based on this behavioural strategy, two cases are formulated to update their positions. Assume crow i follows crow j.
Case 1: If crow j does not know that it is followed by crow i, it reaches its hiding place, which is then also reached by crow i. Hence the position of crow i is updated. The new position of crow i is calculated as,
$X^{i,iter+1} = X^{i,iter} + r_i \times fl^{i,iter} \times (m^{j,iter} - X^{i,iter})$
where $r_i$ is a random number and $fl^{i,iter}$ denotes the flight length of crow i at iteration iter.
Case 2: If crow j notices that it is followed by crow i, it tries to fool crow i by moving to some other random location. The new position of crow i is then set to a random value. Whether case 1 or case 2 applies depends on the value of the awareness probability (AP).
Step 2 Initialize the crow population using random values (position of crows).
Step 3 Initialize the memory of crows. For first iteration, its initial positions are considered as memory.
Step 4 The quality of the position of each crow is evaluated using the fitness function (Equation 7).
Step 5 The new position of each crow is generated based on the two cases, case 1 and case 2.
Step 6 The feasibility of the new positions is checked. If a new position is found to be better than the current one, the position is updated; otherwise the current one is kept.
Step 7 Fitness of each crow for new position is calculated.
Step 8 The memory of each crow is updated by comparing the new fitness value with memorized one. It is updated with the better one.
Step 9 Check the stopping criterion, that is, whether the maximum number of iterations has been reached.
Step 11 The optimal or best solution is achieved.
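The steps above can be sketched as a small, generic crow search routine. This is a minimal sketch of the general CSA scheme rather than the authors' CSA-SVM code: it minimizes an arbitrary fitness function, treats a position as feasible when it lies inside the search bounds (an assumption about the feasibility check), and uses illustrative default values for the flight length `fl` and awareness probability `ap`. All function and parameter names are hypothetical.

```python
import random


def crow_search(fitness, dim, lower, upper, n_crows=20, max_iter=100,
                fl=2.0, ap=0.1, seed=0):
    """Minimize `fitness` over [lower, upper]^dim with a basic crow search."""
    rng = random.Random(seed)
    # Initialize crow positions at random; memory starts at these positions
    pos = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(n_crows)]
    mem = [p[:] for p in pos]
    mem_fit = [fitness(p) for p in mem]
    for _ in range(max_iter):
        for i in range(n_crows):
            j = rng.randrange(n_crows)  # crow i picks a crow j to follow
            if rng.random() >= ap:
                # Case 1: crow j is unaware, so crow i moves toward j's memory
                r = rng.random()
                new = [pos[i][k] + r * fl * (mem[j][k] - pos[i][k])
                       for k in range(dim)]
            else:
                # Case 2: crow j is aware, so crow i ends up somewhere random
                new = [rng.uniform(lower, upper) for _ in range(dim)]
            # Feasibility check (assumed: inside the bounds); otherwise stay
            if all(lower <= v <= upper for v in new):
                pos[i] = new
                new_fit = fitness(new)
                # Memory keeps the better of the old and the new position
                if new_fit < mem_fit[i]:
                    mem[i], mem_fit[i] = new[:], new_fit
    best = min(range(n_crows), key=lambda idx: mem_fit[idx])
    return mem[best], mem_fit[best]


def sphere(x):
    """Toy fitness function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_position, best_fitness = crow_search(sphere, dim=2, lower=-5.0, upper=5.0)
```

To use such a routine for SVM training, each crow would encode a candidate vector of alpha values and the fitness would be derived from the objective in Equation (7); that coupling is not shown here.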

Experimental details and discussions
Widely used algorithms such as the Genetic algorithm (GA) [20], Multi-verse optimizer (MVO) [21], Firefly algorithm (FA) [22] and PSO [23] are used to optimize SVM for comparison; see Table 3. All the experiments are conducted using the ten-fold cross validation method and the averages of the results are tabulated in Table 4. Performance measures such as sensitivity, specificity, precision and accuracy are used to assess the performance, along with the standard deviation (SD) [24]. The experimental results clearly show that CSA-SVM has the best diagnostic capability among all the hybrid SVMs. It has produced accuracy, specificity, sensitivity and precision of 99.49±0.12, 98.80±0.33, 99.76±0.21 and 99.52±0.51, respectively. … 21 respectively, and from this it is found that it has given the least discrimination ability toward the negative samples. The results show that CSA-SVM outperforms all the classifiers used for comparison. This is plotted in a graph with the performance metrics on the X-axis and the scale (in percentage) on the Y-axis, as shown in Figure 2. Several works on liver disease data using various algorithms from the literature, along with the proposed CSA-SVM, are tabulated in Table 5 based on the accuracy they produced. This also shows that CSA-SVM produces better results than the others.
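For reference, the four performance measures can be computed from the confusion-matrix counts as in the sketch below. The function name `diagnostic_metrics` is hypothetical, and the labels are assumed to be +1 for the diseased (positive) class and -1 for the negative class; the toy labels are illustrative and are not taken from the experiments.

```python
def diagnostic_metrics(y_true, y_pred, positive=1):
    """Return sensitivity, specificity, precision and accuracy as fractions."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1          # diseased sample correctly detected
        elif t == positive:
            fn += 1          # diseased sample missed
        elif p == positive:
            fp += 1          # healthy sample flagged as diseased
        else:
            tn += 1          # healthy sample correctly cleared

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


metrics = diagnostic_metrics([1, 1, 1, -1, -1], [1, 1, -1, -1, 1])
# sensitivity 2/3, specificity 1/2, precision 2/3, accuracy 3/5
```

In a ten-fold cross validation, these values would be computed on each held-out fold and then averaged, with the standard deviation taken across the ten folds.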

Conclusion
The optimization of SVM parameters with SMO is discussed. In this work, the Lagrange values of support vector machines are optimized using the crow search algorithm. The optimized CSA-SVM classifier is applied for the efficient diagnosis of liver disease. The procedure to optimize SVM with CSA is found to be simpler than that with SMO. The experiments are carried out using the ten-fold cross validation method. Several similar SVM hybrids are used to compare the efficiency of CSA-SVM. It is experimentally found that CSA-SVM has good discrimination ability on the liver disease data in terms of performance metrics such as sensitivity, specificity, precision and accuracy. The results of various algorithms used for liver disease diagnosis in the literature are also compared. The overall classification accuracy produced by CSA-SVM is 99.49%, which is the highest value. Finally, it is found that CSA-SVM produces better results than the other approaches in liver disease diagnosis. This approach can also be recommended for the diagnosis of other diseases. The results of this work indicate that it can help the medical domain in the early and accurate diagnosis of diseases.