Review of Optimization in Improving Extreme Learning Machine
Nilesh Rathod and Sunil Wankhade

Extreme Learning Machine (ELM) has recently gained a lot of interest because of its noteworthy advantages over conventional single-hidden-layer feedforward neural networks and kernel-based methods. Although ELM has many advantages, it also has potential shortcomings, such as the sensitivity of its performance to the underlying state of the hidden neurons, the input weights and the choice of activation function. To overcome the limitations of traditional ELM, researchers have devised numerical methods that optimise specific parts of ELM in order to enhance its performance on a variety of complex problems and applications. In this study, we survey the different algorithms developed for optimizing ELM, using survey criteria such as datasets, algorithm, objectives, training time, accuracy, error rate and number of hidden neurons. This study will help other researchers to identify the research issues that lower the performance of ELM.


Introduction
The extreme learning machine (ELM) [1] is regarded as a fundamental and efficient method for training single-hidden-layer feedforward networks. In ELM, the learning parameters of the hidden layer are picked at random, so only the number of hidden neurons has to be chosen. The output weights are then computed by the least-squares method, so no iterative adjustment of the network weights and biases is needed. ELM has strong generalization ability and a high training speed compared with several traditional neural network algorithms [2]. This efficiency results from two aspects: i) the output weights are computed analytically instead of by repeatedly applying the chain rule to obtain partial derivatives with respect to the learning parameters; ii) the hidden-layer parameters are generated randomly, without fine-tuning, before the training data are seen [3]. In [4], it is shown that output weights obtained from the least-squares problem preserve the universal approximation capability of a connectionist model with random hidden-layer weights. ELM achieves good generalization with faster training than the support vector machine (SVM) and back-propagation-based neural networks [5]. Traditional ELM and SVM give equal importance to all samples, which biases the outcome towards the majority class. Large class-imbalance problems can be addressed effectively with ELM variants such as Weighted ELM (WELM) and Boosting Weighted ELM (BWELM) [6].
* Corresponding author. Email: Nilesh.rathod@mctrgit.ac.in
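The training procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of any cited paper: the function names `elm_train` and `elm_predict`, the uniform range [-1, 1] for the random hidden parameters, the sigmoid activation and the toy sine regression data are all choices made for the example.

```python
import numpy as np

def elm_train(X, T, n_hidden, rng):
    """Fit a single-hidden-layer ELM: random hidden layer, least-squares output weights."""
    n_features = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))  # random input weights, never tuned
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases, never tuned
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # sigmoid hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                             # Moore-Penrose least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Map inputs through the fixed random hidden layer and the learned output weights."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy regression problem (assumed data, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
T = np.sin(3.0 * X)
W, b, beta = elm_train(X, T, n_hidden=30, rng=rng)
rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))
```

The only quantity that is learned is `beta`; there is no loop over epochs and no gradient, which is where the speed advantage discussed in the text comes from.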
Several real-life applications can be viewed as imbalanced classification problems, where the number of samples in one class greatly exceeds the number of samples in the other classes. The classes with the lower and the higher proportions of samples are called the minority and the majority classes, respectively. ELM randomly initializes the weights between the input and the hidden layer and uses these weights to map the input data into the feature space. These initialized weights remain unchanged during training, which results in some samples being misclassified [7]. Approaches to the imbalance problem can be categorised into data-level strategies, algorithm-level techniques and cost-sensitive approaches. Data-level techniques, such as undersampling and oversampling [8] [9], alter the data space to reduce the effect of class imbalance. Undersampling randomly selects a fraction of the majority-class examples, so the data distribution is balanced at the cost of some loss of information. For example, the Balance Cascade [9] and Easy Ensemble algorithms use undersampling to balance the dataset. Random oversampling makes exact copies of minority-class examples to increase the minority-class cardinality, which may result in over-fitting. Synthetic oversampling, by contrast, generates synthetic minority-class examples to balance the class distribution, as in the Synthetic Minority Over-Sampling Technique (SMOTE) [10].
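The two random data-level strategies described above can be illustrated as follows. This is a sketch under assumptions: the helper names `random_oversample` and `random_undersample` and the toy dataset are invented for the example, and the code shows plain random resampling only, not Balance Cascade, Easy Ensemble or SMOTE.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))  # exact copy of an existing sample
            out_y.append(cls)
    return out_x, out_y

def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for x in rng.sample(pool, target):  # discards the remaining samples of this class
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y

# Toy imbalanced dataset: 8 majority samples, 2 minority samples.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
```

Oversampling grows the dataset to 16 samples by repeating the two minority points, which is the source of the over-fitting risk mentioned in the text, while undersampling shrinks it to 4 samples and discards six majority points.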
Algorithm-level techniques [11,12] alter the classifier model to handle the imbalance, while cost-sensitive techniques assign a higher penalty to misclassified minority-class examples than to misclassified majority-class examples, because misclassifying a minority-class example is more costly [13,14].
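One common way to build such a penalty into ELM is to weight each training sample in the least-squares objective. The sketch below assumes a weighting of 1 / (class size) and a ridge parameter `C`, solving beta = (I/C + H^T W H)^(-1) H^T W T with W diagonal. The function name `weighted_elm_fit`, the toy data and the parameter values are assumptions for illustration and do not reproduce any specific cited variant.

```python
import numpy as np

def weighted_elm_fit(X, y, n_hidden=20, C=1.0, seed=0):
    """Fit an ELM whose squared-error loss weights each sample by 1 / (size of its class)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))            # hidden-layer output matrix
    classes, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]                            # minority samples get larger weights
    # One column per class: +1 for the true class, -1 otherwise.
    T = np.where(inverse[:, None] == np.arange(len(classes))[None, :], 1.0, -1.0)
    HtW = H.T * w                                        # equals H^T @ diag(w)
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ T)
    return beta, w

# Toy imbalanced data: 90 majority samples and 10 minority samples.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 0.5, (90, 2)), rng.normal(1.0, 0.5, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
beta, w = weighted_elm_fit(X, y)
```

With this weighting the two classes contribute equally to the loss in total, even though one class has nine times as many samples.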

EAI Endorsed Transactions on Industrial Networks and Intelligent Systems, 06 2021 - 09 2021 | Volume 8 | Issue 28 | e2
The ELM approach proposed in [15] is a straightforward and effective single-hidden-layer feedforward neural network algorithm in which no iterative adjustment of the network weights and biases is needed during training. Compared with more traditional neural network algorithms, ELM achieves a faster training speed. In [16], the ELM auto encoder (ELM-AE) was introduced, in which an auto encoder reconstructs the input signals. Based on this approach, a kernel extreme learning machine auto encoder (KELM-AE) was proposed as a multi-label learning algorithm. In this algorithm, a two-layer KELM module is employed as the base model. The first module acts as the auto encoder block: the information from the label nodes is added to the input layer, and the output layer produces features that capture both the label relationships and the input features. The second module is the classification module and performs the classification, while a non-equilibrium label completion matrix algorithm is applied to the label space. However, the simple design of OC-ELMs may limit their potential for learning from high-dimensional and complex datasets, resulting in sub-optimal generalization performance. On the other hand, deep networks have developed considerably in recent years. Representative multilayer neural network-based ELMs (ML-ELM), such as deep weighted ELM [17], sparse representation-based hierarchical ML-ELM (H-ELM) [18], stacked-AE-based ML-ELM [16] and the kernel-based MK-ELM [19], may lead to further enhancement of OC-ELMs [20].
This paper reviews ELM from a broad perspective across different applications. The review presents the strengths and problems of ELM with the goal of identifying research gaps, and it also shows how optimisation affects the extreme learning machine. The paper is structured as follows. Section 2 discusses several optimization strategies that have been applied to extreme learning machines and how these algorithms affect the performance of ELM. Section 3, Comparative Analysis, develops a comparative table of the techniques discussed with respect to the algorithm applied, the parameter being optimised and the application. Section 4 gives the conclusion, the future scope and a summary of the paper.

Literature Review
This section evaluates the different optimization techniques applied to extreme learning machines and how these algorithms affect the overall performance of ELM.

Genetic Algorithms
Xue et al. introduced a learning algorithm referred to as the genetic ensemble of ELM (GE-ELM) [21]. This approach uses a GA to produce a set of ELM candidates with ideal parameters, reducing the negative effect of non-ideal parameters. A subset of the ELM candidates is then selected by a sorting strategy and assembled into a new system. GE-ELM was evaluated on nine benchmark datasets covering both classification and regression problems. Compared with other ELM methods, it achieved better generalization performance and robustness. From the results on both classification and regression problems, it can be concluded that GE-ELM makes the networks more robust and improves generalization performance.
Matias et al. also proposed a learning algorithm, referred to as optimized ELM (O-ELM) [22]. In this framework, the SLFN parameters are determined by an optimization process. The output weights are estimated by the least-squares method, with Tikhonov regularisation applied to improve the SLFN's performance on noisy data. The system was tested with three optimization methods (differential evolution, simulated annealing and genetic algorithm) on 16 benchmark problems available in public repositories.
Alexandre et al. discussed the feasibility of a novel feature selection strategy in which an ELM is hybridized with a special class of GA with a restricted search [23]. After several experiments and a comparison with the performance of fast-learning linear regression, the GA-ELM method was considered the most appropriate for the application at hand. The proposed GA-ELM feature selection gives excellent results in terms of the probability of correct classification. It lets the ELM-based classifier improve its classification rate from 74.83 percent (without feature selection) to 93.74 percent (with the best set of selected features).
Alencar et al. proposed a novel GA-based procedure, GAP-ELM, for pruning neurons in the hidden layer [24]. GAP-ELM was assessed on 7 real-world datasets and compared with OP-ELM, MLP, RBF and ELM. All techniques were compared on three criteria: training time, number of neurons in the hidden layer and accuracy [25]. Based on the results, the authors concluded that GAP-ELM is a valid option for classification tasks.
Avci and Dogantekin proposed a diagnosis framework for Parkinson's disease that combines GA, a wavelet kernel (WK) and ELM [26]. A single-layer neural network (SLNN) classifier was trained with the ELM learning method. The datasets for the disease were obtained from the UCI machine learning database. The WK-ELM structure includes three adjustable wavelet-kernel parameters. These parameters, together with the number of hidden neurons, play a significant role in ELM performance. The GA was used to determine the number of hidden neurons and to optimise the parameter values [27][28]. The resulting GA-WK-ELM was assessed in terms of specificity, sensitivity, accuracy and ROC curves, and gave the highest classification accuracy of 96.81%.
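For reference, the three scalar measures named above can be computed from the confusion counts of a binary classifier. The sketch below uses an invented helper `binary_metrics` and made-up labels; it is not data from the cited study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of diseased cases that are detected
        "specificity": tn / (tn + fp),   # share of healthy cases that are cleared
    }

# Made-up labels: 4 positive cases and 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
m = binary_metrics(y_true, y_pred)
```

Here one positive case is missed and one negative case is flagged, so accuracy is 8/10, sensitivity is 3/4 and specificity is 5/6.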
In the papers reviewed above, the inputs were normalized to the range [0, 1] and the outputs to the range [−1, 1]; the population size was set as required, and enough iterations were run to search for the optimal solution. The validation set was randomly selected from the training dataset. These three settings should be addressed to obtain better values of performance measures such as RMSE, error rate and accuracy when ELM is optimized by a genetic algorithm.
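A rough sketch of how these settings fit together is given below: inputs scaled to [0, 1], outputs scaled to [-1, 1], a validation set drawn at random from the training data, and a simple GA that evolves the hidden weights and biases while the output weights come from Tikhonov-regularised least squares. Everything specific here is an assumption made for illustration, including the names `minmax`, `val_rmse` and `ga_elm`, the truncation selection, uniform crossover, Gaussian mutation and the toy data. It is not the procedure of GE-ELM, O-ELM or any other cited method.

```python
import numpy as np

def minmax(a, lo, hi):
    """Scale each column of `a` linearly into [lo, hi]."""
    a_min, a_max = a.min(axis=0), a.max(axis=0)
    return lo + (hi - lo) * (a - a_min) / (a_max - a_min)

def val_rmse(genome, Xtr, Ttr, Xva, Tva, n_hidden, lam=1e-3):
    """Decode a genome into hidden weights and biases, fit ridge output weights, score on validation."""
    d = Xtr.shape[1]
    W = genome[: d * n_hidden].reshape(d, n_hidden)
    b = genome[d * n_hidden:]
    Htr = 1.0 / (1.0 + np.exp(-(Xtr @ W + b)))
    Hva = 1.0 / (1.0 + np.exp(-(Xva @ W + b)))
    # Tikhonov-regularised least squares for the output weights.
    beta = np.linalg.solve(Htr.T @ Htr + lam * np.eye(n_hidden), Htr.T @ Ttr)
    return float(np.sqrt(np.mean((Hva @ beta - Tva) ** 2)))

def ga_elm(Xtr, Ttr, Xva, Tva, n_hidden=8, pop_size=20, generations=15, seed=0):
    rng = np.random.default_rng(seed)
    n_genes = Xtr.shape[1] * n_hidden + n_hidden
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, n_genes))
    score = lambda g: val_rmse(g, Xtr, Ttr, Xva, Tva, n_hidden)
    history = []
    for _ in range(generations):
        fit = np.array([score(g) for g in pop])
        order = np.argsort(fit)
        history.append(float(fit[order[0]]))
        elite = pop[order[: pop_size // 2]]                  # best half survives unchanged
        n_child = pop_size - len(elite)
        pa = elite[rng.integers(len(elite), size=n_child)]
        pb = elite[rng.integers(len(elite), size=n_child)]
        children = np.where(rng.random(pa.shape) < 0.5, pa, pb)   # uniform crossover
        mutate = rng.random(children.shape) < 0.1                 # mutate about 10% of genes
        children = children + mutate * rng.normal(0.0, 0.1, size=children.shape)
        pop = np.vstack([elite, np.clip(children, -1.0, 1.0)])
    fit = np.array([score(g) for g in pop])
    history.append(float(fit.min()))
    return pop[fit.argmin()], history

# Toy regression data (assumed), normalised as described in the text.
rng = np.random.default_rng(0)
X_raw = rng.uniform(0.0, 10.0, size=(120, 2))
T_raw = (np.sin(X_raw[:, 0]) + 0.1 * X_raw[:, 1]).reshape(-1, 1)
X, T = minmax(X_raw, 0.0, 1.0), minmax(T_raw, -1.0, 1.0)
idx = rng.permutation(len(X))
va, tr = idx[:30], idx[30:]          # validation samples picked at random
best, history = ga_elm(X[tr], T[tr], X[va], T[va])
```

Because the best half of the population is carried over unchanged, the best validation RMSE recorded in `history` never increases from one generation to the next.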

Particle Swarm Optimization (PSO)
PSO has been increasingly used as an effective technique for searching for global minima. PSO has no complicated evolutionary operators and has fewer parameters that need to be adjusted. Hybrids of PSO and ELM are therefore promising for training feedforward neural networks [29]. To obtain a compact network architecture with better generalization performance, Zhao et al. developed an enhanced ELM with adaptive growth of hidden nodes (AG-ELM) combined with PSO [30]. PSO is used to select the optimal weights and biases, overcoming a shortcoming of the standard AG-ELM. Each particle in the PSO represents the parameters of a single network, and the particle dimension increases during training. Simulation results on different test problems confirm that the developed algorithm achieves a more compact network architecture than conventional AG-ELM. Zhang and Yuan improved the accuracy of fault diagnosis in power transformers by developing a kernel-based ELM (KELM) whose parameters are optimized by PSO [31]. After parameter optimization, the optimized KELM is used to classify the faults in power transformers. Nine benchmark datasets were used to test the technique. The fault diagnosis system was compared with two other ELMs, the BPNN and the SVM. The results showed that the developed method is more stable, learns faster and achieves better generalization performance.
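The idea of tuning KELM parameters with PSO can be sketched as follows. The sketch assumes an RBF kernel, a search over the base-10 logarithms of the regularisation constant C and the kernel width gamma, the validation error rate as the objective, and standard inertia-weight PSO updates. The function names, search bounds, PSO coefficients and toy two-class data are assumptions for illustration and are not taken from the cited fault diagnosis study.

```python
import numpy as np

def rbf(A, B, gamma):
    """RBF kernel matrix with entries exp(-gamma * squared distance)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def kelm_val_error(params, Xtr, Ttr, Xva, yva):
    """Validation error rate of a KELM with C = 10**params[0], gamma = 10**params[1]."""
    C, gamma = 10.0 ** params[0], 10.0 ** params[1]
    # KELM closed form: alpha = (I / C + K)^-1 T, prediction = K(x, Xtr) @ alpha.
    alpha = np.linalg.solve(np.eye(len(Xtr)) / C + rbf(Xtr, Xtr, gamma), Ttr)
    pred = (rbf(Xva, Xtr, gamma) @ alpha).argmax(axis=1)
    return float(np.mean(pred != yva))

def pso(objective, lo, hi, n_particles=10, iters=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` inside the box [lo, hi] with inertia-weight PSO."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, (n_particles, len(lo)))
    vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), np.array([objective(p) for p in pos])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        f = np.array([objective(p) for p in pos])
        better = f < pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]   # update personal bests
        g = pbest[pbest_f.argmin()].copy()                        # update global best
    return g, float(pbest_f.min())

# Toy two-class data (assumed) with a random train/validation split.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.6, (40, 2)), rng.normal(1.0, 0.6, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
T = np.where(y[:, None] == np.arange(2)[None, :], 1.0, -1.0)
idx = rng.permutation(80)
tr, va = idx[:60], idx[60:]
lo, hi = np.array([-2.0, -2.0]), np.array([3.0, 1.0])   # bounds on log10(C), log10(gamma)
best, best_err = pso(lambda p: kelm_val_error(p, X[tr], T[tr], X[va], y[va]), lo, hi)
```

Each particle here is just a pair of hyperparameters, so the search space is two-dimensional. By contrast, when PSO is used to tune the input weights and biases of a basic ELM, the particle dimension grows with the number of hidden nodes.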
Yadav et al. proposed an ELM-PSO methodology for designing an in-situ bioremediation system for groundwater contaminated with BTEX compounds [32]. The results of this investigation were compared with existing studies, indicating the greater effectiveness of the developed model. ELM-PSO is a promising tool for designing an optimal in-situ bioremediation system, as it reduces the computational time as well as the system cost while degrading the BTEX concentration to a predefined clean-up level.
In these experiments, the inputs and outputs were normalized to the ranges [−1, 1] and [0, 1], respectively. For the simulation trials, the dataset was divided into training and testing sets. The sigmoid function was selected as the activation function. The hidden node parameters were randomly chosen from the range [−1, 1]. In the experiments, some parameters including the population size, maximum iteration number, ...
The two-stage capacitated facility location problem, in which plant products are delivered to clients through storage depots, is discussed in [37]. The limited depot storage capacity and plant production are taken into account in relation to the client demands. The aim of the study is to minimise the cost of locating plants and depots, which includes both fixed and transportation costs. Within a reasonable computation time, better solutions are obtained with a hybrid evolutionary algorithm framework that approximates the fitness by machine learning. Genetic operators carry out the search, and a local search strategy refines the best solution in the population [38,39].
Combining differential evolution feature selection with the extreme learning machine yields a new algorithm for feature subset selection. According to the simulation results, the developed feature selection model produces feature sets of optimal size while achieving higher classification rates than other techniques [40,41].
To assess performance experimentally, a well-defined set of benchmark instances was developed, consisting of five classes of instances obtained by varying seven parameters. For each instance, the values of the seven parameters are randomly generated.

Artificial Bee Colony Optimization
A new hybrid methodology based on the ABC (artificial bee colony) optimization method has been proposed to optimize the ELM parameters. It uses ABC to find the optimal input weights and biases, and the Moore-Penrose (MP) inverse to determine the output weights. The resulting ABC-ELM algorithm is compared with the original ELM and other evolutionary ELM methods on various classification datasets [42]. The improved ABC-based technique PS-ABCII [39] differs from existing techniques in three significant ways: the mechanism by which employed bees become scouts is modified, the greedy selection mechanism is dropped, and population initialization is followed by opposition-based learning. Ten benchmark functions are used to compare the performance of PS-ABCII with existing techniques. In addition, the authors propose a hybrid model combining PS-ABCII with ELM, called PS-ABCII-ELM, in which PS-ABCII tunes the input weights and biases of the ELM and thereby improves its generalization performance [43].
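Opposition-based learning at initialization can be sketched briefly: for each random candidate x in the box [lo, hi], its opposite point lo + hi - x is also evaluated, and the better half of the combined set forms the initial population. The function name `opposition_init` and the sphere objective are assumptions for the example. In an ELM setting the objective would be a validation error computed from the candidate weights and biases.

```python
import numpy as np

def opposition_init(objective, lo, hi, pop_size, seed=0):
    """Return the best `pop_size` points among random candidates and their opposites."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(lo, hi, (pop_size, len(lo)))
    opposite = lo + hi - pop                 # mirror each candidate about the box centre
    both = np.vstack([pop, opposite])
    fit = np.array([objective(x) for x in both])
    keep = np.argsort(fit)[:pop_size]        # lowest objective values first
    return both[keep], fit[keep]

# Stand-in objective (assumed): the sphere function.
lo, hi = -np.ones(3), np.ones(3)
pop, fit = opposition_init(lambda x: float(np.sum(x ** 2)), lo, hi, pop_size=10)
```

The extra cost is one additional objective evaluation per candidate, paid once before the main search loop starts.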
Another method first uses ELM to project the input data into a high-dimensional feature space and then applies the ABC algorithm to perform unsupervised clustering. Although the ELM projection facilitates clustering, standard algorithms such as K-means depend on the initial cluster centres and may converge to local minima; the metaheuristic ABC algorithm overcomes these issues [44]. For optimization problems, Wang et al. proposed a new ABC algorithm known as NABC [45]. In NABC, the population quality is first improved by chaotic opposition-based learning. The convergence ability is then improved by a self-adaptive search method, and a chaotic local search allows the scout bees to escape local optima. Hence, both the input weights and the biases of the ELM are optimized by NABC, and the generalization performance of the ELM is improved overall [46].
The experiments for these algorithms were carried out in MATLAB 7.0, with the number of iterations and the population size fixed in advance. The number of employed bees is equal to the number of onlookers and can be set to 20. The input weights and the biases are obtained in the range [-1, 1]. Table 4 presents the accuracy of the suggested techniques on the datasets. The observations made from the table are: a) the learning speed rate determines the classification accuracy; b) randomness in determining the hidden neurons tends to lower the classifier accuracy; c) the convergence speed also determines the searching ability.
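The employed, onlooker and scout phases of a basic ABC loop are sketched below, using 20 food sources (so 20 employed bees and 20 onlookers) and the search range [-1, 1] mentioned above. This is the textbook form of ABC, not NABC or PS-ABCII. The function name `abc_minimize`, the abandonment limit and the sphere objective standing in for an ELM validation error are assumptions made for the example.

```python
import numpy as np

def abc_minimize(objective, lo, hi, n_food=20, limit=10, iters=50, seed=0):
    """Basic artificial bee colony search for a non-negative objective inside [lo, hi]."""
    rng = np.random.default_rng(seed)
    dim = len(lo)
    foods = rng.uniform(lo, hi, (n_food, dim))
    cost = np.array([objective(x) for x in foods])
    trials = np.zeros(n_food, dtype=int)         # failed improvement attempts per source
    best_x, best_f = foods[cost.argmin()].copy(), float(cost.min())

    def try_neighbour(i):
        """Perturb one coordinate of source i towards or away from a random other source."""
        k = rng.choice([j for j in range(n_food) if j != i])
        d = rng.integers(dim)
        cand = foods[i].copy()
        cand[d] += rng.uniform(-1.0, 1.0) * (foods[i, d] - foods[k, d])
        cand = np.clip(cand, lo, hi)
        f = objective(cand)
        if f < cost[i]:                          # greedy selection
            foods[i], cost[i], trials[i] = cand, f, 0
        else:
            trials[i] += 1

    for _ in range(iters):
        for i in range(n_food):                  # employed bees: one visit per source
            try_neighbour(i)
        prob = 1.0 / (1.0 + cost)                # better sources attract more onlookers
        prob /= prob.sum()
        for i in rng.choice(n_food, size=n_food, p=prob):   # onlooker bees
            try_neighbour(i)
        if cost.min() < best_f:
            best_x, best_f = foods[cost.argmin()].copy(), float(cost.min())
        worn = trials.argmax()                   # scout: abandon the most exhausted source
        if trials[worn] > limit:
            foods[worn] = rng.uniform(lo, hi)
            cost[worn] = objective(foods[worn])
            trials[worn] = 0
    return best_x, best_f

lo, hi = -np.ones(4), np.ones(4)
best_x, best_f = abc_minimize(lambda x: float(np.sum(x ** 2)), lo, hi)
```

To apply the same loop to ELM, each food source would encode the input weights and biases, and the objective would train the output weights by the MP inverse and return a validation error.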

Other Algorithms
MBAS (Multitask Beetle Antennae Swarm Algorithm), a new swarm heuristic algorithm inspired by the ABC algorithm and the BAS (Beetle Antennae Search) algorithm, has been proposed. MBAS is used to optimize the input weights and biases of the ELM [47]. This method minimizes the regression error and the condition number and obtains good generalization performance. A novel Kernel Extreme Learning Machine (KELM) parameter tuning method employs GWO (Grey Wolf Optimization), a new swarm intelligence algorithm [48]. Because GWO simulates the social hierarchy and the hunting behaviour of grey wolves, it was used to develop a KELM model for bankruptcy prediction and was found to be extremely efficient. The GWO-based KELM was compared with existing methods, including support vector machines, ELM, random forests, improved ELM, genetic KELM, grid-search KELM and particle swarm optimization based KELM, on two real-world datasets using cross-validation.
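The hierarchy-and-hunting idea behind GWO can be made concrete with the standard position update: the three best wolves (alpha, beta, delta) act as leaders, and every wolf moves to the average of three points computed relative to those leaders, with a control parameter that decreases linearly from 2 to 0. The sketch below assumes a two-dimensional search space and a sphere objective as a stand-in for the cross-validated KELM error. The function name `gwo_minimize` and all settings are assumptions for illustration.

```python
import numpy as np

def gwo_minimize(objective, lo, hi, n_wolves=12, iters=40, seed=0):
    """Standard grey wolf optimizer update inside the box [lo, hi]."""
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lo, hi, (n_wolves, len(lo)))
    best_x, best_f = None, np.inf
    for t in range(iters):
        fit = np.array([objective(w) for w in wolves])
        order = np.argsort(fit)
        if fit[order[0]] < best_f:
            best_x, best_f = wolves[order[0]].copy(), float(fit[order[0]])
        leaders = wolves[order[:3]].copy()       # alpha, beta and delta wolves
        a = 2.0 * (1.0 - t / iters)              # shrinks from 2 to 0: explore, then exploit
        new = np.zeros_like(wolves)
        for leader in leaders:
            A = 2.0 * a * rng.random(wolves.shape) - a
            C = 2.0 * rng.random(wolves.shape)
            D = np.abs(C * leader - wolves)      # distance to the (perturbed) leader
            new += leader - A * D
        wolves = np.clip(new / 3.0, lo, hi)      # average of the three leader-guided moves
    return best_x, best_f

lo, hi = -np.ones(2), np.ones(2)
best_x, best_f = gwo_minimize(lambda x: float(np.sum(x ** 2)), lo, hi)
```

For KELM tuning, the two coordinates would typically encode the regularisation constant and the kernel width, and the objective would return a cross-validation error.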
An improved GWO (IGWO) has been integrated with KELM for medical diagnosis, giving a novel prediction framework known as IGWO-KELM [49]. The IGWO-KELM approach is employed where the optimal subset of medical features is to be identified. A GA is first used to generate diverse initial positions, and GWO then updates the current positions of the population in the discrete search space, yielding the optimal feature subset and thus the best classification performance [50].
An alternative learning scheme for KELM uses CMFO (Chaotic Moth-Flame Optimization) [51]. CMFO carries out parameter optimization and feature selection simultaneously. The proposed model is compared in detail with existing KELM models based on the original moth-flame optimization, genetic algorithms and particle swarm optimization. The comparison is carried out on medical problems such as the diagnosis of Parkinson's disease and breast cancer. The learning method developed there has found application in medical diagnosis.
A hybrid of the DA (Dragonfly Algorithm) and ELM has been put forward for prediction problems. ELM shows good results in regression and classification problems [52,53]. However, despite its very fast training, the model requires many hidden-layer nodes, and the large number of hidden-layer nodes increases the evaluation time of ELM. Furthermore, there is no guarantee that the hidden-layer weights and biases are optimal. The DA captures the essence of the swarming motion of dragonflies. Using it to set the hidden-layer parameters makes it possible to choose a small number of nodes together with suitable hidden-layer weights and efficient biases. The proposed method is analysed with a set of assessment indicators and compared on ten regression datasets from the repository.
A challenge for traditional learning algorithms is the analysis of high-dimensional data. The development of improved ELM algorithms for this setting will be of interest; one interesting example is the combination of ELM with sparse coding techniques to handle high-dimensional data effectively. From a theoretical point of view, open questions include the justification of the random mechanism in ELM and the relationship between ELM and other algorithms. In ELM, the key feature is the random mechanism used in training, which makes training extremely efficient and still ensures universal approximation capability. Often, over-fitting is also avoided and better predictions are obtained because of this random mechanism. The random mechanism and its effectiveness require additional study and analysis. Examining the relationship between ELM and other algorithms that also adopt random mechanisms would be interesting; examples of such algorithms are random forest and AdaBoost.
Regarding the influence of the distribution used to generate the hidden-layer parameters, it has been shown that any continuous distribution can be used without losing the universal approximation capability. In real applications, however, the generalization performance may be affected by the distribution used to generate these parameters. This calls for further study of the issue and of data-dependent generalization error bounds for ELM. Even though the learning efficiency and generalization performance of ELM are well understood, the large number of neurons in the hidden layer makes ELM less convenient in applications and appears to affect the accuracy of the results. This suggests that continued development of the ELM model, and study of the structure and generalization performance of the associated algorithms, is necessary. In addition, effective methods such as the RBF kernel need to be combined with ABC algorithms, and a comparison between ABC algorithms in ELM and in kernel feature space also has scope. Recently, several researchers have worked on combining ELM with various control algorithms, solving problems with ELM and introducing better training models.

Comparative Analysis
Finally, a comparative table is developed based on the techniques discussed in this paper with respect to the algorithm applied, the parameter being optimised, the application, etc.
Comparative table (entries recoverable from the source):
GE-ELM: it can be used to make robust and efficient networks; the testing accuracy of GE-ELM is lower than that of E-ELM.
No. 3, 22, the grouping genetic algorithm (GGA): predicting variables = 92; it can be used to perform exact prediction for solar radiation.
From the above analysis, the research issues and findings in this field are summarised as follows. The Kernel Extreme Learning Machine Auto Encoder algorithm was developed to improve classification accuracy; the challenge lies in using the information contained in the output space effectively and in combining these algorithms into a unified multi-label learning framework with better performance. A hybrid machine learning model was devised to improve the global and local search capabilities; however, this method could not solve multi-objective optimization problems and was not deployed on a high-performance computing platform to reduce the computational burden. The UnderBagging ensemble method was designed to address class imbalance problems, but other types of ensemble methods remain to be evaluated for handling large imbalance problems more accurately and effectively. A major problem of ELM is that it may use a much larger number of hidden neurons than classical gradient descent techniques. In addition, the arbitrary initialization of input weights and hidden biases may affect ELM performance. ELM training can be treated as an optimization problem, which offers an opportunity to address these issues and improve generalization performance. However, accuracy, training speed, stability and network complexity remain the major issues faced by ELM in classification.
Machine learning techniques are devised to support different decision-making tasks, and intelligent classifiers are used to improve ELM and the classification accuracy of the data. Traditional neural network-based classifiers have been improved over time, but they remain prone to overfitting and local optima, and enhancing their performance with various ensemble methods is still an active topic. Recently, however, ELM has gained popularity for solving classification problems. Unlike the earlier classic gradient-based learning algorithms, which only work for differentiable activation functions and are prone to issues such as local optima, inappropriate learning rates and overfitting, ELM can handle non-differentiable activation functions and tends to arrive at a solution quickly without such difficulties. The motive is therefore to devise novel methods for classifying data and to improve ELM. The goal is to devise improved learning-based models that are able to reveal their reasoning for a given input case and to explain their predictions. ELM needs improvement, and this can be achieved through optimization [57]. This study indicates the need for real-time optimization [58] to improve ELM.

Conclusion
In this study, ELM applications and implementations have been reviewed. A large body of research has addressed the topic through various interesting algorithms, and the reported results demonstrate the efficiency, accuracy and ease of implementation of ELM and its variants in various fields. ELM has shown great advantages when compared with other state-of-the-art learning algorithms such as SVMs and deep learning algorithms.
Despite the many advantages of the extreme learning machine, some problems remain open to researchers. It has been observed that the classification boundary may not be optimal because the learning parameters of the hidden layer are assigned randomly. ELM cannot handle massive high-dimensional data, it needs more hidden nodes than conventional tuning algorithms, and it cannot be parallelized because of the pseudo-inverse calculation. A number of these difficult problems have, however, been tackled in hybridized and optimized variants of ELM.
From the review, we observe that the existing optimization techniques have certain drawbacks: some implement parameter optimization and feature selection separately, and some are complex. Hence, in future, we aim to develop an effective ELM with multi-objective optimization in order to overcome the drawbacks of the existing ELM techniques. Among the algorithms studied, KELM with PSO is recognized as the best one to support ELM. It has been applied to the fault diagnosis of power transformers, where PSO tunes the KELM parameters to boost KELM's performance. The KELM-based fault diagnosis of power transformers with PSO was compared with two other ELMs, the Back-propagation Neural Network (BPNN) and Support Vector Machines (SVM), on samples from Dissolved Gas Analysis (DGA). The evaluation is yet to be carried out for optimizers other than GA and PSO, and the performance on big data sets is yet to be tested.

Future Scope:
Several problems can be investigated in future work. To enhance ELM, the handling of high-dimensional data with ELM is an interesting topic for research. A further question is how to explain the effectiveness of the random mechanism. The connection between ELM and other related algorithms that adopt a random mechanism is also a highly interesting topic for study. From a theoretical perspective, the use of optimization to improve ELM can be justified, and combining ELM with other algorithms such as cuckoo search and invasive weed optimization is another interesting direction.