A Bagging Strategy-Based Kernel Extreme Learning Machine for Complex Network Intrusion Detection

Network intrusions can enter a network through informal channels. Some illegal users use Trojans and self-programmed attacks to subvert the network security system so that it loses its defense and alarm functions and hackers can steal internal information. Network intrusion seriously harms the security of network information and the legitimate rights of users. Therefore, a bagging strategy-based kernel extreme learning machine for complex network intrusion detection is presented in this paper. The method adopts a bagging strategy to train several sub-kernel extreme learning machines independently. The integrated gain of these machines is then measured with the margin distance minimization (MDM) criterion, and the machines with high gain are selected for selective integration, yielding a selectively integrated learner with strong generalization ability and high efficiency. An improved universal gravitation search algorithm is then used to optimize the kernel parameters. Meanwhile, an online update strategy for the sub-kernel extreme learning machines, based on incremental learning over batches of samples, is introduced so that the detection model can adapt to the changes of a complex network environment. Finally, experiments illustrate that the proposed method performs better in network intrusion detection in terms of detection accuracy and speed; in particular, for unknown network intrusion connection events, the response speed is fast and the false alarm rate is low.


Introduction
With the arrival of the era of big data, network information is facing more and more security threats. As an active security protection technology, network intrusion detection is expected to intercept and respond to intrusion before harming the network system, which has received extensive attention from many researchers [1][2][3].
Traditional intrusion detection [4][5][6] includes two detection types, namely misuse detection and anomaly detection. Misuse detection finds abnormal links in the network by establishing an intrusion rule base. Although its accuracy is high, it is powerless against new intrusion types and connections from variants of old viruses. Anomaly detection analyses network anomalies by summarizing the characteristics of normal network connections. Because this method also detects new attacks well, it has received wide attention.
Network intrusion detection based on automatic learning transforms network intrusion detection into a pattern recognition (classification) problem. These methods train a classifier (including SVM, ELM, DMT) to distinguish normal behavior from abnormal behavior in network connections [7][8][9]. Generally speaking, the training and testing time of a complex classifier is relatively long. Although a simple classifier has high processing efficiency, it may struggle to recognize network attack events with many features and complex intrusion modes. Some researchers hope to integrate the advantages of multiple learners to obtain better detection performance. Therefore, in recent years, intrusion detection methods based on integrated hybrid learning approaches have attracted wide attention.
An ensemble learning algorithm improves the generalization ability of the whole algorithm by integrating multiple sub-learners [10]. In theory, intrusion detection based on ensemble learning performs much better than intrusion detection based on a single learner. However, not every sub-learner is beneficial to the ensemble. If the better sub-learners can be selected for selective integration, the ensemble will perform better and detection efficiency will improve.
In a complex network environment, network connections are complex and changeable [11,12], and cyber-attacks emerge in an endless stream. Existing network intrusion detection methods suffer from slow recognition speed and low recognition rates for new intrusion modes. Since the kernel extreme learning machine (KELM) has the advantage of fast learning, we combine KELM with the bagging learning strategy and propose a network intrusion detection method based on selectively integrated KELM learning, named BL-KELM. BL-KELM selectively integrates a subset of sub-learners according to the gain each KELM contributes to the ensemble detector, and an improved universal gravitation algorithm (UGA) is used to optimize the kernel parameters. Meanwhile, the new method can update the model online as the network environment changes, which enhances the detection efficiency for known intrusion types and decreases the false alarm rate for unknown intrusion types. Finally, we carry out experiments on the traditional KDD99 data set and a manually built complex network physical simulation platform to verify the effectiveness of the proposed BL-KELM method.

Proposed network intrusion detection
Network intrusion detection is essentially a multi-variable pattern classification problem. Suppose the collected network connection data set is $D = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^{K}$ represents the i-th network connection data record, K is the feature dimension of the record, n is the number of collected samples, and $y_i$ denotes the corresponding label (connection type) of each record [13,14]. The intrusion detection model based on a single learner $G_j$ is then

$$\min_{G_j} \sum_{i=1}^{n} L\big(G_j(x_i), y_i\big) + \lambda \varphi(G_j) \qquad (1)$$

where $\varphi(\cdot)$ is the regular term that controls the complexity of the model and $\lambda$ is the corresponding weight coefficient. If the structure of $G_j$ has been determined, formula (1) is equivalent to obtaining an optimal classifier by adjusting the parameters of classifier $G_j$. The proposed BL-KELM method is based on the bagging learning strategy: first, a group of sub-learners with complementary abilities are trained in parallel, and then MDM-based selective integration is applied to the sub-learners. The flow diagram of the proposed method is shown in figure 1.
The output of an ELM with $\omega$ hidden-layer nodes can be written as $f(x) = h(x)\beta$, where $h(x)$ is the hidden-layer feature mapping and $\beta$ is the output weight matrix. Collecting all training samples gives $H\beta = T$, where $H$ is the hidden-layer output matrix and $T$ is the target (label) matrix of the training data set. The output weights can be obtained by solving the optimization problem (4):

$$\min_{\beta} \|H\beta - T\|^{2} \qquad (4)$$
Its minimum-norm solution is

$$\beta_{ELM} = H^{+} T \qquad (6)$$

where $H^{+}$ denotes the Moore-Penrose generalized inverse matrix of $H$. In order to improve the generalization performance of ELM, a regular term is usually added alongside the approximation error. Based on this principle, formula (4) can be adjusted to a general ELM model with a regular term [15], whose solution is

$$\beta = H^{T}\left(\frac{I}{\lambda} + HH^{T}\right)^{-1} T \qquad (8)$$

The parameter vector of the ELM output layer is thus given by formula (6) or (8).

The universal gravitation algorithm has the advantages of strong global optimization ability and a simple process [16,17]. It can be applied well to practical problems and shows its superiority on nonlinear problems; it has been successfully applied to many problems in engineering and other fields. However, like other intelligent algorithms, the gravitation algorithm can fall into local extrema and converge prematurely. To overcome these shortcomings, opposition-based learning (OBL) [18,19] is used to initialize the population of GSA so that the initial population is distributed more uniformly, and Tent chaos mapping is introduced to improve the diversity of the population and promote the exploration and exploitation of the GSA algorithm.
The steps of OBL generation of the initial population are as follows: 1) Determine the initial population size, randomly generate the initial population (the first group of candidate solutions), and calculate the fitness value of each individual; the opposite population forms the second group of candidate solutions, whose fitness values are also calculated. 2) The two groups of candidate solutions are sorted by individual fitness value, and the better half of the particles form the solution space of the problem, that is, the initial population of GSA.
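The OBL initialization steps above can be sketched as follows; this is a minimal sketch for a minimization problem, and the function name `obl_init` and the uniform random generation within bounds [low, high] are assumptions of the sketch:

```python
import numpy as np

def obl_init(fitness, pop_size, dim, low, high, seed=None):
    """Opposition-based initialization: generate a random population,
    form its opposite population (low + high - x per dimension), and
    keep the pop_size fittest individuals from the union."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(low, high, size=(pop_size, dim))  # first candidate group
    opposite = low + high - pop                         # second candidate group
    union = np.vstack([pop, opposite])
    order = np.argsort([fitness(x) for x in union])     # ascending fitness
    return union[order[:pop_size]]                      # better half
```

Keeping the better half of the doubled candidate pool is what makes the resulting GSA population cover the search space more uniformly than purely random initialization.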
The steps of Tent chaotic mapping for generating a chaotic sequence are as follows: 1) Normalize the optimal solution found so far to the interval (0,1) and take it as the initial value $x_0$, with k = 0. 2) Generate the chaotic sequence X by iteration, incrementing k at each step. 3) Stop iterating and save the sequence X when the maximum number of iterations is reached. 4) Inversely normalize the sequence X into the original solution space to obtain new solutions. 5) Compare the fitness values of the new and old solutions, keeping the better one. The improved universal gravitation search algorithm is used to select the best kernel parameters and output weights intelligently, so that the optimized KELM network retains its prediction accuracy.
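The five Tent-map steps above can be sketched as follows for a one-dimensional search variable; the Tent parameter alpha = 0.7 and the function names are illustrative assumptions:

```python
import numpy as np

def tent_sequence(x0, length, alpha=0.7):
    """Iterate the Tent map: x -> x/alpha if x < alpha,
    else (1 - x)/(1 - alpha); the orbit is chaotic in (0, 1)."""
    xs = np.empty(length)
    x = x0
    for k in range(length):
        x = x / alpha if x < alpha else (1.0 - x) / (1.0 - alpha)
        xs[k] = x
    return xs

def tent_refine(best, fitness, low, high, length=50, alpha=0.7):
    """Chaotic local search around the incumbent best solution."""
    x0 = (best - low) / (high - low)                 # step 1: normalize to (0,1)
    for c in tent_sequence(x0, length, alpha):       # steps 2-3: chaotic orbit
        cand = low + c * (high - low)                # step 4: inverse normalize
        if fitness(cand) < fitness(best):            # step 5: keep the better
            best = cand
    return best
```

Because the Tent orbit spreads across (0, 1), the candidates sample the whole solution interval rather than a small neighbourhood, which is what helps GSA escape local extrema.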

KELM classifier online updating incrementally
Because network intrusion patterns are complex and varied, a classifier model obtained only from historical network connection data sets finds it difficult to adapt to new intrusion patterns. Therefore, the obtained classifier model needs to be updated continually.
Lim et al. [20] proposed an online sequential ELM (OSELM) update algorithm based on recursive least squares. Inspired by this idea, we first train a set of KELM sub-learner models on historical data sets. Then, during actual network security monitoring, each KELM sub-classifier model is updated online whenever a new batch of samples with relatively explicit labels is collected (the labels can come from manual tagging or from the joint decision of multiple classifiers).
The main steps of updating a KELM sub-learner model online are as follows: 1) Train the initial model on the N initial samples, whose feature map matrix is $H_0$ and whose label matrix is $T_0$, according to the formulas above. 2) In the process of network intrusion detection, when a new batch of training samples $X_{new}$ arrives, $H_{new}$ denotes the kernel mapping of $X_{new}$, and the model parameters are updated with the new batch.
Assuming that h rounds of model updating have been carried out previously and the model parameters $G_k(X)$ have been obtained, the (k+1)-th KELM classifier is updated with the new batch of samples according to the online incremental updating criterion. In practice, as the number of batch updates increases, the matrix (vector) in the second term of equation (15) grows larger. To keep the computation effective, a forgetting mechanism can be added to discard samples with a long history: for example, a maximum sample size $N_{max}$ is set, and if the accumulated sample number exceeds $N_{max}$, the oldest samples are dropped.
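The batch-incremental update with forgetting can be sketched as follows. The paper's recursive criterion (15) is not reproduced here; instead this sketch re-solves a sliding-window KELM (an RBF kernel and the class/parameter names are assumptions) at each batch, which gives the same model as an exact incremental update over the retained window:

```python
import numpy as np

class OnlineKELM:
    """Sliding-window sketch of online KELM updating: new labelled
    batches are appended and, once the stored sample count exceeds
    n_max, the oldest samples are forgotten (forgetting mechanism)."""

    def __init__(self, gamma=0.5, reg=1.0, n_max=500):
        self.gamma, self.reg, self.n_max = gamma, reg, n_max
        self.X, self.T = None, None

    def _kernel(self, A, B):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-self.gamma * d)               # RBF kernel matrix

    def update(self, X_new, T_new):
        if self.X is None:
            self.X, self.T = X_new, T_new
        else:
            self.X = np.vstack([self.X, X_new])
            self.T = np.vstack([self.T, T_new])
        if len(self.X) > self.n_max:                 # drop oldest samples
            self.X = self.X[-self.n_max:]
            self.T = self.T[-self.n_max:]
        K = self._kernel(self.X, self.X)
        self.alpha = np.linalg.solve(K + np.eye(len(K)) / self.reg, self.T)

    def predict(self, X):
        return self._kernel(X, self.X) @ self.alpha
```

The window bound keeps the kernel matrix at size at most $N_{max} \times N_{max}$, so each update stays affordable regardless of how many batches have been seen.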

KELM sub-learner selective integration based on MDM (MDMSE)
Ensemble learning integrates several weak classifiers into a new strong classifier; such meta-algorithms include boosting and bagging. Boosting optimizes each newly created sub-learner by focusing on the samples misclassified by the existing classifiers, so it builds strong dependence between sub-learners; examples include AdaBoost, XGBoost and GBDT [21,22]. Bagging is bootstrap aggregation, where the bootstrap is a random re-sampling [23,24]. It keeps the sub-learners as independent of each other as possible, so they can be built concurrently with distributed computation. This method can therefore train a group of sub-learners independently and efficiently, giving the sub-learners complementary abilities [25,26,27].
In view of the concurrent learning advantage of the bagging strategy, it is adopted for sub-classifier learning in this paper. Meanwhile, to ensure that every kind of abnormal intrusion behavior can be detected, this paper proposes a margin distance minimization selective ensemble (MDMSE) algorithm to rank all sub-learners by gain. By selecting the partial set of learners with large gain as the final ensemble, the negative effect of weak learners on the final detection result is reduced. Selective ensemble learning refers to integrating, from all trained sub-learners, the individual learners with large diversity and strong generalization ability to obtain better performance. Studies show that a selective integration algorithm is superior to a single bagging or boosting algorithm. To avoid local optima and improve computational efficiency, the new selective integration algorithm MDMSE is proposed based on the MDM principle. Based on the MDM criterion, the method calculates the gain of each sub-learner towards the performance improvement of the ensemble. The KELM sub-learners with high gain are selected for partial integration to obtain a strong learner with high computational efficiency and strong generalization ability. Some basic definitions are given below to analyze the main principles of MDMSE.

EAI Endorsed Transactions Scalable Information Systems 08 2021 - 10 2021 | Volume 8 | Issue 33 | e8 (Shoulin Yin et al.)

Definition 1. Classifier eigenvector $C_t$.
Given a labeled data set D with n elements, the eigenvector $C_t$ of classifier $G_t$ is an n-dimensional vector whose i-th component is +1 if $G_t$ classifies the i-th sample correctly and -1 otherwise. If the i-th component of the average eigenvector $C$ is positive, the ensemble as a whole classifies the i-th sample correctly. Therefore, if the average eigenvector of a partial ensemble lies in the first quadrant of the n-dimensional space (that is, all components are positive), the ensemble classifies every sample of D correctly. The goal of this paper is to select an ensemble that is as small as possible but whose average eigenvector is as close as possible to a reference position in the first quadrant.

Definition 2. Target point O. The target position is the point O whose components all equal p:

$$O = (p, p, \ldots, p), \quad 0 < p < 1$$

Definition 3. Iterative selection of sub-classifiers based on integrated gain. The classifier whose addition most decreases the distance from the average eigenvector C to the target point O has the largest integrated gain. Therefore, the classifier selected in the u-th iteration is

$$s_u = \arg\min_{t} \; d\Big(O, \; \frac{1}{u}\Big(C_t + \sum_{k \in S_{u-1}} C_k\Big)\Big) \qquad (19)$$

where $d(v, u)$ is the Euclidean distance between points v and u, and $S_{u-1}$ is the set of classifiers already selected. This distance can be regarded as the distance gain of the current sub-learner: the smaller the distance, the greater the gain. Theoretically, p should be small enough that simple examples (those correctly classified by most sub-learners) quickly approach the value p, which lets the algorithm gradually focus on the harder-to-classify samples. By contrast, if p is close to 1, it attracts all instances similarly throughout the selection process, which reduces the effectiveness of the method. To sum up, the main steps of the proposed BL-KELM algorithm are given as algorithm 1.

Algorithm 1. BL-KELM.
Step 1. Input data set D; set the sub-learner number T and the selective integration number U. The final selective ensemble set ST is initialized as empty; the optimal position of the n-dimensional space is p.
Step 2. Based on the bagging mechanism, the data set D is sampled and divided into T+1 sub-data sets.
Step 3. T KELM sub-learners are trained with T data subsets.
Step 4. The gain of each sub-learner is calculated on the (T+1)-th data subset according to equation (19).
Step 5. It selects the sub-learner with the greatest gain and adds it to the set ST.
Step 6. Judge whether the number of learners in ST has reached U. If so, stop; otherwise, repeat Step 5.
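The selection loop of algorithm 1 can be sketched as follows; the function name and the greedy loop structure are an illustrative reading of equation (19), with each sub-learner represented by its signature (eigen)vector from Definition 1:

```python
import numpy as np

def mdm_select(preds, y, n_select, p=0.07):
    """Margin-distance-minimization selective ensemble (sketch).
    preds: (T, n) predicted labels of each sub-learner on the
    selection set; y: (n,) true labels; n_select: U in algorithm 1.
    Sub-learners are added greedily so the running average signature
    vector moves closest to the reference point O = (p, ..., p)."""
    C = np.where(preds == y, 1.0, -1.0)          # (T, n) signature vectors
    O = np.full(C.shape[1], p)                   # target point in 1st quadrant
    selected, remaining = [], list(range(C.shape[0]))
    running = np.zeros(C.shape[1])               # sum of selected signatures
    for u in range(1, n_select + 1):
        # distance gain per candidate: smaller distance = larger gain
        dists = [np.linalg.norm(O - (running + C[t]) / u) for t in remaining]
        best = remaining.pop(int(np.argmin(dists)))
        running += C[best]
        selected.append(best)
    return selected
```

Note that after the most accurate sub-learner is taken first, the criterion tends to favour candidates that complement the current selection rather than the next most accurate one, which is exactly the diversity effect MDMSE relies on.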

Experiment and analysis
The experiments mainly consist of two parts: (1) verifying the effectiveness of the proposed BL-KELM on the KDD99 data set, analysing the impact of different parameters on the performance and comparing the performance of BL-KELM with the state-of-the-art intrusion detection methods. (2) Testing the real-time performance and effectiveness of BL-KELM in the real intrusion detection types under a complex network environment.

KDD data set experimental results
KDD99 is a network connection data set simulated and collected at MIT Lincoln Laboratory for the advanced planning department of the U.S. defence department. It contains about 5 million network connection records, divided into a training set and a test set covering normal connections and 4 big categories (39 small classes) of abnormal intrusion types. The training set contains 22 kinds of abnormal attack types; the remaining 17 appear only in the test set (KDD99 provides a 10% test set) as unknown types, which are used to judge unknown intrusions and serve as an important basis for verifying the robustness of BL-KELM. Each connection record, with its label, has 42-dimensional features. In this experiment, the KDD99 data set is preprocessed first: the character features are converted into numerical features and the feature data are normalized. Then the processed training data set is randomly sampled to generate T+1 sub-data sets; the first T parts are used for learner training and the (T+1)-th part is used for the selection of sub-learners. Considering the cost of secondary manual examination and the serious harm caused by network intrusion, network intrusion detection should find as many abnormal connections as possible. In this paper, the accuracy rate (AR) and missing rate (MR) are used as the two evaluation indexes of the proposed method.
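The preprocessing described above (character features to numerical features, then normalization) can be sketched as follows; the helper name, the one-hot encoding choice and the min-max normalization are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

def preprocess(records, categorical_cols):
    """Convert character features to numeric one-hot codes and
    min-max normalize numeric features. `records` is a list of
    feature lists; the column indices in `categorical_cols` hold
    string values (e.g. protocol_type in KDD99)."""
    cols = list(zip(*records))
    out = []
    for j, col in enumerate(cols):
        if j in categorical_cols:                  # one-hot encode strings
            for v in sorted(set(col)):
                out.append([1.0 if c == v else 0.0 for c in col])
        else:                                      # min-max normalize numbers
            col = np.array(col, dtype=float)
            span = col.max() - col.min()
            col = (col - col.min()) / span if span else col * 0.0
            out.append(col.tolist())
    return np.array(out).T                         # rows = records
```

In practice the category vocabulary and per-feature min/max should be computed on the training split only and reused for the test split, so that test records map into the same feature space.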
AR is the proportion of all network connections that are classified correctly, and MR is the proportion of abnormal connections that are wrongly judged as normal. 2) The influence of parameter settings on the performance of BL-KELM.
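A sketch of computing the two evaluation indexes, assuming AR is the fraction of all connections classified correctly and MR the fraction of abnormal connections judged as normal (the `normal` label convention is an assumption of this sketch):

```python
import numpy as np

def ar_mr(y_true, y_pred, normal=0):
    """Accuracy rate (AR): fraction of all connections classified
    correctly. Missing rate (MR): fraction of abnormal connections
    wrongly judged as normal (missed alarms)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ar = float((y_true == y_pred).mean())
    abnormal = y_true != normal
    mr = float(((y_pred == normal) & abnormal).sum() / abnormal.sum())
    return ar, mr
```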
The parameters that affect the performance of BL-KELM are the number F of randomly extracted features used when bagging performs random sampling and the number U of integrated sub-learners. The effect of different F on performance is shown in table 2. As table 2 shows, F has little effect on the experimental results, mainly because the KDD99 sample sets used for training and testing are huge and the algorithm already generalizes well, so adjusting F alone does not help; for learners trained on small sample sets, however, adjusting F can greatly improve performance. The number of sub-learners determines the generalization performance of the ensemble, but too many sub-learners occupy too many resources. In this experiment, the initial number of KELM sub-learners is 100, and the final integration number is determined by selective learning. The impact of the final number of selectively integrated learners on intrusion detection performance is shown in table 3: beyond a certain integration number, performance changes little. Therefore, in the subsequent comparison experiments, this paper adopts 40 as the final integration number of selective ensemble learning.
3) Comparison experiments: The experiment trains multiple single-learner models and common ensemble algorithms from the Scikit-learn framework on the KDD99 data set. AR and MR are shown in table 4. Table 5 shows the time costs of different algorithms, including WPFH [28], NB [29], FDS [30], and IDBN [31]. BL-KELM adopts the optimal parameter settings from the previous section. To avoid errors caused by the instability of a single run, tables 4 and 5 report the averages of 30 independent repeated experiments. The results in table 4 show that the BL-KELM method effectively improves detection accuracy and greatly reduces MR as well as training and detection time. Table 5 shows that IDBN achieves good intrusion recognition accuracy but its training time is long, while BL-KELM combines a high correct detection rate with a low time cost.
The KDD99 test set contains 17 types of abnormal network connections that do not appear in the training set. This experiment further verifies the generalization of BL-KELM by counting the recognition performance of the single-learner algorithms and the common ensemble learning algorithms on these unknown types of intrusive connections. The experimental results are shown in table 6.

The performance of BL-KELM is further verified by building a network simulation platform that simulates real-life network connection requests and abnormal intrusion requests. In this simulation experiment, 30 physical terminal devices, a network server, seven routers and six switches are used to build a network physical simulation platform. The 30 physical terminal devices are divided into six sub-networks, each containing three Ethernet terminals, two wireless network terminals, one sub-network router and a switch. The three Ethernet terminals are connected to the wired network switch, which in turn is connected to the sub-network router; the two wireless terminals are connected to the sub-network router through Wi-Fi. The six subnets are connected to the attack server by a central router. On the attack server, the Ettercap software is used to launch simulated network attacks on the six sub-networks, including PROBE, U2R, DOS and R2L attacks. Tcpdump is used for network data monitoring on the 30 physical terminals, and network connection data are obtained by capturing packets; log information on the terminal devices is collected at the same time. The collected information, comprising 41 features, forms a simulation data set. Finally, the processed data set is sent to the BL-KELM detector in real time to judge whether each network connection request should be accepted.
Through a week of data collection, 30,000 network requests are filtered and counted every day, of which 9,000 are attack connections, including 1,000 unknown attacks. The average detection results are shown in table 7. As can be seen from table 7, the BL-KELM method performs well in network intrusion detection: in particular, for unknown network intrusion connections its recognition rate exceeds 98%, and the average response time stays within 0.1 s, satisfying the effectiveness and real-time requirements of network intrusion detection.

Conclusion
The proposed BL-KELM uses MDMSE to calculate the integration gain of each KELM sub-learner. Partial selective integration is performed by selecting the KELM sub-learners with high gain. BL-KELM not only has the efficient learning characteristics of KELM but also the generalization ability of the bagging ensemble algorithm. It shows good real-time performance in both training and testing and can detect abnormal network connections in time. The experimental results show that BL-KELM can effectively detect various known and unknown intrusion types on both the public KDD99 data set and the manually built complex hybrid network physical simulation platform, while keeping very low false alarm and missing alarm rates.