A Deep Learning Based DDoS Detection System in Software-Defined Networking (SDN)

Distributed Denial of Service (DDoS) is one of the most prevalent attacks that an organizational network infrastructure comes across nowadays. We propose a deep learning based multi-vector DDoS detection system in a software-defined network (SDN) environment. SDN provides flexibility to program network devices for different objectives and eliminates the need for third-party vendor-specific hardware. We implement our system as a network application on top of an SDN controller. We use deep learning for feature reduction of a large set of features derived from network traffic headers. We evaluate our system based on different performance metrics by applying it on traffic traces collected from different scenarios. We observe high accuracy with a low false-positive for attack detection in our proposed system.


Introduction
Distributed denial of service (DDoS) attacks results in unavailability of network services by continuously flooding its servers with undesirable traffic. Low-price Internet subscriptions and readily available attack tools led to a vast increase in volume, size, and complexity of these attacks in the recent past. According to the forecast of Cisco Visual Networking Index (VNI) [1], DDoS incidents will reach up to 17 million in 2020, a threefold increment compared to 2015. The nature of attacks has also changed to being multi-vector rather than having a single type of flooding. A study reported that 64% attacks until mid-2016 were multi-vectors that include TCP SYN floods and DNS/NTP amplification combined together [10]. Adversaries or hacktivists use DDoS attacks for extortion, revenge, misguided marketing, and online protest. Many financial, public sector, media, and social entertainment sites are recent victims [9,2,11] and suffered from monetary and reputation damages. Therefore, detection and mitigation of these attacks in real-time have become a prime concern for large organizations.
Recently, both software-defined networking (SDN) and deep learning (DL) have found several useful and interesting applications in the industry as well as the research community. SDN provides centralized management, global view of the entire network, and programmable control plane; makes network devices flexible for different applications. These features of SDN offer better network monitoring and enhanced security of the managed network compared to traditional networks [30,33]. On the other hand, DL based approaches outperformed existing machine learning techniques when applied to various classification problems. They improve feature extraction/reduction from a high-dimensional dataset in an unsupervised manner by inheriting the non-linearity of neural networks [31]. Researchers have also started to apply DL for the implementation of various intrusion detection systems and observed desirable results discussed in Section 2. In this work, we implement a DDoS detection system that incorporates stacked autoencoder (SAE) based DL approach in an SDN environment and evaluate its performance on a dataset that consists of normal Internet traffic and various DDoS attacks.
The organization of this paper is as follows. Section 2 discusses related work on DDoS detection in an SDN environment and use of DL for network intrusion detection. Section 3 gives an overview of SDN and SAE. In Section 4, we discuss the architecture of our proposed system. Section 5 presents experimental set-up and performance evaluation of the system. Finally, Section 6 concludes the paper with future work directions.

Related Work
We discuss the related work from two perspectives, one in which DL has been used for network intrusion detection and the other in which DDoS detection is addressed in an SDN environment.

Intrusion detection using DL
In [29], Mostafa et al. used deep belief network (DBN) based on restricted Boltzmann machine (RBM) for feature reduction with support vector machine (SVM) as a classifier to implement a network intrusion detection system (NIDS) on NSL-KDD [34] intrusion dataset. In [13], Ugo et al. used discriminative RBM (DRBM) to develop a semi-supervised learning based network anomaly detection and evaluated its performance in an environment where network traffic for training and test scenarios were different. They used real-world traffic traces and KDD Cup-99 [4] intrusion dataset in their implementation. In [14], Gao et al. used RBM based DBN with a neural network as a classifier to implement an NIDS on KDD-Cup 99 dataset. In [17], Kang et al. proposed an NIDS for the security of in-vehicular networks using DBN and improved detection accuracy compared to previous approaches. In [26], we implemented a deep learning based NIDS using NSL-KDD dataset. We employed self-taught learning [28] that uses sparse autoencoder instead of RBM for feature reduction and evaluated our model separately on training and test datasets. In [21], Ma et al. proposed a system that combines spectral clustering (SC) and sparse autoencoder based deep neural network (DNN). They used KDD-Cup99, NSL-KDD, and a sensor network dataset to evaluate the performance of their model.

DDoS detection in SDN environment
In [12], Braga et al. proposed a light-weight DDoS detection system using self-organized map (SOM) in SDN. Their implementation uses features extracted from flow-table statistics collected at a certain interval to make the system light-weight. However, it has limitation in handling traffic that does not have any flow rules installed. In [15], Giotis et al. combined an OpenFlow (OF) and sFlow for anomaly detection to reduce processing overhead in native OF statistics collection. As the implementation was based on flow sampling using sFlow, false-positive was quite high in attack detection. In [20], Lim et al. proposed a DDoS blocking application (DBA) using SDN to efficiently block legitimately looking DDoS attacks. The system works in collaboration with the targeted servers for attack detection. The prototype was demonstrated to detect HTTP flooding attack. In [24], Mousavi et al. proposed a system to detect DDoS attacks in the controller using entropy calculation. Their implementation depends on a threshold value for entropy to detect attacks which they select after performing several experiments. The approach may not be reliable since threshold value will vary in different scenarios. In [35], Wang et al. proposed an entropy based light-weight DDoS detection system by exporting the flow statistics process to switches. Although the approach reduces the overhead of flow statistics collection in the controller, it attempts to bring back the intelligence in network devices.
In contrast to the discussed work, we use SAE based DL model to detect multi-vector DDoS attacks in SDN. We use a set of large number of features extracted from network packet headers and then use DL to reduce this set in an unsupervised manner. We apply our model on traffic dataset collected in different environments. The proposed system attempts to detect attacks on both the SDN control plane and the data plane, and is implemented completely on the SDN controller.

Background Overview
We discuss SDN and SAE before describing our DDoS detection system.

Software-Defined Networking (SDN)
As discussed earlier, the SDN architecture decouples the control plane and data plane from network devices, also termed as 'switches', and makes them simple packet forwarding elements. The decoupling of control logic and its unification to a centralized controller offers several advantages compared to the current network architecture that integrates both the planes tightly. Administrators can implement policies from a single point, i.e. controller, and observe their effects on the entire network that makes management simple, less error-prone, and enhances security. Switches become generic and vendor-agnostic. Applications that run inside a controller can program these switches for different purposes such as layer 2/3 switch, firewall, IDS, load balancer using API offered by a controller to them [19]. Figure 1a shows the SDN architecture with its different planes and applications. Switches, end hosts, and communication between them form the data plane. The controller is either a single server or a group of logically centralized distributed servers. The controller can run on commodity hardware and communicates with switches using standard APIs called southbound interfaces. One of the de facto standards for southbound interfaces is OpenFlow (OF) protocol [22]. The controller servers communicate with each other using east-westbound interfaces. Network applications communicate with the controller using northbound interfaces.
The controller and switches exchange various types of messages using OF protocol over either a TLS/SSL encrypted or open channel. These messages set-up switch connection with the controller, inquire network status or manage traffic flows in the network. Switches have flow tables for flow rules that contain match-fields, counters, and actions to handle traffic flows in the network. SDN defines flow as a group of network packets that have same values for certain packet header fields. The controller installs flow rules for traffic flows based on the policies dictated by the network applications.
Flow rule installation takes place in switches either in reactive mode or proactive mode. The reactive mode works as follows. When a packet enters a switch, it looks up for a flow rule inside its flow tables that matches with the packet headers. If a rule exists for the packet, the switch takes an action that may involve packet forwarding, drop or header modification. If a table-miss  happens, i.e., there are no flow rules for an incoming flow, the switch sends a packet in message to the controller that encapsulates packet headers for the incoming flow. The controller extracts packet headers from the received message and sends a packet out or flow mod message to switches for the received packet's flow. The controller installs flow rules inside switches using flow mod messages and switches perform actions on subsequent packets of the installed flow, without forwarding them to the controller. Figure 1b demonstrates the reactive mode set-up in SDN. Flow rules may expire after a certain time to manage the limited memory size of switches. The controller does not install rules in switches, instead, it instructs them to forward the packet from a single or multiple port(s) using packet out messages. In contrast, the controller pre-installs flow rules into switches in proactive mode.

Stacked Autoencoder (SAE)
Stacked Autoencoder (SAE) is a DL approach that consists of stacked sparse autoencoders and softmax classifier for unsupervised feature learning and classification, respectively. We discuss sparse autoencoder before SAE. A sparse autoencoder is a neural network that consists of three layers in which the input and output layers contain M nodes, and the hidden layer contains N nodes. The M nodes at the input represent a record with M features, i.e., X = {x 1 , x 2 , ..., x m }. For the training purpose, the output layer is made an identity function of the input layer, i.e.,X = X shown in Figure 2a. The sparse autoencoder network finds optimal values of weight matrices, U ∈ N ×M and U ∈ M ×N , and bias vectors, b 1 ∈ N ×1 and b 1 ∈ M ×1 while trying to learn an approximation of the identity function, i.e.X ≈ X using back-propagation algorithm [25]. Many different functions are used for activation of hidden and output nodes, we use Sigmoid function, g(z) = 1 1+e −z , for the Eqn. 2 represents the cost function for optimal weight learning in sparse autoencoder. It is minimized using back-propagation. The first term in the RHS represents an average of sum-ofsquare errors for all the input values and their corresponding output values for all r records in the dataset. The second term is a weight decay term with λ as the decay parameter to avoid over-fitting. The last term is a sparsity penalty term that puts constraint on the hidden layer to maintain low average activation values and expressed using Kullback-Leibler (KL) divergence shown in Eqn. 3: where ρ ∈ {0, 1} is a sparsity constraint parameter and β controls the sparsity penalty term. The KL(ρ ρ j ) becomes minimum when ρ =ρ j , whereρ j is the average activation value of a hidden unit j over all the training inputs.
Multiple sparse autoencoders are stacked with each other in a way that the outputs of each layer is fed into the inputs of the next layer to create an SAE. Greedy-wise training is used to obtain optimal values of the weight matrices and bias vectors for each layer. For illustration, the first layer, g, on raw input x is trained to obtain U , U , b 1 , b 1 . The layer g encodes the raw input, X, using U and b 1 . Then, the encoded values are used as inputs to train the second layer to obtain parameters V , V , b 2 , b 2 shown in Figure 2b. This process goes further until the last hidden layer is trained. The output of last hidden layer is fed into a classifier. Finally, all layers of SAE are treated as a single model and fine-tuned to improve the performance of the model shown in Figure 2c.

Implementation of DDoS Detection System
In an SDN, attacks can occur either on the data plane or control plane. Attacks on the former are similar to traditional attacks and affect a few hosts. However, attacks on the latter attempt to bring down the entire network. In this second kind of attack, adversaries fingerprint an SDN for flow installation rules and then send new traffic flows, resulting in flow table-misses in the switch [32]. This phenomenon forces the controller to handle every packet and install new flow rules in switches that consume system resources on the controller and switches. In our previous work [27], we empirically evaluated the impact of SDN adversarial attacks on network services. In the current work, we implement a DDoS detection system as a network application in SDN to handle attacks for both cases. The detection system consists of three modules as shown in Figure 3: i) Traffic Collector and Flow installer (TCFI), ii) Feature Extractor (FE), and iii) Traffic Classifier (TC). It should be emphasized here that to minimize false-positives, our system relies on every packet for flow computation and attack detection instead of sampling flows using some tools such as sFlow. Output the packet to desired port

Traffic Collector and Flow Installer (TCFI)
The TCFI module runs concurrently with the FE and TC modules which are triggered using a timer function. It examines OF message type for an incoming packet at the controller. A message type determines the reason for a packet's arrival which is either due to a flow table-miss or an installed flow rule that forwards a packet towards the controller and desired physical ports. The TCFI extracts various header fields from a packet to identify its flow. A flow in TCP or UDP traffic is a group of packets having same values for protocol type, source and destination IP addresses, and source and destination port numbers. An ICMP flow has similar header fields, except for port numbers it has ICMP message type and code. The TCFI extracts few more header fields from a packet that help in features extraction from flows. It stores all of these extracted headers in a list for every packet coming to the controller. Table 1 shows the headers for TCP, UDP, and ICMP traffic that the TCFI extracts. It performs this task when a packet arrives due to pre-installed flow rules. However, when a packet arrives due to a flow table-miss, it performs following tasks in addition to the one mentioned above. It looks up a symmetric flow corresponding to the packet's flow in the flow list. Two flows are symmetric for TCP or UDP traffic if the source IP address and port number of one flow are similar to the destination IP and port number of the other, and vice-versa. For ICMP traffic, two flows are symmetric if they are request and response types. If a symmetric flow exists for an incoming flow, then it installs forwarding rules for both of them in SDN switches and removes the symmetric one from the list. The rules include an action that forwards packets to desired physical ports and the controller for the incoming and its symmetric flows. The reason for installing rules only for symmetric flows is built on the assumption that attackers, in general, spoof their IP addresses to prevent responses towards them from victims. Therefore, the TCFI installs flow rules for legitimate traffic and avoids any flow table saturation attacks in switches. If it does not find any symmetric flow for an incoming packet, it looks up whether a flow already exists in the list for the same. If a flow exists, it forwards the packet from switches without installing any rules. Otherwise, it adds the packet's flow in the list and then forwards it. Algorithm 1 shows various steps involved in the TCFI. Although the algorithm appears similar to maximum entropy detector in [23], i) it considers flow in general instead of flags based ii) a packet arrives at the controller either due to a table-miss or a forwarding rule towards the controller. iii) it stores packet headers for each packet arrives at the controller.

Feature Extractor (FE) and Traffic Classifier (TC)
The detection system triggers the FE module using a timer function. The FE takes packet headers from the packets list populated by the TCFI and extracts features from them for a set interval and resets the packet list to store headers for the next interval. Table 2, 3, and 4 show the list of 68 features that the FE extracts for TCP (34), UDP (20), and ICMP (14) flows, respectively. We derived this feature set after detailed literature survey and use SAE to reduce it. The FE computes these features for all hosts in a network which has incoming traffic flows for that particular interval. Although we perform computations on all packets in the network, we extract features by grouping them in flows. The FE computes median for a number of bytes and packets per flow in feature # 9-12, 43-46, and 63-67. It computes the entropy, H(F ), for feature # 8, 14, 16, 18, 20, 42, 48, 50, 54, 62, and 68 which is defined as follows: where set F ={f 1 , f 2 , ..., f n } denotes the frequency of each distinct value. Once the FE extracts these features, it invokes the TC module implemented using SAE. It classifies traffic in one of the eight classes which includes one normal and seven types of DDoS attack classes based on TCP, UDP or ICMP vectors that adversaries launch either separately or in combinations.

Experimental Set-up, Results, and Discussion
To evaluate our system, we collected network traffic from a real network and a private network testbed. We discuss them along with the performance evaluation results.

Experimental Set-up
We used a home wireless network (HWN) connected to the Internet for normal traffic collection. The HWN comprised of around 12 network devices including laptops and smartphones. These devices were not uniformly active for all the time which led to variation in the traffic intensity. We saved HWN traffic of 72 hours in a Linux system using tcpdump [7] and port mirroring at a Wi-Fi access point. Fraction of TCP incoming flows with RST flag set 32 Fraction of TCP outgoing flows with RST flag set 33 Fraction of TCP incoming flows with PUSH flag set 34 Fraction of TCP outgoing flows with PUSH flag set     [16] that consists of an SDN controller, an OF switch, and a network host using Ubuntu Linux systems. We used POX [6], a Python based controller, with our DDoS detection application running on it in the controller system and installed OpenvSwitch [5] in the switch to use it as an OF switch. In the host system, we used tcpreplay [8] to replay traffic traces for normal and attack traffic one at a time. We saved features computed by the FE for each interval in dataset files for the training of TC module. We set the interval 60s in the timer function to trigger the FE module for feature extraction. We divided the dataset files into training and test datasets. Table 5 shows the distribution of records in the dataset. Traffic features in the datasets are real-valued positive numbers. We normalize them using max − min normalization shown in Eqn. 5, before passing them to the TC module.
x min = smallest value in X x max = largest value in X

Results
We evaluated the performance of our system on the datasets specified in Table 5 using parameters including accuracy, precision, recall, f-measure, ROC. We use confusion matrix to calculate precision, recall, and f-measure. A confusion matrix, M , is an N × N matrix where N is the number of classes. It represents the actual and predicted classes in such a way that columns are labeled for actual classes, and rows are labeled for predicted classes for all records. The diagonal elements of the matrix represent the true-positive (TP) for each class, sum of the matrix elements along a row excluding the diagonal element represents the number of false-positive (FP) for a class corresponding to that row, sum of the matrix elements along a column excluding the diagonal element represents the number of false-negative (FN) for a class corresponding to that column. Following are definitions of various performance parameters: • Accuracy (A): percentage of accurately classified records in a dataset A = Accurately classif ied records T otal records × 100 (6) • Precision (P): number of accurately predicted records over all predicted records for a particular class. Using the confusion matrix, M , precision for each class, j, can be defined as follows: • Recall (R): number of accurately predicted records over all the records available for a particular class in the dataset. Using the confusion matrix, M , recall for each class, j, can be defined as follows: • F-measure (F): It uses precision and recall for the holistic evaluation of a model and is represented as the harmonic mean of them. For each class, j, it is defined as follows: • Receiver Operating Curve (ROC): It helps in visualizing a classifier's performance by plotting the true-positive rate against false-positive rate of the classifier. The area under the ROC gives an estimate of an average performance of the classifier. Higher the area, greater is the performance. We used the training dataset to develop the SAE classification model for the TC module and test dataset for performance evaluation. First, we developed the model for 8-class traffic classification including normal and seven kinds of DDoS attack that occur in combination with TCP, UDP, and ICMP based traffic. To make a better comparison, we also developed separate attack detection models with soft-max and neural network (NN) which are building blocks of SAE. As observed from Table 6, the SAE model achieved better performance compared to the soft-max and neural network model in terms of accuracy. We computed precision, recall, f-measure for each traffic class. Figure 5 shows their values which are derived from the confusion matrix shown in Figure 4. As seen from   Table 6: Classification accuracy comparison among soft-max, neural network, and SAE based models the figure, the model has f-measure value above 90% for normal, TCP, UDP, and UDP with ICMP attacks traffic. It has comparatively low values of f-measure for TCP with ICMP and TCP with UDP attacks due to their classification of other kinds of attacks as observed from the Figure 4. However, it is observed from the same figure that the fraction of their classification as normal traffic is less than 0.2. Figure 6 shows the ROC curve for 8 different classes. From the figure, we observe that the true positive rate is above 90% with a false-positive rate of below 5% for all kinds of traffic that results in the area under the ROC curve close to unity. We evaluated our model for 2-class classification by considering all kinds of DDoS attacks as a single attack class to make a comparison with other works. Due to the unique nature of this work involving deep learning based attack detection in an SDN, and unavailability of existing literature in this specific domain, it was difficult to compare our work with other works. Figure 8 shows the performance for 2-class classification. The model achieved detection accuracy of 99.82% with f-measure values as 99.85% and 99.75% for normal and attack classes, respectively, derived from the confusion matrix shown in Figure 7. On the contrary, the two closely related works [12] and [24], achieved a detection accuracy of 99.11% in data plane and 96% in control plane, respectively. It should be noted that both of these works did not address attack detection in the other plane.
We also measured the computational time for training and classification in our model using a machine with Intel (R) Core i7 CPU @ 3.40 GHz processor and 16 GB RAM running Matlab 2016a on Windows 7. Table 7 shows computational time for the training of 81,010 records and classification of 34,717 records specified in Table 5.

Discussion
With our DDoS detection system, we identify individual DDoS attack class and also determine whether an incoming traffic is normal or attack. A clear advantage in identifying each attack traffic type separately is enabling the mitigation technique to block only a specific type of traffic causing the attack, instead of all kinds of traffic coming towards the victim(s). Although we implemented a detection system, we separately extracted features for each host which has incoming traffic for an interval. Therefore, we can identify the hosts with normal traffic and the ones with attack traffic. Accordingly, the controller can install flow rules inside the switches to block the traffic for a particular host if it undergoes an attack.
Our proposed system has a few limitations in terms of processing capabilities. The TCFI and FE modules collect every packet to extract features and are implemented on the controller for low false-positive in detection. However, this approach may limit the controller's performance in large networks. We can overcome it by adopting a hybrid approach that can either use flow sampling or individual packet capturing based on the observed traffic in the organizational network. Another approach that could be employed to handle DDoS attacks in the data plane is to deploy the TCFI and FE modules in another host, send all packets to it instead of the controller for features processing, and then periodically notify the controller with extracted features for the TC module. To reduce the time in feature extraction, we can also apply distributed processing similar to our another previous work [18].

Conclusion
In this work, we implemented a deep learning based DDoS detection system for multi-vector attack detection in an SDN environment. The proposed system identifies individual DDoS attack class with an accuracy of 95.65%. It classifies the traffic in normal and attack classes with an accuracy of 99.82% with very low false-positive compared to other works. In the future, we aim to reduce the controller's bottleneck and implement an NIDS that can detect different kinds of network attacks in addition to DDoS attack. We also plan to use deep learning for feature extraction from raw bytes of packet headers instead of feature reduction from the derived features in future NIDS implementation.