Attacker Capability based Dynamic Deception Model for Large-Scale Networks

Modern cyber networks need continuous monitoring to stay secure and available to legitimate users. Attackers use reconnaissance to collect critical network information and then use that information to plan advanced cyber-attacks. To thwart both the reconnaissance and the resulting attack plan, the defender needs a state-of-the-art defense strategy. In this paper, we model a dynamic deception system (DDS) that not only thwarts reconnaissance but also steers the attacker towards a fake network, where the attacker reaches a fake goal state. Our model also captures the attacker's capability using a belief matrix, which is a joint probability distribution over security states and attacker types. Experiments conducted on a prototype implementation of our DDS confirm that the defender can decide whether to spend or save resources based on the attacker type while still thwarting the reconnaissance mission.


Introduction
The growing number of cyber networks and network devices has drawn attention to making both the devices and the networks more resilient. The static nature of these networks lets an attacker perform reconnaissance and identify potential vulnerabilities. Through reconnaissance, the attacker collects critical network information such as the network topology, open ports, the services running on those ports, and unpatched vulnerabilities. Having this information increases the probability of penetrating the network and gaining access to critical infrastructure. Methods and techniques have been developed to detect and mitigate such attacks. One of them is to patch a vulnerability as soon as a patch becomes available. However, it is typically time consuming and costly for a vendor to discover a vulnerability and develop the patch. This delay leaves the network operating with the vulnerability unpatched, which is very risky. To address this issue, one needs to develop an active form of cyber defense that not only secures the network but also maintains system availability for trusted users.
Developing such a system is difficult in the presence of active attackers in an operational system, because an attacker can inject a series of exploits simultaneously. To model these complexities, researchers have proposed graph-based tools such as attack trees and attack graphs. Unfortunately, attack graphs can be enormous even for a medium-size network, which makes them difficult to apply in an enterprise network. To address this issue, the authors in [5] made an assumption on the attacker's behavior, named monotonicity, which states that the success of a previous exploit will not interfere with the success of a future exploit. With this assumption, a significant amount of information can be removed from the attack graph while keeping the graph useful.
Md Ali Reza Al Amin et al.
It is always beneficial for a defender to have prior information on how an attacker can infiltrate the network. This knowledge helps the defender take appropriate actions to thwart a cyber-attack. One difficulty with this approach is quantifying the attacker's progression at any given time. The attacker's status changes constantly in response to the defender's actions. In addition, the defender has little information about the attacker's actual strategy and actions. The defender only has access to a stream of noisy security alerts from an intrusion detection system (IDS), and these alerts suffer from a high false-alarm rate. The defender's actions also affect system availability while maintaining the security of the network, so the defender needs to make a trade-off between the availability cost and the security cost.
In this paper, we aim to deceive the attacker with a fake network while maintaining the trade-off between availability and network resiliency. To do so, we use an exploit dependency graph to capture the attacker's progression through the network. We represent the exploit dependency graph as a hypergraph in which nodes represent security conditions and directed hyperedges represent exploits. Each security condition is either true or false; a true condition means that the attacker possesses a certain capability. At any given time and security state, the attacker uses certain capabilities to exploit a vulnerability. We incorporate these attacker capabilities in our model to capture different kinds of attacker behavior. Capturing attacker capabilities helps the defender decide whether to save resources or commit more of them to prevent the attacker from reaching the goal. For example, a TCP reset can block a novice attacker from penetrating further, but it will not work against a more knowledgeable attacker. This is why it is critical for the defender to learn the attacker's capability while planning a countermeasure. To do so, the defender maintains a belief matrix, which is a joint probability distribution over security states and attacker types. We define attacker types based on different levels of attacker knowledge, aggression, and stealthiness. The belief matrix is constructed so that it is consistent with all information available to the defender, such as security alerts and previously deployed actions. The defender summarizes this information to choose an optimal action that balances the trade-off between security and availability. We cast the selection of the optimal action as a partially observable Markov decision process (POMDP).
To resolve the scalability issue due to the high dimensionality of the defense problem, we use an online algorithm based on the partially observable Monte-Carlo planning [17], which simulates future possible state trajectories from the current belief to evaluate the effectiveness of various defender actions.
We use Software Defined Networking (SDN) to deceive the attacker with a mix of true and false information during the reconnaissance phase. This crafted information changes the attacker's view of the network. Using SDN, the defender can analyze malicious traffic and separate a trusted user's traffic from a malicious user's traffic. In this way, a trusted user receives seamless service while the defender is engaging the attacker.
As our main contribution in this paper, we develop a dynamic deception system (DDS) to deceive the attacker with the fake network while capturing attacker capabilities and maintaining the trade-off between availability cost and security cost. The key contributions of this paper are summarized below,

Related Work
Researchers have proposed cyber deception approaches that introduce fake networks by varying system characteristics [38], manipulating the attacker's probes [36,37], and introducing virtual network interface controllers and route mutation [39]. These approaches focus on introducing fake nodes from the attacker's point of view, and they assume a static environment and static attacker and defender strategies.
In [38], the authors introduce systems that change the view of a cyber network by obscuring some system characteristics, whereas [36] alters the system view by manipulating the attacker's probes. Trassare et al. [37] use the traceroute function to deceive network reconnaissance. Dunlop et al. [42] propose a mechanism that hides IPv6 packets to achieve anonymity; they add a virtual network interface controller and share a secret with all hosts. To defend against eavesdropping and DoS attacks, Duan et al. [39] use a route mutation technique to change the network's data flows. Reconnaissance tools such as Nmap or Xprobe2 collect critical network information, such as a host's OS or services, by analyzing the packets received in response to a probe.
In recent publications [29], [30], and [31], the authors present systems that perform dynamic address space randomization based on Software Defined Networking (SDN). These approaches are effective, but they suffer from high network overhead and cannot detect the source of malicious scanning, which our dynamic deception system can.
The authors in [40] propose a defense system based on IP address randomization and the placement of network decoys. Their system considers only scanners from the Internet, whereas ours considers insider scanners as well as malicious scanners from the Internet.
All these cyber deception approaches change the network view from the attacker's point of view. However, none of them answers the question of what happens if the attacker enters a network where unpatched vulnerabilities are present and patches have not yet been released. A network administrator cannot simply let the attacker compromise the system. As mentioned earlier, the attacker always has a time advantage over an unpatched vulnerability when the vulnerability exposure window is long. The defender has to take defensive action while making a trade-off between availability and security. Our approach not only changes the network view but also influences the attacker to take a path toward fake networks while keeping availability and security at a satisfactory level.
In [10], the authors use the dependency graph to provide solutions for the cyber defense system. The issue with that approach is network availability for trusted users: the attacker always starts from the same static network, and the defender has to take actions (system modifications that induce blocked vulnerabilities) that have a significant impact on availability. We address this issue by introducing fake networks alongside the real networks. Our approach helps the defender maintain network service availability and collect critical intelligence about the attacker.

Exploit Dependency Graph
Attack trees and attack graphs were developed so that one can study all possible sequences of exploits that an intruder can take to infiltrate a network and reach its goal state(s). An attack graph consists of vertices (system states) and edges (transition relations), where vertices are connected to each other via exploits. To generate an attack graph, one has to enumerate all system states, so the attack graph grows exponentially. Attack graphs have several applications in network security, such as vulnerability analysis, intrusion alert correlation, and attack response systems. They can be applied in both penetration testing and network hardening.
Significant progress has been made in generating attack graphs automatically [1], [2], [3]. Attack graphs grow exponentially with the network size, which makes it nearly impossible for a human to understand them visually. To deal with this complexity, [4] proposed a system that reduces the attack graph information without loss of generality and creates a graph that grows quadratically. The authors in [5] made an assumption regarding the attacker's behavior that simplifies the attack graph and reduces the attack information. This assumption, named monotonicity [5], states that the success of one exploit does not interfere with the attacker's future ability to exploit. With this assumption, one does not need to enumerate all security states; instead, one can create an exploit dependency graph describing how security conditions relate to exploits. The advantage of the exploit dependency graph is that it can be generated easily for a large network, for which the corresponding attack graph would be prohibitively large to generate. In [5], the authors construct a graph, termed the exploit dependency graph, in which nodes represent security conditions and edges represent exploits. Security conditions are atomic facts that can be either true or false, and exploits relate to security conditions via preconditions and postconditions. We adopt an approach similar to that of Ammann et al. [5] to model attack pathways using an exploit dependency graph. The edges of the exploit dependency graph relate security conditions in such a way that a single exploit might have multiple preconditions and multiple postconditions. We call such edges, which connect two sets of nodes rather than a pair of nodes, hyperedges.
The security conditions in [5] are a mix of two kinds of attributes: initial conditions, which are true under normal network configurations, and attack conditions, which can be made true during an attack. As a consequence, the initial conditions are always true whether or not the network is under attack. For this reason, we take a slightly modified definition from [6], which does not explicitly include conditions representing the normal network configuration and instead assumes that the set of conditions consists solely of attack conditions. This modification allows all conditions to be false for a network that has not been subject to an attack.
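As a minimal sketch of the structure described above (not the authors' implementation), an exploit can be stored as a hyperedge with a set of preconditions and a set of postconditions, and a successful exploit only ever adds conditions, which is the monotonicity assumption. The class name `Exploit`, the helper `apply_exploit`, and the condition names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exploit:
    """A directed hyperedge: a set of preconditions leading to a set of postconditions."""
    name: str
    pre: frozenset   # conditions that must already be true
    post: frozenset  # conditions made true when the exploit succeeds


def apply_exploit(state: frozenset, exploit: Exploit) -> frozenset:
    """Return the set of true conditions after a successful exploit.

    Monotonicity: conditions are only added, never removed.
    """
    if not exploit.pre <= state:
        raise ValueError(f"preconditions of {exploit.name} are not satisfied")
    return state | exploit.post


# Hypothetical three-exploit graph; the names are illustrative only.
e1 = Exploit("e1", frozenset(), frozenset({"c1"}))              # no preconditions
e2 = Exploit("e2", frozenset({"c1"}), frozenset({"c2", "c3"}))  # one precondition, two postconditions
e3 = Exploit("e3", frozenset({"c2", "c3"}), frozenset({"goal"}))

state = frozenset()  # a network that is not under attack: every condition is false
for exploit in (e1, e2, e3):
    state = apply_exploit(state, exploit)
```

Because only attack conditions are stored, the empty set represents a network that has not been attacked, which matches the modified definition adopted in this section.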

POMDP Approach
A partially observable Markov decision process (POMDP) is a process that connects unobserved system states to observations. A POMDP combines a Markov decision process (MDP), which models the system dynamics, with a hidden Markov model. The reward in a POMDP depends on the agent's actions and the sequence of system states, where the agent cannot see the system state directly and instead receives an observation. Based on the observations, the agent constructs a belief state, which is a probability distribution over system states. Based on the belief state, the agent selects the optimal action. The advantage of the POMDP is that it is general enough to model many kinds of real-world problems, such as robot navigation, cybersecurity, machine maintenance, and planning under uncertainty.
EAI Endorsed Transactions on Security and Safety 04 2019 - 08 2019 | Volume 6 | Issue 21 | e2
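A discrete POMDP can be formally described as a 7-tuple $(S, A, T, R, \Omega, O, \gamma)$, where $S$ is the set of states, $A$ is the set of actions, $T$ is the set of state transition probabilities, $R$ is the reward function, $\Omega$ is the set of observations, $O$ is the set of observation probabilities, and $\gamma \in [0, 1]$ is the discount factor. At each time step, the system is in some state $s \in S$, and for the action $a \in A$ taken by the agent, the system transitions from state $s$ to $s' \in S$ with probability $T(s' \mid s, a)$. At the same time, the agent receives an observation $o \in \Omega$ with probability $O(o \mid s', a)$. Finally, the agent receives the reward $R(s, a)$. The ultimate goal is to choose, in each belief state, an action that maximizes the expected future discounted reward,
$$E\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right],$$
where $r_t$ is the reward received at time $t$.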
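The following sketch illustrates how a belief state can be maintained for a small discrete POMDP with the standard Bayes filter, $b'(s') \propto O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)$, together with the discounted return. It is an illustration only: the two-state model, the probabilities, and the function names are assumptions, not values from this paper.

```python
def belief_update(belief, action, observation, T, O):
    """One Bayes-filter step for a discrete POMDP.

    belief: dict mapping state -> probability
    T[(s, a)]: dict mapping next_state -> T(s' | s, a)
    O[(s_next, a)]: dict mapping observation -> O(o | s', a)
    """
    unnormalized = {}
    for s_next in belief:
        # Prediction: probability of landing in s_next before seeing the observation.
        predicted = sum(belief[s] * T[(s, action)].get(s_next, 0.0) for s in belief)
        # Correction: weight by how likely the observation is in s_next.
        unnormalized[s_next] = O[(s_next, action)].get(observation, 0.0) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: p / total for s, p in unnormalized.items()}


def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t over a finite reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Hypothetical two-state example (all numbers are illustrative assumptions).
T = {
    ("safe", "wait"): {"safe": 0.8, "compromised": 0.2},
    ("compromised", "wait"): {"compromised": 1.0},
}
O = {
    ("safe", "wait"): {"alert": 0.1, "quiet": 0.9},
    ("compromised", "wait"): {"alert": 0.7, "quiet": 0.3},
}
prior = {"safe": 1.0, "compromised": 0.0}
posterior = belief_update(prior, "wait", "alert", T, O)
```

In this toy model a single alert raises the belief in the compromised state from 0 to 0.14 / 0.22, because an alert is seven times more likely in the compromised state than in the safe state.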

POMCP Framework
In large, fully observable domains, Monte-Carlo Tree Search (MCTS) has shown strong performance in online planning [6]. MCTS is a new approach to online planning. It overcomes the curse of dimensionality by sampling states instead of considering all possible system states. MCTS requires only a black-box simulator, which makes it applicable to problems that are too complicated or too large to represent as explicit probability distributions. It also has an advantage with respect to prior domain knowledge: MCTS uses random simulations to estimate the long-term reward of a state, so it plans over a long horizon and is often effective even when no prior domain knowledge or search heuristic is available [7].
The authors in [6] extended MCTS to partially observable environments (POMDPs). Other planning algorithms, e.g., value iteration [8], suffer from two important issues, referred to as scaling and history. For example, for a problem with n states, value iteration works with an n-dimensional belief state, and it must evaluate a number of histories that is exponential in the horizon.
The search algorithm in [6] builds a search tree of histories online. Each node of the search tree estimates the value of a history using Monte-Carlo simulation. The start state of each simulation is sampled from the current belief state, and transitions and observations are sampled from the black-box simulator. The authors in [6] showed that, given a correct belief state, the planning algorithm converges to the optimal policy for any finite-horizon POMDP. Monte-Carlo simulation can also be used to update the agent's belief state [6]. An important feature of the Partially Observable Monte-Carlo Planning (POMCP) algorithm is that it uses the same set of Monte-Carlo simulations for both the tree search and the belief state update.
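The Monte-Carlo belief update mentioned above can be sketched as a particle filter with rejection sampling: a state is drawn from the current particles, pushed through the black-box simulator, and kept only if the simulated observation matches the real one. This is a simplified stand-alone version; POMCP itself reuses the simulations already run during tree search. The simulator `toy_simulator`, its state encoding, and the parameter `max_tries` are assumptions made for illustration.

```python
import random


def particle_belief_update(particles, action, observation, simulator,
                           n_particles, rng, max_tries=100_000):
    """Approximate the next belief state by rejection sampling."""
    new_particles = []
    tries = 0
    while len(new_particles) < n_particles:
        if tries >= max_tries:
            raise RuntimeError("too few simulations matched the observation")
        tries += 1
        state = rng.choice(particles)                      # sample from the current belief
        next_state, sim_obs, _reward = simulator(state, action, rng)
        if sim_obs == observation:                         # keep only consistent samples
            new_particles.append(next_state)
    return new_particles


def toy_simulator(state, action, rng):
    """Hypothetical generative model: the state counts compromised hosts (0..3)."""
    next_state = min(state + 1, 3) if rng.random() < 0.5 else state
    obs = "alert" if next_state > state else "quiet"       # noiseless alert, for simplicity
    reward = -float(next_state)
    return next_state, obs, reward


rng = random.Random(0)
particles = [0] * 50 + [1] * 50   # belief: 0 or 1 compromised hosts, equally likely
updated = particle_belief_update(particles, "wait", "alert", toy_simulator, 100, rng)
```

Since the toy simulator emits an alert only when the attacker advances, every particle that survives an observed alert has moved forward by one host. The simulator is only ever called as a black box, so no explicit transition or observation matrix is needed.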

Threat Model and Assumptions
The model is based on a single attacker who is trying to penetrate the network, and we aim to capture this attacker's capability. A security model that ignores the attacker's capability either wastes resources or commits too few. Based on the attacker's capability, the defender blocks vulnerabilities to thwart the attacker and drive the attacker towards the fake network. The defender blocks exploits by modifying the system, and these modifications affect normal system operation. This is why the defender needs to estimate the attacker's true capability. For a novice attacker, it may be sufficient to apply a countermeasure rather than block a vulnerability. In our previous paper [9], we assumed the attacker capabilities; in this paper, we incorporate attacker capabilities into the dynamic security model presented in Fig. 1. The model has two primary objectives: 1) quantifying the security state and 2) taking the optimal deception action based on the attacker's capabilities. To quantify security, we define the security state as the current level of attacker progression. To capture the attacker's progression, we use an exploit dependency graph [10], represented as a hypergraph $H$. When a condition is true, the attacker has a particular set of capabilities, whereas a false value means that the attacker does not possess that condition of the hypergraph $H$. For example, possessing a condition could mean that the attacker has built a trust relationship between two hosts or has reached the goal state. To specify the goal state, we define the goal nodes $N_r^g$ and $N_f^g$ of the real and fake networks, respectively. The real goal node is the node the defender wants to protect from the attacker: the defender's main objective is to protect $N_r^g$ and drive the attacker towards $N_f^g$.
Each exploit (hyperedge) $e_i$ has two sets of conditions: its preconditions $N_i^-$ and its postconditions $N_i^+$. Following [10], we assume that to attempt an exploit $e_i$, the attacker must have enabled all of the exploit's preconditions $c \in N_i^-$. Exploits without any preconditions, $N_i^- = \emptyset$, are termed initial exploits and denoted by $E_0$. To attempt an initial exploit, the attacker does not need any prior (maliciously enabled) capabilities. When an exploit attempt succeeds, all of its postconditions become enabled, which lets the attacker penetrate further into the network. In Fig. 2, we present an exploit dependency graph created with the Topological Vulnerability Analysis (TVA) tool [11], which we use to explain the model and the results. An enabled condition means that the attacker has a particular capability, and the current security state $s_t$ describes the set of capabilities of the attacker. A security state $s \subseteq N$, where $N$ is the set of conditions, is called feasible if for every condition $c_j \in s$ there exists at least one exploit $e_i = (N_i^-, N_i^+)$ such that $c_j \in N_i^+$ and $N_i^- \subseteq s$. The set $S = \{s_1, s_2, \ldots, s_n\}$ of feasible security states is the state space of the model. In this model, we assume that the defender acts first, taking actions that interfere with the attacker's progression and reduce the attack surface. The security state evolves probabilistically as a function of the defender's and the attacker's actions [10]. We also assume that the defender can take actions that block vulnerabilities, such as changing the network configuration or shutting down a port or an active service. In reality, however, the defender cannot block an individual vulnerability; as noted in [10], a defender's action induces a set of blocked vulnerabilities. Moreover, some defender actions do not block any vulnerability. To capture this behavior, we assume that the defender has a fixed set of actions.
Some of these actions block vulnerabilities and influence the attacker to choose a different network path. We therefore assume that the defender can change the network configuration on the fly, in response to the attacker's actions, to prevent vertical movement. The defender's action space is $U = \{u^0, u^1, u^2, \ldots, u^n\}$. Here, $u^0$ is the defender's null action, which means that the defender does not block any exploit. The remaining actions in $U$ are network changes that each induce a set of blocked exploits. Each such action influences the attacker to seek the paths that remain available. A defender's action also affects the availability of the system to trusted users, so the defender's goal is to make a trade-off between network availability and network security. To capture this behavior, we assign a cost to each defender action. Based on the cost, the defender can choose an action that limits the attacker's progression through the network while minimizing the negative impact on system availability.
The single attacker trying to infiltrate the system can only increase its capabilities by exploiting more vulnerabilities, which also increases its chance of being detected. The defender's goal is to prevent the exploitation of vulnerabilities on the real network and to allow exploitation on the fake network. From the monotonicity assumption, once the attacker enables a condition, it remains enabled. For a given security state $s_t$, the attacker has a set of available exploits $E(s_t)$, from which the attacker attempts exploits based on its capabilities. The set of available exploits is defined by Eq. (1), which applies to both the real and the fake network:
$$E(s_t) = \{\, e_i = (N_i^-, N_i^+) : N_i^- \subseteq s_t,\ N_i^+ \not\subseteq s_t \,\}. \quad (1)$$
That is, two requirements must be satisfied for an exploit $e_i = (N_i^-, N_i^+)$ to be available: (1) $N_i^- \subseteq s_t$, i.e., all of the exploit's preconditions must be satisfied; and (2) $N_i^+ \not\subseteq s_t$, i.e., the exploit's postconditions must not all be satisfied [10].
An available exploit that does not lie within the set of blocked exploits is attempted, and succeeds, with the probabilities defined by Eq. (3,5). In the example of Fig. 3, only two of the attempted exploits succeed, and the updated security state $s_t$ consists of the conditions they enable (green circle). In the figure, the double-circled shaded shapes represent the security state.
The strategy the attacker takes depends solely on the attacker's capability. To model attacker types, we assume that the attacker is one of the types in the set $\Phi = \{\varphi_1, \varphi_2, \ldots, \varphi_n\}$. Each attacker type $\varphi_i \in \Phi$ has conditional attack probabilities (CAP) over the exploits. The CAP depends on the defender's action, the set of available exploits, and the attacker's capabilities.
For a given security state $s_t$ and defense action $u_t$, the CAP is defined for each real-network exploit $e_k$ and, similarly, for each fake-network exploit. Dividing the set of available exploits into these two categories helps us understand how the attacker changes its attack strategy. When the defender does not block any exploits, the attacker attempts the available exploits with the probability defined by the CAP. An attempted exploit succeeds with a probability that depends on certain parameters, called the attack success probability (ASP). To block vulnerabilities, the defender chooses an action $u \in U$. The attacker always tries to build a set of available initial exploits from the reconnaissance state in order to penetrate the network. For any given exploit, on the real network and similarly on the fake network, there is a probability of success. As soon as exploit attempts succeed, all their postconditions are enabled, which forms the updated security state, as shown in Fig. 3. The defender lacks information about the current security state and the attacker's true strategy, but both can be learned from noisy security alerts.
In the next section, we describe how the defender uses that information to construct the belief from the security alerts of the Intrusion Detection System (IDS). These security alerts are mixed with false positives and false negatives, and it is important for the defender to tell them apart in order to choose better defense actions. To model the defender's observations of the security state, we take the approach from our previous paper [9], described below. The IDS is a major component of this model because the defender's certainty about the security state depends on the security alerts. The IDS generates security alerts sequentially as the attacker attempts exploits and progresses through the network. These security alerts are not free from noise, which takes the form of false positives and false negatives.
Sometimes there is no alert for an exploit activity, which depends solely on the attacker's capability (stealthiness); this is termed a false negative. Similarly, the IDS may generate an alert for legitimate user activity, termed a false positive. It is critically important for the defender to know that exploit activity is going on. Based on the alerts, the defender chooses a defensive action to drive the attacker towards the deployed fake networks. Filtering noisy alerts out from true alerts is an important factor in improving the defender's efficiency in real time. In this work, we consider only known vulnerabilities. Several techniques exist for correlating alerts with exploit activity [12], [13], [14]. We do not focus on alert correlation; rather, we assume that the defender can perform it. Let $Z = \{z_1, z_2, \ldots, z_n\}$ and $Z' = \{z'_1, z'_2, \ldots, z'_n\}$ be the finite sets of security alerts for the real and fake networks, respectively, generated by the IDS; together they form the defender's observation set. Each alert from the real and fake node sets can be generated by the IDS, so the possible alert combinations are given by the power sets of $Z$ and $Z'$. The vector of security alerts received by the defender at time $t+1$, denoted by $y_{t+1} \in \Upsilon = \{0,1\}^{n}$, consists of all security alerts triggered during the given iteration [10].
To capture the uncertainty over the security state and the attacker type, we construct a belief matrix denoted by $\pi_t$, also called the information state [15]. It combines all information available to the defender, which includes the initial belief, the attacker type, the history of all defense actions from time $0$ to $t-1$, and all observations (security alerts) from time $0$ to $t$, denoted by $h_t = (\pi_0, u_0, y_0, \ldots, u_{t-1}, y_t)$. The belief matrix represents the joint probability distribution over security states and attacker types [10], with entries
$$\pi_t^{ij} = P(S_t = s_i,\ \varphi = \varphi_j \mid h_t), \qquad \pi_t \in \Delta(S \times \Phi),$$
where $\Delta(S \times \Phi)$ is the space of probability distributions over the state and type space $S \times \Phi$. Each row of the matrix corresponds to a security state and, once normalized, gives a probability mass function over the type space for that state; each column corresponds to an attacker type and, once normalized, gives a probability mass function over the space of security states for that type [10]. The defender updates the matrix whenever new information arrives, consisting of the current defense action $u_t$ and the observation vector $y_{t+1}$. For any defense action $u_t = u$ and observation $y_{t+1} = y_k$, the belief update is defined as $\pi_{t+1} = [T_j(\pi_t, y_k, u)]_{s_j \in S}$, where $T$ is the update function and $T_j(\pi_t, y_k, u) = P(S_{t+1} = s_j \mid U_t = u, Y_{t+1} = y_k, \Pi_t = \pi_t)$ is given by [8],
$$T_j(\pi_t, y_k, u) = \frac{\sum_{i} \pi_t^{i}\, p_{ij}^{u}\, r_{jk}^{u}(s_i)}{\sum_{l} \sum_{i} \pi_t^{i}\, p_{il}^{u}\, r_{lk}^{u}(s_i)}.$$
Here $\pi_t^{i}$ is the belief assigned to state $s_i$, $p_{ij}^{u}$ is the transition probability from state $s_i$ to $s_j$ under defense action $u$, and $r_{jk}^{u}(s_i)$ is the probability that the IDS generates observation vector $y_k$ when transitioning from state $s_i$ to $s_j$ under defense action $u$. The belief update, Eq. (8), defines the trajectory of beliefs based on the security alerts (observations) and the series of actions. Under a defense action $u$, the transition probability from $s_i$ to $s_j$ is governed by a set of exploit events. For the available set of exploits from Eq. (1), each exploit event is binary (successful or unsuccessful).
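To make the belief matrix concrete, the sketch below stores a joint distribution over three security states and two attacker types and derives the quantities a defender would read from it: the marginal over states, the marginal over types, and the type distribution conditioned on a state. The matrix values and function names are illustrative assumptions, and the sketch treats the matrix strictly as a joint distribution whose entries sum to one.

```python
def marginals(belief):
    """belief[i][j] = P(state i, type j). Returns (P(state), P(type))."""
    state_marginal = [sum(row) for row in belief]
    type_marginal = [sum(col) for col in zip(*belief)]
    return state_marginal, type_marginal


def type_given_state(belief, i):
    """Normalize row i to obtain P(type | state i)."""
    row_total = sum(belief[i])
    if row_total == 0.0:
        raise ValueError("state has zero probability under the belief")
    return [p / row_total for p in belief[i]]


# Hypothetical belief: rows are security states, columns are attacker types
# (for example, novice and expert). The entries sum to 1.
belief = [
    [0.30, 0.10],
    [0.15, 0.25],
    [0.05, 0.15],
]
state_marginal, type_marginal = marginals(belief)
expert_given_s1 = type_given_state(belief, 1)[1]
```

In this example the two attacker types are equally likely overall, but once the system is believed to be in the second state, the expert type becomes the more likely one (0.625), which is the kind of signal the defender can use when deciding how many resources to commit.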
The belief update procedure is a controlled Markov chain in which the control is the defender's action [10]. The majority of POMDP planning methods rely on Bayes' theorem [16]. For a large-scale cyber network, even a single Bayes update can be computationally infeasible. To plan efficiently for a large-scale POMDP, we adopt the model described in [17] to approximate the belief state.
As mentioned earlier in this section, this model is based on a single attacker trying to penetrate the network; handling multiple attackers would require extending the model. For example, if two attackers are in the network and the defender tries to deceive both, our model lets the defender deceive only one attacker at a time. Two attackers may appear at different locations in the network at the same time, and because our model is state based, it would need to be extended to handle two or more attackers simultaneously. Another factor is that the defender cannot block a single vulnerability; rather, a defender's action induces a set of blocked vulnerabilities. This is another reason why our model does not support multiple attackers.
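The state evolution described in this section can be sketched as a one-step generative simulator: compute the available exploits from the current security state, skip those blocked by the defender's action, attempt each remaining exploit with its attack probability, and enable the postconditions of those that succeed. This is a simplified illustration rather than the authors' implementation. It assumes that blocked exploits are never attempted, and the exploit names, probability tables, and function names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Exploit:
    name: str
    pre: frozenset   # preconditions
    post: frozenset  # postconditions


def available_exploits(state, exploits):
    """Exploits whose preconditions all hold and whose postconditions do not all hold."""
    return [e for e in exploits if e.pre <= state and not e.post <= state]


def step(state, exploits, blocked, p_attack, p_success, rng):
    """Sample the next security state for one time step.

    blocked:   names of exploits blocked by the defender's action (assumption: never attempted)
    p_attack:  name -> probability that the attacker attempts the exploit
    p_success: name -> probability that an attempted exploit succeeds
    """
    next_state = set(state)
    for e in available_exploits(state, exploits):
        if e.name in blocked:
            continue
        attempted = rng.random() < p_attack[e.name]
        if attempted and rng.random() < p_success[e.name]:
            next_state |= e.post   # monotonicity: conditions are only added
    return frozenset(next_state)


# Hypothetical graph: e2 leads into the real network, e3 into the fake one.
exploits = [
    Exploit("e1", frozenset(), frozenset({"c1"})),
    Exploit("e2", frozenset({"c1"}), frozenset({"c2_real"})),
    Exploit("e3", frozenset({"c1"}), frozenset({"c3_fake"})),
]
always = {"e1": 1.0, "e2": 1.0, "e3": 1.0}
state = frozenset({"c1"})
names = [e.name for e in available_exploits(state, exploits)]
next_state = step(state, exploits, {"e2"}, always, always, random.Random(1))
```

With the real-network exploit blocked and all probabilities set to one, the only remaining path leads to the fake condition, which mirrors how a defense action steers the attacker towards the fake network.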

Defender's Action
As the attacker progresses through the network, the defender takes actions in real time to limit the attacker's progression. Action selection can be improved if the defender has some domain knowledge beforehand. To provide this domain knowledge, we introduce a utility function. Before taking any defensive action, it is also necessary to measure its impact on the availability cost and the security cost.

Utility Function
The attacker builds an array of node utilities based on the base score metrics for exploiting vulnerabilities [18]. For every exploit, the attacker uses these metrics to estimate the attack success probability, as illustrated in Eq. (13); this serves as the attacker's initial knowledge about the network and its vulnerabilities. The defender creates the same utility array. From [18], we borrow the impact (I) and exploitability (V) metrics to define the defender's utility.
The terms used in these metrics are defined as CI = ConfImpact, II = IntegImpact, AI = AvailImpact, I = Impact, V = Exploitability, AC = AccessComplexity, AU = Authentication, and AV = AccessVector. The utility array function is defined in terms of these metrics. Example 1: Consider a scenario with five nodes in which the attacker sends scan queries to the neighbors of node 1.
The defender needs to respond to the scan queries deceptively by mixing true and false information at random. Here, nodes 2 and 3 are real nodes and nodes 4 and 5 are fake nodes, with vulnerabilities $vul(n_2)$, $vul(n_3)$, $vul(n_4)$, and $vul(n_5)$, respectively. The defender wants to drive the attacker towards nodes 4 and 5. Assume that, using the utility array function, the defender obtains the values $U_a(n_2) = 15$, $U_a(n_3) = 5$, $U_a(n_4) = 30$, and $U_a(n_5) = 50$. A rational attacker will go after node 5.
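The metric names used above (ConfImpact, IntegImpact, AvailImpact, AccessVector, AccessComplexity, Authentication) match the CVSS version 2 base metrics, so the sketch below assumes that [18] refers to CVSS v2 and computes its impact and exploitability subscores and base score. The paper's own utility array function is not reproduced here; the second part only replays Example 1, where a rational attacker picks the node with the highest utility. The function names are hypothetical.

```python
def cvss2_impact(conf, integ, avail):
    """CVSS v2 impact subscore."""
    return 10.41 * (1 - (1 - conf) * (1 - integ) * (1 - avail))


def cvss2_exploitability(access_vector, access_complexity, authentication):
    """CVSS v2 exploitability subscore."""
    return 20 * access_vector * access_complexity * authentication


def cvss2_base(impact, exploitability):
    """CVSS v2 base score, rounded to one decimal place."""
    f = 0.0 if impact == 0 else 1.176
    return round(((0.6 * impact) + (0.4 * exploitability) - 1.5) * f, 1)


# CVSS v2 weights: complete C/I/A impact = 0.660; network access vector = 1.0;
# low access complexity = 0.71; no authentication required = 0.704.
impact = cvss2_impact(0.660, 0.660, 0.660)
exploitability = cvss2_exploitability(1.0, 0.71, 0.704)
base = cvss2_base(impact, exploitability)

# Example 1: utilities of the neighbours of node 1 as seen by the attacker.
utilities = {2: 15, 3: 5, 4: 30, 5: 50}
target = max(utilities, key=utilities.get)   # a rational attacker maximizes utility
```

Because nodes 4 and 5 are fake, the defender in Example 1 achieves the deception goal simply by making the fake nodes look more rewarding than the real ones.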

Cost Function
In cyber deception, the defender can trade off the availability cost against the security cost. There are two benefits when the attacker is in the fake network: 1) the defender can collect as much intelligence on the adversary as possible, which helps derive the attacker's capabilities, intentions, and targets; and 2) the defender can maximize network availability for trusted users during a cyber-attack. Each action the defender takes to drive the adversary towards the fake network incurs an availability cost $c_a$. Some defense actions have no impact on availability, while others have a significant impact. To formalize this notion, we define the availability cost $c_a : U \to \mathbb{R}$ for each defense action and, similarly, the security cost $c_s : S \times U \to \mathbb{R}$, which represents the cost incurred while the system is in a given security state under defense action $u$. Here, we measure the availability of a node by the end-to-end packet delay (considering an IT system).

End-to-End Packet Delay
Let $D$ and $N$ denote the total delay and the number of devices between a source and a destination. The end-to-end delay is defined in [19] as

$$D = N\,(d_{proc} + d_{trans} + d_{prop} + d_{queue} + d_{proco}) \qquad (14)$$

where $d_{proc}$ is the processing delay, $d_{trans}$ the transmission delay, $d_{prop}$ the propagation delay, $d_{queue}$ the queuing delay, and $d_{proco}$ the processing overhead caused by authentication, integrity, and confidentiality. For an uncongested enterprise network, $d_{queue} \simeq 0$, and the distance between a source and a destination node is so small that $d_{prop} \simeq 0$. The processing delay $d_{proc}$ is often negligible; however, it strongly influences a router's maximum throughput, which is the maximum rate at which a router can forward packets [24]. Eq. (14) can therefore be reduced to its dominant terms, where $d_{trans} = L/R$, $L$ is the packet size, and $R$ is the transmission rate. For every defense action, the defender measures the total end-to-end packet delay, so the availability cost in terms of delay is defined as $c_a(u) = D$. We assign a higher cost to the goal conditions (the attacker's target node), because the defender's goal is to keep the attacker from reaching them. The total cost for a security state and a defense action is a weighted combination of the security cost and the availability cost. The weighting factor $f$ determines which cost receives more emphasis ($f = 0$ means the defender is concerned only with security cost, $f = 1$ means the defender is concerned only with availability cost). The proposed online deception algorithm, which is based on an existing online solver [9], computes the optimal action from a deception standpoint, deceiving the attacker with the fake network while balancing availability and security cost.
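The delay-based availability cost and the weighted total cost can be sketched as follows. This is a minimal illustration under two stated assumptions that the text does not confirm: the reduced delay keeps only the transmission delay L/R and the processing overhead, and the total cost is the convex combination (1 - f) * security + f * availability, which matches the stated behavior at f = 0 and f = 1. All function and parameter names are hypothetical.

```python
# Illustrative sketch under assumptions; not the paper's exact formulas.

def end_to_end_delay(n_devices: int, packet_bits: float, rate_bps: float,
                     overhead_s: float = 0.0) -> float:
    """Assumed reduced delay: N * (L / R + processing overhead), in seconds."""
    transmission_delay = packet_bits / rate_bps  # d_trans = L / R
    return n_devices * (transmission_delay + overhead_s)


def total_cost(security_cost: float, availability_cost: float, f: float) -> float:
    """Assumed convex combination weighted by f in [0, 1]."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    return (1.0 - f) * security_cost + f * availability_cost


# Example: 4 devices, 12,000-bit packets, 1 Mbit/s links, 1 ms overhead per device.
delay = end_to_end_delay(4, 12_000, 1_000_000, overhead_s=0.001)
cost = total_cost(security_cost=10.0, availability_cost=delay, f=0.5)
```

Setting f closer to 1 makes the defender prefer actions that add little delay, while f closer to 0 makes the defender block exploits regardless of their effect on legitimate traffic.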

Dynamic Deception System
In our dynamic deception system (DDS), we deploy fake networks alongside the real networks to deceive the attacker and drive the attacker towards the fake network while the attacker is still in the real network. With this approach, the defender can reduce the availability cost of securing the cyber network. To deploy the fake network and make it look like the real network, we use software-defined networking (SDN). The core of our DDS consists of SDN flow rules generated by our SDN controller, which works with a deception server to make the network traffic look different from what it actually is. The DDS consists of five components: a) an SDN controller, which generates the flow rules dynamically and controls the network traffic; b) a deception server, which manipulates network traffic, imitates virtual network resources based on the user policy, and runs the online deception algorithm; c) a delay handler, which keeps the bandwidth balanced between the real and fake networks so that the attacker cannot distinguish them; d) an IDS alert correlation server, which correlates alerts with exploit activity; and e) SDN network elements, which control and analyze the network traffic after receiving the flow rules from the SDN controller. When a packet arrives at an SDN switch connected to our system, the SDN controller generates a flow rule in accordance with our fake network. Each packet is tagged and then sent either to the deception server or to its destination. When a packet is sent to the deception server, the reply is crafted in accordance with the fake network, and an artificial delay is added to keep the timing consistent. When a packet is sent to the real network, an artificial delay is likewise added to the reply so that the real and fake networks remain consistent.
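The tagging and forwarding decision described above can be sketched in a few lines. This is a hypothetical illustration, not the POX controller code of the system: the `Decision` type, the `dispatch` function, and the tag values are all assumed names, and packets are reduced to their destination address.

```python
# Hypothetical sketch of the per-packet decision; not the actual controller code.
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    next_hop: str  # where the switch should send the packet
    tag: str       # "fake" or "real", attached to the packet


def dispatch(dst_ip: str, fake_hosts: set) -> Decision:
    """Send traffic for fake hosts to the deception server, the rest onward."""
    if dst_ip in fake_hosts:
        return Decision(next_hop="deception_server", tag="fake")
    return Decision(next_hop=dst_ip, tag="real")


fake_hosts = {"10.0.0.4", "10.0.0.5"}
to_fake = dispatch("10.0.0.4", fake_hosts)
to_real = dispatch("10.0.0.2", fake_hosts)
```

In both branches the reply would still pass through the delay handler, so that timing alone does not reveal which branch was taken.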
For a very large network, the deception server could become a bottleneck because it may receive a large number of requests. To handle this issue, the deception server can be replicated so that each replica handles a bounded number of requests. Our system is implemented in Python. We use the POX framework [22] to implement the SDN controller and the Scapy framework [23] to implement the deception server. We use Mininet [21], a state-of-the-art SDN network emulator, to test our implementation. Fig. 4 presents an architectural overview of our DDS. In the next sections, we briefly describe its components.

Online Deception Algorithm
For the online deception algorithm, we adopt the approach of our previous paper [9], described below.
The online defense algorithm is a heuristic search algorithm that determines defense actions in real time as the attacker progresses through the network and security alerts are generated. Scalability is achieved via a sample-based online algorithm that exploits the structure of the security model to enable computation in large-scale domains. After employing defense actions (e.g., blocking a vulnerability), the defender can evaluate the improvement by assessing the attacker's attack path. For a large network, computing the optimal action while deceptively interacting with the attacker is a challenge. Offline POMDP solvers aim to compute the optimal action for each belief state before runtime. Although such solvers have become more efficient [24], computing the optimal action can be intractable for large networks. To resolve this issue, Silver and Veness [6] developed an online algorithm, Partially Observable Monte-Carlo Planning (POMCP), which handles large-scale networks while computing the optimal action. Online methods interleave the computation and execution (runtime) phases of a policy, yielding a much more scalable approach than offline methods.
The POMCP algorithm builds on the POMDP framework [24]. POMCP has two types of nodes: belief nodes, which represent a belief state, and action nodes, which are the children of a belief node and are reached by taking an action. In this work, the action selection procedure is the same as in the POMCP algorithm described in [6], and the belief update procedure is based on [10], which addresses the large observation space problem. In POMCP, a belief state is updated when a sampled observation matches the real-world observation, but for a large observation space a sampled observation rarely matches the real one. The modified belief update procedure presented in Algorithm 1 checks whether each incoming alert $z_i$ belongs to $Z(s)$, the set of alerts that exploit activity can generate in security state $s$. Alerts are generated whenever an attacker attempts an exploit. Alerts not in $Z(s)$ cannot be generated by exploit activity for that security state. We refer to those alerts as false alerts for the defender.
An agent begins the simulation by calling a generative model, which provides a sample successor state, observation, and cost given a state and action, $(s', z, c) \sim \mathcal{G}(s, u)$. The modified belief update procedure is given in Algorithm 1, where the particles are state-action pairs. The search tree of histories is constructed by calling the generative model and successively sampling from the current belief. Monte-Carlo Tree Search (MCTS) uses Monte-Carlo simulation to evaluate search tree nodes [25]. In the search tree, nodes represent histories, and the branches leaving a node represent possible future histories, which arise from the partial observability of the underlying process. A simple version of MCTS uses a greedy tree policy in the first stage of the simulation, selecting the action with the highest value. The UCT algorithm [26] improves this greedy action selection stage. In the search tree, each action is selected using UCB1 [27], with each state viewed as a multi-armed bandit, to balance exploration and exploitation. The UCT algorithm allows domain knowledge [26] to be used to initialize new nodes. We use the utility array function as our initial domain knowledge, which is refined over further simulation runs. The problem of choosing the optimal action for the defender while interacting with the attacker is cast as a POMDP, where $0 < \gamma < 1$ is the discount factor and $c(b_t, u_t, \varphi)$ represents the cost under attacker type $\varphi$ for belief state $b_t$ when action $u_t$ is selected from the action space $\mathcal{U}$; $c(b_t, u_t)$ denotes the corresponding cost summed over the belief $b_t$.
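The UCB1-based action selection mentioned above can be sketched as follows. This is a minimal illustration, not the solver of [28]: because the defender minimizes cost, the sketch subtracts the exploration bonus from the mean cost and picks the smallest score, which is an assumed adaptation of the usual reward-maximizing UCB1 rule. The function name, the statistics layout, and the `exploration` constant are hypothetical.

```python
# Illustrative UCB1 selection for cost minimization; names are hypothetical.
import math


def ucb1_select(stats: dict, exploration: float = 1.0):
    """Pick an action given stats[action] = (visit_count, mean_cost).

    Unvisited actions are tried first; otherwise the action with the lowest
    optimistic cost estimate (mean cost minus exploration bonus) is chosen.
    """
    total_visits = sum(count for count, _ in stats.values())
    best_action, best_score = None, math.inf
    for action, (count, mean_cost) in stats.items():
        if count == 0:
            return action  # every action is tried at least once
        bonus = exploration * math.sqrt(math.log(total_visits) / count)
        score = mean_cost - bonus
        if score < best_score:
            best_action, best_score = action, score
    return best_action
```

A rarely tried action receives a large bonus and can therefore be selected even when its current mean cost is higher, which is the exploration side of the trade-off.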

For each belief state, a defense action is generated according to the policy function, and the belief update must follow the procedure defined in Eq. (14.7). The optimal policy $\pi^*$ is obtained by optimizing the long-term cost,

$$\pi^* = \arg\min_{\pi} \; \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(b_t, u_t)\right] \qquad (18)$$

with $\gamma$ the discount factor and $u_t = \pi(b_t)$. The optimal policy defined in Eq. (18) specifies the optimal action for each belief state $b_t \in \Delta(\mathcal{S} \times \Phi)$, where the minimum expected cost is calculated over the infinite time horizon. The defender chooses the action whose cost makes the trade-off between availability and security cost. To evaluate the scalability of our approach, we ran our online deception algorithm on a graph consisting of 160 conditions (nodes), 150 exploits (hyperedges), 60 defense actions, and 35 security alerts, resulting in more than $10^9$ observation vectors. The number of security states in this example exceeds 100 million. The pseudocode for the modified belief update is given in Algorithm 1.
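The idea behind the modified belief update can be sketched as a rejection-style particle update. This is not Algorithm 1 itself, which follows [10]; it is a simplified illustration under the assumption that observed alerts outside Z(s) are discarded as false alarms before the simulated and observed alerts are compared. The function signature, the particle layout (security state, attacker type), and the `max_tries` bound are hypothetical.

```python
# Simplified illustration of a particle belief update with alert filtering.
import random


def update_belief(particles, action, observed_alerts, generative_model,
                  possible_alerts, n_particles, rng, max_tries=10_000):
    """Resample particles whose simulated alerts explain the observed ones.

    particles: list of (security_state, attacker_type) pairs.
    generative_model(state, attacker_type, action, rng) -> (next_state, alerts)
    possible_alerts(state) -> set of alerts Z(state) that exploits can raise.
    """
    new_particles = []
    for _ in range(max_tries):
        if len(new_particles) == n_particles:
            break
        state, attacker_type = rng.choice(particles)
        next_state, simulated = generative_model(state, attacker_type, action, rng)
        # Observed alerts outside Z(next_state) are treated as false alarms.
        relevant = set(observed_alerts) & possible_alerts(next_state)
        if set(simulated) == relevant:
            new_particles.append((next_state, attacker_type))
    return new_particles
```

Comparing only the alerts that a state can actually produce keeps the acceptance rate usable when the full observation space is very large, at the price of ignoring the information carried by false alarms.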

Software Defined Network Controller
In our DDS, the primary objective of the SDN controller is to generate network flow rules upon the arrival of network packets. The generated flow rules are then forwarded to the SDN switch to control and analyze the network traffic. For our deception model, we use the following flow rules based on the needs of our fake network. In our system, ARP request forwarding is the most important part, as all ARP requests are handled by our deception server. Usually, a network is flooded with ARP requests to discover a host and match its IP address with its MAC address. The deception server receives each ARP request and replies with an appropriate response.

Routing of DHCP Packets
Because fake networks are associated with DHCP leases, our deception server also serves as a DHCP server. It leases an IP address to a host of the fake network whenever that host tries to connect to the network.

Routing of DNS Packets
To ensure that legitimate services remain reachable, DNS requests are handled by our deception server. To forward the DNS packets, appropriate flow rules are generated between the host and the deception server.
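The three forwarding cases above (ARP, DHCP, DNS) can be summarized in a small classifier. This sketch is independent of POX and only illustrates the matching logic; the packet is modeled as a plain dictionary, and the names `classify`, `flow_rule`, and `DECEPTION_SERVER` are hypothetical. The protocol constants are the standard ones (EtherType 0x0806 for ARP, IP protocol 17 for UDP, UDP ports 67/68 for DHCP and 53 for DNS); DNS over TCP is not covered.

```python
# Hypothetical matching logic for control-protocol packets; not POX code.
DECEPTION_SERVER = "deception_server"

ETH_TYPE_ARP = 0x0806
IP_PROTO_UDP = 17


def classify(packet: dict) -> str:
    """Label a packet as 'arp', 'dhcp', 'dns' or 'other' from header fields."""
    if packet.get("eth_type") == ETH_TYPE_ARP:
        return "arp"
    if packet.get("ip_proto") == IP_PROTO_UDP:
        ports = {packet.get("src_port"), packet.get("dst_port")}
        if ports & {67, 68}:
            return "dhcp"
        if 53 in ports:
            return "dns"
    return "other"


def flow_rule(packet: dict) -> dict:
    """Build a minimal rule: control protocols go to the deception server."""
    kind = classify(packet)
    output = DECEPTION_SERVER if kind in {"arp", "dhcp", "dns"} else "normal"
    return {"match": kind, "output": output}
```

Keeping the classification separate from the rule construction makes it straightforward to add further cases, such as the ICMP error messages handled by the deception server.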

Deception Server
Our deception server has six components that deceive the cyber adversary: they handle the packets coming from hosts connected to the network and craft responses based on the fake networks. We briefly discuss the six components below.

DHCP Handler
The DHCP handler acts as the DHCP server within our deception server and is responsible for assigning DHCP leases to hosts that try to connect to the network.

ARP Handler
All ARP requests are forwarded to our deception server by appropriate flow rules. Based on our fake network specification, the deception server crafts the response and sends it back to the requesting host.

ICMP Handler
ICMP error messages are forwarded to our deception server by specific rules. Messages such as destination host unreachable contain a nested packet, which cannot be updated automatically in the SDN switches. We therefore forward such packets to our deception server, which crafts them accordingly and sends them on to the destination.

DNS Handler
To ensure that legitimate services remain reachable, the DNS handler in our deception server processes DNS requests and creates appropriate responses.

Gateway Simulator
The gateway simulator makes the fake network more realistic, because some components of the fake network, such as routers and gateways, are not backed by real endpoints. If our deception server receives a probing request for such a component, it sends back an appropriate response.

Route Simulator
The route simulator lets our deception server answer packets from mapping tools such as traceroute. If a probing request to any node has a lower TTL value than the path specified in our fake network requires, the deception server handles the packet on behalf of the router or gateway between the scanning source node and the destination node.
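The TTL rule of the route simulator can be sketched as a lookup along the fake path. This is a hypothetical illustration of the decision only, not the Scapy-based implementation: `simulate_probe` and the path layout (hops listed from the scanner towards the target, target last) are assumed names and conventions.

```python
# Hypothetical sketch of which simulated hop answers a TTL-limited probe.

def simulate_probe(ttl: int, fake_path: list) -> tuple:
    """Return (reply_type, responder) for a probe sent with the given TTL.

    fake_path lists the simulated hops from the scanner to the target,
    with the target itself as the last element.
    """
    if ttl < 1:
        raise ValueError("TTL must be at least 1")
    if ttl < len(fake_path):
        # The probe expires at an intermediate simulated router.
        return ("time-exceeded", fake_path[ttl - 1])
    return ("reply", fake_path[-1])


path = ["fake-router-1", "fake-router-2", "fake-host"]
hops = [simulate_probe(ttl, path) for ttl in (1, 2, 3)]
```

A traceroute that raises the TTL step by step therefore sees one simulated router per step and finally the fake host, as it would on a real routed path.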

Delay Handler
Besides traditional scanning methods, advanced attackers can analyze round-trip time statistics and the measured bandwidth on links to find inconsistencies [44]. To make the real and fake networks indistinguishable, we take an approach similar to the one described in [44]: by adding artificial delay to certain packets, we change the apparent link bandwidth and host delays. To keep the two networks consistent, we first collect measurement data from real network nodes and use those data as the basis for our fake network.
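One simple way to derive an artificial delay from real measurements is sketched below. It is an assumption-laden illustration rather than the method of [44]: the target response time is taken to be the median of round-trip times measured on real nodes, and the reply is held back only for the part of that target not already consumed by processing. The function name and parameters are hypothetical.

```python
# Illustrative delay padding based on measured round-trip times (assumed design).
import statistics


def artificial_delay_ms(real_rtts_ms: list, elapsed_ms: float) -> float:
    """Delay to add so a reply matches the median RTT measured on real nodes."""
    target_ms = statistics.median(real_rtts_ms)
    return max(0.0, target_ms - elapsed_ms)


measured = [4.0, 5.0, 6.0]              # RTT samples collected from real nodes
pad = artificial_delay_ms(measured, 1.5)  # the reply was ready after 1.5 ms
```

A fuller version would likely draw the target from the measured distribution instead of using a single statistic, so that the variance of the fake network's timing also resembles the real one.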

Experimental Setup and Metrics
We now investigate an illustrative example using the sample exploit dependency graph presented in Fig. 2. For this example, we assume four attacker types that vary in knowledge, aggression, and stealthiness level. We present four use cases showing how the defender deceives each attacker type with a fake network. The aggression level is defined by the conditional attack and success probabilities, which in turn determine the attacker's rate of movement through the real network. The knowledge level is defined by Eqs. (3) and (4), where the separation of the two parameters in those equations dictates the knowledge level of the attacker. Stealthiness is described by the false alarm and detection probabilities. Table 1 lists the four attacker types $\Phi = \{\varphi_1, \varphi_2, \varphi_3, \varphi_4\}$ with their knowledge, aggression, and stealthiness levels. The exploit dependency graph is generated using TVA (Topological Vulnerability Analysis) [11]. We use the software package of [28] for the POMCP solver in our simulation, and Python and MATLAB to implement our model. In this section, we present our simulation results for each of the attacker types defined in Table 1. Table 2 presents the probabilities of detection in the real network for each of the four attacker types. In Table 2, columns represent the attempted exploit and rows represent the triggered alert. Each entry is the probability of detection under the corresponding attacker type.

Experimental Results
Use Case I
For this use case, we use attacker Type I ($\varphi_1$) from Table 1. We calculated the conditional attack probabilities for the real and fake networks using the corresponding equations. The security state is assumed to start from the empty state, $s_0 = \emptyset$. The defender uses the utility array function, defined in Eq. (13), to construct the initial belief. We run the simulation 5000 times. Initially (from t=1 to t=4), the defender takes no action, to save availability cost. As the attacker progresses and enables more conditions, the defender's belief is gradually updated based on the received security alerts. The defender then begins to deploy actions (t=5) to block exploits. By the monotonicity assumption, once a security condition is enabled it remains enabled. Whenever the defender's belief indicates that the attacker is close to the goal conditions, the defender blocks exploits to prevent the attacker from reaching the goal. As Fig. 6 shows, at time step t=8 the defender blocks a set of exploits, which prevents the attacker from moving forward. From this point, the attacker tries to progress along another path, because the responses received from the defender in the reconnaissance stage were a mix of true and false information. The attacker then moves toward the fake network (Fig. 7), based on the available set of exploits dictated by Eq. (1). At this stage, the defender lets the attacker move forward: from time step t=9 to t=13, the defender's action is null. Because the fake network looks the same as the real network from the attacker's perspective, the defender takes action only when the attacker has an alternative way to reach the next security state (see time steps t=14-20 in Fig. 7). In Table 3, we present performance evaluation data for runs in which the attacker starts by exploiting vulnerabilities of real initial nodes, distinguishing runs that end in a real network end state from runs that end in a fake end state.
The numbers in the 2nd column show how many times, out of 25 sample runs, the attacker starts at real network initial nodes; the 3rd column shows how many times the attacker ends in a real network end state without transitioning to the fake network; and the 4th column shows how many times the attacker transitions from the real network to the fake network and ends in the fake goal state. In Table 4, we present the same statistics for the fake network. From Table 4, we can see that up to 76% of the time the attacker starts at the fake initial nodes and carries out a series of exploits to reach the fake goal state. When $n_{sim} = 500$, the attacker starts in the real network in 15 of the 25 sample runs (Table 3) and reaches the real network goal state in 13 of them, because the possible future histories are poorly estimated. As the number of simulations increases and more possible future histories are taken into account, the quality of the action estimates and of the policy function improves (e.g., for $n_{sim} = 5000$, the attacker starts in the fake network and ends in the fake goal state in 19 of 25 runs). In Fig. 8, we plot the discounted cost against the time step for 25 sample runs while the attacker is in the real network. When $n_{sim} = 500$, the attacker starts in the real network 15 times and reaches the real goal state (node) in 13 of those runs. Trajectories ending in a red circle are paths on which the attacker reached the goal. For low simulation counts, e.g., $n_{sim} = 500$, the defender initially has little information about the attacker's strategy and capability. Because of this, the defender aggressively blocks exploits from the very beginning (t = 0), which eventually yields low-quality estimates and reduced availability. Owing to the poor estimation, the attacker also reaches the goal node several times, as shown in the upper left corner of Fig. 8.
As the simulation count increases, more possible future histories are included, which yields high-quality estimates of which set of exploits to block. As the bottom right corner of Fig. 8 shows, with 5000 simulations the attacker, although starting in the real network, could not reach any goal state.

Use Case II
For use case II, we use attacker type II ($\varphi_2$) from Table 1, whose knowledge, aggression, and stealthiness levels are moderate, high, and high, respectively. The conditional attack probabilities and success probabilities are given below for this use case. We kept the other simulation parameters the same as in use case I, since we evaluate use case II on the same exploit dependency graph, presented in Fig. 1. In this simulation, we present the performance evaluation table to capture the attacker's progression from real to real and from real to fake network. From Table 5, we can see that the attacker starts at a real node fewer times than in use case I, because the attacker has a lower knowledge level than in the previous use case. The results are reasonable because the attacker can hardly distinguish the real and fake networks. It is also difficult for the attacker to discover which exploits are not blocked by the defender in a given security state. In Table 6, we present the same simulation results for the fake network. Table 6 reports how many times the attacker goes back to a real node from a fake node. As stated earlier, once the attacker enters the fake network, the attacker cannot go back to a real node. Based on this simulation, we conclude that up to 84% of the time the attacker starts at the fake initial nodes and carries out a series of exploits to reach the fake goal state, because of the attacker's moderate knowledge level.

Use Case III
For use case III, we use attacker type III ($\varphi_3$) from Table 1, whose knowledge, aggression, and stealthiness levels are moderate, moderate, and moderate, respectively. The conditional attack probabilities and success probabilities are given below for this use case. The number of times the attacker starts at a real node increases in this simulation. Because the defender's belief reflects that the attacker is more knowledgeable, the conditional attack probabilities are higher than in the previous case. In fact, in this simulation the numbers are higher than in the previous two use cases. This is because the attacker possesses a high knowledge level, which gives the attacker the ability to find out which exploits are blocked before moving. As soon as the attacker identifies the blocked exploits, the attacker will not attempt them until the defender changes her action. In this case, up to 88% of the time the attacker starts at the fake initial nodes.

Use Case IV
For use case IV, we use attacker type IV ($\varphi_4$) from Table 1, whose knowledge, aggression, and stealthiness levels are low, low, and low, respectively. The conditional attack probabilities for attacker type IV are given below.

Number of runs (out of 25) per number of simulations:

Node            500  1000  1500  2000  3000  4000  5000
Real initial     12    10    10     9     9     5     2
Real to real      1     0     0     2     1     0     0
Real to fake     11    10    10     7     8     5     2

From Table 8, we can see that although the attacker starts at a real node a few times, the attacker very rarely ends up at the real network goal node. The number of times the attacker ends up at the fake goal node is higher than in any of the previous three use cases, because the attacker's skill set (knowledge, aggression, and stealthiness) is that of a novice. From the statistics, we can infer that up to 92% of the time the attacker starts at a fake node and ends in the fake goal state. In this case, the defender did not use many resources to block the attacker, because the defender's belief reflects that the attacker is a novice. This is why the defender saved considerable resources in terms of availability and security cost.
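The 92% figure can be checked against the run counts reported for use case IV. The short sketch below assumes that the first row of numbers (12, 10, 10, 9, 9, 5, 2) counts the runs that start at a real initial node and that every other run out of 25 starts at a fake initial node; both the row interpretation and the variable names are assumptions.

```python
# Arithmetic check under the stated assumptions about the table rows.
RUNS = 25

# Runs starting at a real initial node, keyed by the number of simulations.
real_starts = {500: 12, 1000: 10, 1500: 10, 2000: 9, 3000: 9, 4000: 5, 5000: 2}

# Share of runs that start at a fake initial node instead.
fake_share = {n_sim: (RUNS - count) / RUNS for n_sim, count in real_starts.items()}
best = max(fake_share.values())
```

Under these assumptions the largest share is 23 of 25 runs at 5000 simulations, which matches the reported value of up to 92%.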
We also investigate the host infection rate with and without our DDS under several network scanning techniques. To do this, we implemented common scanning techniques from prior work [29], [30], [31], [32], which are also discussed in the related work section. To implement these scanning techniques, we use the Python library libnmap [33], which provides an API to Nmap [34], together with the Python Scapy framework. Following the discussion in [35], an adversarial scanner first selects the scanning space, denoted by Ω, and then selects the IP addresses to probe within that space. The address distance specifies the numerical difference between the IP addresses of the scanner and the scanning target [35]. Local preference scanning, discussed in [29], is a biased scanning technique in which specific regions of a network are chosen based on localhost information. However, in current computer networks, hosts are not uniformly distributed within the address space. The attacker can therefore detect vulnerable hosts faster by scanning the IP address ranges that are densely populated [35].
Preference sequential scanning probes IP addresses sequentially. In this technique, the attacker uses local preference and selects a start IP address with a small address distance (ℎ) to the host IP address.
Non-preference sequential scanning is the same as preference sequential scanning, except that it selects the starting IP address at random within the scanning space Ω.
Preference parallel scanning uses parallelism to increase scanning performance, at the cost of generating a large amount of network traffic. For our simulation, we use 10 parallel probing messages.
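The difference between the two sequential strategies lies only in the choice of the starting address, which the following sketch illustrates with the standard `ipaddress` module. It produces probe orders and sends no traffic; the function names and the default address distance of 1 are hypothetical, and the libnmap and Scapy based scanners used in the experiments are not reproduced.

```python
# Illustrative probe orderings for sequential scanning; no packets are sent.
import ipaddress
import random


def _rotate(hosts: list, start_index: int) -> list:
    """Start at start_index and wrap around the scanning space."""
    return hosts[start_index:] + hosts[:start_index]


def preference_sequential(scanner_ip: str, space: str, distance: int = 1) -> list:
    """Start at a small address distance from the scanner's own address."""
    hosts = list(ipaddress.ip_network(space).hosts())
    start = ipaddress.ip_address(scanner_ip) + distance
    return _rotate(hosts, hosts.index(start))


def non_preference_sequential(space: str, rng: random.Random) -> list:
    """Start at a random address within the scanning space."""
    hosts = list(ipaddress.ip_network(space).hosts())
    return _rotate(hosts, rng.randrange(len(hosts)))


order = preference_sequential("10.0.0.2", "10.0.0.0/29")
```

For the scanner 10.0.0.2 in the space 10.0.0.0/29, the preference order begins at 10.0.0.3 and wraps around to end at the scanner's own address.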
Fig. 9 presents the performance of the dynamic deception system. We deployed 20 subnets with 45 hosts each. The fake network nodes are evenly distributed throughout the subnets. The figure shows that the infected host detection rate is lower with our DDS than without it. Here, an infected host is one whose vulnerabilities the attacker has successfully exploited. From Fig. 9 it can be inferred that the defender successfully drives the attacker towards the fake network by blocking vulnerabilities in the real network.
Table 9 shows that as the attacker's knowledge level decreases, the defender can save more resources in terms of network availability to legitimate users. Our simulation results show that the defender can decide when and where to spend more resources or save resources.

Conclusion
In this paper, we show that with our dynamic deception system the defender can save resources in terms of availability cost and security cost. By introducing fake networks, we also alter the attacker's perception of the network, and the defender's actions influence the attacker to take an attack path through the fake network towards a fake goal state. Using SDN, the defender can analyze malicious traffic and reply to the attacker with a mix of true and false information. After adding attacker capabilities to the model, we learned that if the attacker's knowledge level is high and the aggression and stealthiness levels are moderate, the defender needs to spend more resources than in the opposite case.