Formal Approach to Detect and Resolve Anomalies while Clustering ABAC Policies

In big data environments with a large number of users and a high volume of data, a correspondingly large number of security policies must be managed. Using the Attribute-Based Access Control (ABAC) model to ensure access control can therefore become complex and hard to manage. Moreover, ABAC policies may be aggregated from multiple parties. Therefore, they may contain several anomalies, such as conflicts and redundancies, resulting in safety and availability problems. Several policy analysis and design methods have been proposed. However, most of these methods do not preserve the original policy semantics. In this paper, we present an ABAC anomaly detection and resolution method based on the access domain concept, which preserves the policy semantics. To make the suggested method scalable for large policies, we decompose the policy into clusters of rules and apply the method to each cluster. We prove the correctness of the method and evaluate its computational complexity. Experimental results are given and discussed. Received on 11 October 2018; accepted on 16 November 2018; published on 03 December 2018


Introduction
In current big data environments, a huge amount of data can be generated from various sources, which requires new forms of processing techniques in order to improve decision making. However, the rules regulating access to resources managed by such environments raise multiple security challenges. Hence, users need authorization systems to help them share their resources, data and applications with a large number of users without compromising security and privacy. Access control models represent a key component for providing security features.
Attribute-based access control (ABAC) has been suggested as a generic access control model [6,17].
ABAC considers three categories of attributes: subject, resource, and environment. An attribute is assigned to a subject (e.g., user, application or process), resource (e.g., data structure, web service or system component) and environment (e.g., current time, location). These attributes may be considered as characteristics of anything that may be defined and to which a value may be assigned. ABAC representation is more expressive and fine-grained than existing access control models, because it can consider any combination of subject, resource and environment attributes. However, due to the huge number of rules and the distributed management of policies, deploying and managing an ABAC model to ensure access control might become too complex and hard to manage. In fact, an ABAC policy in distributed applications may be aggregated from multiple parties and can be managed by more than one administrator [8]. Therefore, ABAC policies may contain several anomalies [7,35], such as conflicts and redundancies, which may lead to both safety and availability problems. Hence, automatically detecting and resolving such anomalies in large complex policies is crucial.
M. Ait El Hadj et al.
In the present paper, we suggest a method to detect and resolve anomalies in ABAC policies. We introduce the notion of Access Domain of a rule, which models the set of values of the attributes considered in that rule. Based on this concept of access domain, we develop a method to rigorously detect and resolve anomalies in an ABAC policy, while retaining the policy semantics. In contrast, several existing methods resolve anomalies by simply removing one of the conflicting rules, which modifies the semantics of the policy. To make the suggested method scalable to large policies (i.e., policies with a huge number of rules), we decompose the policy into several clusters of rules, and then the method is applied to each cluster. A preliminary version of this work is given in [2], which succinctly presents a method to detect and resolve anomalies. Compared to [2], our contributions are as follows:
• We present our method to detect and resolve anomalies with more details and explanations.
• We prove formally correctness of our method, thus guaranteeing that the anomalies are detected and removed from the original policy, while preserving its semantics.
• We evaluate the computational complexity of the proposed approach of detection and resolution.
• We provide experimental results with a set of ABAC policies that demonstrate the time gained from clustering.
The rest of the paper is organized as follows: Section 2 presents the formal definitions of security rules and the considered anomalies. In Section 3, we formally define our problem of anomaly detection and resolution, and then we present an outline of the method we have developed to solve it. Section 4 formally presents the method. In Section 5, we prove correctness of the method and evaluate its computational complexity. Section 6 reports and discusses experimental results. In Section 7, we present related work and recall our contributions. Finally, the conclusion and expected future work are presented in Section 8.

Formal Definitions of Security Rules & Anomalies
In order to present a formal method to detect and resolve anomalies between the rules of a policy, we first need to define formally the rules and the anomalies.

Formal Definition of a Security Rule and its Access Domain
A policy $P$ is a non-empty set of rules: $P = \{r_1, r_2, \ldots, r_n\}$. Each rule $r_i \in P$ is specified by a condition and an access decision. The condition of a rule is specified by one or several assignments $att \in V_{att}$, where $att$ is a name that identifies an attribute and $V_{att}$ is a set of possible values of $att$. There is at most one assignment for each attribute. The access decision of a rule is noted $X_{act}$, where $X$ is the decision $Permit$ or $Deny$, and $act$ is a set of (access) actions. $Permit_{read}$ and $Deny_{write}$ are two examples of access decisions. A rule $r_i \in P$ will be written as follows:

$$r_i = X_{act}(att_1 \in V_{att_1}, att_2 \in V_{att_2}, \ldots, att_m \in V_{att_m}) \qquad (1)$$

Note that the absence of an assignment for an attribute $att$ means the implicit existence of the assignment $att \in ALL_{att}$, where $ALL_{att}$ denotes the set of all possible values of $att$. We consider the three categories of attributes of ABAC: subjects, resources, environment. The assignments corresponding to the same category are separated by a comma ",", while a semicolon ";" marks the passage to the next category.

An access request is defined by attribute values (at most one value for each attribute) and one action. We say that a value $v$ of an attribute $att$ satisfies an assignment $att \in V_{att}$ of a rule $r_i$ if $v$ is an element of $V_{att}$. We say that an access request $R$ matches a rule $r_i$ (we can also say $r_i$ matches $R$) if every attribute value of $R$ satisfies the corresponding assignment of $r_i$.

Instead of formulating a rule $r_i$ as in (1), we will use the equivalent formulation (2), which is more convenient to define anomalies and their detection and resolution. The idea is to specify a unique set of values (namely $V_{att_1} \times V_{att_2} \times \ldots \times V_{att_m}$) for the $m$-tuple $(att_1, att_2, \ldots, att_m)$, instead of specifying a set of values for each $att_i$. Such a set $V_{att_1} \times V_{att_2} \times \ldots \times V_{att_m}$ is called the access domain of $r_i$ and noted $AD_{r_i}$. Hence, a rule is expressed in the form:

$$r_i = X_{act}((att_1, att_2, \ldots, att_m) \in AD_{r_i}) \qquad (2)$$

We can also write $r_i = X_{act}(AD_{r_i})$ when the attributes are known from the context and can hence be implicit.
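As an illustration of formulation (2), the following Python sketch builds the access domain of a rule as an explicit Cartesian product and tests whether a request matches it. The names `Rule`, `make_rule` and `matches` are hypothetical and not part of the method itself. The sketch assumes finite attribute domains and a request that supplies one value for every attribute, and it represents the time interval [8:00, 18:00] by a single symbolic value.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Rule:
    decision: str        # "Permit" or "Deny"
    actions: frozenset   # set of actions, e.g. {"read"}
    domain: frozenset    # access domain: set of attribute-value tuples


def make_rule(decision, actions, *value_sets):
    """Build a rule whose access domain is the product of the given value sets."""
    return Rule(decision, frozenset(actions), frozenset(product(*value_sets)))


def matches(rule, request_values):
    """A request matches a rule if its tuple of attribute values lies in the access domain."""
    return tuple(request_values) in rule.domain


# (position; fileType; time), with the time interval kept symbolic (assumption).
r = make_rule("Deny", {"read"},
              {"Doctor", "Nurse"}, {"Source", "Documentation"}, {"08:00-18:00"})
```

Enumerating the product explicitly is only practical for small domains; it is used here because it makes the set operations on access domains directly visible.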

Formal Definitions of the Considered Anomalies
Anomalies are defined as patterns in data that do not conform to a well-defined notion of normal behavior [7]. More specifically, in a security policy $P$, an anomaly may exist only if several rules of $P$ match the same access request. We have considered two types of anomalies: Redundancies and Conflicts.
Definition 2.1. Redundancy occurs in a policy $P$ when $P$ contains useless (or redundant) rules, i.e. rules whose removal does not modify the behavior of $P$. Consider two rules $r_i = X_a(AD_{r_i})$ and $r_j = Y_b(AD_{r_j})$. $r_i$ is redundant to $r_j$ iff:

$$X = Y, \qquad AD_{r_i} \subseteq AD_{r_j}, \qquad a \subseteq b \qquad (3)$$

Intuitively, every decision taken by $r_i$ on any request is also taken by $r_j$. Therefore, $r_i$ is useless and hence can be removed from the policy. We consider redundancy as an anomaly because it may affect the performance of a policy, since verifying whether an access request respects a policy depends on the size of the policy.

We define the following notions, given two rules $r_i = X_a(AD_{r_i})$ and $r_j = Y_b(AD_{r_j})$:
• Common access domain of $r_i$ and $r_j$ is the intersection of their access domains, i.e. $AD_{r_i} \cap AD_{r_j}$.
• Set of common actions of $r_i$ and $r_j$ is the intersection of their sets of actions, i.e. $a \cap b$.
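The redundancy test and the two notions above can be sketched in Python as follows. This is only an illustration, under the assumption that $r_i$ is redundant to $r_j$ when both rules take the same decision, the actions of $r_i$ are included in those of $r_j$, and the access domain of $r_i$ is included in that of $r_j$. All names (`Rule`, `rule`, `is_redundant`, ...) are hypothetical.

```python
from collections import namedtuple
from itertools import product

Rule = namedtuple("Rule", ["decision", "actions", "domain"])


def rule(decision, actions, *value_sets):
    """Rule whose access domain is the Cartesian product of the value sets."""
    return Rule(decision, frozenset(actions), frozenset(product(*value_sets)))


def common_domain(ri, rj):
    """Common access domain: intersection of the two access domains."""
    return ri.domain & rj.domain


def common_actions(ri, rj):
    """Set of common actions: intersection of the two action sets."""
    return ri.actions & rj.actions


def is_redundant(ri, rj):
    """True if every decision taken by ri is also taken by rj (assumed conditions)."""
    return (ri.decision == rj.decision
            and ri.actions <= rj.actions
            and ri.domain <= rj.domain)


wide = rule("Permit", {"read", "write"}, {"Doctor", "Nurse"}, {"Documentation"})
narrow = rule("Permit", {"read"}, {"Nurse"}, {"Documentation"})
```

Here `narrow` is redundant to `wide`, but not the other way around, so only `narrow` could be removed without changing the behavior of the policy.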

Definition 2.2. A conflict occurs in a policy $P$ when $P$ contains two or more rules that generate contradictory decisions on an access request. Consider two rules $r_i = X_a(AD_{r_i})$ and $r_j = Y_b(AD_{r_j})$. $r_i$ and $r_j$ present a conflict (or are conflicting) iff:

$$X \neq Y, \qquad AD_{r_i} \cap AD_{r_j} \neq \emptyset, \qquad a \cap b \neq \emptyset \qquad (4)$$

Intuitively, when an access request matches the common access domain of $r_i$ and $r_j$ ($AD_{r_i} \cap AD_{r_j}$), we have contradictory decisions (from $X \neq Y$) on common actions (from $a \cap b \neq \emptyset$).

Example 2.3. Consider the following rules $r_1$ and $r_2$:
• $r_1$: $Deny_{\{read\}}$ ((position; fileType; time) ∈ {Doctor, Nurse} × {Source, Documentation} × [8:00, 18:00])
• $r_2$: $Permit_{\{read, write\}}$ ((position; fileType; time) ∈ {Nurse} × {Documentation} × [8:00, 18:00])
$r_1$ and $r_2$ are conflicting, because $AD_{r_1} \cap AD_{r_2}$ = {Nurse} × {Documentation} × [8:00, 18:00] $\neq \emptyset$, while the action read is permitted by $r_2$ and forbidden by $r_1$. Intuitively, $r_1$ forbids nurses from reading the documentation, while $r_2$ permits it.
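The conflict test can be sketched on the two rules of Example 2.3 as follows. The code is a minimal illustration with hypothetical names; it represents the time interval [8:00, 18:00] by a single symbolic value, which is a simplifying assumption.

```python
from collections import namedtuple
from itertools import product

Rule = namedtuple("Rule", ["decision", "actions", "domain"])


def rule(decision, actions, *value_sets):
    """Rule whose access domain is the Cartesian product of the value sets."""
    return Rule(decision, frozenset(actions), frozenset(product(*value_sets)))


def is_conflicting(ri, rj):
    """Opposite decisions, non-empty common access domain, non-empty common actions."""
    return (ri.decision != rj.decision
            and bool(ri.domain & rj.domain)
            and bool(ri.actions & rj.actions))


DAY = "08:00-18:00"  # symbolic stand-in for the interval [8:00, 18:00]
r1 = rule("Deny", {"read"}, {"Doctor", "Nurse"}, {"Source", "Documentation"}, {DAY})
r2 = rule("Permit", {"read", "write"}, {"Nurse"}, {"Documentation"}, {DAY})

shared_domain = r1.domain & r2.domain    # requests on which both rules decide
shared_actions = r1.actions & r2.actions  # actions on which they disagree
```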

Problem Definition
Access control models are concerned with determining the allowed activities of legitimate users, mediating every attempt by a user to access a resource in a given system [24,36]. In this paper, we consider Attribute-Based Access Control (ABAC), which is widely used as a generic access control model. Correctness of an ABAC policy is critical for the security of the system that uses it, because any error in the ABAC definition may result in violations of security features (e.g., confidentiality, integrity). In large and distributed organizations with complex ABAC policies, deploying and managing an ABAC model to ensure authorization management might become too complex and hard to manage. An ABAC policy in distributed applications may be aggregated from multiple parties and can be managed by more than one administrator (distributed management). Therefore, it may contain several anomalies such as redundancies and conflicts (see Definitions 2.1 and 2.2), which may lead to safety and availability problems. Moreover, manual inspection for correctness can be impractical, because of the huge number of rules and the distributed management of policies [25]. Thus, automatically detecting and resolving such anomalies is essential to ensure that an ABAC policy conforms to desired correctness properties.

The problem we aim to solve is formulated as follows: given a policy $P$ defined by a set of rules formulated as shown in Expression (2) of Section 2.1, the objective is to detect and remove from $P$ the redundancies and conflicts formally expressed in Definitions 2.1 and 2.2.
Anomaly detection and resolution is motivated by the fact that errors in the policy definition may compromise the system security. Conflicts, if not handled properly, may lead to inappropriate decisions. As a result, conflicts may lead to safety problems by allowing unauthorized accesses, and to availability problems by denying authorized accesses. As for redundancies, their detection and resolution is motivated by the fact that they affect the performance of the policy execution, since the response time of a policy to an access request depends on the number of rules to be parsed in the policy.
Let $P$ be any security policy and $Q$ be the policy obtained from $P$ by our detection and resolution procedure. A first requirement is that $Q$ must be generated from $P$ in a bounded time, i.e. the procedure does not enter an infinite loop. A second requirement is that $P$ and $Q$ must support the same set of access requests. A third requirement is that $Q$ must be anomaly-free. A fourth requirement is that $P$ and $Q$ must take the same decision for every access request for which $P$ is non-conflicting. The fourth requirement is necessary to avoid generating a policy $Q$ that permits requests that are not permitted by the original policy $P$. In the same way, it is necessary to avoid generating a policy $Q$ that denies requests that are not denied by $P$. From these requirements, we define correctness of our detection and resolution method as follows, where the access domain of a policy $P$ is the union of the access domains of all the rules of $P$.

Definition 3.1. Our detection and resolution method (let us call it M) is said to be correct if, for every policy $P$ given as input to M and the policy $Q$ obtained by M from $P$, the following five conditions are satisfied:
• $C_1$: $Q$ is obtained from $P$ in a finite number of iterations.
• $C_2$: $P$ and $Q$ have the same access domain.
• $C_3$: $Q$ is anomaly-free.
• $C_4$: for any access request $rq$, if $P$ has no rule permitting $rq$, then $Q$ has no rule permitting $rq$.
• $C_5$: for any access request $rq$, if $P$ has no rule denying $rq$, then $Q$ has no rule denying $rq$.
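On small policies, the requirement that $Q$ neither permits requests that $P$ does not permit nor denies requests that $P$ does not deny can be checked by brute force. The sketch below is an illustration only: the function names are hypothetical, rules are given with explicitly enumerated access domains, and the two-attribute requests are invented for the example.

```python
from collections import namedtuple

Rule = namedtuple("Rule", ["decision", "actions", "domain"])


def decides(policy, decision, values, action):
    """True if some rule of the policy takes `decision` on the request (values, action)."""
    return any(r.decision == decision and action in r.actions and values in r.domain
               for r in policy)


def preserves_decisions(p, q):
    """Check that q takes no Permit or Deny decision that p does not already take."""
    requests = {(v, a) for r in list(p) + list(q) for v in r.domain for a in r.actions}
    for values, action in requests:
        for decision in ("Permit", "Deny"):
            if decides(q, decision, values, action) and not decides(p, decision, values, action):
                return False
    return True


doctor, nurse = ("Doctor", "Documentation"), ("Nurse", "Documentation")
p = [Rule("Permit", frozenset({"read", "write"}), frozenset({doctor, nurse})),
     Rule("Deny", frozenset({"read", "create"}), frozenset({nurse}))]
# A candidate result that only narrows the Deny rule.
q_ok = [p[0], Rule("Deny", frozenset({"create"}), frozenset({nurse}))]
# A candidate result that wrongly introduces a new permission.
q_bad = q_ok + [Rule("Permit", frozenset({"create"}), frozenset({nurse}))]
```

Such a check enumerates every request, so it is only meant as a sanity test on toy policies, not as a replacement for the formal proof.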

Outline of the Method to Detect and Resolve Anomalies
The suggested method to detect and resolve anomalies is preceded by two steps, rule extraction and rule clustering:
• Rule Extraction: It consists in parsing the policy in order to recognize and extract its rules. The extracted rules are expressed with formulation (2) of Section 2.1. Recall that we use the three attribute categories of ABAC: Subject, Resource and Environment.
• Rule Clustering: To make the detection and resolution method scalable for policies with a huge number of rules, we suggest applying a clustering method to group similar rules in the same cluster, based on an adequate similarity score such that non-similar rules are unlikely to be redundant or conflicting. The similarity measure we adopt is presented in our previous work [1]. We recall that the similarity measure is a function that assigns a similarity score to any two given rules $r_i$ and $r_j$. Such a score reflects the degree of similarity between $r_i$ and $r_j$ with respect to their subject, resource and environment attribute values. We say that two rules $r_i$ and $r_j$ are similar if their similarity score is greater than a given threshold. It is worth noting that the resulting clusters satisfy two properties: (1) each cluster contains at least one rule and (2) every rule is contained in one or more clusters.
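The clustering step can be sketched as follows. The similarity measure of [1] is not reproduced here: the sketch substitutes an average Jaccard similarity over attribute value sets, which is purely an assumption, as are the greedy assignment strategy and the names `similarity` and `cluster`. It also assumes that every rule lists all of its attributes explicitly.

```python
def jaccard(a, b):
    """Jaccard similarity of two value sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity(r1, r2):
    """Stand-in similarity score: mean Jaccard similarity over the attributes."""
    attrs = r1.keys() | r2.keys()
    return sum(jaccard(r1.get(k, set()), r2.get(k, set())) for k in attrs) / len(attrs)


def cluster(rules, threshold):
    """Greedy clustering: a rule joins every cluster whose first rule is similar to it."""
    clusters = []  # each cluster is a list of rule indices
    for idx, r in enumerate(rules):
        placed = False
        for c in clusters:
            if similarity(r, rules[c[0]]) > threshold:
                c.append(idx)
                placed = True
        if not placed:
            clusters.append([idx])  # the rule starts a new cluster
    return clusters


rules = [
    {"position": {"Doctor", "Nurse"}, "fileType": {"Documentation"}},
    {"position": {"Nurse"}, "fileType": {"Documentation"}},
    {"position": {"Admin"}, "fileType": {"Source"}},
]
clusters = cluster(rules, threshold=0.5)
```

By construction, each cluster contains at least one rule and every rule belongs to one or more clusters, which are the two properties stated above. A higher threshold produces more, smaller clusters.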
After the extraction and clustering of rules, we arrive at the actual phase of detection and resolution of anomalies, which is executed within each cluster. The method is presented in detail in Section 4, and its correctness is formally proved in Section 5.

Anomalies Detection and Resolution
After constructing clusters of rules, the proposed anomaly detection and resolution method attempts to detect and remove anomalies within each cluster. In this section, we first show how redundancies and conflicts are detected and resolved between two rules, then within a set (cluster) of rules.

Redundancy Detection and resolution between two rules
The response time of a policy to an access request depends on the number of rules to be parsed in the policy [26]. So redundancy (i.e. the existence of useless rules) may affect the performance of a policy, and hence is treated as an anomaly. Thus, removing redundancies is considered one of the effective solutions for optimizing ABAC policies and improving policy decision time.
Given two rules $r_i = X_a(AD_{r_i})$ and $r_j = Y_b(AD_{r_j})$, $r_i$ is detected to be redundant to $r_j$ if the three conditions (3) in Definition 2.1 are satisfied. The resolution of that anomaly consists in removing $r_i$.
$r_1$, which implies that $r_2$ is useless in the presence of $r_1$. That is why this redundancy is resolved by simply removing $r_2$ (and keeping $r_1$).

Conflict Detection and resolution between two rules
Given two rules $r_i = X_a(AD_{r_i})$ and $r_j = Y_b(AD_{r_j})$, $r_i$ and $r_j$ are detected to be conflicting if the three conditions (4) in Definition 2.2 are satisfied. Since $X \neq Y$, let us take $X = Permit$ and $Y = Deny$. Recall that the intuition of a conflict between two rules is the existence of access requests for which the two rules do not agree on whether to permit or deny them. We consider the following two resolution strategies:
• Permissive resolution: to permit the access requests for which the two rules disagree. This is realized by not modifying $r_i$ and replacing $r_j$ by the following two rules:
- $r'_j = Deny_b(AD_{r_j} \setminus AD_{r_i})$
- $r''_j = Deny_{b \setminus a}(AD_{r_i} \cap AD_{r_j})$
Intuitively, the only modification is that the common actions of $r_i$ and $r_j$ are no longer denied for requests matching both $r_i$ and $r_j$. It is easy to check that $r_i$, $r'_j$ and $r''_j$ are conflict-free with each other.
• Restrictive resolution: to deny the access requests for which the two rules disagree. This is realized by not modifying $r_j$ and replacing $r_i$ by the following two rules:
- $r'_i = Permit_a(AD_{r_i} \setminus AD_{r_j})$
- $r''_i = Permit_{a \setminus b}(AD_{r_i} \cap AD_{r_j})$
Intuitively, the only modification is that the common actions of $r_i$ and $r_j$ are no longer permitted for requests matching both $r_i$ and $r_j$. As in the permissive resolution, it is easy to check that $r'_i$, $r''_i$ and $r_j$ are conflict-free with each other.

The access domains of $r_1$ and $r_2$ are respectively $AD_{r_1}$ = {Doctor, Nurse} × {Documentation} × [8:00, 18:00] and $AD_{r_2}$ = {Nurse} × {Documentation} × [8:00, 18:00]. Therefore, the common access domain is $AD_{r_1} \cap AD_{r_2} = AD_{r_2}$, and the set of common actions is {read, write} ∩ {read, create} = {read}. $r_1$ and $r_2$ are conflicting because the three conditions (4) given in Definition 2.2 hold, i.e. their common access domain and set of common actions are not empty, while their decisions are opposite.
The resolution of this conflict is as follows: if we consider the permissive resolution, $r_1$ is not modified (because the decision of $r_1$ is $Permit$) and $r_2$ is replaced by the following two rules:
• $r'_2 = Deny_{\{read, create\}}$ ((position; fileType; time) ∈ ∅), so this rule is not considered since its access domain is empty.
• $r''_2 = Deny_{\{create\}}$ ((position; fileType; time) ∈ {Nurse} × {Documentation} × [8:00, 18:00]).
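The two resolution strategies can be sketched for a single pair of conflicting rules as follows. This is an illustration with hypothetical names, under the assumption that the rule being replaced is split into one rule restricted to the part of its access domain outside the common access domain, and one rule on the common access domain that keeps only its non-common actions. Rules with an empty access domain or an empty set of actions are dropped. The time interval is again represented by one symbolic value.

```python
from collections import namedtuple
from itertools import product

Rule = namedtuple("Rule", ["decision", "actions", "domain"])


def rule(decision, actions, *value_sets):
    """Rule whose access domain is the Cartesian product of the value sets."""
    return Rule(decision, frozenset(actions), frozenset(product(*value_sets)))


def resolve_conflict(permit_rule, deny_rule, strategy):
    """Return the rules replacing a conflicting (Permit, Deny) pair."""
    cd = permit_rule.domain & deny_rule.domain      # common access domain
    ca = permit_rule.actions & deny_rule.actions    # common actions
    if strategy == "permissive":
        kept, split = permit_rule, deny_rule        # the Deny rule is narrowed
    elif strategy == "restrictive":
        kept, split = deny_rule, permit_rule        # the Permit rule is narrowed
    else:
        raise ValueError("strategy must be 'permissive' or 'restrictive'")
    outside = split._replace(domain=split.domain - cd)
    inside = split._replace(actions=split.actions - ca, domain=cd)
    # Keep only the pieces that still decide something.
    return [kept] + [r for r in (outside, inside) if r.domain and r.actions]


DAY = "08:00-18:00"  # symbolic stand-in for the interval [8:00, 18:00]
r1 = rule("Permit", {"read", "write"}, {"Doctor", "Nurse"}, {"Documentation"}, {DAY})
r2 = rule("Deny", {"read", "create"}, {"Nurse"}, {"Documentation"}, {DAY})
permissive = resolve_conflict(r1, r2, "permissive")
restrictive = resolve_conflict(r1, r2, "restrictive")
```

With the permissive strategy, the only surviving Deny rule forbids `create` for nurses; with the restrictive strategy, nurses keep only the `write` permission while doctors keep both `read` and `write`.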

Anomaly detection and resolution in a cluster of rules
Anomaly detection and resolution in a cluster is an iterative process that consists in verifying the existence of anomalies and, if any, in modifying the rules of the cluster until the set of rules is anomaly-free. The approach consists in first constructing a graph $(N, L)$, where $N$ is a set of nodes and $L$ is a set of edges, each edge being defined by a pair of nodes, i.e. $L \subseteq N \times N$. Each node represents a rule $r_i$, and each edge $(r_i, r_j)$ means that we have to verify whether there is an anomaly between $r_i$ and $r_j$, and resolve it, if any. Initially, all nodes are connected, i.e. $L$ consists of all pairs $(r_i, r_j) \in N \times N$ such that $i \neq j$. This is the input of Algorithm 1, which verifies and modifies the graph iteratively until we obtain a graph without edges, which means that we have obtained an anomaly-free set of rules. At each iteration of Algorithm 1, the anomaly detection and resolution is applied to every edge $(r_i, r_j)$ of $L$ as explained below:
• If we detect that one of the two rules is redundant to the other one, then the resolution consists in removing the redundant rule from $N$ (lines 4-9).
• If a conflict between $r_i$ and $r_j$ is detected (line 10), we have seen in Subsection 4.2 that there are two strategies.
• In the permissive resolution (lines 15-22), $r_i$ is not modified and $r_j$ is replaced by $r'_j$ and $r''_j$. Therefore, the graph is updated as follows:
1. In the node of $r_j$, replace the AD and act of $r_j$ by the AD and act of $r'_j$.
2. Insert a new node in $N$ that contains the AD and act of $r''_j$.
3. Update $L$ by linking $r'_j$ and $r''_j$ to all the nodes of the graph, except $r_i$ (no link is created between $r'_j$ and $r''_j$).
4. Remove the edge $(r_i, r_j)$ from $L$, because there is no anomaly between them (after the modification of $r_j$ in Point 1).
• In the restrictive resolution (lines 23-30), $r_j$ is not modified and $r_i$ is replaced by $r'_i$ and $r''_i$. Therefore, the graph is updated as follows:
1. In the node of $r_i$, replace the AD and act of $r_i$ by the AD and act of $r'_i$.
2. Create a new node that contains the AD and act of $r''_i$.
3. Update $L$ by linking $r'_i$ and $r''_i$ to all the nodes of the graph, except $r_j$ (no link is created between $r'_i$ and $r''_i$).
4. Remove the link between the nodes of $r_i$ and $r_j$, because there is no anomaly between them (after the modification of $r_i$ in Point 1).
• If no anomaly is detected between a pair of linked nodes $r_i$ and $r_j$, the resolution algorithm simply removes the link between $r_i$ and $r_j$ (lines 41-43). Also, any node whose rule has an empty access domain or an empty set of actions is removed (lines 32-36).

are (r 3 , r 4 ), (r 3 , r 4 ), (r 3 , r 4 ) and (r 3 , r 4 ). For each of these pairs, the intersection of access domains or the intersection of sets of actions is empty. Therefore, their four links are removed through the four iterations 4 to 7. We obtain Graph 5 of Figure 1, which has no link. Therefore, the algorithm terminates.

Algorithm 1
4: if n1 and n2 have the same decision (Permit or Deny) then
5: if all actions of n1 are also actions of n2 then
6: Remove n1 from N
7: else if all actions of n2 are also actions of n1 then
8: Remove n2 from N
9: end if
10: else if n1 and n2 have different decisions and common actions then
11: Let np be the node among n1 and n2 whose decision is Permit, and let Ap be its set of actions
12: Let nd be the node among n1 and n2 whose decision is Deny, and let Ad be its set of actions
13: Let CD be the common access domain of np and nd
14: Let CA be the set of common actions of np and nd
15: if the resolution strategy is permissive then
16: Subtract CD from the access domain of nd
17: for every node n of N other than np and nd do
18: if L does not contain the edge (nd, n) then
19: Insert the edge (nd, n) in L
20: end if
21: end for
22: Insert in N a new deny node nn whose access domain is CD and set of actions is Ad \ CA
23: else if the resolution strategy is restrictive then
24: Subtract CD from the access domain of np
25: for every node n of N other than np and nd do
26: if L does not contain the edge (np, n) then
27: Insert the edge (np, n) in L
29: end for
30: Insert in N a new permit node nn whose access domain is CD and set of actions is Ap \ CA
33: if the access domain of n or its set of actions is empty then
34: Remove n from N
36: end for
37: if nn has not been removed from N then
38: for every node n of N other than nn do
39: Insert the edge (n, nn) in L
40: end for
41: else if np and nd have not been removed from N then
42: Remove the edge (np, nd) from L
46: return N
47: end procedure
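The iterative treatment of a cluster can be sketched in Python as a worklist over the edges of the graph. This is a simplified illustration rather than a transcription of Algorithm 1: the names are hypothetical, access domains are explicit finite sets of tuples, the redundancy test is assumed to require inclusion of both the actions and the access domain, and the rule being replaced is assumed to be split into a part outside the common access domain and a part on the common access domain with the common actions removed.

```python
from collections import namedtuple
from itertools import combinations

Rule = namedtuple("Rule", ["decision", "actions", "domain"])


def resolve_cluster(rules, strategy="permissive"):
    """Return an anomaly-free list of rules for one cluster (illustrative sketch)."""
    if strategy not in ("permissive", "restrictive"):
        raise ValueError("strategy must be 'permissive' or 'restrictive'")
    nodes = dict(enumerate(rules))                      # node id -> rule
    next_id = len(nodes)
    edges = {frozenset(pair) for pair in combinations(nodes, 2)}
    while edges:
        i, j = tuple(edges.pop())
        if i not in nodes or j not in nodes:            # edge of a removed node
            continue
        ri, rj = nodes[i], nodes[j]
        if ri.decision == rj.decision:
            # Redundancy: remove the rule that is covered by the other one.
            if ri.actions <= rj.actions and ri.domain <= rj.domain:
                del nodes[i]
            elif rj.actions <= ri.actions and rj.domain <= ri.domain:
                del nodes[j]
            continue
        cd = ri.domain & rj.domain                      # common access domain
        ca = ri.actions & rj.actions                    # common actions
        if not cd or not ca:                            # no anomaly: drop the edge
            continue
        permit_id, deny_id = (i, j) if ri.decision == "Permit" else (j, i)
        if strategy == "permissive":
            split_id, kept_id = deny_id, permit_id      # narrow the Deny rule
        else:
            split_id, kept_id = permit_id, deny_id      # narrow the Permit rule
        old = nodes.pop(split_id)
        pieces = [(split_id, old._replace(domain=old.domain - cd)),
                  (next_id, old._replace(actions=old.actions - ca, domain=cd))]
        next_id += 1
        for piece_id, piece in pieces:
            if piece.domain and piece.actions:          # discard empty rules
                # Link the piece to all nodes except the kept rule and its sibling.
                edges |= {frozenset((piece_id, n)) for n in nodes
                          if n not in (kept_id, split_id)}
                nodes[piece_id] = piece
    return list(nodes.values())


doctor, nurse = ("Doctor", "Documentation"), ("Nurse", "Documentation")
r1 = Rule("Permit", frozenset({"read", "write"}), frozenset({doctor, nurse}))
r2 = Rule("Deny", frozenset({"read", "create"}), frozenset({nurse}))
permissive = resolve_cluster([r1, r2], "permissive")
restrictive = resolve_cluster([r1, r2], "restrictive")
```

The order in which edges are popped is arbitrary, so the order of the returned rules should not be relied upon; the example therefore compares results as sets.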

Local Correctness
Our detection and resolution procedure generates a policy $Q$ from $P$. In step 2 (rule clustering, Section 3.2), $P$ is decomposed into several clusters $P_1, P_2, \ldots, P_k$, where for each $P_i$ we apply Algorithm 1 to obtain $Q_i$. Then, all $Q_i$ are aggregated to obtain $Q$. In this section, we prove correctness of the detection and resolution method, which is stated by the following theorem.
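Theorem 1. Given a policy $P = \{P_1, P_2, \ldots\}$, for each cluster $P_i$, Algorithm 1 satisfies conditions $C_1$ to $C_5$ of Definition 3.1 with respect to $P_i$ and the resulting policy $Q_i$.
In the following, we prove Theorem 1.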
For each cluster $i$ ($i = 1, 2, \ldots$), Algorithm 1 proceeds iteratively, where at each iteration $k + 1$ ($k \geq 0$), a graph $G_i(k + 1)$ is computed from a graph $G_i(k)$, and each graph $G_i(k)$ represents a policy noted $P_i(k)$. In particular, the original graph $G_i(0)$ corresponds to the original policy $P_i$, and the final graph $G_i(q)$ corresponds to the resulting policy $Q_i$ obtained after a finite number $q$ of iterations (finiteness of $q$ comes from $C_1$, which is proved in Section 5.1). Each iteration of Algorithm 1 consists in processing a pair of rules $(r_1, r_2)$ in one of the following three cases:
• Case a: there is no anomaly between $r_1$ and $r_2$.
• Case b: there is a redundancy between $r_1$ and $r_2$.
• Case c: there is a conflict between $r_1$ and $r_2$.
We will consider cases a, b and c in the following proofs.
Proof of condition $C_1$. In an iteration of the algorithm:
• In case a: the link between the two rules is removed.
• In case b: the redundant rule is removed.
• In case c: one of the two rules (let $AD$ denote its access domain) is split into two rules (let $AD_1$ and $AD_2$ denote their respective access domains) such that $AD = AD_1 \cup AD_2$ and $AD_1 \cap AD_2 = \emptyset$.
We have the following:
1. Case a (resp. b) decreases the number of links (resp. nodes) of the graph.
2. Case c increments by 1 the number of nodes of the graph.
3. From 1 and 2 and the fact that the size of $G_i(0)$ is finite, the size of every $G_i(k)$ is finite.
4. From 1 and 3, we cannot have an infinite number of consecutive iterations executing cases a and b.
5. The size of the access domain of $P_i$ is finite, because the domain of each attribute is finite.
6. Case c splits an access domain $AD$ into two disjoint access domains $AD_1$ and $AD_2$, such that at least one of the two access domains is nonempty.
7. From 1, 5 and 6, the total number of iterations executing case c is finite.
8. From 4 and 7, the algorithm executes a sequence of iterations containing a finite number of cases c, such that two cases c are separated by a finite number (possibly 0) of cases a and b. Hence, the sequence is finite.
In the following, $q$ denotes the finite number of iterations, i.e. $G_i(q)$ is the graph of the resulting policy $Q_i$.

Proof of condition $C_2$. In iteration $k + 1$ (for $0 \leq k < q$) of the algorithm:
• In case a: the algorithm removes a link, without modifying any rule of the policy. Therefore, $P_i(k + 1)$ and $P_i(k)$ have the same access domain.
• In case b: the algorithm removes a redundant rule, whose access domain is included in the access domain of another rule of the policy. Hence, $P_i(k + 1)$ and $P_i(k)$ have the same access domain.
• In case c: a rule with access domain $AD$ is replaced by two rules with access domains $AD_1$ and $AD_2$ such that $AD = AD_1 \cup AD_2$. Therefore, $P_i(k + 1)$ and $P_i(k)$ have the same access domain.
Hence, in each iteration $k + 1$ ($k \geq 0$), $P_i(k + 1)$ and $P_i(k)$ have the same access domain. By applying this result to all the iterations 1 to $q$, we obtain that $Q_i$ and $P_i$ have the same access domain.

Proof of condition $C_3$. Consider the following condition A:
A: For every pair of unlinked rules $r_1$ and $r_2$, there is no anomaly between $r_1$ and $r_2$.
In iteration $k + 1$ (for $0 \leq k < q$) of the algorithm:
• In case a: the algorithm removes a link.
• In case b: the redundant rule is removed.
• In case c: let us consider the two resolution strategies.
- Permissive resolution strategy: $r_2$ is replaced by $r'_2$ and $r''_2$, which are then linked to all rules in the graph, except that no link is added between $r_1$, $r'_2$ and $r''_2$ because there is no anomaly between them.
- Restrictive resolution strategy: $r_1$ is replaced by $r'_1$ and $r''_1$, which are then linked to all rules in the graph, except that no link is added between $r'_1$, $r''_1$ and $r_2$ because there is no anomaly between them.
We have the following:
1. In all cases a, b and c, we have not created any pair of unlinked anomalous rules.
2. From 1, we deduce that if $G_i(k)$ satisfies A, then $G_i(k + 1)$ satisfies A.
3. $G_i(0)$ satisfies condition A, because all its rules are linked.
4. From 2 and 3, we deduce that the graph $G_i(q)$ of $Q_i$ satisfies A.
5. From 4 and the fact that $G_i(q)$ has no link, we deduce that $Q_i$ is anomaly-free.

Proof of condition $C_4$. Let us first prove that for any access request $rq$: if $G_i(k)$ has no rule permitting $rq$, then $G_i(k + 1)$ has no rule permitting $rq$. Consider an access request $rq$ and assume that $G_i(k)$ has no rule permitting $rq$. In iteration $k + 1$, for $0 \leq k < q$:
• In case a: Since the resolution consists in removing a link, no new rule permitting $rq$ is created.
• In case b: Since the resolution consists in removing a rule, no new rule permitting $rq$ is created.
• In case c:
- Permissive resolution strategy: Since the resolution consists in replacing a deny-rule by two deny-rules, no new rule permitting $rq$ is created.
- Restrictive resolution strategy: A rule $r = Permit_a(AD)$ is replaced by two rules: $r_u = Permit_a(AD_1)$ and a second permit-rule over $AD_2$ whose set of actions is included in $a$, where $AD_1 \cup AD_2 = AD$. Both rules permit only requests that were already permitted by $r$, so no new rule permitting $rq$ is created.
From the fact that $G_i(k)$ has no rule permitting $rq$ and the fact that no new rule permitting $rq$ is created in iteration $k + 1$, we deduce that $G_i(k + 1)$ has no rule permitting $rq$. By applying this result to all the iterations 1 to $q$, we obtain that if $P_i$ has no rule permitting $rq$, then $Q_i$ has no rule permitting $rq$.

Proof of condition $C_5$. The proof of $C_5$ is obtained from the proof of $C_4$, by just replacing a few words as follows:
• Permit(ting) is replaced by Deny(ing), and vice versa.
• Permissive is replaced by Restrictive, and vice versa.

Correctness
We have proved in Section 5.1 that for each $i = 1, \ldots, k$, Algorithm 1 satisfies $C_1$ to $C_5$ w.r.t. $P_i$ and $Q_i$. In this section, we prove that $C_1$, $C_2$, $C_4$ and $C_5$ are satisfied w.r.t. $(P, Q)$. Regarding $C_3$, we will prove that it is satisfied with a high probability. Indeed, $C_3$ is not satisfied in rare cases due to clustering. Clustering is nevertheless motivated by the fact that it improves the performance (as shown by the experimental results in Section 6).
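• Proof of $C_1$ and $C_2$
1. $P = \{P_1, P_2, \ldots, P_k\}$, where the access domain of $P$ is the union of the access domains of all $P_i$.
2. $Q = \{Q_1, Q_2, \ldots, Q_k\}$, where the access domain of $Q$ is the union of the access domains of all $Q_i$.
3. For each $i$, $Q_i$ is obtained from $P_i$ in a finite number of iterations and has the same access domain as $P_i$ (Section 5.1).
4. From 1, 2 and 3, $Q$ is computed in a finite number of iterations and has the same access domain as $P$. Hence, $Q$ satisfies $C_1$ and $C_2$.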
• Proof that $C_3$ is satisfied in most cases, i.e. $Q$ is anomaly-free with high probability
1. Each $Q_i$ is anomaly-free (Section 5.1).
2. Given the similarity measures used in the clustering process [1], the probabilities of existence of inter-cluster anomalies (i.e. anomalies between rules of different clusters) are much lower than the probabilities of existence of intra-cluster anomalies (i.e. anomalies between rules in the same cluster). In other words, the probability of not detecting anomalies due to clustering is very small.
3. Consider the case of a rule $R$ of $P$ that is an element of two clusters $P_i$ and $P_j$. If $R$ is not modified or removed in the computations of $Q_i$ and $Q_j$, $R$ will be an element of both $Q_i$ and $Q_j$. However, the grouping process of all the $Q_i$ to obtain $Q$ inserts $R$ only once in $Q$, and hence avoids creating a new redundancy in $Q$.
• Proof of $C_4$
We have proved that for any request $rq$: if $P_i$ has no rule permitting $rq$, then $Q_i$ has no rule permitting $rq$. Consider an access request $rq$, and assume that $P$ has no rule permitting $rq$. Since every rule of a cluster $P_i$ is a rule of $P$, no $P_i$ has a rule permitting $rq$, and hence no $Q_i$ has a rule permitting $rq$. From the fact that $Q$ is the aggregation of all the $Q_i$, we deduce that $Q$ has no rule permitting $rq$.
• Proof of $C_5$
The proof of $C_5$ is obtained from the proof of $C_4$, by just replacing the word "permitting" by "denying".

Complexity of Algorithm 1
We use the following notation:
• $n$ is the size of the policy $P$, i.e. its number of rules,
• $m$ is the number of attributes used to define the rules of $P$,
• $d$ is the size of the access domain of $P$; it is the product of the domain sizes of all attributes, i.e. $d = \prod_{i=1}^{m} |V_{att_i}|$.
We have seen in Section 5.1 that in each of its iterations, the algorithm is in one case among three: case a, case b, or case c. The best situation in terms of execution time occurs when we first have only successive occurrences of case b (redundancy), because it removes one node and one or more edges, while case a removes one edge and no node. The worst situation in terms of execution time occurs when we first have only successive occurrences of case c, because it increases the number of nodes and edges. Hence, the worst-case scenario that leads to a graph without edges should have two phases: phase 1 consists of successive iterations of case c, and phase 2 consists of successive iterations of case a.

Complexity of phase 1 (successive iterations of case c):
1. Splitting a set of $x$ elements $x - 1$ times results in $x$ singletons.
2. The maximum size of the access domain of any rule of $P$ is $d$.
3. From 1 and 2, the number of times a given rule of $P$ is split due to case c is upper-bounded by $d - 1$.
4. From 3, if we consider the $n$ rules of $P$, we obtain that the number of iterations of case c is upper-bounded by $n \times (d - 1)$.
5. From 4 and the fact that case c increments the number of rules by at most 1, after a series of $n \times (d - 1)$ cases c, we obtain a policy whose number of rules is upper-bounded by $n \times d$.
6. From 5, after a series of $n \times d$ cases c, we obtain a policy whose number of edges is upper-bounded by $\frac{(n \times d)(n \times d - 1)}{2}$.
8. From 4 and 7, the complexity of phase 1 (i.e. the treatment for all iterations of case c) is $O(n \times d^2 \times \max(d, n))$.

Total complexity (phase 1 + phase 2): Since the complexity of phase 1 is greater than the complexity of phase 2, the former is the order of the total complexity: $O(n \times d^2 \times \max(d, n))$. We assume that attributes are not constant parameters, i.e. $|V_{att_i}| \geq 2$ for every $i = 1, \ldots, m$, which implies $d \geq 2^m$. Therefore, the number of attributes has an exponential effect on the complexity.
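To give a feeling for these bounds, the following sketch computes $d$ and the phase 1 bounds on the number of case c iterations and on the number of rules for given attribute domain sizes. The function names and the sample sizes are hypothetical and chosen only for illustration.

```python
from math import prod


def access_domain_size(value_counts):
    """d: product of the domain sizes |V_att_i| of all attributes."""
    return prod(value_counts)


def phase1_bounds(n, d):
    """Upper bounds on case c iterations and on the resulting number of rules."""
    return {"case_c_iterations": n * (d - 1), "rules": n * d}


d_small = access_domain_size([2, 2, 2, 2])     # four binary attributes
bounds = phase1_bounds(n=1000, d=d_small)
d_ten = access_domain_size([2] * 10)           # ten binary attributes: d = 2^10
```

Moving from four to ten binary attributes multiplies $d$ by 64, which illustrates the exponential effect of the number of attributes on the bound.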

Experimental Results
To evaluate the suggested approach, we have implemented our method of anomaly detection and resolution (Algorithm 1) in the Java programming language, in the experimental environment indicated in Table 1. We have applied our method to several examples of significant size. We constructed a set of ABAC policies (synthetic datasets) composed of combinations of eight subject attributes, four resource attributes and two environment attributes. An evaluation on a real dataset would be preferable; however, no benchmark has been published in this area, and real medical data are hard to obtain because of confidentiality constraints. We have generated ABAC policies of up to 15000 rules. Figure 2 illustrates how the numbers of redundancies and conflicts in the generated policies increase proportionally with the policy size.
To analyze our results, we have considered the criterion of total execution time. More precisely, we have analyzed how the running time is influenced by the following parameters:
• n: the size of the considered ABAC policy
• The threshold value (see Rule clustering in section 3.2)
• d: the size of the access domain

As indicated in section 3.2, our method consists of three steps: 1) rule extraction, 2) rule clustering, and 3) anomaly detection and resolution. The latter step consists in executing Algorithm 1 in each cluster. Figure 3 shows the sum of the running times of Algorithm 1 for all the clusters (for d = 16). Figure 4 shows the total running time, i.e. the time of Figure 3 plus the time of steps 1 and 2. Anomaly detection and resolution is executed in each cluster, where the number of rules in each cluster is less than n (the time complexity of step 3 is given in subsection 5.3). Moreover, a cluster may contain only one rule, in which case the time required for anomaly detection and resolution in that cluster is 0. Therefore, the time required for step 3 is less than the time required for steps 1 and 2.

The threshold used in the clustering algorithm influences the result of clustering [21,22] (i.e. the number of clusters), which in turn influences the execution time of Algorithm 1 (in all the clusters). Figure 6 shows the total running time for different thresholds on the same policy (for n = 1000 and d = 16). The obtained curve demonstrates the impact of the threshold values (i.e., 0, 0.5, 0.6, 0.7, 0.8 and 0.9) on the performance (running time). The results can be explained by the fact that when the threshold decreases, the sizes of the obtained clusters increase, and hence the running time also increases. In the extreme case where threshold = 0 (i.e., equivalent to applying our method to the whole set of rules without clustering), we obtain the worst running time. These results demonstrate the time gained from using clustering. The impact of the threshold becomes negligible from the value 0.7 onwards. On average, the best running time is obtained with the threshold 0.8. Thus, the default threshold for our experiments is set to 0.8.
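The effect of the threshold on cluster sizes can be illustrated with a small sketch. It is not the clustering algorithm of section 3.2: the Jaccard similarity over (attribute, value) pairs, the greedy assignment to the first sufficiently similar cluster, and the sample rules are all assumptions made for illustration only.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of (attribute, value) pairs."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_rules(rules, threshold):
    """Greedy clustering (illustrative assumption, not the paper's algorithm).

    A rule joins the first cluster whose representative (its first rule)
    is at least `threshold`-similar; otherwise it starts a new cluster.
    """
    clusters = []
    for rule in rules:
        for cluster in clusters:
            if jaccard(rule, cluster[0]) >= threshold:
                cluster.append(rule)
                break
        else:
            clusters.append([rule])
    return clusters


# Hypothetical rules, each reduced to its set of (attribute, value) pairs.
rules = [
    frozenset({("role", "doctor"), ("dept", "cardio"), ("type", "record")}),
    frozenset({("role", "doctor"), ("dept", "cardio"), ("type", "report")}),
    frozenset({("role", "nurse"), ("dept", "cardio"), ("type", "record")}),
    frozenset({("role", "admin"), ("dept", "it"), ("type", "log")}),
]

# Cluster sizes for three thresholds; threshold 0 puts every rule together.
sizes = {t: [len(c) for c in cluster_rules(rules, t)] for t in (0.0, 0.5, 0.8)}
print(sizes)
```

With threshold 0 every rule lands in a single cluster, which corresponds to running the method on the whole policy; raising the threshold yields more and smaller clusters, so each run of Algorithm 1 sees fewer rules.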

Related Work and Contributions
Attribute-based access control (ABAC) policies support fine-grained access control. Therefore, they are more flexible in governing access to information and resources in a variety of applications, including web services [20,33], cloud computing [10,28,31], collaborative environments [23,32], the Internet of Things [39, 40, 41] and so on. Attribute-based policies regulate user requests based on a set of conditions related to the requestors and the requested resources. However, ABAC policies are often complex and conflict-prone. In an ABAC policy, multiple rules may overlap, which means that one access request may match several rules with the same effect. Moreover, multiple rules with conflicting access decisions may match the same request. These kinds of anomalies may lead to both safety problems (allowing unauthorized accesses) and availability problems (denying an access in emergencies). Therefore, detecting and resolving anomalies is an important aspect of dealing with ABAC policies.
Khoumsi et al. [11] divide anomalies into two categories: conflicting anomalies and non-conflicting anomalies. The first category occurs when a request matches several rules that have different actions (conflicts), whereas the second occurs when the same request matches several rules that have the same action (redundancies). On the other hand, Moffett et al. [13] have defined conflict by three synonyms: difference, disagreement and opposition. They have categorized conflicts into: conflict of modality, conflict between imperative and authority policy, conflict of duties and conflict of priorities.
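The distinction between the two categories can be sketched in terms of access domains. The rule representation below (a dictionary with an `attrs` map from attribute name to allowed values, and an `effect`) and the function names are hypothetical; the sketch also assumes that both rules are defined over the same attribute names and only illustrates the idea that overlapping access domains with different effects yield a conflict, and with equal effects a redundancy.

```python
from itertools import product


def access_domain(rule):
    """Enumerate every attribute-value tuple that a rule matches.

    Attributes are taken in sorted name order so that tuples coming from
    different rules are comparable (same attribute names assumed).
    """
    names = sorted(rule["attrs"])
    return set(product(*(rule["attrs"][name] for name in names)))


def classify(r1, r2):
    """Return 'conflict', 'redundancy', or None for a pair of rules."""
    overlap = access_domain(r1) & access_domain(r2)
    if not overlap:
        return None  # disjoint access domains: no anomaly between the two
    return "redundancy" if r1["effect"] == r2["effect"] else "conflict"


# Hypothetical rules over two attributes.
r1 = {"attrs": {"role": {"doctor", "nurse"}, "resource": {"record"}}, "effect": "permit"}
r2 = {"attrs": {"role": {"nurse"}, "resource": {"record", "report"}}, "effect": "deny"}
r3 = {"attrs": {"role": {"doctor"}, "resource": {"record"}}, "effect": "permit"}
r4 = {"attrs": {"role": {"admin"}, "resource": {"log"}}, "effect": "deny"}

print(classify(r1, r2), classify(r1, r3), classify(r1, r4))
```

Here r1 and r2 both match the request (record, nurse) with opposite effects, r1 and r3 both permit (record, doctor), and r1 and r4 share no request.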
The eXtensible Access Control Markup Language (XACML) [3] is the most convenient way to express ABAC policies. XACML defines an XML schema that supports the ABAC model. In fact, an XACML policy in distributed applications may be aggregated from multiple parties and can be managed by more than one administrator [8], which may give rise to anomalies between rules. Various research efforts have been devoted to anomaly detection in XACML policies using verification techniques [8,15,16,18,29,34]. The most important policy analysis techniques and formal approaches are presented in [37]. For instance, Mourad et al. [15] use the Unified Modelling Language (UML) to detect conflicting and redundant rules prior to their enforcement in the system. Ramli et al. [16] use Answer Set Programming (ASP) to detect incomplete, conflicting and unreachable XACML policies. However, this approach has some limitations when modeling XACML attribute types that do not belong to AnsProlog, such as strings. Martin et al. [18] encode the policy rules in Coq [5] with two fields. The first field is the rule effect, and the second field combines the four elements of XACML, "subject-resource-action-condition", referred to as srac. If two rules have identical srac with different effects, a conflict is detected. Otherwise, if the effects are the same, a redundancy is detected. On the other hand, [8,37] represent XACML policies as decision trees to detect and resolve conflicts and redundancies. Another representation of XACML policies was proposed by [19]. It represents XACML in Prolog using constraint logic programming (CLP) techniques, which are well-adapted to hierarchical XACML policy logic and avoid pair-wise comparisons altogether by taking advantage of Prolog's powerful built-in indexing system. In addition, the authors of [34] consider SAT modulo theories (SMT) [27] as the underlying reasoning method for the analysis of XACML policies.
The resolution of anomalies is already handled by XACML itself. In fact, XACML offers a set of Rule Combining Algorithms (RCA) to overcome the issue of conflicting rules: Deny-Overrides, Permit-Overrides, First-Applicable and Only-One-Applicable. For instance, Deny-Overrides returns deny if one of the conflicting policies evaluates to deny. Otherwise, the result is permit [30]. However, the RCAs need to be defined manually and a priori by the policy administrator. Moreover, only one RCA can be applied to all kinds of detected anomalies. Therefore, this technique remains static and cannot be applied to distributed and dynamic systems. Consequently, several research efforts [9,12,14] have addressed dynamic anomaly resolution strategies. For instance, Kagal et al. [9] have considered the low-priority technique to resolve conflicts, i.e. negative authorizations are allowed. Matteucci et al. [12] have proposed a strategy for policy conflict resolution based on multi-criteria decision making, where the decision is taken based on calculations over multiple criteria retrieved from the policies' attributes and represented in a matrix. In addition, Bauer et al. [4] adopted a data-mining technique to remove inconsistencies occurring between access control policies and users' intentions. In contrast, our proposed method detects and resolves anomalies within ABAC policies caused by overlapping relations (i.e., the intersection of access domains).
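The static nature of rule combining algorithms can be seen in a minimal sketch. It is a deliberate simplification of the XACML algorithms: only the decisions Permit, Deny and NotApplicable are modeled, Indeterminate results and Only-One-Applicable are left out, and the function names are illustrative rather than taken from any XACML library.

```python
def deny_overrides(decisions):
    """Deny wins over Permit whenever at least one rule evaluates to Deny."""
    if "Deny" in decisions:
        return "Deny"
    if "Permit" in decisions:
        return "Permit"
    return "NotApplicable"


def permit_overrides(decisions):
    """Permit wins over Deny whenever at least one rule evaluates to Permit."""
    if "Permit" in decisions:
        return "Permit"
    if "Deny" in decisions:
        return "Deny"
    return "NotApplicable"


def first_applicable(decisions):
    """The first rule that applies, in policy order, decides."""
    for decision in decisions:
        if decision != "NotApplicable":
            return decision
    return "NotApplicable"


# Decisions of three rules, in policy order, for one hypothetical request.
decisions = ["NotApplicable", "Permit", "Deny"]
print(deny_overrides(decisions), permit_overrides(decisions), first_applicable(decisions))
```

The same pair of conflicting rules yields a different outcome depending on the algorithm chosen in advance, and that single choice applies to every conflict in the policy.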
With respect to the solutions proposed in the papers mentioned above, our approach aims at defining a generic strategy for anomaly detection and resolution. First of all, our approach to detecting and resolving anomalies within ABAC policies takes into account a large set of rules and attributes. To make the suggested method scale to a huge number of rules, we decompose the policy into clusters of rules, and the anomaly detection and resolution method is performed in each cluster instead of the whole policy set, which implies less processing time. The proposed approach is mainly based on the concept of rule access domain. Its main advantage is the guarantee that the semantics of the original policy is preserved (the proof is presented in section 5). This is done by decomposing a given rule into access domains; based on this technique, we accurately identify the domain of conflict and apply the resolution only to that domain by rewriting the conflicting rules. Furthermore, we consider two types of resolution strategies: restrictive resolution, where we permit fewer actions and deny more actions;

Conclusion
We have presented a formal method that detects and resolves anomalies in ABAC policies. The suggested method uses a concept called access domain to accurately identify and effectively resolve policy anomalies. To make the method scale to large policies, it is preceded by rule extraction and rule clustering: the policy is decomposed into several clusters of rules, and the method is then applied to each cluster. Besides, we consider two types of resolution: permissive resolution and restrictive resolution. An important advantage of the suggested approach is the guarantee that the semantics of the original policy is preserved. We have proved the correctness of the method and evaluated its computational complexity. Furthermore, we have provided an algorithm for the method, which was implemented and evaluated experimentally.
As future work, we have already started implementing a parallel version of the proposed approach using the MapReduce technique, in order to improve the running time. We also aim to conduct a real case study and automate the procedure based on the lessons learned.

EAI Endorsed Transactions on Security and Safety 10 2018 -12 2018 | Volume 5 | Issue 16 | e3

Algorithm 1 Anomaly Detection and Resolution
Require: Graph (N, L): N is a set of nodes (corresponding to rules), L is a set of edges ⊆ N × N
Ensure: Set of nodes N (i.e. graph without edges)
procedure AnomalyResolution(N, L)
    while L is not empty do
        Consider an edge (n1, n2) of L


2
from the fact that P has no rule permitting rq, we deduce that for each $i = 1, \dots, k$: $P_i$ has no rule permitting rq.
2. For each $i = 1, \dots, k$: if $P_i$ has no rule permitting rq, then $Q_i$ has no rule permitting rq.
3. From 1 and 2, we deduce that for each $i = 1, \dots, k$: $Q_i$ has no rule permitting rq.

$\frac{(n \times d) \times (n \times d - 1)}{2}$, which is in $O(n^2 \times d^2)$.
7. The treatment of each iteration of case c consists mainly in:
   • Computing the intersection of the access domains of two rules, which is in $O(d^2)$ from 2.
   • Adding edges to other nodes, which is in $O(n \times d)$ from 5.

Example 5.1. Consider a policy P with 3 rules. Each rule has two attributes $att_1$ and $att_2$, such that $att_1 \in \{v_1, v_2, v_3\}$ and $att_2 \in \{v_4, v_5, v_6, v_7\}$. The access domain size of P is $d = \prod_{i=1}^{2} |V_{att_i}| = 3 \times 4 = 12$. The maximum number of rules that can be obtained after a series of case c is $n \times d = 3 \times 12 = 36$ rules (rules with singleton access domains).

Complexity of phase 2 (successive iterations of case a):
1. We have seen that after a series of case c of length n × (d − 1), we have a graph whose number of edges is in $O(n^2 \times d^2)$.
2. The effect of case a is to remove an edge.
3. From 1 and 2, the number of iterations of case a is in $O(n^2 \times d^2)$.
4. The treatment of each iteration consists mainly in computing the intersection of the access domains of two rules, which is in O(1), because after the series of case c in phase 1, the access domains of the rules are singletons.
5. From 3 and 4, the complexity of phase 2 (i.e. the treatment for all iterations of case a) is in $O(n^2 \times d^2)$.
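The figures of Example 5.1 can be reproduced with a few lines of code. The sketch below evaluates the worst-case counts used in the complexity analysis (iterations of case c, number of rules, number of edges) for a given n and list of attribute domain sizes; the function name `phase1_bounds` and the dictionary keys are hypothetical, chosen only for this illustration.

```python
from math import prod


def phase1_bounds(n, domain_sizes):
    """Worst-case counts after phase 1, for n rules and the given |V_att_i|."""
    d = prod(domain_sizes)      # size of the access domain of the policy
    max_rules = n * d           # every rule split down to singleton domains
    return {
        "d": d,
        "max_case_c_iterations": n * (d - 1),
        "max_rules": max_rules,
        # A graph on max_rules nodes has at most this many edges.
        "max_edges": max_rules * (max_rules - 1) // 2,
    }


# Example 5.1: n = 3 rules, |V_att_1| = 3 and |V_att_2| = 4.
bounds = phase1_bounds(3, [3, 4])
print(bounds)
```

For Example 5.1 this gives d = 12 and at most 36 rules, in agreement with the text, together with at most 33 iterations of case c and 630 edges.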

Figure 2. The number of anomalies in the generated policies

Figure 5 shows the total running time as a function of the size of the access domain $d = \prod_{i=1}^{m} |V_{att_i}|$ (for n = 3000 and threshold = 0.8). Each rule is composed of eight attributes (i.e., m = 8), and for each attribute we considered $|V_{att_i}|$ = 1, 2, 3, 4 and 5. The obtained curve demonstrates the impact of the access domain sizes (i.e., 1, $2^8$, $3^8$, $4^8$ and $5^8$) on the performance (running time). The obtained results are justified in subsection 5.3 by an evaluation of the time complexity.
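Written out, the access domain sizes used for Figure 5 grow quickly. The snippet below simply evaluates d = v^m for m = 8 attributes that all have the same number v of values, as in the experiment; the variable names are illustrative.

```python
m = 8  # number of attributes per rule in the experiment

# Access domain size d = v ** m when every attribute has v possible values.
domain_sizes = {v: v ** m for v in range(1, 6)}
print(domain_sizes)
```

Going from two to five values per attribute multiplies d by more than a thousand, which is consistent with the exponential effect of the number of attributes noted in the complexity analysis.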

and permissive resolution, where we permit more actions and deny fewer actions.
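The two strategies can be illustrated on a single pair of conflicting rules. The sketch below is an assumption about what rewriting the conflicting rules over the domain of conflict may look like, not a transcription of Algorithm 1: access domains are modeled as plain sets of requests, and `resolve_conflict` is a hypothetical helper.

```python
def resolve_conflict(permit_domain, deny_domain, strategy):
    """Rewrite two conflicting rules so their access domains no longer overlap.

    Only the domain of conflict (the intersection) is reassigned; the rest
    of each rule's access domain is left untouched.
    """
    overlap = permit_domain & deny_domain
    if strategy == "restrictive":
        # Deny wins on the overlap: the permit rule loses those requests.
        return permit_domain - overlap, deny_domain
    if strategy == "permissive":
        # Permit wins on the overlap: the deny rule loses those requests.
        return permit_domain, deny_domain - overlap
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical access domains given as sets of (role, resource) requests.
permit = {("doctor", "record"), ("nurse", "record")}
deny = {("nurse", "record"), ("nurse", "report")}

restrictive = resolve_conflict(permit, deny, "restrictive")
permissive = resolve_conflict(permit, deny, "permissive")
print(restrictive, permissive)
```

In both cases the union of the two access domains is unchanged and only the request (nurse, record) changes owner, which is the sense in which the resolution is confined to the domain of conflict.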

For every access request rq: If P has no rule permitting rq, then Q has no rule permitting rq.
• C5. For every access request rq: If P has no rule denying rq, then Q has no rule denying rq.
The algorithm replaces a rule whose access domain is AD by two rules whose access domains $AD_1$ and $AD_2$ are such that $AD = AD_1 \cup AD_2$. Hence, $P_i(k + 1)$ and $P_i(k)$ have the same access domain.
where the access domain of Q is the union of the access domains of all $Q_i$

Table 1. Experimental Environment