A genetic algorithm approach to automated discovery of hierarchical production rules with fuzzy hierarchy

Automated discovery of hierarchical structures in large datasets has been an active research area in recent years. A concept hierarchy can facilitate mining knowledge at multiple levels of abstraction. A crisp description of a concept hierarchy usually cannot represent human knowledge completely and practically. This paper focuses on mining generalized rules with a fuzzy hierarchical structure, using a Genetic Algorithm (GA) for knowledge discovery. A fuzzy subsumption relation and suitable fitness functions are proposed, together with genetic operators appropriate for the suggested encoding. Finally, Hierarchical Production Rules with Fuzzy Hierarchy (HPRFH) are generated from the discovered hierarchy. Experimental results are presented to demonstrate the performance of the proposed approach.


Introduction
Over the last decade there has been an increasing amount of research in the field of automated learning and discovery in general, and Knowledge Discovery in Databases (KDD) in particular [8]. KDD can be defined as the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [8]. Data mining is a core stage of the KDD process, in which an algorithm is applied to extract patterns from data [11,17]. Improving the quality of the discovered knowledge is important for making correct decisions in an unpredictable environment [17,20]. Genetic Algorithms (GAs) are stochastic search algorithms based upon Darwin's theory of evolution by natural selection, where a population is progressively improved by selectively discarding the worse individuals and breeding new offspring from the better ones [10].
The predominant representation of discovered knowledge is the standard Production Rule (PR) of the form If P Then D. PRs, however, are unable to handle exceptions and do not exhibit variable precision. As an extension of the PR, Michalski and Winston [14] suggested the Censored Production Rule (CPR), which has the form If P Then D Unless C, where C (the censor) is the exceptional condition. To address various problems and shortcomings of CPR systems, Bharadwaj and Jain [5] introduced the concept of Hierarchical Censored Production Rules (HCPRs) by augmenting CPRs with specificity and generality information. The general form of the HCPR is: Decision If <condition> Unless <censor> Generality <general info> Specificity <specific info>.
Basheer Mohamad Al-Maqaleh and K. K. Bharadwaj
Automated discovery of fuzzy hierarchical structure from large databases plays a fundamentally important role in data mining because it provides comprehensible results that capture real-life inheritance of objects [12,18]. As a special case of the HCPR (dropping the Unless operator), a Hierarchical Production Rule with Fuzzy Hierarchy (HPRFH) takes the form Decision If <condition> Generality <general info> Specificity <specific info>, where the condition $P$ is the set of preconditions and a Specificity element $D_{ki}(d_i)$ means that $D_{ki}$ is a specific class of $D_k$ with degree of subsumption $d_i$. The degree of inheritance (subsumption), deg_sub, between the general and specific classes gives the fuzzy hierarchical structure.
In this paper, the proposed approach focuses on the automatic generation of HPRFH for nominal data from a dataset. The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 describes the fuzzy hierarchical concepts. The design of the GA model is introduced in Section 4. Section 5 reports the experimental results and evaluates the performance of the proposed approach on the datasets used. Conclusions and directions for future work are given in Section 6.

Related Work
An Evolutionary Approach to automated discovery of Censored Production Rules with Fuzzy Hierarchy (EACPRFH) is proposed in [2]. Automated discovery of Censored Production Rules with Fuzzy Hierarchy (CPRFH) using a parallel Genetic Programming (GP) approach is proposed in [6]; the authors suggested two advanced genetic operators, fission and fusion, as special operators to produce offspring from a population of HCPR-trees. An evaluation of hierarchical interestingness measures for mining pairwise generalized association rules is introduced in [7]. The authors proposed a framework instance and extended the set of performance measures by two novel measures, relation learning and precision, which take hierarchical relationships into account.
A combination of rough sets and an evolutionary approach for automated discovery of CPRFH is presented in [16]. Rough set methodology is used to discover the characterization set and censors for all the classes in the dataset, and a GA then evolves the CPRFH. HIerarchical DEcision Rules (HIDER) discovery using an evolutionary algorithm in continuous and discrete domains is proposed in [1]. The evolutionary algorithm uses both real and binary coding for the individuals of the population. Mining multi-level association rules at different levels of a concept hierarchy is presented in [13], which describes an efficient implementation of multi-level association rule mining using concept hierarchies.
In the present work, the proposed approach is designed for automated discovery of HPRFH from large datasets. It integrates the process of hierarchy generation and rule discovery using a novel approach based on a GA. An appropriate fuzzy subsumption relation, an encoding, suitable genetic operators and effective fitness functions are suggested for the proposed approach. The concept of a Frequency Matrix (FM) [3] is used to summarize the large dataset. The proposed algorithm also uses fusion as an advanced genetic operator, which helps it to discover new classes/concepts during the evolution process.

Fuzzy Subsumption Relation
A class $D_i$ can be defined by a set of properties (values of distinct attributes), class_prop($D_i$). Let $D_i$ and $D_j$ be any two classes with the sets of properties class_prop($D_i$) and class_prop($D_j$), respectively.
First, we define a subsumption measure (i.e., a measure of whether a class is an ancestor of another) between two properties $P_i(x)$ and $P_j(y)$, where $x$ and $y$ are the frequencies of $P_i$ and $P_j$ with respect to classes $D_i$ and $D_j$ respectively, as follows [2,3]:
$$\mathrm{subsume}(P_i(x), P_j(y)) = \begin{cases} 1 & \text{if } x \neq \emptyset,\ y \neq \emptyset,\ x \le y \text{ and attribute } P_i = \text{attribute } P_j \\ y & \text{if } x \neq \emptyset,\ y \neq \emptyset,\ x > y \text{ and attribute } P_i = \text{attribute } P_j \\ 0 & \text{if } x = \emptyset \text{ or } y = \emptyset \text{ or attribute } P_i \neq \text{attribute } P_j \end{cases}$$
Further, a degree of subsumption (deg_sub) between two classes $D_i$ and $D_j$ is defined as follows [3]. Let $\alpha_i$ be the i-th property in $P_{D_i}$ and $\alpha_j$ be the j-th property in $P_{D_j}$, where $P_{D_i}$ and $P_{D_j}$ are the property sets of $D_i$ and $D_j$. Then
$$\mathrm{deg\_sub}(D_i, D_j) = \frac{\sum_{\alpha_i \in P_{D_i}} \sum_{\alpha_j \in P_{D_j}} \mathrm{subsume}(\alpha_i, \alpha_j)}{|P_{D_i}|}$$
Only the deg_sub with $\max(\mathrm{deg\_sub}(D_i, D_j), \mathrm{deg\_sub}(D_j, D_i)) \ge$ threshold will be considered during the construction of the fuzzy hierarchy.
A general rule for a class $D$ is expressed over properties $P_i(x_i)$ ($1 \le i \le M$), where $x_i$ denotes the frequency of the property $P_i$ with respect to class $D$ and $M$ is the total number of distinct values of attributes. For example, consider two classes $D_i$ and $D_j$ with the sets of properties $\{P_1(1), P_2(0.7), P_3(1)\}$ and $\{P_1(1), P_2(1), P_3(0.8), P_4(1)\}$ respectively. The deg_sub between classes $D_i$ and $D_j$ follows from the definition above. If $D_i$ subsumes $D_j$, then $D_i$ is more general than $D_j$.
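The example with $\{P_1(1), P_2(0.7), P_3(1)\}$ and $\{P_1(1), P_2(1), P_3(0.8), P_4(1)\}$ can be worked through with a minimal Python sketch. It rests on two assumptions that are not confirmed by the text: an absent property is represented by `None`, and deg_sub normalises the summed subsume values by the number of properties of the first class. The function and variable names are illustrative only.

```python
from typing import Dict, Optional


def subsume(x: Optional[float], y: Optional[float]) -> float:
    """Subsumption between the same attribute value observed in two classes.

    x and y are the frequencies of the property in D_i and D_j.
    None stands for an absent property (assumption of this sketch).
    """
    if x is None or y is None:
        return 0.0
    return 1.0 if x <= y else y


def deg_sub(props_i: Dict[str, float], props_j: Dict[str, float]) -> float:
    """Degree to which class D_i subsumes class D_j.

    Assumption: the summed subsume values are divided by |P_Di|.
    Matching by key enforces "attribute P_i = attribute P_j".
    """
    total = sum(subsume(x, props_j.get(name)) for name, x in props_i.items())
    return total / len(props_i)


d_i = {"P1": 1.0, "P2": 0.7, "P3": 1.0}
d_j = {"P1": 1.0, "P2": 1.0, "P3": 0.8, "P4": 1.0}

print(round(deg_sub(d_i, d_j), 3))  # (1 + 1 + 0.8) / 3   -> 0.933
print(round(deg_sub(d_j, d_i), 3))  # (1 + 0.7 + 1 + 0) / 4 -> 0.675
```

Under these assumptions $D_i$ subsumes $D_j$ more strongly than the reverse, so $D_i$ would be placed above $D_j$ whenever 0.933 reaches the chosen threshold.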

Discovering Root Properties
The root property ($P_r$) is the property (or properties) that covers the maximum number of classes in the data being mined, where $P_r$ covers a class $D_k$ if $\mathrm{FM}[P_r, D_k] \ge$ threshold. The degree of an element $P_i$ ($1 \le i \le M$) with respect to the class $D_k$ ($1 \le k \le N$), where $N$ is the number of distinct classes in the dataset, is denoted deg_element($P_i$, $D_k$) and is defined in [2,3]. The degree of a tuple $t_i$ corresponding to the property $P_i$, deg_tuple($t_i$), is computed using the formula given in [2,3]. The following cases are considered for the discovery of the root property [3]:
Case a: If there is a unique tuple with the highest degree, then the property $P_r$ corresponding to that tuple is chosen as the root property.
Case b: If more than one tuple $t_i$ has the highest value of deg_tuple($t_i$), the sum of frequencies, sum_freq_tuple($t_i$), is computed for each such tuple using the formula given in [3]. If there is a unique tuple $t_i$ with the highest value of sum_freq_tuple($t_i$), then the property $P_r$ corresponding to that tuple is chosen as the root property.
Case c: If more than one tuple $t_i$ has the highest value of sum_freq_tuple($t_i$), then the set of properties corresponding to the combination of tuples covering the maximum number of classes forms the root property.
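The three cases can be sketched in Python as follows. The formulas for deg_element, deg_tuple and sum_freq_tuple are not reproduced in this text, so the sketch assumes the most direct reading: deg_element is 1 when the FM entry reaches the threshold and 0 otherwise, deg_tuple counts the classes covered, and sum_freq_tuple adds the frequencies of a row. Case c is simplified to returning all remaining tied properties. The FM values are made up for illustration.

```python
from typing import Dict, List


def deg_element(freq: float, threshold: float) -> int:
    # Assumed reading: a property covers a class when its FM entry
    # reaches the threshold.
    return 1 if freq >= threshold else 0


def root_properties(fm: Dict[str, List[float]], threshold: float) -> List[str]:
    """Return the root property, or the tied candidates when Case c applies.

    fm maps each property to its row of frequencies, one entry per class.
    """
    deg_tuple = {
        prop: sum(deg_element(f, threshold) for f in row)
        for prop, row in fm.items()
    }
    best_degree = max(deg_tuple.values())
    tied = [p for p, d in deg_tuple.items() if d == best_degree]
    if len(tied) == 1:
        return tied  # Case a: unique tuple with the highest degree

    sum_freq_tuple = {p: sum(fm[p]) for p in tied}
    best_sum = max(sum_freq_tuple.values())
    # Case b: one property left; Case c (simplified): several remain.
    return [p for p in tied if sum_freq_tuple[p] == best_sum]


# Hypothetical frequency matrix: three properties, three classes.
fm = {
    "P1": [1.0, 0.9, 0.8],
    "P2": [1.0, 0.7, 0.2],
    "P3": [0.6, 0.9, 0.9],
}
print(root_properties(fm, threshold=0.6))  # P1 and P3 tie on degree; P1 wins on sum
```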

Discovering Strong Properties and Classes
For any class $D_k$ ($1 \le k \le N$), a property $P_i$ ($1 \le i \le M$) is considered a strong property or a weak property according to the definition in [2,3]. Only the strong properties of the class $D_k$ participate in the evolutionary process for that class. The weak properties of $D_k$ do not participate in the evolutionary process for $D_k$ and take no part in the generation of the fuzzy hierarchy.
A strong class is a class that is covered by the root property (or properties); such a class certainly appears in the fuzzy hierarchy, and the proposed scheme can discover an HPRFH for it. A weak class is a class that is not covered by the root property (or properties); such a class takes no part in the fuzzy hierarchy, and the proposed approach can discover only a standard PR for it. Any class $D_k$ ($1 \le k \le N$) is considered a strong class or a weak class according to the definition in [2,3].
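A small Python sketch of this partition is given below. The exact definitions are in [2,3] and are not reproduced here, so the sketch assumes that a property is strong for a class when its FM entry reaches the threshold, and that a class counts as covered when at least one root property reaches the threshold for it. The FM values and names are hypothetical.

```python
from typing import Dict, List


def strong_properties(fm: Dict[str, List[float]], k: int, threshold: float) -> List[str]:
    """Properties assumed strong for class index k: FM[P_i, D_k] >= threshold."""
    return [prop for prop, row in fm.items() if row[k] >= threshold]


def strong_classes(fm: Dict[str, List[float]], roots: List[str], threshold: float) -> List[int]:
    """Indices of classes covered by at least one root property (assumption)."""
    n_classes = len(next(iter(fm.values())))
    return [
        k for k in range(n_classes)
        if any(fm[r][k] >= threshold for r in roots)
    ]


# Hypothetical frequency matrix: two properties, three classes.
fm = {
    "P1": [1.0, 0.9, 0.1],
    "P2": [0.2, 0.8, 0.9],
}
print(strong_classes(fm, roots=["P1"], threshold=0.6))  # classes 0 and 1
print(strong_properties(fm, k=2, threshold=0.6))        # only P2
```

In this toy matrix, class 2 is weak: it stays out of the hierarchy, and only a standard PR built from its strong property P2 could be discovered for it.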

Applying GA to HPRFH Discovery
In this section a GA is presented for the automated discovery of rules with HPRFH as the underlying knowledge representation. The proposed approach handles only categorical attributes and cannot cope with missing values; therefore, in each dataset the few instances that contained missing values were simply removed.

Individual Representation
Suppose there are $l$ predicting attributes with a total of $M$ distinct values, and a single goal attribute $D$ with $N$ distinct values (classes) in the data being mined. The linear representation of an individual (chromosome) is shown in Figure 1; it consists of an If part, a Generality part and a Specificity part. It is to be noted that the Generality part is empty for the root class and the Specificity part is empty for the leaf classes.
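As an illustration only, an individual with an If part, a Generality part and a Specificity part could be held in a small Python data class. The field and method names are hypothetical and do not reproduce the encoding of Figure 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Individual:
    """One chromosome for the class under consideration, D_k."""
    if_part: List[str]                    # conjunction of conditions on attribute values
    generality: Optional[str] = None      # one general class; None for the root class
    specificity: List[str] = field(default_factory=list)  # specific classes; empty for a leaf

    def is_root(self) -> bool:
        return self.generality is None

    def is_leaf(self) -> bool:
        return not self.specificity


root = Individual(if_part=["P1"], specificity=["D2", "D3"])
leaf = Individual(if_part=["P1", "P4"], generality="D1")
print(root.is_root(), leaf.is_leaf())
```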

Genetic Operators
A number of genetic operators suitable for the proposed approach are suggested. We used fitness-proportional selection, one-point crossover with probability 0.90, and two mutation operators (insert and dropping) with probability 0.20. A crossover point is randomly chosen, represented in Figure 3 by the dotted line, and the genes to the right of the crossover point are swapped between two individuals, yielding the new offspring individuals.
Mutation is an operator that acts on a single individual at a time [12]. We developed two mutation operators tailored to our chromosome representation, namely insert and dropping. When applying mutation, the proposed approach chooses insert or dropping randomly. As an example, in Figure 4 the mutation operator replaces the allele "$P_3$" by the allele "$P_4$" in the If part, deletes the allele "$P_5$" in the If part, inserts the allele "$D_2$" in the Generality part, and inserts the allele "$D_5$" in the Specificity part.
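The selection, crossover and mutation steps can be sketched in Python as below. This is a simplification under stated assumptions: a chromosome is flattened to a plain list of alleles rather than the three-part encoding, insert and dropping are chosen with equal chance, and all function names are illustrative. Only the two probabilities come from the text.

```python
import random
from typing import List, Sequence, Tuple

CROSSOVER_PROB = 0.90  # crossover probability stated in the text
MUTATION_PROB = 0.20   # mutation probability stated in the text


def select(population: Sequence[List[str]], fitnesses: Sequence[float],
           rng: random.Random) -> List[str]:
    """Fitness-proportional (roulette-wheel) selection of one parent."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def one_point_crossover(a: List[str], b: List[str],
                        rng: random.Random) -> Tuple[List[str], List[str]]:
    """Swap the genes to the right of a randomly chosen crossover point."""
    shortest = min(len(a), len(b))
    if shortest < 2:
        return list(a), list(b)  # no interior point to cut at
    point = rng.randint(1, shortest - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(genes: List[str], pool: Sequence[str], rng: random.Random) -> List[str]:
    """Apply either insert or dropping, chosen at random."""
    genes = list(genes)
    unused = [g for g in pool if g not in genes]
    if unused and rng.random() < 0.5:
        genes.insert(rng.randint(0, len(genes)), rng.choice(unused))  # insert
    elif len(genes) > 1:
        genes.pop(rng.randrange(len(genes)))                          # dropping
    return genes


rng = random.Random(42)
parent_a = ["P1", "P2", "P3", "P4"]
parent_b = ["P5", "P6", "P7", "P8"]
child_a, child_b = parent_a, parent_b
if rng.random() < CROSSOVER_PROB:
    child_a, child_b = one_point_crossover(parent_a, parent_b, rng)
if rng.random() < MUTATION_PROB:
    child_a = mutate(child_a, pool=["P1", "P2", "P3", "P4", "P9"], rng=rng)
print(child_a, child_b)
```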
Fusion, as an advanced genetic operator, combines two HPRFH-trees to generate one or more new HPRFH-trees [5,6]. During construction of the hierarchy, the fusion operator may introduce a new class $D_{new}$, with an appropriate label assigned in accordance with its properties. Whenever the fusion operator introduces a new class to the hierarchy, the deg_sub between this new class and its descendants is always equal to 1, because the newly created class $D_{new}$ totally subsumes all the classes in the FM [3].

Fitness Function
The fitness function is the most difficult and most important element of a GA. For the proposed approach, the fitness measure is defined by Eq. (14), where $N$ is the number of classes in the S (Specificity) part.

Computational Results
The proposed approach was implemented and tested on several datasets publicly available from the UCI Repository of Machine Learning datasets [http://www.ics.uci.edu/~mlearn/MLRepository.html].
Each GA run consisted of a population of 100 individuals evolving for up to 200 generations. A run was terminated early when the best fitness remained unchanged for 20 consecutive generations. In the following experiments each decision (class) in a dataset is dealt with separately. During each run, the decision (class) under consideration is assigned to all the individuals in the population.
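The run settings above (at most 200 generations, stop after 20 generations without improvement of the best fitness) can be expressed as a generic loop. The sketch below is not the authors' implementation: `fitness` and `breed` are placeholders supplied by the caller, and the toy problem at the end exists only to exercise the stopping rule.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def evolve(population: List[T],
           fitness: Callable[[T], float],
           breed: Callable[[List[T]], List[T]],
           max_generations: int = 200,
           patience: int = 20) -> T:
    """Run the GA until the generation limit or until the best fitness stalls."""
    best = max(population, key=fitness)
    best_fit = fitness(best)
    stale = 0  # consecutive generations without improvement
    for _ in range(max_generations):
        population = breed(population)
        champion = max(population, key=fitness)
        champion_fit = fitness(champion)
        if champion_fit > best_fit:
            best, best_fit, stale = champion, champion_fit, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best


# Toy problem: individuals are integers, fitness is the value itself,
# and breeding pushes every individual towards a ceiling of 5.
calls = []

def toy_breed(pop: List[int]) -> List[int]:
    calls.append(1)
    return [min(x + 1, 5) for x in pop]

result = evolve([0, 1, 2], fitness=float, breed=toy_breed)
print(result, len(calls))  # best individual and number of generations actually run
```

In the paper's setting the initial population would hold 100 individuals, all assigned the class under consideration.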
The Then (decision) part of the rule does not need to be encoded into the individual. This simplifies the design of the proposed scheme, and it is particularly natural when the user is not interested in a complete classification rule set (where different rules predict different classes) [10]. Each dataset was randomly partitioned into two parts, with 2/3 of the instances used for training and 1/3 of the instances used for testing the quality of the discovered rules.
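A random 2/3 to 1/3 partition of this kind can be sketched in a few lines of Python. The seed parameter and the rounding of the cut point (floor of two thirds) are assumptions of the sketch; the text does not say how fractional sizes were handled.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(instances: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the instances and return (training set, test set) as 2/3 and 1/3."""
    rng = random.Random(seed)
    shuffled = list(instances)
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3  # floor of two thirds (assumption)
    return shuffled[:cut], shuffled[cut:]


# With 101 instances, as in the Zoo dataset, this yields 67 training
# and 34 test instances.
train, test = split_dataset(range(101))
print(len(train), len(test))
```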
We have compared the predictive accuracy of the rules discovered by the proposed approach with the predictive accuracy of the rules discovered by the EACPRFH approach which is presented in [2].The performance of the proposed approach on different datasets is demonstrated below:
Using the discovered rules in Table 1 and based on the fusion operator, the fuzzy hierarchy shown in Figure 5 is generated for the Bridge dataset. From the discovered fuzzy hierarchy shown in Figure 5, the HPRFH are generated.
The Zoo dataset was used for this experiment. This dataset has 101 examples, 17 predicting attributes and a goal attribute that can take 7 values (classes). The predicting attributes were nominal. The proposed approach discovered the fuzzy hierarchy shown in Figure 6.
Finally, from the discovered fuzzy hierarchy shown in Figure 6, the HPRFH are generated.

Predictive Accuracy
In the context of rule discovery, it is very common practice to evaluate the quality of discovered rules with respect to their predictive accuracy.As usual in the literature, this evaluation was done by measuring the accuracy rate on the test set.
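For reference, the accuracy rate on a test set is simply the fraction of test instances whose predicted class matches the actual class. The short Python sketch below assumes the discovered rules have already been applied to produce one predicted label per test instance; how the rules are matched against instances is outside its scope, and the labels are made up.

```python
from typing import Sequence


def accuracy_rate(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of test instances classified correctly."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


print(accuracy_rate(["D1", "D2", "D1", "D3"], ["D1", "D2", "D2", "D3"]))  # 0.75
```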
The results comparing the performance of the proposed approach with the EACPRFH approach [2] are reported in Table 3. Table 3 shows that the proposed approach achieves better predictive accuracy than the EACPRFH approach on the Bridge and Nursery datasets.
On the Zoo dataset, the EACPRFH approach achieves better predictive accuracy than the proposed approach. As mentioned in [2], this dataset contains some exceptions, and the rules discovered by the EACPRFH approach classify test examples containing exceptions considerably better than the rules discovered by the proposed approach.

Conclusion and Future Work
As an attempt towards integrating hierarchy generation with the data mining algorithm, a GA approach is presented that discovers Hierarchical Production Rules with Fuzzy Hierarchy (HPRFH).
The discovered hierarchy enables knowledge managers to handle the information easily through generalization and specialization relationships, and it helps information users to understand the overall structure of a dataset quickly. The proposed approach uses fusion as an advanced genetic operator, which helps to discover new classes that are not explicitly present in the dataset being mined. Experimental results have demonstrated the effectiveness of the proposed approach.
One of the most important future research directions is the discovery of Fuzzy Hierarchical Censored Production Rules (FHCPRs) from large datasets using evolutionary algorithms.
An individual is divided into three parts: the If part, consisting of a conjunction of conditions on the values ($M$) of the predicting attributes ($l$); the Generality (G) part, consisting of one general class for the class under consideration $D_k$; and the Specificity (S) part, consisting of a set of specific classes for the class under consideration $D_k$. It is to be noted that any of the $l$ predicting attributes (one or more) can form conditions in the If part. Further, for any HPRFH the specific classes are mutually exclusive, i.e. properties($D_{ks1}$) $\cap \dots \cap$ properties($D_{ksi}$) $\cap \dots \cap$ properties($D_{ksN}$) $= \emptyset$.

Figure 3. Crossover operator. The crossover point is represented by the dotted line, and the genes to the right of the crossover point are swapped between two individuals, yielding the new offspring individuals.

Figure 5. Fuzzy hierarchy for the Bridge dataset.

Experiment 4
The Balance Scale dataset was used for this experiment. It has 625 instances with no missing attribute values, 4 attributes and 3 classes ($D_1$ = L (288 instances), $D_2$ = B (49 instances) and $D_3$ = R (288 instances)). With a threshold of 0.50, and based on the FM, none of these classes ($D_1$, $D_2$ and $D_3$) has strong properties, so the proposed approach does not discover any rule for any class. Therefore, no HPRFH can be discovered for this dataset.

Figure 7 depicts the comparative performance of the two algorithms.

Figure 7. Predictive accuracy of the rules mined by the proposed algorithm and the EACPRFH algorithm.

Table 1. Result from the Bridge dataset (threshold = 0.60).

The Nursery dataset has 12960 examples with no missing attribute values, 8 attributes and 5 classes. From this dataset, the proposed approach discovered four rules (two ad-hoc HPRFH and two PR), which are given in Table 2.

Table 3. Summary of predictive accuracy results.