Classification of objective interestingness measures

The creation of interestingness measures for evaluating the quality of association-rule-based knowledge plays an important role in the post-processing stage of Knowledge Discovery from Databases. As more and more interestingness measures are proposed under two approaches (subjective assessment and objective assessment), studying the properties of these measures is important for understanding the nature of the objective interestingness measures. In this paper, we focus primarily on the objective interestingness measures, giving a general view of recent research on their nature and completing a new classification of 109 selected objective interestingness measures on 6 criteria (independence, equilibrium, symmetry, variation, description, and statistics).


Introduction
The process of knowledge discovery from databases (KDD) (Fayyad et al., 1996) is usually divided into three main stages: preprocessing, processing or forming knowledge patterns (mining), and post-processing those patterns. Evaluating the interestingness or the quality of the patterns found in the processing stage has long attracted researchers. During the last decade, the research community in the KDD field has recognized the post-processing stage, which evaluates the interestingness or the quality of the knowledge patterns generated in the processing stage, as a complex and important part of the KDD process (Silberschatz and Tuzhilin, 1996; Liu et al., 1999; Hilderman and Hamilton, 2001; Tan et al., 2004). To solve this problem, most approaches are based on the creation of interestingness measures. From the initial approaches (Piatetsky-Shapiro, 1994; Piatetsky-Shapiro and Matheus, 1991; Agrawal and Srikant, 1994) to the recent ones, many interestingness measures of a reciprocal nature have been proposed to search for the best knowledge from many views, perspectives, and different evaluations (Sahar and Mansour, 1999), such as summarization (Hilderman and Hamilton, 2001), objectiveness (Tan et al., 2004; Huynh et al., 2007; Bayardo and Agrawal, 1999; Guillet and Hamilton, 2007; Tamir and Singer, 2006), and subjectiveness (Silberschatz and Tuzhilin, 1996).
The interestingness measures can be divided into two types (Silberschatz and Tuzhilin, 1996): subjective interestingness measures and objective interestingness measures. The subjective measures evaluate the found knowledge patterns on the basis of the goals, the knowledge, and the beliefs of the user. The objective measures evaluate knowledge patterns on the basis of the distribution of the data.
This article focuses on studying the theoretical evaluation criteria for objective measures. These objective measures are commonly used for evaluating the quality of knowledge patterns in the form of association rules (Agrawal and Srikant, 1994).
The article is organized into six sections. Section 1 gives a general introduction to the approaches to interestingness measures. Section 2 gives an overview of subjective interestingness measures. Section 3 presents objective interestingness measures and the method for calculating their values on association rules. Section 4 analyses and summarizes the basic criteria for evaluating the quality of objective measures. Section 5 classifies those objective measures by using some key criteria and comments on the nature of the measures. The last section summarizes the main results achieved.

Subjective interestingness measures
Subjective measures (Piatetsky-Shapiro and Matheus, 1994; Silberschatz and Tuzhilin, 1995; Silberschatz and Tuzhilin, 1996) were studied in a domain-independent context. The interestingness or the benefit of an obtained knowledge pattern (e.g., an association rule, a classification rule, etc.) is subjectively evaluated from the view and the perspective of the user. A knowledge pattern is usually identified as interesting or useful on the basis of two approaches (Silberschatz and Tuzhilin, 1996): (i) a knowledge pattern is considered unexpected if it surprises users (Silberschatz and Tuzhilin, 1995); and (ii) a knowledge pattern is considered actionable if users can build actions from the found knowledge, and those actions bring benefit to users (Piatetsky-Shapiro and Matheus, 1994).

Actionability
Actionability is a subjective interestingness measure that allows users to take actions in response to newly found knowledge (Silberschatz and Tuzhilin, 1996). Capturing association rules and using them to propose actionable patterns is a difficult issue. One important factor is that the required actions (i.e., from the perspective of the individual) can change over time and are also very difficult to retain.
Knowledge patterns that result in suggested actions can be found via a system exploring the change of rules (Piatetsky-Shapiro and Matheus, 1994), the hierarchical structure of actions, or the extraction of patterns responding to actions.

Unexpectedness
Unexpectedness is a subjective interestingness measure that identifies knowledge patterns that were not previously anticipated and that contradict the users' expectations (Silberschatz and Tuzhilin, 1996). The users' expectations depend strongly on the users' beliefs. Beliefs can be divided into two types: (i) hard beliefs, whose constraints are unchanged and depend strongly on the users' perspective, and (ii) soft beliefs, which the user is willing to change up to a certain allowed level. The level of a soft belief can be assessed with different approaches such as Bayesian, Dempster-Shafer, frequency of occurrence, or statistics.
An association rule (i.e., a knowledge pattern) will always be interesting or beneficial if it contradicts the existing hard beliefs of users. For the soft beliefs, the interestingness of a knowledge pattern $p$ can be calculated as follows: $I(p, B, \xi) = \sum_{\alpha_i \in B} w_i \, |d(\alpha_i \mid p, \xi) - d(\alpha_i \mid \xi)|$, where $d(\cdot)$ is the degree of belief, $w_i$ is the weight associated with each soft belief $\alpha_i$ in the soft belief system $B$, $\sum_i w_i = 1$, and $\xi$ denotes the events occurring before.
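The soft-belief computation can be illustrated with a minimal Python sketch. It assumes the measure is the weighted sum, over the soft beliefs, of the absolute change in the degree of belief caused by the pattern, in the spirit of Silberschatz and Tuzhilin (1996). The function name, argument names, and all numbers are hypothetical and serve only as an illustration.

```python
def soft_belief_interestingness(weights, degree_given_pattern, degree_prior):
    """Weighted sum of absolute changes in the degrees of belief.

    weights              -- one weight per soft belief, assumed to sum to 1
    degree_given_pattern -- degree of each belief after seeing the pattern
    degree_prior         -- degree of each belief before seeing the pattern
    """
    if not (len(weights) == len(degree_given_pattern) == len(degree_prior)):
        raise ValueError("all inputs must have one entry per soft belief")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(
        w * abs(after - before)
        for w, after, before in zip(weights, degree_given_pattern, degree_prior)
    )


# Hypothetical belief system with three soft beliefs.
weights = [0.5, 0.3, 0.2]
prior = [0.9, 0.6, 0.5]
posterior = [0.4, 0.6, 0.7]
score = soft_belief_interestingness(weights, posterior, prior)
print(round(score, 4))
```

In this sketch a pattern that strongly shifts a heavily weighted belief scores higher than one that only shifts beliefs of low weight.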

Objective interestingness measures
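Suppose that $T$ is a finite set of transactions (e.g., transactions of customers in a supermarket (Agrawal and Srikant, 1994)). An association rule is represented in the form $a \to b$, where $a$ and $b$ are two disjoint sets ($a \cap b = \emptyset$). Set $a$ (set $b$) is attached to a subset of transactions $A = T(a) = \{t \in T : a \subseteq t\}$ ($B = T(b)$). Set $\bar{a}$ ($\bar{b}$) is attached to $\bar{A} = T - A$ ($\bar{B} = T - B$). In order to accept or reject the tendency of $b$'s appearance when $a$ has appeared, the number $n_{a\bar{b}} = |A \cap \bar{B}|$ of negative examples (counter-examples), which do not support the formation of the rule, is normally examined. Each rule is characterized by 4 parameters: $n = |T|$, $n_a = |A|$, $n_b = |B|$, and $n_{a\bar{b}}$.
The interestingness value of an association rule based on an objective interestingness measure will then be calculated by using the cardinality of the rule: $(n, n_a, n_b, n_{a\bar{b}})$. To simplify the calculation, the following equivalent transformations should be used: $n_{ab} = n_a - n_{a\bar{b}}$, $n_{\bar{a}b} = n_b - n_a + n_{a\bar{b}}$, $n_{\bar{a}\bar{b}} = n - n_b - n_{a\bar{b}}$, $n_{\bar{a}} = n - n_a$, and $n_{\bar{b}} = n - n_b$. For example, the objective interestingness measure Pavillon is identified by the formula $n_{\bar{b}}/n - n_{a\bar{b}}/n_a$, so the interestingness value of a rule $a \to b$ follows directly from its cardinality. The formulae of interestingness measures calculated by using the cardinality $(n, n_a, n_b, n_{a\bar{b}})$ are collected and presented in Table 1 (see Appendix).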
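The calculation described above can be sketched in Python. The sketch assumes that a rule is summarized by the four counts: total transactions, transactions containing the antecedent, transactions containing the consequent, and counter-examples (antecedent without consequent). It also assumes that Pavillon takes the form P(b|a) - P(b). The toy transactions and all function names are hypothetical.

```python
def rule_cardinality(transactions, a, b):
    """Return (n, n_a, n_b, n_abbar) for the rule a -> b."""
    a, b = frozenset(a), frozenset(b)
    if a & b:
        raise ValueError("antecedent and consequent must be disjoint")
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)
    n_b = sum(1 for t in transactions if b <= t)
    # Counter-examples: transactions containing a but not b.
    n_abbar = sum(1 for t in transactions if a <= t and not b <= t)
    return n, n_a, n_b, n_abbar


def derived_counts(n, n_a, n_b, n_abbar):
    """Remaining contingency counts obtained from the four parameters."""
    return {
        "n_ab": n_a - n_abbar,
        "n_abar_b": n_b - n_a + n_abbar,
        "n_abar_bbar": n - n_b - n_abbar,
        "n_abar": n - n_a,
        "n_bbar": n - n_b,
    }


def pavillon(n, n_a, n_b, n_abbar):
    """Pavillon written with the counts: n_bbar / n - n_abbar / n_a."""
    return (n - n_b) / n - n_abbar / n_a


# Hypothetical toy data set of eight market-basket transactions.
transactions = [
    {"bread", "milk"}, {"bread", "milk", "eggs"}, {"bread"}, {"milk"},
    {"eggs"}, {"bread", "milk"}, {"milk", "eggs"}, {"bread", "eggs"},
]
cardinality = rule_cardinality(transactions, {"bread"}, {"milk"})
value = pavillon(*cardinality)
print(cardinality, derived_counts(*cardinality), round(value, 3))
```

For the toy rule bread -> milk the value is slightly negative, because milk is a little less frequent among the bread transactions (3 of 5) than in the whole data set (5 of 8).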

Evaluation criteria
To understand what makes an objective interestingness measure "good", several criteria have been proposed (Bayardo and Agrawal, 1999). The values of an interestingness measure should decline slowly at first, when elements or transactions that do not support the studied association rule appear for reasons such as a change, noise, or an error (Figure 2). The interestingness values should then decrease rapidly as more and more elements that do not support the rule appear and strongly threaten the existence of the association rule being evaluated. The interestingness values of an objective measure also have to decline as more and more unimportant transactions appear, that is, transactions that carry no useful information (in the sense of Shannon entropy) about the formation of association rules.
In addition, a good objective interestingness measure should not output interestingness values that vary linearly with the number of elements that do not support the formation of the corresponding rule. Observing particular situations that occur during the variation of interestingness values is an important way to understand in depth how interestingness measures behave on association rules. Two important particular situations are investigated: independence and equilibrium. Both situations are called the subject of an objective interestingness measure.
Independence occurs when the antecedent and the consequent of an association rule are statistically independent. This situation occurs when $n_{ab} = n_a n_b / n$ or $n_{a\bar{b}} = n_a n_{\bar{b}} / n$, where $n_{a\bar{b}}$ is the number of counter-examples; the interestingness value of the rule is then a constant.
Equilibrium occurs when the number of elements that support the formation of a rule equals the number of elements that do not support it. This situation occurs when $n_{a\bar{b}} = n_a / 2$; the interestingness value of the rule is then a constant.
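The two situations can be checked numerically. The sketch below assumes the usual count-based forms of three well-known measures (confidence, lift, and Pavillon in the form P(b|a) - P(b)). It evaluates them at the independence point, where the counter-example count equals n_a (n - n_b) / n, and at the equilibrium point, where it equals n_a / 2. The margins used are hypothetical.

```python
def confidence(n, n_a, n_b, n_abbar):
    return 1 - n_abbar / n_a


def lift(n, n_a, n_b, n_abbar):
    return n * (n_a - n_abbar) / (n_a * n_b)


def pavillon(n, n_a, n_b, n_abbar):
    return (n - n_b) / n - n_abbar / n_a


def independence_point(n, n_a, n_b):
    """Counter-example count at which antecedent and consequent are independent."""
    return n_a * (n - n_b) / n


def equilibrium_point(n_a):
    """Counter-example count at which examples and counter-examples are equal."""
    return n_a / 2


# Hypothetical margins (n, n_a, n_b) chosen only for illustration.
margins = [(100, 40, 50), (1000, 200, 300), (60, 30, 12)]
for n, n_a, n_b in margins:
    ind = independence_point(n, n_a, n_b)
    equ = equilibrium_point(n_a)
    print(
        (n, n_a, n_b),
        "pavillon@ind:", round(pavillon(n, n_a, n_b, ind), 6),
        "lift@ind:", round(lift(n, n_a, n_b, ind), 6),
        "confidence@ind:", round(confidence(n, n_a, n_b, ind), 6),
        "confidence@equ:", round(confidence(n, n_a, n_b, equ), 6),
    )
```

Under these assumptions Pavillon and lift take a fixed value at independence (0 and 1) whatever the margins, while confidence takes a fixed value at equilibrium (0.5) but equals n_b / n at independence, which varies with the data.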

By considering how interestingness values vary away from the independence value or the equilibrium value, an interestingness measure can be evaluated as a deviation from independence or from equilibrium.
Moreover, a threshold on the interestingness value is necessary if we wish to observe a limited range of the benefit value. When the number of counter-examples is zero ($n_{a\bar{b}} = 0$), the association rule tends to become a logical rule. In this case, the implicative tendency of the association rule no longer exists, and the rule ceases to be an association rule and loses its interestingness.

Paradoxical situation
The interestingness values of a measure should not be the same when a paradoxical situation occurs, such as the symmetric situation ($a \to b$ versus $b \to a$) or the inverse situation ($a \to b$ versus $a \to \bar{b}$).
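A symmetry check can be sketched as follows, reading the symmetric situation as a comparison between a rule a -> b and its converse b -> a (an assumption about the intended meaning). The converse rule swaps the two margins, and its counter-examples are the transactions that contain b but not a. Confidence and lift are used here only as familiar examples; the counts are hypothetical.

```python
def confidence(n, n_a, n_b, n_abbar):
    return 1 - n_abbar / n_a


def lift(n, n_a, n_b, n_abbar):
    return n * (n_a - n_abbar) / (n_a * n_b)


def reversed_cardinality(n, n_a, n_b, n_abbar):
    """Cardinality of b -> a derived from the cardinality of a -> b."""
    n_ab = n_a - n_abbar
    # Counter-examples of b -> a: transactions with b but without a.
    return n, n_b, n_a, n_b - n_ab


def is_symmetric(measure, cardinality, tol=1e-12):
    """True when the measure gives the same value to a -> b and b -> a."""
    forward = measure(*cardinality)
    backward = measure(*reversed_cardinality(*cardinality))
    return abs(forward - backward) <= tol


cardinality = (100, 40, 50, 5)  # hypothetical counts
print("confidence symmetric:", is_symmetric(confidence, cardinality))
print("lift symmetric:", is_symmetric(lift, cardinality))
```

A single cardinality can only refute symmetry, not prove it, so in practice the check would be repeated over many cardinalities or replaced by an algebraic argument.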

Countable situation
The analysability (i.e., countability) of an interestingness measure helps determine an order or create a pre-order structure.

Diversification
Interestingness measures have to be fully analysed for flexibility and generality when they are applied to different types of variables.

Discriminative ability
The discriminative ability of an objective interestingness measure should not be affected by noise or by a large volume of data (i.e., when $n$ increases). If the interestingness value of a measure $f$ does not vary when all of its input parameters are multiplied by a certain coefficient $\alpha$, i.e., $f(\alpha n, \alpha n_a, \alpha n_b, \alpha n_{a\bar{b}}) = f(n, n_a, n_b, n_{a\bar{b}})$, then that measure is called a descriptive measure (otherwise, a statistical measure).
The descriptive or statistical aspect of a measure is also known as the nature of the measure.
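The descriptive versus statistical distinction can be probed with a small scaling test. The sketch below assumes that a measure is a function of the four counts (total, antecedent, consequent, counter-examples) and multiplies all of them by the same factor. Confidence stands in for a descriptive measure and the implication index (standardized excess of counter-examples over their expected number under independence) for a statistical one. Both choices, the factors, and the counts are illustrative assumptions.

```python
import math


def confidence(n, n_a, n_b, n_abbar):
    return 1 - n_abbar / n_a


def implication_index(n, n_a, n_b, n_abbar):
    """Standardized gap between observed and expected counter-examples."""
    expected = n_a * (n - n_b) / n
    return (n_abbar - expected) / math.sqrt(expected)


def is_descriptive(measure, cardinality, factors=(2, 10, 100), tol=1e-9):
    """True when scaling every count by the same factor leaves the value unchanged."""
    base = measure(*cardinality)
    return all(
        abs(measure(*(k * x for x in cardinality)) - base) <= tol
        for k in factors
    )


cardinality = (100, 40, 50, 5)  # hypothetical counts
print("confidence descriptive:", is_descriptive(confidence, cardinality))
print("implication index descriptive:", is_descriptive(implication_index, cardinality))
```

In this sketch the implication index grows in magnitude with the square root of the scaling factor, so its value depends on the size of the data set, while confidence depends only on the proportions.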

Interpretable situation
The execution time of the formulas and algorithms used to calculate the interestingness values of association rules must not be too long. Their definitions have to be easy to assess visually, and the obtained values have to be explainable.

Imbalance
The imbalance problem is of interest when we observe the effect of a small number of elements that do not support the formation of association rules (i.e., $n_{a\bar{b}}$). This attention is essential because it can yield extremely valuable knowledge.

Attribute interestingness
Considering an interesting association rule within the entire set of rules may lead to a situation in which two rules have the same interestingness value. These two rules can nevertheless have different degrees of interestingness for users. The distinction is based on the attributes appearing in the rule antecedent. To solve this problem, the degree of interestingness of each attribute appearing in the antecedent of an association rule needs to be considered.

Quasi-
Quasi-relationships are examined when calculating the interestingness values in order to determine, in some cases, relationships among objective interestingness measures. The relationships considered are quasi-implication, quasi-conjunction, and quasi-equivalence.
An interestingness measure is a quasi-implication if that measure satisfies the condition ̅ ̅ where ̅ .An interestingness measure is a quasi-conjunction if that measure satisfies the condition where ̅ ̅ .An interestingness measure is quasi-equivalence if that measure satisfies the condition

Classification of interestingness measures
In this research, the articles selected for collecting the interestingness measures had to meet the following criteria: (i) studying interestingness measures and being cited by many other articles, (ii) being published by reliable sources such as IEEE, Springer, ACM, and Science Direct, and (iii) being researched and analysed independently by research groups.
The collected result shows that there are: (i) 21 groups of interestingness measures, in which each group consists of measures called by different names but having the same formula (Table 2); and (ii) 109 interestingness measures (Table 3). This research focuses on some important criteria mentioned in the previous section: Variation (VAR.), Independence (IND.), Equilibrium (EQU.), Symmetric (SYM.), Descriptive (DES.), and Statistical (STA.). Table 4 presents the responses of the 109 interestingness measures to these criteria, where 1 means responsive and 0 means unresponsive.
Based on the results in Table 4, the interestingness measures for each criterion are listed in Table 5, and the classification of these 109 objective interestingness measures is shown in Table 6.
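One way to go from a binary response table to per-criterion lists and to classes can be sketched as below. This is an assumed procedure rather than the one used to build the tables of this article: it lists, for each criterion, the measures that respond with 1, and groups measures that share the same six-valued response profile. The measure names and response values are placeholders, not entries from Table 4.

```python
from collections import defaultdict

CRITERIA = ("VAR", "IND", "EQU", "SYM", "DES", "STA")

# Placeholder responses (1 = responsive, 0 = unresponsive); NOT taken from Table 4.
responses = {
    "measure_A": (1, 1, 0, 0, 1, 0),
    "measure_B": (1, 0, 1, 0, 1, 0),
    "measure_C": (1, 1, 0, 0, 1, 0),
    "measure_D": (1, 1, 0, 1, 0, 1),
}


def measures_per_criterion(table):
    """Map each criterion to the sorted list of measures that respond to it."""
    result = {criterion: [] for criterion in CRITERIA}
    for name, profile in sorted(table.items()):
        for criterion, flag in zip(CRITERIA, profile):
            if flag == 1:
                result[criterion].append(name)
    return result


def classes_by_profile(table):
    """Group measures that have an identical response profile."""
    classes = defaultdict(list)
    for name, profile in sorted(table.items()):
        classes[profile].append(name)
    return dict(classes)


per_criterion = measures_per_criterion(responses)
classes = classes_by_profile(responses)
print(per_criterion["IND"])
for profile, names in classes.items():
    print(dict(zip(CRITERIA, profile)), names)
```

Grouping by the full profile yields the finest possible partition; a coarser classification could instead group on a subset of the criteria, for example the subject (independence or equilibrium) and the nature (descriptive or statistical).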

Conclusions
Many researchers in the KDD field focus on ranking association rules by using interestingness measures. Two types of interestingness measures are studied in that research: subjective measures and objective measures. This article collected 109 widely discussed objective interestingness measures, transformed their formulae into a generic form using the cardinality $(n, n_a, n_b, n_{a\bar{b}})$, studied the evaluation criteria, and classified those interestingness measures based on 6 criteria. This classification was also examined closely to show the relationships among measures with common and particular characteristics.

Appendix

The number $n_{a\bar{b}}$ of negative examples (counter-examples), which do not tend to support the rule formation, is of particular interest. Each rule is characterized by 4 parameters: $n$, $n_a$, $n_b$, and $n_{a\bar{b}}$.

Value variation
Determining the variation of interestingness values is always one of the most important criteria in evaluating interestingness measures. The interestingness value increases monotonically with and decreases monotonically with ̅ or ̅ . It should be noted that values of ̅ ̅ vary while the other parameters are the fixed values. This helps us track the variation of interestingness values clearly and homogeneously.
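Tracking the variation of a measure can be sketched by sweeping one count while the others stay fixed. The sketch assumes that the varying parameter is the number of counter-examples (transactions with the antecedent but without the consequent) and that the total and the two margins are fixed. Confidence and the Jaccard index are used only as familiar examples, and the margins are hypothetical.

```python
def confidence(n, n_a, n_b, n_abbar):
    return 1 - n_abbar / n_a


def jaccard(n, n_a, n_b, n_abbar):
    # n_ab / (n_a + n_b - n_ab), rewritten with the counter-example count.
    return (n_a - n_abbar) / (n_b + n_abbar)


def sweep(measure, n, n_a, n_b):
    """Values of the measure as the counter-example count goes from 0 to n_a."""
    return [measure(n, n_a, n_b, x) for x in range(n_a + 1)]


def is_decreasing(values):
    return all(later <= earlier for earlier, later in zip(values, values[1:]))


def is_linear(values, tol=1e-9):
    """True when all second differences vanish, i.e. the curve is a straight line."""
    return all(
        abs(values[i] - 2 * values[i + 1] + values[i + 2]) <= tol
        for i in range(len(values) - 2)
    )


# Hypothetical margins; they keep every derived count non-negative over the sweep.
n, n_a, n_b = 100, 40, 50
for measure in (confidence, jaccard):
    values = sweep(measure, n, n_a, n_b)
    print(measure.__name__, is_decreasing(values), is_linear(values))
```

With these margins both measures decrease as counter-examples accumulate, but confidence does so along a straight line, whereas the Jaccard index follows a curve.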

Table 2. Interestingness measures called by different names but having the same formula
EAI Endorsed Transactions on Context-aware Systems and Applications 05 -09 2016 | Volume 3 | Issue 10 | e4

Table 5. The interestingness measures of each criterion