You are on page 1of 9

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 5, MAY 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.

COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG

106

Survey on Post Mining Methodologies of Association Rules
KalliSrinivasaNageswara Prasad, Prof. S. Ramakrishna
Abstract-Association rules present one of the most impressive techniques for the analysis of attribute associations in a given dataset related to applications related toRetail, Bioinformatics, and Sociology. In the area of data mining, the importance of the rule management in Associating rule mining is rapidly growing. Usually, if datasets are large,the induced rules are large in volume. The density of the rule volume leads to the obtained knowledge hard to be understood and analyze. One better way of minimizing the rule set size is eliminating redundant rules from rule base.Many efforts have been made and various competent and excellent algorithms have been proposed. In this literature survey we discuss the recent approaches proposed in Post Association Rule Mining. The criteria of models discussed in this survey are much concern about the impact of input strategies. Index Terms:- Post Mining,Association Rules,Closed Item Set,Frequent Item Set, Knowledge Discovary. ------------------------------------------

---------------------------------------

1. INTRODUCTION
In general, association rules tend to deliver an efficient method of analysing binary or discretized data sets that are large in volume. One common practice is to determine relationships between binary variables in transaction databases, which is known as µmarket basket analyses. In the case of non-binary data, initially data being coded as binary and then association rules will be used to analyse. Association rules having their impact on analysing large binary datasets and considered as versatile approach for modern applications such as detection of bio-terrorist attacks [1] and the analysis of gene expression data [2], to the analysis of Irish third level education applications [4]. The steps involved in a typical association rule analysis are "Coding of data as binaryif data is not binary´ ->³Rule generation´ >³Post-mining". This survey focused on post mining. It was a century after the introduction of association rules (associations initially discussed in 1902), it is still continuing that the absence of items from transactions is often ignored in analyses. ________________________
y

2. ASSOCIATION RULE MINING
Given a set I that is non-empty, a rule of association is a statement of the form X Y, where X, Y I such that X  , Y  , and X Y = . The dataset X is called the antecedent of the rule, the set Y is called the consequent of the rule, and we shall call I the master itemset. Association rules are generated over a large set of transactions, denoted by Xn association rule can be deemed interesting if the items involved occur together often and there are suggestions that one of the sets might in some cases lead to the presence of the other set. Association rules are characterised as interesting, or not, based on mathematical notations called µsupport¶, µconfidence¶ and µlift¶. Although there are now a multitude of measures of interestingness available to the analyst, many of them are still based on these three functions. In many applications, it is not only the presence of items in the antecedent and the consequent parts of an association rule that may be of interest. Consideration, in many cases, should be given to the relationship between the absence of items from the antecedent part and the presence or absence of items from the consequent part. Further, the presence of items in the antecedent part can be related to the absence of items from the consequent part; for example, a rule such as {margarine} {not butter}, which might be referred to as a µreplacement rule¶. One way to incorporate the absence of items into the association rule mining paradigm is to consider rules

y

KalliSrinivasaNageswara Prasad, Research Scholar in Computer Science, SriVenkateswara University, Tirupati, AndhraPradesh, India. Prof. S. Ramakrishna, Department of ComputerScience, SriVenkateswara University, Tirupati, Andhra Pradesh, India.

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 5, MAY 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG

107

of the form X Y [20]. Another is to think in terms of negations. Suppose X I,and then write ¬X to denote the absence or negation of the item, or items, in X from a transaction. Considering X as a binary {0, 1} variable, the presence of the items in X is equivalent to X = 1, while the negation ¬X is equivalent to X=0. The concept of considering association rules involving negations, or ³negative implications´, is due to Silverstein et al [20]. POST MINING Pruning rules and detection of rule interestingness are employed in the post-mining stage of the association rule mining paradigm. However, there are a host of other techniques used in the postmining stage that do not naturally fall under either of these headings. Some such techniques are in the area of redundancy- removal. There are often a huge number of association rules to contend with in the post-mining stage and it can be very difficult for the user to identify which ones are interesting. Therefore, it is important to remove insignificant rules, prune redundancy and do further post-mining on the discovered rules [18],[11]. Liu et al [11] proposed a technique for pruning and summarising discovered association rules by first removing those insignificant associations and then forming direction-setting rules, which provide a sort of summary of the discovered association rules. Lent et al.[19] proposed clustering the discovered association rules. 3.

model of the domain of the texts, for guiding the extraction of knowledge units from the texts. One of the obvious hot topics of data mining research in the last few years has been rule discovery from binary data. It concerns the discovery of set of attributes from large binary records such that these attributes are true within a same line often enough. It is then easy to derive rules that describe the data, the popular association rules though the interest of frequent sets goes further. POST MINING MODELS IN RECENT LITERATURE Huawen Liu, et al[14] prposed post processing approach for rule reduction using closed set to filter superfluous rules from knowledge base in a postprocessing manner that can be well discovered by a closed set mining technique. Most of these methods are based on the rule structure analysis where the relations between the rules have been analysed using corresponding problem. This procedure claims to eliminate noise and redundant rules in order to provide users with compact and precise knowledge derived from databases by data mining method. Further, a fast rule reduction algorithm using closed set is introduced. Other endeavours have been attempted to prune the rule bases directly. The typical cases have been elaborated and illustrated by eminent people from all over. Modest number of proposals is addressed on prepruning and post-pruning. In a line, the pruning operation occurs at the phase of generation of significant rules. To add to this, the post pruning technique mainly concerns primarily emphasises that pruning operation occurs after the rule generation, among which the rule cover is a representative case. To extract interesting rules, aprori knowledge has also been taken into account in literatures and a template denotes which attributes should occur in the antecedent or the consequent part of the rule.An association rule is an implication expression illustrates a kind of association relation. A rule is said to be interesting or valid if its support and confidence are user specified minimum support and confidence thresholds respectively. The association rule primarily comprises two phases based on the identification of frequent item sets from the given data mining contexts. However, the problem of massive real world data transactions can be rounded off by adopting other alternatives, which in turn benefits in lossless representation of data. Theoretically, transaction database and relation database are two different inter-transformable representations of data. The production of association mining is a rule base with hundreds to millions of rules. In order to 5.

4. INFLUENCES OF INPUT FORMATS:
The input formats that influence the post mining methodologies are binary data, text data and streaming data. In association rule mining, much of recent research work has focused on the difficult problem of mining data streams such as click stream analysis, intrusion detection, and web-purchase recommendation systems. In the case of streaming data, it is not possible to perform mining on cached and fixed data records. The attempt of caching data leads to memory usage issues, and the attempt of mining static data leads to worst time complexity since continuous dataset updates leads to continuoues passes through dataset. From the data mining point of view, texts are complex data giving raise to interesting challenges. First, texts may be considered as weakly structured, compared with databases that rely on a predefined schema. Moreover, texts are written in natural language, carrying out implicit knowledge, and ambiguities. Hence, the representation of the content of a text is often only partial and possibly noisy. One solution for handling a text or a collection of texts in a satisfying way is to take advantage of a knowledge

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 5, MAY 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG

108

highlight the important and key ones, certain other rules are proposed which are Second ±order rule which states that if the cover of the item set is known, then the corresponding relation can be easily derived. All the technical definitions given hence forth deal with the transaction of the data through item-sets in association mining. The equivalent property significantly states that rules and classes of the same hierarchical database support the power in the content. Traditional data mining techniques are implemented in order to justify the property and its corresponding definition in the specified context. The thus identified second order rules can be used to filter out useless rules out of the priority rule-set. The effectiveness and efficiency of the classical methods in plating the rules is thoroughly verified under the 2.8 GHZ Pentium PC. Two group experiments were conducted to prune the insignificant association rules and to remove useless association classification rules. Removal of nonpredictive rules by virtue of information gain metric is much similar to CHARM(Closed Association Rule Mining) and CBA(classification Based on Association Rules) which also work on the same track. To generate association and classification rules by pruning method of Apriori software, some external tools are essentially required. The effectiveness of the pruning algorithm can be inversely related to the number of rules. This along with the computational time consumed, determine an efficient criterion of pruning.Efficient post processing methods are hence proposed to remove pointless rules from rule-ways by eliminating redundancy among rules. The dependent relation, exploitation, makes this method a self manageable knowledge. The pruning procedure has been sliced into three stages starting from derivation to pruning operation on rule-set by the use of close rule-sets. It is cost-effective and consumes very little time for the transaction. Hence forth, it can be applied to exploit sampling techniques and data structures, thereby increasing the efficiency. HetalThakkar et al[12] discussed a solution "Continuous Post-Mining of Association Rules in a Data Stream Management System" for streaming data.The overview of this model follows the mining process from data cleaning to post mining cannot be structured as a sequence of steps performed different scientists. But the process must be streamlined into a workplace supported by an efficient system providing quality of service guarantees that are expected from higher Data Stream Management systems(DSMS¶s). The various architecture techniques that determine the functionality in the Stream Mill Miner(SMM)prototype, an SQL-based DSMS that

could be designed to support continuous mining queries. It supports various applications such as click stream analysis,instrusion detection and webpurchase recommendation systems. The problem of post mining the association rules, so derived from data streams has been given less attention but it has rich importance in practical and researchsectors. Data stream mining is a continuous and incremental process and hence forth applied for specific knowledge and meta knowledge to develop quick search of new rules. The current search for new frequency patterns and post processing candidate rules derived earlier have paved through efficient and tightly coupled primitives for mining and post mining association rules. SMM solves some of the inter related problems pertaining to: 1. Performance 2. Functionality and 3. Usability. This mining method includes operators making online mining fast for burst data streams. The steps involved are preprocessing of data, core mining task and post mining task for rule extraction, validation and preservation. The major drawbacks being the difficulty level and development of trivial and nonsensical generation of rules, perhaps the SMM filters large set of candidate rules.It thus customizes the prioritizing and filtering rules by semantic criteria(correlation). It memorizes rules that were graded earlier and hence applies them. The RPM (Rule Post Miner) module compares and filters all invalid data. Mozafari¶sSliding Window Incremental Mining (SWIM) algorithm is used to mine the data streams that support incremental pattern mining and feed back in post mining process. The SWIM algorithm divides data stream into segments (windows) and each window is further divided into n slides. The Pattern tree which is much similar to Han¶s fp-tree(Frequent Pattern tree) data structure keeps count on the all the frequent patterns. The true count of every pattern is updated in Pattern Tree, by subtracting the count from that in the expired slide and adding the count from new slides. However the frequency of new pattern is unknown since it wasn¶t counted in the previous slides. Expiry of all the slides is the criteria of termination of computation. Verification, a primary drive to SWIM shows subtle difference of skipping all frequencies below the minimum frequency, which is not the case with simple counting. The managing association rules sets user specified thresholds named confidence. If this confidence is above the user specified threshold then it is termed as frequent association rule. With generalization, validation and summarization of associate rules it is possible to characterize the filter,

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 5, MAY 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG

109

and is supported by rule history. But in SMM special focus lays on temporal analysis technique proposed by Liu et al(2001).It determines true and stable relationships in the domain and divides the rules in three categories: Semi-Stable, Stable and Trendy. SMM also supports SQL-TS Sequence Query language which makes the analysis of historical values easy. By providing collaborative filtering over the rules, one can predict the support and confidence of a given rule in a time interval. DSMSs by Thakkar in 2008 play an important role in load shedding, query scheduling, buffering, windowing and synopses which play important role in online mining process. Thus all algorithms need to be integrated within DSMS namely SMM to leverage DSMS essentials. One can modify/extend the mining model to derive new models as such of associate rule mining. The SMM model provides relational views much similar the Calder¶s over patterns and rules that are discovered from data streams. The match-past rules allows the system to report only the changes in association rules, as opposed to providing a huge list of patterns again and again and are listed in New Rules. The two separate data flows are designed as follows in SMM model. F1: Frequent Itemsets Association Rules Prune Summarize Rules Match Past Rules. F2: Frequent Itemsets Association Rules Match Past Rules. The SMM claims to have sufficient power and functionality to support stream mining process. Earlier database of archived rules supports the Summarization, Clustering and Trend analysis in this technique. And do result in high level mining language that specifies mining process. HaceneCherfi et al[10] discussed a post association rule minig approach for text mining, which is an attempt to define the criteria for classifying the extracted rules that basically convey a common knowledge, most of which are classified based on numerical quality measures. This model was derived based on the earlier literature and proposals made by Kuntz et al.[13],Jaroszewicz et al.[15, 16] and Faure et al. [17]. The authors have introduced two classification methods- the classical numerical approach and the domain knowledge. In the crisis of text handling, one satisfying solution is to take advantage of a knowledge model of the domain of the texts. In the paper elaborates a Knowledge Based Text Mining Process (KBTM) relying on the Knowledge Discovery Process(KDD). The objective of the KBTM process is to enrich the knowledge model of the text domain, and, in turn improve the process itself. Two text mining approaches based on association rules are studied in

detail.The experiments conducted, characterize the areas of extracting textual fields in the sources, Partof Speech tagging, terminological indexing, key term identification and variants. An association rule is a weighted implication such as A B which relies on probabilities given by the confidence of the rules. Several algorthms aim at extracting association rules. Several attempts to imply the rules in a fruitful way led to the introduction of a number of other quality measures such as the interest measures, the conviction, the dependency, the novelty and the satisfaction measures. The Conformity of an Association Rule with respect to a Knowledge model can be applied onto simple rules through a series of definitionsKnowledge model, Conformity of a simple AR with the Knowledge model, Calculation of the confirmity measures for Simple rules. Given the domain knowledge model, the transistion probability can be tabulated and used as a basis of confirmity calculation wherein the transistion depends on the minimal distance between the keyterms. All the postulates are thoroughly datailed through simple illustrations. The paper also describes about the conformity for complex rules which have their left and right parts composed of more than one keyterms and three different rules have been differentiated. It has been rigidly fortified about the application of Text mining on Molecular Biology corpus. All the discussed rules, both simple and complex, have been applied into the corpus. In the KBTM process, the quality analysis of the AR rankings is conducted for four different AR classes wherein we compare its conformity measure, statistical measures. This methodology is of great interest to highlight rule properties such as resistance to noise in the data set, or to establish whether a rule is extracted randomly or not. However, the limitations of the measures come from the fact that they do not consider any knowledge model. The background knowledge is used using the data mining process with a Bayesian Network to filter interesting frequent itemsets. A Bayesian network is similar to the knowledge model described earlier in the paper. A sampling based approach algorithm for fast discovery of the interesting itemsets, is given in the model. This methodology is further extended to drive both the AR extraction and the Bayesian Network's weight updates. Another interesting work for the post mining of the association rules involves user attraction as background knowledge. Here, the number of rules to study decreases rapidly than a random choice. Basu et.al[21] proposes another approach and uses WORDNET lexical network to evaluate the quality of the rule where keyterms are

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 5, MAY 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG

110

actually wor ds. However, it cannot be used to update a knowledge model. Our present research knowledge sets a knowledgebased text mixing driven by the knowledge model of the domain. The behaviour of the conformity measure is in agreement with the KBTM process. Furthermore, the conformity measure proposed in this first study can be extended to a number of promising directions in order to assess its effectiveness in different knowledge domains and contexts. In doing so, we will be able to have a deeper understanding of the texts and suggest an accurate modification of the modification of the knowledge model itself within the KBTM process. Ronaldo Cristiano Prati [3] proposed a post mining process called "QROC: A Variation of ROC Space to Analyze Item Set Costs/Benefits in Association Rules". They presented an extension to ROC analysis named as QROC is presented which can be applied to association rules and can be used to help analysts to evaluate the relative intrestingness among different association rules in different cost scenarios. Receiver operation characteristics (ROC) graph is a popular way of assesing the performance of classification rules which, are based on class conditional probailities making them inappropriate to evaluate the quality of association rules. This extension is necessary as it can be applied in classification content; it is often assumed that class conditional likelihoods are constant. This extension doesn't hold in association rule content as the consequent part of two different association rules might be very different from each other. In many data mining applications a key issue to discover useful and actionable knowledge from association rules is to properly select the most profitable rules. It is desirable to properly take into account expected costs/benefits of different items sets of different item sets of different rules. ROC developed in context of signal detection theory in engineering applications. ROC convex hull(ROCCH) provides a straightforward tool to select possibly optimal models and to discard sub-optimal ones independently from context or the class distribution. A variation of ROC graphs, named QROC for association rule evaluation has two advantages, firstly, for pruning uninterested rules from a large set of association rules and secondly, for helping the expert in analysing the performance of the association rules in different cost scenarios. Rules can be considered as an association of two binary variables i.e. rule antecedent(x), rule consequent(y). To assess the degree of agreement between x and y a 2×2 contingency table is a convenient way to tabulate statistics for evaluating

the quality of an arbitrary rule. Numerous rule evaluationmeasures can be derived from contingency table The conditional probability that x is false given that y is false is the Specificity (SP) and the conditional probability that x is true given that y is true is the Sensitivity (SE). ROC space is appropriate for measuring quality of classification rules. A model is said to be the best of a set of models, if it dominates all other models by having greater sensitivity and greater specificity. Iso-performance lines are obtained and the lines which are more northwest are better as they correspond to models with lower expected cost. A model is optimal only if it lies on the upper convex of the set of points in the ROC space. Pitfalls of using ROC analysis for evaluating association rules are due to the assumptions often made in classification contexts that the conditional distributions are the same for all points in the ROC space does not hold. To be able to extend ROC analysis to association rules, we must previously perform a scale conversion so that analysing the graph we can directly compare the performance of rules with different consequents. In QROC graph the axes are directly re-scaled so that points in QROC can be directly compared regardless of the consequent. QROC is especially suitable when there is a strong skew ratio between the marginal probabilities in a 2×2 contingency table. In such situations most points are constrained to the corner in the ROC space. The proposed extension is based on ³corrected for chance´ or quality versions of sensitivity and specificity. These measurements are also known as Cohen's kappa coefficients. The weighed kappa coefficient, which can be used to take different disagreement costs into account in the analysis, is also presented. The concepts of isometrics are used in ROC space for geometric derivation of QROC space. Prevalence is the fraction of cases for which the consequent y of a rule is true. The QROC plane is a re-scaling of the ROC space to give chance to points at different levels of p(y) to become more visible. The geometric interpretation of rule quality measures is used for evaluating association rules in the QROC space. The coverage isometrics in ROC space are parallel lines in which the slope depends on p(y). The chi-square index is the difference between observed and expected values for each inner cell in the contingency table and is used as a measure of dependence between two variables. Under the illustrative example the retail data set available in the Frequent Item Set mining data set repository is used. The data are collected over three non-consecutive periods totalising approximately five months of data between half of December 1999 to the end of November 2000. After running the

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 5, MAY 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG

111

implementation, it was found that the rules appear concentrated in the low right corner of the ROC graph. Thismodel was influenced by proposal of Prati and Flach [5] that is an iterative algorithm which uses the ROC convex hull to select rules from a large rule set so that the selected rules from a rule list,proposal of Fawcett [6]that labelled as PRIE algorithm that uses some principles from rule learning and computational geometry to search for promising rule list and Kawano and Kawahara[7] proposed model called ROC based extension of the standard association rule mining algorithm in the context of an information navigation system. Much work has been done in terms of rule interestingness measures aiming to select the most interesting rules out of a large set of discovered rules. Piatetsky, shapiro et al.,[8] presented three principles of rule interestingness and Liu et al[9] proposed the use of the chi-square statistical test as a way of pruning uninteresting rules, which can also be claimed as related literature for this proposal. Claudia Marinica et al [22] proposed a model to overcome the drawbacks in data mining. The proposed model based on concise representations, redundancy reduction, and post processing methods. They added the usage of ontologies and rule schema formalism for the convenience of the users.Integrating domain expert knowledge had been proposed to reduce the number of rules.AssociationRule Mining which aims at discovering implicative tendencies is one of the most important tasks in Knowledge Discovery in Databases.An association rule is defined as implication.The support threshold should be low in order to extract valuable information, which increases the number of rules.Several algorithms and postprocessing methods were introduced to compensate this drawback. The rule pre-processing methods should be imperatively based on strong interactivity with the user.The user knowledge is directly proportional to the rule selection. Ontology is the most appropriate representation to express the complexity of the user knowledge. This paper proposes ARIPSO (Association Rule Interactive Post-processing using Schemas and Ontologies). The association rule mining task can be stated by means of itemsets. An association rule is an implication of two functions(X &Y) namely antecedent and consequent. In consequence, in a collection of rules, the nonredundant rules are the most general ones. The rule set with no greater interestingness than one of its more general rules is optimal. ontology is quintuple. This paper contributes on several levels at reducing the number of association rules and a specification language proposed by Liu et al. General

Impressions (GI), Reasonably Precise Concepts (RPC), and Precise Knowledge (PK)by the use of ontology concept. Interestingness measures represent metrics in the process of capturing dependencies and implications between database items, and express the strength of the pattern association. Mining frequent closed itemsets was proposed in order to reduce the number of frequent itemsets. CLOSET(Closed Itemsets) algorithm which uses a novel frequent pattern tree was proposed as a new efficient method for mining closed itemsets. Zakians Hsiao used frequent closed itemsets in Charm algorithm. They first found minimal generator for closed itemsets, and then, they generated nonredundant association rules using two closed itemsets. Pasquier et al proposed the CLOSET algorithm in order to extract association rules. The list of interesting literature that related to this model; i). Zaki and Hsiao[23], which proved that algorithm CHARM out-performs CLOSET[24], Closed Item sets[25], and Mafia[26] algorithms. ii). Li et al[27] proposed optimal rules sets, defined with respect to an interestingness metric. Reduction techniques based on generalization of the antecedent/consequent of the rules for redundant rules was proposed and implemented. iii) Hahsler et al [28] were interested in generating association rules from arbitrary sets of itemsets. iv). Toivonen et al[29]. proposed in a novel technique for redundancy reduction based on rule covers, the subset of a rule set describing the same database transaction set as the rule set. v). Bayardo, Jr., et al[30]. proposed a Minimum Improvement described as the difference between the confidences of two rules in a specification/ generalization relationship. Omiecinski proposed in three new important interestingness measures in order to fulfil the drawbacks both closed and maximal itemset mining still break down at low support thresholds. Interestingness had been divided into objective measures and subjective measures. vi).Klemettinen et al[31]. Proposed templates to describe the form of interesting rules (inclusive templates). Imielinski et al. proposed a query language for association rule pruning based on SQL, called M-SQL. Ontology was defined by Gruber as a formal, explicit specification of a shared conceptualization. Maedche and Staab proposed a more artificial-intelligence-oriented definition. Depending on the granularity, four types of ontologies are proposed in the literature: Upper (or top level) ontologies, Domain ontologies, Task ontologies, and Application ontologies. This model focuses on domain and background knowledge ontologies. Domain ontologies were introduced by Srikant and Agrawal[32] with the concept of

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 5, MAY 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG

112

Generalized Association Rules(GAR) and also taxonomies of mined data. The item-relatedness filter was proposed by Natarajan and Shekar. Garcia et al. developed in and extended a novel technique called Knowledge Cohesion (KC) composed of Semantic Distance (SD) and Relevance Assessment (RA). Rule Schemas bring the expressiveness of ontologies in the post processing task of association rules combining not only item constraints, but also ontology concept constraints. A Rule Schema expresses the fact that the user expects certain elements to be associated in the extracted association rules. Authors had proposed two operators: Pruning and Filtering operators. The filtering operator is composed of conforming, unexpectedness, and exception. This study is based on a questionnaire database, provided by Nantes Habitat dealing with customer¶s satisfaction concerning accommodation. The expert proposed a set of pruning rule schemas and filtering rule schemas, the quality of the selected rules was certified by the Nantes Habitat expert. This paper discusses the problem of selecting interesting association rules throughout huge volumes of discovered rules. By applying their new approach over a voluminous questionnaire database, they allowed the integration of domain expert in the post processing step in order to reduce the number of rules, the quality of the filtered rules was validated by the expert throughout the interactive process. Observations: Huawen Liu, et al[14] presented a technique on post-processing for rule reduction using closed set that was targetting to filter the otiose rules in a postprocessing of rule mining. The empirical study proved that the discovery of dependent relations from closed set helps to eliminate redundant rules. HetalThakkar et al [12] In the case of stream data, the post-mining of association is more challenging and continuous post mining of association rules is an unavoidable requirement, which is discussed by this author. He presented a technique for continuous postmining of association rules in a data stream management system. He described the architecture and techniques used to achieve this advanced functionality in the Stream Mill Miner (SMM) prototype, an SQL-based DSMS designed to support continuous mining queries, which is impressive. HaceneCherfi et al, [10] discussed a post association rule minig approach for text mining that combines data mining, semantic techniques for post-mining and selection of association rules. To focus on the result analysis and to find new knowledge units, classification of association rules according to

qualitative criteria using domain model as background knowledge has been introduced. The authors carried out an empirical study on molecular biology datasetthat proved the benefits of taking into account a knowledge domain model of the data. . Ronaldo Cristiano Prati [3]. The Receiver Operating Characteristics (ROC) graph is a popular way of assessing the performance of classification rules, but they are inappropriate to evaluate the quality of association rules, as there is no class in association rule mining and the consequent part of different association rules might not have any correlation at all. Chapter VIII presents a novel technique of QROC, a variation of ROC space to analyzeitemset costs/benefits in association rules. It can be used to help analysts to evaluate the relative interestingness among different association rules in different cost scenarios. Claudia Marinica et al [22]. Discussed the problem of selecting interesting association rules from given ruleset that is huge in volume. They aimed to integrate user knowledge into association rule mining. They attempt to integrate user knwoledge using ontologies and rule schemas. These authors empirically proved that the model proposed was successfull to deliver the significant results. But the whole proposed model based on the ontologies and rule schemas, which can¶t be possible to manage without domain expertization and format of the rule schemas. Conclusion In the process of this literature survey we noticed that none of the available post mining mechanisms are effective and flawless. We also observed that in this post association rule mining domain, no or fewer number of articles published in recent years. These observation claims that there is huge research scope in post association rule mining domain.

References [1]Fienberg, S. E. &Shmueli, G: "Statistical issues and challenges associated with rapid detection of bio-terrorist attacks. Statistics in Medicine, 2005 [2] Carmona-Saez, P., Chagoyen, M., Rodriguez, A., Trelles, O., Carazo, J. M., &Pascual-Montano, A: :Integrated analysis of gene expression by association rules discovery". BMC Bioinformatics, 2006 [3] Ronaldo Cristiano Prati, "QROC: A Variation of ROC Space to Analyze Item Set Costs/Benefits in Association Rules", Post-

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 5, MAY 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG

113

Mining of Association Rules: Techniques for Effective Knowledge Extraction, IGIGlobal,2009 Pages: 133-148 pp. [4]McNicholas, P. D: "Association rule analysis of CAO data (with discussion). Journal of the Statistical and Social Inquiry Society of Ireland, 2007 [5]Prati, R., &Flach, P. "ROCCER: an algorithm for rule learning based on ROC analysis". Proceeding of the 19th International Joint Conference on Artificial Intelligence, 2005 [6]Fawcett, T: "PRIE: A system to generate rulelists to maximize ROC performance". Data Mining and Knowledge Discovery Journal, 2008, 17(2), 207-224. [7] Kawano, H., & Kawahara, M: "Extended Association Algorithm Based on ROC Analysis for Visual Information Navigator". Lecture Notes In Computer Science, 2002, 2281, 640649. [8] Piatetsky-Shapiro, G., Piatetsky-Shapiro, G., Frawley, W. J., Brin, S., Motwani, R., Ullman, J. D., et al: "Discovery, Analysis, and Presentation of Strong Rules". Proceedings of the 11th international symposium on Applied Stochastic Models and Data Analysis ASMDA, 2005, 16, 191-200 [9]Liu, B., Hsu, W., & Ma, Y: "Pruning and summarizing the discovered associations". Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining 1999, (pp. 125-134) [10] HaceneCherfi, Amedeo Napoli, Yannick Toussaint: "A Conformity Measure Using Background Knowledge for Association Rules: Application to Text Mining", Post-Mining of Association Rules: Techniques for Effective Knowledge Extraction, IGIGlobal,2009 Pages: 100-115 pp. [11] Liu, B., Hsu, W., & Ma, Y.: "Pruning and summarizing the discovered associations". Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, 1999 [12] HetalThakkar, BarzanMozafari, Carlo Zaniolo: "Continuous Post-Mining of Association Rules in a Data Stream Management System", Post-Mining of Association Rules: Techniques for Effective Knowledge Extraction, IGIGlobal,2009 Pages: 116-132 pp.

[13] Kuntz, P., Guillet, F., Lehn, R., & Briand, H: "A User-Driven Process for Mining Association Rules". In D. Zighed, H. Komorowski, & J. Zytkow (Eds.), 4th Eur. Conf. on Principles of Data Mining and Knowledge Discovery 2000 [14]Huawen Liu, Jigui Sun, Huijie Zhang: "Post-Processing for Rule Reduction Using Closed Set", Post-Mining of Association Rules: Techniques for Effective Knowledge Extraction, IGIGlobal,2009 Pages: 81-99 pp. [15]Jaroszewicz, S., &Simovici, D. A: "Interestingness of Frequent Itemsets using Bayesian networks as Background Knowledge". ACM SIGKDD Conference on Knowledge Discovery in Databases, 2004 [16] Jaroszewicz, S., &Scheffer, T: "Fast Discovery of Unexpected Patterns in Data, Relative to a Bayesian Network". ACM SIGKDD Conference on Knowledge Discovery in Databases, 2005 [17] Faure, C., Delprat, D., Boulicaut, J. F., & Mille, A: "Iterative Bayesian Network Implementation by using Annotated Association Rules." 15th Int¶l Conf. on Knowledge Engineering and Knowledge Management ± Managing Knowledge in a World of Networks, 2006 [18]Liu, B., & Hsu, W: "Post-Analysis of Learned Rules". AAAI/IAAI, 1996 [19] Lent, B., Swami, A. N., &Widom, J: "Clustering Association Rules". Proceedings of the Thirteenth International Conference on Data Engineering, 1997, Birmingham U.K., IEEE Computer Society [20] Savasere, A., Omiecinski, E., &Navathe, S. B: "Mining for strong negative associations in a large database of customer transactions". Proceedings of the 14th International Conference on Data Engineering, Washington DC, USA, 1998 [21] Basu, S., Mooney, R. J., Pasupuleti, K. V., &Ghosh J: "Evaluating the Novelty of TextMined Rules using Lexical Knowledge". 7th ACM SIGKDD International Conference on Knowledge Discovery in Databases, 2001 [22] Claudia Marinica and FabriceGuillet: "Knowledge-Based Interactive Postmining of Association Rules Using Ontologies", IEEE TRANSACTIONS ON KNOWLEDGE AND

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 5, MAY 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG

114

DATA ENGINEERING, VOL. 22, NO. 6, JUNE 2010 [23] M.J. Zaki and C.J. Hsiao, ³Charm: An Efficient Algorithm for Closed Itemset Mining,´ Proc. Second SIAM Int¶l Conf. Data Mining, pp. 34-43, 2002. [24] J. Pei, J. Han, and R. Mao, ³Closet: An Efficient Algorithm for Mining Frequent Closed Itemsets,´ Proc. ACM SIGMOD Workshop Research Issues in Data Mining and Knowledge Discovery, pp. 21-30, 2000. [25] M.J. Zaki and M. Ogihara, ³Theoretical Foundations of Association Rules,´ Proc. Workshop Research Issues in Data Mining and Knowledge Discovery (DMKD ¶98), pp. 1-8, June 1998. [26]D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, and T. Yiu, ³Mafia: A Maximal Frequent Itemset Algorithm,´ IEEE Trans. Knowledge and Data Eng., vol. 17, no. 11, pp. 1490-1504, Nov. 2005. [27] J. Li, ³On Optimal Rule Discovery,´ IEEE Trans. Knowledge and Data Eng., vol. 18, no. 4, pp. 460-471, Apr. 2006. [28] M. Hahsler, C. Buchta, and K. Hornik, ³Selective Association Rule Generation,´ Computational Statistic, vol. 23, no. 2, pp. 303315, Kluwer Academic Publishers, 2008. [29] H. Toivonen, M. Klemettinen, P. Ronkainen, K. Hatonen, and H. Mannila, ³Pruning and Grouping of Discovered Association Rules,´ Proc. ECML-95 Workshop Statistics, Machine Learning, and Knowledge Discovery in Databases, pp. 47-52, 1995. [30]J. Bayardo, J. Roberto, and R. Agrawal, ³Mining the Most Interesting Rules,´ Proc. ACM SIGKDD, pp. 145-154, 1999. [31] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A.I. Verkamo, ³Finding Interesting Rules from Large Sets of Discovered Association Rules,´ Proc. Int¶l Conf. Information and Knowledge Management (CIKM), pp. 401-407, 1994. [32]R. Srikant and R. Agrawal, ³Mining Generalized Association Rules,´ Proc. 21st Int¶l Conf. Very Large Databases, pp. 407-419, http://citeseer.ist.psu.edu/srikant95mining.html, 1995.

About Authors: KalliSrinivasaNageswara Prasad has completed M.Sc (Tech)., M.Sc., M.S (Software Systems)., P.G.D.C.S. He is currently pursuing Ph.D degree in the field of Data Mining at Sri Venkateswara University, Tirupati, Andhra Pradesh State, India. He has published Three papers in International journals.

Prof. S.Ramakrishna is currently working as a professor in the Department of Computer Science, College of Commerce, Management & Computer Sciences in Sri Venkateswara university, Tirupathi, Andhra Pradesh State, India. He has completed M.Sc,M.Phil., Ph.D.,M.Tech(IT).He is specialized in Fluid Dynamics & Theoretical Computer Science. His area of Research includes Artificial Intelligence, Data Mining & Networks. He has an experience of 23 years in Teaching field. He has published 32 Research papers in National & International Journals. He has also attended 13 national Conferences and 11 International Conferences. He has guided 14 Ph.D Scholars and 15 M.Phil Scholars.