
Multi-Objective Metaheuristic Algorithms for Finding Interesting Rules in Large Complex Databases

This research was funded by the EPSRC, grant number GR/T04298/01.

Introduction

The research was concerned with developing efficient algorithms for finding classification rules from large databases. The data mining task of classification is concerned with finding patterns in classification data, that is, data which has a class label for each instance or record in the database. When the classification task is restricted to a pre-defined class label, the data mining task is known as partial classification or nugget discovery. The patterns sought for this task are class descriptions, often of the general form antecedent ⇒ class-label = u for some value of u. The antecedent is a predicate used to define subsets of records from the database, and it is often defined by a conjunction of attribute-value tests. The simplest rule format may include only individual equality tests on nominal attributes (hence numeric attributes would require pre-discretisation). More complex rule formats include tests on a set of values (a disjunction of values) for a nominal attribute or a range of values for an ordinal attribute. A more complex format may imply a better class description, but there is a much larger search space of possible rules.

Current methods for classification rule induction work by placing constraints on the search space to keep the problem tractable. Alternatively, the search space may be constrained with the use of minimum support and confidence constraints. These approaches may lead to loss of information and weaker class descriptions. There are problems of scalability for large databases. Additionally, the number of rules found is frequently large, and many of them may describe the same population. We used multi-objective evolutionary algorithms to perform a search for non-overlapping strong rules containing numerical or nominal attributes. Experimental results are very promising.
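As an illustration of the rule format described above, the following minimal Python sketch represents a rule as a conjunction of attribute tests and computes two quality measures for it. The definitions used here (confidence as the fraction of covered records that belong to the target class, coverage as the fraction of the target class that the rule describes) are the usual ones and are an assumption, since the report does not spell them out. The names `Rule`, `evaluate`, and the toy records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """antecedent => class-label = target, with a conjunctive antecedent."""
    tests: tuple   # tuple of (attribute name, predicate on the attribute value)
    target: str    # class label u predicted by the rule

    def matches(self, record):
        # A record satisfies the antecedent if every attribute test holds.
        return all(pred(record[attr]) for attr, pred in self.tests)


def evaluate(rule, records, class_attr="class"):
    """Return (confidence, coverage) under the assumed definitions."""
    covered = [r for r in records if rule.matches(r)]
    correct = sum(1 for r in covered if r[class_attr] == rule.target)
    class_size = sum(1 for r in records if r[class_attr] == rule.target)
    confidence = correct / len(covered) if covered else 0.0
    coverage = correct / class_size if class_size else 0.0
    return confidence, coverage


# Toy data (invented for illustration only).
records = [
    {"age": 25, "smoker": "yes", "class": "high"},
    {"age": 40, "smoker": "yes", "class": "high"},
    {"age": 35, "smoker": "no", "class": "low"},
    {"age": 52, "smoker": "yes", "class": "low"},
    {"age": 61, "smoker": "no", "class": "high"},
]

# An equality test on a nominal attribute and a range test on a numeric one,
# so no pre-discretisation of "age" is needed.
rule = Rule(
    tests=(
        ("smoker", lambda v: v == "yes"),
        ("age", lambda v: 20 <= v <= 45),
    ),
    target="high",
)

confidence, coverage = evaluate(rule, records)
print(f"confidence={confidence:.2f} coverage={coverage:.2f}")
```

A multi-objective search would treat the two returned values as separate objectives rather than combining them into a single score.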
We extended the techniques to find non-overlapping strong rules and this presents a unique rule induction mechanism; coupled with an emphasis on scalability and efficiency, the project has delivered new state-of-the-art data mining algorithms. Adaptation to cope with missing or uncertain data, and with complex data, will provide an invaluable tool for medical data mining, where data often presents those characteristics.

Key Advances and Supporting Methodology

In this section, the achievements of the project are reviewed according to the original aims and objectives. The project aim was to develop scalable techniques for nugget discovery using Multi-Objective Metaheuristic algorithms (MOMH) in complex data, and to undertake a number of experiments using benchmark databases from the UCI repository to prove the effectiveness of the new techniques. In order to test the extensions of the algorithm for large and complex data, the techniques were to be applied to some real medical case studies. The work would build upon some of the preliminary work already undertaken but would seek to enhance and extend the techniques in a number of ways.

The first objective was to develop efficient algorithms for finding strong classification rules (e.g. those of high confidence and coverage) in large databases by using multi-objective optimisation techniques. A preliminary MOMH nugget discovery algorithm was created and its performance was compared on a number of benchmark databases against another algorithm for nugget discovery, the all-rules (ARA) algorithm. Our preliminary algorithm used NSGA-II, a well known MOMH algorithm, as the engine for nugget discovery. The results [1] showed the potential of MOMH for nugget discovery as, for every benchmark database tested, the set of rules obtained was close to or coincided with the Pareto Optimal set of rules (upper confidence/coverage border). Our algorithm represents one of the first attempts to employ MOMH in data mining and particularly for classification, so the work is very innovative and is already receiving citations by other authors working in the field.

The second part of the first objective was to evaluate a number of different approaches to Multiobjective optimisation and their application to this particular data mining task. While the results obtained by NSGA-II were encouraging, it was felt that a range of multi-objective meta-heuristics should be applied to this problem, in order to determine which approach would be most effective. As part of the project, we therefore implemented a second multi-objective genetic algorithm, SPEA2, a multi-objective local search, PAES, and two forms of multi-objective genetic local search for the task of nugget discovery. In addition, with our objective of scalability and efficiency in mind, we also developed our own multi-objective algorithm based on GRASP (Greedy Randomized Adaptive Search Procedure). This was an important development in the context of multi-objective optimization as GRASP has been applied to a number of problems in combinatorial optimization, but it has very seldom been used in a MO setting, and generally only through repeated optimization of single objective problems, using either linear combinations of the objectives or additional constraints. The approach we developed takes advantage of some specific characteristics of the data mining problem being solved, allowing for the very effective construction of a set of solutions that form the starting point for the local search phase of the GRASP. The resulting algorithm is guided solely by the concepts of dominance and Pareto optimality. Our algorithm has been accepted for journal publication [2]. The experimental results for our partial classification GRASP and other MO meta-heuristics have shown that MOMH are generally very well suited to nugget discovery and furthermore, the GRASP algorithm brings additional efficiency to the search for partial classification rules. This makes GRASP suitable for the exploration of larger databases, and this is in fulfilment of another of our objectives: to enhance the scalability of the algorithm produced. We therefore succeeded in producing an efficient and scalable MOMH algorithm for partial classification in fulfilment of our first and sixth objectives.

Our second objective was to explore concepts related to the interest of the rule, such as novelty, surprise, simplicity, etc., and extend the algorithm to find sets of interesting rules (e.g. simple strong novel rules that are interesting within the set as well as individually). However, the set of rules returned was large, and some rules in this set were similar to one another. In order to present more interesting rules to the user and to aid the rule evaluation task, the work was developed in two directions.

We first explored the concept of clustering similar rules to allow for summarisation of rule sets into clusters with representative rules. As the clustering of rules is a new area of research, we evaluated a number of clustering algorithms on this task including Partition Based Clustering Algorithms such as k-medoids, PAM and CLARANS. In addition to partition-based algorithms, a variant of a hierarchical clustering algorithm known as AGNES was also explored and resulted in different clustering solutions. Furthermore, a modification to CLARANS was made that allows it to perform as well as PAM in long runs and significantly outperform all the partition-based algorithms in short runs. All clustering algorithms were applied to the clustering of rules obtained from both the all-rules and MOMH algorithms for partial classification. The results were reported in journal publication [3] and they should have impact in the way in which sets of rules from a number of algorithms, including association rule algorithms, are presented to the user. Using our method, clusters of rules can be presented to the user by, for example, medoid rules for each cluster.
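The idea of summarising a rule set by clusters with representative (medoid) rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the dissimilarity between two rules is taken to be the Jaccard distance between their support sets (the report only says that similar rules are supported by overlapping sets of records), and the clustering loop is a simple alternating k-medoids rather than PAM, CLARANS, or AGNES. All function names and the toy support sets are hypothetical.

```python
def jaccard_distance(a, b):
    """Dissimilarity of two support sets (sets of record identifiers)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def medoid(members, support):
    """The member with the smallest total distance to the other members."""
    return min(
        members,
        key=lambda m: sum(
            jaccard_distance(support[m], support[o]) for o in members
        ),
    )


def k_medoids(support, initial_medoids, max_iter=100):
    """Alternate between assigning rules to the nearest medoid and
    recomputing each cluster's medoid, until the medoids stop changing."""
    medoids = list(initial_medoids)
    clusters = {}
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for name in support:
            nearest = min(
                medoids,
                key=lambda m: jaccard_distance(support[name], support[m]),
            )
            clusters[nearest].append(name)
        # Keep the old medoid if a cluster happens to be empty.
        new_medoids = [
            medoid(members, support) if members else m
            for m, members in clusters.items()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters


# Toy support sets (invented): rule name -> identifiers of supporting records.
support = {
    "r1": {1, 2, 3, 4},
    "r2": {1, 2, 3},
    "r3": {2, 3, 4, 5},
    "r4": {10, 11, 12},
    "r5": {10, 11, 12, 13},
}

clusters = k_medoids(support, initial_medoids=["r1", "r4"])
for representative, members in clusters.items():
    print(representative, "represents", members)
```

In a user interface, only the dictionary keys (the medoid rules) would be shown at first, with the member lists revealed when a cluster is expanded.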

If a cluster of rules appears to contain interesting rules, the cluster can then be expanded to allow the user to examine individual rules. This hierarchical presentation of rules could be combined with visualization techniques (e.g. parallel coordinate techniques) in a user interface for easy evaluation of large sets of rules.

Next, we attempted to improve the quality of our rule sets by modifying the original algorithm to improve the diversity of rules within the set. The clustering of rules extracted with the MOMH partial classification algorithms showed that rules that are close in objective space are also similar in terms of their support sets (they are supported by overlapping sets of records), hence rules within a particular cluster present little diversity in terms of support sets. We created a modification of NSGA-II by introducing the concept of rule dissimilarity in the crowding measure. This has allowed us to increase the diversity of rules in some areas of the Pareto front in terms of support sets. The modified algorithm and results were presented in the third international conference on Evolutionary Multi-Criterion Optimisation (EMO 2005) [4]. The results were very well received because the presentation of results by clustering was applicable to many other problems in Multi-Criterion Optimisation and represented a very interesting development in this area.

The creation of diverse rules was the subject of further research efforts by the use of modified dominance relations in the MOMH algorithm. The modified dominance relations were designed to encourage diversity and novelty. To this aim, we studied the novelty of the rules in relation to other rules within the set. In previous research, novelty was measured in terms of the rules' support sets. In this research we adapted the methods to also consider apparent/syntactic novelty. We modified the dominance relation as follows. When rule q would normally be dominated by rule r, the difference between the two rules is considered. If rule q provides enough novelty with respect to rule r, it is permitted to remain non-dominated. The amount of novelty required depends upon the relative quality of the two rules according to the two prime objectives: coverage and confidence. Experiments showed that the new modified dominance relation encouraged the production of rules that were previously undiscovered and provided additional information to the user. The results [5], presented in the 2006 IEEE International Joint Conference on Neural Networks, received a prize as best paper for the session. The contribution of our modified dominance relation was not only to our algorithm for partial classification but also to the field of Multi-Objective optimisation, where diverse solutions are often sought and a similar approach can be taken.

As a different direction in our research, we decided to investigate the creation of a predictive classifier. This was because although individual rules produced with our MOMH were simple, easy to understand and provided useful partial descriptions of the class of interest, it was difficult to see how they could be combined to provide a description for the whole database. Additionally, the simple rule representation of partial classification was insufficient to produce individual rules with both high confidence and coverage. Other authors have attempted to produce predictive classifiers from simple association rules using a MO algorithm to optimize rule set complexity and error rate. As this approach has a number of disadvantages we chose instead to produce a more expressive rule representation, specifically by using Expression Trees (ETs). ETs are different to Decision Trees. Decision Trees contain internal nodes that define partitions of the data and leaf nodes that indicate class membership. On the other hand, Expression Trees contain Attribute Tests (ATs) as leaves while internal nodes contain a Boolean operator. The combination of ATs in the ET represents a rule in Disjunctive Normal Form which forms a complete description of the class of interest. The ET can be used for binary classification problems as any record described by the ET belongs to the positive class or class of interest, whereas any record not described by the ET belongs to the negative class. ETs were evaluated according to their error rate or misclassification cost and to their complexity, creating therefore a multi-objective problem scenario. A MOMH algorithm, NSGA-II, was implemented to search for the best ET given the optimization objectives. The resulting algorithm was tested on a number of benchmark databases. The results [6] illustrated that the approach can produce a reasonable classification on two class problems, competitive with other classification algorithms in terms of misclassification cost.
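The Expression Tree representation described above can be illustrated with a short Python sketch: leaves hold attribute tests, internal nodes hold a Boolean operator, and a record described by the tree is assigned to the positive class. The two objectives returned are the error rate and a complexity measure. Using the node count as the complexity measure is an assumption made here for illustration, as are the class names `AttributeTest` and `Node`, the function `objectives`, and the toy records.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AttributeTest:
    """Leaf of an expression tree: a test on a single attribute."""
    attribute: str
    predicate: Callable[[object], bool]

    def covers(self, record):
        return self.predicate(record[self.attribute])

    def size(self):
        return 1


@dataclass(frozen=True)
class Node:
    """Internal node: a Boolean operator applied to two subtrees."""
    operator: str   # "AND" or "OR"
    left: object
    right: object

    def covers(self, record):
        if self.operator == "AND":
            return self.left.covers(record) and self.right.covers(record)
        if self.operator == "OR":
            return self.left.covers(record) or self.right.covers(record)
        raise ValueError(f"unknown operator: {self.operator}")

    def size(self):
        return 1 + self.left.size() + self.right.size()


def objectives(tree, records, positive, class_attr="class"):
    """Return (error rate, complexity); both are to be minimised."""
    errors = sum(
        tree.covers(r) != (r[class_attr] == positive) for r in records
    )
    return errors / len(records), tree.size()


# Toy data (invented for illustration only).
records = [
    {"age": 25, "smoker": "yes", "class": "high"},
    {"age": 40, "smoker": "yes", "class": "high"},
    {"age": 35, "smoker": "no", "class": "low"},
    {"age": 52, "smoker": "yes", "class": "low"},
    {"age": 61, "smoker": "no", "class": "high"},
]

smoker = AttributeTest("smoker", lambda v: v == "yes")
young = AttributeTest("age", lambda v: v <= 45)
old = AttributeTest("age", lambda v: v >= 60)

# (smoker AND age <= 45) OR (age >= 60), a rule in Disjunctive Normal Form.
full_tree = Node("OR", Node("AND", smoker, young), old)

full_objectives = objectives(full_tree, records, positive="high")
simple_objectives = objectives(smoker, records, positive="high")
print("full tree:", full_objectives)
print("single test:", simple_objectives)
```

Neither tree dominates the other here: the larger tree makes fewer errors while the single test is simpler, which is the kind of trade-off a multi-objective search such as NSGA-II would retain.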

However, the algorithm provides additional benefits. The most obvious is that the user can be provided with a range of models with different trade-offs between rule complexity and misclassification costs. This allows the client to select a set of rules that is accurate enough while also being comprehensible. The overall approach is also flexible. Different measures of misclassification cost can easily be used. Rules may be presented to the user in a number of ways and the measure of rule complexity can easily be adapted to match the method of rule presentation and the client's concept of rule comprehensibility. There is no restriction on the data types of the fields of the dataset and no need to discretise numeric fields as required by other algorithms. The set of rules produced represents a complete description of the class of interest, and the rules are still relatively simple and understandable. Quite often they represent a more concise rule set than the rule set obtained by a Decision Tree. This new algorithm was a deliverable additional to the original objectives of the project. It is however a very important and innovative development as it allows us to produce sets of rules using MOMH algorithms.

The preliminary work on ET, although successful, identified a problem with the algorithm as a major loss of population diversity was observed early in the search process. This loss of diversity refers to the range of genetic material in the population that may be usefully combined to create new solutions, instead of diversity of the solutions on the Pareto front. We used both linear combinations of objectives and modified dominance relations to control population diversity, producing higher quality results in shorter run times [7]. This work can have a major impact in the multi-objective meta-heuristic community, where loss of diversity is often a problem. Further experiments are underway to submit some of the work for journal publication.

The project has been very successful in launching multi-objective algorithms as a platform for data mining algorithm development. The PI was an invited speaker at the third UK KDD Symposium [8] to talk about application of Multi-objective algorithms in data mining.

[1] B. de la Iglesia, G. Richards, M. S. Philpott and V. J. Rayward-Smith. The application and effectiveness of a multi-objective metaheuristic algorithm for partial classification. EJORS, 169:3, pp. 898-917, 2006.
[2] A. P. Reynolds and B. de la Iglesia. A Multi-Objective GRASP for Partial Classification. Soft Computing Journal (to appear, 2008).
[3] A. P. Reynolds, G. Richards, B. de la Iglesia and V. J. Rayward-Smith. Clustering Rules: A Comparison of Partitioning and Hierarchical Clustering Algorithms. Journal of Mathematical Modelling and Algorithms, 5:4, pp. 475-504, 2006.
[4] B. de la Iglesia, A. P. Reynolds and V. J. Rayward-Smith. Developments on a Multi-objective Metaheuristic (MOMH) Algorithm for Finding Interesting Sets of Classification Rules. Lecture Notes in Computer Science, Volume 3410, pp. 826-840, 2005.
[5] A. P. Reynolds and B. de la Iglesia. Rule Induction Using Multi-Objective Metaheuristics: Encouraging Rule Diversity (Winner of Best Session). IJCNN 2006, pp. 6375-6382, 2006.
[6] A. P. Reynolds and B. de la Iglesia. Rule Induction for Classification Using Multi-Objective Genetic Programming. Proceedings of the 4th International Evolutionary Multi-Criterion Optimization Conference (EMO 2007), Matsushima, Japan, LNCS 4403, pp. 516-530, 2007.
[7] A. P. Reynolds and B. de la Iglesia. Managing Population Diversity Through the Use of Weighted Objectives. Proceedings of the 2007 IEEE Symposium on Computational, pp. 99-106, 2007.
[8] B. de la Iglesia. Application of Multi-objective Metaheuristic Algorithms in Data Mining. Proceedings of the Third UK Knowledge Discovery and Data Mining Symposium (Invited Talk), Expert Update, Vol. 9, No. 3, pp. 43-48, Autumn 2007, ISSN: 1465-4091.