
Data Mining: Concepts and Techniques
By Jiawei Han, Micheline Kamber, Jian Pei
ing&ots=txFzYUpy-Z&sig=gnsQYkKmr6XHZu0m9eZWkQUWsw&redir_esc=y#v=onepage&q=data%20mining&f=false
………………….
From Data Mining to Knowledge Discovery in Databases (From Data Mining to 01)
Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth
……………
weka
……
ACM SIGKDD Explorations Newsletter, Volume 11, Issue 1, June 2009
SPECIAL ISSUE: Open source analytics
Open source analytics: an introduction to the special issue
Robert L. Grossman
Pages: 3-4
doi>10.1145/1656274.1656276

This special issue contains six articles on open source analytics. It includes an article describing the Weka data mining system, two articles on infrastructure to support analytics, an article on the PMML standard for statistical and data mining models, an article on how clouds are being used in analytics, and an article about an open source tool for cleaning data.

http://dl.acm.…cfm?id=1656278
……………
Data mining: an overview from a database perspective
Ming-Syan Chen, Dept. of Electr. Eng., Nat. Taiwan Univ., Taipei; Jiawei Han; P.S. Yu
This paper appears in: Knowledge and Data Engineering, IEEE Transactions on
Issue Date: Dec 1996
Volume: 8, Issue: 6
On page(s): 866 - 883
ISSN: 1041-4347
References Cited: 87
Cited by: 211
INSPEC Accession Number: 5476684
Digital Object Identifier: 10.1109/69.…
Date of Current Version: 6 August 2002
Sponsored by: IEEE Computer Society
………………….
Principles of data mining
By D. J. Hand, Heikki Mannila, Padhraic Smyth
http://books.google.…

The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This is the first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics. The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing.
………………………….
http://dl.acm.…cfm?id=551917
Machine Learning and Data Mining: Methods and Applications
Editors: Ryszard S. Michalski, Ivan Bratko, Avan Bratko
John Wiley & Sons, Inc. New York, NY, USA ©1998
ISBN: 0471971995

From the Publisher: Master the new computational tools to get the most out of your information system. This practical guide, the first to clearly outline the situation for the benefit of engineers and scientists, provides a straightforward introduction to basic machine learning and data mining methods, covering the analysis of numerical, text, and sound data.
…………
http://dl.acm.…cfm?id=231007
Data mining with neural networks: solving business problems from application development to decision support
Author: Joseph P. Bigus, IBM, Rochester, MN
Publication: Book
McGraw-Hill, Inc. Hightstown, NJ, USA ©1996
ISBN: 0-07-005779-6
……………………….
http://dl.acm.…cfm?id=1150531
YALE: rapid prototyping for complex data mining tasks
Authors: Ingo Mierswa, Michael Wurst, Ralf Klinkenberg, Martin Scholz, Timm Euler (all University of Dortmund)
2006 Article
Published in: Proceeding KDD '06, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
ACM New York, NY, USA ©2006
ISBN: 1-59593-339-5
doi>10.1145/1150402.1150531

KDD is a complex and demanding task. While a large number of methods has been established for numerous problems, many challenges remain to be solved. New tasks emerge requiring the development of new methods or processing schemes. Like in software development, the development of such solutions demands careful analysis, specification, implementation, and testing. Rapid prototyping is an approach which allows crucial design decisions as early as possible. A rapid prototyping system should support maximal re-use and innovative combinations of existing methods, as well as simple and quick integration of new ones. This paper describes Yale, a free open-source environment for KDD and machine learning. Yale provides a rich variety of methods which allows rapid prototyping for new applications and makes costly re-implementations unnecessary. Additionally, Yale offers extensive functionality for process evaluation and optimization which is a crucial property for any KDD rapid prototyping tool. Following the paradigm of visual programming eases the design of processing schemes. While the graphical user interface supports interactive design, the underlying XML representation enables automated applications after the prototyping phase. After a discussion of the key concepts of Yale, we illustrate the advantages of rapid prototyping for KDD on case studies ranging from data preprocessing to result visualization. These case studies cover tasks like feature engineering, text mining, data stream mining and tracking drifting concepts, ensemble methods and distributed data mining. This variety of applications is also reflected in a broad user base: we counted more than 40,000 downloads during the last twelve months.
………………….
http://books.google.…ining&ots=k34QsQGf75&sig=LVc9cgC37HknysfO_0fxfyRCj3c&redir_esc=y#v=onepage&q=data%20mining&f=false
Data preparation for data mining, Volume 1
By Dorian Pyle
…………………………
….oxfordjournals.…short
Data mining in bioinformatics using Weka
Eibe Frank1,*, Mark Hall1, Len Trigg2, Geoffrey Holmes1 and Ian H. Witten1
Author Affiliations: 1Department of Computer Science, University of Waikato, Private Bag 3105, Hamilton, New Zealand and 2Reel Two, PO Box 1538, Hamilton, New Zealand
Contact: eibe@cs.waikato.…
Received December 3, 2003. Revision received February 3, 2004. Accepted February 26, 2004.

Abstract
Summary: The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection, all common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it.
……………….
http://ir.iit.…pdf
DMQL: A Data Mining Query Language for Relational Databases
……………
https://springerlink3.metapress.…pdf&sid=prbm30bojvkxyxlzzvouboif&sh=www.springerlink.…
KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2004
Lecture Notes in Computer Science, 2004, Volume 3202/2004, 537-539
DOI: 10.1007/978-3-540-30116-5_58
Orange: From Experimental Machine Learning to Interactive Data Mining
Janez Demšar, Blaž Zupan, Gregor Leban and Tomaz Curk
……………….
Benchmarking attribute selection techniques for discrete class data mining
Hall, M.; Holmes, G.
Dept. of Comput. Sci., Waikato Univ., Hamilton, New Zealand
This paper appears in: Knowledge and Data Engineering, IEEE Transactions on
Issue Date: Nov.-Dec. 2003
Volume: 15, Issue: 6
On page(s): 1437 - 1447
ISSN: 1041-4347
References Cited: 16
Cited by: 65
INSPEC Accession Number: 7959848
Digital Object Identifier: 10.1109/TKDE.2003.1245283
Date of Current Version: 17 November 2003
Sponsored by: IEEE Computer Society

ABSTRACT
Data engineering is generally considered to be a central issue in the development of data mining applications. The success of many learning schemes, in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant, and noisy attributes in the model building process phase can result in poor predictive performance and increased computation. Attribute selection generally involves a combination of search and attribute utility estimation plus evaluation with respect to specific learning schemes. This leads to a large number of possible permutations and has led to a situation where very few benchmark studies have been conducted. This paper presents a benchmark comparison of several attribute selection methods for supervised classification. All the methods produce an attribute ranking, a useful device for isolating the individual merit of an attribute. Attribute selection is achieved by cross-validating the attribute rankings with respect to a classification learner to find the best attributes. Results are reported for a selection of standard data sets and two diverse learning schemes, C4.5 and naive Bayes.
………….
Effective data mining using neural networks
Hongjun Lu; Setiono, R.; Huan Liu
Dept. of Inf. Syst. & Comput. Sci., Nat. Univ. of Singapore
This paper appears in: Knowledge and Data Engineering, IEEE Transactions on
Issue Date: Dec 1996
Volume: 8, Issue: 6
On page(s): 957 - 961
ISSN: 1041-4347
References Cited: 10
Cited by: 42
INSPEC Accession Number: 5476692
Digital Object Identifier: 10.1109/69.553163
Date of Current Version: 6 August 2002
Sponsored by: IEEE Computer Society

ABSTRACT
Classification is one of the data mining problems receiving great attention recently in the database community. The paper presents an approach to discover symbolic classification rules using neural networks. Neural networks have not been thought suited for data mining because how the classifications were made is not explicitly stated as symbolic rules that are suitable for verification or interpretation by humans. With the proposed approach, concise symbolic rules with high accuracy can be extracted from a neural network. The network is first trained to achieve the required accuracy rate. Redundant connections of the network are then removed by a network pruning algorithm. The activation values of the hidden units in the network are analyzed, and classification rules are generated using the result of this analysis. The effectiveness of the proposed approach is clearly demonstrated by the experimental results on a set of standard data mining test problems.
…………….
….sciencedirect.…
DM01 http://150.214.190.154/gfs/pdf/2004_Ishibuchi-H._Fuzzy-Sets-Syst.pdf
………….
….bus.olemiss.…ng/Knowledge%20management%20and%20data%20mining%20for%20marketing.pdf
DM02 Knowledge management and data mining for marketing
Michael J. Shaw, Chandrasekar Subramaniam, Gek Woo Tan, Michael E. Welge
…………………
http://www.sigkdd.org/explorations/issues/1-1-1999-06/survey.pdf
DM03 A SURVEY OF DATA MINING AND KNOWLEDGE DISCOVERY SOFTWARE TOOLS
Michael Goebel, Le Gruenwald
……………………….
DMMM larose
Data Mining Methods and Models
By Daniel T. Larose, Ph.D.
……………….
10Algorithms-08 http://www.cs.umd.…
Top 10 algorithms in data mining
Xindong Wu · Vipin Kumar · J. Ross Quinlan · Joydeep Ghosh · Qiang Yang · Hiroshi Motoda · Geoffrey J. McLachlan · Angus Ng · Bing Liu · Philip S. Yu · Zhi-Hua Zhou · Michael Steinbach · David J. Hand · Dan Steinberg
…………….
…1.1.27.9003
Data mining using ℳℒ𝒞++, a machine learning library in C++
Kohavi, R.; Sommerfield, D.; Dougherty, J.
Silicon Graphics Comput. Syst., Mountain View, CA, USA (weka)
This paper appears in: Tools with Artificial Intelligence, 1996., Proceedings Eighth IEEE International Conference on
Issue Date: 16-19 Nov. 1996
On page(s): 234 - 245
ISSN: 1082-3409
Print ISBN: 0-8186-7686-7
Cited by: 6
INSPEC Accession Number: 5437399
Digital Object Identifier: 10.…1996.560457
Date of Current Version: 6 August 2002

ABSTRACT
Data mining algorithms including machine learning, statistical analysis, and pattern recognition techniques can greatly improve our understanding of data warehouses that are now becoming more widespread. In this paper, we focus on classification algorithms and review the need for multiple classification algorithms. We describe a system called ℳℒ𝒞++ which was designed to help choose the appropriate classification algorithm for a given dataset by making it easy to compare the utility of different algorithms on a specific dataset of interest. ℳℒ𝒞++ not only provides a work-bench for such comparisons, but also provides a library of C++ classes to aid in the development of new algorithms, especially hybrid algorithms and multi-strategy algorithms. Such algorithms are generally hard to code from scratch. We discuss design issues, interfaces to other programs, and visualization of the resulting classifiers.
………………….
NnGPcomparison http://www.cpdee.ufmg.…pdf
A Comparison of Linear Genetic Programming and Neural Networks in Medical Data Mining
Markus Brameier and Wolfgang Banzhaf
This paper appears in: Evolutionary Computation, IEEE Transactions on
Issue Date: Feb 2001
Volume: 5, Issue: 1
On page(s): 17 - 26
ISSN: 1089-778X
References Cited: 29
Cited by: 70
INSPEC Accession Number: 6876106
Digital Object Identifier: 10.1109/4235.910462
Date of Current Version: 7 August 2002
Sponsored by: IEEE Computational Intelligence Society

ABSTRACT
We introduce a new form of linear genetic programming (GP). Two methods of acceleration of our GP approach are discussed: 1) an efficient algorithm that eliminates intron code and 2) a demetic approach to virtually parallelize the system on a single processor. Acceleration of runtime is especially important when operating with complex data sets, because they are occurring in real-world applications. We compare GP performance on medical classification problems from a benchmark database with results obtained by neural networks. Our results show that GP performs comparably in classification and generalization.
…………….
COMPUTER SCIENCE
DATABASE THEORY — ICDT '97
Lecture Notes in Computer Science, 1997, Volume 1186/1997, 41-55
DOI: 10.1007/3-540-62222-5_35
Methods and problems in data mining
Heikki Mannila

Abstract
Knowledge discovery in databases and data mining aim at semiautomatic tools for the analysis of large data sets. We consider some methods used in data mining, concentrating on levelwise search for all frequently occurring patterns. We show how this technique can be used in various applications. We also discuss possibilities for compiling data mining queries into algorithms, and look at the use of sampling in data mining. We conclude by listing several open research problems in data mining and knowledge discovery.

Part of this work was done while the author was visiting the Max Planck Institut für Informatik in Saarbrücken. Work supported by the Academy of Finland and by the Alexander von Humboldt Stiftung.
…………….
DM05 http://media.wiley.…
……………….
DM06 Using Neural Networks for Data Mining
Mark W Craven, Jude W Shavlik
http://www.…buu.…pdf
….sciencedirect.…
…………….
DM07 http://www.cs.iastate.…pdf
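
Note on the Mannila entry: its abstract singles out levelwise search for all frequently occurring patterns. As a rough illustration only, and not code taken from that paper, the idea can be sketched in Python for the special case of frequent itemsets: count candidates of size 1, keep those that meet a support threshold, build size-2 candidates only from the survivors, and so on. The function name, the `min_support` parameter and the toy baskets are assumptions made for this example.

```python
from itertools import combinations


def levelwise_frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset contained in at least
    min_support transactions, searching one itemset size at a time."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that appears anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Next level: join frequent sets of this level, then prune any
        # candidate that has an infrequent subset one item smaller.
        size += 1
        joined = {a | b for a in level for b in level if len(a | b) == size}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


# Hypothetical toy data: four shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = levelwise_frequent_itemsets(baskets, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The pruning step is what makes the search levelwise: a candidate is counted against the data only if all of its smaller subsets were already found frequent, so most of the pattern space is never examined.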
