CITS4243: Advanced Databases



Lecturer: Amitava Datta
Email: datta@csse.uwa.edu.au
Room 1.07, Computer Science Building
Office hours: Anytime, 10-11 am Monday

Unit web page: http://undergraduate.csse.uwa.edu.au/units/CITS4243

March 16, 2011 Data Mining: Concepts and Techniques


CITS4243: Assessments

- Project and/or paper: 50%
- Final Examination: 50%

The project/paper will consist of two parts:

- An analysis of a business scenario through an OLAP or data mining tool
- Writing a small paper that will explain clearly the material in a research paper that I will assign to you.


CITS4243: Textbook

- Data Mining: Concepts and Techniques, 2nd ed., Jiawei Han and Micheline Kamber
- The book is available through the Google Books preview, but with missing pages.
- I will use the lecture slides that accompany this book. The publisher has given me permission to do so on the condition that I keep the slides only on a password-protected site.
- I will not record the lectures.

CITS4243: Advanced Databases 

- Your password for accessing lecture notes will be: the first three digits of your student number, followed by your last name (surname), followed by the rest of your student number.
- The site for lecture slides will be ready by early next week (there is a link to this site from the unit page).
- I will keep both the original slides and my modified slides.
- All original slides are available for download from Jiawei Han's web page.

CITS4243: Laboratories 

- We have been allocated only one lab session and we have 80+ students.
- You have to do the labs on your own.
- We will use two open source software packages:
  1. Palo for OLAP
  2. Weka for data mining
- Both can be downloaded freely.
- Labs will start in the fourth or fifth week.
- The lab session will be used more as a question/answer session.

Chapter 1. Introduction

- Motivation: Why data mining?
- What is data mining?
- Data Mining: On what kind of data?
- Data mining functionality
- Classification of data mining systems
- Major issues in data mining

Why Data Mining?

- The Explosive Growth of Data: from terabytes to petabytes
  - Data collection and data availability
    - Automated data collection tools, database systems, Web, computerized society
  - Major sources of abundant data
    - Business: Web, e-commerce, transactions, stocks
    - Science: Remote sensing, bioinformatics, scientific simulation
    - Society and everyone: news, digital cameras, YouTube
- We are drowning in data, but starving for knowledge!
- "Necessity is the mother of invention": Data mining, the automated analysis of massive data sets

Evolution of Sciences

- Before 1600, empirical science
- 1600-1950s, theoretical science
  - Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding.
- 1950s-1990s, computational science
  - Over the last 50 years, most disciplines have grown a third, computational branch (e.g. empirical, theoretical, and computational ecology, or physics, or linguistics.)
  - Computational Science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.
- 1990-now, data science
  - The flood of data from new scientific instruments and simulations
  - The ability to economically store and manage petabytes of data online
  - The Internet and computing Grid that makes all these archives universally accessible
  - Scientific info. management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes. Data mining is a major new challenge!
- Jim Gray and Alex Szalay, The World Wide Telescope: An Archetype for Online Science, Comm. ACM, 45(11): 50-54, Nov. 2002

Evolution of Database Technology

- 1960s:
  - Data collection, database creation, IMS and network DBMS
- 1970s:
  - Relational data model, relational DBMS implementation
- 1980s:
  - RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
  - Application-oriented DBMS (spatial, scientific, engineering, etc.)
- 1990s:
  - Data mining, data warehousing, multimedia databases, and Web databases
- 2000s:
  - Stream data management and mining
  - Data mining and its applications
  - Web technology (XML, data integration) and global information systems

What Is Data Mining?

- Data mining (knowledge discovery from data)
  - Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
  - Data mining: a misnomer?
- Alternative names
  - Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
- Watch out: Is everything "data mining"?
  - Simple search and query processing
  - (Deductive) expert systems

Knowledge Discovery (KDD) Process

- Data mining: core of knowledge discovery process

[Figure: the KDD process, from bottom to top: Databases; Data Cleaning and Data Integration; Data Warehouse; Selection; Task-relevant Data; Data Mining; Pattern Evaluation]

Data Mining and Business Intelligence

[Figure: layered pyramid; the potential to support business decisions increases toward the top]

Layers, top to bottom:
- Decision Making
- Data Presentation: Visualization Techniques
- Data Mining: Information Discovery
- Data Exploration: Statistical Summary, Querying, and Reporting
- Data Preprocessing/Integration, Data Warehouses
- Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems

Roles, top to bottom: End User, Business Analyst, Data Analyst, DBA

Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Pattern Recognition, Algorithm, Visualization, and Other Disciplines]

Why Not Traditional Data Analysis?

- Tremendous amount of data
  - Algorithms must be highly scalable to handle data volumes such as terabytes
- High-dimensionality of data
  - Micro-array may have tens of thousands of dimensions
- High complexity of data
  - Data streams and sensor data
  - Time-series data, temporal data, sequence data
  - Structure data, graphs, social networks and multi-linked data
  - Heterogeneous databases and legacy databases
  - Spatial, spatiotemporal, multimedia, text and Web data
  - Software programs, scientific simulations
- New and sophisticated applications

Multi-Dimensional View of Data Mining

- Data to be mined
  - Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW
- Knowledge to be mined
  - Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
  - Multiple/integrated functions and mining at multiple levels
- Techniques utilized
  - Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc.
- Applications adapted
  - Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.

Data Mining: Classification Schemes

- General functionality
  - Descriptive data mining
  - Predictive data mining
- Different views lead to different classifications
  - Data view: Kinds of data to be mined
  - Knowledge view: Kinds of knowledge to be discovered
  - Method view: Kinds of techniques utilized
  - Application view: Kinds of applications adapted

Data Mining: On What Kinds of Data?

- Database-oriented data sets and applications
  - Relational database, data warehouse, transactional database
- Advanced data sets and advanced applications
  - Data streams and sensor data
  - Time-series data, temporal data, sequence data (incl. bio-sequences)
  - Structure data, graphs, social networks and multi-linked data
  - Object-relational databases
  - Heterogeneous databases and legacy databases
  - Spatial data and spatiotemporal data
  - Multimedia database
  - Text databases
  - The World-Wide Web

Data Mining Functionalities

- Multidimensional concept description: Characterization and discrimination
  - Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
- Frequent patterns, association, correlation vs. causality
  - Diaper -> Beer [0.5%, 75%] (Correlation or causality?)
- Classification and prediction
  - Construct models (functions) that describe and distinguish classes or concepts for future prediction
    - E.g., classify countries based on (climate), or classify cars based on (gas mileage)
  - Predict some unknown or missing numerical values
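
The bracketed pair on an association rule such as Diaper -> Beer [0.5%, 75%] is conventionally read as [support, confidence]. The following is a minimal Python sketch of how the two numbers could be computed. The transactions are invented for illustration, and `support` and `confidence` are hypothetical helper names, not part of any tool used in this unit.

```python
# Toy market-basket data (invented for illustration only).
transactions = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"milk", "bread"},
    {"beer", "chips"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A and B) / support(A)."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

rule_support = support({"diaper", "beer"}, transactions)
rule_confidence = confidence({"diaper"}, {"beer"}, transactions)
print(f"Diaper -> Beer [support={rule_support:.0%}, confidence={rule_confidence:.0%}]")
```

A high confidence only says that the two items co-occur often; by itself it does not settle the "correlation or causality?" question raised on the slide.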

Data Mining Functionalities (2)

- Cluster analysis
  - Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns
  - Maximizing intra-class similarity & minimizing interclass similarity
- Outlier analysis
  - Outlier: Data object that does not comply with the general behavior of the data
  - Noise or exception? Useful in fraud detection, rare events analysis
- Trend and evolution analysis
  - Trend and deviation: e.g., regression analysis
  - Sequential pattern mining: e.g., digital camera -> large SD memory
  - Periodicity analysis
  - Similarity-based analysis
- Other pattern-directed or statistical analyses
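
To make "a data object that does not comply with the general behavior of the data" concrete, here is a minimal Python sketch that flags values far from the mean. The z-score test and the threshold of 2.0 are assumptions chosen for illustration; the slides do not prescribe a particular outlier method, and `zscore_outliers` is a hypothetical name.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds
    `threshold` population standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Invented transaction amounts; one is far larger than the rest.
amounts = [20, 22, 19, 21, 23, 20, 18, 500]
print(zscore_outliers(amounts))
```

Whether a flagged value is noise or an interesting exception (for example, a fraudulent transaction) still has to be decided by the analyst.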

Major Issues in Data Mining

- Mining methodology
  - Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web
  - Performance: efficiency, effectiveness, and scalability
  - Pattern evaluation: the interestingness problem
  - Incorporation of background knowledge
  - Handling noise and incomplete data
  - Parallel, distributed and incremental mining methods
  - Integration of the discovered knowledge with existing one: knowledge fusion
- User interaction
  - Data mining query languages and ad-hoc mining
  - Expression and visualization of data mining results
  - Interactive mining of knowledge at multiple levels of abstraction
- Applications and social impacts
  - Domain-specific data mining & invisible data mining
  - Protection of data security, integrity, and privacy

A Brief History of Data Mining Society

- 1989 IJCAI Workshop on Knowledge Discovery in Databases
  - Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
- 1991-1994 Workshops on Knowledge Discovery in Databases
  - Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
- 1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD'95-98)
  - Journal of Data Mining and Knowledge Discovery (1997)
- ACM SIGKDD conferences since 1998 and SIGKDD Explorations
- More conferences on data mining
  - PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), etc.
- ACM Transactions on KDD starting in 2007

Conferences and Journals on Data Mining

- KDD Conferences
  - ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
  - SIAM Data Mining Conf. (SDM)
  - (IEEE) Int. Conf. on Data Mining (ICDM)
  - Conf. on Principles and practices of Knowledge Discovery and Data Mining (PKDD)
  - Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
- Other related conferences
  - ACM SIGMOD
  - VLDB
  - (IEEE) ICDE
  - WWW, SIGIR
  - ICML, CVPR, NIPS
- Journals
  - Data Mining and Knowledge Discovery (DAMI or DMKD)
  - IEEE Trans. on Knowledge and Data Eng. (TKDE)
  - KDD Explorations
  - ACM Trans. on KDD

Where to Find References? DBLP, CiteSeer, Google

- Data mining and KDD (SIGKDD: CDROM)
  - Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
  - Journal: Data Mining and Knowledge Discovery, KDD Explorations, ACM TKDD
- Database systems (SIGMOD: ACM SIGMOD Anthology CD ROM)
  - Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
  - Journals: IEEE-TKDE, ACM-TODS/TOIS, JIIS, J. ACM, VLDB J., Info. Sys., etc.
- AI & Machine Learning
  - Conferences: Machine learning (ML), AAAI, IJCAI, COLT (Learning Theory), CVPR, NIPS, etc.
  - Journals: Machine Learning, Artificial Intelligence, Knowledge and Information Systems, IEEE-PAMI, etc.
- Web and IR
  - Conferences: SIGIR, WWW, CIKM, etc.
  - Journals: WWW: Internet and Web Information Systems
- Statistics
  - Conferences: Joint Stat. Meeting, etc.
  - Journals: Annals of statistics, etc.
- Visualization
  - Conference proceedings: CHI, ACM-SIGGraph, etc.
  - Journals: IEEE Trans. visualization and computer graphics, etc.

Recommended Reference Books

- S. Chakrabarti. Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data. Morgan Kaufmann, 2002
- R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd ed. Wiley-Interscience, 2000
- T. Dasu and T. Johnson. Exploratory Data Mining and Data Cleaning. John Wiley & Sons, 2003
- U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996
- U. Fayyad, G. Grinstein, and A. Wierse. Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, 2001
- J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2nd ed. Morgan Kaufmann, 2006
- D. J. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press, 2001
- T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2001
- B. Liu. Web Data Mining. Springer, 2006
- T. M. Mitchell. Machine Learning. McGraw Hill, 1997
- G. Piatetsky-Shapiro and W. J. Frawley. Knowledge Discovery in Databases. AAAI/MIT Press, 1991
- P.-N. Tan, M. Steinbach and V. Kumar. Introduction to Data Mining. Wiley, 2005
- S. M. Weiss and N. Indurkhya. Predictive Data Mining. Morgan Kaufmann, 1998
- I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2nd ed. Morgan Kaufmann, 2005

Summary

- Data mining: Discovering interesting patterns from large amounts of data
- A natural evolution of database technology, in great demand, with wide applications
- A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
- Mining can be performed in a variety of information repositories
- Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
- Data mining systems and architectures
- Major issues in data mining

Why Data Mining? Potential Applications

- Data analysis and decision support
  - Market analysis and management
    - Target marketing, customer relationship management (CRM), market basket analysis, cross selling, market segmentation
  - Risk analysis and management
    - Forecasting, customer retention, improved underwriting, quality control, competitive analysis
  - Fraud detection and detection of unusual patterns (outliers)
- Other Applications
  - Text mining (news group, email, documents) and Web mining
  - Stream data mining
  - Bioinformatics and bio-data analysis

Relational Databases

- A relational database is a collection of tables, each of which is assigned a unique name.
- Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows).
- Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values.

Relational Databases

- A semantic data model, such as the entity-relationship data model, is often constructed for relational databases.
- An ER data model represents the database as a set of entities and their relationships.

An example

- The relational database of the AllElectronics company has four relational tables: customer, item, employee and branch.
- The relation customer consists of a set of attributes, including a unique customer_ID, customer name, address, age, occupation, annual income, etc.
- Similarly, the other three relations have their attributes.

Relational Databases

- Relational data can be accessed by database queries written in a relational language such as SQL.
- A given query is transformed into a set of relational operations such as join, selection and projection, and is then optimized for efficient processing.
- Efficiency of retrieval, efficiency of update and integrity are the key requirements of a good relational database.

Examples of queries

- Show me a list of all items that were sold in the last quarter
- Show me the total sales of the last month, grouped by branch
- Which sales person has the highest amount of sales?
- How many sales transactions occurred in the month of September?
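
Queries of this kind can be written directly in SQL. The sketch below runs three of them against an in-memory SQLite database from Python. The `sales` table, its columns, and the sample rows are assumptions made for illustration; the slides name only the customer, item, employee and branch tables of AllElectronics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per sales transaction.
conn.execute(
    "CREATE TABLE sales (branch TEXT, salesperson TEXT, amount REAL, sale_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Perth", "Ann", 120.0, "2010-09-03"),
        ("Perth", "Bob", 80.0, "2010-09-17"),
        ("Sydney", "Cho", 200.0, "2010-09-21"),
        ("Sydney", "Cho", 50.0, "2010-10-02"),
    ],
)

# Total sales for one month, grouped by branch.
sales_by_branch = conn.execute(
    "SELECT branch, SUM(amount) FROM sales "
    "WHERE sale_date BETWEEN '2010-09-01' AND '2010-09-30' "
    "GROUP BY branch ORDER BY branch"
).fetchall()

# Salesperson with the highest total amount of sales.
top_seller = conn.execute(
    "SELECT salesperson, SUM(amount) AS total FROM sales "
    "GROUP BY salesperson ORDER BY total DESC LIMIT 1"
).fetchone()

# Number of sales transactions that occurred in September.
september_count = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE strftime('%m', sale_date) = '09'"
).fetchone()[0]

print(sales_by_branch, top_seller, september_count)
```

Each query returns exactly what is stored or a simple aggregate of it; none of them says anything about why sales look the way they do.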

Purpose of relational databases

- The main purpose of a relational database is to store data correctly and retrieve data on demand.
- This type of data processing is sometimes called Online Transaction Processing (OLTP).
- Relational databases are passive data repositories in the sense that a query only shows you what is stored in the database, but cannot tell you much about the meaning or trend of the data.

Aim of this unit

- The aim of this unit is to discuss techniques that give insight into data stored in a database.
- We want to analyze trends, correlations, associations, classifications and derivation of knowledge from data.
- We will assume that the data come from an underlying database that (perhaps) uses relational database technology.
- We will discuss data warehousing, Online Analytical Processing (OLAP), machine learning, clustering and other data mining techniques.

Ex. 1: Market Analysis and Management

- Where does the data come from? Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
- Target marketing
  - Find clusters of "model" customers who share the same characteristics: interest, income level, spending habits, etc.
  - Determine customer purchasing patterns over time
- Cross-market analysis
  - Find associations/co-relations between product sales, & predict based on such association
- Customer profiling
  - What types of customers buy what products (clustering or classification)
- Customer requirement analysis
  - Identify the best products for different groups of customers
  - Predict what factors will attract new customers
- Provision of summary information
  - Multidimensional summary reports
  - Statistical summary information (data central tendency and variation)

Ex. 2: Corporate Analysis & Risk Management

- Finance planning and asset evaluation
  - cash flow analysis and prediction
  - contingent claim analysis to evaluate assets
  - cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
- Resource planning
  - summarize and compare the resources and spending
- Competition
  - monitor competitors and market directions
  - group customers into classes and a class-based pricing procedure
  - set pricing strategy in a highly competitive market

Ex. 3: Fraud Detection & Mining Unusual Patterns

- Approaches: Clustering & model construction for frauds, outlier analysis
- Applications: Health care, retail, credit card service, telecomm.
  - Money laundering: suspicious monetary transactions
  - Medical insurance
    - Professional patients, ring of doctors, and ring of references
    - Unnecessary or correlated screening tests
  - Telecommunications: phone-call fraud
    - Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
  - Retail industry
    - Analysts estimate that 38% of retail shrink is due to dishonest employees
  - Anti-terrorism

KDD Process: Several Key Steps

- Learning the application domain
  - relevant prior knowledge and goals of application
- Creating a target data set: data selection
- Data cleaning and preprocessing: (may take 60% of effort!)
- Data reduction and transformation
  - Find useful features, dimensionality/variable reduction, invariant representation
- Choosing functions of data mining
  - summarization, classification, regression, association, clustering
- Choosing the mining algorithm(s)
- Data mining: search for patterns of interest
- Pattern evaluation and knowledge presentation
  - visualization, transformation, removing redundant patterns, etc.
- Use of discovered knowledge
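
The steps can be read as a pipeline in which each stage feeds the next. The following Python sketch chains toy versions of selection, cleaning, transformation, mining and pattern evaluation. All function names, the sample records, the age cutoff of 40 and the `min_count` threshold are assumptions for illustration, not a description of any particular tool.

```python
from collections import Counter

# Invented source records; one has a missing value.
raw_records = [
    {"customer": "c1", "age": 25, "item": "camera"},
    {"customer": "c2", "age": None, "item": "camera"},
    {"customer": "c3", "age": 47, "item": "printer"},
    {"customer": "c4", "age": 31, "item": "camera"},
    {"customer": "c5", "age": 52, "item": "printer"},
]

def select(records):
    """Data selection: keep only the task-relevant attributes."""
    return [{"age": r["age"], "item": r["item"]} for r in records]

def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Transformation: replace the raw age with a coarser age group."""
    return [("young" if r["age"] < 40 else "middle_aged", r["item"]) for r in records]

def mine(pairs):
    """Data mining: count how often each (age group, item) pattern occurs."""
    return Counter(pairs)

def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep patterns seen at least `min_count` times."""
    return {p: n for p, n in patterns.items() if n >= min_count}

patterns = evaluate(mine(transform(clean(select(raw_records)))))
print(patterns)
```

In this toy version the mining step is a single line; most of the code is selection, cleaning and transformation, which is in line with the slide's remark that preprocessing can take most of the effort.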

Are All the Discovered Patterns Interesting?

- Data mining may generate thousands of patterns: Not all of them are interesting
  - Suggested approach: Human-centered, query-based, focused mining
- Interestingness measures
  - A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm
- Objective vs. subjective interestingness measures
  - Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  - Subjective: based on user's belief in the data, e.g., unexpectedness, novelty, actionability, etc.

classification vs. 2011 39 . exhaustive search Association vs. clustering Can a data mining system find only the interesting patterns? Approaches     Search for only interesting patterns: An optimization problem   First general all the patterns and then filter out the uninteresting ones Generate only the interesting patterns mining query optimization Data Mining: Concepts and Techniques  March 16.Find All and Only Interesting Patterns?  Find all the interesting patterns: Completeness  Can a data mining system find all the interesting patterns? Do we need to find all of the interesting patterns? Heuristic vs.

Other Pattern Mining Issues

- Precise patterns vs. approximate patterns
  - Association and correlation mining: it is possible to find sets of precise patterns
    - But approximate patterns can be more compact and sufficient
    - How to find high quality approximate patterns?
  - Gene sequence mining: approximate patterns are inherent
    - How to derive efficient approximate pattern mining algorithms?
- Constrained vs. non-constrained patterns
  - Why constraint-based mining?
  - What are the possible kinds of constraints? How to push constraints into the mining process?

Why Data Mining Query Language?

- Automated vs. query-driven?
  - Finding all the patterns autonomously in a database? Unrealistic because the patterns could be too many but uninteresting
- Data mining should be an interactive process
  - User directs what to be mined
- Users must be provided with a set of primitives to be used to communicate with the data mining system
- Incorporating these primitives in a data mining query language
  - More flexible user interaction
  - Foundation for design of graphical user interface
  - Standardization of data mining industry and practice

Primitives that Define a Data Mining Task

- Task-relevant data
  - Database or data warehouse name
  - Database tables or data warehouse cubes
  - Condition for data selection
  - Relevant attributes or dimensions
  - Data grouping criteria
- Type of knowledge to be mined
  - Characterization, discrimination, association, classification, prediction, clustering, outlier analysis, other data mining tasks
- Background knowledge
- Pattern interestingness measurements
- Visualization/presentation of discovered patterns

Primitive 3: Background Knowledge

- A typical kind of background knowledge: Concept hierarchies
- Schema hierarchy
  - E.g., street < city < province_or_state < country
- Set-grouping hierarchy
  - E.g., {20-39} = young, {40-59} = middle_aged
- Operation-derived hierarchy
  - email address: xyz01@csse.uwa.edu.au
  - login-name < department < university < country
- Rule-based hierarchy
  - low_profit_margin (X) <= price(X, P1) and cost (X, P2) and (P1 - P2) < $50
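
Three of these hierarchy kinds can be written as small functions. The sketch below is a hedged Python illustration: the function names are hypothetical, the "other" label for ages outside the two listed ranges is an assumption, and the rule-based example assumes that the rule compares the difference between price P1 and cost P2 with $50.

```python
def age_group(age):
    """Set-grouping hierarchy: {20-39} = young, {40-59} = middle_aged."""
    if 20 <= age <= 39:
        return "young"
    if 40 <= age <= 59:
        return "middle_aged"
    return "other"  # assumption: the remaining ranges are not named

def email_hierarchy(address):
    """Operation-derived hierarchy:
    login-name < department < university < country.
    Assumes a four-part domain such as csse.uwa.edu.au."""
    login, domain = address.split("@")
    department, university, _sector, country = domain.split(".")
    return [login, department, university, country]

def low_profit_margin(price, cost):
    """Rule-based hierarchy: an item has a low profit margin
    when (price - cost) is less than $50."""
    return (price - cost) < 50

print(age_group(25), email_hierarchy("xyz01@csse.uwa.edu.au"), low_profit_margin(120, 90))
```

A schema hierarchy such as street < city < province_or_state < country needs no computation at all; it is just an ordering of attributes that already exist in the table.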

Primitive 4: Pattern Interestingness Measure

- Simplicity
  - e.g., (association) rule length, (decision) tree size
- Certainty
  - e.g., confidence, P(A|B) = #(A and B) / #(B), classification reliability or accuracy, certainty factor, rule strength, rule quality, discriminating weight, etc.
- Utility
  - potential usefulness, e.g., support (association), noise threshold (description)
- Novelty
  - not previously known, surprising (used to remove redundant rules)

Primitive 5: Presentation of Discovered Patterns

- Different backgrounds/usages may require different forms of representation
  - E.g., rules, tables, crosstabs, pie/bar chart, etc.
- Concept hierarchy is also important
  - Discovered knowledge might be more understandable when represented at high level of abstraction
  - Interactive drill up/down, pivoting, slicing and dicing provide different perspectives to data
- Different kinds of knowledge require different representation: association, classification, clustering, etc.
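
As a small illustration of presenting the same data at two levels of abstraction, the Python sketch below builds a crosstab of quantities by place and item, then drills up from city to country through a concept hierarchy. The sales rows, the city-to-country mapping and the function names are invented for this example.

```python
from collections import defaultdict

# Invented facts: (city, item, quantity sold).
sales = [
    ("Perth", "camera", 3),
    ("Sydney", "camera", 2),
    ("Perth", "printer", 1),
    ("Auckland", "camera", 4),
]
# Concept hierarchy: city < country.
city_to_country = {
    "Perth": "Australia",
    "Sydney": "Australia",
    "Auckland": "New Zealand",
}

def crosstab(rows):
    """Build a place-by-item table of summed quantities."""
    table = defaultdict(lambda: defaultdict(int))
    for place, item, qty in rows:
        table[place][item] += qty
    return {place: dict(items) for place, items in table.items()}

def roll_up(rows, hierarchy):
    """Drill up: replace each place by its parent in the hierarchy."""
    return [(hierarchy[place], item, qty) for place, item, qty in rows]

by_city = crosstab(sales)
by_country = crosstab(roll_up(sales, city_to_country))
print(by_city)
print(by_country)
```

The country-level table is smaller and easier to read, at the price of hiding the differences between cities; drilling down reverses the step.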

DMQL: A Data Mining Query Language

- Motivation
  - A DMQL can provide the ability to support ad-hoc and interactive data mining
  - By providing a standardized language like SQL
    - Hope to achieve an effect similar to the one SQL has had on relational databases
    - Foundation for system development and evolution
    - Facilitate information exchange, technology transfer, commercialization and wide acceptance
- Design
  - DMQL is designed with the primitives described earlier

An Example Query in DMQL

[Figure: example DMQL query; the query text did not survive extraction]

Other Data Mining Languages & Standardization Efforts

- Association rule language specifications
  - MSQL (Imielinski & Virmani '99)
  - MineRule (Meo, Psaila and Ceri '96)
  - Query flocks based on Datalog syntax (Tsur et al. '98)
- OLEDB for DM (Microsoft 2000) and recently DMX (Microsoft SQLServer 2005)
  - Based on OLE, OLE DB, OLE DB for OLAP, C#
  - Integrating DBMS, data warehouse and data mining
- DMML (Data Mining Mark-up Language) by DMG (www.dmg.org)
  - Providing a platform and process structure for effective data mining
  - Emphasizing on deploying data mining technology to solve business problems

Integration of Data Mining and Data Warehousing

- Data mining systems, DBMS, Data warehouse systems coupling
  - No coupling, loose-coupling, semi-tight-coupling, tight-coupling
- On-line analytical mining data
  - integration of mining and OLAP technologies
- Interactive mining multi-level knowledge
  - Necessity of mining knowledge and patterns at different levels of abstraction by drilling/rolling, pivoting, slicing/dicing, etc.
- Integration of multiple mining functions
  - Characterized classification, first clustering and then association

Coupling Data Mining with DB/DW Systems

- No coupling: flat file processing, not recommended
- Loose coupling
  - Fetching data from DB/DW
- Semi-tight coupling: enhanced DM performance
  - Provide efficient implementations of a few data mining primitives in a DB/DW system, e.g., sorting, indexing, aggregation, histogram analysis, multiway join, precomputation of some stat functions
- Tight coupling: A uniform information processing environment
  - DM is smoothly integrated into a DB/DW system, mining query is optimized based on mining query, indexing, query processing methods, etc.
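
The difference between fetching data and pushing work into the database can be sketched with SQLite from Python. In the first variant the program fetches raw rows and counts them itself, as a loosely coupled miner would; in the second the aggregation runs inside the database. The `purchases` table and its rows are invented, and an ordinary `GROUP BY` is used here only as a stand-in for an aggregation primitive, not as an example of a real in-database mining operator.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("c1", "camera"), ("c2", "camera"), ("c2", "printer"), ("c3", "camera")],
)

# Loose coupling: fetch the raw rows, then count outside the database.
rows = conn.execute("SELECT item FROM purchases").fetchall()
loose_counts = dict(Counter(item for (item,) in rows))

# Aggregation pushed into the database: only the summary leaves the DB.
in_db_counts = dict(
    conn.execute("SELECT item, COUNT(*) FROM purchases GROUP BY item").fetchall()
)

print(loose_counts, in_db_counts)
```

Both variants give the same counts; what changes is how much data crosses the boundary between the database and the mining program, which matters once the table is large.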

Architecture: Typical Data Mining System

[Figure: layered architecture, top to bottom: Graphical User Interface; Pattern Evaluation; Data Mining Engine; Database or Data Warehouse Server; data cleaning, integration, and selection; data repositories (Database, Data Warehouse, World-Wide Web, Other Info Repositories). A Knowledge Base sits alongside the upper layers.]
