
The key in business is to know something that nobody else knows.

Aristotle Onassis

To understand is to perceive patterns.

Sir Isaiah Berlin

Why Data Mining


From a managerial perspective:

- Analyzing trends
- Wealth generation
- Security
- Strategic decision making

What is Data Mining


What is data mining? Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both.

Figure labels: Data processing, Technique, Process, Opportunity, Recommendations

Key properties of data mining

- Automatic discovery of patterns
- Prediction of likely outcomes
- Creation of actionable information
- Focus on large data sets and databases

Data Mining Process

What can data mining do


Data mining can recognize significant facts, relationships, trends, patterns, exceptions and anomalies that might otherwise go unnoticed.

Data mining can discover patterns and relationships in the data in order to help make better business decisions. Data mining can help spot sales trends, develop smarter marketing campaigns, and accurately predict customer loyalty.

- Market segmentation: Identify the common characteristics of customers who buy the same products from your company.
- Customer churn: Predict which customers are likely to leave your company and go to a competitor.
- Fraud detection: Identify which transactions are most likely to be fraudulent.
- Direct marketing: Identify which prospects should be included in a mailing list to obtain the highest response rate.
- Interactive marketing: Predict what each individual accessing a Web site is most likely interested in seeing.
- Market basket analysis: Understand what products or services are commonly purchased together, e.g. beer and diapers.
- Trend analysis: Reveal the difference between a typical customer this month and last.
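
As a rough illustration of market basket analysis, the sketch below counts how often pairs of items appear in the same purchase. The transactions and the `pair_support` helper are hypothetical, made up for this example; real tools use more elaborate association-rule algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is the items bought in one purchase.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"diapers", "milk", "beer"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each item pair."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair has one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

support = pair_support(transactions)
best_pair = max(support, key=support.get)
print(best_pair, support[best_pair])
```

In this made-up data the most frequent pair is beer and diapers, which appears in three of the five baskets.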

How does data mining work

Data mining consists of five major elements:

- Extract, transform, and load transaction data onto the data warehouse system.
- Store and manage the data in a multidimensional database system.
- Provide data access to business analysts and/or information technology professionals.
- Analyze the data using application software.
- Present the data in a readable format.
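
The five elements above can be sketched end to end in a few lines. This is only a toy, assuming an in-memory SQLite database stands in for the warehouse; the CSV text, the `sales` table, and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational system.
RAW_CSV = """city,product,units
mumbai,Tea,3
 Mumbai ,Coffee,4
pune,Tea,16
"""

def extract(text):
    """Extract: parse raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize city names and convert units to integers."""
    return [(r["city"].strip().title(), r["product"], int(r["units"])) for r in rows]

def load(conn, rows):
    """Load: store the cleaned rows in a warehouse table."""
    conn.execute("CREATE TABLE sales (city TEXT, product TEXT, units INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# Analyze: total units per city. Present: print in a readable format.
units_by_city = dict(conn.execute("SELECT city, SUM(units) FROM sales GROUP BY city"))
for city, units in sorted(units_by_city.items()):
    print(f"{city}: {units} units")
```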

Stages of the data mining process:

1. Data gathering
2. Data cleaning
3. Feature extraction
4. Pattern extraction and discovery
5. Visualization of data
6. Evaluation of results

Data Mining Techniques

Data Mining
- Predictive
  - Classification
  - Regression
- Descriptive
  - Clustering
  - Association
  - Sequential analysis

Data Mining: A KDD Process

Data mining: the core of the knowledge discovery process.

Databases -> (Data Cleaning, Data Integration) -> Data Warehouse -> (Data Selection, Data Preprocessing) -> Task-relevant Data -> Data Mining -> Pattern Evaluation

Steps of a KDD Process

- Learning the application domain: relevant prior knowledge and goals of the application
- Creating a target data set: data selection
- Data cleaning and preprocessing (may take 60% of effort!)
- Data reduction and transformation: find useful features, dimensionality/variable reduction, invariant representation
- Choosing functions of data mining: summarization, classification, regression, association, clustering
- Choosing the mining algorithm(s)
- Data mining: search for patterns of interest
- Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc.
- Use of discovered knowledge

Data Mining and Business Intelligence

Increasing potential to support business decisions (layers listed from top to bottom, typical role in parentheses):

- Making Decisions (End User)
- Data Presentation: Visualization Techniques (Business Analyst)
- Data Mining: Information Discovery (Data Analyst)
- Data Exploration: Statistical Analysis, Querying and Reporting (Data Analyst)
- Data Warehouses / Data Marts: OLAP, MDA (DBA)
- Data Sources: Paper, Files, Information Providers, Database Systems, OLTP (DBA)

Data Mining - Different levels of analysis

- Artificial neural networks
- Genetic algorithms
- Decision trees
- Nearest neighbor method
- Rule induction
- Data visualization
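
To make one of these concrete, here is a minimal sketch of the nearest neighbor method: a new record is classified by majority vote among the k most similar known records. The customer data, the segment labels, and the `knn_predict` helper are hypothetical, and Euclidean distance is just one possible similarity measure.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # train is a list of (features, label) pairs; distance is Euclidean.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical customers: (monthly visits, average basket value) -> segment.
training = [
    ((1, 10.0), "occasional"),
    ((2, 12.0), "occasional"),
    ((1, 8.0), "occasional"),
    ((8, 55.0), "loyal"),
    ((9, 60.0), "loyal"),
    ((7, 50.0), "loyal"),
]

print(knn_predict(training, (8, 52.0)))
```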

What is a Data Warehouse?

Inmon's definition
A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process.

Subject-oriented
- A data warehouse is organized around subjects such as sales, product, and customer.
- It focuses on modeling and analysis of data for decision makers.
- It excludes data that is not useful in the decision support process.

Integration
- A data warehouse is constructed by integrating multiple heterogeneous sources.
- Data preprocessing is applied to ensure consistency.

RDBMS, Legacy System, Flat File -> Data Processing, Data Transformation -> Data Warehouse

Integration
In terms of data, integration covers:
- encoding structures
- measurement of attributes
- physical attributes of data
- remarks
- naming conventions
- data type format

Time-variant
- Provides information from a historical perspective, e.g. the past 5-10 years.
- Every key structure contains, either implicitly or explicitly, an element of time.

Nonvolatile
- Data once recorded cannot be updated.
- A data warehouse requires only two operations in data accessing: initial loading of data and access of data.

Data Warehouse Schema


- Star Schema
- Fact Constellation Schema
- Snowflake Schema

Star Schema
- A single, large, central fact table and one table for each dimension.
- Every fact points to one tuple in each of the dimensions and has additional attributes.
- Does not capture hierarchies directly.

Star Schema (contd..)


Fact Table: Store Key, Product Key, Period Key, Units, Price
Store Dimension: Store Key, Store Name, City, State, Region
Time Dimension: Period Key, Year, Quarter, Month
Product Dimension: Product Key, Product Desc

Benefits: Easy to understand, easy to define hierarchies, reduces the number of physical joins.
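
A star schema along these lines could be sketched with SQLite as follows. The table and column names mirror the slide's keys, but the exact names, data types, and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT, city TEXT, state TEXT, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE time_dim    (period_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT, month TEXT);
CREATE TABLE sales_fact  (
    store_key   INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    period_key  INTEGER REFERENCES time_dim(period_key),
    units INTEGER,
    price REAL
);
-- Hypothetical sample rows.
INSERT INTO store_dim VALUES (1, 'Central', 'Mumbai', 'Maharashtra', 'West'),
                             (2, 'East Side', 'Pune', 'Maharashtra', 'West');
INSERT INTO product_dim VALUES (10, 'Wheat Bread'), (11, 'White Bread');
INSERT INTO time_dim VALUES (100, 2004, 'Qtr1', 'January'), (101, 2004, 'Qtr1', 'March');
INSERT INTO sales_fact VALUES (1, 10, 100, 3, 2.5), (1, 11, 101, 5, 2.0), (2, 10, 101, 4, 2.5);
""")

# One join per dimension: every fact row points at one row in each dimension.
query = """
SELECT s.city, p.product_desc, SUM(f.units)
FROM sales_fact f
JOIN store_dim s   ON s.store_key = f.store_key
JOIN product_dim p ON p.product_key = f.product_key
GROUP BY s.city, p.product_desc
ORDER BY s.city, p.product_desc
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because city, state, and region all live in the single store dimension table, a report by region needs no more joins than a report by city.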

Snowflake Schema
- Variant of the star schema model.
- A single, large, central fact table and one or more tables for each dimension.
- Dimension tables are normalized, i.e. dimension table data is split into additional tables.

SnowFlake Schema (contd..)


Fact Table: Store Key, Product Key, Period Key, Units, Price
Store Dimension: Store Key, Store Name, City Key
City Dimension: City Key, City, State, Region
Time Dimension: Period Key, Year, Quarter, Month
Product Dimension: Product Key, Product Desc

Drawbacks: Time-consuming joins, slow report generation.
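
The extra join cost can be seen in a small sketch. Assuming the store dimension is normalized into a separate city dimension as in the layout above, a report by region has to pass through two dimension tables. Table names, types, and rows are hypothetical, and the fact table is trimmed to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city_dim  (city_key INTEGER PRIMARY KEY, city TEXT, state TEXT, region TEXT);
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, store_name TEXT,
                        city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact (store_key INTEGER REFERENCES store_dim(store_key),
                         units INTEGER, price REAL);
-- Hypothetical sample rows.
INSERT INTO city_dim VALUES (1, 'Mumbai', 'Maharashtra', 'West'),
                            (2, 'Chennai', 'Tamil Nadu', 'South');
INSERT INTO store_dim VALUES (1, 'Central', 1), (2, 'Harbour', 1), (3, 'Marina', 2);
INSERT INTO sales_fact VALUES (1, 3, 2.5), (2, 4, 2.5), (3, 6, 2.0);
""")

# Region is two hops from the fact table: fact -> store -> city.
rows = conn.execute("""
SELECT c.region, SUM(f.units)
FROM sales_fact f
JOIN store_dim s ON s.store_key = f.store_key
JOIN city_dim c  ON c.city_key = s.city_key
GROUP BY c.region
ORDER BY c.region
""").fetchall()
print(rows)
```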

Fact Constellation
- Multiple fact tables share dimension tables.
- This schema is viewed as a collection of stars, hence it is called a galaxy schema or fact constellation.
- Sophisticated applications require such a schema.

Fact Constellation (contd..)


Sales Fact Table: Store Key, Product Key, Period Key, Units, Price
Shipping Fact Table: Shipper Key, Store Key, Product Key, Period Key, Units, Price
Store Dimension: Store Key, Store Name, City, State, Region
Product Dimension: Product Key, Product Desc

Data Warehousing includes

- Building the data warehouse
- Online analytical processing (OLAP)
- Presentation

RDBMS, Flat File -> Cleaning, Selection & Integration -> Warehouse & OLAP server -> Presentation -> Client

Building Data Warehouse

- Data Selection
- Data Preprocessing
  - Fill missing values
  - Remove inconsistency
- Data Transformation & Integration
- Data Loading

Data in the warehouse is stored in the form of fact tables and dimension tables.
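
The two preprocessing steps named above might look like this in practice. This is a sketch only: filling gaps with the mean is one of several possible strategies, and the records, the `CITY_ALIASES` mapping, and the `preprocess` function are hypothetical.

```python
from statistics import mean

# Hypothetical selected records; None marks a missing value.
raw = [
    {"city": "Mumbai", "units": 3},
    {"city": "mumbai ", "units": None},
    {"city": "Bombay", "units": 5},
    {"city": "Pune", "units": 4},
]

# Assumed mapping of inconsistent spellings to one canonical name.
CITY_ALIASES = {"bombay": "Mumbai"}

def preprocess(records):
    """Fill missing units with the mean and normalize city names."""
    fill = mean(r["units"] for r in records if r["units"] is not None)
    cleaned = []
    for r in records:
        name = r["city"].strip()
        city = CITY_ALIASES.get(name.lower(), name.title())
        units = r["units"] if r["units"] is not None else fill
        cleaned.append({"city": city, "units": units})
    return cleaned

clean = preprocess(raw)
for row in clean:
    print(row)
```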

Sales Data Warehouse Model


Fact Table

Columns: City, Product, Month, Units, Rupees
Values shown: cities Mumbai and Pune; products Wheat Bread, Cheese, and Swiss Rolls; months January and February; units 3, 4, and 16; rupees 7.95, 7.32, and 42.40.

Sales Data Warehouse Model


Product Dimension Tables

Prod_ID  Product_Name     Product_Category_ID
589      Wheat Bread      1
590      White Bread      1
288      Coconut Cookies  2

Product_Category_Id  Product_Category
1                    Bread
2                    Cookies
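
The link between the two dimension tables is the category ID. A short sketch of resolving it, using the rows listed on the slide (the `names_by_category` helper is a hypothetical name):

```python
from collections import defaultdict

# Rows as listed in the two product dimension tables.
products = [
    (589, "Wheat Bread", 1),
    (590, "White Bread", 1),
    (288, "Coconut Cookies", 2),
]
categories = {1: "Bread", 2: "Cookies"}

def names_by_category(product_rows, category_lookup):
    """Join each product to its category name and group the product names."""
    grouped = defaultdict(list)
    for _prod_id, name, category_id in product_rows:
        grouped[category_lookup[category_id]].append(name)
    return dict(grouped)

result = names_by_category(products, categories)
print(result)
```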

Online Analytical Processing (OLAP)


It enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
OLAP Cube (built from the data warehouse; axes shown: Product, Time)

City    Product      Time   Units  Dollars
All     All          All    113    251.26
Mumbai  All          All    64     146.07
Mumbai  White Bread  All    38     98.49
Mumbai  Wheat Bread  All    13     32.24
Mumbai  Wheat Bread  Qtr1   3      7.44
Mumbai  Wheat Bread  March  3      7.44
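
The "All" entries in a cube are precomputed totals over a dimension. A minimal sketch of how such cells could be derived from fact rows is shown below; the fact rows and the `build_cube` function are hypothetical and the numbers are not the ones from the slide.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (city, product, month, units).
facts = [
    ("Mumbai", "Wheat Bread", "March", 3),
    ("Mumbai", "White Bread", "January", 5),
    ("Pune", "Wheat Bread", "March", 4),
]

def build_cube(rows):
    """Sum units for every combination of dimension values and 'All'."""
    cube = defaultdict(int)
    for city, prod, month, units in rows:
        # Each fact contributes to 2**3 cells: keep or roll up each dimension.
        for cell in product((city, "All"), (prod, "All"), (month, "All")):
            cube[cell] += units
    return dict(cube)

cube = build_cube(facts)
print(cube[("All", "All", "All")])             # grand total
print(cube[("Mumbai", "All", "All")])          # all Mumbai sales
print(cube[("Mumbai", "Wheat Bread", "All")])  # one product in one city
```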

OLAP Operations
Drill Down
Product Category (e.g. Electrical Appliance) -> Sub Category (e.g. Kitchen) -> Product (e.g. Toaster)
Axes shown: Product, Time

OLAP Operations
Drill Up
Product (e.g. Toaster) -> Sub Category (e.g. Kitchen) -> Product Category (e.g. Electrical Appliance)
Axes shown: Product, Time

OLAP Operations
Slice and Dice
Selection: Product = Toaster
Axes shown: Product, Time

OLAP Operations
Pivot
Axes shown: Product, Time, Region
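
The four operations can be mimicked on a small in-memory data set. This is a sketch under assumptions: the sales rows, the three-level product hierarchy, and the helper functions are hypothetical, and a real OLAP server would work on precomputed cubes rather than raw rows.

```python
from collections import defaultdict

# Hypothetical sales: (category, sub_category, product, quarter, units).
sales = [
    ("Electrical Appliance", "Kitchen", "Toaster", "Qtr1", 5),
    ("Electrical Appliance", "Kitchen", "Toaster", "Qtr2", 7),
    ("Electrical Appliance", "Kitchen", "Mixer", "Qtr1", 2),
    ("Electrical Appliance", "Laundry", "Iron", "Qtr1", 4),
]
LEVELS = {"category": 0, "sub_category": 1, "product": 2}

def aggregate(rows, level):
    """Total units per (value at the given hierarchy level, quarter)."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row[LEVELS[level]], row[3])] += row[4]
    return dict(totals)

def slice_rows(rows, product_name):
    """Slice: keep only the rows for one product."""
    return [r for r in rows if r[2] == product_name]

def pivot(table):
    """Pivot: swap the two axes of an aggregated table."""
    return {(col, row): value for (row, col), value in table.items()}

by_category = aggregate(sales, "category")   # drilled up to the top level
by_product = aggregate(sales, "product")     # drilled down to the lowest level
toaster_only = aggregate(slice_rows(sales, "Toaster"), "product")
by_quarter_first = pivot(toaster_only)
print(by_category)
print(toaster_only)
```

Drilling down and drilling up are the same aggregation run at different levels of the hierarchy, which is why a single `aggregate` function covers both here.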

OLAP Server
An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multi-dimensional data structures. Available OLAP servers are:
- MOLAP server
- ROLAP server
- HOLAP server

Presentation
Data cube (axes: Product, Time) -> Reporting Tool -> Report

Need for Data Warehousing

- Industry has a huge amount of operational data.
- Knowledge workers want to turn this data into useful information.
- This information is used by them to support strategic decision making.
- It is a platform for consolidated historical data for analysis.
- It stores data of good quality so that knowledge workers can make correct decisions.

Need for Data Warehousing (contd..)

From a business perspective:
- It is the latest marketing weapon.
- It helps to keep customers by learning more about their needs.
- It is a valuable tool in today's competitive, fast-evolving world.

Data Warehousing Tools

- Data Warehouse
  - SQL Server 2000 DTS
  - Oracle 8i Warehouse Builder
- OLAP tools
  - SQL Server Analysis Services
  - Oracle Express Server
- Reporting tools
  - MS Excel Pivot Chart
  - VB Applications

References
- Inmon, Building the Data Warehouse
- Han and Kamber, Data Mining: Concepts and Techniques
- www.dwinfocenter.org
- www.datawarehousingonline.com
- www.billinmon.com

Thank You
