
Chapter 1

Overview
Outline
• Brief description of data mining
• Data warehousing, data mining and database technology
• Online Transaction Processing and Online Analytical Processing
Database
• A collection of records.
• E.g., data collected, maintained and used in airline reservation.
• E.g., a personal address book kept in a Word document.
• A database is a model of the structure of reality.
• A database supports queries and updates that model processes of reality; i.e., the use of a database reflects the processes of reality.
• E.g., a bank database: each customer transaction, whether credit or debit, is recorded in the database.
Data Warehouse
• Refers to a database that is maintained separately from an operational database.
• It is a dedicated database system used mainly for decision making.
• It covers a much longer time horizon than a transactional system.
• Collects data from multiple databases that have been processed uniformly (clean data).
• E.g., a university warehouse (contains the student database and the staff database).
Data Mining
• Data mining refers to extracting or "mining" knowledge from large amounts of data (e.g., a data warehouse).
• Analogy: mining gold from rocks or sand.
• E.g., from a university warehouse we can extract information about staff salaries over a particular period of time, and we can also predict what they will be next year.
Difference between Operational DB and Data Warehouses
Operational DB (OLTP)
• The major task is to perform online transaction and query processing. These systems are called Online Transaction Processing (OLTP) systems.
• Customer (user) oriented; used for query processing and transactions by clients and IT professionals.
• Manages current data.
• Adopts an ER model and an application-oriented database design.
Data Warehouse (OLAP)
• The major task is to perform data analysis and decision making. These systems are known as Online Analytical Processing (OLAP) systems.
• Market (system) oriented; used for data analysis by knowledge workers, including managers, executives and analysts.
• Manages large amounts of historical data.
• Adopts either a star or a snowflake model and a subject-oriented database design.
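Architecture: Typical Data Mining System
[Figure: layered architecture, listed from top to bottom]
• Graphical User Interface
• Pattern Evaluation
• Data Mining Engine
• Knowledge Base (shown alongside Pattern Evaluation and the Data Mining Engine)
• Database or Data Warehouse Server
• Data cleaning, integration, and selection
• Data sources: Database, Data Warehouse, World-Wide Web, Other Info Repositories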
Data Mining: A KDD Process
– Data mining: the core of the knowledge discovery process.
[Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Steps of a KDD Process
1. Data cleaning (removes noise and fills in missing data).
2. Data integration (combines multiple data sources).
3. Data selection (the data relevant to the analysis task are retrieved from the database).
4. Data transformation (the data are transformed into a form appropriate for mining).
5. Data mining (the step where intelligent techniques are applied in order to extract interesting patterns).
6. Pattern evaluation (identifies the truly interesting patterns with the help of measures).
7. Knowledge presentation (visualization and knowledge representation are used to present the mined knowledge to the user).
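Steps 1 to 5 can be sketched in a few lines of Python. This is only an illustration under assumed data: the two source lists, the field names (id, dept, salary), the mean-fill rule for missing values and the summary used as the "mining" step are all hypothetical and do not come from the slides.

```python
from statistics import mean

# Hypothetical source records; None marks a missing salary.
staff_db = [
    {"id": 1, "dept": "CS", "salary": 50000},
    {"id": 2, "dept": "CS", "salary": None},
    {"id": 3, "dept": "Math", "salary": 40000},
]
payroll_db = [{"id": 4, "dept": "CS", "salary": 55000}]

def clean(records):
    """Step 1: fill missing salaries with the mean of the known ones (assumed rule)."""
    known = [r["salary"] for r in records if r["salary"] is not None]
    fill = mean(known)
    return [{**r, "salary": fill if r["salary"] is None else r["salary"]}
            for r in records]

def integrate(*sources):
    """Step 2: combine multiple data sources into one list."""
    return [r for src in sources for r in src]

def select(records, dept):
    """Step 3: keep only the records relevant to the analysis task."""
    return [r for r in records if r["dept"] == dept]

def transform(records):
    """Step 4: reduce each record to the form the mining step expects."""
    return [r["salary"] for r in records]

def mine(values):
    """Step 5: stand-in for a real mining technique; here a simple summary."""
    return {"count": len(values), "mean_salary": mean(values)}

integrated = integrate(clean(staff_db), clean(payroll_db))
pattern = mine(transform(select(integrated, "CS")))
print(pattern)
```

Steps 6 and 7 (pattern evaluation and knowledge presentation) are left out because they depend on the interestingness measures and the presentation tools chosen.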
Data Mining Taxonomy
• Predictive modeling techniques: Classification, Regression
• Descriptive modeling techniques: Clustering, Association
Predictive Modeling Techniques
• Predictive modeling: predicts the value of a particular attribute.
• Classification: the model predicts a class, i.e., an attribute that takes only a few values.
• E.g., a long-distance customer's likelihood of switching to a competitor, i.e., loyal vs. disloyal.
• Regression: the model predicts a number from a wide range of possible values.
• E.g., the revenue that a customer will generate during the next year.
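The difference between the two techniques can be seen in a small Python sketch. The customer data, the use of monthly calls as the only input, the least-squares line and the midpoint threshold rule are all assumptions made for illustration; the slides do not prescribe any particular algorithm.

```python
# Hypothetical history: (monthly_calls, revenue_next_year, switched_to_competitor)
customers = [
    (10, 120.0, True),
    (20, 220.0, True),
    (30, 320.0, False),
    (40, 420.0, False),
]
calls = [c[0] for c in customers]
revenue = [c[1] for c in customers]
switched = [c[2] for c in customers]

def fit_line(xs, ys):
    """Regression: least-squares fit of y = a + b*x; the output is a number."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def fit_threshold(xs, labels):
    """Classification: midpoint between the two class means (assumed rule)."""
    pos = [x for x, flag in zip(xs, labels) if flag]
    neg = [x for x, flag in zip(xs, labels) if not flag]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

a, b = fit_line(calls, revenue)
threshold = fit_threshold(calls, switched)

def predict_revenue(x):
    return a + b * x

def predict_class(x):
    # In this toy data, low-usage customers are the ones who switched.
    return "disloyal" if x < threshold else "loyal"

print(predict_revenue(50), predict_class(12))
```

The regression model returns a value from a continuous range, while the classifier returns one of only two labels.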
Descriptive Modeling Techniques
• Clustering (segmentation): lumps together similar things into groups called clusters.
• Helps to reduce data complexity.
• E.g., designing a different marketing plan for each of six targeted customer clusters.
• Association model: involves determining affinity, i.e., how frequently two or more things occur together.
• E.g., most frequently used in retail, where it is called Market Basket Analysis.
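Both descriptive techniques can be sketched with the Python standard library. The values, baskets and starting centers below are made up for illustration, and the one-dimensional k-means loop and the pair-support count are simple stand-ins chosen here, not methods named in the slides.

```python
from collections import Counter
from itertools import combinations

def kmeans_1d(values, centers, rounds=10):
    """Clustering: assign each value to its nearest center, then recompute centers."""
    groups = [[] for _ in centers]
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical customer spending values and assumed starting centers.
centers, groups = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])

# Association: how often do two items appear in the same basket?
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support of a pair = fraction of baskets that contain both items.
support = {pair: n / len(baskets) for pair, n in pair_counts.items()}

print(centers, groups)
print(support)
```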
Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithm, and Other Disciplines]
Steps in Data Mining Process
• problem definition
• data collection and enhancement
• modeling strategies
• training, validation, and testing of models
• analyzing results
• modeling iterations
• implementing results.
Data Collection and Enhancement
• Define data sources
• Join and denormalize data
• Enrich data (add some data)
• Transform data (some aggregation, etc.)
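The last three activities can be sketched in plain Python. The two tables, their field names and the 75.0 cutoff for the derived flag are hypothetical and serve only to show what joining, enriching and aggregating look like.

```python
from collections import defaultdict

# Hypothetical normalized sources: a customer table and a transaction table.
customers = {
    1: {"name": "Ann", "city": "Pune"},
    2: {"name": "Raj", "city": "Delhi"},
}
transactions = [
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 1, "amount": 50.0},
    {"customer_id": 2, "amount": 75.0},
]

# Join and denormalize: copy the customer attributes onto each transaction row.
rows = [{**t, **customers[t["customer_id"]]} for t in transactions]

# Enrich: add a derived field that was not in either source (assumed cutoff).
for r in rows:
    r["large"] = r["amount"] >= 75.0

# Transform: aggregate the transaction amounts per customer.
totals = defaultdict(float)
for r in rows:
    totals[r["name"]] += r["amount"]

print(rows[0])
print(dict(totals))
```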
Modeling strategies
• Data mining strategies fall into two broad
categories: supervised learning and
unsupervised learning.
Training, Validation, and Testing of Models
• Model development begins by partitioning the data into three sets: one used to train a model, another used to validate the model, and a third used to test the trained and validated model.
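A minimal sketch of such a three-way partition in Python follows. The 60/20/20 proportions, the fixed seed and the function name partition are assumptions for illustration; the slides do not specify split sizes.

```python
import random

def partition(rows, train=0.6, valid=0.2, seed=0):
    """Shuffle the rows, then cut them into train, validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is repeatable
    n_train = round(len(rows) * train)
    n_valid = round(len(rows) * valid)
    return (rows[:n_train],
            rows[n_train:n_train + n_valid],
            rows[n_train + n_valid:])

# 100 hypothetical record ids stand in for the real data set.
train_set, valid_set, test_set = partition(range(100))
print(len(train_set), len(valid_set), len(test_set))
```

Shuffling before cutting keeps any ordering in the source data from leaking into a single partition.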
Analyzing Results
• Model evaluation varies between supervised and unsupervised learning.
• For classification problems, analysts typically
review gain, lift and profit charts, threshold charts,
confusion matrices, and statistics of fit for the
training and validation sets, or for the test set.
• Clustering models can be evaluated for overall
model performance or for the quality of certain
groupings of data.
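Two of the classification measures mentioned above, the confusion matrix and lift, can be computed as in the following sketch. The labels are made up, and lift is taken here as the hit rate among predicted positives divided by the overall positive rate, which is one common definition and an assumption about what the slides intend.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["tp"] += 1
        elif p:
            counts["fp"] += 1
        elif a:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

def lift(actual, predicted):
    """Hit rate among predicted positives divided by the overall positive rate."""
    cm = confusion_matrix(actual, predicted)
    hit_rate = cm["tp"] / (cm["tp"] + cm["fp"])
    base_rate = sum(actual) / len(actual)
    return hit_rate / base_rate

# Hypothetical validation-set labels: 1 = positive class, 0 = negative class.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

cm = confusion_matrix(actual, predicted)
accuracy = (cm["tp"] + cm["tn"]) / len(actual)
model_lift = lift(actual, predicted)
print(cm, accuracy, model_lift)
```

A lift above 1 means the cases the model flags contain positives more often than the data set as a whole.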
Case Study: Public Sector Health Care Industry
• Problem definition:
Aim: to analyze instances of fraud in the public sector health care industry.
Objective: to determine, through predictive modeling, which attributes characterize fraudulent claims.
Quiz
1) Discuss the difference between operational DB and data warehouse systems.
