Chapter 1 DM
Overview
Outline
• Brief description of data mining
• Data warehousing, data mining and database
technology
• Online Transaction Processing (OLTP) and Online
Analytical Processing (OLAP)
Database
• A database is a collection of records.
• E.g.: the data collected, maintained and used in an airline
reservation system.
• E.g.: a personal address book kept in a Word document.
• A database is a model of the structure of reality.
• A database supports queries and updates that model
processes of reality, i.e. the use of a database reflects
the processes of reality.
• E.g.: in a bank database, each customer transaction (credit
or debit) is recorded as an update to the database.
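The bank example above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the `account` table, its columns, the sample customer and the helper names `post_transaction` and `get_balance` are assumptions made for the sketch, not part of the original slides.

```python
import sqlite3

# Hypothetical schema: a single table of accounts in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)"
)
conn.execute("INSERT INTO account (id, owner, balance) VALUES (1, 'Asha', 1000.0)")
conn.commit()


def post_transaction(conn, account_id, amount):
    """Update: apply a credit (positive amount) or a debit (negative amount)."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?",
            (amount, account_id),
        )


def get_balance(conn, account_id):
    """Query: read the current balance of one account."""
    row = conn.execute(
        "SELECT balance FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]


post_transaction(conn, 1, 250.0)   # credit
post_transaction(conn, 1, -100.0)  # debit
balance = get_balance(conn, 1)
print(balance)
```

Each real-world event (a credit or a debit) becomes one update, and the state of the database after the updates mirrors the state of the account in reality.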
Data Warehouse
• A data warehouse is a database that is maintained separately
from the operational database.
• It is a dedicated database system used mainly
for decision making.
• It covers a much longer time horizon than a
transactional system.
• It collects data from multiple databases, processed
uniformly so that the data are clean.
• E.g.: a university warehouse (contains the student database
and the staff database).
Data Mining
• Data mining refers to extracting or "mining"
knowledge from large amounts of data (e.g. a data
warehouse).
• Analogy: mining gold from rocks or sand.
• E.g.: from a university warehouse we can extract
information about staff salaries over a
particular period of time, and we can also
predict what they will be next year.
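As a rough illustration of the salary example, the sketch below fits a straight line to past salary figures by ordinary least squares and extrapolates one year ahead. The figures are made up for the illustration, and a linear trend is only one possible modeling assumption; the slides do not prescribe a prediction method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical average staff salary per year, as it might be read
# from a university warehouse.
years = [2019, 2020, 2021, 2022, 2023]
salaries = [50000, 52000, 54000, 56000, 58000]

slope, intercept = fit_line(years, salaries)
predicted_2024 = slope * 2024 + intercept
print(round(predicted_2024))
```

Extracting the yearly figures is a query over historical data; the prediction step is what turns the extracted data into new knowledge.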
Difference between operational DB and data warehouse
Operational DB (OLTP) | Data warehouse (OLAP)
The major task is to perform online transaction and query processing. These systems are called Online Transaction Processing (OLTP) systems. | The major task is to perform data analysis and decision making. These systems are known as Online Analytical Processing (OLAP) systems.
Customer (user) oriented; used for query processing and transactions by clients and IT professionals. | Market (system) oriented; used for data analysis by knowledge workers, including managers, executives and analysts.
Manages current data. | Manages large amounts of historical data.
Adopts an ER model and an application-oriented database design. | Adopts either a star or snowflake model and a subject-oriented database design.
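The contrast between the two workloads can be sketched with sqlite3. The tables `dim_date` and `fact_sales` form a tiny, hypothetical star-style layout invented for this illustration. In practice the operational database and the warehouse are separate systems; they share one in-memory database here only to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        date_id INTEGER REFERENCES dim_date(date_id),
        amount  REAL
    );
    INSERT INTO dim_date VALUES (1, 2022), (2, 2023);
    INSERT INTO fact_sales (date_id, amount)
        VALUES (1, 100.0), (1, 50.0), (2, 200.0);
    """
)

# OLTP-style work: a short transaction that records one current event.
with conn:
    conn.execute(
        "INSERT INTO fact_sales (date_id, amount) VALUES (?, ?)", (2, 25.0)
    )

# OLAP-style work: a read-only aggregate over historical data,
# joining the fact table to a dimension table.
totals = conn.execute(
    """
    SELECT d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_id = d.date_id
    GROUP BY d.year
    ORDER BY d.year
    """
).fetchall()
print(totals)
```

The insert touches a single row and must be fast and safe; the aggregate scans many rows and answers a question a manager or analyst might ask.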
Architecture: Typical Data Mining System
[Figure: components of a typical data mining system: Pattern Evaluation, Data Mining Engine, Knowledge Base, Database or Data Warehouse Server]
[Figure: stages of the knowledge discovery process: Databases, Data Cleaning, Data Integration, Task-relevant Data, Data Mining, Pattern Evaluation]
– Data mining: the core of the
knowledge discovery process.
Steps of a KDD Process
1. Data cleaning (remove noise and fill in missing data).
2. Data integration (combine multiple data sources).
3. Data selection (retrieve the data relevant to the analysis task
from the database).
4. Data transformation (transform the data into a form appropriate for mining).
5. Data mining (apply intelligent techniques in order to
extract interesting patterns).
6. Pattern evaluation (identify the truly interesting patterns with
the help of interestingness measures).
7. Knowledge presentation (use visualization and knowledge
representation techniques to present the mined knowledge to
the user).
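The seven steps can be read as a pipeline of small functions. The sketch below is a toy illustration only: the records, the field names and the threshold are invented, and the "mining" step is a plain per-group average standing in for a real mining algorithm.

```python
# Hypothetical records from two sources; None marks a missing value.
source_a = [{"dept": "CS", "salary": 50000}, {"dept": "CS", "salary": None}]
source_b = [{"dept": "Math", "salary": 40000}, {"dept": "Math", "salary": 44000}]


def clean(records):
    """Step 1: fill missing salaries with the mean of the known ones."""
    known = [r["salary"] for r in records if r["salary"] is not None]
    mean = sum(known) / len(known)
    return [
        {**r, "salary": mean if r["salary"] is None else r["salary"]}
        for r in records
    ]


def integrate(*sources):
    """Step 2: combine several cleaned sources into one data set."""
    return [r for source in sources for r in source]


def select(records, fields):
    """Step 3: keep only the fields relevant to the analysis task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Step 4: reshape into a form suitable for mining (salaries per dept)."""
    groups = {}
    for r in records:
        groups.setdefault(r["dept"], []).append(r["salary"])
    return groups


def mine(groups):
    """Step 5: stand-in for a mining algorithm: average salary per dept."""
    return {dept: sum(v) / len(v) for dept, v in groups.items()}


def evaluate(patterns, min_avg):
    """Step 6: keep patterns that pass a simple interestingness threshold."""
    return {d: a for d, a in patterns.items() if a >= min_avg}


def present(patterns):
    """Step 7: render the result for the user."""
    return [f"{d}: average salary {a:.0f}" for d, a in sorted(patterns.items())]


data = integrate(clean(source_a), clean(source_b))
groups = transform(select(data, ["dept", "salary"]))
report = present(evaluate(mine(groups), min_avg=45000))
print(report)
```

Each function consumes the output of the previous step, which is why a weakness early in the chain (for example, poor cleaning) carries through to the patterns presented at the end.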
Data Mining Taxonomy
[Figure: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithms and Other Disciplines]
Steps in Data Mining Process
• Problem definition
• Data collection and enhancement
• Modeling strategies
• Training, validation, and testing of models
• Analyzing results
• Modeling iterations
• Implementing results
Data Collection and Enhancement
• Define data sources
• Join and denormalize data
• Enrich data (add some data)
• Transform data (some aggregation, etc.)
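A minimal sketch of the four activities above, using plain Python dictionaries. All table contents, field names and the campus lookup are hypothetical and serve only to make each activity concrete.

```python
# Define data sources: two hypothetical normalized tables.
staff = [
    {"staff_id": 1, "dept_id": 10, "salary": 50000},
    {"staff_id": 2, "dept_id": 10, "salary": 54000},
    {"staff_id": 3, "dept_id": 20, "salary": 40000},
]
departments = {10: "CS", 20: "Math"}

# Join and denormalize: copy the department name into every staff row.
joined = [{**row, "dept": departments[row["dept_id"]]} for row in staff]

# Enrich: add data from an extra (hypothetical) source, here a campus lookup.
dept_campus = {"CS": "North", "Math": "South"}
enriched = [{**row, "campus": dept_campus[row["dept"]]} for row in joined]

# Transform: aggregate, here the total salary per campus.
salary_by_campus = {}
for row in enriched:
    campus = row["campus"]
    salary_by_campus[campus] = salary_by_campus.get(campus, 0) + row["salary"]

print(salary_by_campus)
```

After these steps each row carries everything a model needs in one place, at the cost of repeating the department and campus values across rows.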
Modeling strategies
• Data mining strategies fall into two broad
categories: supervised learning and
unsupervised learning.
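The difference between the two categories can be seen on a toy one-dimensional data set. In the sketch below, a 1-nearest-neighbor classifier stands in for supervised learning (it needs labeled examples) and a two-cluster k-means stands in for unsupervised learning (it sees only the values). Both algorithms and all data values are chosen purely for illustration; the slides do not name specific methods.

```python
def nearest_neighbor_predict(train, x):
    """Supervised: train is a list of (value, label) pairs.

    The prediction is the label of the closest training value.
    """
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def two_means(values, iterations=10):
    """Unsupervised: split 1-D values into two clusters (k-means, k = 2).

    Assumes the input contains at least two distinct values.
    """
    c0, c1 = min(values), max(values)  # initial cluster centers
    g0, g1 = [c0], [c1]
    for _ in range(iterations):
        new_g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        new_g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        if not new_g0 or not new_g1:
            break  # keep the last valid split
        g0, g1 = new_g0, new_g1
        c0 = sum(g0) / len(g0)
        c1 = sum(g1) / len(g1)
    return sorted(g0), sorted(g1)


# Supervised: labels are supplied with the training data.
labeled = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
prediction = nearest_neighbor_predict(labeled, 7.5)

# Unsupervised: no labels; the grouping is discovered from the values alone.
clusters = two_means([1.0, 1.5, 2.0, 8.0, 9.0])

print(prediction, clusters)
```

The supervised model can only answer in terms of the labels it was given, while the unsupervised one produces groups that the analyst must then interpret.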
Training, Validation, and Testing of Models