yet ...
We can’t find the data we need
data is scattered over the network
[Figure, continued: sales schema diagram]
Fact table keys: Branch_key, Location_key; measures: Unit_sold, Euros_sold, Avg_sales
Branch: branch_key, branch_name, branch_type
Location: location_key, street, city, province_or_street, country
Example of Snowflake Schema
[Figure: snowflake schema for sales]
Sales Fact Table: Time_key, Item_key, Branch_key, Location_key; measures: Unit_sold, Euros_sold, Avg_sales
Time: time_key, day, day_of_the_week, month, quarter, year
Item: item_key, item_name, brand, type, supplier_key
Supplier: supplier_key, supplier_type
Branch: branch_key, branch_name, branch_type
Location: location_key, street, city_key
City: city_key, city, province_or_street, country
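The snowflake idea (a dimension split into further normalized tables) can be sketched with Python's built-in sqlite3 module. This is only an illustration: it keeps just the Location and City tables and a reduced fact table, and every row value is invented. The query has to walk the chain sales -> location -> city to reach the country.

```python
import sqlite3

# In-memory database; table and column names follow the figure,
# but only a subset of the schema is modelled and all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (
    city_key INTEGER PRIMARY KEY,
    city TEXT,
    province_or_street TEXT,
    country TEXT
);
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY,
    street TEXT,
    city_key INTEGER REFERENCES city(city_key)
);
CREATE TABLE sales (
    time_key INTEGER,
    item_key INTEGER,
    branch_key INTEGER,
    location_key INTEGER REFERENCES location(location_key),
    unit_sold INTEGER,
    euros_sold REAL
);
INSERT INTO city VALUES (1, 'Dublin', 'Leinster', 'Ireland');
INSERT INTO city VALUES (2, 'Paris', 'Ile-de-France', 'France');
INSERT INTO location VALUES (10, 'Main St', 1);
INSERT INTO location VALUES (11, 'High St', 1);
INSERT INTO location VALUES (20, 'Rue A', 2);
INSERT INTO sales VALUES (1, 1, 1, 10, 5, 500.0);
INSERT INTO sales VALUES (1, 1, 1, 11, 3, 300.0);
INSERT INTO sales VALUES (1, 2, 2, 20, 2, 250.0);
""")

# Country lives two joins away from the fact table in a snowflake schema.
sales_by_country = conn.execute("""
    SELECT c.country, SUM(s.unit_sold), SUM(s.euros_sold)
    FROM sales AS s
    JOIN location AS l ON s.location_key = l.location_key
    JOIN city AS c ON l.city_key = c.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()

for country, units, euros in sales_by_country:
    print(country, units, euros)
```

In a star schema the city and country columns would sit directly in the Location table, so the same query would need only one join.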
Example of Fact Constellation
[Figure: fact constellation with two fact tables sharing dimension tables]
Sales Fact Table: Time_key, Item_key, Branch_key, Location_key; measures: Unit_sold, Euros_sold, Avg_sales
Shipping Fact Table: Time_key, Item_key, shipper_key, from_location, to_location; measures: Euros_sold, unit_shipped
Time: time_key, day, day_of_the_week, month, quarter, year
Item: item_key, item_name, brand, type, supplier_key
Branch: branch_key, branch_name, branch_type
Location: location_key, street, city, province_or_street, country
Shipper: shipper_key, shipper_name, location_key, shipper_type
A Sample Data Cube
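[Figure: three-dimensional data cube of sales. Dimensions: Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), Country (Ireland, France, Germany, sum). The highlighted cell is the total annual sales of TV in Ireland.]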
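A minimal sketch of how such a cube and its "sum" cells could be computed, using only the standard library. The sales figures are invented for illustration, and build_cube is a hypothetical helper, not something from the slides. Each fact is added to every cell it belongs to, where each dimension is either the concrete value or the "sum" margin.

```python
from collections import defaultdict
from itertools import product

# Hypothetical base facts: (quarter, product, country) -> sales amount.
facts = {
    ("1Qtr", "TV", "Ireland"): 100,
    ("2Qtr", "TV", "Ireland"): 120,
    ("3Qtr", "TV", "Ireland"): 90,
    ("4Qtr", "TV", "Ireland"): 150,
    ("1Qtr", "PC", "Ireland"): 200,
    ("1Qtr", "TV", "France"): 80,
    ("2Qtr", "VCR", "Germany"): 40,
}

def build_cube(facts):
    """Aggregate facts into all cells, including the 'sum' margins."""
    cube = defaultdict(int)
    for (quarter, item, country), amount in facts.items():
        # Every fact contributes to 2 * 2 * 2 = 8 cells.
        for cell in product((quarter, "sum"), (item, "sum"), (country, "sum")):
            cube[cell] += amount
    return cube

cube = build_cube(facts)

# Total annual sales of TV in Ireland: sum over the Date dimension.
tv_ireland_annual = cube[("sum", "TV", "Ireland")]
grand_total = cube[("sum", "sum", "sum")]
print(tv_ireland_annual, grand_total)
```

Reading a "sum" cell corresponds to rolling up along that dimension; reading a cell with concrete values on all three dimensions returns a base fact.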
Typical OLAP Operations
Data Warehouse Architecture
[Figure: data warehouse architecture. Labels recovered: Legacy Data, Purchased Data, Data Warehouse Engine, Metadata Repository, Analyze, Query.]
Data Extraction - gathering data from multiple heterogeneous sources.
Data Cleaning - finding and correcting errors in the data.
Enterprise Warehouse - collects all of the information about subjects spanning the entire organization.
Data Mart - a subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific, selected groups, such as a marketing data mart.
Introduction to Data Mining
What Motivated Data Mining?
Why Data Mining? Potential Applications
Data analysis and decision support
  Market analysis and management: target marketing, customer relationship management (CRM), market basket analysis, cross selling, market segmentation
  Risk analysis and management: forecasting, customer retention, quality control, competitive analysis
  Fraud detection and detection of unusual patterns (outliers)
Integration of Multiple Technologies
[Figure: data mining at the centre, drawing on machine learning, artificial intelligence, database management, statistics, algorithms, and visualization.]
What Can Data Mining Do?
Cluster
Classify: categorical, regression
Summarize: summary statistics, summary rules
Link Analysis / Model Dependencies: association rules
Detect Deviations
Clustering
Find groups of similar data items
Example: “Group people with similar travel profiles”
Clusters: {George, Patricia}, {Jeff, Evelyn, Chris}, {Rob}
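A small sketch of clustering travel profiles with plain k-means. The slide does not name an algorithm, so k-means is an assumption here, and the two features per person (trips per year, average nights per trip) are invented for illustration. The first k points are used as starting centroids to keep the run deterministic.

```python
from math import dist

# Hypothetical travel profiles: (trips per year, average nights per trip).
profiles = {
    "George": (2, 10),
    "Jeff": (12, 2),
    "Rob": (30, 1),
    "Patricia": (3, 11),
    "Evelyn": (13, 3),
    "Chris": (11, 2),
}

def kmeans(points, k, max_iter=20):
    """Return a cluster index per point; seeds are the first k points."""
    centroids = [points[i] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels

labels = kmeans(list(profiles.values()), k=3)
clusters = {}
for name, label in zip(profiles, labels):
    clusters.setdefault(label, []).append(name)

for members in clusters.values():
    print(members)
```

No group labels are supplied in advance; the groups emerge from the distances alone, which is what separates clustering from classification.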
Classification
Find ways to separate data items into pre-defined groups
Example: a bank loan officer wants to analyse the data in order to know which customers (loan applicants) are risky and which are safe.
[Figure: training data is fed to a tool, which produces a classifier that assigns items to groups.]
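The loan example can be sketched with a k-nearest-neighbour classifier. The slide does not say which classifier the tool produces, so this choice is an assumption, and the training records (income and existing debt, in thousands) are invented for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical labelled training data: (annual income, existing debt) -> group.
training_data = [
    ((60, 5), "safe"),
    ((80, 10), "safe"),
    ((75, 2), "safe"),
    ((20, 15), "risky"),
    ((25, 30), "risky"),
    ((30, 25), "risky"),
]

def classify(applicant, training, k=3):
    """Label an applicant by majority vote among the k nearest training examples."""
    nearest = sorted(training, key=lambda example: dist(applicant, example[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((70, 8), training_data))
print(classify((22, 20), training_data))
```

Unlike clustering, the groups ("safe", "risky") are fixed in advance and the training data carries the labels.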
Association Rules
Identify dependencies in the data: X makes Y likely
Indicate significance of each dependency
Example: “Find groups of items commonly purchased together”
People who purchase X are likely to purchase Y
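One common way to put numbers on "X makes Y likely" is support and confidence. The slide does not name a significance measure, so treating it as support and confidence is an assumption, and the shopping baskets below are invented. rule_stats is a hypothetical helper.

```python
# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def rule_stats(x, y, transactions):
    """Return (support, confidence) for the rule X -> Y."""
    both = count(x | y, transactions)
    antecedent = count(x, transactions)
    support = both / len(transactions)
    # Confidence is undefined when X never occurs; report 0.0 in that case.
    confidence = both / antecedent if antecedent else 0.0
    return support, confidence

support, confidence = rule_stats({"bread"}, {"milk"}, baskets)
print(f"bread -> milk: support={support:.2f}, confidence={confidence:.2f}")
```

Support says how often X and Y appear together overall; confidence says how often Y appears among the baskets that contain X.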
Deviation Detection
Find unexpected values
Example: “Find unusual occurrences in stock prices”
Uses:
  Failure analysis
  Anomaly discovery for analysis
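A minimal sketch of deviation detection on stock prices, flagging daily changes that lie far from the mean in units of standard deviation (a z-score test). The method and the threshold of 2.0 are assumptions chosen for illustration, and the price changes are invented.

```python
from statistics import mean, pstdev

# Hypothetical daily price changes in percent.
daily_changes = [0.5, -0.3, 0.2, 0.1, -0.4, 0.3, 9.0, -0.2, 0.0, 0.4]

def find_outliers(values, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

outliers = find_outliers(daily_changes)
for i in outliers:
    print(f"day {i}: change of {daily_changes[i]}% looks unusual")
```

A single extreme value inflates both the mean and the standard deviation, so on real data a more robust measure (for example one based on the median) may be preferable.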
Knowledge Discovery (KDD) Process
Data mining: core of the knowledge discovery process
[Figure: KDD pipeline. Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Data Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]
Knowledge Process
1. Data cleaning – to remove noise and inconsistent data
2. Data integration – to combine multiple data sources
3. Data selection – to retrieve relevant data for analysis
4. Data transformation – to transform data into an appropriate form for data mining
5. Data mining
6. Evaluation
7. Knowledge presentation
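The seven steps can be sketched as a chain of small functions. Everything here is illustrative: the two sources, their records, the price-band threshold, and the deliberately trivial "mining" step (counting item and price-band pairs) are all assumptions, not part of the slides.

```python
from collections import Counter

# Hypothetical raw records from two sources (all values invented).
store_sales = [
    {"customer": "a1", "item": "TV", "amount": 400},
    {"customer": "a2", "item": "PC", "amount": -1},     # noise: negative amount
    {"customer": "a3", "item": "TV", "amount": 450},
]
web_sales = [
    {"customer": "b1", "item": "tv ", "amount": 420},   # inconsistent spelling
    {"customer": "b2", "item": "VCR", "amount": None},  # missing value
    {"customer": "b3", "item": "PC", "amount": 900},
]

def clean(records):
    """Step 1: drop noisy rows and normalise inconsistent item names."""
    return [
        {**r, "item": r["item"].strip().upper()}
        for r in records
        if r["amount"] is not None and r["amount"] > 0
    ]

def integrate(*sources):
    """Step 2: combine multiple sources into one list."""
    return [r for source in sources for r in source]

def select(records):
    """Step 3: keep only the fields relevant to the analysis."""
    return [(r["item"], r["amount"]) for r in records]

def transform(rows):
    """Step 4: turn raw amounts into coarse price bands."""
    return [(item, "high" if amount >= 500 else "low") for item, amount in rows]

def mine(rows):
    """Step 5: a trivial mining step, counting (item, band) pairs."""
    return Counter(rows)

def evaluate(patterns, min_count=2):
    """Step 6: keep only patterns seen at least min_count times."""
    return {p: n for p, n in patterns.items() if n >= min_count}

combined = integrate(clean(store_sales), clean(web_sales))
patterns = evaluate(mine(transform(select(combined))))

# Step 7: knowledge presentation.
for (item, band), n in patterns.items():
    print(f"{item} in the {band} price band: {n} purchases")
```

Keeping each step as its own function mirrors the list above and makes it easy to swap the mining step for a real algorithm.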
Knowledge Process