MINING (SE-409)
Lecture-1
Introduction and Background
Huma Ayub
Software Engineering department
INTELLIGENCE
KNOWLEDGE
INFORMATION
DATA
Historical overview
1960
Master Files & Reports
1965
Lots of Master files!
1970
Direct Access Memory & DBMS
1975
Online high performance transaction processing
Historical overview
1980
PCs and 4GL Technology (MIS/DSS)
1985 & 1990
Extract programs, extract processing,
The legacy system’s web
Historical overview: Crisis of Credibility
What is the financial health of our company?
[Figure: the same question gets conflicting answers, -10% and +10%]
Why a Data Warehouse (DWH)?
• Data recording and storage are growing.
– A Few Examples
• WalMart: 24 TB
• France Telecom: ~ 100 TB
• CERN: Up to 20 PB by 2006
• Stanford Linear Accelerator Center (SLAC): 500 TB
Caution!
A Warehouse of Data is NOT a Data Warehouse
Caution!
Size is NOT Everything
Reason-2: Why a Data Warehouse?
Stages of Data Warehouse:
– What happened?
– Why did it happen?
– What will happen?
– What is happening?
– What do you want to happen?
What is a Data Warehouse?
Ad-Hoc access
– Does not have a certain access pattern.
– Queries not known in advance.
– Difficult to write SQL in advance.
Knowledge workers
– Typically NOT IT literate (Executives, Analysts, Managers).
– NOT clerical workers.
– Decision makers.
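As an illustration of ad-hoc access, here is a minimal sketch that is not part of the lecture. It uses Python's built-in sqlite3 module; the `sales` table, its columns, and the `ad_hoc_total` helper are hypothetical names invented for this example. The grouping column is chosen only at run time, which is one reason such SQL is difficult to write in advance.

```python
import sqlite3

# Hypothetical sales table, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Pen", 10.0), ("North", "Ink", 5.0), ("South", "Pen", 20.0)],
)

# Columns an analyst is allowed to group by (an assumption of this sketch).
ALLOWED_DIMENSIONS = {"region", "product"}

def ad_hoc_total(dimension):
    """Total sales grouped by a dimension the analyst picks at run time."""
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    # The column name was validated above, so it is safe to interpolate.
    query = (
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(query).fetchall()

by_region = ad_hoc_total("region")
by_product = ad_hoc_total("product")
```

The same helper answers two different questions without either query having been written beforehand.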
Another View of a DWH
– Subject-Oriented
– Integrated
– Time-Variant
– Non-Volatile
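Two of these properties, time-variant and non-volatile, can be sketched in a few lines. This is an illustrative assumption rather than the lecture's own design: the `WarehouseTable` class and its fields are hypothetical. Rows are only ever appended, and each row carries the date it describes.

```python
from datetime import date

class WarehouseTable:
    """Append-only store: rows are added with a date and never changed."""

    def __init__(self):
        self._rows = []

    def load(self, customer_id, balance, as_of):
        # Non-volatile: a new snapshot is appended; old rows are not updated.
        self._rows.append(
            {"customer_id": customer_id, "balance": balance, "as_of": as_of}
        )

    def history(self, customer_id):
        # Time-variant: every row carries the date it describes,
        # so the full history of a customer can be read back in order.
        rows = [r for r in self._rows if r["customer_id"] == customer_id]
        return sorted(rows, key=lambda r: r["as_of"])

table = WarehouseTable()
table.load(1, 100.0, date(2005, 1, 31))
table.load(1, 80.0, date(2005, 2, 28))
balances = [r["balance"] for r in table.history(1)]
```

An operational system would typically overwrite the balance in place; here both values remain available for analysis.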
What is a Data Warehouse?
– Business user sends requests to IT people.
– IT people do system analysis and design.
– IT people create reports.
– IT people send reports to business user.
– Business user may get answers.
– Answers result in more questions.
How is it Different?
• Different patterns of hardware utilization
[Figure: hardware utilization, 0% to 100%, Operational vs. DWH]
Customer retention/holding.
How much history?
• Depends on:
– Industry.
– Cost of storing historical data.
– Economic value of historical data.
How much history?
• Industries and history
– Telecom: calls far outnumber bank transactions, so about 18 months of history is kept.
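A retention window such as the 18 months mentioned for telecom can be expressed as a cutoff date. The sketch below is an assumption for illustration only; `months_before`, the sample dates, and the record list are hypothetical.

```python
from datetime import date

def months_before(day, months):
    """Return the first day of the month lying `months` months before `day`."""
    total = day.year * 12 + (day.month - 1) - months
    return date(total // 12, total % 12 + 1, 1)

# Hypothetical call dates; only those on or after the cutoff are retained.
call_dates = [date(2004, 11, 30), date(2004, 12, 1), date(2006, 6, 1)]
cutoff = months_before(date(2006, 6, 15), 18)
retained = [d for d in call_dates if d >= cutoff]
```

Changing the `months` argument models a different industry's history requirement.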
Is a Data Warehouse a complete repository of data?
How is it Different?
• Usually (but not always) periodic or batch
updates rather than real-time.
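The difference between batch and real-time updating can be sketched as follows. This is a minimal illustration under stated assumptions, not the lecture's implementation: the `calls` table, `record_call`, and `run_batch_load` are hypothetical names, and the built-in sqlite3 module stands in for a warehouse.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE calls (caller TEXT, minutes INTEGER)")

pending = []  # records extracted from the operational system, not yet loaded

def record_call(caller, minutes):
    # The operational side only queues the record; the warehouse is untouched.
    pending.append((caller, minutes))

def run_batch_load():
    """Load all queued records in one transaction, e.g. once per night."""
    with warehouse:  # commits on success, rolls back on error
        warehouse.executemany("INSERT INTO calls VALUES (?, ?)", pending)
    loaded = len(pending)
    pending.clear()
    return loaded

def warehouse_row_count():
    return warehouse.execute("SELECT COUNT(*) FROM calls").fetchone()[0]

record_call("A", 3)
record_call("B", 7)
rows_before = warehouse_row_count()  # warehouse has not seen the calls yet
loaded = run_batch_load()
rows_after = warehouse_row_count()
```

Between batch runs the warehouse lags behind the operational data, which is the trade-off the slide describes.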