Professional Documents
Culture Documents
What Is A Data Warehouse?: Data Warehouse Architecture From Data Warehousing To Data Mining
What Is A Data Warehouse?: Data Warehouse Architecture From Data Warehousing To Data Mining
A data warehouse is a subject-oriented, integrated, timevariant, and nonvolatile collection of data in support of
managements decision-making process.W. H. Inmon
Data warehousing:
Data WarehouseSubject-Oriented
Data WarehouseIntegrated
Data WarehouseNonvolatile
OLAP
users
clerk, IT professional
knowledge worker
function
decision support
DB design
application-oriented
subject-oriented
data
current, up-to-date
detailed, flat relational
isolated
repetitive
historical,
summarized, multidimensional
integrated, consolidated
ad-hoc
lots of scans
unit of work
read/write
index/hash on prim. key
short, simple transaction
# records accessed
tens
millions
#users
thousands
hundreds
DB size
100MB-GB
100GB-TB
metric
transaction throughput
usage
access
complex query
Note: There are more and more systems which perform OLAP
analysis directly on relational databases
10
11
Top-down view
Techniques
12
Choose the dimensions that will apply to each fact table record
Choose the measure that will populate each fact table record
13
Other
sources
Operational
DBs
Metadata
Extract
Transform
Load
Refresh
Monitor
&
Integrator
Data
Warehouse
OLAP Server
Serve
Analysis
Query
Reports
Data mining
Data Marts
Data Sources
September 14, 2015
Data Storage
OLAP Engine Front-End Tools
Data Mining: Concepts and
Techniques
14
Enterprise warehouse
collects all of the information about subjects spanning
the entire organization
Data Mart
a subset of corporate-wide data that is of value to a
specific groups of users. Its scope is confined to
specific, selected groups, such as marketing data mart
Virtual warehouse
A set of views over operational databases
Only some of the possible summary views may be
materialized
15
Data Warehouse
Development: A
Recommended Approach
Multi-Tier Data
Warehouse
Distributed
Data Marts
Data
Mart
Enterprise
Data
Warehouse
Data
Mart
Model refinement
Model refinement
Define a high-level
corporate data model
Data Mining: Concepts and
September 14, 2015
Techniques
16
Data extraction
get data from multiple, heterogeneous, and external
sources
Data cleaning
detect errors in the data and rectify them when
possible
Data transformation
convert data from legacy or host format to warehouse
format
Load
sort, summarize, consolidate, compute views, check
integrity, and build indicies and partitions
Refresh
propagate the updates from the data sources to the
warehouse
17
Metadata Repository
Operational meta-data
Business data
18
Greater scalability
19
20
Information processing
Analytical processing
Data mining
21
22
Mining result
Layer4
User Interface
OLAM
Engine
OLAP
Engine
Layer3
OLAP/OLAM
MDDB
Filtering&Integration
Database API
MDDB
Meta
Data
Filtering
Layer1
Databases
September 14, 2015
Data cleaning
Data
Warehouse
Data Data
integration
Mining: Concepts and
Techniques
Data
Repository
23
Summary
24
25
References (I)
J. Han. Towards on-line analytical mining in large databases. ACM SIGMOD Record,
27:97-107, 1998.
Techniques
26
References (II)
R. Kimball and M. Ross. The Data Warehouse Toolkit: The Complete Guide to
Dimensional Modeling. 2ed. John Wiley, 2002
27