Data Warehousing and OLAP Technology: A. Bellaachia
1. Objectives
2. What is a Data Warehouse?
2.1. Definitions
2.2. Data Warehouse: Subject-Oriented
2.3. Data Warehouse: Integrated
2.4. Data Warehouse: Time-Variant
2.5. Data Warehouse: Non-Volatile
2.6. Data Warehouse vs. Heterogeneous DBMS
2.7. Data Warehouse vs. Operational DBMS
2.8. OLTP vs. OLAP
2.9. Why a Separate Data Warehouse?
3. Multidimensional Data Model
3.1. Definitions
4. Conceptual Modeling of Data Warehousing
4.1. Star Schema
4.2. Snowflake Schema
4.3. Fact Constellation
5. A Data Mining Query Language: DMQL
5.1. Definitions and Syntax
5.2. Defining a Star Schema in DMQL
5.3. Defining a Snowflake Schema in DMQL
5.4. Defining a Fact Constellation in DMQL
5.5. Measures: Three Categories
5.6. How to Compute Data Cube Measures?
6. A Concept Hierarchy
7. OLAP Operations in a Multidimensional Data
8. OLAP Operations
9. Starnet Query Model for Multidimensional Databases
10. Data Warehouse Architecture
10.1. DW Design Process
A. Bellaachia
Page: 1
1. Objectives
What is a data warehouse?
Data warehouse design issues.
General architecture of a data warehouse.
Introduction to Online Analytical Processing (OLAP) technology.
The relationship between data warehousing and data mining.
OLTP vs. OLAP

Feature             OLTP                                      OLAP
Users               Clerk, IT professional                    Knowledge worker
Function            Day-to-day operations                     Decision support
DB design           Application-oriented                      Subject-oriented
Data                Current, up-to-date, detailed, flat       Historical, summarized, multidimensional,
                    relational, isolated                      integrated, consolidated
Usage               Repetitive                                Ad hoc
Access              Read/write, index/hash on primary key     Lots of scans
Unit of work        Short, simple transaction                 Complex query
# records accessed  Tens                                      Millions
# users             Thousands                                 Hundreds
DB size             100 MB-GB                                 100 GB-TB
Metric              Transaction throughput
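Lattice of cuboids for a 4-D data cube on time, item, location and supplier:
0-D (apex) cuboid: all
1-D cuboids: (time), (item), (location), (supplier)
2-D cuboids: (time, item), (time, location), (time, supplier), (item, location), (item, supplier), (location, supplier)
3-D cuboids: (time, item, location), (time, item, supplier), (time, location, supplier), (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)

Figure: the 4-D cube viewed as a series of 3-D (time, item, location) cubes, one for each supplier.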
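The lattice follows a simple rule: every cuboid corresponds to one subset of the n dimensions, so a cube without concept hierarchies has 2^n cuboids. The Python sketch below enumerates the lattice level by level with `itertools.combinations`; the function name `cuboid_lattice` is hypothetical and chosen only for illustration.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Map each level k (0-D .. n-D) to the cuboids at that level.

    A cuboid is the tuple of dimensions it keeps (its group-by set);
    the empty tuple is the 0-D apex cuboid "all".
    """
    return {k: list(combinations(dimensions, k)) for k in range(len(dimensions) + 1)}

if __name__ == "__main__":
    dims = ["time", "item", "location", "supplier"]
    lattice = cuboid_lattice(dims)
    for k, cuboids in lattice.items():
        label = "apex" if k == 0 else "base" if k == len(dims) else ""
        print(f"{k}-D {label}".strip(), cuboids)
    print("total cuboids:", sum(len(c) for c in lattice.values()))  # 2**4 = 16
```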
Star schema (sales example):
Sales fact table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, state_or_province, country
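To make the star layout concrete, here is a minimal sketch in Python's built-in `sqlite3`, assuming an in-memory database. Table and column names follow the schema above; the branch dimension and the derived `avg_sales` measure are left out for brevity, and every row is invented sample data. The point is the star join: the fact table reaches each dimension table in a single hop.

```python
import sqlite3

# Minimal star schema: one fact table with a foreign key per dimension.
SCHEMA = """
CREATE TABLE time (time_key INTEGER PRIMARY KEY, day TEXT, day_of_the_week TEXT,
                   month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                   type TEXT, supplier_type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                       state_or_province TEXT, country TEXT);
-- Fact table: dimension keys plus numeric measures.
CREATE TABLE sales (time_key INTEGER REFERENCES time(time_key),
                    item_key INTEGER REFERENCES item(item_key),
                    location_key INTEGER REFERENCES location(location_key),
                    units_sold INTEGER, dollars_sold REAL);
"""

def build_star(conn):
    """Create the tables and load a few invented rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO time VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "2024-01-15", "Monday", "January", "Q1", 2024),
        (2, "2024-04-02", "Tuesday", "April", "Q2", 2024),
    ])
    conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?, ?)", [
        (1, "TV-42", "Acme", "TV", "wholesale"),
        (2, "PC-7", "Bolt", "PC", "retail"),
    ])
    conn.executemany("INSERT INTO location VALUES (?, ?, ?, ?, ?)", [
        (1, "1 Main St", "Vancouver", "British Columbia", "Canada"),
        (2, "5 Elm St", "Toronto", "Ontario", "Canada"),
    ])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 1, 10, 4000.0), (1, 2, 2, 3, 2700.0),
        (2, 1, 2, 5, 2000.0), (2, 2, 1, 2, 1800.0),
    ])

# Star join: each dimension is joined directly to the fact table.
STAR_JOIN = """
SELECT t.quarter, i.type, SUM(s.dollars_sold)
FROM sales AS s
JOIN time AS t ON s.time_key = t.time_key
JOIN item AS i ON s.item_key = i.item_key
GROUP BY t.quarter, i.type
ORDER BY t.quarter, i.type
"""

conn = sqlite3.connect(":memory:")
build_star(conn)
for row in conn.execute(STAR_JOIN):
    print(row)
```

Keeping the dimension tables denormalized is what allows every query to stay one join away from the facts.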
Snowflake schema (sales example):
Sales fact table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_key
supplier: supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city: city_key, city, state_or_province, country
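The practical cost of snowflaking shows up at query time. In the minimal `sqlite3` sketch below, the location dimension is normalized into `location` and `city` as in the schema above, so grouping sales by country needs one extra join. The fact table is trimmed to a single measure, and the sample rows are invented.

```python
import sqlite3

# Snowflaked location dimension: city attributes live in their own table.
SCHEMA = """
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT,
                   state_or_province TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(city_key));
CREATE TABLE sales (location_key INTEGER REFERENCES location(location_key),
                    dollars_sold REAL);
INSERT INTO city VALUES (1, 'Vancouver', 'British Columbia', 'Canada'),
                        (2, 'Toronto', 'Ontario', 'Canada'),
                        (3, 'Frankfurt', 'Hesse', 'Germany');
INSERT INTO location VALUES (1, '1 Main St', 1), (2, '5 Elm St', 2), (3, '9 Zeil', 3);
INSERT INTO sales VALUES (1, 4000.0), (2, 2700.0), (3, 1500.0);
"""

# Grouping by country takes two hops: sales -> location -> city.
BY_COUNTRY = """
SELECT c.country, SUM(s.dollars_sold)
FROM sales AS s
JOIN location AS l ON s.location_key = l.location_key
JOIN city AS c ON l.city_key = c.city_key
GROUP BY c.country
ORDER BY c.country
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(conn.execute(BY_COUNTRY).fetchall())
```

Normalization removes the redundancy of repeating state and country on every location row, but every query that needs those attributes pays for the additional join.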
Fact constellation (sales and shipping):
Sales fact table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
Shipping fact table: time_key, item_key, shipper_key, from_location, to_location; measures: dollars_cost, units_shipped
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, state_or_province, country
shipper: shipper_key, shipper_name, location_key, shipper_type
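A fact constellation lets two fact tables share dimension tables. The minimal `sqlite3` sketch below keeps only the shared `item` dimension and the columns the query needs, with invented rows, and reports units sold and units shipped side by side for each item.

```python
import sqlite3

# Two fact tables (sales, shipping) sharing the item dimension table.
SCHEMA = """
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, type TEXT);
CREATE TABLE sales (item_key INTEGER REFERENCES item(item_key), units_sold INTEGER);
CREATE TABLE shipping (item_key INTEGER REFERENCES item(item_key),
                       from_location INTEGER, to_location INTEGER,
                       units_shipped INTEGER);
INSERT INTO item VALUES (1, 'TV-42', 'TV'), (2, 'PC-7', 'PC');
INSERT INTO sales VALUES (1, 10), (2, 3), (1, 5);
INSERT INTO shipping VALUES (1, 1, 2, 12), (2, 2, 1, 4);
"""

# Aggregate each fact table separately per item and combine the totals
# through the shared dimension.
PER_ITEM = """
SELECT i.type,
       (SELECT COALESCE(SUM(s.units_sold), 0)
          FROM sales AS s WHERE s.item_key = i.item_key) AS units_sold,
       (SELECT COALESCE(SUM(h.units_shipped), 0)
          FROM shipping AS h WHERE h.item_key = i.item_key) AS units_shipped
FROM item AS i
ORDER BY i.type
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(conn.execute(PER_ITEM).fetchall())
```

The query aggregates each fact table on its own before combining the results. Joining `sales` and `shipping` directly on `item_key` would pair every sales row with every shipping row for the same item and inflate both sums.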
6. A Concept Hierarchy
A concept hierarchy is an order relation between a set of attributes of a concept or dimension.
It can be generated manually (by users or experts) or automatically (by statistical analysis).
Multidimensional data is usually organized into dimensions, and each dimension is further organized into lower levels of abstraction defined by concept hierarchies.
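Example: Dimension (location)
o Levels, from most general to most specific: all, region, country, city, office.
o region: Europe, North America, ...
o country: Germany, Spain, ... (Europe); Canada, Mexico, ... (North America)
o city: Frankfurt, ... (Germany); Vancouver, Toronto, ... (Canada)
o office: L. Chan, M. Wind, ...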
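A concept hierarchy can be represented as a chain of child-to-parent mappings, one per level. Generalizing a value then means following parents upward. The sketch below assumes the location levels above. The office names (`office-A` and so on) and the office-to-city assignments are hypothetical placeholders.

```python
# Hypothetical child -> parent mappings for the location hierarchy
# office < city < country < region < all. Office names are placeholders.
LEVELS = ["office", "city", "country", "region", "all"]
PARENT = {
    "office": {"office-A": "Vancouver", "office-B": "Toronto", "office-C": "Frankfurt"},
    "city": {"Vancouver": "Canada", "Toronto": "Canada", "Frankfurt": "Germany"},
    "country": {"Canada": "North America", "Mexico": "North America",
                "Germany": "Europe", "Spain": "Europe"},
    "region": {"North America": "all", "Europe": "all"},
}

def generalize(value, from_level, to_level):
    """Map a value at from_level to its ancestor at to_level by climbing parents."""
    start, end = LEVELS.index(from_level), LEVELS.index(to_level)
    if end < start:
        raise ValueError("this sketch only climbs the hierarchy upward")
    for level in LEVELS[start:end]:
        value = PARENT[level][value]
    return value

print(generalize("office-A", "office", "country"))  # Canada
print(generalize("Frankfurt", "city", "region"))    # Europe
```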
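Figure: orderings of attributes within a dimension: a total order for location (street, city, state, country, from lowest to highest) and a partial order (lattice) for time (day, then month and week, with month rolling up to quarter; quarter and week both roll up to year).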
Set-grouping hierarchy:
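Figure: a sample 3-D data cube of sales with dimensions Product (TV, PC, VCR), Date (1Qtr, 2Qtr, 3Qtr, 4Qtr) and Country (U.S.A, Canada, Mexico); the "sum" cells hold aggregates such as total annual sales.
Its cuboids:
3-D (base) cuboid: (product, date, country)
2-D cuboids: (product, date), (product, country), (date, country)
1-D cuboids: (product), (date), (country)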
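Each "sum" cell in such a cube is an aggregate over the base cuboid, grouped by the dimensions that cuboid keeps. The sketch below computes every cuboid of the (product, date, country) lattice by brute force from a handful of invented base cells. A dimension that has been aggregated away is marked "all". Real systems use much more efficient cube-computation methods. This version only illustrates what gets computed.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("product", "date", "country")

# Invented base-cuboid cells: (product, date, country) -> sales.
BASE = {
    ("TV", "1Qtr", "U.S.A"): 10,
    ("TV", "2Qtr", "U.S.A"): 12,
    ("PC", "1Qtr", "Canada"): 7,
    ("VCR", "1Qtr", "Mexico"): 3,
}

def compute_cube(base, dims=DIMS):
    """Aggregate the base cuboid into every cuboid of the lattice.

    Returns {kept_dims: {cell: total}}; a dimension aggregated away
    appears as "all" in the cell key.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for kept in combinations(range(len(dims)), k):
            cells = defaultdict(int)
            for key, value in base.items():
                cell = tuple(key[i] if i in kept else "all" for i in range(len(dims)))
                cells[cell] += value
            cube[tuple(dims[i] for i in kept)] = dict(cells)
    return cube

cube = compute_cube(BASE)
print(cube[("product", "country")][("TV", "all", "U.S.A")])  # 22: TV in U.S.A, all quarters
print(cube[()])  # apex cuboid: {('all', 'all', 'all'): 32}
```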
8. OLAP Operations
Objectives:
o OLAP is a powerful analysis tool that supports forecasting, statistical computations, aggregations, etc.
Roll up (drill-up): summarize data
o It is performed by climbing up the hierarchy of a dimension or by dimension reduction (removing one or more dimensions from the cube).
o In the example, the roll-up is performed on location and is equivalent to grouping the data by country.
Figure: roll-up on location for a cube with dimensions location, date of sale and product (CD, video, camera); the rolled-up cube shows the locations NO (New Orleans) and VA (Virginia).
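Both forms of roll-up described above can be written as a group-and-sum over the cube cells. The sketch below uses a small invented (city, product) cube and a hypothetical city-to-country mapping. `roll_up` climbs the location hierarchy, and `reduce_dimension` drops a dimension entirely.

```python
from collections import defaultdict

# Invented 2-D cube: (city, product) -> units sold.
SALES = {
    ("Vancouver", "CD"): 5, ("Toronto", "CD"): 8, ("Frankfurt", "CD"): 4,
    ("Vancouver", "video"): 6, ("Toronto", "video"): 9,
    ("Frankfurt", "camera"): 7,
}
# Hypothetical city -> country step of the location hierarchy.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Frankfurt": "Germany"}

def roll_up(cube, dim, parent):
    """Climb the hierarchy: replace dimension `dim` by its parent value and sum."""
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:dim] + (parent[key[dim]],) + key[dim + 1:]] += value
    return dict(out)

def reduce_dimension(cube, dim):
    """Dimension reduction: drop dimension `dim` entirely and sum."""
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:dim] + key[dim + 1:]] += value
    return dict(out)

print(roll_up(SALES, 0, CITY_TO_COUNTRY))  # grouped by country
print(reduce_dimension(SALES, 0))          # location removed
```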
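Figure: a starnet query model. Radial lines represent the dimensions Customer, Product, Time, Location, Organization and Promotion. Each circle is called a footprint; footprints shown include DAILY, QTRLY and ANNUALLY (Time); PRODUCT ITEM, PRODUCT GROUP and PRODUCT LINE (Product); CITY, COUNTRY and REGION; SALES, DISTRICT and DIVISION; CONTRACTS and ORDER; AIR-EXPRESS and TRUCK.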
Multi-Tiered Architecture
o Data sources: operational DBs and other sources.
o Monitor & integrator (with metadata): extract, transform, load, refresh.
o Data storage: the data warehouse and data marts.
o OLAP engine: an OLAP server that serves the front-end tools.
o Front-end tools: analysis, query, reports, data mining.
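The extract, transform and load steps of the bottom tier can be sketched as three small functions. The sketch below assumes two hypothetical operational sources with different conventions: one stores amounts in cents with country codes, the other stores dollars with country names. Transformation cleans incomplete rows and integrates both sources into one schema. Loading appends the result to a warehouse table in an in-memory `sqlite3` database. Refresh and monitoring are not shown.

```python
import sqlite3

# Two hypothetical operational sources with different conventions.
STORE_DB = [  # amounts in cents, ISO country codes
    {"sku": "TV-42", "country": "CA", "amount_cents": 40000},
    {"sku": "PC-7", "country": "US", "amount_cents": None},  # incomplete row
]
WEB_ORDERS = [  # amounts in dollars, country names spelled out
    {"item": "TV-42", "country_name": "Canada", "amount": 250.0},
]
COUNTRY_NAMES = {"CA": "Canada", "US": "U.S.A"}

def extract():
    """Extract: read raw records from each source, tagged with their origin."""
    yield from (("store", r) for r in STORE_DB)
    yield from (("web", r) for r in WEB_ORDERS)

def transform(records):
    """Transform: clean (drop incomplete rows) and integrate (one schema, one unit)."""
    for source, r in records:
        if source == "store":
            if r["amount_cents"] is None:
                continue
            yield (r["sku"], COUNTRY_NAMES[r["country"]], r["amount_cents"] / 100)
        else:
            yield (r["item"], r["country_name"], r["amount"])

def load(rows, conn):
    """Load: append the integrated rows to the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (item TEXT, country TEXT, dollars_sold REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT country, SUM(dollars_sold) FROM sales GROUP BY country").fetchall())
```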
A Recommended Approach
Figure elements: data marts, an enterprise data warehouse, distributed data marts and a multi-tier data warehouse, connected by model refinement steps.
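Number of cuboids in an n-dimensional data cube:
o If no dimension has a hierarchy: $T = 2^{n}$
o If dimension $i$ has a hierarchy with $L_i$ levels:
$$T = \prod_{i=1}^{n} (L_i + 1)$$
where $L_i$ is the number of levels of dimension $i$; the $+1$ accounts for the virtual top level "all".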
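The product formula is easy to check numerically. In the sketch below, `num_cuboids` is a hypothetical helper that evaluates $T = \prod_{i=1}^{n} (L_i + 1)$. Setting every $L_i = 1$ recovers the $2^n$ case, and ten dimensions with four levels each already yield nearly ten million cuboids. This growth is why materializing every cuboid is rarely feasible.

```python
from math import prod

def num_cuboids(levels):
    """Total number of cuboids T = prod(L_i + 1) of an n-D cube.

    levels[i] is L_i, the number of levels of dimension i, not counting
    the virtual top level "all" (the +1 accounts for it).
    """
    return prod(n_levels + 1 for n_levels in levels)

print(num_cuboids([1, 1, 1, 1]))  # no hierarchies: 2**4 = 16
print(num_cuboids([4] * 10))      # 10 dimensions, 4 levels each: 5**10 = 9765625
```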
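Example: lattice of cuboids for the dimensions city, item and year:
0-D (apex): ()
1-D: (city), (item), (year)
2-D: (city, item), (city, year), (item, year)
3-D (base): (city, item, year)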
Index on Region:
RecID   Asia   Europe   America
1       1      0        0
2       0      1        0
3       1      0        0
4       0      0        1
5       0      1        0

Index on Type:
RecID   Retail   Dealer
1       1        0
2       0        1
3       0        1
4       1        0
5       0        1
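These tables are bitmap indexes: each distinct value has one bit vector with a bit per record. The sketch below stores each bitmap as a Python integer, where bit (RecID - 1) is set when the record has that value. With this representation, a conjunctive selection becomes a single bitwise AND. The column values for RecIDs 1 to 5 are read directly off the two index tables. The helper names are hypothetical.

```python
def build_bitmaps(column):
    """One bitmap per distinct value; bit (RecID - 1) is set if record RecID has it."""
    bitmaps = {}
    for rec_id, value in enumerate(column, start=1):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << (rec_id - 1))
    return bitmaps

def rec_ids(bitmap):
    """Decode a bitmap back into the sorted list of matching RecIDs."""
    return [i + 1 for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Column values for RecIDs 1..5, read off the region and type indexes.
region = ["Asia", "Europe", "Asia", "America", "Europe"]
type_ = ["Retail", "Dealer", "Dealer", "Retail", "Dealer"]

region_idx = build_bitmaps(region)
type_idx = build_bitmaps(type_)

# region = 'Europe' AND type = 'Dealer' becomes a bitwise AND.
print(rec_ids(region_idx["Europe"] & type_idx["Dealer"]))  # [2, 5]
# region IN ('Asia', 'America') becomes a bitwise OR.
print(rec_ids(region_idx["Asia"] | region_idx["America"]))  # [1, 3, 4]
```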
Figure: a layered OLAP/OLAM architecture.
o Top: mining query.
o Layer 3 (OLAP/OLAM): OLAP engine.
o Layer 2 (MDDB): multidimensional database (MDDB) with metadata.
o Filtering & integration, through a database API: data cleaning, data integration, filtering.
o Layer 1 (data repository): databases and the data warehouse.