Data Warehousing & OLAP
Large time horizon for trend analysis (current and past data)
Non-Volatile store
physically separate store from the operational environment
Bulk load/refresh
warehouse is offline
OLAP server provides a multidimensional view
Multidimensional OLAP (MOLAP): Essbase, Oracle Express
Relational OLAP (ROLAP)
Examples of OLAP
Comparisons (this period vs. last period)
Show me the sales per region for this year and compare them with those of the previous year to identify discrepancies
Multidimensional Modeling
Example: compute total sales volume per product and store
Total sales by store and product ("-" = no sales):

Store   Product 1   Product 2   Product 3   Product 4
1       $454        -           -           $925
2       $468        $800        -           -
3       $296        -           $240        -
4       $652        -           $540        $745
[Figure: data cube with dimensions product, city, and month; the highlighted cell is the sales of DVDs in NY in August]

DIMENSIONS (coarsest to finest level)
product:  category > product
location: region > country > state > city > store
time:     year > quarter > month > week > day
Pivoting
Pivoting: aggregate on selected dimensions
usually 2 dims (cross-tabulation)
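Sales by store and product, with ALL margins ("-" = no sales):

Store   Product 1   Product 2   Product 3   Product 4   ALL
1       454         -           -           925         1379
2       468         800         -           -           1268
3       296         -           240         -           536
4       652         -           540         745         1937
ALL     1870        800         780         1670        5120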
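The margins of the cross-tab can be recomputed from the individual cells. The following is a minimal sketch in plain Python; the cell values come from the table on this slide, while the `cells` layout and the `pivot` helper are illustrative names, not part of the original material.

```python
from collections import defaultdict

# (store, product) -> sales, copied from the cross-tab on this slide.
cells = {
    (1, 1): 454, (1, 4): 925,
    (2, 1): 468, (2, 2): 800,
    (3, 1): 296, (3, 3): 240,
    (4, 1): 652, (4, 3): 540, (4, 4): 745,
}

def pivot(cells):
    """Return (per-store totals, per-product totals, grand total)."""
    by_store = defaultdict(int)
    by_product = defaultdict(int)
    for (store, product), amount in cells.items():
        by_store[store] += amount      # ALL column
        by_product[product] += amount  # ALL row
    return dict(by_store), dict(by_product), sum(cells.values())

by_store, by_product, grand_total = pivot(cells)
print(by_store)
print(by_product)
print(grand_total)
```

The per-store dictionary corresponds to the ALL column, the per-product dictionary to the ALL row, and the grand total to the bottom-right cell.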
[Figure: cube with dimensions product, customers, and store, sliced at customer = Smith]
Roadmap
Fact table:
SALES (time_key, product_key, location_key; measures: units_sold, amount)

Dimension tables:
TIME (time_key, day, day_of_the_week, month, quarter, year)
PRODUCT (product_key, product_name, category, brand, color, supplier_name)
LOCATION (location_key, store, street_address, city, state, country, region)
[Figure: the same schema (TIME, PRODUCT, LOCATION, SALES) annotated with a query plan: σ region=Europe on LOCATION, JOIN with SALES, JOIN with PRODUCT, π category]
LOCATION
location_key   Region
L1             Asia
L2             Europe
L3             Asia
L4             America
L5             Europe

Index on Region
location_key   Asia   Europe   America
L1             1      0        0
L2             0      1        0
L3             1      0        0
L4             0      0        1
L5             0      1        0
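The index on Region shown here has the shape of a bitmap index: one bit vector per distinct value, with bit i set when row i carries that value. A minimal sketch under that reading follows; the LOCATION rows are the ones on the slide, while `build_bitmap_index` and `matching_keys` are hypothetical helper names. Each bit vector is stored as a Python integer.

```python
# LOCATION rows from the slide: (location_key, region).
location = [("L1", "Asia"), ("L2", "Europe"), ("L3", "Asia"),
            ("L4", "America"), ("L5", "Europe")]

def build_bitmap_index(rows):
    """Map each distinct value to an int whose bit i is set when row i has it."""
    index = {}
    for i, (_, value) in enumerate(rows):
        index[value] = index.get(value, 0) | (1 << i)
    return index

def matching_keys(rows, bitmap):
    """Return the keys of the rows whose bit is set in the bitmap."""
    return [key for i, (key, _) in enumerate(rows) if (bitmap >> i) & 1]

region_index = build_bitmap_index(location)

# Predicates combine with cheap bitwise operations:
# region = Europe OR region = America
europe_or_america = region_index["Europe"] | region_index["America"]

print(matching_keys(location, region_index["Europe"]))  # ['L2', 'L5']
print(matching_keys(location, europe_or_america))       # ['L2', 'L4', 'L5']
```

Bitmaps work well for low-cardinality columns such as region, because a selection reduces to AND/OR over short bit vectors.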
Join-Index
[Figure: each LOCATION.region value (Africa, America, Asia, Europe) points to the SALES rows that join with it, for example R102, R117, R118, R124]
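One way to picture a join index is as a precomputed map from a dimension attribute value to the identifiers of the matching fact rows. The sketch below assumes that reading. The assignment of SALES rows to locations and the amounts are made up for illustration, since the slide does not say which rows belong to which region; `build_join_index` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical data: the row-to-region assignment and the amounts are invented.
location = {"L1": "Asia", "L2": "Europe", "L3": "Asia",
            "L4": "America", "L5": "Europe"}
sales = {  # row id -> (location_key, amount)
    "R102": ("L2", 30),
    "R117": ("L5", 20),
    "R118": ("L1", 50),
    "R124": ("L2", 10),
}

def build_join_index(location, sales):
    """Precompute region -> SALES row ids so the join is not redone per query."""
    index = defaultdict(list)
    for row_id, (location_key, _) in sales.items():
        index[location[location_key]].append(row_id)
    return dict(index)

join_index = build_join_index(location, sales)

# "Sales in Europe" now touches only the listed fact rows.
europe_total = sum(sales[row_id][1] for row_id in join_index["Europe"])
print(join_index["Europe"], europe_total)
```

The index removes the join at query time, but each listed fact row still has to be fetched, and those fetches are scattered across the table when the index is unclustered.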
Problem Solved?
Find total sales per product-category in our stores in Europe
Join-index will prune 3/4 of the data (uniform sales), but the remaining 1/4 is still large (several million transactions)
Index is unclustered
Pre-computation is necessary
[Figure: LOCATION hierarchy: region > country > state > city > store]
[Cross-tab of sales by store and product with ALL margins, as on the Pivoting slide]
4 group-bys here:
(store, product): the individual cells
(store): sub-totals per store
(product): sub-totals per product
(): total sales
Need to write 4 queries!
The cross-tab as a relation, with ALL marking an aggregated dimension:

Store   Product_key   sum(amount)
1       1             454
1       4             925
2       1             468
2       2             800
3       1             296
3       3             240
4       1             652
4       3             540
4       4             745
1       ALL           1379
2       ALL           1268
3       ALL           536
4       ALL           1937
ALL     1             1870
ALL     2             800
ALL     3             780
ALL     4             1670
ALL     ALL           5120
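The relation with ALL values can be produced in a single pass instead of four separate queries. The sketch below assumes the nine (store, product_key, amount) facts from the cross-tab; `cube` is an illustrative helper that mimics what a SQL `GROUP BY CUBE (store, product_key)` would return, not code from the original slides.

```python
from collections import defaultdict
from itertools import product as cartesian

# (store, product_key, amount), taken from the cross-tab.
facts = [
    (1, 1, 454), (1, 4, 925),
    (2, 1, 468), (2, 2, 800),
    (3, 1, 296), (3, 3, 240),
    (4, 1, 652), (4, 3, 540), (4, 4, 745),
]

def cube(facts):
    """Aggregate over every subset of (store, product_key).

    'ALL' marks a dimension that has been aggregated away.
    """
    totals = defaultdict(int)
    for store, product_key, amount in facts:
        # Each fact contributes to 4 cells: (s,p), (s,ALL), (ALL,p), (ALL,ALL).
        for s, p in cartesian((store, "ALL"), (product_key, "ALL")):
            totals[(s, p)] += amount
    return dict(totals)

result = cube(facts)
for (store, product_key), total in result.items():
    print(store, product_key, total)
```

The result has 18 rows: 9 cells, 4 store sub-totals, 4 product sub-totals, and 1 grand total.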
[Figure: 3-D data cube with axes Product (DVD, PC, VCR, sum), Quarter (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), and Region, each axis ending in a sum slice]
Lattice of group-bys:
product,store,quarter
product,quarter   store,quarter   product,store
quarter   product   store
none
Computation Directives
Smallest-parent
Cache-results
Amortize-scans
Share-sorts
Share-partitions
Lattice of group-bys:
product,store,quarter
product,quarter   store,quarter   product,store
quarter   product   store
none
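The smallest-parent directive can be read as: compute each group-by from the smallest group-by one level up in the lattice that contains it. The sketch below assumes that reading. The row counts in `sizes` are hypothetical, since real sizes depend on the data, and `fs`, `parents`, and `smallest_parent` are illustrative names.

```python
from itertools import combinations

DIMS = ("product", "store", "quarter")

def fs(*names):
    return frozenset(names)

# Hypothetical row counts for each group-by in the lattice.
sizes = {
    fs("product", "store", "quarter"): 6000,
    fs("product", "quarter"): 800,
    fs("store", "quarter"): 200,
    fs("product", "store"): 3000,
    fs("quarter"): 4,
    fs("product"): 200,
    fs("store"): 50,
    fs(): 1,
}

def parents(view):
    """Group-bys with exactly one more dimension than `view`."""
    return [v for v in sizes if view < v and len(v) == len(view) + 1]

def smallest_parent(view):
    return min(parents(view), key=lambda v: sizes[v])

# Walk the lattice top-down, skipping the top node (it has no parent).
plan = {}
for k in range(len(DIMS) - 1, -1, -1):
    for combo in combinations(DIMS, k):
        view = frozenset(combo)
        plan[view] = smallest_parent(view)

for view, parent in plan.items():
    print(sorted(view) or "none", "<-", sorted(parent))
```

With these sizes, (product) is computed from (product, quarter) rather than from the larger (product, store), and the empty group-by is computed from (quarter).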
2^n views for n dimensions (no hierarchies)
Storage/update-time explosion
More pre-computation doesn't mean better performance!
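The 2^n count follows from the fact that every subset of the dimensions is a possible group-by. A short sketch that enumerates the views for the three dimensions used in these slides and shows how the count grows; `all_views` is an illustrative helper.

```python
from itertools import combinations

def all_views(dims):
    """Every subset of the dimensions is one possible group-by (view)."""
    return [combo
            for k in range(len(dims), -1, -1)
            for combo in combinations(dims, k)]

views = all_views(("product", "store", "quarter"))
for view in views:
    print(view or "none")
print(len(views))  # 8 == 2**3

# The number of views doubles with every added dimension.
for n in (3, 5, 10, 20):
    print(n, 2 ** n)
```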
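Lattice of group-bys:
product,store,quarter
product,quarter   store,quarter   product,store
quarter   product   store
none

Benefit of view v relative to the selected set S:
$$B(v, S) = \sum_{u \,:\, u \preceq v,\; C_v(u) < C_S(u)} \bigl( C_S(u) - C_v(u) \bigr)$$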
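The benefit B(v, S) can be evaluated directly on the lattice. The sketch below makes two assumptions that the slide does not state: a linear cost model, where answering a query u from a view w costs the number of rows in w, and C_S(u) taken as the cost of the cheapest view in S that covers u. The row counts are hypothetical, and `fs`, `cost`, and `benefit` are illustrative names.

```python
def fs(*names):
    return frozenset(names)

# Hypothetical row counts for each view in the lattice.
sizes = {
    fs("product", "store", "quarter"): 6000,
    fs("product", "quarter"): 800,
    fs("store", "quarter"): 200,
    fs("product", "store"): 3000,
    fs("quarter"): 4,
    fs("product"): 200,
    fs("store"): 50,
    fs(): 1,
}

def cost(u, S):
    """C_S(u): rows scanned to answer u from the cheapest view in S covering u."""
    return min(sizes[w] for w in S if u <= w)

def benefit(v, S):
    """B(v, S): total saving over all queries u that v answers more cheaply than S."""
    total = 0
    for u in sizes:
        if u <= v and sizes[v] < cost(u, S):  # u is computable from v, and v is cheaper
            total += cost(u, S) - sizes[v]
    return total

top = fs("product", "store", "quarter")
S = {top}  # assume the top view is always materialized

# One greedy step: pick the unselected view with the largest benefit.
best = max((v for v in sizes if v not in S), key=lambda v: benefit(v, S))
print(sorted(best), benefit(best, S))
```

With these sizes, the first greedy pick is (store, quarter): it answers four queries, each 5800 rows cheaper than the top view, for a benefit of 23200.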
Other Issues
Fact+Dimension tables in the DW are views of tables stored in
the sources
Lots of view maintenance problems
correctly reflect asynchronous changes at the sources
making views self-maintainable