You are on page 1of 60

Advanced Database

Data Warehousing

88423007
88423010
88423025
88423031

Overview

12

Dimensional

Modeling

15
5


33

SQL Server

46

58

Database

Archive
Data martData warehouse

Multi-dimensional

MDM, Multi-dimensional Data Model

OLAP, On-Line Analytical Processing


Data mining
Executive InformationDecision Support
ERP, Enterprise Resources Planning

(OLAP)(EIS)

Overview

1970 IBM Mainframe Computers


1980 mini-computer
Mainframe Computers
DB2IMS
IT/IS
tape libraries

DSS EIS DSS


EIS

Gordon Moore

40MB 20GB

Server Windows NT UNIX

LinuxFreeBSD

Internet Intranet

SAPBaanPeopleSoft Oracle

OLTP

(OLTP)
OLTP

OLTP

OLTP

OLTP
OLTP
OLTP

(
)

Metadata(
) metadata
metadata
metadata

metadata OLTP

RDBMS

(data mapping)(data extraction)


(data transformation)(data maintenance)

(OLAP)

1996 IDC ()
220 2.3 90
40() 50
160 400


TeraBytes(TB)
PetaByte(1PB=1024TB)

OLTP

OLTP

OLTP
OLTP

1. Warehousing data outside the operational systems

Operational Systems
(1)

(2)

dynamic

(3)
inactive order

2. Logical transformation of operational data

data
warehouse model

10


(1)

(2)

(3)

(4) table
table table
key

join table

11

3. Physical transformation of operational data

(1) Operational terms transformed into uniform business


terms
cust
cust_id cust_no

(2) Single physical definition of an attribute

(3) Consistent use of entity attribute values


Male
FemaleMF

(4) Issues associated with default and missing values

<1>
<2><3>

4.

Business view summarization of data

query report
query
summary views query

12

13

Operational Data Base / External Data Base


Layer
Information Access Layer
Data Access Layer
Data Directory (Metadata) Layer
Process Management Layer
Application Messaging Layer
Data Warehouse Layer
Data Staging Layer

3-1
3.1) / Operational Data Base / External Data
Base Layer

14

(information superhighway)

3.2) Information Access Layer

ExcelLotus 1-2-3FocusAccessSAS

3.3) Data Access Layer

SQL
/

DBMSs

(universal data access)

2.5) Data Directory (Metadata) Layer

metadataMetadata

metadata

15

2.6) Process Management Layer

2.7) Application Messaging Layer

middleware

2.8) Data Warehouse (Physical) Layer


,
virtual view,

table client/server
main frames
2.9) Data Staging Layer
data staging layer
(copy management)(replication management)

16

Dimensional Modeling

Dimensional modeling
end-user

THE CASE FOR DIMENSIONAL MODELING

What is Entity-Relationship Modeling?

ERDEntity-Relationship Modeling
relation

instances

tables

1980 1980 Chris Date


table
tables
ERD
ERD ERD

table
ERD

4-1
ERD 22 entities
tables tables
ERD
entities ERP SAP entities
entities implement

17

ERD ERD ERD


GUI ERD

ERD ERD retrieval


retrieval
4-1

dimensional

natural
dimensional

18

What is Dimensional modeling?

Dimensional modeling
dimensional
relational model
dimensional model multipart key table
fact table tables dimension tables
dimension table fact table multipart
key 4-2
4-2

star joindimensional
modeling ERD 1960 General Mills
and Dartmouth University facts dimensions

19

1970 Nielsen Marketing Research


1970 1980
Nielsen 1984
Fact table
multipart fact tables
facts facts record keys
4-2 facts Dollars SoldUnits Sold
Dollars Cost fact table facts
retrieve
fact table
records Dimension tables
fact table dimension
SQL
answer set row headers source 4-2
Product table Flavor
Promotion table AdType 4-2
power dimension tables Dimension tables
entry points 4-2
end users
instances business

The Relationship between Dimensional Modeling and EntityRelationship Modeling

4-1 4-2
star join star join
dimensional modeling entity-relationship
entity-relationship diagram
multiple fact table diagrams
ERD ERD Sales CallsOrder EntryShipment
Invoices Customer Payments Product Returns
diagram ERD
diagram processes
ERD dimensional modeling diagrams
ERD model
ERD additive facts

20

fact tables tables


flat tables single-part keys fact tables
tables dimension tables dimension table
fact table dimension table schemas
dimensional models dimension
table conformed
dimensional model 10 25
star-join schemas star join 5
15 dimension tables dimension tables
fact table fact table drill down
applications star join dimension
attributes SQL drill across applications conformed
dimensions fact tables
dimensional model star-join schemas
fact table

The Strengths of Dimensional Modeling

ERD dimensional
model
dimensional model report
writers query tools interfaces dimensional
model
dimension table bit
vector indexes dimension attributesMetadata

optimizer dimension tables


fact table dimension table keys
constraints fact
table join fact table n-way joins DBA
n-way join
n-way joins dimensional model
presentation performance

21

star join schema


dimension dimensions
fact table entry points
query patterns SQL

dimensional model
tables
table SQL ALTER TABLE

applications
4-2 1 4

1. facts fact table

2. dimensions dimension
fact
3. dimensional attributes
4. dimension records certain point
granularity level
modeling
report writersquery tools
dimensions
constant dimension
dimensional modeling dimensions
dimensions
facts

facts Pay-in-advance

fact table factless

22


implementation

fact
dimension tables dimensional model
dimensional

PUTTING DIMENSIONAL MODELS TOGETHER THE DATA


WAREHOUSE BUS ARCHITECTURE

defend

Data Warehouse Bus Architecture


data mart
data mart
data mart

The Planning Crisis

table

data warehouse manager

data mart data mart

23

data mart data mart

data mart

data mart

Data Marts with a Bus Architecture

overall data architecture phase


data mart
data mart data
mart

implementation

legacy system

data mart implement

Conformed Dimensions and Standard Fact Definitions

data mart facts


dimensions Data Warehouse Bus
Architecture data mart dimensional
key star join
conformed dimension join dimension
fact table conformed dimension
data mart
conformed

24

dimensions conformed dimension


conformed customer dimension customer key
table
conformed dimension
conformed dimensions
conformed dimensions
conformed dimensions

dimension table fact


tables
dimension

Designing the Conformed Dimensions

conformed dimensions
conformed dimensions customer dimension
product dimension
time dimension conformed
dimensions legacy

Establishing the Conformed Fact Definitions

conformed dimensions data marts


80% 20%
conformed fact
conformed dimensions
data mart drill across
data mart single reports conformed fact
conformed facts
facts conformed fact
dimension context data mart

25

units
fact tables fact

dimensional drill-across
report
dimension table
overhead fact
fact manufacturers table manufacturers table view
conform fact

Billing Cycle Revenue

The Importance of Data Mart Granularity

Conformed dimensions
data mart base level
fact tables dimensions
Granular fact table data power Data mart
Gracefully extended
dimensional modeling approach facts
dimension attribute dimensions
queries applications fact table drop
reload key

Multiple-Source Data Marts

data mart
user data mart
data mart data mart
roll up
roll up
data mart

26


Activity-Based Profitability multiple source data mart
single-source data mart data
mart

When You Dont Need Conformed Dimensions

The Data Warehouse Bus

Conformed dimensions conformed facts busbus


bus
powerbus

section conformed dimensions conformed


facts bus data mart
bus power bus dimensions facts
data mart

BASIC DIMENSIONAL MODELING TECHNIQUES

dimensional modeling

Fact Tables and Dimension Tables

dimensional modeling cube


cube cell measured value cube
edge cube hypercube

27

dimensional model 4~15


20

Facts

Fact

Fact table

Fact

Fact

Dimension Foreign Key

Attributes

fact attributes

Dimensions

dimensions text-like attributes


dimensions
dimensions dimension 1000 product 100
store dimension product-store

28

product-store dimension 100000 dimensions


dimensions
product-store dimension 2000

count distinct operations

Dimensions table

Fact
.
Fact

Dimension Fact

Fact

Dimension hierarchies
Primary Key Dimension Key

Star
Dimension Fact Dimension

Inside dimension tables drilling up and down

drilling down
drilling down drilling
dimensional model dimension table attribute
attribute application
dimension table attribute
dimension table attribute

29

SQL select group by order by


dimension product dimension

attribute
Roll up row header
join query high level low level bringing back

browsing dimension table attribute


browser dimension table

dimension table browse request


browse

Permissible Snowflaking

browse
DB
bitmap indexes

The Importance of a High-Quality Verbose Attribute

dimension attribute
dimension table dimension

attribute browse

dimension table attributes

30

verbose ()
descriptive ()
complete ()
quality assured ()
indexed ( B-tree bitmap )
documented ( attribute )

fact table
fact table 4-3

4-3

31


1.organization name
2.post name
3.post name abbreviation
4.role
5.street name
6.street type
... ...
... ...
... ...

Degenerate Dimensions

fact table
line item fact
table dimension table

dimension
order line
item table dimension order
line-item fact table order number order
number attribute degenerate dimension

Junk Dimensions
(flag) text

attribute

flag text attribute fact table fact


table
flag attribute dimension

flag attribute flag and attribute

32


flag and attribute flag and attribute
junk dimensions

Foreign KeysPrimary Keys and Surrogate Keys

keys surrogate key


key
4-byte
integer surrogate key

Surrogate Sate Keys

SQL-based stamp join fact table and time dimension


table key
1. 4-byte SQL-based stamp 8-byte
2. join time table fact table date key

Avoid Smart Keys

smart keys
attribute dimension primary key join

fact table
dimensional schema
1.the
2.the
3.the
4.the

data mart
fact table grain fact table
dimensions
facts fact

Step1.

33

Step 2. fact table fact tables


fact table
what is a fact record, exactly ?"
fact table

fact record
fact record
ATM fact record

fact record
fact record
fact record

fact table grain

Step 3. dimensions
fact table grain
grain

Step 4. fact
fact fact table grainfact table grain
fact fact table fact

34

DSS
DSS

DSS

SQL
SQL

Star Schema

Star Schema
Star Schema table
join path
Schema

1. Star Schema
data modeling Star Schema

Star Schema
Star Schema

35

metadata
Star Schema

2. Star Schema
Star Schema table Fact tables Dimension
tables Fact tables major tables,
Dimension tables minor tables
SQL fact table dimension
table join paths
fact table
Dimension table
fact table
schema dimension table

Star Schema

1.Simple Star Schemas


table
Simple Star Schema fact table
table
table SQL

5-1 fact table dimension table


fact table dimension table fact table
KEY1KEY2KEY3
dimension table

36

5-1

5-2 Simple Star Schema fact


table Product_IdPeriod_IdMarket_Id
dimension table
5-2

fact table dimension table


Product Table

Sales Table

2.Multiple Fact Tables


fact table
fact table

37

fact
table fact table

5-3 fact table


5-3

Fact table

fact table
5-4 fact table

schema

38

5-4

5-5

14,000 120
208
fact table Sales dimension
tableProductPeriod Market 5-5 Sales fact table
ProductPeriod Market
dimension table Sales table

39

simple star schemaSales table

dimension table
dimension table

dimension table

SELECT market_desc FROM market


120 product period
table
WHERE
Kraft WHERE
product table Kraft
dimension table fact table SELECT DISTINCT
fact table

dimension table
product table 1400
Dimension table

brand 14000 brand


brand ClassFlavorSize
Manufacturer Product table

drilling down

40

rolling up drill down roll up

(perspective)

drill down
Drilling
down
star schema dimension table

5-6
drill-down roll-up
5-6

star schema

41

AGGREGATION

Aggregation

CPU cycle

5-7

5-7

42

5-8 fact
table 10,000,000
9,000,000
1,000,000
(sparse
aggregation)

5-8

GB
5-9

Star schema

star schema
star schema

43

star schema Star schema


star schema
dimension table
productperiod market sales fact table
5-9

dimension table star


schema
star schema

star join
fact table

star schema
5-10 multi-star schema
Frequent_StayersActuals Bookings table

44

5-10

1.

45

5-10 multi-star schema


fact table Booking tableActuals
table Promotion Schedule table

5-11

2.

46

5-11

schema aggregation

47

SQL Server

TOYOTA OLTP

Models
Model
Tercel
Premio
Corolla
Camry
Solemio
Zace
Avalon
Lexus

Price
450000.0000
600000.0000
700000.0000
1000000.0000
750000.0000
500000.0000
1500000.0000
2000000.0000

Import

Retailers
Retailer

Region

Taipei
Chungli
Toayuan
Tainan

North
North
North
South

48

Kaoshiung
Taidong

South
South

Sales
OrderNo

Retailer

Model

SalesDate

1
2
3
4
5
6
7
8
9
10
:
:
49871
49872
49873
49874
49875
49876
49877

Taipei
Kaoshiung
Tainan
Chungli
Taidong
Taoyuan
Tainan
Chungli
Taoyuan
Taidong
:
:
Tainan
Taipei
Taipei
Taoyuan
Taipei
Kaoshiung
Taipei

Tercel
Solemio
Lexus
Premio
Tercel
Zace
Solemio
Solemio
Avalon
Premio
:
:
Avalon
Solemio
Premio
Avalon
Avalon
Premio
Zace

1999-12-25
1998-11-28
1998-11-15
1998-05-19
1999-02-28
1998-04-19
1999-02-23
1999-01-17
1998-04-02
1999-05-15
:
:
1998-07-23
1999-12-08
1999-12-26
1999-08-02
1999-03-18
1998-07-27
1999-03-06

Qty
1
2
1
2
2
1
1
1
1
1
:
:
1
1
2
2
2
2
2

Decision Support

OLTP

SQL Server

49

Dimensional Model
Star SchemaFact tableDimension table

1 Dimensional Model
(1)

Model
Time

Retailer

(2)
/

Model

Region

Retailer

Year

Month

(3) Star Schema


Fact Table
Fact table Fact
sYear
sMonth
Retailer
Model
Quantity
TotalPrice
50

Fact Table
Dimension tables
Dimension ModelRetailerTime

Retail_Dimension

Ret
ai

Sales_Fact
sYear
sMonth

Time_Dimension

Retailer

Model_Dimension

Mo
d

Model
Star Schema

(1) Dimension Table Dimension Table 1

Retail_Dimension
CREATE TABLE Retail_Dimension
( Retailer varchar(12) PRIMARY KEY,
Region varchar(12)
)

51

sYear
sMonth

INSERT Retail_Dimension
SELECT Retailer, Region FROM Retailers

Model_Dimension
CREATE TABLE Model_Dimension
( Model varchar(12) PRIMARY KEY,
Import varchar(8)
)

INSERT Model_Dimension
SELECT Model, Import FROM Models

Time_Dimension 2
CREATE TABLE Time_Dimension
( sYear int,
sMonth int,
CONSTRAINT pk_Y_M PRIMARY KEY ( sYear, sMonth )
)

INSERT Time_Dimension
SELECT DISTINCT
YEAR( SalesDate ), MONTH( SalesDate )
FROM Sales
(2) Fact Table

Sales_Fact
CREATE TABLE Sales_Fact
( sYear int,

52

sMonth int,
Retailer varchar(12),
Model varchar(12),
Quantity int,
TotalPrice money,
CONSTRAINT pk_sY_sM_R_D
PRIMARY KEY ( sYear, sMonth, Retailer, Model )
)
INSERT Sales_Fact
SELECT Retailer, s.model, YEAR( SalesDate ),MONTH( SalesDate ),
SUM( Qty ), SUM( Qty * m.price ) FROM Sales s JOIN Models m
ON s.model = m.model
GROUP BY Retailer, s.model, YEAR(SalesDate), MONTH(SalesDate)
1 Microsoft Transact-SQL
2 Time_Dimension
Fact Table Time_Dimension

53

SELECT SUM(Quantity) AS ', 1999 , '


FROM Sales_Fact s JOIN Retail_Dimension r
ON s.Retailer = r.Retailer
JOIN Model_Dimension m
ON s.Model = m.Model
WHERE r.Region = 'North'
AND s.sMonth = 5 AND s.sYear = 1999
AND m.Import = ''

SELECT SUM(TotalPrice) AS '1998, , '


FROM Sales_Fact
WHERE sYear = 1998
AND Retailer = 'ChungLi'

SELECT sMonth AS '', Retailer AS '', Model AS '',


SUM(Quantity) AS ''
FROM Sales_Fact
WHERE sYear = '1999'
GROUP BY sMonth, Retailer, Model

54

TOYOTA

2000
2000

Sales_Fact

SQL

INSERT Sales_Fact
SELECT YEAR( SalesDate ), MONTH( SalesDate ),
Retailer, s.model, SUM( Qty ), SUM( Qty * m.price )
FROM Sales s JOIN Models m
ON s.model = m.model
WHERE
MONTH(SalesDate) = MONTH(DATEADD(mm,-1,GETDATE()))
AND YEAR(SalesDate) = YEAR(DATEADD(mm,-1,GETDATE()))
GROUP BY YEAR(SalesDate), MONTH(SalesDate), Retailer, s.model

SQL Server Data Transformation Services DTS


Packages

55

Connection

Connection Properties
(1) Connection Name(2) Data Source(3) Server(4) Database

56

Execute SQL Task

Execute SQL Properties


(1) Description(2) Connection(3) SQL statement

57

DTS Package

Schedule Package

58

Recurring Job Schedule

59

60

You might also like