The data warehouse
[Figure: data warehouse architecture. Operational source systems and external sources are fed through Extract, Transform and Load into the data warehouse, which serves the data marts. The data marts are used for Analysis/OLAP, Query/Reporting and Data mining. Label in the figure: Data presentation area (RK).]
Extraction (E)
Transformation (T)
Load (L)
indexing
Extraction means reading and understanding the source data and copying the data needed for the data warehouse into a staging area for further manipulation, i.e. transformation.
data conversion/transformation (specify transformation rules to convert to a common data format and common terms/semantics)
data cleaning/cleansing
data scrubbing (use domain-specific knowledge, e.g. postal addresses, to check the data)
data auditing (discover suspicious patterns, discover violations of stated rules)
combining data from multiple sources
assigning warehouse (surrogate) keys
data aggregation
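The transformation tasks listed above can be sketched in a few lines of Python. This is only an illustration under assumptions: the two source layouts, the field names, the audit rule ("city is present and amount is not negative") and the sample rows are all hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical rows from two source systems with different layouts.
source_a = [
    {"cust": "anna n", "city": "STOCKHOLM", "amount": "25.0"},
    {"cust": "anna n", "city": "STOCKHOLM", "amount": "5.0"},
]
source_b = [
    {"customer_name": "Erik P", "town": "Rattvik", "sum": 89},
    {"customer_name": "Danny B", "town": "", "sum": 12},
]

def to_common(name, city, amount):
    """Conversion: one common format and common terms for every source."""
    return {"customer": name.strip().title(),
            "city": city.strip().title(),
            "amount": float(amount)}

def audit(row):
    """Auditing: check a stated rule (assumed here: city present, amount >= 0)."""
    return bool(row["city"]) and row["amount"] >= 0

def transform(rows_a, rows_b):
    # Combine data from multiple sources after converting each one.
    rows = [to_common(r["cust"], r["city"], r["amount"]) for r in rows_a]
    rows += [to_common(r["customer_name"], r["town"], r["sum"]) for r in rows_b]

    clean = [r for r in rows if audit(r)]
    rejected = [r for r in rows if not audit(r)]

    # Assign surrogate keys: warehouse-generated integers, one per customer.
    keys = {}
    for r in clean:
        r["customer_key"] = keys.setdefault(r["customer"], len(keys) + 1)

    # Aggregate: total amount per surrogate key.
    totals = defaultdict(float)
    for r in clean:
        totals[r["customer_key"]] += r["amount"]
    return clean, rejected, dict(totals)

clean, rejected, totals = transform(source_a, source_b)
print(totals)         # {1: 30.0, 2: 89.0}
print(len(rejected))  # 1 (the row with an empty town)
```

In a real staging area each step would usually be a separate, restartable job; they are collapsed into one function here only to keep the sketch short.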
A debate question:
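[Figure: example ETL flow. A flat file is loaded through DB2Connect into DB2 table(s), processed (SQL, C++ ??) into a preliminary DB2 target DW, and then, with aggregation added by a new program, into the final DB2 target DW. Note in the figure: some cleansing and scrubbing may be needed along the way. The steps are labelled C, D, E, E.]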
Data marts
What is OLAP?
Dimensional modelling vs. 3 NF modelling
Data Marts
ROLAP/MOLAP servers
What is OLAP?
Acronym for On-line analytical processing
A decision support system (DSS) that supports ad-hoc querying, i.e. enables managers and analysts to interactively manipulate data. The idea is to allow the users to easily and quickly manipulate and visualise the data through multidimensional views, i.e. from different perspectives.
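A minimal sketch of what "multidimensional views" means in practice: the same facts are summed along different dimensions on request. The rows are taken from the fact table in the dimensional modelling example of these slides, with seller keys resolved to offices; the function name `view` and the tuple layout are assumptions made for this illustration.

```python
from collections import defaultdict

# (service, office, date, number_of_calls), from the slides' fact table.
facts = [
    ("Local call", "Sundsvall", "991011", 3),
    ("SMS", "Sundsvall", "991011", 1),
    ("Intern. call", "Kista", "991011", 1),
    ("Local call", "Kista", "991011", 1),
    ("WAP", "Kista", "991012", 1),
]

# Position of each dimension inside a fact tuple.
DIMENSIONS = {"service": 0, "office": 1, "date": 2}

def view(rows, *dims):
    """Sum the number of calls, grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)

by_service = view(facts, "service")
by_office = view(facts, "office")
by_office_and_service = view(facts, "office", "service")

print(by_office)  # {('Sundsvall',): 4, ('Kista',): 3}
```

An OLAP server does the same kind of regrouping, but over precomputed or indexed aggregates so that switching perspective stays interactive.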
[Figure: the facts shown as a cube whose axes are the dimensions Service (product), Quarter and Office.]
Kimball: Dimensional modelling
Dimensional modelling
Service Dimension
Key  Service       Service group
S1   Local call    Group A
S2   Intern. call  Group A
S3   SMS           Group B
S4   WAP           Group C

Time Dimension
Date/Key  Month  Quarter  Year
991011    9910   4 - 99   99
991012    9910   4 - 99   99

Sales Dimension
Key  Seller    Office
F11  Anders C  Sundsvall
F12  Lisa B    Sundsvall
F13  Janis B   Kista

Customer Dimension
Key   Customer  Address    Region     Income group
C210  Anna N    Stockholm  Stockholm  B
C211  Lars S    Malmö      Skåne      B
C212  Erik P    Rättvik    Dalarna    C
C213  Danny B   Stockholm  Stockholm  A
C214  Åsa S     Stockholm  Stockholm  A

Facts (each dimension row relates to 0..* fact rows)
Customer key  Service key  Sales key  Date key  Sum    Number of calls
C210          S1           F11        991011    25:00  3
C210          S3           F11        991011    05:00  1
C212          S2           F13        991011    89:00  1
C213          S1           F13        991011    12:00  1
C214          S4           F13        991012    08:00  1
Dimensional modelling
Query: For how much did customers in Sthlm use service Local call in October 1999?
[Figure: the same star schema as on the previous slide (Service, Time, Sales and Customer dimensions around the fact table), with the matching rows marked.]
Matching fact rows: (C210, S1, F11, 991011) with sum 25:00 and (C213, S1, F13, 991011) with sum 12:00.
Result: 25:00 + 12:00 = 37:00
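The query on this slide can be traced as a star join: filter each dimension, then sum the fact rows whose keys point at the surviving dimension rows. The sketch below uses the values from the slide's tables; the dictionary layout and the function name `total_sum` are assumptions for illustration, and the sums (25:00, 05:00, ...) are stored as plain whole numbers because the slide does not state their unit.

```python
# Dimension tables keyed by warehouse key (values copied from the slide).
customer_region = {
    "C210": "Stockholm", "C211": "Skåne", "C212": "Dalarna",
    "C213": "Stockholm", "C214": "Stockholm",
}
service_name = {"S1": "Local call", "S2": "Intern. call", "S3": "SMS", "S4": "WAP"}
date_month = {"991011": "9910", "991012": "9910"}

# Fact rows: (customer key, service key, sales key, date key, sum, number of calls).
facts = [
    ("C210", "S1", "F11", "991011", 25, 3),
    ("C210", "S3", "F11", "991011", 5, 1),
    ("C212", "S2", "F13", "991011", 89, 1),
    ("C213", "S1", "F13", "991011", 12, 1),
    ("C214", "S4", "F13", "991012", 8, 1),
]

def total_sum(region, service, month):
    """Sum the fact measure over rows whose dimension attributes match."""
    return sum(
        amount
        for cust, serv, _seller, date, amount, _calls in facts
        if customer_region[cust] == region
        and service_name[serv] == service
        and date_month[date] == month
    )

result = total_sum("Stockholm", "Local call", "9910")
print(result)  # 37, i.e. 25:00 + 12:00 on the slide
```

The point of the dimensional layout is visible in the filter: every condition is a lookup from one fact key into one dimension table, with no chains of joins between the dimensions themselves.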
3 NF modelling
Dimensional modelling
Data marts
[Figure: two data marts, Calls and Subscription orders, each built around the dimensions Service, Quarter and Office.]
[Figure: data marts (Orders, Production) and the dimensions they use: Time, Sales Rep, Customer, Promotion, Product, Plant, Distr. Center.]
[Figure: data warehouse architecture with OLAP servers. Operational source systems and external sources pass through Extract, Transform, Load and Refresh into the data warehouse and the data marts; OLAP servers serve the data to Analysis, Query/Reporting and Data mining.]
What is metadata?
Data about data/Information about data
Maintenance: establish processes to synchronise metadata with the changing data structure
Deployment: provide metadata to users in the right form and with the right tools
Business metadata (business terms and definitions, ownership of data)
Operational metadata (information collected during the operations of the DW, e.g. usage statistics, error reports)
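To make the two kinds of metadata concrete, here is a small sketch of a catalogue entry for one warehouse table. The class name `TableMetadata`, its fields and the sample values are hypothetical; the slides do not prescribe any particular structure.

```python
from dataclasses import dataclass, field

@dataclass
class TableMetadata:
    """Hypothetical catalogue entry describing one warehouse table."""
    table_name: str
    # Business metadata: terms, definitions and ownership.
    business_term: str
    definition: str
    owner: str
    # Operational metadata: collected while the warehouse is running.
    query_count: int = 0
    error_reports: list = field(default_factory=list)

    def record_query(self):
        """Usage statistics: count one more query against the table."""
        self.query_count += 1

    def record_error(self, message):
        """Keep error reports from loads or queries."""
        self.error_reports.append(message)

entry = TableMetadata(
    table_name="calls_fact",
    business_term="Call",
    definition="One row per customer, service, seller and date",
    owner="Sales department",
)
entry.record_query()
entry.record_query()
entry.record_error("load rejected 1 row")
print(entry.query_count)  # 2
```

Business fields change when definitions or ownership change, while the operational fields are updated by the warehouse itself, which is why the maintenance process has to keep both in step with the data structure.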
[Figure: example report layout. Row headers: Product Group (Group A, Group B) and Region (ABC, XYZ). Column headers acting as join constraints: month, quarter; office, region. One column header acts as an application constraint.]
Operational
Analytical
OMG's standards
Meta Object Facility (MOF)
M3 layer: Meta-metamodel
M2 layer: Metamodel
M1 layer: Model
M0 layer: Instances (e.g. Helen Nagy, Invoice no 34)
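The layering can be loosely illustrated with Python's own object model. This is an analogy only, not part of MOF: an object plays the role of an M0 instance, its class the role of an M1 model element, and the built-in `type` the role of the language construct that describes classes. The class `Invoice` is a hypothetical example built around the slide's "Invoice no 34".

```python
class Invoice:
    """M1 (model): a model element describing invoices."""

    def __init__(self, number):
        self.number = number

# M0 (instance): the concrete "Invoice no 34".
invoice_34 = Invoice(34)

# Walking upwards one layer at a time.
model_element = type(invoice_34)         # the class Invoice (M1-like)
meta_element = type(model_element)       # `type`, which describes classes (M2-like)
meta_meta_element = type(meta_element)   # `type` again: it describes itself (M3-like)

print(model_element.__name__)             # Invoice
print(meta_element is type)               # True
print(meta_meta_element is meta_element)  # True
```

The last line mirrors why the stack can stop at M3: the top layer is defined in terms of itself, so no further layer is needed.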
[Figure: warehouse data flow. Data sources and an operational data store feed ETL into the data warehouse, which feeds data marts used for analysis, reporting, visualization and data mining.]
Why metamodelling?
[Figure: the same process described at three levels.
Meta-metamodel level or reference model: Event, State, Transformation; relationships: consists of, precedes, succeeds.
Metamodel level: Function, Activity, Event, State; relationships: precedes, succeeds.
Model level: Order received; Capture ordered items; Ordered item [captured] / Ordered item captured; Check material on stock; Material on stock [checked]; Material is on stock / Material is not on stock.]
CWM packages
Layer         Packages/Metamodels
Management    Warehouse Process, Warehouse Operation
Analysis      Transformation, OLAP, Data Mining, Information Visualization, Business Nomenclature
Resource      Relational, Record, Multi-Dimensional, XML
Foundation    Business Information, Data Types, Expressions, Keys and Indexes, Type Mapping, Software Deployment
Object Model  Core, Behavioral, Relationships, Instance
[Figure: excerpt of the CWM metamodel as a class diagram. Classes shown: Element, ModelElement, Namespace, Feature, Classifier, StructuralFeature, Class, Attribute, Expression, ProcedureExpression, ColumnSet, NamedColumnSet, Table, Column, QueryExpression, QueryColumnSet, View. Packages labelled: Relational package, Datatype package.]
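The idea behind the diagram, that relational concepts are specialisations of general object model concepts, can be sketched as a Python class hierarchy. The inheritance chain below follows my reading of the CWM specification (the extracted figure no longer shows the arrows), so treat it as an assumption; `add_feature` and `type_name` are simplifications introduced for this example.

```python
# Object model (core) concepts.
class Element:
    pass

class ModelElement(Element):
    def __init__(self, name):
        self.name = name

class Namespace(ModelElement):
    pass

class Classifier(Namespace):
    def __init__(self, name):
        super().__init__(name)
        self.features = []  # a Classifier owns Features

    def add_feature(self, feature):
        self.features.append(feature)
        return feature

class Feature(ModelElement):
    pass

class StructuralFeature(Feature):
    def __init__(self, name, type_name):
        super().__init__(name)
        self.type_name = type_name  # simplified: a plain string, not a Classifier

class Class(Classifier):
    pass

class Attribute(StructuralFeature):
    pass

# Relational package: specialisations of the core concepts.
class ColumnSet(Class):
    pass

class NamedColumnSet(ColumnSet):
    pass

class Table(NamedColumnSet):
    pass

class View(NamedColumnSet):
    pass

class Column(Attribute):
    pass

calls = Table("Calls")
calls.add_feature(Column("number_of_calls", "INTEGER"))

print(isinstance(calls, Classifier))            # True
print(isinstance(calls.features[0], Feature))   # True
```

Because a Table is a Classifier and a Column is a Feature, a tool that only understands the core concepts can still traverse relational metadata, which is what makes interchange between tools possible.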
Object Model         Relational  Record       Multi-Dimensional   XML
Package              Schema      Record file  Schema              Schema
Classifier (Class)   Table       RecordDef    Dimension           Element Type
Feature (Attribute)  Column      Field        Dimensioned Object  Attribute
[Figure: the CWM packages (<<metamodels>>) act as a common representation between the Tool X metamodel and the Tool Z metamodel.]
[Figure: project lifecycle. Project Planning, then Business Requirement Definition, then three parallel tracks: (1) Technical Architecture Design, Product Selection & Installation; (2) Dimensional Modeling, Physical Design, Data Staging Design & Development; (3) End-User Application Specification, End-User Application Development. The tracks meet in Deployment, followed by Maintenance and Growth. Project Management runs alongside all phases.]
ARCHITECTURE AREA: Data | Back room | Front room | Infrastructure

Business reqs and audit
  Data: Info needed for better decisions. Enterprise models.
  Back room: How get, transform, make available data.
  Front room: Major business issues. How measure. How analyse.
  Infrastructure: HW/SW capabilities needed vs what we have.
Architecture models and documents
  Data: Focal events, facts, dimensions. Dimensional models.
  Back room: Capabilities needed to get and transform data. Major data stores.
  Front room: Users' needs. Major classes of analyses. Priorities.
  Infrastructure: Where is data coming from. Calc and storage reqs.
Detailed models and specs
  Data: Logical and physical models. Domains, derivation rules.
  Back room: Standards, prods to provide capabilities. How hook together.
  Front room: Report layouts, derivation. For whom, when.
Implementation
  Data: DB, indexes, backup ...
  Back room: Write extracts, loads. Automate process.
  Front room: Implement report and analysis env. Build rpt. Train users.