Data Warehouse 567
Data Warehousing
1. Basic concepts of data warehousing
2. Data warehouse architectures
3. Some characteristics of data warehouse data
4. The reconciled data layer
5. Data transformation
6. The derived data layer
7. The user interface
Motivation
Chapter 1
Definition
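Data Warehouse:
A subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process
(W.H. Inmon)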
Data Warehousing:
The process of constructing and using a data warehouse
Data Warehouse: Subject-Oriented
Organized around subjects
Data Warehouse: Integrated
Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records
Data Warehouse: Time-Variant
Data carry a time element.
1. Two-Level Architecture
2. Independent Data Mart
3. Dependent Data Mart and Operational Data Store
4. Logical Data Mart and @ctive Warehouse
5. Three-Layer Architecture
All involve some form of extraction, transformation and loading (ETL)
Figure: extract (E), transform (T), and load (L) into one, companywide warehouse
Periodic extraction: data is not completely current in warehouse
Data marts: mini-warehouses, limited in scope
Separate ETL for each independent data mart
Figure 11-4: Dependent data mart with operational data store
Single ETL for enterprise data warehouse (EDW)
Figure 11-5: Logical data mart and @ctive data warehouse
Near real-time ETL for @ctive Data Warehouse
Three-layer architecture
Reconciled and derived data
Data Characteristics
Status vs. Event Data
Figure 11-7: Example of DBMS log entry (labels: Status, Status)
Data Characteristics
Transient vs. Periodic Data
Figure 11-8: Transient operational data
Changes to existing records are written over previous records, thus destroying the previous data content
Data Characteristics
Transient vs. Periodic Data
Figure 11-9: Periodic warehouse data
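The contrast between transient and periodic data can be sketched in a few lines of Python. This is only an illustration under assumed names: the record layout (`key`, `value`, `as_of`) and the helper functions are hypothetical and do not come from the figures.

```python
from datetime import date

def update_transient(table, key, value):
    """Transient: the new value overwrites the old one, so history is lost."""
    table[key] = value

def update_periodic(table, key, value, as_of):
    """Periodic: each change is appended with a date, so earlier states survive."""
    table.append({"key": key, "value": value, "as_of": as_of})

def value_as_of(table, key, when):
    """Return the latest value recorded for `key` on or before `when`, else None."""
    rows = [r for r in table if r["key"] == key and r["as_of"] <= when]
    return max(rows, key=lambda r: r["as_of"])["value"] if rows else None

operational = {}   # transient operational store (hypothetical)
warehouse = []     # periodic warehouse store (hypothetical)
for day, balance in [(date(2024, 1, 1), 100), (date(2024, 1, 5), 250)]:
    update_transient(operational, "001", balance)
    update_periodic(warehouse, "001", balance, day)
```

After both updates the operational store holds only the latest balance, while the periodic store can still answer what the balance was on an earlier date.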
New descriptive attributes
New business activity attributes
New classes of descriptive attributes
Descriptive attributes become more refined
Descriptive data are related to one another
New source of data
Data Reconciliation
Typical
After
Scrub, or data cleansing
Transform
Load and Index
ETL = Extract, transform, and load
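A minimal end-to-end sketch of the extract, transform, and load-and-index steps, using only the Python standard library. The CSV content, the `customer` table, and the cleaning rules are assumptions made up for illustration; a real ETL process would read from operational systems and apply far richer rules.

```python
import csv
import io
import sqlite3

# Hypothetical source data with inconsistent spacing and capitalization.
SOURCE_CSV = "id,name,amount\n1, alice ,10.50\n2,BOB,3\n"

def extract(text):
    """Extract: read the source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean names and convert fields to target types."""
    return [
        (int(r["id"]), r["name"].strip().title(), float(r["amount"]))
        for r in rows
    ]

def load(conn, rows):
    """Load and index: insert the rows, then build an index on name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_customer_name ON customer(name)")
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT id, name, amount FROM customer ORDER BY id").fetchall()
```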
Incremental extract = capturing changes that have occurred since the last static extract
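One way to picture an incremental extract is to filter source rows by a change timestamp. The sketch below assumes each source row carries a hypothetical `modified_at` field; actual systems may instead read a DBMS log or use another change-capture mechanism.

```python
from datetime import datetime

def static_extract(source_rows):
    """Static extract: a full snapshot of the source at one point in time."""
    return list(source_rows)

def incremental_extract(source_rows, last_extract_time):
    """Incremental extract: only rows changed after the previous extract."""
    return [r for r in source_rows if r["modified_at"] > last_extract_time]

# Hypothetical source rows with a change timestamp.
source = [
    {"id": 1, "modified_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "modified_at": datetime(2024, 1, 2, 9, 0)},
    {"id": 3, "modified_at": datetime(2024, 1, 3, 9, 0)},
]
last_run = datetime(2024, 1, 2, 0, 0)
changed = incremental_extract(source, last_run)
```

Here only the rows modified after `last_run` are captured, whereas the static extract returns all three.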
Record-level:
Selection: data partitioning
Joining: data combining
Aggregation: data summarization
Field-level:
Single-field: from one field to one field
Multi-field: from many fields to one, or one field to many
Data Transformation
Record-level functions
Selection: data partitioning
Joining: data combining
Normalization
Aggregation: data summarization
Field-level functions
Single-field transformation: from one field to one field
Multi-field transformation: from many fields to one, or one field to many
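The record-level functions listed above (selection, joining, aggregation) can be sketched on plain Python lists. The `orders` and `customers` data and the helper names are hypothetical and exist only to show what each function does.

```python
from collections import defaultdict

# Hypothetical source records.
orders = [
    {"order_id": 1, "cust_id": "C1", "region": "East", "amount": 120.0},
    {"order_id": 2, "cust_id": "C2", "region": "West", "amount": 80.0},
    {"order_id": 3, "cust_id": "C1", "region": "East", "amount": 50.0},
]
customers = {"C1": "Acme", "C2": "Bolt"}

def select(rows, predicate):
    """Selection: partition the data by keeping rows that match a condition."""
    return [r for r in rows if predicate(r)]

def join(rows, lookup, key, new_field):
    """Joining: combine each row with data from a second source."""
    return [{**r, new_field: lookup[r[key]]} for r in rows]

def aggregate(rows, group_field, value_field):
    """Aggregation: summarize detail rows into one total per group."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[group_field]] += r[value_field]
    return dict(totals)

east = select(orders, lambda r: r["region"] == "East")
named = join(orders, customers, "cust_id", "cust_name")
by_customer = aggregate(named, "cust_name", "amount")
```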
Table lookup: another approach
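A table lookup used as a single-field transformation might look like the sketch below. The status codes, the `STATUS_LOOKUP` table, and the field names are invented for illustration.

```python
# Hypothetical lookup table keyed by the source system's code.
STATUS_LOOKUP = {"A": "Active", "I": "Inactive", "P": "Pending"}

def transform_status(source_row, lookup=STATUS_LOOKUP, default="Unknown"):
    """Single-field transformation: replace a source code with its target value."""
    target = dict(source_row)            # copy so the source row is untouched
    code = target.pop("status_code")     # remove the source field
    target["status"] = lookup.get(code, default)
    return target

converted = transform_status({"id": 7, "status_code": "I"})
unmatched = transform_status({"id": 8, "status_code": "X"})
```

Codes missing from the lookup table fall back to a default value here; rejecting such rows instead would be an equally reasonable design choice.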
Derived Data
Objectives
Characteristics
1:N relationship between dimension tables and fact tables
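The 1:N relationship can be illustrated with a tiny star schema in SQLite, driven from Python. The table and column names (`product_dim`, `store_dim`, `sales_fact`) and the sample rows are assumptions for this sketch, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each dimension row can be referenced by many fact rows (1:N).
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key   INTEGER REFERENCES store_dim(store_key),
    units_sold  INTEGER
);
INSERT INTO product_dim VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO store_dim   VALUES (10, 'Austin'), (20, 'Boston');
INSERT INTO sales_fact  VALUES (1, 10, 5), (1, 20, 3), (2, 10, 4);
""")

# Join the fact table to one dimension and summarize the measure.
units_by_product = conn.execute("""
    SELECT p.description, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    GROUP BY p.description
    ORDER BY p.description
""").fetchall()
```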
Granularity
What level of detail do you want?
Duration of the database
Size
in a store on a date.
Receipts - facts about the receipt of a product from a vendor to a warehouse on a date.
Two separate product dimension tables have been created.
One date dimension table is used.
The two situations:
To track events
To inventory the set of possible occurrences (called coverage)
Multivalued dimension
Snowflake schema
Disadvantages
Schema less intuitive
Ability to browse through the content difficult
Degraded query performance because of additional joins.
Figure: example snowflake schema
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_key
supplier: supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city: city_key, city, province_or_stre, country
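The extra joins a snowflake schema requires can be seen in a reduced version of the schema, limited to the sales fact, item, and supplier tables. This is a sketch in SQLite from Python; the sample rows are invented, and the table name `sales_fact` is an assumed spelling of "Sales Fact Table".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    supplier_key INTEGER REFERENCES supplier(supplier_key)
);
CREATE TABLE sales_fact (
    item_key     INTEGER REFERENCES item(item_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
INSERT INTO supplier   VALUES (1, 'wholesale'), (2, 'direct');
INSERT INTO item       VALUES (100, 'pen', 1), (101, 'ink', 1), (102, 'desk', 2);
INSERT INTO sales_fact VALUES (100, 10, 15.0), (101, 2, 8.0), (102, 1, 200.0);
""")

# Reaching supplier_type takes two joins (fact -> item -> supplier),
# because the item dimension has been normalized.
dollars_by_supplier_type = conn.execute("""
    SELECT s.supplier_type, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN item     AS i ON i.item_key = f.item_key
    JOIN supplier AS s ON s.supplier_key = i.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
```

In a star schema, where `supplier_type` would sit directly in the item dimension, the same question would need only one join.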
Querying Tools
SQL is
Multidimensional OLAP (MOLAP)
MOLAP Operations
Roll-up
Drill-down
Slice and dice
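These operations can be sketched on a small in-memory cube. The cube below is a plain dictionary keyed by (product, city, quarter); the dimension names, cell values, and helper functions are hypothetical, and a real MOLAP engine stores and indexes cubes very differently.

```python
from collections import defaultdict

DIMS = ("product", "city", "quarter")

# Hypothetical cube: each cell holds units sold for one combination.
cube = {
    ("Widget", "Austin", "Q1"): 5,
    ("Widget", "Austin", "Q2"): 7,
    ("Widget", "Boston", "Q1"): 3,
    ("Gadget", "Austin", "Q1"): 4,
}

def roll_up(cells, keep):
    """Roll-up: summarize by keeping only the dimensions named in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

def slice_cube(cells, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == value}

def dice(cells, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    def matches(key):
        return all(key[DIMS.index(d)] in vals for d, vals in allowed.items())
    return {k: v for k, v in cells.items() if matches(k)}

by_product = roll_up(cube, ["product"])
# Drill-down: move from the product summary to the finer product-by-city level.
by_product_city = roll_up(cube, ["product", "city"])
q1_only = slice_cube(cube, "quarter", "Q1")
widget_austin = dice(cube, product={"Widget"}, city={"Austin"})
```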
Figure 11-23: Example of drill-down
Summary report
Drill-down with color added
Data Mining
Techniques
Case-based reasoning
Rule discovery
Signal processing
Neural nets
Fractals
Data Visualization
Data