
Data Warehouse Design: Overview

Objectives

After completing this lesson, you should be able to:

• Explain the basic concepts of a data warehouse
• Explain the data warehouse logical and physical designs
• Identify the data warehousing objects
• List the available data warehousing schemas

Lesson Agenda

• Reviewing possible data warehouse architectures:
– Characteristics of a data warehouse
– Online transaction processing (OLTP) systems versus data warehouses
– Data warehouse architectures
– Data warehouse logical and physical designs, objects, and schemas
• Reviewing data warehouse logical and physical designs, objects, and schemas

Characteristics of a Data Warehouse

• A data warehouse is designed for querying, reporting, and analysis.
• A data warehouse contains historical data derived from transaction data.
• Data warehouses separate analysis workload from transaction workload.
• A data warehouse is primarily an analytical tool.

OLTP Systems Versus Data Warehouses

Property            OLTP                                  Data Warehouse
Response time       Subseconds to seconds                 Seconds to hours
Data organization   Application                           Subject, time
Activities          Processes                             Analysis
Joins               Many                                  Some
Nature of data      Generally 30–60 days, transactional   Snapshots over time, derived data/aggregates
Size                Small to large                        Large to very large
Data sources        Operational, internal                 Operational, internal, external
Duplicated data     Normalized RDBMS                      Denormalized RDBMS

Data Warehouse Architectures:
Basic Data Warehouse

[Figure: operational systems and flat files feed the data warehouse (metadata, summaries, raw data), which supports analysis, reporting, and data mining.]

Data Warehouse Architectures:
Basic Data Warehouse with Staging Area

[Figure: operational systems and flat files load into a staging area, which feeds the data warehouse (metadata, summaries, raw data); the warehouse supports analysis, reporting, and data mining.]
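
The role of the staging area can be sketched in a few lines of Python using the standard `sqlite3` module. This is only an illustration under stated assumptions: the table names (`stg_sales`, `dw_sales`), the sample rows, and the cleansing rule are hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Staging table (hypothetical): accepts source rows as-is, all text.
    CREATE TABLE stg_sales (order_ref TEXT, sold_on TEXT, amount TEXT);
    -- Warehouse table (hypothetical): typed, cleansed rows only.
    CREATE TABLE dw_sales (
        order_ref TEXT PRIMARY KEY,
        sold_on   TEXT NOT NULL,
        amount    REAL NOT NULL
    );
""")

# Hypothetical extract from an operational system or flat file.
source_rows = [
    ("A-1", "2024-01-05", " 19.90"),
    ("A-2", "2024-01-05", "n/a"),     # unparseable amount, rejected below
    ("A-3", "2024-01-06", "5"),
    ("A-1", "2024-01-05", " 19.90"),  # exact duplicate of the first row
]
conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", source_rows)


def to_amount(text):
    """Return the amount as a float, or None if it cannot be parsed."""
    try:
        return float(text.strip())
    except ValueError:
        return None


# Cleanse in the staging area, then load only valid, de-duplicated rows.
staged = conn.execute(
    "SELECT DISTINCT order_ref, sold_on, amount FROM stg_sales"
).fetchall()
loaded = 0
for order_ref, sold_on, amount in staged:
    value = to_amount(amount)
    if value is None:
        continue
    conn.execute("INSERT INTO dw_sales VALUES (?, ?, ?)", (order_ref, sold_on, value))
    loaded += 1

conn.execute("DELETE FROM stg_sales")  # staging is emptied after each load
total = conn.execute("SELECT SUM(amount) FROM dw_sales").fetchone()[0]
```

The point of the sketch is the separation of concerns: untyped, possibly dirty rows land in staging, and only cleansed rows reach the warehouse table.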

Data Warehouse Architectures:
Basic Data Warehouse with Staging Area and Data Marts

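[Figure: operational systems and flat files load into a staging area, which feeds the data warehouse (metadata, summaries, raw data); the warehouse feeds Sales, Purchasing, and Inventory data marts, which support analysis, reporting, and data mining.]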

Lesson Agenda

• Reviewing possible data warehouse architectures
• Reviewing data warehouse logical and physical designs, objects, and schemas
– Data warehouse logical design
– Facts, dimensions, and hierarchies
– Star and snowflake schemas

Data Warehouse Design

Key data warehouse design considerations:

• Identify the specific data content.
• Recognize the critical relationships within and between groups of data.
• Define the system environment supporting your data warehouse.
• Identify data sources.
• Identify the required data transformations.
• Calculate the frequency at which the data must be refreshed.

Data Warehouse: Design Phases

1. Define the business model.
2. Define the logical model.
3. Define the dimensional model.
4. Define the physical model.

Data Warehouse Physical Design

Logical: Entities, Relationships, Attributes, Unique identifiers

Physical model (Objects): Tables, Columns, Integrity constraints (PK, FK, NN), Indexes, Materialized views, Dimensions

Data Warehouse Physical Structures

DW Physical Structure      Description
Partitioned tables         • Enable you to split large data volumes into smaller, more manageable pieces
Views                      • Are customized presentations of the data contained in one or more tables or views
                           • Do not require any space in the database
Materialized views (MVs)   • Are query results that have been stored in advance
                           • Are used transparently (similar to indexes) and improve performance
Integrity constraints      • Are used in data warehouses for query rewrite
Dimensions                 • Are containers of logical relationships and do not require any space in the database
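
The difference between a view and a materialized view can be illustrated with a small Python sketch. Note the assumptions: SQLite (used here through the standard `sqlite3` module) has no materialized views, so a table created from a query stands in for one, and the transparent query rewrite described above is not shown. All table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (prod_id INTEGER, amount_sold REAL);
    INSERT INTO sales VALUES (1, 10.0), (1, 15.0), (2, 7.5);

    -- A view stores only its defining query, not the result rows.
    CREATE VIEW v_sales_by_product AS
        SELECT prod_id, SUM(amount_sold) AS total_sold
        FROM sales GROUP BY prod_id;

    -- Stand-in for a materialized view (assumption: SQLite has none):
    -- the query result is computed once and stored as rows.
    CREATE TABLE mv_sales_by_product AS
        SELECT prod_id, SUM(amount_sold) AS total_sold
        FROM sales GROUP BY prod_id;
""")

# A new sale arrives after the summary was stored.
conn.execute("INSERT INTO sales VALUES (2, 2.5)")

# The view is recomputed on every query, so it sees the new row.
view_rows = conn.execute(
    "SELECT * FROM v_sales_by_product ORDER BY prod_id"
).fetchall()

# The stored summary is fast to read but stale until it is refreshed.
stored_rows = conn.execute(
    "SELECT * FROM mv_sales_by_product ORDER BY prod_id"
).fetchall()
```

Here `view_rows` reflects the late insert while `stored_rows` does not, which is why stored summaries need a refresh strategy.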

Data Warehousing Objects

• Fact tables: Large tables that store business measurements
• Dimension tables: Lookup or reference tables that serve to categorize measure data
• Relationships: Guarantee integrity of business information

Characteristics of Fact Tables

• Contain numerical measures of the business
• Hold large volumes of data
• Grow quickly
• Can contain base, derived, and summarized data
• Are typically additive
• Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables

Sales (Fact Table): PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID, QUANTITY_SOLD, AMOUNT_SOLD, ...
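
Two of these characteristics, foreign keys into dimension tables and additive measures, can be sketched with Python's standard `sqlite3` module. The column names follow the Sales fact table above, reduced to a single dimension for brevity; the sample rows and the `products` table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE products (prod_id INTEGER PRIMARY KEY, prod_name TEXT);
    CREATE TABLE sales (
        prod_id       INTEGER NOT NULL REFERENCES products (prod_id),
        time_id       TEXT    NOT NULL,
        quantity_sold INTEGER NOT NULL,
        amount_sold   REAL    NOT NULL
    );
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO sales VALUES
        (1, '2024-01-01', 2, 20.0),
        (1, '2024-01-02', 1, 10.0),
        (2, '2024-01-01', 4, 30.0);
""")

# A fact row must reference an existing dimension row.
try:
    conn.execute("INSERT INTO sales VALUES (99, '2024-01-03', 1, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Additive measure: per-product totals sum to the grand total.
per_product = dict(
    conn.execute("SELECT prod_id, SUM(amount_sold) FROM sales GROUP BY prod_id")
)
grand_total = conn.execute("SELECT SUM(amount_sold) FROM sales").fetchone()[0]
```

Because `amount_sold` is additive, it can be summed along any dimension and the partial totals always reconcile with the grand total.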

Dimensions and Hierarchies

• A dimension is a structure composed of one or more hierarchies that categorize data.
• Dimensional attributes help describe the dimensional value.
• Dimension data is usually collected at the lowest level of detail and aggregated into higher level totals.
• Hierarchies:
– Ordered levels that organize the data
– Enable you to aggregate and drill on the data
• A dimension can have one or more supporting hierarchies.

CUSTOMERS dimension hierarchy (top to bottom): REGION, SUBREGION, COUNTRY_NAME, CUSTOMERS
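
As a rough illustration of aggregating detail data into higher-level totals, the following Python sketch rolls sales up along a customers hierarchy. The level names mirror the CUSTOMERS hierarchy above; the customer rows, sales amounts, and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical dimension rows: each customer carries its hierarchy levels.
customers = [
    {"cust_id": 1, "country_name": "France", "subregion": "Western Europe", "region": "Europe"},
    {"cust_id": 2, "country_name": "Germany", "subregion": "Western Europe", "region": "Europe"},
    {"cust_id": 3, "country_name": "Japan", "subregion": "Eastern Asia", "region": "Asia"},
]

# Hypothetical fact rows collected at the lowest level: (cust_id, amount_sold).
sales = [(1, 100.0), (2, 50.0), (1, 25.0), (3, 40.0)]


def roll_up(level):
    """Aggregate sales to the given hierarchy level (a key of the customer rows)."""
    by_id = {c["cust_id"]: c for c in customers}
    totals = defaultdict(float)
    for cust_id, amount in sales:
        totals[by_id[cust_id][level]] += amount
    return dict(totals)


by_country = roll_up("country_name")  # detail closer to the customer
by_region = roll_up("region")         # highest level of the hierarchy
```

Moving from `by_region` to `by_country` corresponds to drilling down one or more levels; the reverse direction is aggregation.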
Dimensions and Hierarchies

[Figure: the SALES fact table (cust_id, prod_id, ...) is related to the dimension tables CUSTOMERS (#cust_id, cust_last_name, cust_city, cust_state_province, ...), PRODUCTS (#prod_id, ...), TIMES, CHANNELS, and PROMOTIONS. Callouts mark the unique identifier (#), a relationship, and a hierarchy.]

Using Hierarchies to Drill on Data
and Aggregate Data
Market Hierarchy

Group

Region 1 Region 2

Country 1 Country 2 Country 3 Country 4

State 1 State 2 State 3 State 4 State 5 State 6

City 1 City 2

Data Warehousing Schemas

• Objects can be arranged in data warehousing schema models in a variety of ways:
– Star schema
– Snowflake schema
– Third normal form (3NF) schema
– Hybrid schemas
• The source data model and user requirements should steer the data warehouse schema.
• Implementation of the logical model may require changes to adapt it to your physical system.

Schema Characteristics

• Star schema:
– It is characterized by one or more large fact tables and a
number of much smaller dimension tables.
– Each dimension table is joined to the fact table by using a
primary key to foreign key join.
• Snowflake schema:
– Dimension data is grouped into multiple tables instead of one
large table.
– The number of dimension tables is increased, requiring
more foreign key joins.
• Third normal form (3NF) schema:
– It is a classical relational-database model that minimizes
data redundancy through normalization.

Star Schema Model: Central Fact Table
and Denormalized Dimension Tables
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_amount, Sales_units, ...

Product Dimension Table: Product_id, Product_desc, ...
Store Dimension Table: Store_id, District_id, ...
Time Dimension Table: Day_id, Month_id, Year_id, ...
Item Dimension Table: Item_id, Item_desc, ...
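
A star query over this model might look like the following sketch, written with Python's standard `sqlite3` module. It uses only two of the four dimensions for brevity; the table names (`sales_fact`, `product_dim`, `time_dim`) and the sample rows are assumptions made for illustration, while the column names follow the model above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Small dimension tables, each with a primary key.
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_desc TEXT);
    CREATE TABLE time_dim (day_id INTEGER PRIMARY KEY, month_id INTEGER, year_id INTEGER);

    -- Central fact table: one foreign key per dimension, plus measures.
    CREATE TABLE sales_fact (
        product_id   INTEGER REFERENCES product_dim (product_id),
        day_id       INTEGER REFERENCES time_dim (day_id),
        sales_amount REAL,
        sales_units  INTEGER
    );

    INSERT INTO product_dim VALUES (1, 'Tea'), (2, 'Coffee');
    INSERT INTO time_dim VALUES (20240101, 202401, 2024), (20240201, 202402, 2024);
    INSERT INTO sales_fact VALUES
        (1, 20240101, 10.0, 2),
        (2, 20240101,  8.0, 1),
        (1, 20240201,  5.0, 1);
""")

# Star join: the fact table joins directly to each dimension it needs.
rows = conn.execute("""
    SELECT p.product_desc, t.month_id, SUM(f.sales_amount)
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN time_dim t ON t.day_id = f.day_id
    GROUP BY p.product_desc, t.month_id
    ORDER BY p.product_desc, t.month_id
""").fetchall()
```

Each dimension is reached with exactly one join from the fact table, which is the defining property of the star layout.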

Star Dimensional Schema: Advantages

• Supports multidimensional analysis
• Creates a design that improves performance
• Enables optimizers to yield better execution plans
• Parallels end-user perceptions
• Provides an extensible design
• Broadens the choices for data access tools

Snowflake Schema Model

Sales Fact Table: Item_id, Store_id, Product_id, Week_id, Sales_amount, Sales_units

Store Table: Store_id, Store_desc, District_id
District Table: District_id, District_desc
Product Table: Product_id, Product_desc
Time Table: Week_id, Period_id, Year_id
Item Table: Item_id, Item_desc, Dept_id
Dept Table: Dept_id, Dept_desc, Mgr_id
Mgr Table: Dept_id, Mgr_id, Mgr_name
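
The extra join that a snowflake schema introduces can be seen in a short sketch with Python's standard `sqlite3` module. It covers only the store and district branch of the model above; the lowercase table names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The store dimension is normalized: district attributes live in their own table.
    CREATE TABLE district (district_id INTEGER PRIMARY KEY, district_desc TEXT);
    CREATE TABLE store (
        store_id    INTEGER PRIMARY KEY,
        store_desc  TEXT,
        district_id INTEGER REFERENCES district (district_id)
    );
    CREATE TABLE sales_fact (
        store_id     INTEGER REFERENCES store (store_id),
        week_id      INTEGER,
        sales_amount REAL,
        sales_units  INTEGER
    );

    INSERT INTO district VALUES (10, 'North'), (20, 'South');
    INSERT INTO store VALUES (1, 'Store A', 10), (2, 'Store B', 10), (3, 'Store C', 20);
    INSERT INTO sales_fact VALUES (1, 1, 100.0, 5), (2, 1, 50.0, 2), (3, 1, 70.0, 3);
""")

# Reaching district_desc takes two joins: fact -> store -> district.
rows = conn.execute("""
    SELECT d.district_desc, SUM(f.sales_amount)
    FROM sales_fact f
    JOIN store s ON s.store_id = f.store_id
    JOIN district d ON d.district_id = s.district_id
    GROUP BY d.district_desc
    ORDER BY d.district_desc
""").fetchall()
```

In a star schema the district description would sit directly in the store dimension table, so the same report would need only one join.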

Snowflake Schema Model

• Can be used directly by some tools
• Is more flexible to change
• Provides for quick data loading
• Can become large and unmanageable
• Degrades query performance
• Has more complex metadata

[Figure: Country, State, County, City]

Summary

In this lesson, you should have learned how to:

• Review the basic concepts of a data warehouse
• Review the data warehouse logical and physical designs
• Identify the data warehousing objects
• Review the available data warehousing schemas
