Microsoft Official Course

Module 3

Designing a Data Warehouse


Module Overview

• Data Warehouse Design Overview
• Designing Dimension Tables
• Designing Fact Tables
• Designing a Data Warehouse Physical Implementation

Lesson 1: Data Warehouse Design Overview

• Data Warehouse Design in a BI Project
• The Dimensional Model
• The Data Warehouse Design Process
• Dimensional Modeling
• Documenting Dimensional Models

Data Warehouse Design in a BI Project

• Business Requirements
• Technical Architecture and Infrastructure Design
• Data Warehouse and ETL Design
• Reporting and Analysis Design
• Monitoring and Optimizing
• Operations and Maintenance


The Dimensional Model

• Star schema: a fact table (measures) related directly to dimension tables (attributes)
• Snowflake schema: dimension tables (attributes) related to further dimension tables

The Data Warehouse Design Process

1. Determine analytical and reporting requirements
2. Identify the business processes that generate the required data
3. Examine the source data for those business processes
4. Conform dimensions across business processes
5. Prioritize processes and create a dimensional model for each
6. Document and refine the models to determine the database logical schema
7. Design the physical data structures for the database

Dimensional Modeling
Conformed Dimensions

• Grain: 1 row per order item


• Dimensions: Time (order date and ship date), Product, Customer, Salesperson
• Facts: Item Quantity, Unit Cost, Total Cost, Unit Price, Sales Amount, Shipping Cost
Documenting Dimensional Models

Sun diagram for the Sales Order business process:

• Sales Order (measures): Item Quantity, Unit Cost, Total Cost, Unit Price, Sales Amount, Shipping Cost
• Time (Order Date and Ship Date): Calendar Year, Month, Date; Fiscal Year, Fiscal Quarter, Month, Date
• Salesperson: Region, Country, Territory, Manager, Name
• Product: Category, Subcategory, Product Name, Color, Size
• Customer: Country, State or Province, City, Age, Marital Status, Gender

Lesson 2: Designing Dimension Tables

• Considerations for Dimension Keys
• Dimension Attributes and Hierarchies
• Unknown and None
• Designing Slowly Changing Dimensions
• Time Dimension Tables
• Self-Referencing Dimension Tables
• Junk Dimensions

Considerations for Dimension Keys

• Surrogate key
• Business (alternate) key
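
The two key types can be sketched in T-SQL as follows. This is a minimal illustration, not the course's own schema: the table name `dbo.DimCustomerDemo`, its columns, and the data types are all assumptions. The surrogate key is generated inside the warehouse, while the business (alternate) key carries the identifier from the source system.

```sql
-- Hypothetical customer dimension; all names and types are illustrative.
CREATE TABLE dbo.DimCustomerDemo
(
    -- Surrogate key: generated in the warehouse, no business meaning
    CustomerKey    int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_DimCustomerDemo PRIMARY KEY CLUSTERED,
    -- Business (alternate) key: the identifier used by the source system
    CustomerAltKey nvarchar(20)  NOT NULL,
    CustomerName   nvarchar(100) NOT NULL,
    City           nvarchar(50)  NULL
);

-- ETL lookups translate the business key into the surrogate key,
-- so the business key is usually worth indexing.
CREATE NONCLUSTERED INDEX IX_DimCustomerDemo_AltKey
    ON dbo.DimCustomerDemo (CustomerAltKey);
```

Fact tables would then reference `CustomerKey` rather than `CustomerAltKey`, which keeps the warehouse independent of key changes in the source.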


Dimension Attributes and Hierarchies

• Hierarchy
• Drill-through detail
• Slicer


Unknown and None

• Identify the semantic meaning of NULL
  • Unknown or None?
• Do not assume NULL equality
  • Use ISNULL( )

[Diagram: Source and Dimension Table]

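
One possible way to apply these points during a dimension lookup is sketched below. The temporary tables, the key values -1 and 0, and the rule that a NULL reseller code means "no reseller" are assumptions made for the example; the right meaning of NULL has to come from the source system.

```sql
-- Hypothetical dimension with explicit None and Unknown members.
CREATE TABLE #DimReseller
(
    ResellerKey    int          NOT NULL PRIMARY KEY,
    ResellerAltKey nvarchar(20) NOT NULL,
    ResellerName   nvarchar(50) NOT NULL
);

INSERT INTO #DimReseller (ResellerKey, ResellerAltKey, ResellerName)
VALUES (-1, N'N/A',     N'None'),     -- the sale had no reseller
       (0,  N'Unknown', N'Unknown'),  -- a reseller exists but could not be identified
       (1,  N'R100',    N'Contoso');

-- Hypothetical staged source rows.
CREATE TABLE #StageOrders
(
    OrderNumber  int          NOT NULL,
    ResellerCode nvarchar(20) NULL
);

INSERT INTO #StageOrders (OrderNumber, ResellerCode)
VALUES (1, N'R100'), (2, NULL), (3, N'R999');

-- NULL never equals NULL in a join, so NULL codes are mapped with ISNULL
-- before the comparison. Assumption: NULL in this source means "None".
SELECT s.OrderNumber,
       ISNULL(d.ResellerKey, 0) AS ResellerKey  -- unmatched codes become Unknown (0)
FROM #StageOrders AS s
LEFT JOIN #DimReseller AS d
    ON d.ResellerAltKey = ISNULL(s.ResellerCode, N'N/A');
```

With this data, order 1 resolves to key 1, order 2 (NULL code) to the None member, and order 3 (unmatched code) to the Unknown member.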
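Designing Slowly Changing Dimensions

• Type 1: overwrite the attribute value; no history is kept
• Type 2: add a new row for the changed member and mark the old row as no longer current
• Type 3: keep the previous value in an additional column of the same row
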
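
A minimal sketch of Type 1 and Type 2 changes follows. The table, its tracking columns (`StartDate`, `EndDate`, `IsCurrent`), and the choice of which attribute gets which treatment are assumptions for illustration only.

```sql
-- Hypothetical dimension with Type 2 tracking columns.
CREATE TABLE #DimCustomer
(
    CustomerKey    int IDENTITY(1,1) PRIMARY KEY,
    CustomerAltKey nvarchar(20) NOT NULL,
    Phone          nvarchar(20) NULL,      -- treated as a Type 1 attribute here
    City           nvarchar(50) NOT NULL,  -- treated as a Type 2 attribute here
    StartDate      date NOT NULL,
    EndDate        date NULL,
    IsCurrent      bit  NOT NULL
);

INSERT INTO #DimCustomer (CustomerAltKey, Phone, City, StartDate, EndDate, IsCurrent)
VALUES (N'C100', N'555-0100', N'Seattle', '20100101', NULL, 1);

DECLARE @ChangeDate date = '20120301';

-- Type 1: overwrite the value on every row for the member; no history is kept.
UPDATE #DimCustomer
SET Phone = N'555-0199'
WHERE CustomerAltKey = N'C100';

-- Type 2, step 1: expire the current row.
UPDATE #DimCustomer
SET EndDate = @ChangeDate, IsCurrent = 0
WHERE CustomerAltKey = N'C100' AND IsCurrent = 1;

-- Type 2, step 2: insert a new row, which receives a new surrogate key.
INSERT INTO #DimCustomer (CustomerAltKey, Phone, City, StartDate, EndDate, IsCurrent)
VALUES (N'C100', N'555-0199', N'Portland', @ChangeDate, NULL, 1);

SELECT CustomerKey, CustomerAltKey, City, StartDate, EndDate, IsCurrent
FROM #DimCustomer
ORDER BY CustomerKey;
```

Facts loaded before the change keep pointing at the Seattle row, so historical reports are unaffected by the move.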
Time Dimension Tables

• Surrogate key
• Granularity
• Range
• Attributes and hierarchies
• Multiple calendars
• Unknown values
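
The considerations above might come together as in the following sketch. The column set, the `yyyymmdd` integer key, the single year of range, and the fiscal year starting on 1 July are all assumptions chosen for the example.

```sql
-- Hypothetical date dimension at day granularity.
CREATE TABLE #DimDate
(
    DateKey      int          NOT NULL PRIMARY KEY,  -- surrogate key in yyyymmdd form
    FullDate     date         NULL,                  -- NULL only on the Unknown row
    CalendarYear smallint     NULL,
    MonthNumber  tinyint      NULL,
    MonthName    nvarchar(20) NULL,
    FiscalYear   smallint     NULL
);

-- Unknown member for facts whose date is missing (key 0 is an assumption).
INSERT INTO #DimDate (DateKey) VALUES (0);

-- Range: one row per day for 2012.
DECLARE @d date = '20120101', @end date = '20121231';

WHILE @d <= @end
BEGIN
    INSERT INTO #DimDate (DateKey, FullDate, CalendarYear, MonthNumber, MonthName, FiscalYear)
    VALUES (YEAR(@d) * 10000 + MONTH(@d) * 100 + DAY(@d),
            @d,
            YEAR(@d),
            MONTH(@d),
            DATENAME(month, @d),
            -- Assumed second calendar: fiscal year starts 1 July
            -- and is named after the calendar year in which it ends.
            CASE WHEN MONTH(@d) >= 7 THEN YEAR(@d) + 1 ELSE YEAR(@d) END);

    SET @d = DATEADD(day, 1, @d);
END;
```

A row-by-row loop is slow for long ranges; it is used here only because it is easy to read.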
Self-Referencing Dimension Tables

• Kim Abercrombie
• Kamil Amireh
• Jeff Hay
• Cesar Garcia
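
A self-referencing dimension can be sketched as a table whose parent key column points back at its own surrogate key. The names below are taken from the slide, but the reporting relationships between them are invented for the example, as are the table and column names.

```sql
-- Hypothetical employee dimension; ParentEmployeeKey holds the manager's EmployeeKey.
CREATE TABLE #DimEmployee
(
    EmployeeKey       int          NOT NULL PRIMARY KEY,
    EmployeeName      nvarchar(50) NOT NULL,
    ParentEmployeeKey int          NULL  -- NULL for the top of the hierarchy
);

-- Assumed reporting lines, for illustration only.
INSERT INTO #DimEmployee (EmployeeKey, EmployeeName, ParentEmployeeKey)
VALUES (1, N'Kim Abercrombie', NULL),
       (2, N'Kamil Amireh',    1),
       (3, N'Jeff Hay',        2),
       (4, N'Cesar Garcia',    1);

-- A recursive common table expression walks the hierarchy from the top down.
WITH OrgChart AS
(
    SELECT EmployeeKey, EmployeeName, 0 AS Depth
    FROM #DimEmployee
    WHERE ParentEmployeeKey IS NULL

    UNION ALL

    SELECT e.EmployeeKey, e.EmployeeName, o.Depth + 1
    FROM #DimEmployee AS e
    JOIN OrgChart AS o
        ON e.ParentEmployeeKey = o.EmployeeKey
)
SELECT REPLICATE(N'  ', Depth) + EmployeeName AS Employee, Depth
FROM OrgChart;
```

In a permanent table the parent column would normally carry a foreign key constraint to the same table; temporary tables do not enforce foreign keys, so it is omitted here.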
Junk Dimensions

• Combine low-cardinality attributes that don’t belong in existing dimensions into a junk dimension
• Avoids creating many small dimension tables

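
As a sketch of the idea, the junk dimension below combines three unrelated low-cardinality attributes into one table. The attributes and their values are hypothetical.

```sql
-- Hypothetical junk dimension holding miscellaneous order flags.
CREATE TABLE #DimOrderFlags
(
    OrderFlagsKey int IDENTITY(1,1) PRIMARY KEY,
    PaymentType   nvarchar(20) NOT NULL,
    IsGift        bit NOT NULL,
    IsOnlineOrder bit NOT NULL
);

-- Pre-populate every combination: 3 payment types x 2 x 2 = 12 rows.
INSERT INTO #DimOrderFlags (PaymentType, IsGift, IsOnlineOrder)
SELECT p.PaymentType, g.Flag, o.Flag
FROM (VALUES (N'Cash'), (N'Credit Card'), (N'Invoice')) AS p (PaymentType)
CROSS JOIN (VALUES (CAST(0 AS bit)), (CAST(1 AS bit))) AS g (Flag)
CROSS JOIN (VALUES (CAST(0 AS bit)), (CAST(1 AS bit))) AS o (Flag);
```

The fact table then needs a single `OrderFlagsKey` column instead of three separate dimension keys.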
Lesson 3: Designing Fact Tables

• Fact Table Columns
• Types of Measure
• Types of Fact Table

Fact Table Columns

• Dimension Keys

• Measures

• Degenerate Dimensions
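
The three kinds of column might look like this in a fact table at the grain of one row per order item. The table is a hypothetical sketch; the column names echo those used elsewhere in this module, and the data types are assumptions.

```sql
-- Hypothetical sales order fact table.
CREATE TABLE #FactSalesOrder
(
    -- Dimension keys: surrogate keys of the related dimension rows
    OrderDateKey  int NOT NULL,
    ShipDateKey   int NOT NULL,
    ProductKey    int NOT NULL,
    CustomerKey   int NOT NULL,
    -- Degenerate dimension: a dimension value with no table of its own
    OrderNumber   nvarchar(20) NOT NULL,
    -- Measures: the numeric values that are aggregated
    OrderQuantity smallint NOT NULL,
    UnitPrice     money    NOT NULL,
    SalesAmount   money    NOT NULL
);

-- The degenerate dimension still supports grouping, for example per order.
SELECT OrderNumber, SUM(SalesAmount) AS OrderTotal
FROM #FactSalesOrder
GROUP BY OrderNumber;
```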
Types of Measure

• Additive

• Semi-Additive

• Non-Additive
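
The difference between the three types shows up in how each can be aggregated. The following sketch uses two small hypothetical tables with invented data: a sales table for additive and non-additive measures, and an inventory table for a semi-additive one.

```sql
CREATE TABLE #FactSales
(
    DateKey int NOT NULL, ProductKey int NOT NULL,
    Quantity int NOT NULL, SalesAmount money NOT NULL
);
INSERT INTO #FactSales (DateKey, ProductKey, Quantity, SalesAmount)
VALUES (20120101, 1, 2, 20.00), (20120102, 1, 3, 27.00), (20120101, 2, 1, 50.00);

-- Additive: SalesAmount can be summed across any dimension.
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM #FactSales
GROUP BY ProductKey;

-- Non-additive: a unit price cannot be summed, so the average
-- is derived from additive components instead.
SELECT ProductKey,
       SUM(SalesAmount) / NULLIF(SUM(Quantity), 0) AS AverageUnitPrice
FROM #FactSales
GROUP BY ProductKey;

CREATE TABLE #FactInventory
(
    DateKey int NOT NULL, ProductKey int NOT NULL, UnitsInStock int NOT NULL
);
INSERT INTO #FactInventory (DateKey, ProductKey, UnitsInStock)
VALUES (20120131, 1, 10), (20120131, 2, 4), (20120229, 1, 7), (20120229, 2, 6);

-- Semi-additive: stock levels add up across products but not across time,
-- so only the most recent snapshot is summed.
SELECT SUM(UnitsInStock) AS ClosingStock
FROM #FactInventory
WHERE DateKey = (SELECT MAX(DateKey) FROM #FactInventory);
```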
Types of Fact Table

• Transaction Fact Tables

• Periodic Snapshot Fact Tables

• Accumulating Snapshot Fact Tables
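
Transaction and periodic snapshot fact tables are insert-only, whereas an accumulating snapshot row is revisited as a process moves through its milestones. A minimal sketch of the latter, assuming a hypothetical order fulfillment process and a date key of 0 for "not yet known":

```sql
-- Hypothetical accumulating snapshot: one row per order, updated over time.
CREATE TABLE #FactOrderFulfillment
(
    OrderNumber     nvarchar(20) NOT NULL PRIMARY KEY,
    OrderDateKey    int   NOT NULL,
    ShipDateKey     int   NOT NULL DEFAULT 0,  -- 0 = milestone not reached yet
    DeliveryDateKey int   NOT NULL DEFAULT 0,
    SalesAmount     money NOT NULL
);

-- The row is inserted when the order is placed.
INSERT INTO #FactOrderFulfillment (OrderNumber, OrderDateKey, SalesAmount)
VALUES (N'SO1001', 20120105, 250.00);

-- The same row is updated as each later milestone occurs.
UPDATE #FactOrderFulfillment
SET ShipDateKey = 20120107
WHERE OrderNumber = N'SO1001';

UPDATE #FactOrderFulfillment
SET DeliveryDateKey = 20120110
WHERE OrderNumber = N'SO1001';
```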


Lab A: Designing a Data Warehouse Logical Schema

• Exercise 1: Identifying Business Processes and Dimensions
• Exercise 2: Designing Dimension Models and Data Warehouse Tables

Logon Information
Start 20467B-MIA-DC and 20467B-MIA-SQLBI, and then log on to
20467B-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.

Estimated Time: 90 Minutes


Lab Scenario

You are designing a BI solution for Adventure Works Cycles, and have conducted interviews to gather information about current business processes and identify analytical and reporting requirements.

Now you must create dimensional models for the business processes and design a data warehouse database schema to support the requirements.

Lab Review

• Use Excel to open Matrix.xlsx in the D:\Labfiles\Lab03A\Solution folder and compare it to the matrix your group created during the lab. What are the significant differences between your solution and the suggested solution, and how would you justify your choices in the lab?
• Use Visio to open Initial Sun Diagram.vsdx in the D:\Labfiles\Lab03A\Solution folder. How do the dimensional models in this document compare to your solution?
• Use Visio to open DW Schema.vsdx in the D:\Labfiles\Lab03A\Solution folder. How does the database schema design in this document compare to your solution?

Lesson 4: Designing a Data Warehouse Physical Implementation

• Data Warehouse I/O Activity
• Considerations for Database Files
• Table Partitioning
• Demonstration: Partitioning a Fact Table
• Considerations for Indexes
• Demonstration: Creating Indexes
• Data Compression
• Demonstration: Implementing Data Compression
• Using Views to Abstract Base Tables

Data Warehouse I/O Activity

• Large fact tables
• Star joins to dimension tables
• ETL (ETL Loads)
  • Bulk inserts
  • Some lookups and updates
• Data Models (Data Model Processing)
  • Mostly table/index scans
• Reports (Report Processing)
  • Predictable queries
  • Many rows with range-based query filters
• User Queries (Self-Service BI)
  • Potentially unpredictable queries

Considerations for Database Files

• Data files and filegroups


• Staging tables
• TempDB
• Transaction logs
• Backup files
Table Partitioning

[Diagram: a table partitioned by date into Pre-2010, 2010, 2011, Jan 2012, and Feb 2012]

Demonstration: Partitioning a Fact Table

In this demonstration, you will see how to:


• Create a partitioned table
• View partition metadata
• Split a partition
• Merge partitions
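
The four steps above can be sketched as follows. This is an illustration under stated assumptions rather than the demonstration script itself: the object names are hypothetical, the partitioning column is an integer date key in `yyyymmdd` form, and every partition is placed on the PRIMARY filegroup for simplicity, where a real design would usually spread partitions across dedicated filegroups.

```sql
-- Create a partitioned table.
-- RANGE RIGHT places each boundary value in the partition to its right,
-- giving five partitions: pre-2010, 2010, 2011, Jan 2012, Feb 2012 onward.
CREATE PARTITION FUNCTION PF_OrderDateDemo (int)
AS RANGE RIGHT FOR VALUES (20100101, 20110101, 20120101, 20120201);

CREATE PARTITION SCHEME PS_OrderDateDemo
AS PARTITION PF_OrderDateDemo ALL TO ([PRIMARY]);

CREATE TABLE dbo.FactPartitionDemo
(
    OrderDateKey int   NOT NULL,
    ProductKey   int   NOT NULL,
    SalesAmount  money NOT NULL
) ON PS_OrderDateDemo (OrderDateKey);

-- View partition metadata: one row per partition with its row count.
SELECT p.partition_number, p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.FactPartitionDemo')
  AND p.index_id IN (0, 1);  -- heap or clustered index only

-- Split a partition: add a boundary so that March 2012 gets its own partition.
ALTER PARTITION SCHEME PS_OrderDateDemo NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION PF_OrderDateDemo() SPLIT RANGE (20120301);

-- Merge partitions: remove a boundary so that 2010 joins the pre-2010 partition.
ALTER PARTITION FUNCTION PF_OrderDateDemo() MERGE RANGE (20100101);
```

Splitting and merging are cheapest when the affected partitions are empty, which is why new boundaries are typically added ahead of the data that will use them.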
Considerations for Indexes

• Dimension table indexes
  • Clustered index on surrogate key column
  • Nonclustered index on business key and SCD columns
  • Nonclustered indexes on frequently searched columns
• Fact table indexes
  • Clustered index on most commonly searched date key
  • Nonclustered indexes on other dimension keys
  Or
  • Columnstore index on all columns

Demonstration: Creating Indexes

In this demonstration, you will see how to:


• Create indexes on dimension tables
• View index usage and execution statistics
• Create indexes on a fact table
• Create a columnstore index
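
The index guidance in this lesson might translate into T-SQL along these lines. The tables and index names are hypothetical. One caveat worth stating as an assumption about the target version: in SQL Server 2012 a table with a nonclustered columnstore index becomes read-only until the index is dropped or disabled, so the last statement is shown as an alternative to the nonclustered fact indexes, not an addition to them.

```sql
-- Hypothetical dimension table.
CREATE TABLE dbo.DimProductDemo
(
    ProductKey    int IDENTITY(1,1) NOT NULL,
    ProductAltKey nvarchar(20) NOT NULL,
    ProductName   nvarchar(50) NOT NULL,
    IsCurrent     bit          NOT NULL
);

-- Clustered index on the surrogate key.
CREATE UNIQUE CLUSTERED INDEX CIX_DimProductDemo
    ON dbo.DimProductDemo (ProductKey);
-- Nonclustered index on the business key and SCD column.
CREATE NONCLUSTERED INDEX IX_DimProductDemo_AltKey
    ON dbo.DimProductDemo (ProductAltKey, IsCurrent);
-- Nonclustered index on a frequently searched column.
CREATE NONCLUSTERED INDEX IX_DimProductDemo_Name
    ON dbo.DimProductDemo (ProductName);

-- Hypothetical fact table.
CREATE TABLE dbo.FactIndexDemo
(
    OrderDateKey int   NOT NULL,
    ProductKey   int   NOT NULL,
    CustomerKey  int   NOT NULL,
    SalesAmount  money NOT NULL
);

-- Clustered index on the most commonly searched date key,
-- with nonclustered indexes on the other dimension keys.
CREATE CLUSTERED INDEX CIX_FactIndexDemo_OrderDate
    ON dbo.FactIndexDemo (OrderDateKey);
CREATE NONCLUSTERED INDEX IX_FactIndexDemo_Product
    ON dbo.FactIndexDemo (ProductKey);
CREATE NONCLUSTERED INDEX IX_FactIndexDemo_Customer
    ON dbo.FactIndexDemo (CustomerKey);

-- View index usage statistics for the current database.
SELECT OBJECT_NAME(s.object_id) AS TableName, i.name AS IndexName,
       s.user_seeks, s.user_scans, s.user_updates
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID();

-- Alternative: a columnstore index covering all columns.
CREATE NONCLUSTERED COLUMNSTORE INDEX CSIX_FactIndexDemo
    ON dbo.FactIndexDemo (OrderDateKey, ProductKey, CustomerKey, SalesAmount);
```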
Data Compression

• Apply page compression on all dimension tables, indexes, and fact table partitions
• If performance becomes CPU-bound, fall back to row compression on the most queried partitions

Demonstration: Implementing Data Compression

In this demonstration, you will see how to:


• Create uncompressed tables and indexes
• Estimate compression savings
• Create compressed tables and indexes
• Compare query performance
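
A rough sketch of these steps is shown below, using a hypothetical table. It assumes an edition of SQL Server that supports data compression. The savings estimate is only meaningful once the table holds a representative volume of data.

```sql
-- Create an uncompressed table and index.
CREATE TABLE dbo.FactCompressionDemo
(
    OrderDateKey int   NOT NULL,
    ProductKey   int   NOT NULL,
    SalesAmount  money NOT NULL
);

CREATE CLUSTERED INDEX CIX_FactCompressionDemo
    ON dbo.FactCompressionDemo (OrderDateKey);

-- Estimate the savings from page compression before applying it.
EXEC sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'FactCompressionDemo',
    @index_id         = NULL,   -- all indexes
    @partition_number = NULL,   -- all partitions
    @data_compression = N'PAGE';

-- Apply page compression to the table (this rebuilds the clustered index).
ALTER TABLE dbo.FactCompressionDemo
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- New indexes can be created compressed from the start.
CREATE NONCLUSTERED INDEX IX_FactCompressionDemo_Product
    ON dbo.FactCompressionDemo (ProductKey)
    WITH (DATA_COMPRESSION = PAGE);

-- If queries become CPU-bound, fall back to row compression.
ALTER TABLE dbo.FactCompressionDemo
    REBUILD WITH (DATA_COMPRESSION = ROW);
```

On a partitioned table the fallback can target a single partition with `REBUILD PARTITION = n WITH (DATA_COMPRESSION = ROW)`, where `n` is the partition number.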
Using Views to Abstract Base Tables

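CREATE VIEW dw_views.SalesOrder
WITH SCHEMABINDING  -- blocks base table changes that would break the view
AS
SELECT [OrderDateKey]
      ,[ProductKey]
      ,[ShipDateKey]
      ,[CustomerKey]
      ,[OrderNumber]
      ,[OrderQuantity]
      ,[UnitPrice]
      ,[SalesAmount]
FROM [dbo].[FactSalesOrder]
WITH (NOLOCK);  -- reads without shared locks, so dirty reads are possible
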
Lab B: Designing a Data Warehouse Physical Implementation

• Exercise 1: Designing File Storage
• Exercise 2: Designing Warehouse Data Structures

Logon Information
Start 20467B-MIA-DC and 20467B-MIA-SQLBI, and then log on to
20467B-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.

Estimated Time: 90 Minutes


Lab Scenario

You have designed a logical schema for a data warehouse, and now you must design the physical implementation of the database.

You have been provided with a database server to host the data warehouse and related files.

Lab Review

• After spending some time reviewing the solution, what are the key aspects of the implementation that differ from your design in the lab, and how else might you have designed the solution?

Module Review and Takeaways

• When designing a data warehouse, is it better or worse to have a strong background in transactional database design?