Building a Data Warehouse using SQL Server 2008
Presented by Wes Dumey
Orlando SQL Saturday
October 16, 2010
First Things First
• Networking is key at these events. Please
take a minute and introduce yourself to
the people to your left and right
Let’s Talk Trash…
• We’ll discuss data warehousing with a
view of how a trash company like Waste
Management could build a data
warehouse
• All photographs and logos are property of
Waste Management, Inc.
Fun Facts about Trash
• Municipal solid waste (a.k.a. trash) is
generated at a rate of 250 million tons
per year in the USA
• Each person produces an average of 4.5
lbs of trash per day
• The nationwide recycling rate in 2008 was
33.2%

*Source: www.epa.gov
About the Presenter
• Senior Consultant, Durable Impact
Consulting, a Florida-based data
warehouse consulting practice
• 10+ years of experience developing business
intelligence solutions
• Personal Interests: Economics and
Aviation
Agenda
• Overview of Data Warehouse principles
• Data Modeling and Data Warehouse
Architecting exercises
• SSIS Example
• Question/Answer Session
Let’s Get Started
• Our client today is Waste Management,
Inc.
• Our project is to develop a business
intelligence solution covering residential
and commercial service routes

Problem Definition
• We need to solve the following business
problems:
– Business has no long-term trend picture of
commissioned employee performance
– Business has no ability to verify whether sales
contracts are profitable
– Business would like to be able to conduct
elasticity modeling on pricing
Steps to Complete Project
• Determine metrics to be captured
• Analyze source systems
• Develop data model
• Architect ETL solution
• Design and develop reporting/analysis
solution
Project Overview
• Overview of a data warehouse:
– A centralized database system optimized for
analysis that contains information from one or
more source systems
– ETL (extract, transform, and load) jobs are
created to load the data warehouse
– A reporting package typically sits on top of
the data warehouse to provide end user
analysis
Data Modeling Primer
• A data model is the logical and physical
representation of the star (or snowflake)
schemas that make up the warehouse
• Three schematic table types:
– Dimension: descriptions and attributes
– Facts: measures and quantities
– Aggregates: pre-computed answers (rolled up
facts)
Exercise: Can you think of some dimensions, facts,
and aggregates used for this example?
Data Model
• Dimensions: Date, Customer, Employee, Route, Vehicle,
Rate
• Facts: Sales activity, haul activity
• Aggregates: Sales amount by employee, hauls by vehicle


How Facts and Dimensions are joined
• By use of a surrogate key (generally a meaningless number)
– Each dimension has a surrogate key as the primary identifier
– Natural keys in the data are used to find the surrogate keys
which are then passed into the fact tables
– This design allows for high performance
• Aggregates are joined to facts through the common keys
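The surrogate key lookup described above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, trims the tables to a few columns, and the employee IDs, names, and amounts are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
conn.executescript("""
CREATE TABLE edw_employee_dim (
    employee_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    employee_id   TEXT NOT NULL UNIQUE,               -- natural key from the source
    employee_name TEXT
);
CREATE TABLE edw_sales_act_fact (
    employee_key INTEGER NOT NULL REFERENCES edw_employee_dim (employee_key),
    date_key     INTEGER NOT NULL,
    sales_amount REAL NOT NULL
);
""")

# Hypothetical dimension rows; the database assigns the surrogate keys.
conn.executemany(
    "INSERT INTO edw_employee_dim (employee_id, employee_name) VALUES (?, ?)",
    [("E100", "Pat Example"), ("E200", "Sam Example")],
)

# Source rows carry only the natural key (employee_id).
source_rows = [("E200", 20101016, 125.0), ("E100", 20101016, 80.0)]

for employee_id, date_key, amount in source_rows:
    # Use the natural key to find the surrogate key ...
    (employee_key,) = conn.execute(
        "SELECT employee_key FROM edw_employee_dim WHERE employee_id = ?",
        (employee_id,),
    ).fetchone()
    # ... and pass the surrogate key into the fact table.
    conn.execute(
        "INSERT INTO edw_sales_act_fact VALUES (?, ?, ?)",
        (employee_key, date_key, amount),
    )

# Facts and dimensions join on the integer surrogate key.
fact_rows = conn.execute("""
    SELECT d.employee_name, f.sales_amount
    FROM edw_sales_act_fact AS f
    JOIN edw_employee_dim AS d ON d.employee_key = f.employee_key
    ORDER BY d.employee_name
""").fetchall()
print(fact_rows)
```

In an SSIS package the same lookup would typically be done inside the data flow rather than row by row in a script.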
Data Warehouse Dimensions
• EDW_DATE_DIM (date_key, date attributes, …)
• EDW_CUSTOMER_DIM (customer_key,
customer name, customer address, …)
• EDW_EMPLOYEE_DIM (employee_key,
employee id, employee name, …)
• EDW_ROUTE_DIM (route_key, route id, route
name, city, state, region, …)
• EDW_VEHICLE_DIM (vehicle_key, vehicle id,
vehicle type, make, model, year, acquire date,
disposal date, …)
• EDW_RATE_DIM (rate_key, rate id, rate type,
begin date, end date, current ind, …)
– A slowly changing dimension (SCD)
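The begin date, end date, and current indicator columns on EDW_RATE_DIM suggest history is kept by expiring old rows and adding new ones. The slide only says "SCD", so treating it as a Type 2 dimension is an assumption here. The sketch below uses sqlite3 as a stand-in for SQL Server; the rate_amount column and the apply_rate_change helper are hypothetical names introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE edw_rate_dim (
    rate_key    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    rate_id     TEXT NOT NULL,                      -- natural key
    rate_type   TEXT NOT NULL,
    rate_amount REAL NOT NULL,                      -- hypothetical tracked attribute
    begin_date  TEXT NOT NULL,
    end_date    TEXT,                               -- NULL while the row is current
    current_ind TEXT NOT NULL
)""")


def apply_rate_change(conn, rate_id, rate_type, rate_amount, effective_date):
    """Expire the current row if its attributes changed, then add a new current row.

    Returns the surrogate key of the row that is current after the call.
    """
    current = conn.execute(
        "SELECT rate_key, rate_type, rate_amount FROM edw_rate_dim "
        "WHERE rate_id = ? AND current_ind = 'Y'",
        (rate_id,),
    ).fetchone()
    if current is not None:
        if (current[1], current[2]) == (rate_type, rate_amount):
            return current[0]  # nothing changed, keep the existing row
        conn.execute(
            "UPDATE edw_rate_dim SET end_date = ?, current_ind = 'N' "
            "WHERE rate_key = ?",
            (effective_date, current[0]),
        )
    cur = conn.execute(
        "INSERT INTO edw_rate_dim "
        "(rate_id, rate_type, rate_amount, begin_date, end_date, current_ind) "
        "VALUES (?, ?, ?, ?, NULL, 'Y')",
        (rate_id, rate_type, rate_amount, effective_date),
    )
    return cur.lastrowid


first_key = apply_rate_change(conn, "R1", "RESIDENTIAL", 20.0, "2010-01-01")
second_key = apply_rate_change(conn, "R1", "RESIDENTIAL", 22.5, "2010-07-01")
repeat_key = apply_rate_change(conn, "R1", "RESIDENTIAL", 22.5, "2010-08-01")

history = conn.execute(
    "SELECT rate_key, rate_amount, begin_date, end_date, current_ind "
    "FROM edw_rate_dim ORDER BY rate_key"
).fetchall()
print(history)
```

Facts loaded before the change keep pointing at the old rate_key, so historical reports still show the rate that applied at the time.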
Facts and Aggregates
• EDW_SALES_ACT_FACT (account_key,
customer_key, employee_key, date_key,
sales_amount, …)
• EDW_HAUL_ACT_FACT (account_key,
customer_key, date_key, vehicle_key, haul
volume, …)
• EDW_DAILY_SALES_AGG (account_key,
customer_key, employee_key, date_key,
sales_amount, …)
– The key determinant here is granularity
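To make the granularity point concrete, the sketch below rolls a fact table up into a daily aggregate. It is a simplified illustration under stated assumptions: sqlite3 stands in for SQL Server, the column list is reduced, and the aggregate grain chosen here (one row per employee per day) is an assumption, since the slide does not spell out the grain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Fact grain in this sketch: one row per sale.
CREATE TABLE edw_sales_act_fact (
    customer_key INTEGER NOT NULL,
    employee_key INTEGER NOT NULL,
    date_key     INTEGER NOT NULL,
    sales_amount REAL NOT NULL
);
-- Aggregate grain in this sketch: one row per employee per day.
CREATE TABLE edw_daily_sales_agg (
    employee_key INTEGER NOT NULL,
    date_key     INTEGER NOT NULL,
    sales_amount REAL NOT NULL,
    PRIMARY KEY (employee_key, date_key)
);
""")

# Made-up sample facts.
conn.executemany(
    "INSERT INTO edw_sales_act_fact VALUES (?, ?, ?, ?)",
    [
        (1, 10, 20101015, 100.0),
        (2, 10, 20101015, 50.0),
        (3, 11, 20101015, 75.0),
        (1, 10, 20101016, 25.0),
    ],
)

# Pre-compute the answer by rolling the facts up to the coarser grain.
conn.execute("""
    INSERT INTO edw_daily_sales_agg (employee_key, date_key, sales_amount)
    SELECT employee_key, date_key, SUM(sales_amount)
    FROM edw_sales_act_fact
    GROUP BY employee_key, date_key
""")

agg_rows = conn.execute(
    "SELECT employee_key, date_key, sales_amount "
    "FROM edw_daily_sales_agg ORDER BY employee_key, date_key"
).fetchall()
print(agg_rows)
```

The aggregate shares employee_key and date_key with the fact table, which is what lets reports move between the summary and the detail.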
ETL Solution
• ETL = Extract, transform, and load
• Typically performed using ETL tools such
as SQL Server 2008 Integration Services (SSIS)
• Designed to read data from the source
system and load it into the star schema
• Typically scheduled on a repeating basis
to keep data current
• Can be simple or very complex
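A very small extract, transform, and load job might look like the sketch below. It is not an SSIS package: it uses Python's csv and sqlite3 modules as stand-ins, the source data is an inline made-up CSV, and the function names are hypothetical. The load step skips routes that already exist so the job can be rerun on a schedule.

```python
import csv
import io
import sqlite3

# Made-up source extract; a real job would read from the source system.
SOURCE_CSV = """route_id,route_name,city,state
R-001, Downtown Residential ,Orlando,fl
R-002,Airport Commercial,Orlando,FL
"""


def extract(text):
    """Read raw source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Trim whitespace and standardize the state code."""
    return [
        (
            r["route_id"].strip(),
            r["route_name"].strip(),
            r["city"].strip(),
            r["state"].strip().upper(),
        )
        for r in rows
    ]


def load(conn, rows):
    """Insert routes not yet in the dimension; return how many were added."""
    count_sql = "SELECT COUNT(*) FROM edw_route_dim"
    before = conn.execute(count_sql).fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO edw_route_dim (route_id, route_name, city, state) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn.execute(count_sql).fetchone()[0] - before


conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE edw_route_dim (
    route_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    route_id   TEXT NOT NULL UNIQUE,               -- natural key
    route_name TEXT,
    city       TEXT,
    state      TEXT
)""")

first_run = load(conn, transform(extract(SOURCE_CSV)))
second_run = load(conn, transform(extract(SOURCE_CSV)))  # rerun adds nothing
routes = conn.execute(
    "SELECT route_id, route_name, state FROM edw_route_dim ORDER BY route_id"
).fetchall()
print(first_run, second_run, routes)
```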
Data Architecture Considerations
• To stage or not to stage (creating a staging
area, a temporary place for source data)
• Data volumes will influence how we
build our jobs
• Design for ease of support and
maintenance
Auditing
• Use batch audit tables to keep track of
what is running
• Track insert/update metrics
• Always know what is going on in your
warehouse (and maybe trash, too)
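One way to sketch a batch audit table is shown below. The table name etl_batch_audit, its columns, and the run_with_audit helper are all hypothetical, and sqlite3 again stands in for SQL Server. Each job run gets an audit row that records its status and its insert/update counts.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE etl_batch_audit (
    batch_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    job_name      TEXT NOT NULL,
    start_time    TEXT NOT NULL,
    end_time      TEXT,
    status        TEXT NOT NULL,   -- RUNNING, SUCCEEDED, or FAILED
    rows_inserted INTEGER,
    rows_updated  INTEGER
)""")


def utc_now():
    return datetime.now(timezone.utc).isoformat()


def run_with_audit(conn, job_name, job):
    """Run job() and record its outcome; job must return (inserted, updated)."""
    cur = conn.execute(
        "INSERT INTO etl_batch_audit (job_name, start_time, status) "
        "VALUES (?, ?, 'RUNNING')",
        (job_name, utc_now()),
    )
    batch_id = cur.lastrowid
    try:
        inserted, updated = job()
    except Exception:
        # Record the failure, then let the error propagate to the scheduler.
        conn.execute(
            "UPDATE etl_batch_audit SET end_time = ?, status = 'FAILED' "
            "WHERE batch_id = ?",
            (utc_now(), batch_id),
        )
        raise
    conn.execute(
        "UPDATE etl_batch_audit SET end_time = ?, status = 'SUCCEEDED', "
        "rows_inserted = ?, rows_updated = ? WHERE batch_id = ?",
        (utc_now(), inserted, updated, batch_id),
    )
    return batch_id


def failing_job():
    raise RuntimeError("source system unavailable")


# Placeholder jobs with made-up row counts.
run_with_audit(conn, "load_route_dim", lambda: (120, 5))
try:
    run_with_audit(conn, "load_haul_fact", failing_job)
except RuntimeError:
    pass

audit_rows = conn.execute(
    "SELECT job_name, status, rows_inserted, rows_updated "
    "FROM etl_batch_audit ORDER BY batch_id"
).fetchall()
print(audit_rows)
```

Querying this table answers "what ran, when, and how many rows did it touch" without digging through job logs.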
Reporting Solution
• Create reports using SQL Server Reporting
Services





• Introduction to SSIS

Question/Answer Session

Additional Resources
• Durable Impact white papers
www.durableimpact.com
• Microsoft blogs
• Some books of interest: