You are on page 1of 18

Module 1

Introduction to Data
Warehousing
Module Overview
Overview of Data Warehousing

Considerations for a Data Warehouse Solution


Lesson 1: Overview of Data Warehousing
The Business Problem

What Is a Data Warehouse?

Data Warehouse Architectures

Components of a Data Warehousing Solution

Data Warehousing Projects

Data Warehousing Project Roles

SQL Server As a Data Warehousing Platform


The Business Problem

Key business data is distributed across multiple systems

Finding the information required for business decision


making is time-consuming and error-prone
Fundamental business questions are hard to answer
What Is a Data Warehouse?

A centralized store of business data for reporting and analysis

Typically, a data warehouse:


Contains large volumes of historical data
Is optimized for querying data (as opposed to inserting or updating)
Is incrementally loaded with new business data at regular intervals
Provides the basis for enterprise business intelligence solutions
Data Warehouse Architectures

Centralized Data Warehouse

Hub and Spoke

Departmental Data Mart


Components of a Data Warehousing Solution

Reporting and Analysis

Data Cleansing
Data Sources

1011000110

ETL Staging Process ETL Load Process

Staging Database Data Warehouse

An ETL process extracts data from business


applications and other sources
Data is typically staged before being loaded into
the data warehouse
Data cleansing and deduplication ensures the
quality of data in the data warehouse
Master data management provides definitive data
Master Data Management for business entities
Data Warehousing Projects
1. Start by identifying the business questions that the data
warehousing solution must answer
2. Determine the data that is required to answer these
questions
3. Identify data sources for the required data
4. Assess the value of each question to key business
objectives versus the feasibility of answering it from the
available data

For large enterprise-level projects, an incremental


approach can be effective:
Break the project down into multiple sub-projects
Each sub-project deals with a particular subject area in the
data warehouse
Data Warehousing Project Roles

Project manager

Solution architect

Data modeler

Database administrator

Infrastructure specialist

ETL developer

Business users/analyst

Testers

Data stewards
SQL Server As a Data Warehousing Platform
Data Warehousing

Microsoft SQL Server Integration Services


Microsoft SQL Azure
and the Windows
Azure Marketplace

SQL Server Database Engine

1011000110

SQL Server Master Data SQL Server Data Quality


Services Services

SQL Server
Business Intelligence

Analysis Services

SQL Server
Reporting Services
Microsoft PowerPivot
Technologies

Microsoft Excel
Data Mining Add-In
PowerPivot Add-In
MDS Add-In

Power View Microsoft SharePoint


Server
Reports, KPIs, and Dashboards
Lesson 2: Considerations for a Data Warehouse Solution
Data Warehouse Database and Storage

Data Sources

Extract, Transform, and Load Processes

Data Quality and Master Data Management


Data Warehouse Database and Storage

Considerations for the data warehouse include:

Database schema Hardware


Logical: typically denormalized for Query processing and memory
optimal read performance Storage
Physical: often partitioned for Network
performance and management

High availability and disaster recovery Security


Hardware redundancy Server access
Backup strategy Data permissions
Data Sources

Data Source Connection Types


Credentials and Permissions
Data Formats
Data Acquisition Windows
Extract, Transform, and Load Processes

Staging:
What data must be staged?
Staging data format

Required transformations:
Transformations during extraction versus data flow
transformations

Incremental ETL:
Identifying data changes for extraction
Inserting or updating when loading
Data Quality and Master Data Management

Data quality:
Cleansing data:
Validating data values 1011000110

Ensuring data consistency


Identifying missing values
Deduplicating data

Master data management:


Ensuring consistent business entity definitions
across multiple systems
Applying business rules to ensure data validity
Lab Scenario

This lab is an overview lab, designed to show the data


warehousing solution that you will explore in greater depth in
the rest of this course
Adventure Works uses various software applications to manage
different aspects of the business, and each application has its
own data store. This distribution of data has made it difficult for
business users to answer key questions about the overall
performance of the business.
You must examine a data warehousing solution that extracts
data from multiple data sources within Adventure Works, and
loads it into a centralized data warehouse that business users
can query to perform analysis and create reports.
Lab 1: Exploring a Data Warehousing Solution
Exercise 1: Exploring Data Sources

Exercise 2: Exploring an ETL Process

Exercise 3: Exploring a Data Warehouse

Logon information
Virtual machine MIA-SQLBI
User name ADVENTUREWORKS\Student
Password Pa$$w0rd

Estimated time: 30 minutes


Module Review and Takeaways
Why might you consider including a staging area in your
ETL solution?
What options might you consider for performing data
transformations in an ETL solution?
Why would you assign the data steward role to a business
user rather than a database technology specialist?

You might also like