
MODULE 1

INTRODUCTION TO DATA WAREHOUSING
Module Overview
Overview of Data Warehousing
Considerations for a Data Warehouse Solution
Lesson 1: Overview of Data Warehousing
The Business Problem
What Is a Data Warehouse?
Data Warehouse Architectures
Components of a Data Warehousing Solution
Data Warehousing Projects
Data Warehousing Project Roles
SQL Server As a Data Warehousing Platform
The Business Problem

Key business data is distributed across multiple systems
Finding the information required for business decision making is time-consuming and error-prone
Fundamental business questions are hard to answer
What Is a Data Warehouse?

A centralized store of business data for reporting and analysis


Typically, a data warehouse:
◦ Contains large volumes of historical data
◦ Is optimized for querying data (as opposed to inserting or updating)
◦ Is incrementally loaded with new business data at regular intervals
◦ Provides the basis for enterprise business intelligence solutions
Data Warehouse Architectures

Centralized Data Warehouse

Hub and Spoke

Departmental Data Mart


Components of a Data Warehousing Solution
[Diagram: Data Sources, ETL Staging Process, Staging Database, Data Cleansing, Master Data Management, ETL Load Process, Data Warehouse, Reporting and Analysis]

• An ETL process extracts data from business applications and other sources
• Data is typically staged before being loaded into the data warehouse
• Data cleansing and deduplication ensure the quality of data in the data warehouse
• Master data management provides definitive data for business entities
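The flow described above (extract, stage, cleanse and deduplicate, load) can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory SQLite database in place of SQL Server; the table names `staging_customer` and `dw_customer`, the sample rows, and the `run_etl` function are all hypothetical and not part of the course materials.

```python
import sqlite3

# Hypothetical rows standing in for data extracted from a business application.
SOURCE_ROWS = [
    (1, " Alice ", "Alice@Example.com"),
    (2, "Bob", "bob@example.com"),
    (2, "Bob", "bob@example.com"),   # exact duplicate of the previous row
    (3, "Carol", None),              # missing email, rejected during cleansing
]


def run_etl(conn, source_rows):
    """Stage raw rows, then cleanse, deduplicate, and load them."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE staging_customer (id INTEGER, name TEXT, email TEXT)"
    )
    cur.execute(
        "CREATE TABLE dw_customer ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
    )

    # Extract and stage: land the raw rows unchanged in the staging table.
    cur.executemany("INSERT INTO staging_customer VALUES (?, ?, ?)", source_rows)

    # Cleanse (trim names, lower-case emails, drop rows without an email),
    # deduplicate with DISTINCT, and load into the warehouse table.
    cur.execute(
        """
        INSERT INTO dw_customer (id, name, email)
        SELECT DISTINCT id, TRIM(name), LOWER(email)
        FROM staging_customer
        WHERE email IS NOT NULL
        """
    )
    conn.commit()
    return cur.execute(
        "SELECT id, name, email FROM dw_customer ORDER BY id"
    ).fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"), SOURCE_ROWS)` loads two customers: the duplicate row collapses into one and the row with no email is filtered out. A real solution would usually log rejected rows rather than silently dropping them.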
Data Warehousing Projects
1. Identify the business questions that the data warehousing solution must answer
2. Determine the data required to answer these questions
3. Identify the data sources for the required data
4. Assess the value of each question to key business objectives against the feasibility of answering it from the available data
Data Warehousing Project Roles
Project Manager
Solution Architect
Data Modeler
Database Administrator
Infrastructure Specialist
ETL Developer
Business Users and Analysts
Testers
SQL Server As a Data Warehousing Platform

Data Warehousing
• SQL Server Database Engine
• Microsoft SQL Server Integration Services
• SQL Server Master Data Services
• SQL Server Data Quality Services
• Microsoft SQL Azure and the Windows Azure Marketplace

Business Intelligence
• SQL Server Analysis Services
• SQL Server Reporting Services
• Microsoft PowerPivot Technologies
• Microsoft Excel
  ◦ Data Mining Add-In
  ◦ PowerPivot Add-In
  ◦ MDS Add-In
• Power View
• Microsoft SharePoint Server
• Reports, KPIs, and Dashboards
Lesson 2: Considerations for a Data Warehouse Solution
Data Warehouse Database and Storage
Data Sources
Extract, Transform, and Load Processes
Data Quality and Master Data Management
Data Warehouse Database and Storage
Considerations for the data warehouse include:

Database schema
• Logical: typically denormalized for optimal read performance
• Physical: often partitioned for performance and management

Hardware
• Query processing and memory
• Storage
• Network

High availability and disaster recovery
• Hardware redundancy
• Backup strategy

Security
• Server access
• Data permissions
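To illustrate the logical schema consideration above, the sketch below shows one common way a denormalized design can look: the category name is stored directly on the product dimension instead of in a separate lookup table, so a report needs only one join. This is an assumption-laden example, using SQLite in place of SQL Server; the tables `dim_product` and `fact_sales` and the sample rows are hypothetical. Physical partitioning is not shown.

```python
import sqlite3


def build_demo(conn):
    """Create a small denormalized schema with hypothetical sample data."""
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_key   INTEGER PRIMARY KEY,
            product_name  TEXT,
            category_name TEXT   -- folded in rather than kept in its own table
        );
        CREATE TABLE fact_sales (
            order_date  TEXT,
            product_key INTEGER REFERENCES dim_product(product_key),
            amount      REAL
        );
        INSERT INTO dim_product VALUES
            (1, 'Road Bike', 'Bikes'),
            (2, 'Helmet', 'Accessories');
        INSERT INTO fact_sales VALUES
            ('2024-01-05', 1, 1200.0),
            ('2024-01-06', 2, 40.0),
            ('2024-02-01', 1, 1100.0);
        """
    )


def sales_by_category(conn):
    """Aggregate sales per category with a single join to the dimension."""
    return conn.execute(
        """
        SELECT d.category_name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_key = f.product_key
        GROUP BY d.category_name
        ORDER BY d.category_name
        """
    ).fetchall()
```

The trade-off is that the category name is repeated on every product row, which costs storage and makes updates harder, but read queries stay simple. That fits a store that is queried far more often than it is updated.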
Data Sources
◦ Data Source Connection Types
◦ Credentials and Permissions
◦ Data Formats
◦ Data Acquisition Windows
Extract, Transform, and Load Processes
Staging:
◦ What data must be staged?
◦ Staging data format

Required transformations:
◦ Transformations during extraction versus data flow transformations

Incremental ETL:
◦ Identifying data changes for extraction
◦ Inserting or updating when loading
Data Quality and Master Data Management
Data quality:
◦ Cleansing data:
  ◦ Validating data values
  ◦ Ensuring data consistency
  ◦ Identifying missing values
◦ Deduplicating data

Master data management:
◦ Ensuring consistent business entity definitions across multiple systems
◦ Applying business rules to ensure data validity
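The data quality tasks listed above can be sketched in plain Python. This is an illustrative example only: the `cleanse` function, the record layout, and the `valid_countries` set (a stand-in for a master reference list) are all hypothetical, and a production solution would more likely use a dedicated tool such as SQL Server Data Quality Services.

```python
def cleanse(records, valid_countries):
    """Split records into cleansed rows and rejected rows with a reason."""
    clean, rejected = [], []
    seen = set()
    for rec in records:
        # Ensure consistency: trim whitespace and normalize country codes.
        name = (rec.get("name") or "").strip()
        country = (rec.get("country") or "").strip().upper()

        # Identify missing values.
        if not name:
            rejected.append((rec, "missing name"))
            continue
        if not country:
            rejected.append((rec, "missing country"))
            continue

        # Validate values against the reference list.
        if country not in valid_countries:
            rejected.append((rec, "invalid country"))
            continue

        # Deduplicate on a case-insensitive name plus country key.
        key = (name.lower(), country)
        if key in seen:
            rejected.append((rec, "duplicate"))
            continue
        seen.add(key)
        clean.append({"name": name, "country": country})
    return clean, rejected
```

Keeping the rejected rows together with a reason, rather than discarding them, lets a business user review and correct them later.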
Lab Scenario
This lab gives an overview of the data warehousing solution that you will
explore in greater depth in the rest of this course.
Adventure Works uses various software applications to manage different
aspects of the business, and each application has its own data store. This
distribution of data has made it difficult for business users to answer key
questions about the overall performance of the business.
You must examine a data warehousing solution that extracts data from multiple
data sources within Adventure Works, and loads it into a centralized data
warehouse that business users can query to perform analysis and create
reports.
Lab 1: Exploring a Data Warehousing Solution
Exercise 1: Exploring Data Sources
Exercise 2: Exploring an ETL Process
Exercise 3: Exploring a Data Warehouse

Logon information
Virtual machine: MIA-SQLBI
User name: ADVENTUREWORKS\Student
Password: Pa$$w0rd

Estimated time: 30 minutes


Module Review and Takeaways
Why might you consider including a staging area in your ETL solution?
What options might you consider for performing data transformations
in an ETL solution?
Why would you assign the data steward role to a business user rather
than a database technology specialist?
