You are on page 1of 36

Page 1 of 36

Module 1: Introduction to Data Warehousing

Contents:
Th
is d
oc
um
en
tb
lon e
a Modulegs overview
sa
No d_ to
un 20 Mu
au 09 ha
t ho @
Lesson 1: rOverview
ize live ofmData ma Warehousing
dc . co dS
op m a
ies ad
allo .
Lesson 2: Considerations we for a Data Warehouse Solution
d!

Lab: Exploring a Data Warehousing Solution

Module review and takeaways

Th
is d
oc
um
en
Module
N
sa overview
a
tb
elo
ng
s
ou d_ to
na Mu 20
ut h ha 09
@ or i mm
live ze
dc . co ad
m Sa
Data warehousing op is a solution
ies that
ad
.
organizations can use to centralize business data
allo
for reporting and analysis. we Implementing a data warehouse solution can provide a
d!
business or other organization with significant benefits, including:

• Comprehensive and accurate reporting of key business information.

• A Tcentralized
his source of business data for analysis and decision-making.
do
cu
me
tb n
• The foundation
s
elo for an enterprise business intelligence (BI) solution.
n
No aa gs
to d_
un Mu 20
au ha 09
@t ho mm
riz live
ed . co ad
co m Sa
pie ad
This module provides s a an introduction
llow
. to the key components of a data warehousing
ed
!
solution and the high level considerations you must take into account when you
embark on a data warehousing project.

about:blank 3/20/2019
Page 2 of 36

Objectives

After completing this module, you will be able to:

• Describe the key elements of a data warehousing solution.

• Describe
Th
is the key considerations for a data warehousing project.
do
cu
me
n tb
elo
sa ng
No ad st
un oM _2
00
Lesson 1: Overview of Data Warehousing
riz
ed
au
9@ t ho
live
uh
am
ma
. co dS
co m aa
pie d.
sa
llow
ed
Data warehousing is a well-established! technique for centralizing business data for
reporting and analysis. Although the specific details of individual solutions can vary,
there are some common elements in most data warehousing implementations.
Familiarity with these elements will help you to better plan and build an effective data
warehousing
Th solution.
is d
oc
um
en
tb
Lesson
No
saobjectives
ad
elo
ng
s
un to _2
au Mu 00
t ho 9 @ ha
After completing
r ize thiselesson,
liv mm you will be able to:
dc . co ad
op m Sa
ies ad
allo .
we
• Describe the businessdproblem ! addressed by data warehouses.

• Define a data warehouse.

• Describe the commonly-used data warehouse architectures.


Th
sd i
• Identify
oc the components of a data warehousing solution.
u me
n tb
on el
sa a high
• Describe gslevel approach to implementing a data warehousing project.
No ad t
un oM _2
au uh00
9@ t ho am
riz live ma
• Identify the eroles
dc involved
. c om din
Sa
a data warehousing project.
op ad
ies .
allo
we
• Describe the components d! and features of Microsoft® SQL Server® and other
Microsoft products that you can use in a data warehousing solution.

about:blank 3/20/2019
Page 3 of 36

The Business Problem

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

Th
is d
oc
um
en
tb
Running a sbusiness
elo
n
effectively can present a significant challenge, particularly as it
No aa gs
d_ to
grows ornaisu affected
u 20
09 byMutrends
ha
in its target market or the global economy. To be
t ho @ mm
riz live
successful, a businessed
co
.co mustadbe able to adapt to changing conditions. That requires
m Sa
pie ad
s
individuals within the aorganization . to make good strategic and tactical decisions.
llow
ed
!
However, the following problems can often make effective business decision-making
difficult:

• Key business data is distributed across multiple systems. This makes it hard to
collate
Th
is all the information necessary for a particular business decision.
do
cu
me
n
• Finding thet binformation
elo required for business decision-making is time-consuming
sa ng
No ad st
and error-prone. _
09 The
u na 20 o M need to gather and reconcile data from multiple sources
ut h uh
or i @ am
live madecision-making processes that can be further
results in slow, z ed inefficient. co dS
co m aa
pie d.
undermined by inconsistencies s allo between duplicate, contradictory sources of the
we
d!
same information.

about:blank 3/20/2019
Page 4 of 36

Most business decisions require the answer to fundamental questions, such as


“How many customers do we have?” or “Which products do we sell most often?”
Although these questions may seem simple, the distribution of data throughout
multiple systems in a typical organization can make them difficult, or even
impossible, to answer.
Th
is d
oc
um
en
be t
By resolving
sa theselon problems, you can make effective decisions that will help the
g
No ad st
un 20 Mu _ o
business ato
uth be more
09
@
successful,
ha both at a strategic, executive level, and during day-
or i live mm
ed z . co ad
to-day operations.
co m Sa
pie ad
sa .
llow
ed
!
What Is a Data Warehouse?

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un Mu _2 o
au 00 provides
A data warehouse a solution to the problem of distributed data that prevents
th 9@ ha
or i live mm
ed . co z ad
effective businessco decision-making.
p m Sa There are many definitions for the term “data
ies ad
allo .
warehouse,” and disagreements
we over specific implementation details. It is generally
d!
agreed, however, that a data warehouse is a centralized store of business data that
can be used for reporting and analysis to inform key business decisions.

about:blank 3/20/2019
Page 5 of 36

Typically, a data warehouse:

• Contains a large volume of data that relates to historical business transactions.

• Is optimized for read operations that support querying the data. This is in contrast
to a typical Online Transaction Processing (OLTP) database that is designed to
Th
sd i
support
oc data insert, in addition to update and delete operations.
u me
n tb
el
sa withonnew
• Is Nloaded gs or updated data at regular intervals.
o ad t
un _ 20 oM
au 09 uh
t ho @ am
riz live ma
• Provides the ed basis.cfor
co om enterprise
dS BI applications.
pie aa
sa d.
llow
ed
!

Data Warehouse Architectures

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un oM _2
au uh 00
9@ t ho am
riz live ma
There are many ed ways.cthat
co om you d S can implement a data warehouse solution in an
pie aa
d.
organization. Some scommon allo approaches include:
we
d!

• Creating a single, central enterprise data warehouse for all business units.

about:blank 3/20/2019
Page 6 of 36

• Creating small, departmental data marts for individual business units.

• Creating a hub-and-spoke architecture that synchronizes a central enterprise data


warehouse with departmental data marts containing a subset of the data
warehouse data.

Th
is d
oc
um
en
The correct architecture
tb
e
for a given business might be one of these, or a combination
sa lon
No ad g
s tfrom all three approaches.
of various
un elements
_2
0
oM
au 09 uh
t ho @ am
riz live ma
ed . co dS
co m aa
pie
Components of a Data Warehousing Solution sa
llow
d.
ed
!

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

Th
is d
oc
um
nt e
A data warehousing
be
l
solution usually consists of the following elements:
sa on
No ad gs
un to _2
au Mu 00
9@
t ho ha
riz live mm
• Data Sources.ed Sources
. co of
ad business data for the data warehouse, often including
co m Sa
pie ad
OLTP applications databases
allo and. data exported from proprietary systems, such as
we
accounting applications. d!

about:blank 3/20/2019
Page 7 of 36

An Extract, Transform, and Load (ETL) process. A workflow for accessing data
in the data sources, modifying it to conform to the data model for the data
warehouse, and loading it into the data warehouse.

• Data Staging Areas. Intermediary locations where the data to be transferred to


the data warehouse is stored. It is prepared here for import and to synchronize
Th
is d
loading oc into the data warehouse.
um
en
tb
lon e
• Data
No Warehouse.
sa
ad gs A relational database designed to provide high-performance
un _2 to
au 09 0 M
uhbusiness data for reporting and analysis.
querying thoof historical
@
li am
riz ve ma
ed . co d
co m Sa
es d pi a
allo upon the. data within the data warehouse, data models will be
• Data Models. Based
we
! d
used by a business analyst to produce analytics and predictions for the business.

Many data warehousing solutions also include:

• Data Cleansing and Deduplication. A solution for resolving data quality issues
before
Th
is it is loaded into the data warehouse.
do
cu
me
n
t b Management (MDM). A solution that provides an authoritative data
• Master Data elo
sa ng
Nou d_ to a s
definition
na for20business
0 Mu entities used by multiple systems across the organization.
ut h 9@ ha
or i live mm
ze . co ad
dc m Sa
op ad
ies .
allo
we
Data Warehousing Projects d!

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

about:blank 3/20/2019
Page 8 of 36

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

A data
Th warehousing project has much in common with any other IT implementation,
is do
c
so you canumapply
en most commonly-used methodologies, such as Agile, Waterfall, or
tb
elo
Microsoft
No
s
Solutions
a ad ng Framework (MSF). However, a data warehousing project often
st
un _2 oM
au 00 uh
requires a deeper
t ho
riz
9@understanding
live am of the key business objectives and metrics used to
ed ma
. co d
drive decision-making. co m Sa
pie ad
sa .
llow
ed
!
A high level approach to implementing a data warehousing project usually includes
the following steps:

1. Work with business stakeholders and information workers to determine the


Th
is d
questions
oc the data warehouse must answer. They might include:
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie
o What was s athe total sales d. revenue for each geographic territory in a given
llow
ed
month? !

o What are our most profitable products or services?

about:blank 3/20/2019
Page 9 of 36

o Are business costs growing or reducing over time?

o Which sales employees are meeting their sales targets?

2. Determine the data required to answer these questions. It is normal to think of


Th
is d
this ocdata in terms of “facts” and “dimensions.” Facts contain the numerical
um
e
nt required to aggregate data, so you can answer the business
measuresbe
l
sa on
No ad st g
questions
un _2 identified
0
o M in step 1. For example, to determine sales revenue, you
au 09 uh
or ith @mm a
might need livesales
ze the amount for each individual transaction. Dimensions
d .c ad
om
co Sa
represent thepiedifferent
sa
ad
aspects
. of the business by which you want to aggregate
llow
the measures. For eexample,
d! to determine sales revenue for each territory in a
given month, you might need a geographic dimension, so you can aggregate
sales by territory, and a time dimension where you can aggregate sales by
month. Fact and dimensional modeling is covered in more detail in Module 3:
TDesigning
his
and Implementing a Data Warehouse.
do
cu
me
3. Identify ntdata sources containing the data required to answer the business
be
sa lon
Nquestions. a d_ These gs
ou 20 to are commonly relational databases used by existing line-of-
na 0 Mu
ut h 9@ ha
business ori applications,
ze liv e
mm but they can also include:
ad
dc . co
op m Sa
ies ad
allo .
we
d!

o Flat files or XML documents that have been extracted from proprietary
systems.

Th
ois d Data in Microsoft SharePoint® lists.
oc
um
en
tb
o elo
sCommercially
a ng available data that has been purchased from a data supplier
No ad st
un _2 oM
au such00as 9@ the uMicrosoft
ha Windows Azure® Marketplace.
t ho mm
riz live
ed . co ad
co m Sa
pie ad
sa .
llow
ed
!
4. Determine the priority of each business question based on:

about:blank 3/20/2019
Page 10 of 36

o The importance of answering the question in relation to driving key


business objectives.

o The feasibility of answering the question from the available data.

Th
is d
A common oc approach to prioritizing the business questions you will address in the data
um
e
warehousing nsolution
tb
el is to work with key business stakeholders and plot each
sa on
No ad st g
question
un on a_quadrant-based
20 oM matrix like the one shown below. The position of the
au 09 uh
th mm @ a
questions inotheed
live helps
riz matrix
.c adyou to agree the scope and clarify the requirements of
co om Sa
pie ad
the data warehousings a project. .
llow
ed
!

Business High importance, low feasibility High importance, high feasibility


importance
of the Low importance, low feasibility Low importance, high feasibility

question
Feasibility of answering the question
Th
is d
o cu
me
nt
If a large numberbe of questions fall into the high importance, high feasibility category,
l
sa on
No gs
ad
you mayunawant 2to _
00 consider
to
Mu breaking down the project into subprojects. Each
ut h 9@ ha
or i live problemm
subproject tackles ze
dc
the . co
ma
dS
of implementing the data warehouse schema, ETL
op m a
ies ad
solution, and data quality allo procedures . for a specific area of the business, starting with
we
the highest priority questions. d! If you choose to adopt this incremental approach, take
care to create an overall design for dimension and fact tables in early iterations of the
solution, so that subsequent additions can reuse them.

Data
Th Warehousing Project Roles
is do
cu
me
n tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

about:blank 3/20/2019
Page 11 of 36

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

A data
Th warehousing project typically involves several roles, including:
is do
cu
me
tb n
elo
• A project s ng
No a admanager. st Coordinates project tasks and schedules, responsible for
un _2 oM
au 00 uh
ensuringthothat 9@ project
the am is completed on time and within budget.
riz live ma
ed . co dS
co m aa
pie d.
sa
• A solution architect. llow Has overall responsibility for the technical design of the data
ed
!
warehousing solution.

• A data modeler. Designs the data warehouse schema.

• A database administrator. Designs the physical architecture and configuration of


Th
the isdata
d
warehouse database. In addition, database administrators with
oc
me u
responsibility
nt for data sources used in the data warehousing solution must be
be
sa lo
involved
No ad thenproject
in gs
to to provide access to the data sources required by the ETL
un_ 20
Mu
ut h a ha09
process. or i @
live mm
ze . c ad
dc o m Sa
op ad
ies .
• An infrastructure specialist.
a llow Implements the server and network infrastructure for
ed
!
the data warehousing solution.

• An ETL developer. Builds the ETL workflow for the data warehousing solution.

about:blank 3/20/2019
Page 12 of 36

• Business users. Provide requirements and help to prioritize the questions that the
data warehousing solution will answer. Often, the team includes a business
analyst as a full-time member, helping to interpret the questions and ensuring that
the solution design meets the users’ needs. They will also be the end users of the
finished solution.
Th
is d
• Testers.oc Verify the business and operational functionality of the solution as it is
um
e
developed.nt bel
sa on
No ad gs
un _2 to
au 00 M
• Data stewards.9@ Forueach
m key subject area in the data warehousing solution.
t ho ha
li riz ve ma
om rules ed .c d
Determine data
co quality
pie
Sa and validate data before it enters the data warehouse.
ad
sa .
Data stewards are lsometimes
low
ed referred to as data governors.
!

In addition to ensuring the appropriate assignment of these roles, you should consider
the importance of executive-level sponsorship of the data warehousing project. It is
T
his
significantly
do more likely to succeed if a high-profile executive sponsor is seen to
cu
m
actively support
en
t b the creation of the data warehousing solution and its associated
elo
sa ng
costs.No un
ad
_2 oM
st
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m
SQL Server As a Data Warehousing Platform
pie
sa
llow
aa
d.
ed
!

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

about:blank 3/20/2019
Page 13 of 36

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

SQL TServer
his includes components and features that you can use to implement various
do
um c
architecturaleelements
n of a data warehousing solution, including:
tb
elo
sa ng
No ad st
un _2 oM
auuh 00
• Database 9@
tho Engine.
r liv Theam database engine component includes many features that
m ize e. c
om d ad
co
are directly relevant Sa warehouse project, the main feature being a robust
to a data
pie ad
sa .
l
relational database.lowOther
ed features include:
!

o In-Memory OLTP and in-memory optimizations, which enable businesses to


have a high-performance, real-time transactional database, while also layering
column-based query processing on top to achieve up to 10 times the query
Thperformance of traditional row-based storage.
is d
oc
um
en
o Temporal t b Tables that provide businesses with time context to data being
elo
sa ng
No ad st
captured
u na
_ 20by maintaining
09
oM
uh
a full history of all data changes.
ut h @ am
or i live ma
ze . co
dc m dS
• SQL Server Integration o pie Services
aa
d. (SSIS). A component for building data
sa
llow
integration and transformation ed
!
solutions, commonly referred to as Extract,
Transform, and Load (ETL). Improvements in SQL Server include:

o The ability to connect to Hadoop (Big Data) and Teradata sources.

about:blank 3/20/2019
Page 14 of 36

o Incremental package deployments, which enable individual packages to be


upgraded without having to deploy the project as a whole.

• Data Quality Services (DQS). Consisting of Data Quality Server and Data Quality
Client, businesses can use DQS to build a knowledge base around their data. You
can
T then standardize, deduplicate, and enrich this data.
his
do
cu
en m
• Master Datat b Services (MDS). A component that gives businesses the tools to
e
s lon
No aa gs
discover
u and
d_ define
2 to nontransactional lists of data. The goal is to compile a
M
na 00 uh
ut h
9@ am
maintainable
o riz master
ed
live
. co
list mof
ad
business rules and data types.
co m Sa
pie ad
sa .
• SQL Server Analysis llow Services (SSAS). An online analytical
ed
data engine that
!
provides analytical data upon which business reports can be built (see the
following description of Reporting Services). SSAS can create two different kinds
of semantic models—Tabular (relational modeling) or Multidimensional (OLAP
cubes).
Th
is d
oc
• SQL Server Reporting Services (SSRS). The main goal of developing a data
um
en
be t
warehouse
sa isloto
ng provide answers to business critical questions. These answers
No ad st
un 20 _ o
Mu of reports. SSRS is a server-based reporting platform that
normally
au take
t
09the form
@ ha
ho liv mm
ri
includes a zcomprehensive
ed
co
e. c
om ad set of tools to create, manage, and deliver business
Sa
pie ad
sa .
reports. llow
ed
!

For more information about the other components as they relate more closely to BI,
you should attend the following courses—20766A: Developing SQL Data models and
10988A:
Th
is Managing SQL Business Intelligence Operations.
do
cu
me
n tb
elo
sa ng
No ad st
Check nYour
u
au
_ 20Knowledge
09
oM
uh
t ho @ am
riz live ma
ed . co dS
co m aa
pie
Sequencing Activity sa
llow
d.
ed
Put the following steps in a! data warehouse project in order by numbering each to
indicate the correct order.

about:blank 3/20/2019
Page 15 of 36

Identify the people or groups who


require access to the produced
data.

Identify where data can be


sourced.
T his
do
cu
me
n tb
elo
Work
N
with
sa business
a ng stakeholders
o d_ st
un 20 Mu o
to identify
au
t
key 09questions.
@ h
ho live am
riz ma
ed . co d
co m Sa
pie ad
sa .
Prioritize questionsllthat
ow need
ed
!
answers.

Identify the data that might be


able to answer the identified
questions.
Th
is do
cu
me
n tb
elo
sa ng
Check
No answer
ad
_ sShow
to solution Reset
un 20 Mu
au 09 ha
t ho @ mm
riz live
ed . co ad
co m Sa
pie
Lesson 2: Considerations for a Data Warehouse sa
llow
ad
.
ed
Solution !

Before starting a data warehousing project, you need to understand the


considerations that will help you create a data warehousing solution to address your
Th
sd i
businesses-specific
oc
u
needs and constraints.
me
n tb
elo
sa ng
No ad st
This lesson _
u na describes 20
09
o Msome of the
uh
key considerations for planning a data
ut h @ am
or i live ma
warehousing solution.z ed . co dS
co m aa
pie d.
sa
llow
ed
Lesson objectives !

After completing this lesson, you will be able to describe considerations for:

about:blank 3/20/2019
Page 16 of 36

• Designing a data warehouse database.

• Columnstore indexes.

• Data sources.

• Designing
T
an ETL process.
his
do
cu
m
• Implementing
en
t b data quality and master data management.
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m
Data Warehouse Database and Storage pie
sa
aa
d.
llow
ed
!

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

Th
is d
oc
me u
A data warehouse
nt
b
is a relational database that is optimized for reading data for
elo
ng s
aa reporting.
analysis
No and
u
d_ st
o When you are planning a data warehouse, you should take
na 20 Mu
u9@ ha 0
tho considerations
the following mm into account:
riz live
ed . co ad
co m Sa
pie ad
sa .
llow
Database Schema ed
!

about:blank 3/20/2019
Page 17 of 36

The logical schema of a data warehouse is typically designed to denormalize data


into a structure that minimizes the number of join operations required in the queries
used to retrieve and aggregate data. A common approach is to design a star schema
in which numerical measures are stored in fact tables. These have foreign keys to
multiple dimension tables containing the business entities by which the measures can
be aggregated.
Th
is d
oc
um
en
tb
elo
Before
N
designing
sa
a nyour
gs data warehouse, you must know:
ou d_ to
na 20 Mu
ut h 09 ha
or i @ mm
ze live
dc . co ad
• Which dimensions op your
m business
Sa
ad users employ when aggregating data.
ies .
allo
we
d!
• Which measures need to be analyzed and at what granularity.

• Which facts include those measures?

You must also carefully plan the keys that will be used to link facts to dimensions,
and
Th consider whether your data warehouse needs to support the use of
i sd
cu o
dimensions
me that change over time. For example, handling dimension records for
nt
elo b
customers
N
sa who
a ngchange their address.
s
ou d_ to
na 20 Mu
ut h 09ha
o @ mm
You must also l
consider the
ad physical implementation of the database, because this
r ize iv e
dc . co
op m Sa
ies ad
will affect the performance allo and . manageability of the data warehouse. It is common
we
to use table partitioning d! to distribute large fact data across multiple file groups,

each on a different physical disk. This can increase query performance and makes
it easy for you to implement a file group-based backup strategy that can help
reduce downtime in the event of a single disk failure. You should also consider the
appropriate
T indexing strategy for your data, and whether to use data compression
his
whendostoring
cu
m it.
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t 9@
Note:
h ori Designing
ze live ama data warehouse schema is covered in more
ma detail in
d . c om dS
Module c3: op Designing
ies
and
aa Implementing a Data Warehouse.
d.
allo
we
d!

Hardware

about:blank 3/20/2019
Page 18 of 36

The choice of hardware for your data warehouse solution can make a significant
difference to the performance, manageability, and cost of the data warehouse. The
hardware considerations include:

• Query processing requirements, including anticipated peak memory and CPU


utilization.
Th
is d
oc
• Storageumvolume
en and disk input/output requirements.
tb
elo
sa ng
No to ad s
• Network _2
au connectivityMu and bandwidth.
un 00
9 t ho @ ha
riz live mm
ed . co ad
co m Sa
• Component redundancy
pie
sa
for high
ad
.
availability.
llow
ed
!
• In-memory requirements.

You can build your own data warehouse solution by purchasing and assembling
individual components, using a pretested reference architecture, or purchasing a
hardware appliance that includes preconfigured components in a ready-to-use
Th
do is
package.
cu Factors that will influence your choice of hardware include:
me
n tb
elo
ng sa
• How
No much
ad budget
_ s t is available?
o
un Mu 20
au ha 09
@ t ho mm
live riz
• e
Existing enterprise
dc ad
.cagreements
om with hardware vendors.
op Sa
ies ad
allo .
we
• Time and expertise to dconstruct ! the solution.

• Hardware assembly and configuration expertise.

Th Note: Hardware for data warehousing solutions is discussed in more detail


is d
oin
cu Module 2: Planning Data Warehouse Infrastructure.
me
n tb
elo
sa ng
No ad st
un oM_2
au uh 00
t 9@
High Availability
h or i
ze liveandamDisaster
ma Recovery
dc . co d
op m Sa
ies ad
.
A data warehouse lcan a low very quickly become a business-critical part of your overall
ed
!
application infrastructure, so it is essential to consider how you will ensure its
availability. SQL Server includes support for several high-availability techniques,

about:blank 3/20/2019
Page 19 of 36

including database mirroring and server clustering. You must assess these
technologies and choose the best one for your individual solution, based on:

• Failover time requirements.

• Hardware requirements and cost.


Th
do is
• Configuration
cu
m
and management complexity.
en
tb
elo
In Nadditions a to angserver-level, high-availability solution, you must consider
ou a d st
_2 oM
na 00 u
ut h 9 the individual
ha
redundancy ori at @ live mm component level for network interfaces and storage
ze . co ad
dc m Sa
arrays. op
ies ad
allo .
we
d!
The most robust high-availability solution cannot protect your data warehouse from
every eventuality, so you must also plan a suitable disaster recovery solution that
includes a comprehensive backup strategy. This should take into account:

• The volume of data in the data warehouse.


Th
is d
oc
um
• The frequency
en
t
of changes to data in the data warehouse.
be
sa lon
No ad gs
un oM _2 t
• The effect
au of00the
9
backup
uh process on data warehouse performance.
t ho @ am
riz live ma
ed . co dS
co m aa
• The time to recover
pie
sa the database d. in the event of a failure.
llow
ed
!
Security

Your data warehouse contains a huge volume of data that is typically commercially
sensitive. You may want to provide access to some data by all users, but restrict
access to other data for a subset of users.
Th
is d
oc
um
Considerations
en
t
for securing your data warehouse include:
be
sa lon
No ad gs
u _2 to
• The authentication
na
ut h
00
9@
Mmechanisms
uh that you must support to provide access to the
or i l am
ze ive ma
data warehouse. dc . co
m dS
op aa
ies d.
allo
we
• The permissions that the d! various users who access the data warehouse will

require.

• The connections over which data is accessed.

about:blank 3/20/2019
Page 20 of 36

• The physical security of the database and backup media.

SQL Server includes a new option, Always Encrypted, which enables you to perform
database operations on encrypted data. The keys are stored on the client, not the
Th
server,is so
d the data owners (business users) no longer need to worry about data
oc
me u
managers (DBAs
nt
b
and developers) being able to view sensitive data.
elo
sa ng
No ad st
un _2 oM
au 00 uh
9@
Columnstore Indexes t ho
riz
ed
live
. co
am
ma
dS
co m aa
pie d.
sa
llow
ed
!

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

T
his introduced columnstore indexes in SQL Server 2012. This new index type
Microsoft
do
cu
en m
references datat b in a column fashion and uses compression aggressively to reduce
elo
aa s n
gs and the I/O operations needed to respond to a query.
the disk
No space
u
d_ used
2 to
na Mu 00
ut h
9@ ha
or i
live mm
. ze
c ad
dc o m Sa
op
Microsoft has increased
ies the capabilities
ad
. of the database engine to incorporate the
allo
we
following new features ford!columnstore indexes:

about:blank 3/20/2019
Page 21 of 36

Clustered columnstore indexes with nonclustered row indexes (b-tree index) on


top facilitates high performance reporting to sit on top of OLTP tables.

• Primary and foreign key constraints are supported through nonclustered row
indexes.

• Now
Th supported on in-memory tables.
is do
cu
me
tb n
• Batch mode
s
eenabled
lo on more aggregate functions.
No aa ng
d_ st
un 20 oM
au 09 uh
t ho @ am
riz live ma
ed . co dS
co m aa
pie d.
For details of the differences
s allo between the SQL Server 2012 and the latest
we
d!
implementations of columnstore indexes, see the SQL Server Technical
Documentation:

Columnstore Indexes Versioned Feature Summary


Th
is d
cu o
http://aka.ms/t7wvb0
me
nt
be
sa lon
No gs ad
un to _2
a 00 Mu
Row Indexuthor Columnstore
or i
9@ ha
mm
Index?
ze liv e
dc . co a dS
op m aa
ies d.
allo
When you are seeking specific we data, row indexes generally work best. For example,
d!
an OLTP database will be updating, inserting, and deleting specific rows to perform
changes, such as updating a customer’s address or last name.

When you are scanning data in a table, columnstore indexes are often the better
T
his For example, in a data warehouse using aggregate functions to return the
choice. do
cu
m
en for a region.
sum of all sales
tb
elo
sa ng
No ad st
un oM _2
au uh 00
9@t ho
In SQL Server riz 2012 liveand aSQL mm Server 2014, database architects had to choose to use
ed . co ad
co m Sa
one type of clustered pie index.
sa With ad the latest build of SQL Server, you can have either
.
llow
type of clustered index, in ed addition to some supporting nonclustered indexes of the
!
other type.

about:blank 3/20/2019
Page 22 of 36

Data Sources

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

Th
is d
oc
um
en
tb
You must identify
s
elo the sources that provide the data for your data warehouse, and
n
No aa gs
d_ to
considernathe
u
ut h
following
20
0 9@
Mfactors
uh when planning your solution:
or i l am
ze iv e. c m
dc om ad
op Sa
ies ad
Data Source Connection allo Types.
we
d!

Your data warehouse might require data from a variety of sources. For each source,
you must consider how the ETL process can connect and extract the data you want to
use. In many cases, your data sources will be relational databases for which you can
use an
Th OLEDB or ODBC provider. However, some data sources might use
is do
cu
proprietary mstorage
en that needs a bespoke provider, or for which no provider exists. In
tb
e
sa youlonmust
theseNcases, gs develop a custom provider or determine whether you can
ouad to
_2 M
n
au from09the data 0
uh source in a format that the ETL process can easily
export datat ho @
l am
riz ive ma
dc co dS e .
consume—such as
op XML,
i
m Excelaafiles, or comma-delimited text.
es d.
allo
we
d!
Credentials and Permissions

about:blank 3/20/2019
Page 23 of 36

Most data sources require secure access with user authentication and, potentially,
individual permissions on the data. You must work with the owners of the data
sources used in your data warehousing solution to establish:

• The credentials that your ETL process can use to access the data source.
Th
i
• Thesrequired
do
cu permissions to access the data used by the data warehouse.
me
n tb
lon e
Data
No Formats
sa
ad gs
un to _2
au Mu 00
9@ t ho ha
riz live store mm
A data source ed might . co addata in a different format to the desired output. Your
co m Sa
pie ad
solution must take s ainto account
llow
. issues arising from this, including:
ed
!
• Conversion of data from one data type to another—for example, extracting
numeric values from a text file.

• Truncation of data when copying to a destination with a limited data length.


Th
is
• Date dand
oc
u time formats used in data sources.
me
n tb
elo
ng sa
• Numeric
No aformats,
d_ s t scales, and precisions.
o
un Mu 20
au ha 09
@ t ho mm
live riz
• e
Support for Unicode
dc ad
.co characters.
op m Sa
ies ad
allo .
we
Data Acquisition Windows d!

Depending on the workload patterns of the business, there might be times when
individual data sources are not available or the usage level is such that the
additional overhead of a data extraction is undesirable. When planning a data
warehousing
Th
is solution, you must work with each data source owner to determine
do
um c
appropriate
en data acquisition windows based on:
t be
sa lon
No ad gs
_2 to
• The workload
u na
ut h
00 pattern
9@
Mu of the data source, and its resource utilization and
ha
capacity
or i live mm
levels. ze . c ad
dc o m Sa
op ad
ies .
allo
• The volume of data toedbe w
! extracted, and the time that process takes.

• The frequency with which you need to update the data warehouse with fresh data.

about:blank 3/20/2019
Page 24 of 36

• If applicable, the time zones in which business users are accessing the data.

• How the data is produced and whether it resides in a real-time or batch-based


system.

Th
Extract,
is d
oc Transform, and Load Processes
u me
n tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

A significant part in the creation of a data warehouse solution is the implementation of


one or more ETL processes. When you design an ETL process for a data
warehousing solution, you must consider the following factors:
Th
is d
oc
um
en
Staging tb
elo
sa ng
No ad st
un oM _2
au uh 00
9@ t ho am
In some solutions,
riz live can
you mtransfer data directly from data sources to the data
ed . co ad
co m Sa
p ad
warehouse withoutieany s a intermediary
llow
. staging. In many cases, however, you should
ed
consider staging data to: !

about:blank 3/20/2019
Page 25 of 36

Synchronize a data warehouse refresh that includes source data extracted during
multiple data acquisition windows.

• Perform data validation, cleansing, and deduplication operations on the data


before loading it into the data warehouse.

• Perform
Th
is transformations on the data that you cannot perform during the data
do
um c
extractionenor data flow processes.
tb
elo
sa ng
No to ad s
If youunrequire
a
_2 a staging
00 Mu area in your solution, you must decide on a format for the
ut h 9@ ha
live
riz Possible o ma m
staged data.ed .co formats
d include:
co m Sa
pie ad
sa .
llow
• A relational database.ed!

• Text or XML files.

• Raw files (binary files in a proprietary format of the ETL platform being used).

Th choice of format should be based on several factors including:


Your
is do
cu
me
n t
• The need be
s to access
lon and modify the staged data.
aa gs
No d_ to
un 20 Mu
au 09 h
• The timethotaken
riz
@to store
live am and read the staging data.
m ed . co ad
co m Sa
pie ad
sa .
Finally, if using a relational
llow database as the staging area, you must decide where
ed
!
this database will reside. Possible choices include:

• A dedicated staging server.

• A dedicated SQL Server instance on the data warehouse server.


Th
is d
oc
• A dedicated
um
e
staging database in the same instance of SQL Server as the data
nt
be
warehouse.
sa lon
gs
No ad
to
un _2
Mu
au 00
9@
t ho ha
• A collection
riz of staging
live m
tables
ma (perhaps in a dedicated schema) in the data
ed . co dS
co m a
pie ad
warehouse database. sa .
llow
ed
!
Factors you should consider when deciding the location of the staging database
include:

about:blank 3/20/2019
Page 26 of 36

• Server hardware requirements and cost.

• The time taken to transfer data across network connections.

• The use of Transact-SQL loading techniques that perform better when the staging
data and data warehouse are co-located on the same SQL Server instance.
Th
do is
• The server
cu
m
resource overheads associated with the staging and data warehouse
en
tb
load processes.
s
elo
No aa ng
d_ st
un oM 20
au uh09
Required
t ho
riTransformations
@
live am
ma
ze . co
dc m dS
op aa
ies d.
Most ETL processes allo require that the data being extracted from data sources is
we
d!
modified to match the data warehouse schema. When planning an ETL process
for a data warehousing solution, you must examine the source data and
destination schema, and identify what transformations are required. You must then
determine the optimal place in the ETL process to perform these transformations.
Choices
Th
is
for implementing data transformations include:
do
cu
me
nt
• During the bdata
elo extraction. For example, by concatenating two fields in a SQL
sa ng
N d_
o u data a s
to into a single field in the Transact-SQL query used to extract
Serverna 20source
0 Mu
ut h 9@ ha
mm
the data. orized live
. co ad
co m Sa
pie ad
sa .
llow
• In the data flow. For eexample, d! by using a Derived Column data transformation
task in a SQL Server Integration Services data flow.

• In the staging area. For example, by using a Transact-SQL query to apply default
values to null fields in a staging table.
Th
is d
Factors oc affecting the choice of data transformation technique include:
um
en
tb
lon e
• The sa
No performance
ad gs overhead of the transformation. Typically, it is best to use the
t
un _2 oM
au 0 uh 0
approachtho with9the
r
@ least
li am performance overhead. Set-based operations that are
ize ve ma
om d .c d
performed in Transact-SQL
co Saqueries usually function better than row-based
pie ad
sa .
llow
transformations applieded in a data flow.
!

• The level of support for querying and updating in the data source or staging
area. Cases where you extract data from a comma-delimited file and stage it in a

about:blank 3/20/2019
Page 27 of 36

raw file limit your options to performing row-by-row transformations in the data
flow.

• Dependencies on data required for the transformation. For example, you might
need to look up a value in one data source to obtain additional data from another.
In this case, you must perform the data transformation in a location where both
Th
datais dsources
oc are accessible.
um
en
tb
lon e
• The
No complexity
sa
ad gs of the logic involved in the transformation. In some cases, a
un _2 to
au 09 uh 0 M
transformation
t ho may
@
li
require
am multiple steps and branches depending on the
riz ve ma
o ed .c d
presence or value
co
pie ofmspecificaadata
S
d
fields. In these cases, it is often easier to apply
sa .
the transformation lby
low combining several steps in a data flow, than it is to create a
ed
!
Transact-SQL statement to perform the transformation.

Incremental ETL

After the initial load of the data warehouse, you will usually need to incrementally
T
his
add new
do or updated source data. When you plan your data warehousing solution,
cu
m
you must econsider
nt
be the following factors that relate to incremental ETL:
sa lon
No gsad
un to
_2
a 00 Mu
• How will uthyou identify
or i
9@ hnew
am or modified records in the data sources?
ze liv e ma
dc . co dS
op m aa
• Do you need to delete ie sa recordsd. in the data warehouse when corresponding records
llow
e
in the data sources ared! removed? If so, will you physically delete the records, or
simply mark them as inactive (often referred to as a logical delete)?

• How will you determine whether a record that is to be loaded into the data
warehouse should be new or an update to an existing one?
Th
is d
cu o
• Are theremerecords in the data warehouse for which you need to preserve historical
n tb
lon e
values
N by
sa creating
a gs a new version of the row instead of updating the existing one?
ou d_
to
na 20
Mu
ut h ha09
or i @ mm
ze live
dc . c o ad
op m Sa
ies ad
allo .
Note: Managing data w ed changes
in an incremental ETL process is discussed in
!
more detail in Module 9: Implementing an Incremental ETL Process.

about:blank 3/20/2019
Page 28 of 36

Data Quality and Master Data Management

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

Th
is d
oc
um
en
The value sof at bdata
elo
n
warehouse is largely determined by the quality of the data it
No aa
gs
d_ to
contains.
aWhen
un
ut h
20 planning
0 9@
Mu a data warehousing project, therefore, you should
ha
determine
or i liv mm
z
how you will ensure e data
e .co quality.
ad You should consider using a master data
dc m Sa
op ad
ies .
management solution. allo
we
d!

Data Quality

To validate and enforce the quality of data in your data warehouse, it is


recommended
Th
is that business users who have knowledge of each subject area
do
cu
addressed by
men the data warehouse become data stewards. A data steward’s
tb
lon e
responsibilities
N
sa
ad include:
gs
ou to _2
na Mu 00
ut h
9@ ha
or i live mm
ze . c ad
dc o
• Building and maintaining
op
ies
m a Sknowledge
aa
d.
base that identifies common data errors and
allo
corrections. we
d!

• Validating data against the knowledge base.

about:blank 3/20/2019
Page 29 of 36

• Ensuring that consistent values are used for data attributes where multiple forms
of the value might be considered valid. For example, ensuring that a Country field
always uses the value “United States” when referring to America, even though
“USA,” “The U.S.” and “America” are also valid values.

• Identifying and correcting missing data values.


Th
is d
oc
um
• Identifying
en and consolidating duplicate data entities, such as records for “Robert
tb
lon e
Smith”
No and
sa “Bob
ad gs Smith”, which both refer to the same physical customer.
un _2 to
au 00 Mu
t ho 9@ ha
riz live mm
ed . co ad
co m Sa
pie ad
sa .
You can use SQL Server llow Data Quality Services to provide a data quality solution that
ed
!
helps the data steward to perform these tasks.

Note: SQL Server Data Quality Services is discussed in more detail in Module
10: Enforcing Data Quality.
Th
is d
oc
um
en
tb
elo
Master s
No Data a adManagement
ng
st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
It is common for large c m
ies organizations to have multiple business applications, and in
op aa
d.
allo
many cases, these systems we
d! perform tasks that are related to the same business
entities. For example, an organization may have an e-commerce application enabling
customers to purchase products, and a separate inventory management system that
also stores product data. A record representing a particular product might exist in
both systems. In this scenario, it can be useful to implement a master data
T
his
management
do
c
system that provides an authoritative definition of each business entity
um
en
(in this example,
t b a particular product) that you can use across multiple applications to
e
sa lon
ad
No consistency.gs
ensure un _2
0
to
M
au uh 09
@ t ho am
live riz ma
ed . co dS
co m aa
pie
Master data management sa is especially
d. important in a data warehousing scenario
llow
because it ensures that data ed conforms to the agreed definition for the business
!

entities to be included in any analysis and reporting solutions that it must support.

about:blank 3/20/2019
Page 30 of 36

You can use SQL Server Master Data Services to implement a master data
management solution.

Note: SQL Server Master Data Services is discussed in more detail in Module
11: Using Master Data Services.
Th
is d
oc
um
en
tb
elo
ng sa
Check
No Your
u
ad Knowledge
_2 st
oM
na 00
uh
ut h
9@ am
or i
live ma
.ze
co
dc m dS
Categorize Activityo pie
sa
aa
d.
llow
Which of the following are ed optimal places
! to perform data transformations in an ETL
solution?

All the data transformations


should be completed by the
Th
derived reports.
is do
cu
me
n tb
elo
sa ng
Implement
N ou a d_ transformations
st
oM in a
na 2 00
u tho task, @ 9 u h
data flow riz within
live anam ETL
ma
ed . co dS
c
process, whenopcomplex m row-by-
aa
ies d.
allo
row processing is required. we
d!

There is no optimal place to


perform data transformations.

This
Perform
do simple transformations in
cu
m
the queryenthat
t b is used to extract
elo
the sa ng
No data.ad st
un _2
oM
au 00 uh
9@
t ho am
riz live ma
ed . co dS
co m aa
Perform complex pie transformations
sa d.
llow
in a staging database,ewhere d! it is
possible to combine data from
multiple sources.

about:blank 3/20/2019
Page 31 of 36

Optimal solution

Please drag items here

Suboptimal solution

Th
is d
oc Please drag items here
um
en
tb
elo
sa ng
No ad st
un _2 oM
Check au
answer 0 09 Show uh solution Reset
t ho @ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
Lab: Exploring a Data Warehousing Solution ed
!

Scenario

The labs in this course are based on a fictional company called Adventure Works
Th
Cyclesisthat
do manufactures and sells cycles and cycling accessories to customers all
c um
e
nt Adventure Works sells direct to customers through an e-commerce
over the world.be
sa lon
No adgs
websiteunand also _ 20 throughto
Mu an international network of resellers.
au 09 ha
t ho @ mm
riz live
ed . co ad
co m Sa
pie
Throughout this course, sa you willaddevelop
. a data warehousing solution for Adventure
llow
Works Cycles, including:eda! data warehouse; an ETL process to extract data from
source systems and populate the data warehouse; a data quality solution; and a
master data management solution.

The lab
Th for this module provides a high level overview of the solution that you will
is d
create inolater
cu
m labs. You can use this lab to become familiar with the various elements
en
be t
of the datasawarehousing
lon
g
solution that you will learn to build in later modules.
No ad st
un oM _2
au uh 00
9@ t ho am
riz live m
Objectives d cop om ad Sa
e . c
ies ad
allo .
we
After completing this lab, dyou ! will be able to:

• Describe the data sources in the Adventure Works data warehousing scenario.

about:blank 3/20/2019
Page 32 of 36

• Describe the ETL process in the Adventure Works data warehousing scenario.

• Describe the data warehouse in the Adventure Works data warehousing scenario.

Lab setup
Th
do is
Estimated cuTime: 30 minutes
m en
tb
elo
sa ng
No adst
u _2 oM
Virtual Machine:
n au
t ho
0 0920767C-MIA-SQL
uh
@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
User name: ADVENTUREWORKS\Student llow
ed
!

Password: Pa55w.rd

Exercise 1: Exploring Data Sources


Th
is d
oc
um
en
tb
Scenario elo
sa ng
No ad st
un _2 oM
au 00 u
9@ ha
Adventure tWorks
ho
riz uses
liv various
mm software applications to manage different aspects of
ed e. c ad
om
the business, co
and each
p ies has itsSaown
ad
.
data store. Specifically, these are:
allo
we
d!

• Internet sales processed through an e-commerce web application.

• Reseller sales processed by sales representatives, who use a specific application.


Sales employee details are stored in a separate human resources system.
Th
do is
• Reseller
cu payments are processed by an accounting application.
m en
tb
elo
gs sa n
• Products
No
u
aare
d_ managed
2 to in a product catalog and inventory system.
na Mu 00
9@ ut h ha
live or i mm
. c ze ad
dc o m Sa
op ad
ies .
allo
This distribution of data has w ed made it difficult for users to answer key questions about
!
the overall performance of the business.

about:blank 3/20/2019
Page 33 of 36

In this exercise, you will examine some of the Adventure Works data sources that will
be used in the data warehousing solution.

The main tasks for this exercise are as follows:

1. TPrepare
his the Lab Environment
do
cu
me
2. View thent bSolutionelo Architecture
sa ng
No ad s to
un _2 Mu
au 00
3. View the 9 ha
t h ori Internet
@
live Sales mm Data Source
ze . co ad
dc m Sa
op ad
ies .
4. View the Reseller a llowSales Data Source
ed
!
5. View the Products Data Source

6. View the Human Resources Data Source

7. TView
his
the Accounts Data Source
do
cu
me
8. View the nt Staging Database
be
sa lon
No ad gs
un _ 20 to
au 09 Mu
t ho @ ha
riz live mm
ed . co ad
co m Sa
pie ad
Detailed Steps ▼ s a .
llow
ed
!
Detailed Steps ▼

Detailed Steps ▼

T
his
Detailed
do Steps ▼
cu
me
n tb
elo
sa ng
Detailed
No Steps
ad
_ oM
▼ st
un 20
au 09 uh
t ho
@ am
riz live ma
ed . co dS
Detailed Steps ▼ co m aa
pie d.
sa
llow
ed
!
Detailed Steps ▼

Detailed Steps ▼

about:blank 3/20/2019
Page 34 of 36

Result: After this exercise, you should have viewed data in the InternetSales,
ResellerSales, Human Resources, and Products SQL Server databases; viewed
payments data in comma-delimited files; and viewed an empty staging database.

Exercise 2: Exploring an ETL Process


Th
is d
oc
um
en
tb
elo
sa ng
Scenario
No ad
oM _2 st
un 00
au 9@ uh
t ho am
r live ma the data sources in the Adventure Works data
Now that you areiz ed familiar .co with dS
co m aa
pie d.
warehousing solution,all you will examine
s the ETL process used to stage the data, and
ow
ed
then load it into the data warehouse. !

Adventure Works uses a solution based on SQL Server Integration Services to


perform this ETL process.
Th
is d
o
The mainctasks
um
e for this exercise are as follows:
nt
be
sa lon
No gs
ad
un to_2
a 00 Mu
1. Viewutthe
ho Solution
9 @ Architecture
ha
mm
riz live
ed . co ad
co m Sa
pie ad
2. Run the ETL Staging sa
llow Process .
ed
!
3. View the Staged Data

4. Run the ETL Data Warehouse Load Process

Th
is d
oc
me u
Detailed Steps
n ▼
tb
elo
sa ng
No ad st
un _2
oM
a 00 uh
Detailed Steps
u t ho ▼9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
Detailed Steps ▼ llow
ed
!

Detailed Steps ▼

about:blank 3/20/2019
Page 35 of 36

Result: After this exercise, you should have viewed and run the SQL Server
Integration Services packages that perform the ETL process for the Adventure
Works data warehousing solution.

Exercise 3: Exploring a Data Warehouse


Th
is d
oc
um
en
tb
Scenario elo
sa ng
No ad st
un oM _2
au 00
uh
t 9@
Now that youh ori have lexplored
ze ive
am the ETL process that is used to populate the Adventure
ma
dc . co dS
o m
Works data warehouse, pie
sa
you can aa explore the data warehouse to see how business
d.
llow
users can view key information. ed
!

The main tasks for this exercise are as follows:

1. View the Solution Architecture


Th
is d
oc
u
2. Querymethe
nt Data Warehouse
be
sa lon
No ad gs
un to_2
au Mu00
t ho 9 @ ha
riz live mm
ed . co ad
co m Sa
Detailed Steps ▼ ies p ad
.
allo
we
d!
Detailed Steps ▼

Result: After this exercise, you should have successfully retrieved business
information from the data warehouse.
Th
is d
oc
um
en
tb
e
Module
No
u
sa review
ad
_2
lon
gs
to
and takeaways
na Mu 00
ut h
9@ ha
or i
live mm
. ze
c ad
dc o m Sa
op ad
i
In this module, youehave sa learned .about the key elements in a data warehousing
llow
ed
solution. !

Review Question(s)

about:blank 3/20/2019
Page 36 of 36

Check Your Knowledge

Discovery
Why might you consider including a staging area in your ETL solution?

Show
T solution Reset
his
do
cu
me
n tb
elo
sa ng
No st ad
CheckunYour
a
_2 Knowledge
00
oM
u
ut h
9@ ha
or i
live mm
. ze
co ad
dc m Sa
op
Select the best answer
ies
allo
ad
.
we
Which project role would be d! best assigned to a business user?

Solution Architect

Data Modeler

Data
Th Steward
is d
oc
um
Tester en
tb
elo
sa ng
No ad st
un _2 oM
au uh solution 00
Check answer
t ho 9@ Show am Reset
l riz ive ma
ed . co d
co m Sa
pie ad
sa .
llow
ed
!

Th
is d
oc
um
en
tb
elo
sa ng
No ad st
un _2 oM
au 00 uh
t ho 9@ am
riz live ma
ed . co dS
co m aa
pie d.
sa
llow
ed
!

about:blank 3/20/2019

You might also like