Microsoft Official Course: Designing An ETL Solution
Module 4
• ETL Overview
• Planning Data Extraction
• Planning Data Transformation
• Planning Data Loads
Lesson 1: ETL Overview
• ETL in a BI Project
• Common ETL Data Flow Architectures
• Documenting High-Level Data Flows
• Creating Source To Target Mappings
ETL in a BI Project
[Diagram: Business Requirements, with three design areas]
• Technical Architecture and Infrastructure Design
• Data Warehouse and ETL Design
• Reporting and Analysis Design
Common ETL Data Flow Architectures
• Single-stage ETL (Source → DW)
  • Data is transferred directly from source to data warehouse
  • Transformations and validations occur in-flight or on extraction
• Two-stage ETL (Source → Staging → DW)
  • Data is staged for a coordinated load
  • Transformations and validations occur in-flight, or on staged data
• Three-stage ETL (Source → Landing Zone → Staging → DW)
  • Data is extracted quickly to a landing zone, and then staged prior to loading
  • Transformations and validations can occur throughout the data flow
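The two-stage pattern can be sketched in a few lines. This is a minimal illustration, not the course's implementation: it uses Python's built-in sqlite3 module in place of a real source and warehouse, and the table names (staging, dw), the column names, and the validation rule (drop rows with an empty name) are all assumptions made for the example.

```python
import sqlite3

def run_two_stage_etl(source_rows):
    """Stage rows first, then load the warehouse table in one coordinated step."""
    con = sqlite3.connect(":memory:")
    # Hypothetical schema: the staging table is permissive, the warehouse table is strict.
    con.execute("CREATE TABLE staging (id INTEGER, name TEXT)")
    con.execute("CREATE TABLE dw (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

    # Stage 1: extract from the source into staging without transforming anything.
    con.executemany("INSERT INTO staging VALUES (?, ?)", source_rows)

    # Transform and validate on the staged data: trim names, drop rows with no name.
    con.execute("UPDATE staging SET name = TRIM(name)")
    con.execute("DELETE FROM staging WHERE name IS NULL OR name = ''")

    # Stage 2: load the warehouse from staging; the with-block commits on success.
    with con:
        con.execute("INSERT INTO dw (id, name) SELECT id, name FROM staging")

    return con.execute("SELECT id, name FROM dw ORDER BY id").fetchall()
```

A three-stage flow would add a landing table ahead of staging, so that extraction from the source stays as fast as possible and all cleanup happens downstream.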
Documenting High-Level Data Flows
[Diagram: example data flow for ProductDB, with these labeled steps]
• Audit Start
• Filter on LastModified
• Concatenate Size (Size + ' ' + MeasureUnit)
• Lookup Subcategory
• Lookup Category
• Handle NULLs*
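The steps labeled in this diagram can be sketched as one pass over the product rows. The sketch below is an illustration under assumptions: the lookup dictionaries, the row layout, the step order, and the "N/A" and "Unknown" placeholders are hypothetical, and the Audit Start step is not modeled.

```python
from datetime import datetime

# Hypothetical lookup tables; a real solution would read these from a database.
SUBCATEGORIES = {10: {"name": "Road Bikes", "category_id": 1}}
CATEGORIES = {1: "Bikes"}

def transform_products(rows, last_extract):
    """Apply the diagram's steps to a list of product dicts."""
    out = []
    for row in rows:
        # Filter on LastModified: keep only rows changed since the previous extraction.
        if row["LastModified"] <= last_extract:
            continue
        # Concatenate Size: Size + ' ' + MeasureUnit (None if either part is missing).
        size, unit = row.get("Size"), row.get("MeasureUnit")
        size_text = f"{size} {unit}" if size is not None and unit is not None else None
        # Lookup Subcategory, then Lookup Category through the subcategory.
        sub = SUBCATEGORIES.get(row.get("SubcategoryID"))
        category = CATEGORIES.get(sub["category_id"]) if sub else None
        # Handle NULLs: substitute a placeholder for each missing value.
        out.append({
            "ProductID": row["ProductID"],
            "Size": size_text or "N/A",
            "Subcategory": sub["name"] if sub else "Unknown",
            "Category": category or "Unknown",
        })
    return out
```

Writing the flow down at this level, before building any packages, makes it easier to check that every target column has a defined source and a defined behavior for missing values.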
• What data sources are there, and how will the ETL solution connect to them?
• What data types and formats are used in each source system?
• What data integrity and validation issues exist in the source data?
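The questions about data types and integrity can be partly answered by profiling a sample of the source rows. The following is a rough sketch, assuming the sample has already been read into a list of dictionaries; the function names and the notion of "required" columns are hypothetical, and connecting to the source is out of scope here.

```python
from collections import Counter

def profile_rows(rows, required_columns):
    """Summarize value types and missing values per column for a list of dict rows."""
    profile = {}
    columns = {key for row in rows for key in row} | set(required_columns)
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        profile[col] = {
            # Count the Python type of each non-missing value.
            "types": dict(Counter(type(v).__name__ for v in values if v is not None)),
            "nulls": sum(v is None for v in values),
            "required": col in required_columns,
        }
    return profile

def integrity_issues(profile):
    """Report missing values in required columns and mixed types in any column."""
    issues = []
    for col, stats in profile.items():
        if stats["required"] and stats["nulls"]:
            issues.append(f"{col}: {stats['nulls']} missing value(s)")
        if len(stats["types"]) > 1:
            issues.append(f"{col}: mixed types {sorted(stats['types'])}")
    return issues
```

Running this against a sample from each source before designing the extraction gives an early list of columns that need validation or type conversion.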
Identifying New and Modified Rows
• On extraction
  • From source
  • From landing zone
  • From staging
• In data flow
• In-place
  • In landing zone
  • In staging
[Diagram labels: Source, Landing Zone, Data Warehouse]
Transact-SQL vs. Data Flow Transformations
• Minimizing Logging
• Loading Indexed Tables
• Loading Partitioned Fact Tables
• Demonstration: Loading a Partitioned Fact Table
Minimizing Logging