Professional Documents
Culture Documents
ETL Overview: ETL/DW Refreshment Process Building Dimensions Building Fact Tables Extract Transformations/cleansing Load
ETL Overview: ETL/DW Refreshment Process Building Dimensions Building Fact Tables Extract Transformations/cleansing Load
MS Integration Services
Refreshment Workflow
Extract
Transform
Integration
phase
Load
Metadata
Data
sources
Presentation servers
- Extract
- Transform
- Load
Data Staging
Area
Query side
ETL side
Query
Services
-Warehouse Browsing
-Access and Security
Data marts with
-Query Management
aggregate-only data
- Standard Reporting
Conformed -Activity Monitor
Data
Warehouse dimensions
Bus
and facts
Reporting Tools
Desktop Data
Access Tools
No user queries
Sequential operations on large data volumes
Data mining
Service
Element
2)
3)
5)
6)
8)
9)
10)
Load of dimensions
Construction of dimensions
4)
Often too time consuming to initial load all data marts by failure
Backup/recovery facilities needed
Better to do this centrally in DSA than in all data marts
Building Dimensions
Plan
1)
Operational
system
Data
Non-cooperative sources
Incremental update
Cooperative sources
10
Computing Deltas
Types of extracts:
DB triggers is an example
Extract
11
Updated by DB trigger
Extract only where timestamp > time for last extract
+ Reduces extract time
- Cannot (alone) handle deletions
- Source system must be changed, operational overhead
12
Common Transformations
Messages
DB triggers
13
Unique
Spellings, codings,
Production keys, comments,
The same things is called the same and has the same key
(customers)
Timely
14
Complete
Consistent
Precise
Cleansing
Building keys
Data Quality
EBCDIC ASCII/UniCode
String manipulations
Date/time format conversions
Normalization/denormalization
15
16
Types Of Cleansing
Cleansing
Special-purpose cleansing
Domain-independent cleansing
Rule-based cleansing
17
The optimal?
Parallellization
18
Source-controlled improvements
Load
19
20
Load
ETL Tools
ETL tools from the big vendors
Aggregates
Can be built and loaded at the same time as the detail data
Load tuning
Data modeling
ETL code generation
Scheduling DW jobs
21
Issues
22
MS Integration Services
Pipes
Load frequency
23
24
Packages
Tools
25
Scripting Tasks
Sequence container
Workflow Tasks
26
Structure to packages
Services to tasks
Control flow
Tasks
Containers provide
Sources, Connections
Control flow
Tasks, Workflows
Transformations
Destinations
Arrows:
green (success)
red (failure)
28
Event Handlers
Executables (packages,
containers) can raise events
Event handlers manage the events
Similar to those in languages
JAVA, C#
Transformations
Update
Summarize
Cleanse
Merge
Distribute
Destinations
29
Transformations
A Simple IS Case
Row Transformations
Error messages?
Execute package
Other Transformations
SQL Server
File system
30
31
32
ETL Demo
33
Join raw sales table with other tables to get DW keys for each
sales record
Output of the query written into the fact table
Core:
35
Extensions:
34
36
Summary
General ETL issues
MS Integration Services
37