ETL Processing
ETL Processing
[Diagram labels: IT Users, Operational Data, Data Transformation, Replication, Business Users]
Data Acquisition from OLTP Systems
• Why is it hard?
Multiple source system technologies
Multiple sources for the same data element
Complexity of required transformations
Scarcity and cost of legacy cycles
Inconsistent data representations
Volume of legacy data
Data Acquisition from OLTP Systems
• Inconsistent data representations: same data, different domain
Examples:
Date value representations
- 1996-02-14
- 02/14/1996
- 14-FEB-1996
- 960214
- 14485
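A minimal sketch of how such representations might be normalized during transformation. The function name `normalize_date` is hypothetical, and reading a six-digit value as YYMMDD is an assumption. The meaning of 14485 depends on a source-specific epoch that is not stated here, so the sketch rejects it instead of guessing.

```python
from datetime import date, datetime

# Formats for the textual examples above, tried in order.
_TEXT_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")


def normalize_date(raw: str) -> date:
    """Convert one source-system date string to a datetime.date."""
    value = raw.strip()
    for fmt in _TEXT_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    # Assumption: a 6-digit value is YYMMDD (for example 960214).
    if len(value) == 6 and value.isdigit():
        return datetime.strptime(value, "%y%m%d").date()
    # Anything else (such as 14485) needs source-specific knowledge.
    raise ValueError(f"unrecognized date representation: {raw!r}")


normalized = [
    normalize_date(s)
    for s in ("1996-02-14", "02/14/1996", "14-FEB-1996", "960214")
]
```

All four recognized inputs resolve to the same calendar date, which is the point of the slide: one value, several source domains.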
A Word of Warning
ETL Processing
• It is important to look at the big picture.
Data acquisition time may include:
Extracts from source systems
Data movement
Transformations
Data loading
Index maintenance
Statistics collection
Summary data maintenance
Data mart construction
Backups
Loading Strategies
We must also worry about rolling off “old” data as its economic
value drops below the cost of storing and maintaining it.
[DIAGRAM to be inserted]
Loading Strategies
Real-time Availability <-> Minimal Load Time (Delayed Availability)
Loading Strategies
• Should consider:
Data storage requirements
Impact on query workloads
Ratio of existing to new data
Insert versus update workloads
Full Refresh Strategy
• Performance hints:
Incremental Refresh Strategy
• Both incremental load strategies described preserve index
structures during the loading operation.
However, there is a cost to maintaining indexes during the load.
Rule of thumb: Each secondary index maintained during
the load costs 2-3 times the resources of the actual row
insertion of data into a table.
Rule of thumb: Consider dropping and re-building index
structures if the number of rows being incrementally
loaded is more than 10% of the size of the target table.
Note: Dropping and re-building secondary indexes may not be acceptable
due to the availability requirements of the DW.
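The two rules of thumb can be turned into a rough planning aid. This is only a sketch: the function names are hypothetical, and the cost model (plain insertion = 1.0, plus 2 to 3 units per secondary index) is one reading of the first rule, not a measured formula.

```python
def relative_load_cost(secondary_indexes: int, per_index_factor: float = 2.5) -> float:
    """Estimated load cost relative to plain row insertion (= 1.0).

    Assumption: each secondary index adds per_index_factor (2 to 3)
    times the cost of the row insertion itself.
    """
    return 1.0 + secondary_indexes * per_index_factor


def should_drop_and_rebuild(rows_to_load: int, target_rows: int,
                            threshold: float = 0.10,
                            rebuild_allowed: bool = True) -> bool:
    """Apply the 10% rule; availability requirements can veto the rebuild."""
    if not rebuild_allowed:
        return False
    return rows_to_load > threshold * target_rows


# Three secondary indexes: roughly 7x to 10x the cost of insertion alone.
low = relative_load_cost(3, per_index_factor=2.0)
high = relative_load_cost(3, per_index_factor=3.0)

# 150,000 new rows against a 1,000,000-row table exceeds the 10% threshold.
decision = should_drop_and_rebuild(150_000, 1_000_000)
```

The `rebuild_allowed` flag stands in for the availability caveat in the note: when the warehouse must stay queryable, the indexes stay in place regardless of the row ratio.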
Trickle Feed
• Acquire data on a continuous basis into the RDBMS
using row-level SQL insert and update operations
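As an illustration of row-level insert and update operations, the sketch below applies a small stream of changes one row at a time. SQLite stands in for the warehouse RDBMS, and the `orders` table, its columns, and `apply_change` are hypothetical names chosen for the example.

```python
import sqlite3


def apply_change(conn: sqlite3.Connection, order_id: int, status: str) -> None:
    """Apply one incoming change: update the row if present, else insert it."""
    cur = conn.execute(
        "UPDATE orders SET status = ? WHERE order_id = ?", (status, order_id)
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO orders (order_id, status) VALUES (?, ?)",
            (order_id, status),
        )
    conn.commit()  # each change becomes visible as soon as it arrives


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")

# A continuous feed is simulated here by a short list of changes.
for order_id, status in [(1, "NEW"), (2, "NEW"), (1, "SHIPPED")]:
    apply_change(conn, order_id, status)

rows = conn.execute(
    "SELECT order_id, status FROM orders ORDER BY order_id"
).fetchall()
```

Committing per row keeps data fresh at the price of per-row overhead, which is the trade-off a trickle feed accepts compared with batch loading.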
ELT versus ETL
ETL Processing
[DIAGRAM to be inserted]
ETL Processing
Perform the transformations on the source system platform if
resources are available there and the transformations achieve
a significant reduction in data volume.
ETL Processing
• First, load ‘raw’ data into empty tables using RDBMS block
slamming utilities.
Next, use SQL to transform the ‘raw’ data into a form
appropriate to the target table.
- Ideally, the SQL is generated using a metadata-driven tool
rather than hand coded.
Finally, use insert-select into the target table for
incremental loads, or view switching if a full refresh
strategy is used.
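The three steps above can be sketched end to end for the incremental case. SQLite stands in for the warehouse RDBMS, `executemany` stands in for a bulk load utility, and the table and column names (`raw_sales`, `sales`, `amount_cents`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An empty staging table for the raw data, plus the target table.
conn.execute("CREATE TABLE raw_sales (sale_date TEXT, amount TEXT)")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount_cents INTEGER)")

# Step 1: load the raw data unchanged into the empty staging table.
raw_rows = [("02/14/1996", "19.99"), ("02/15/1996", "5.00")]
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw_rows)

# Steps 2 and 3: transform with SQL inside the database and
# insert-select the result into the target table.
conn.execute(
    """
    INSERT INTO sales (sale_date, amount_cents)
    SELECT substr(sale_date, 7, 4) || '-' ||
           substr(sale_date, 1, 2) || '-' ||
           substr(sale_date, 4, 2),
           CAST(round(CAST(amount AS REAL) * 100) AS INTEGER)
    FROM raw_sales
    """
)
conn.commit()

loaded = conn.execute(
    "SELECT sale_date, amount_cents FROM sales ORDER BY sale_date"
).fetchall()
```

The transformation SQL is hand-written here only to keep the example short; a metadata-driven generator would emit the same kind of insert-select statement.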
ELT Processing
DW server is the transformation server for ELT processing
[DIAGRAM to be inserted]
Bottom Line