ETL Process

Steps in Extraction
 Identify sources
 List all facts
 List all dimensions
 List all attributes for each dimension
 For any attribute that needs to be split, define the rules for splitting it
 For any attribute that needs to be combined, define the consolidation rule
 Determine default values
 If there are multiple sources for a data item, decide the preferred source
Extracting data
 Immediate data extraction techniques
 Deferred data extraction techniques
Immediate data extraction techniques
 Capture through transaction logs
 Capture through database triggers
 Capture through source application
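As an illustration of capture through database triggers, here is a minimal sketch using Python's built-in sqlite3 module. The table names (customer, customer_changes) and the one-letter operation codes are assumptions made for this example; a production source system would use its own schema and its own database's trigger syntax.

```python
import sqlite3

# In-memory database standing in for the source system (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);

    -- Hypothetical change table that the ETL process reads from.
    CREATE TABLE customer_changes (
        id INTEGER,
        name TEXT,
        op TEXT,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- Each trigger fires as the change happens, so capture is immediate.
    CREATE TRIGGER customer_ins AFTER INSERT ON customer
    BEGIN
        INSERT INTO customer_changes (id, name, op)
        VALUES (NEW.id, NEW.name, 'I');
    END;

    CREATE TRIGGER customer_upd AFTER UPDATE ON customer
    BEGIN
        INSERT INTO customer_changes (id, name, op)
        VALUES (NEW.id, NEW.name, 'U');
    END;
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ann')")
conn.execute("UPDATE customer SET name = 'Anne' WHERE id = 1")

# The change table now holds one row per captured operation.
changes = conn.execute(
    "SELECT id, name, op FROM customer_changes ORDER BY rowid"
).fetchall()
```

The extraction job then reads only customer_changes instead of scanning the whole source table.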
Deferred data extraction techniques
 Capture based on date and time
 Capture by comparing files
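The two deferred techniques can be sketched in Python as follows. This is a minimal illustration, assuming each source row is a dictionary with a unique id and a hypothetical updated_at timestamp; the function names are invented for this example.

```python
from datetime import datetime


def capture_by_timestamp(rows, last_extract_time: datetime):
    """Capture based on date and time: keep rows changed since the last run."""
    return [r for r in rows if r["updated_at"] > last_extract_time]


def capture_by_comparison(previous, current, key="id"):
    """Capture by comparing files: diff the previous snapshot against the current one."""
    old = {r[key]: r for r in previous}
    new = {r[key]: r for r in current}
    inserted = [new[k] for k in new if k not in old]
    deleted = [old[k] for k in old if k not in new]
    updated = [new[k] for k in new if k in old and new[k] != old[k]]
    return inserted, updated, deleted
```

Timestamp-based capture misses deleted rows, because a deleted row has no timestamp left to read. Snapshot comparison detects deletions but needs a full copy of the previous extract.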
Data Transformation
Important step: Data Quality
 Tasks performed are:
 Format revision
 Decoding of fields
 Splitting of fields
 Merging of fields
 Character set conversion
 Conversion of units
 Summarization
 De-Duplication
 Key restructuring
 Address missing values
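A few of the tasks above can be sketched on a single record. This is an illustrative example only: the field names, the gender code table, and the source date format (MM/DD/YYYY) are assumptions, not part of any particular source system.

```python
from datetime import datetime

GENDER_CODES = {"M": "Male", "F": "Female"}  # hypothetical code table
LB_TO_KG = 0.45359237


def transform(record):
    """Apply several transformation tasks to one source record."""
    # Splitting of fields: one name field becomes two.
    first, _, last = record["full_name"].partition(" ")
    # Format revision: MM/DD/YYYY text becomes an ISO 8601 date string.
    order_date = datetime.strptime(record["order_date"], "%m/%d/%Y").date()
    return {
        "first_name": first,
        "last_name": last,
        # Decoding of fields, with a default that addresses missing values.
        "gender": GENDER_CODES.get(record.get("gender"), "Unknown"),
        # Conversion of units: pounds to kilograms.
        "weight_kg": round(record["weight_lb"] * LB_TO_KG, 2),
        "order_date": order_date.isoformat(),
    }


def deduplicate(records, key):
    """De-duplication: keep the first record seen for each key value."""
    seen = set()
    unique = []
    for r in records:
        if r[key] not in seen:
            seen.add(r[key])
            unique.append(r)
    return unique
```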
Data Loading
 Three phases:
 Initial Load
 Incremental Load
 Full Refresh
Steps in data loading
 Drop any indexes formed earlier
 Load dimension tables
 Load fact tables
 Create indexes
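The four steps above can be sketched with Python's built-in sqlite3 module. The schema (dim_product, fact_sales) and the index name are hypothetical; the point is the ordering: indexes are dropped before the bulk insert and rebuilt afterwards, and dimension rows are loaded before the fact rows that reference them.

```python
import sqlite3


def load_warehouse(conn, dim_rows, fact_rows):
    """Run the four loading steps in order against an existing schema."""
    cur = conn.cursor()
    # 1. Drop indexes so the bulk insert does not maintain them row by row.
    cur.execute("DROP INDEX IF EXISTS idx_sales_product")
    # 2. Load dimension tables first; fact rows refer to their keys.
    cur.executemany(
        "INSERT INTO dim_product (product_id, name) VALUES (?, ?)", dim_rows
    )
    # 3. Load fact tables.
    cur.executemany(
        "INSERT INTO fact_sales (product_id, amount) VALUES (?, ?)", fact_rows
    )
    # 4. Recreate the indexes once the data is in place.
    cur.execute("CREATE INDEX idx_sales_product ON fact_sales (product_id)")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product (product_id),
        amount REAL
    );
    CREATE INDEX idx_sales_product ON fact_sales (product_id);
""")
load_warehouse(conn, [(1, "Widget")], [(1, 9.5), (1, 3.0)])
```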
Techniques / Mode of Data Loading
 Load
 Append
 Destructive Merge
 Constructive Merge
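The slide names the four modes without defining them. The sketch below assumes their commonly used definitions and models the target table as a list of dictionaries with a key field; the current flag and the function names are illustrative choices made for this example.

```python
def load(target, incoming):
    """Load: discard whatever is in the target, then apply the incoming rows."""
    return [dict(r, current=True) for r in incoming]


def append(target, incoming):
    """Append: keep existing rows and add incoming rows with new keys.

    Duplicates are rejected here; allowing them is the other common choice.
    """
    existing = {r["key"] for r in target}
    added = [dict(r, current=True) for r in incoming if r["key"] not in existing]
    return target + added


def destructive_merge(target, incoming):
    """Destructive merge: overwrite the existing row on a key match, else add."""
    merged = {r["key"]: r for r in target}
    for r in incoming:
        merged[r["key"]] = dict(r, current=True)
    return list(merged.values())


def constructive_merge(target, incoming):
    """Constructive merge: keep the old row, add the new one as superseding it."""
    incoming_keys = {r["key"] for r in incoming}
    kept = [
        dict(r, current=False) if r["key"] in incoming_keys else r
        for r in target
    ]
    return kept + [dict(r, current=True) for r in incoming]
```

Destructive merge loses history, while constructive merge preserves it at the cost of more rows.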
Data Quality
 To ensure data and system quality:
 Testing
 Monitoring
Types of Testing
 Unit Testing
 Integration Testing
 System Testing
 Acceptance Testing
 Performance Testing
 Regression Testing
Modes of Testing
 Black box testing
 White box testing
Data Warehouse Monitoring
 The data warehouse needs to be monitored while in use to check its health and growth.
 The following statistics are checked:
 Tables accessed
 How many times accessed
 Number of users
 Time taken
Methods of Data Monitoring
 Sampling method:
 Data warehouse is checked periodically, at regular intervals
 Event-driven method:
 Data warehouse is checked when an event occurs
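The difference between the two methods can be sketched with a small in-memory monitor. The class and method names are hypothetical: on_query stands for a hook the warehouse would call when a query finishes (event-driven), and sample stands for a snapshot taken by a scheduler at regular intervals (sampling). The snapshot reports the statistics listed under Data Warehouse Monitoring.

```python
from collections import Counter


class WarehouseMonitor:
    """Collects usage statistics for a data warehouse (illustrative only)."""

    def __init__(self):
        self.access_counts = Counter()
        self.users = set()
        self.total_seconds = 0.0

    def on_query(self, table, user, seconds):
        """Event-driven method: called each time a query completes."""
        self.access_counts[table] += 1
        self.users.add(user)
        self.total_seconds += seconds

    def sample(self):
        """Sampling method: called on a schedule to snapshot the statistics."""
        return {
            "tables_accessed": sorted(self.access_counts),
            "access_counts": dict(self.access_counts),
            "number_of_users": len(self.users),
            "time_taken": self.total_seconds,
        }
```

Sampling has a fixed, predictable overhead but can miss short-lived spikes between snapshots. Event-driven monitoring sees every event, and its cost grows with query volume.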
