• It's tempting to think that creating a Data warehouse is simply a matter of extracting data from multiple
sources and loading it into the Data warehouse's database. This is far from the truth: it
requires a complex ETL process. The ETL process demands active input from various
stakeholders, including developers, analysts, testers, and top executives, and is technically
challenging.
• To maintain its value as a tool for decision-makers, a Data warehouse system needs
to change as the business changes. ETL is a recurring activity (daily, weekly, or monthly) of a
Data warehouse system and needs to be agile, automated, and well documented.
Why do you need ETL?
There are many reasons for adopting ETL in the organization:
• It helps companies analyze their business data to make critical business decisions.
• Transactional databases cannot answer complex business questions; an ETL-fed Data Warehouse can.
• A Data Warehouse provides a common data repository.
• ETL provides a method of moving the data from various sources into a data warehouse.
• As data sources change, the ETL process keeps the Data Warehouse up to date.
• A well-designed and documented ETL system is almost essential to the success of a Data Warehouse
project.
• It allows verification of data transformation, aggregation, and calculation rules.
• The ETL process allows sample data comparison between the source and the target system.
• The ETL process can perform complex transformations, and requires an extra area (the staging area) to store the data.
• ETL helps migrate data into a Data Warehouse, converting various formats and types to
adhere to one consistent system.
• ETL is a predefined process for accessing and manipulating source data into the target database.
• ETL offers deep historical context for the business.
• It helps improve productivity, because it codifies and reuses data-movement processes without requiring new technical work for each one.
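The idea of moving data from a transactional source into a warehouse can be sketched in a few lines. This is a minimal, illustrative pipeline only; the table names, columns, and transformation rules are assumptions, not part of the original text.

```python
# Minimal ETL sketch: extract rows from a source database,
# transform them, and load them into a warehouse table.
import sqlite3

def extract(conn):
    # Extract: pull raw rows from the transactional source table.
    return conn.execute("SELECT id, name, amount FROM sales").fetchall()

def transform(rows):
    # Transform: normalize names and convert amounts from cents to dollars.
    return [(rid, name.strip().title(), cents / 100.0) for rid, name, cents in rows]

def load(conn, rows):
    # Load: write the cleaned rows into the warehouse table.
    conn.executemany("INSERT INTO dw_sales (id, name, amount) VALUES (?, ?, ?)", rows)
    conn.commit()

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (id INTEGER, name TEXT, amount INTEGER)")
source.execute("INSERT INTO sales VALUES (1, '  alice  ', 1999)")

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE dw_sales (id INTEGER, name TEXT, amount REAL)")

load(warehouse, transform(extract(source)))
print(warehouse.execute("SELECT * FROM dw_sales").fetchall())  # [(1, 'Alice', 19.99)]
```

In a real system each of these three functions would be far more elaborate, but the shape of the process is the same.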
ETL Process in Data Warehouses
• ETL is a 3-step process: Extraction, Transformation, and Loading.
Step 1) Extraction
• In this step, data is extracted from the source system into the staging area.
Any transformations are done in the staging area so that the performance of the source system
is not degraded. Also, if corrupted data were copied directly from the source into the Data
warehouse database, rollback would be a challenge. The staging area gives an opportunity
to validate extracted data before it moves into the Data warehouse.
• Hence, one needs a logical data map before data is extracted and loaded physically.
This data map describes the relationship between source and target data.
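A logical data map can be as simple as a table of source-to-target mappings. One possible representation (the table and column names below are illustrative assumptions, not prescribed by the text):

```python
# A logical data map sketch: each target column records its source
# column and the transformation rule that connects them.
data_map = {
    "dw_sales.customer_name": {"source": "crm.customers.full_name", "rule": "trim + title-case"},
    "dw_sales.amount_usd":    {"source": "orders.amount_cents",     "rule": "divide by 100"},
    "dw_sales.order_date":    {"source": "orders.created_at",       "rule": "pass-through"},
}

for target, spec in data_map.items():
    print(f"{spec['source']} -> {target} ({spec['rule']})")
```

Whether kept in a spreadsheet, a config file, or code, the point is the same: every target column should trace back to a documented source and rule before any physical load runs.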
Three Data Extraction methods
1. Full Extraction
2. Partial Extraction - without update notification
3. Partial Extraction - with update notification
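The three methods differ mainly in which rows each run reads. A small sketch, using an in-memory table with assumed columns (`updated_at`, `changed_flag`) to stand in for a real source system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, updated_at TEXT, changed_flag INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "2024-01-01", 0), (2, "2024-02-01", 1)])

# 1. Full extraction: re-read the entire table on every run.
full = conn.execute("SELECT id FROM orders").fetchall()

# 2. Partial extraction without update notification: the ETL process itself
#    filters on a last-modified timestamp it tracks between runs.
since = conn.execute("SELECT id FROM orders WHERE updated_at > ?", ("2024-01-15",)).fetchall()

# 3. Partial extraction with update notification: the source system marks
#    changed rows (here via a flag), and only those are read.
flagged = conn.execute("SELECT id FROM orders WHERE changed_flag = 1").fetchall()

print(full, since, flagged)  # [(1,), (2,)] [(2,)] [(2,)]
```

Full extraction is the simplest but the heaviest on the source; the two partial methods exist precisely to limit that load.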
• Irrespective of the method used, extraction
should not affect the performance and response time
of the source systems. These source systems are
live production databases; any slowdown or
locking could affect the company's bottom line.
Some validations are done during Extraction
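The specific validations are not enumerated here; a minimal sketch of two common extraction-time checks (rejecting rows with missing keys and rejecting duplicates), using illustrative row data:

```python
# Illustrative extraction-time validations; the exact checks are project-specific.
rows = [
    {"id": 1,    "amount": "19.99"},
    {"id": None, "amount": "5.00"},   # missing key  -> reject
    {"id": 1,    "amount": "19.99"},  # duplicate id -> reject
]

seen, valid, rejected = set(), [], []
for row in rows:
    if row["id"] is None or row["id"] in seen:
        rejected.append(row)
        continue
    seen.add(row["id"])
    valid.append(row)

print(len(valid), len(rejected))  # 1 2
```

Catching bad records here, in the staging area, is what makes rollback cheap: nothing invalid has touched the warehouse yet.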
Step 2) Transformation
• In this step, you apply a set of functions to the extracted data. Data that
does not require any transformation is called direct move or pass-through
data.
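The distinction between transformed and pass-through data can be sketched as a per-column lookup: columns with a registered function are transformed, and any column without one is moved through unchanged. The column names and rules here are illustrative assumptions.

```python
# Transformation sketch: per-column functions; columns with no entry
# are "pass-through" (direct move) data.
transforms = {
    "name": lambda v: v.strip().title(),       # clean up free-text names
    "amount_cents": lambda v: v / 100.0,       # convert cents to dollars
    # "country" has no entry -> passed through unchanged
}

def transform_row(row):
    # Apply the registered function for each column, defaulting to identity.
    return {col: transforms.get(col, lambda v: v)(val) for col, val in row.items()}

print(transform_row({"name": " alice ", "amount_cents": 1999, "country": "US"}))
# {'name': 'Alice', 'amount_cents': 19.99, 'country': 'US'}
```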