
White Paper on ETL Testing

ETL Testing
WHITE PAPER

Author(s): Davinder Singh (singh.davinder@wipro.com)
Account / Business Group: Data Centric Testing - TeS, Hyderabad

Reviewed by: Sunder Ranganathan Nochilur (sunder.nochilur@wipro.com)
Account / Business Group: Data Centric Testing - TeS, Bangalore

Reviewed by: Nithyananda Nayak (nithyananda.nayak@wipro.com)
Account / Business Group: Data Centric Testing - TeS, Bangalore

Wipro Technologies Page 1 of 8



Abstract

ETL testing validates ETL (extract, transform, load) processes against their requirements
and design. It tests the loading of data from one or more sources, of the same or different
types, into the target of a typical Data Warehousing model.

Intended Audience

This paper is intended for readers who want to understand the general goals of testing
an ETL application, and for readers who want to build a career in ETL testing.


1 INTRODUCTION
2 GOALS OF TESTING AN ETL APPLICATION
2.1 Data Completeness
2.2 Data Transformation
2.3 Data Quality
2.4 Performance & Scalability
2.5 Integration Testing
2.6 User-Acceptance Testing
2.7 Regression Testing
3 OBJECTIVES OF AN ETL TESTER
4 RESPONSIBILITIES OF AN ETL TESTER


1 Introduction
ETL testing is a stage-by-stage process of validating the ETL processes developed in
a typical Data Warehousing model. It tests the incoming data from different sources
as it is transformed and loaded into the target.

ETL testing validates both the data and the functionality of the application against
the defined requirements. It is not only about validating the data but also about
checking the integrity of the data.

A typical ETL testing work model involves testing the data against the requirements
and testing the completeness, correctness and integrity of the transformed data once
it is loaded into the target.


2 Goals of testing an ETL Application


The general goals of testing a typical ETL application can be broken down as follows:

• Data Completeness
• Data Transformation
• Data Quality
• Performance & Scalability
• Integration Testing
• User-acceptance Testing
• Regression Testing

2.1 Data Completeness

Data completeness testing ensures that all the expected data is loaded into the
target. It is one of the most basic and important steps when testing an ETL
application. It covers the number of rows loaded, the number of columns loaded
for each record and the contents loaded into each field.

Some typical data completeness checks are as follows:

• Comparing the record count between the source and the target.
• Checking that all rows are loaded, subject to any constraints applied.
• Checking that the complete contents of each column are loaded and that no
truncation has occurred.
• Validating that date columns are loaded in the format specified in the
requirements.
• Validating that history data is maintained wherever the requirements
demand it.
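
The record count and truncation checks above can be sketched in code. The following is a minimal illustration, not a prescribed method: it assumes the source and target are both reachable through one SQLite connection, and the table names (src_customer, tgt_customer), the column name and the helper completeness_report are hypothetical. Comparing maximum lengths only hints at truncation; a full check would compare values row by row.

```python
import sqlite3


def completeness_report(conn, source, target, text_column):
    """Compare row counts and the longest value of one text column."""
    cur = conn.cursor()
    # Table and column names are interpolated directly: pass trusted names only.
    src_rows = cur.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    tgt_rows = cur.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    src_len = cur.execute(
        f"SELECT MAX(LENGTH({text_column})) FROM {source}").fetchone()[0]
    tgt_len = cur.execute(
        f"SELECT MAX(LENGTH({text_column})) FROM {target}").fetchone()[0]
    return {
        "row_count_match": src_rows == tgt_rows,
        # A shorter maximum length in the target suggests truncated values.
        "possible_truncation": (tgt_len or 0) < (src_len or 0),
    }


# Hypothetical demo data: the load silently cuts names to 5 characters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_customer (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE tgt_customer (id INTEGER, name TEXT)")
rows = [(1, "Alexander"), (2, "Bea")]
conn.executemany("INSERT INTO src_customer VALUES (?, ?)", rows)
conn.executemany("INSERT INTO tgt_customer VALUES (?, ?)",
                 [(i, n[:5]) for i, n in rows])

report = completeness_report(conn, "src_customer", "tgt_customer", "name")
print(report)
```

In this demo the row counts match but the truncation flag is raised, which shows why a count comparison alone is not sufficient.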

2.2 Data Transformation

Data transformation testing validates that the business rules have been applied
correctly. The typical process is to select sample records from the source and
verify them against the target data, keeping the business logic in mind. This is
typically a fully manual exercise.
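
Although this validation is described as manual, the comparison of sampled records can be sketched in code. The example below is an illustration only: the business rule (upper-cased full name, amount converted from cents), the field names and the helpers expected_target and find_mismatches are all assumptions made for the example.

```python
def expected_target(source_row):
    """Apply the hypothetical business rule to one source record."""
    return {
        "customer_id": source_row["customer_id"],
        "full_name": f'{source_row["first_name"]} {source_row["last_name"]}'.upper(),
        "amount": round(source_row["amount_cents"] / 100, 2),
    }


def find_mismatches(source_rows, target_rows, key="customer_id"):
    """Return (key, expected, actual) for every sampled record that differs."""
    target_by_key = {row[key]: row for row in target_rows}
    mismatches = []
    for src in source_rows:
        expected = expected_target(src)
        actual = target_by_key.get(src[key])  # None if the row was not loaded
        if actual != expected:
            mismatches.append((src[key], expected, actual))
    return mismatches


# Hypothetical sample: record 2 was loaded without the upper-case rule applied.
source_sample = [
    {"customer_id": 1, "first_name": "Asha", "last_name": "Rao", "amount_cents": 1999},
    {"customer_id": 2, "first_name": "Ravi", "last_name": "Iyer", "amount_cents": 500},
]
target_sample = [
    {"customer_id": 1, "full_name": "ASHA RAO", "amount": 19.99},
    {"customer_id": 2, "full_name": "Ravi Iyer", "amount": 5.0},
]

mismatches = find_mismatches(source_sample, target_sample)
for key, expected, actual in mismatches:
    print(f"customer_id={key}: expected {expected}, found {actual}")
```

Keeping the rule in one function makes it easy to review against the mapping document before trusting the comparison.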

Some typical data transformation checks are as follows:

• Validating that the data types in the warehouse are as defined in the Design or
the Data Model.
• Validating the relationships (parent-child) in the data.
• Validating the integrity between the tables.
• Validating whether columns are derived as specified.


2.3 Data Quality

Data quality testing mainly checks how data rejection, correction, modification
and substitution are handled. Data quality can be checked and maintained in
various ways.

Some typical examples are as follows:

• Checking for duplicate records based on a key.
• Maintaining the history of records as per Type 2 logic in Informatica.
• Handling records with NULL values in a uniform manner across the Facts and
Dimensions, as defined in the architecture.
• Maintaining a timestamp of the last update date and time for each update to
a particular record.
• Maintaining the details of rejected rows in a table.
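
Two of the checks above, duplicate detection on a key and Type 2 history, can be expressed as queries. The sketch below runs them on an in-memory SQLite table; the table dim_customer, its columns and the is_current flag are hypothetical, and real Type 2 implementations (in Informatica or elsewhere) may track the current row differently.

```python
import sqlite3

# Rows that share the same key and effective date are treated as duplicates.
DUPLICATE_KEYS_SQL = """
    SELECT customer_id, effective_date, COUNT(*) AS copies
    FROM dim_customer
    GROUP BY customer_id, effective_date
    HAVING COUNT(*) > 1
    ORDER BY customer_id
"""

# Under the assumed Type 2 design, each key has exactly one current row.
NOT_ONE_CURRENT_SQL = """
    SELECT customer_id
    FROM dim_customer
    GROUP BY customer_id
    HAVING SUM(is_current) <> 1
    ORDER BY customer_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    "customer_id INTEGER, city TEXT, effective_date TEXT, is_current INTEGER)"
)
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?, ?)",
    [
        (1, "Pune", "2020-01-01", 0),       # closed history row
        (1, "Hyderabad", "2021-06-01", 1),  # current row: customer 1 is fine
        (2, "Bangalore", "2020-01-01", 1),
        (2, "Bangalore", "2020-01-01", 1),  # exact duplicate of the row above
        (3, "Chennai", "2019-03-01", 0),    # no current row at all
    ],
)

duplicates = conn.execute(DUPLICATE_KEYS_SQL).fetchall()
bad_current = [row[0] for row in conn.execute(NOT_ONE_CURRENT_SQL)]
print("duplicates:", duplicates)
print("keys without exactly one current row:", bad_current)
```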

2.4 Performance & Scalability

Performance needs to be considered for a typical Data Warehousing application:
as the volume of data in a Data Warehouse grows, ETL load times keep increasing
and performance decreases. A good ETL architecture and design reduces the
effect on performance. The main objective of performance testing is to
identify the weaknesses in the ETL design. The following strategies help identify
performance issues:

• Loading the database with high volumes of data to ensure that the ETL process
loads this amount of data within the defined load window.
• Comparing the ETL loading time with loads performed on smaller amounts of
data to anticipate scalability issues.
• Monitoring the timing of the reject process and checking how large volumes of
rejected data are handled.
• Running join queries to validate query performance on large database
volumes.
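
The comparison of load times across volumes can be illustrated with a small calculation. The sketch below assumes load timings have already been measured in test runs; the sample figures, the 1.5x tolerance, the 15-minute window and the function names are hypothetical.

```python
def scaling_report(timings, tolerance=1.5):
    """Flag volumes whose per-row load cost grows faster than linearly.

    timings maps row count -> measured load time in seconds.
    Returns (rows, slowdown, flagged) for every volume above the smallest.
    """
    volumes = sorted(timings)
    base = volumes[0]
    base_rate = timings[base] / base  # seconds per row at the smallest volume
    report = []
    for rows in volumes[1:]:
        rate = timings[rows] / rows
        slowdown = rate / base_rate  # 1.0 means perfectly linear scaling
        report.append((rows, round(slowdown, 2), slowdown > tolerance))
    return report


def fits_window(load_seconds, window_seconds):
    """True if the load completes within the agreed load window."""
    return load_seconds <= window_seconds


# Hypothetical measurements from three test loads.
measured = {100_000: 60.0, 1_000_000: 660.0, 10_000_000: 12_000.0}

report = scaling_report(measured)
for rows, slowdown, flagged in report:
    status = "investigate" if flagged else "ok"
    print(f"{rows:>10} rows: {slowdown}x per-row cost ({status})")
print("1M-row load fits 15-minute window:", fits_window(measured[1_000_000], 900))
```

A per-row cost that doubles between volumes, as in the largest sample here, is the kind of signal that points to a design weakness rather than simply more data.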

2.5 Integration Testing

Standard system testing involves testing within the ETL application only. The
boundary points for system testing are the input and output of the ETL code
being tested. Integration testing shows how the application fits into the entire
system flow, along with all the other upstream and downstream applications.
While building integration test scenarios we need to consider how the overall
process can break, and focus on the touch points and gaps between the different
applications rather than on one application. We also need to check how process
failures at each step would be handled and how data would be deleted or
recovered if required.


Most defects found during integration testing are data related. It is important
to perform integration testing with production data or production-like data;
depending on security and privacy concerns, certain data may need to be masked
before it can be used in the test environment. It is a best practice to hold
discussions with the design teams of all the applications/modules while
designing the test scenarios, both to ensure nothing goes wrong in production
and to discuss what may go wrong in production. This helps in identifying the
gaps. Integration testing should be a collective effort and not just the testing
team’s responsibility.

2.6 User-Acceptance Testing

Since the main aim of building a Data Warehouse application is to make the
data available to the business users, it is a best practice to involve the
business users in testing for a successful implementation of the application.
User-acceptance testing typically tests the data in the Data Warehouse
application and not how the ETL application functions.

Some typical scenarios that may arise are as follows:

• Users usually find issues when they see the real data, which sometimes leads
to design changes.
• It is important that the users understand how the data is made available and
sign off on it.
• Users are likely to ask how the data is populated and want to understand the
details of how the ETL works.
• Users may also require the data loaded during UAT and negotiate how often
the data will be refreshed.

2.7 Regression Testing

Regression testing rechecks the existing functionality of the application. It
checks whether the introduction of new code has disturbed the existing
functionality. While designing regression test scenarios, keep in mind that
these cases will be executed multiple times as new releases are created due to
defect fixes, enhancements or upstream system changes. An automated approach
to regression testing is considered much faster and smoother.

In regression testing the test cases must be designed and prioritized by risk
in order to determine which need to be re-run for every new release. A
simple approach to retest the basic functionality is to store data sets and
results from successful runs of the code and compare the new test results
with those from previous runs. In regression testing it is much faster to
compare results than to run the complete data validation again.
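
The compare-with-previous-run approach can be sketched as follows. This is a minimal illustration that assumes result rows are held in memory as dictionaries and share a unique key; the field names and the helper compare_to_baseline are hypothetical.

```python
def compare_to_baseline(baseline, current, key):
    """Compare a stored result set with the results of a new run.

    Returns the keys that disappeared, the keys that are new, and the keys
    whose row contents differ between the two runs.
    """
    base_by_key = {row[key]: row for row in baseline}
    curr_by_key = {row[key]: row for row in current}
    shared = base_by_key.keys() & curr_by_key.keys()
    return {
        "missing": sorted(base_by_key.keys() - curr_by_key.keys()),
        "unexpected": sorted(curr_by_key.keys() - base_by_key.keys()),
        "changed": sorted(k for k in shared if base_by_key[k] != curr_by_key[k]),
    }


# Hypothetical results: stored from a successful run vs. produced by new code.
baseline_rows = [
    {"order_id": 1, "total": 10.0},
    {"order_id": 2, "total": 25.5},
    {"order_id": 3, "total": 7.0},
]
current_rows = [
    {"order_id": 1, "total": 10.0},
    {"order_id": 2, "total": 26.0},  # value changed by the new release
    {"order_id": 4, "total": 3.0},   # row that was not in the baseline
]

diff = compare_to_baseline(baseline_rows, current_rows, key="order_id")
print(diff)
```

An empty result in all three lists means the new release reproduced the stored results; any entry is a candidate regression to investigate.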


3 Objectives of an ETL Tester


The general objectives of an ETL Tester can be classified as follows:

• Does the mapping adhere to the development standards and naming
conventions?
• Does the mapping perform as per the technical design defined?
• Does the mapping work correctly in relation to other processes?

4 Responsibilities of an ETL Tester


The general responsibilities of an ETL Tester can be classified as follows:

• Test Planning as per the Requirements and Technical Design.
• Preparation of Test Strategy.
• Test Case/Scenario preparation.
• Test Data Management.
• Test Execution.
• Recording the Test Results.
• Defect Maintenance.
• Performing Test Audit and preparing Test Audit Report.
