Data Warehouse Testing
Warehouse Staging Area, from the DW Staging Area into the Data Warehouse, and finally from the Data Warehouse into a set of conformed Data Marts that are accessible to decision makers or downstream applications. The scheduling of ETL jobs is critical. The ETL process could potentially run several times a day, or on weekly, monthly, quarterly, and annual production schedules.
DW Staging Area
The Data Warehouse Staging Area holds temporary data that is copied from source systems. Due to varying business cycles, data processing cycles, hardware and network resource limitations, and geographical factors, it is not feasible to extract all the data from all operational databases at exactly the same time.
In short, all required data must be available before data can be integrated into the Data Warehouse.
Fact tables
  Central table
  Stores the measures of the business
  Mostly raw numeric items
  Large number of rows (millions to a billion)
  Accessed via dimensions [All the data in the fact is related to the data in the dimension]
  Points to the key value at the lowest level of each dimension table
  Usually designed to contain low-level or atomic (indivisible) data
  Contains limited history that is captured in "real time" or "near real time"
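As an illustration of a fact table that is accessed via a dimension, the sketch below builds a tiny star schema in an in-memory SQLite database. The table names, column names, and rows (`dim_product`, `fact_sales`, and so on) are hypothetical and are not taken from any real warehouse.

```python
import sqlite3

# Hypothetical star schema: one dimension table and one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_sk INTEGER PRIMARY KEY,   -- key at the lowest level of the dimension
    name       TEXT NOT NULL,
    category   TEXT NOT NULL
);
CREATE TABLE fact_sales (
    product_sk INTEGER NOT NULL REFERENCES dim_product(product_sk),
    quantity   INTEGER NOT NULL,      -- measure
    revenue    INTEGER NOT NULL       -- measure
);
INSERT INTO dim_product VALUES
    (1, 'Widget', 'Hardware'),
    (2, 'Gadget', 'Hardware'),
    (3, 'Manual', 'Books');
INSERT INTO fact_sales VALUES
    (1, 2, 200), (2, 1, 150), (3, 4, 40), (1, 1, 100);
""")


def revenue_by_category(conn):
    """Aggregate the fact measures through the product dimension."""
    return conn.execute("""
        SELECT d.category, SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_product d ON d.product_sk = f.product_sk
        GROUP BY d.category
        ORDER BY d.category
    """).fetchall()
```

The fact rows carry only keys and numeric measures; descriptive attributes such as `category` are reached by joining to the dimension.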
Why?
There is an exponentially increasing cost associated with finding software defects later in the development lifecycle. In data warehousing, this is compounded because of the additional business costs of using incorrect data to make critical business decisions.
Data Completeness
Verify that all expected data is present in the data warehouse
All Records
  Record count
  Sum of numeric fields between source and target
  Minus queries
  PK values between source and target
  Right records from right source [Filter condition]
  Record count in target tables, pre and post ETL job execution
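The record-level checks above can be sketched as follows. This is a minimal example against an in-memory SQLite database, not a definitive test harness; the tables `src_orders` and `tgt_orders` and their columns are hypothetical. SQLite spells the set-difference operator `EXCEPT`, where Oracle uses `MINUS`.

```python
import sqlite3


def completeness_report(conn, source, target, key, numeric_col):
    """Compare record counts, a numeric total, and key sets of two tables.

    Table and column names are interpolated into the SQL, so pass only
    trusted identifiers (this is a test sketch, not production code).
    """
    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    def keys(sql):
        return sorted(row[0] for row in conn.execute(sql))

    return {
        "source_count": scalar(f"SELECT COUNT(*) FROM {source}"),
        "target_count": scalar(f"SELECT COUNT(*) FROM {target}"),
        "source_sum": scalar(f"SELECT COALESCE(SUM({numeric_col}), 0) FROM {source}"),
        "target_sum": scalar(f"SELECT COALESCE(SUM({numeric_col}), 0) FROM {target}"),
        # "Minus query": keys present on one side but not the other.
        "missing_in_target": keys(
            f"SELECT {key} FROM {source} EXCEPT SELECT {key} FROM {target}"),
        "unexpected_in_target": keys(
            f"SELECT {key} FROM {target} EXCEPT SELECT {key} FROM {source}"),
    }


# Hypothetical data: order 3 was not loaded into the target.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE tgt_orders (order_id INTEGER PRIMARY KEY, amount INTEGER);
INSERT INTO src_orders VALUES (1, 100), (2, 205), (3, 300);
INSERT INTO tgt_orders VALUES (1, 100), (2, 205);
""")
report = completeness_report(conn, "src_orders", "tgt_orders", "order_id", "amount")
```

Running the same report before and after an ETL job gives the pre and post record counts for the target table.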
All Fields
  Table structure between source and target objects
  Table structure with requirements and design matrix
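A structure comparison between a source and a target object could look like the sketch below. It reads column metadata with SQLite's `PRAGMA table_info`; other databases expose the same information through their own catalog views. The tables `src_customer` and `tgt_customer` are hypothetical.

```python
import sqlite3


def table_structure(conn, table):
    """Map column name -> declared type (upper-cased) for one table."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2].upper()
            for row in conn.execute(f"PRAGMA table_info({table})")}


def compare_structure(conn, source, target):
    """Report columns missing, extra, or declared with a different type."""
    src = table_structure(conn, source)
    tgt = table_structure(conn, target)
    common = sorted(set(src) & set(tgt))
    return {
        "missing_in_target": sorted(set(src) - set(tgt)),
        "extra_in_target": sorted(set(tgt) - set(src)),
        "type_mismatches": {c: (src[c], tgt[c]) for c in common if src[c] != tgt[c]},
    }


# Hypothetical tables with deliberate differences.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_customer (id INTEGER, name TEXT, created TEXT);
CREATE TABLE tgt_customer (id INTEGER, name VARCHAR(50), load_ts TEXT);
""")
diff = compare_structure(conn, "src_customer", "tgt_customer")
```

Some differences are expected by design (for example an ETL-generated load timestamp), so the result is best reviewed against the design matrix rather than treated as a list of defects.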
Data Transformation
Verify that data is transformed correctly
Transformations based on business rules
  if X -> Y
  Transformations at ETL job level and DB level
  Primary key and constraints between source and target
  DB defaults and job defaults [for NULL and NOT NULL values]
  Transformations w.r.t. requirements and design matrix
  ETL-generated fields such as surrogate keys
  Referential integrity with associated target tables
  One-time historical transforms, e.g. inserting records in master tables
  Re-run ETL jobs [Complete and Abort]
  Transformations on cloud
  Different data load formats like append, delete and insert, SCD types
Stare and Compare
  Manual check for sanity test
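Three of the transformation checks above (a business rule of the form "if X -> Y", surrogate key uniqueness, and referential integrity) can be sketched as below. The rule `STATUS_RULE`, the tables, and the data are all hypothetical; a real rule set would come from the requirements and design matrix.

```python
import sqlite3

# Hypothetical business rule: source status code X maps to target status Y.
STATUS_RULE = {"A": "Active", "I": "Inactive"}


def rule_violations(conn):
    """Return (cust_id, source_code, target_status) rows that break the rule."""
    rows = conn.execute("""
        SELECT s.cust_id, s.status_code, d.status
        FROM src_customer s
        JOIN dim_customer d ON d.cust_id = s.cust_id
        ORDER BY s.cust_id
    """)
    return [(cid, code, status) for cid, code, status in rows
            if STATUS_RULE.get(code) != status]


def duplicate_surrogate_keys(conn):
    """Return surrogate keys that occur more than once in the dimension."""
    return [row[0] for row in conn.execute("""
        SELECT cust_sk FROM dim_customer
        GROUP BY cust_sk HAVING COUNT(*) > 1
        ORDER BY cust_sk
    """)]


def orphan_facts(conn):
    """Return fact rows whose surrogate key has no matching dimension row."""
    return conn.execute("""
        SELECT f.sale_id, f.cust_sk
        FROM fact_sales f
        LEFT JOIN dim_customer d ON d.cust_sk = f.cust_sk
        WHERE d.cust_sk IS NULL
        ORDER BY f.sale_id
    """).fetchall()


# Hypothetical data: customer 3 is mis-transformed, sale 3 is an orphan.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_customer (cust_id INTEGER, status_code TEXT);
CREATE TABLE dim_customer (cust_sk INTEGER, cust_id INTEGER, status TEXT);
CREATE TABLE fact_sales (sale_id INTEGER, cust_sk INTEGER, amount INTEGER);
INSERT INTO src_customer VALUES (1, 'A'), (2, 'I'), (3, 'A');
INSERT INTO dim_customer VALUES
    (101, 1, 'Active'), (102, 2, 'Inactive'), (103, 3, 'Inactive');
INSERT INTO fact_sales VALUES (1, 101, 50), (2, 102, 75), (3, 999, 20);
""")
```

The rule is applied in Python rather than in SQL so that an unmapped source code also shows up as a violation instead of being silently dropped by a NULL comparison.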
Data Quality
Handling the incorrect data in the source.
Reject the record completely
  Based on business rules
  Based on DB or job validations
  Based on filter conditions
Substitute default values for partially rejected records
  Default values at job level and at DB level
  Based on validations with master table
  Based on data mapping
Exception reports
  Exception report for all the stages of ETL job [Extract, Transform and Load]
  Records in report with the count mentioned in ETL job's log file
  Records in report with the count mentioned in control total tables
  Email notifications
Corrected data
  Rejected data being corrected in source for subsequent ETLs
Negative testing
  Manipulate data in source for all possible combinations of reject handling
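The reject-handling paths above can be exercised with deliberately bad source rows. The sketch below is a toy load step written only for illustration: the validation rules, the `DEFAULT_COUNTRY` value, and the field names are assumptions, not part of any real ETL tool. It rejects some rows completely, substitutes a default for others, and produces control totals that can be reconciled against the exception report.

```python
# Hypothetical job-level default for a missing optional field.
DEFAULT_COUNTRY = "UNKNOWN"


def load_with_reject_handling(rows):
    """Split rows into loaded and rejected, and return control totals."""
    loaded, rejected = [], []
    for row in rows:
        amount = row.get("amount")
        if row.get("customer_id") is None:
            # Reject completely: mandatory key is missing.
            rejected.append({"row": row, "reason": "missing customer_id"})
        elif not isinstance(amount, (int, float)) or amount < 0:
            # Reject completely: business rule on the measure is violated.
            rejected.append({"row": row, "reason": "invalid amount"})
        else:
            clean = dict(row)
            if not clean.get("country"):
                # Partial reject: keep the row, substitute the default.
                clean["country"] = DEFAULT_COUNTRY
            loaded.append(clean)
    control_totals = {
        "extracted": len(rows),
        "loaded": len(loaded),
        "rejected": len(rejected),
    }
    return loaded, rejected, control_totals


# Negative test data covering each reject-handling combination.
source_rows = [
    {"customer_id": 1, "amount": 120, "country": "IN"},     # clean
    {"customer_id": 2, "amount": 80, "country": None},      # default applied
    {"customer_id": None, "amount": 50, "country": "US"},   # rejected
    {"customer_id": 4, "amount": -10, "country": "UK"},     # rejected
]
loaded, exception_report, totals = load_with_reject_handling(source_rows)
```

A reconciliation check then confirms that the exception report length equals the rejected count in the control totals, and that loaded plus rejected equals extracted.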
Regression testing
Ensure the existing functionality is intact
  Existing data flows are modified for new functionality
  Enhancements and defect fixes
  Modifications of any upstream system
Performance testing
ETL jobs execute within the expected time frames for any volume of data
  Larger volume of data in source
  Scheduler's response and flag updates
  Performance of reject handling
  Performance of extract, transform and load jobs separately
  Historic updates of production data
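Timing the extract, transform, and load jobs separately can be sketched as below. The stage functions are trivial stand-ins and the 5-second budgets are arbitrary placeholders; real budgets would come from the agreed batch window.

```python
import time


def time_stage(name, job, budget_seconds):
    """Run one stage, measure wall-clock time, and compare it to a budget."""
    start = time.perf_counter()
    result = job()
    elapsed = time.perf_counter() - start
    return {
        "stage": name,
        "elapsed": elapsed,
        "within_budget": elapsed <= budget_seconds,
        "result": result,
    }


# Stand-in stages: each consumes the previous stage's output.
extract = time_stage("extract", lambda: list(range(10_000)), 5.0)
transform = time_stage("transform", lambda: [x * 2 for x in extract["result"]], 5.0)
load = time_stage("load", lambda: len(transform["result"]), 5.0)
timings = [extract, transform, load]

# Stages that exceeded their budget (empty when all stages are on time).
slow_stages = [t["stage"] for t in timings if not t["within_budget"]]
```

Repeating the run with a larger source volume shows which stage degrades first.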
Security testing
Ensure that access to all objects and applications is as defined
  Access to execute ETL jobs
  Access to databases, DB objects and files
  Access to exception reports
  Notification or information emails from the jobs
  Modifying privileges to the data and other configurations
  Access to all integrated applications
CONFIDENTIAL: For limited circulation only 2010 MindTree Limited
Thank you