ETL Testing
By - SrilakshmiSudhaker
Data warehousing and its Concepts:

What is Data warehouse?
A Data Warehouse is a centrally managed and integrated database containing data from the operational sources in an organization (such as SAP, CRM or ERP systems). It may also gather manual inputs from users determining criteria and parameters for grouping or classifying records. The data warehouse database contains structured data for query analysis and can be accessed by users.

The data warehouse can be created or updated at any time, with minimum disruption to operational systems. This is ensured by a strategy implemented in the ETL process. A source for the data warehouse is a data extract from operational databases. The data is validated, cleansed, transformed and finally aggregated, and it becomes ready to be loaded into the data warehouse.

A data warehouse is a dedicated database which contains detailed, stable, non-volatile and consistent data which can be analyzed in a time-variant manner. Sometimes, where only a portion of detailed data is required, it may be worth considering using a data mart. A data mart is generated from the data warehouse and contains data focused on a given subject and data that is frequently accessed or summarized.
Data warehouse Architecture:
(The architecture diagrams from the original slides are not reproduced in this text version.)
Advantages of Data warehouse:
A data warehouse provides a common data model for all data of interest regardless of the data's source. This makes it easier to report and analyze information than it would be if multiple data models were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.
Inconsistencies are identified and resolved prior to loading of data in the data warehouse. This greatly simplifies reporting and analysis.
Information in the data warehouse is under the control of data warehouse users so that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time.
Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.
Data warehouses facilitate decision support system applications such as trend reports (e.g. the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.
Data warehouses enhance the value of operational business applications, notably customer relationship management (CRM) systems.

Disadvantages of Data Warehouse:
Data warehouses are not the optimal environment for unstructured data.
Because data must be extracted, transformed and loaded into the warehouse, there is an element of latency in data warehouse data.
Data warehouses can get outdated relatively quickly, and there is a cost of delivering suboptimal information to the organization.
There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed. Or, functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems, and vice versa.
Over their life, data warehouses can have high costs, and maintenance costs are high.
ETL Concept:
ETL is the automated and auditable data acquisition process from source systems that involves one or more sub-processes of data extraction, data transportation, data transformation, data consolidation, data integration, data loading and data cleaning.
E - Extracting data from source operational or archive systems which are the primary source of data for the data warehouse.
T - Transforming the data, which may involve cleaning, filtering, validating and applying business rules.
L - Loading the data into the data warehouse or any other database or application that houses the data.
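To make the E-T-L steps concrete, here is a minimal, hedged sketch of an ETL run in Python. The file name, table name and cleaning rule are illustrative assumptions, not taken from the original text.

import csv
import sqlite3

# Hypothetical source file and target database, chosen only for illustration.
SOURCE_FILE = "orders_export.csv"
TARGET_DB = "warehouse.db"

def extract(path):
    """E - read raw records from an operational extract (a flat file here)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """T - clean, filter and apply a simple business rule."""
    cleaned = []
    for row in rows:
        customer = row["customer"].strip().title()   # cleaning: trim and normalize case
        amount = float(row["amount"])
        if amount <= 0:                               # filtering: drop non-positive amounts
            continue
        cleaned.append((row["order_id"], customer, amount))
    return cleaned

def load(rows, db_path):
    """L - write the transformed records into the warehouse table."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS fact_orders "
                "(order_id TEXT PRIMARY KEY, customer TEXT, amount REAL)")
    con.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract(SOURCE_FILE)), TARGET_DB)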
ETL Process:
The ETL process involves the Extraction, Transformation and Loading of data.

Extraction: The first part of an ETL process involves extracting the data from the source systems. Most data warehousing projects consolidate data from different source systems, and each separate system may also use a different data format. Common data source formats are relational databases and flat files, but they may include non-relational database structures such as Information Management System (IMS), other data structures such as Virtual Storage Access Method (VSAM) or Indexed Sequential Access Method (ISAM), or even data fetched from outside sources through web spidering or screen-scraping. Extraction converts the data into a format suitable for transformation processing. An intrinsic part of the extraction is the parsing of the extracted data, resulting in a check whether the data meets an expected pattern or structure. If not, the data may be rejected entirely or in part.
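As a small illustration of the parsing check described above, the sketch below validates that each extracted record matches an expected pattern and rejects those that do not. The field names and pattern are assumptions made for illustration only.

import re

# Hypothetical expected structure: order_id like "ORD-12345" and a numeric amount.
ORDER_ID_PATTERN = re.compile(r"^ORD-\d{5}$")

def parse_extracted(rows):
    """Split raw extracted rows into accepted and rejected, based on pattern checks."""
    accepted, rejected = [], []
    for row in rows:
        valid_id = bool(ORDER_ID_PATTERN.match(row.get("order_id", "")))
        valid_amount = row.get("amount", "").replace(".", "", 1).isdigit()
        if valid_id and valid_amount:
            accepted.append(row)
        else:
            rejected.append(row)  # rejected entirely or in part, as noted above
    return accepted, rejected

# Example usage with illustrative data:
rows = [{"order_id": "ORD-00001", "amount": "19.99"},
        {"order_id": "BAD-1", "amount": "x"}]
ok, bad = parse_extracted(rows)
print(len(ok), "accepted,", len(bad), "rejected")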
Transformation: Transformation is the series of tasks that prepares the data for loading into the warehouse. Once the data is secured, you have to worry about its format or structure, because it will not be in the format needed for the target and cannot be used as it is. Some rules and functions need to be applied to transform the data. One of the purposes of ETL is to consolidate the data in a central repository or to bring it to one logical or physical place. Data can be consolidated from similar systems, different subject areas, etc. For example, the grain level, data type, etc. might be different. ETL must support data integration for data coming from multiple sources and data arriving at different times. This has to be a seamless operation, which avoids overwriting existing data, creating duplicate data or, even worse, simply being unable to load the data into the target.

Loading: The loading process is critical to integration and consolidation. It decides the modality of how the data is added to the warehouse or simply rejected. Methods like addition, updating or deleting are executed at this step. What happens to the existing data? Should the old data be deleted because of new information? Or should the data be archived? Should the data be treated as additional data to the existing one? Data therefore has to be loaded into the data warehouse with utmost care, and only a data auditing process can establish the confidence level. This auditing process normally happens after the loading of the data. (A small illustrative sketch of these loading choices follows the tool list below.)

List of ETL tools: Below is the list of ETL tools available in the market, along with their vendors:

List of ETL Tools                    ETL Vendors
Oracle Warehouse Builder (OWB)       Oracle
Data Integrator & Data Services      SAP Business Objects
IBM Information Server (Datastage)   IBM
SAS Data Integration Studio          SAS Institute
PowerCenter                          Informatica
Elixir Repertoire                    Elixir
Data Migrator                        Information Builders
SQL Server Integration Services      Microsoft
Talend Open Studio                   Talend
DataFlow Manager                     Pitney Bowes Business Insight
Data Integrator                      Pervasive
Open Text Integration Center         Open Text
Transformation Manager               ETL Solutions Ltd.
Data Manager/Decision Stream         IBM (Cognos)
Clover ETL                           Javlin
ETL4ALL                              IKAN
DB2 Warehouse Edition                IBM
Pentaho Data Integration             Pentaho
Adeptia Integration Server           Adeptia
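Returning to the loading step discussed before the tool list, here is a hedged sketch of one loading modality: update the record when the key already exists, otherwise add it. The table and column names are assumptions for illustration.

import sqlite3

def load_with_merge(rows, db_path="warehouse.db"):
    """Load (order_id, customer, amount) tuples, updating existing rows instead of duplicating them."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS fact_orders "
                "(order_id TEXT PRIMARY KEY, customer TEXT, amount REAL)")
    for order_id, customer, amount in rows:
        # Modality: update when the key already exists, otherwise insert (addition).
        updated = con.execute(
            "UPDATE fact_orders SET customer = ?, amount = ? WHERE order_id = ?",
            (customer, amount, order_id)).rowcount
        if updated == 0:
            con.execute("INSERT INTO fact_orders VALUES (?, ?, ?)",
                        (order_id, customer, amount))
    con.commit()
    con.close()

Archiving old data or treating incoming records purely as additions are equally valid modalities; which one applies is a design decision for the particular warehouse.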
ETL Testing:
Following are some common goals for testing an ETL application:
Data completeness - To ensure that all expected data is loaded.
Data transformation - To ensure that all data is correctly transformed according to business rules and design specifications.
Data quality - To ensure that the ETL application correctly rejects, substitutes default values, corrects and reports invalid data.
Performance and scalability - To ensure that data loads and queries perform within expected time frames and that the technical architecture is scalable.
Integration testing - To ensure that the ETL process functions well with other upstream and downstream applications.
User-acceptance testing - To ensure the solution fulfills the users' current expectations and also anticipates their future expectations.
Regression testing - To keep the existing functionality intact each time a new release of code is completed.

Basically, data warehouse testing is divided into two categories: 'Back-end testing' and 'Front-end testing'. The former applies where the source system data is compared to the end-result data in the loaded area, which is the ETL testing. The latter refers to where the user checks the data by comparing their MIS with the data that is displayed by the end-user tools.

Data Validation: Data completeness is one of the basic ways of data validation. This is needed to verify that all expected data loads into the data warehouse. It includes validating all the records and fields and ensuring that the full contents of each field are loaded. (A small sketch of an automated completeness check follows this section.)

Data Transformation: Validating that the data is transformed correctly based on business rules can be one of the most complex parts of testing an ETL application with significant transformation logic. Another way of testing is to pick up some sample records and compare them for validating data transformation manually, but this method requires manual testing steps and testers who have a good amount of experience and understanding of the ETL logic.
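A basic data completeness check of the kind described above can be automated by comparing source and target record counts. The database paths and table names below are illustrative assumptions, not part of the original text.

import sqlite3

def check_completeness(source_db, target_db,
                       source_table="orders", target_table="fact_orders"):
    """Compare record counts between the source extract and the loaded target table."""
    src = sqlite3.connect(source_db).execute(
        f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt = sqlite3.connect(target_db).execute(
        f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    if src != tgt:
        print(f"Completeness check FAILED: {src} source rows vs {tgt} target rows")
    else:
        print(f"Completeness check passed: {src} rows in both")
    return src == tgt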
Data Warehouse Testing Life Cycle: Like any other piece of software, a DW implementation undergoes the natural cycle of Unit testing, System testing, Regression testing, Integration testing and Acceptance testing.

Unit testing: Traditionally this has been the task of the developer. This is white-box testing to ensure the module or component is coded as per the agreed-upon design specifications. The developer should focus on the following (a minimal automated sketch of some of these checks follows this list):
a) All inbound and outbound directory structures are created properly with appropriate permissions and sufficient disk space, and all tables used during the ETL are present with the necessary privileges.
b) The ETL routines give expected results:
i. All transformation logics work as designed from source till target
ii. Boundary conditions are satisfied - e.g. check for date fields with leap year dates
iii. Surrogate keys have been generated properly
iv. NULL values have been populated where expected
v. Rejects have occurred where expected and a log for rejects is created with sufficient details
vi. Error recovery methods work
vii. Auditing is done properly
c) The data loaded into the target is complete:
i. All source data that is expected to get loaded into the target actually gets loaded - compare counts between source and target and use data profiling tools
ii. All fields are loaded with full contents - i.e. no data field is truncated while transforming
iii. No duplicates are loaded
iv. Aggregations take place in the target properly
v. Data integrity constraints are properly taken care of
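A minimal sketch of a few of the checks above, expressed as assertions against an assumed SQLite target named warehouse.db with a fact_orders table (both names are illustrative):

import sqlite3

def run_unit_checks(db_path="warehouse.db"):
    """A few of the unit-level checks listed above, expressed as assertions."""
    con = sqlite3.connect(db_path)

    # c) iii. No duplicates are loaded.
    dupes = con.execute("SELECT order_id, COUNT(*) FROM fact_orders "
                        "GROUP BY order_id HAVING COUNT(*) > 1").fetchall()
    assert not dupes, f"duplicate keys found: {dupes}"

    # b) iv. NULL values appear only where expected (assumed here: nowhere in customer).
    nulls = con.execute("SELECT COUNT(*) FROM fact_orders "
                        "WHERE customer IS NULL").fetchone()[0]
    assert nulls == 0, f"{nulls} unexpected NULL customers"

    # c) ii. Full contents are loaded - a crude truncation heuristic at 255 characters.
    truncated = con.execute("SELECT COUNT(*) FROM fact_orders "
                            "WHERE LENGTH(customer) >= 255").fetchone()[0]
    assert truncated == 0, "possible truncation at 255 characters"

    con.close()
    print("unit checks passed")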
However. Also the load windows. They are the best judges to ensure that the application works as expected by them. Now the new results could be compared against the older ones to ensure proper functionality. Knowledge of the business process is an added advantage since we must be able to interpret the results functionally and not just code-wise. business users may not have proper ETL knowledge. Possibly it is the best example of an incremental design where requirements are enhanced and refined quite often based on business needs and feedbacks. Generation of error logs iv. Here we must consider the compatibility of the DW application with upstream and downstream flows. Notifications to IT and/or business are generated in proper format ii. Our test strategy should include testing for: i. a better strategy could be to preserve earlier test input data and result sets and running the same again. Data aggregations− match aggregated data against staging tables. The test team must have sufficient business knowledge to translate the results in terms of business. iii. An unbiased approach is required to ensure maximum efficiency. Data completeness− match source to target counts terms of business. v. However. refresh period for the DW and the views created should be signed off from users. Regression testing: A DW application is not a one-time solution. Sequence of jobs to be executed with job dependencies and scheduling ii. In such a situation it is very critical to test that the existing functionalities of a DW application are not messed up whenever an enhancement is made to it. Granularity of data is as per specifications. Hence. Also the load windows refresh period for the DW and the views created should be signed off from users. Generally this is done by running all functional tests for existing code whenever a new piece of code is introduced. Cleanup scripts for the environment including database This activity is a combined responsibility and participation of experts from all related applications is a must in order to avoid misinterpretation of results. 11 . Re-startability of jobs in case of failures iii. Integration testing: This is done to ensure that the application developed works from an end-to-end perspective. the development and test team should be ready to provide answers regarding ETL process that relate to data population. Acceptance testing: This is the most critical part because here the actual users validate your output datasets.ETL Testing exceptions. Error logs and audit tables are generated and populated properly. iv. The QA team must test for: i. We need to ensure for data integrity across the flow.
Performance testing: In addition to the above tests, a DW must necessarily go through another phase called performance testing. Any DW application is designed to be scalable and robust; therefore, when it goes into the production environment, it should not cause performance problems. Here, we must test the system with huge volumes of data. We must ensure that the load window is met even under such volumes. This phase should involve the DBA team, ETL experts and others who can review and validate your code for optimization.

Summary: Testing a DW application should be done with a sense of utmost responsibility. A bug in a DW traced at a later stage results in unpredictable losses. The task is even more difficult in the absence of any single end-to-end testing tool. So the strategies for testing should be methodically developed, refined and streamlined. This is also true since the requirements of a DW are often dynamically changing. Under such circumstances, repeated discussions with the development team and the users are of utmost importance to the test team. Another area of concern is test coverage; this has to be reviewed multiple times to ensure completeness of testing. Always remember, a DW tester must go the extra mile to ensure near defect-free solutions.