
2011 Ninth International Conference on ICT and Knowledge Engineering

Towards a Data Warehouse Testing Framework
Neveen ElGamal
Information Systems Department Faculty of Computers and Information, Cairo University Cairo, Egypt n.elgamal@fci-cu.edu.eg

Ali El Bastawissy
Information Systems Department Faculty of Computers and Information, Cairo University Cairo, Egypt alibasta@fci-cu.edu.eg

Galal Galal-Edeen
Information Systems Department Faculty of Computers and Information, Cairo University Cairo, Egypt galal@acm.org

Abstract --- Data warehouse (DW) testing is a critical stage of DW development because decisions are made based on the information the DW produces. Testing the quality of that information therefore supports the trustworthiness of the DW system. A number of approaches have been proposed to describe how the testing process should take place in the DW environment. In this paper we briefly present these testing approaches, and then use a proposed matrix that structures the DW testing routines to evaluate and compare them. An analysis of the comparison matrix then highlights the weak points of the available DW testing approaches. Finally, we point out the requirements for achieving a homogeneous DW testing framework and conclude our work.

Keywords: Data Warehouse Testing, Data Warehouse Quality

I. INTRODUCTION

During the development of DWs, a considerable amount of data is integrated, structured, cleansed, and grouped in a single framework, the DW. A number of changes take place on the data, which could lead to data manipulation and corruption. “Data warehouse projects fail for many reasons, all of which can be traced to a single cause: nonquality.” [1]. There should be a way of guaranteeing that the data in the sources is the same data that reached the DW, and that data quality is improved, not lost.

In the data warehousing process, data passes through several stages, each causing a different kind of change to the data, before finally reaching the user in the form of a chart or a report. Comparing the DW system outputs with the data in the data sources is not the best way to test whether the DW system is working properly. This type of test is an informative test that will take place at a certain point in the testing process, but the most important part of the testing process should take place during DW development. Every stage and every component the data passes through should be tested to guarantee its accuracy and the preservation, or even improvement, of data quality.

DW Assessment, Evaluation, Testing and Quality are most of the time used as synonyms that refer to how good the DW is. Linguistically, Assessment and Evaluation are synonyms; in the DW field, however, Assessment is the process of measuring the worthiness of an entity through a group of tests, while Evaluation is the process of analyzing, reflecting upon, and summarizing assessment information and making judgments or decisions based upon the information gathered [2]. DW quality is different from the other terms as it refers to the combined outcome of the three processes.

It is widely agreed that the DW is totally different from other systems such as software or transactional systems. Consequently, the testing techniques used for these other systems are inadequate for DW testing. Here are some of the differences:

• DW always answers ad-hoc queries, which makes it impossible to test prior to system delivery. On the other hand, all functions in the software engineering realm are predefined.
• DW testing is data centric, while software testing is code centric.
• DW always deals with huge data volumes.
• The testing process in other systems ends with the development life-cycle, while in DWs it continues after the system delivery. “Software projects are self contained but a data warehouse project continues due to decision-making process requirement for ongoing changes” [3].
• Most of the available testing scenarios are driven by some user inputs, while in DW most of the tests are system-triggered scenarios.
• The volume of test data in DW is considerably large compared to any other testing process.
• In other systems test cases can reach hundreds, but the valid combinations of these test cases will never be unlimited. In the DW, by contrast, the test cases are unlimited due to the core objective of the DW, which allows all possible views of data [4].
• DW testing consists of different types of tests depending on the time the test takes place; for example, the initial data load test is different from the incremental data load test.
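The introduction asks for a way of guaranteeing that the data in the sources is the same data that reached the DW. As an illustration only, and not a routine taken from any of the surveyed approaches, the following Python sketch compares a source table with its loaded copy by row count and by a content digest. The table names (src_sales, dw_sales), the key column and the helper names are hypothetical, and an in-memory SQLite database stands in for both the data source and the DW. The content check is only meaningful for a stage that is expected to copy rows unchanged.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row_count, sha256 hex digest) for a table, rows ordered by key_column."""
    # Table and column names are interpolated into the SQL text, so they must
    # come from trusted test configuration, never from user input.
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY {key_column}").fetchall()
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return len(rows), digest.hexdigest()


def reconcile(conn, source_table, target_table, key_column):
    """Compare a source table with its loaded copy by count and by content."""
    src_count, src_hash = table_fingerprint(conn, source_table, key_column)
    tgt_count, tgt_hash = table_fingerprint(conn, target_table, key_column)
    return {
        "source_rows": src_count,
        "target_rows": tgt_count,
        "counts_match": src_count == tgt_count,
        "content_match": src_hash == tgt_hash,
    }


def build_demo_db():
    """Create a hypothetical source table and an identical 'warehouse' copy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_sales (id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute("CREATE TABLE dw_sales (id INTEGER PRIMARY KEY, amount REAL)")
    rows = [(1, 10.0), (2, 25.5), (3, 7.25)]
    conn.executemany("INSERT INTO src_sales VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO dw_sales VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(reconcile(demo, "src_sales", "dw_sales", "id"))
    # Simulate corruption introduced during loading.
    demo.execute("UPDATE dw_sales SET amount = 99.0 WHERE id = 2")
    print(reconcile(demo, "src_sales", "dw_sales", "id"))
```

A count check alone would miss the simulated corruption in the demo, which is why the sketch reports the two results separately.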

As shown in figure 1, the DW system consists of a number of inter-related components:

978-1-4577-2162-5/11/$26.00 ©2011 IEEE

• Data Sources (DS)
• Operational Data Store (ODS) / Data Staging Area (DSA)
• Data Warehouse (DW)
• Data Marts (DM)
• And, User Interface (UI) Applications, e.g. OLAP reports, Decision Support tools, and Analysis Tools

Figure 1. DW System Architecture

Each component needs to be tested to verify its efficiency independently. The connections between the DW components are groups of transformations that take place on data. These transformation processes should be tested as well to ensure data quality preservation. The results of the DW system (e.g. charts, reports, outputs of decision support and analysis tools) should be compared with the original data existing in the DSs. Finally, from the operational point of view the DW system should be tested for performance, reliability, robustness, recovery, etc.

The remainder of this paper is organized as follows. Section II briefly surveys the existing DW testing approaches. Section III introduces the matrices that are used later in section IV to compare and evaluate the existing DW testing approaches. Section V analyzes the comparison matrix to highlight the drawbacks and weaknesses that exist in the area of DW testing. Section VI uses the analysis in section V to state the needs for a DW testing framework. Finally, we conclude our work in section VII.

II. EXISTING DW TESTING APPROACHES

A number of attempts have been made to address the DW testing process. Some were made by companies offering consultancy services for DW testing, like [5-8]. Others, to fill the gap of not finding a generic DW testing technique, proposed one as a research attempt, like [3, 9-15]. A different trend was taken by authors who presented automated tools for the DW testing process, like [16, 17], and, from a different perspective, some authors presented a DW testing methodology, like [7, 10, 18]. In this paper we are only concerned with the attempts considering how to test the DW. In other words, we are concerned with the types of tests that must be considered while testing the DW. The rest of this section introduces these attempts in chronological order; a comparison between them is presented in the following sections.

1. In [9], the author introduced a DW testing and validation technique. He broke the testing and validation process into four well defined, high-level processes, namely:
   1. Integration testing
   2. System testing
   3. Data validation
   4. Acceptance testing
2. In [11], the authors presented a DW testing approach that they named a DW validation strategy. This attempt was proposed during a DW testing project. They used the standard software testing process that includes unit testing, integration testing, system and acceptance testing, and performance testing. They chose to customize the contents of these tests to make them adequate for DW testing. In addition, they presented an abstract life cycle for testing the DW application.
3. In [7], Wipro Technologies presents in a white paper its data warehouse testing strategy. They concentrated on validating the data that is loaded in the DW and checking its credibility. This took place via 2 main approaches:
   • Approach I: tends to follow the data from the source to the target warehouse.
   • Approach II: tends to follow the source through the Extraction Transformation Loading (ETL) process then into the target warehouse.
   Using these 2 approaches they divided the process of testing into consecutive levels:
   • Constraint testing
   • Source to target counts
   • Source to target data validation
   • Error processing
   • Defect tracking
4. In [12], the author divided the data warehouse testing into:
   • Requirements testing
   • Unit testing
   • Integration testing
   • Acceptance testing

   What was unique for this approach is that it tested the data granularity on its lowest level and it also verified the user requirements with the resulting data.
5. In [10], the author introduced an abstract DW testing methodology as follows:
   • Use of traceability to enable full test coverage of business requirements
   • In-depth review of test cases
   • Manipulation of test data to ensure full test coverage
   • Provision of appropriate tools to speed the process of test execution and evaluation
   • Regression testing
   He also stated that the DW testing types (routines) are:
   1. Unit Testing
   2. Integration Testing
   3. Technical Shakedown Testing
   4. System Testing
   5. Operation Readiness Testing
   6. User Acceptance Testing
6. In [15], the author concentrated on testing the ETL applications since most of the work is done through them. He stated the testing goals that are required to be met after building the DW. The DW testing goals are:
   • Data completeness
   • Data transformation
   • Data quality
   • Performance and scalability
   • Integration testing
   • User-acceptance testing
   • Regression testing
7. In [13], the author stated that during the process of building the DW with its ETL tools and applications, six types of testing need to be conducted. These tests are:
   • ETL Testing
   • Functional Testing
   • Performance Testing
   • Security Testing
   • User Acceptance Testing
   • End-to-end Testing
   Moreover, he stated how these tests can be conducted using the Microsoft SQL Server tools. The author did not give considerable attention to the DW testing methodology but concentrated on how these tests are conducted in the DW environment.
8. In [14], the authors suggested a proposal for basic DW testing activities (routines) as a final part of the DW testing methodology. Other parts of the methodology were published in [18-20]. The testing activities can be split into four logical units regarding:
   • Multidimensional database testing,
   • Data pump (ETL) testing,
   • Metadata and,
   • OLAP testing.
   The authors then highlighted how these activities split into smaller, more distinctive activities to be performed during the DW testing process.
9. In [3], the authors introduced data warehouse testing activities (routines) framed within a DW development methodology introduced in [21]. They stated that the components that need to be tested are: Conceptual Schema, Logical Schema, ETL Procedures, Database, and Front-end. To be able to test these components, they listed eight test types that best fit the characteristics of DW systems. These test types are:
   • Functional test
   • Usability test
   • Performance test
   • Stress test
   • Recovery test
   • Security test
   • Regression test
   A comprehensive explanation of how the DW components are tested by the above testing routines is then explored, showing what type of test(s) is suitable for which component, as shown in table I.

   TABLE I: DW COMPONENTS VS TESTING TYPES [3]

10. In [6], the author presented DW testing types with respect to DW development stages and illustrated the DW testing focus points categorized into 2 main high-level aspects:
   • Underlying Data:

      1. Data coverage
      2. Data complying with the transformation logic in accordance with the business rules
   • DW Components:
      1. Performance and scalability
      2. Component orchestration testing (Integration Test)
      3. Regression testing

III. PROPOSED DW TESTING MATRICES

The DW testing process consists of a number of testing routines. These routines can be categorized by what, where, and when these tests will take place.

• WHERE: presents the component of the DW that this test targets. This divides the DW architecture shown in figure 1 into the following layers:
   o Data Sources to Operational Data Store: Presents the testing routines targeting data sources, wrappers, extractors, transformations and the data staging area itself.
   o Data Staging Area to DW: Presents the testing routines targeting the loading process, and the DW itself.
   o DW to Data Marts: Presents the testing routines targeting the transformations that take place on the data used by the data marts and the data marts themselves.
   o Data Marts to User Interface: Presents the testing techniques targeting the transformation of data to the interface applications and the interface applications themselves.
• WHAT: represents what these routines will test in the targeted DW component.
   o Schema: focuses on testing DW design issues.
   o Data: concerned with all data related tests like data quality, data transformation, data selection, data presentation, etc.
   o Operational: tests the data warehousing as an integrated product to confirm its reliability, robustness, regression, etc., and covers the tests that are concerned with the process of putting the DW into operation.
• WHEN: will this test take place?
   o Before System Delivery: A one-time test that takes place before the system is delivered to the user or when any change takes place on the design of the system.
   o After System Delivery: Redundant test that takes place several times during system operation.

The ‘what’, ‘where’ and ‘when’ testing categories result in a 3 dimensional matrix. As shown in table II, the rows represent the ‘where’ dimension and the columns represent the ‘what’ dimension. The ‘when’ dimension is represented in color in the following section, when this matrix is used to compare the existing DW testing approaches and to show to what extent they cover the aspects of the DW testing process.

TABLE II: DW TESTING MATRICES (rows: DS→ODS, ODS→DW, DW→DM, DM→UI, grouped into Backend and Frontend; columns: Schema, Data, Operation)

IV. APPROACHES COMPARISON AND EVALUATION

After studying how each proposed DW testing approach addressed the DW testing, and according to the DW testing matrices defined in the previous section, a comparison matrix is presented in table III showing the test routines that each approach covered. In our study we focused on the approaches showing what to test and how to test it, and not the attempts presenting how to automate the testing process. We were able to compare only 10 approaches, as not enough data was available for the rest of the approaches.

As shown in table III, the ‘what’ and ‘where’ dimensions classify the test routines on the rows. The DW testing approaches are represented on the columns. The intersection of rows and columns indicates the coverage of the test routine in this approach, marked as either full coverage or partial coverage. Finally, the ‘when’ dimension, which indicates whether a test takes place before or after system delivery, is represented by color: the tests that take place after the system delivery are highlighted, while the tests that take place during the system development or when the system is subject to change are left without color highlighting.
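The three classification dimensions described in section III can be sketched as a small data structure. The following Python sketch is only one possible encoding, under our own assumptions: the enum members mirror the layers and categories named in the text, while the class name Routine, the helper functions and the three example routines (including how they are classified) are hypothetical and are not entries of table II or table III.

```python
from dataclasses import dataclass
from enum import Enum


class Where(Enum):
    DS_TO_ODS = "Data Sources to Operational Data Store"
    ODS_TO_DW = "Data Staging Area to DW"
    DW_TO_DM = "DW to Data Marts"
    DM_TO_UI = "Data Marts to User Interface"


class What(Enum):
    SCHEMA = "Schema"
    DATA = "Data"
    OPERATIONAL = "Operational"


class When(Enum):
    BEFORE_DELIVERY = "Before System Delivery"
    AFTER_DELIVERY = "After System Delivery"


@dataclass(frozen=True)
class Routine:
    """A test routine positioned along the three dimensions."""
    name: str
    where: Where
    what: What
    when: When


def build_matrix(routines):
    """Map every (where, what, when) cell to the names of the routines in it."""
    matrix = {(w, x, t): [] for w in Where for x in What for t in When}
    for routine in routines:
        matrix[(routine.where, routine.what, routine.when)].append(routine.name)
    return matrix


def empty_cells(matrix):
    """Return the cells that no routine has been assigned to."""
    return [cell for cell, names in matrix.items() if not names]


# Hypothetical routines; their placement is an assumption made for this sketch.
EXAMPLE_ROUTINES = [
    Routine("ODS conceptual schema review", Where.DS_TO_ODS, What.SCHEMA, When.BEFORE_DELIVERY),
    Routine("Source to target counts", Where.ODS_TO_DW, What.DATA, When.AFTER_DELIVERY),
    Routine("Report performance check", Where.DM_TO_UI, What.OPERATIONAL, When.AFTER_DELIVERY),
]

if __name__ == "__main__":
    matrix = build_matrix(EXAMPLE_ROUTINES)
    print(len(matrix), "cells,", len(empty_cells(matrix)), "still empty")
```

Keeping every cell in the dictionary, even when it is empty, makes the unaddressed parts of the matrix directly visible.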

TABLE III: DW APPROACHES COMPARISON
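A comparison matrix of this kind lends itself to simple mechanical checks, such as listing the routines that no approach addresses. The Python sketch below illustrates the idea under stated assumptions: the Coverage levels follow the full/partial distinction described above, but the routine names, the approach labels "A" and "B", and every coverage value are invented for illustration and do not reproduce the contents of table III.

```python
from enum import IntEnum


class Coverage(IntEnum):
    NONE = 0
    PARTIAL = 1
    FULL = 2


def uncovered_routines(matrix):
    """Return the routines that no approach covers, even partially."""
    return sorted(
        routine
        for routine, row in matrix.items()
        if all(level == Coverage.NONE for level in row.values())
    )


def coverage_ratio(matrix, approach):
    """Score one approach between 0.0 and 1.0 (full coverage counts twice partial)."""
    if not matrix:
        return 0.0
    achieved = sum(int(row.get(approach, Coverage.NONE)) for row in matrix.values())
    return achieved / (int(Coverage.FULL) * len(matrix))


# Hypothetical comparison data: routine -> {approach label -> coverage level}.
EXAMPLE_MATRIX = {
    "Source to target counts": {"A": Coverage.FULL, "B": Coverage.PARTIAL},
    "ODS conceptual schema": {"A": Coverage.NONE, "B": Coverage.NONE},
    "Regression testing": {"A": Coverage.NONE, "B": Coverage.FULL},
}

if __name__ == "__main__":
    print("Not addressed by any approach:", uncovered_routines(EXAMPLE_MATRIX))
    for label in ("A", "B"):
        print(label, round(coverage_ratio(EXAMPLE_MATRIX, label), 2))
```

Weighting full coverage as twice partial coverage is an arbitrary choice made for this sketch; any monotone weighting would serve the same purpose.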

V. COMPARISON MATRIX ANALYSIS

By studying carefully the existing DW testing approaches, it is evident that the DW environment lacks the following:

1. The existence of a generic, well defined DW testing approach that could be used in any project. As is obvious in table III, none of the proposed approaches addressed the entire DW testing matrices. This is simply because each approach addressed the DW testing process from its own point of view without leaning on any standard or general framework.
2. The existing DW testing approaches missed testing some of the DW components. Some of the attempts considered only parts of the DW framework shown in figure 1. Other attempts used their own framework for the DW environment according to the case they were addressing; for example, [3] used a DW architecture that does not include either an ODS or a DW layer. The data is loaded from the Data Sources to the Data Marts directly. This architecture makes the Data Marts layer act as both the DW and the Data Mart interchangeably. Other approaches like [7, 9, 13] did not include the ODS layer. Some major components of the DW were not tested by any of the proposed approaches, namely the DM schema and the additivity of measures in the DMs.
3. None of the existing approaches covered all the tests needed to guarantee the efficiency of the DW after delivery. From another perspective, there are some test routines that are not addressed by any approach, like the ODS conceptual model tests and data quality factors like accuracy, precision, continuity, completeness, consistency, granularity, etc. These tests affect the quality, efficiency and effectiveness of the DW severely. To fully cover the testing process of the DW, a framework needs to fill this gap. These tests are:
   a. Data Quality factors: as presented in [22], these are Completeness, Accuracy, Precision, Currency, Continuity, Duration, Retention, Precedence and Balancing. Each of these quality factors has a great influence on the overall quality of the DW. Defects of data quality will eventually lead to failure in providing accurate business information.
   b. ODS Conceptual Schema: The ODS has a conceptual model that carries data from a number of heterogeneous data sources. Lots of integration takes place in the ODS, which may lead to severe data loss or data corruption if the conceptual model into which the data is loaded happens to be incorrect.
   c. DM Schema Design: Data marts are miniature DWs that need to be designed and validated to ensure data quality preservation. Lack of proper DM design could lead to data loss, inadequate dimension hierarchy, incorrect data aggregation, and violating the additivity of facts with respect to dimensions. An improper DM schema could mislead the decision makers with incorrect data display.
   d. Additivity Guards: Facts are always preferred to be fully additive, and it is almost prohibited for them to be non-additive, but real life is not always perfect. Facts are sometimes semi-additive, which means that the fact defined in the DM is additive on some but not all the dimensions. For example, inventory is non-additive on the time dimension but it is additive on the location and supplier dimensions. Ignoring the additivity of measures along dimensions may cause the generation of misleading data, so it is mandatory to guard the additivity of measures in the data marts.
4. The approaches proposed in [14, 18, 19] were the only ones focusing on both the DW testing routines and the life cycle of the testing process. The life cycle was presented as follows:
   o Test Plan
   o Test Cases
   o Test Data
   o Termination Criteria
   o Test Result
   Nevertheless, they presented the two approaches independently, not showing how the testing routines can fit in a complete DW testing life cycle.
5. None of the existing approaches was targeted to the unconventional DW types like Spatial DW, Active DW, Temporal DW, DW2.0, etc. Some of the contents of the testing routines will differ from one DW to another according to its DW type, as these types differ in structure and implementation. A specialization of these test routines needs to be defined in order to make the DW testing approach applicable for all DW types.
6. Some of the above test routines could be automated, but none of the proposed approaches showed how these routines could be automated or have automated assistance. Due to the huge amount of data in the DW, and the considerable number of tests that the DW passes through during development and after delivery, having automated support for some of the test routines is a must to accelerate the testing process.

VI. REQUIREMENTS FOR A DW TESTING FRAMEWORK

After pointing out the factors of weakness that exist in the DW testing environment, we now state the requirements

which the DW testing environment needs. The DW testing environment requires a DW Testing Framework that:

1. Is generic enough to be used in several DW testing projects.
2. Comprehensively defines all test routines to minimize ambiguity.
3. Provides tests for all the DW components and transformations.
4. Presents the testing routines within a DW life cycle that includes the following:
   a. Test Plan
   b. Test Cases
   c. Test Data
   d. Termination Criteria
   e. Test Results
5. Supports testing unconventional DW types by providing a specialization of testing routines that is adequate for each type of DW.
6. Presents how the test routines can be automated or get automatic support if full automation is not applicable for this specific routine. It should also include suggestions for using existing automated test tools to minimize the amount of work done to get automated support in the DW testing process.

VII. CONCLUSION

“It appears that all the experts want to tell us how to build these things (DWs) without ever addressing the issue of validating its accuracy once it is loaded” [9]. Some trials have been carried out to address DW testing; most of them were oriented to a specific problem and none of them was generic enough to be used in other data warehousing projects. Having a generic DW Testing Framework that addresses all the aspects of the DW testing process will ensure the quality of the DW or even improve it, in addition to gaining the end user’s trust in the results obtained from the tested DW.

VIII. REFERENCES

[1] L. P. English, Improving Data Warehouse and Business Information Quality (Methods for Reducing Costs and Increasing Profits). New York: John Wiley and Sons, 1999.
[2] C. L. Scanlan, "Assessment, Evaluation, Testing and Grading," in www.umdnj.edu.
[3] M. Golfarelli and S. Rizzi, "A Comprehensive Approach to Data Warehouse Testing," in ACM 12th international workshop on Data warehousing and OLAP (DOLAP '09), Hong Kong, China, 2009.
[4] K. Brahmkshatriya, "Data Warehouse Testing," in www.Stickminds.com, 2007.
[5] CTG, Inc., "CTG Data Warehouse Testing," in www.ctg.com, 2002.
[6] Executive-MiH, "Data Warehouse Testing is Different," 2010.
[7] A. Munshi, "Testing a Data Warehouse Application," in www.wipro.com, 2003.
[8] SSNSolutions, "SSN Solutions," in www.ssnsol.com, 2006.
[9] C. Bateman, "Where are the Articles on Data Warehouse Testing and Validation Strategy?," in www.Information-Management.com, 2002.
[10] S. Bhat, "Data Warehouse Testing - Practical," in www.Stickminds.com, 2007.
[11] R. Cooper and S. Arbuckle, "How to Thoroughly Test a Data Warehouse," in Software Testing Analysis and Review (STAREAST), Orlando, Florida, 2002.
[12] M. P. Mathen, "Data Warehouse Testing," in www.InfoSys.com, 2010.
[13] V. Rainardi, "Testing your Data Warehouse," in Building a data warehouse with examples in SQL server: Apress, 2008.
[14] P. Tanuška, O. Moravčík, P. Važan, and F. Miksa, "The Proposal of Data Warehouse Testing Activities," in 20th Central European conference on Information and Intelligent Systems, Varaždin, Croatia, 2009.
[15] J. Theobald, "Strategies for Testing Data Warehouse Applications," in www.Information-Management.com, 2007.
[16] Inergy, "Automated ETL Testing in Data Warehouse Environment," in www.inergy.nl, 2007.
[17] Sharma, "Test Automation: In Data Warehouse Projects," in www.automatedtestinginstitute.com.
[18] P. Tanuška, O. Moravčík, P. Važan, and F. Miksa, "The Proposal of the Essential Strategies of Data Warehouse Testing," in 19th Central European Conference on Information and Intelligent Systems (CECIIS), 2008.
[19] P. Tanuška, P. Schreiber, and J. Zeman, "The Realization of Data Warehouse Testing Scenario," in proizvodstvo obrazovanii (Infokit-3) Part II: 3 meždunarodnaja nature-techničeskaja konferencija, Stavropol, Russia, 2008.
[20] P. Tanuška, W. Verschelde, and M. Kopček, "The proposal of Data Warehouse Testing Scenario," in European conference on the use of Modern Information and Communication Technologies (ECUMICT), Gent, Belgium, 2008.
[21] M. Golfarelli and S. Rizzi, Data Warehouse Design: Modern Principles and Methodologies: McGraw Hill, 2009.
[22] D. Larson, "TDWI Data Cleansing: Delivering High-Quality Warehouse Data," The Data Warehouse Institute, 2008.