An Analysis of Data Warehouse Research
Hua-Yang Lin National Central University Ping-Yu Hsu National Central University Yu-Min Su National Central University

The study of data warehouses is a relatively new research area; an analysis of the literature can therefore show what academics are studying and what remains to be done in further research. In this paper, a classification framework for data warehouse research based on the business dimensional lifecycle is presented. We identified 129 articles published in 31 journals covered in SCI or SCI Expanded between 1995 and 2003. The articles are classified using a framework that consists of six major categories: project management, data design, architecture, realization, deployment and maintenance, and others. A comprehensive list of the reviewed articles is also presented.
Key Words: Data Warehouse, Business Dimensional Lifecycle, Data Warehouse Articles, Literature Review

1. Introduction
With the implementation of business process automation software in enterprises worldwide, data are piling up quickly. To organize the data and help companies make sensible business decisions, data warehousing implementation is expected to grow quickly (Shin, 2002; Watson et al., 2002; Mukherjee & D’Souza, 2003). A data warehouse transforms data into information in a subject-oriented, integrated, nonvolatile and time-variant manner across the organization (Inmon, 2002). Given the huge potential of data warehouse applications, a large number of publications on data warehousing research have appeared in the past ten years. However, to the best of our knowledge, no systematic review and classification of this literature has been done. In this paper, one hundred and twenty-nine articles from thirty-one journals covered in SCI and SCIE are reviewed and classified based on the business dimensional lifecycle (Kimball et al., 1998). Six major categories are used to classify research topics: project management, data design, architecture, realization, deployment and maintenance, and others. This study provides a starting point for understanding data warehouse research for readers interested in this area. For academics, it helps to review the historical trend of published data warehouse articles and to explore potential research areas for future study. For practitioners, it helps companies to understand the potential of data warehouses and the issues that need to be considered in data warehouse implementation projects.

The rest of the paper is organized as follows: Section 2 gives an overview of the research methodology and the data collection process. In Section 3 the classification framework is presented. The analysis of the reviewed articles on data warehousing based on year of publication, journals and research topics is provided in Section 4. Conclusions are discussed in Section 5.

Electronic Commerce Studies 4

2. Research Methodology
The papers reviewed in this research were all published in academic journals. The reason is that academics and practitioners generally use journals to acquire and spread knowledge (Nord & Nord, 1995). With the pervasiveness of on-line electronic databases, many journals have published electronic versions (Aiguo, 2003). Researchers can find the latest publications through the relatively comprehensive search mechanisms of these databases, which have now become important resources for education and research (Macdonald & Dunkelberger, 2000; Ke et al., 2002; Bar-Ilan et al., 2003). As a result, the literature search performed in this project started with a full-text search for the words “data warehouse” and “data warehousing” in three popular academic on-line databases, namely, ACM Digital Library, IEEE Xplore electronic library and Elsevier ScienceDirect OnSite. The collected paper set was further extended with the data warehouse related references listed in the collected papers. Through this process, 409 articles spread over 49 journals were found.

In this research, only the journals covered in SCI (Science Citation Index), SCIE (Science Citation Index Expanded), or both, produced by the Institute for Scientific Information (ISI), are included, because the articles in these journals have higher research quality (Kleijnen & Van Groenendaal, 2000; Katerattanakul et al., 2003). The index has gained prominence and reputation for evaluating scientific research and the scholarly impact of researchers (Jin & Wang, 1999). Among the 49 journals, 17 are not covered in the SCI index, including a journal named Journal of Data Warehousing, which is specifically dedicated to data warehouses. These journals and their articles were deleted from the collected data set. In the remaining 48 journals, only one journal, International Journal of Information Management, is covered in SSCI (Social Science Citation Index). Due to the scarcity of SSCI journals compared to SCI/SCIE journals in this study, it has been excluded. After the filtering, 358 articles published in 31 journals were preserved in the set. After carefully reviewing the papers, we identified only 129 of them as data warehouse technology related. The other 227 papers mention the search words but do not focus on such subjects. The one hundred and twenty-nine papers were published in thirty-one journals, which are listed in Table 1.

Table 1 The journal titles related to DW research covered in SCI or SCIE
[The SCI and SCIE check-mark columns are not reproduced; the marks cannot be matched to rows in this copy.]

Item  Journal title                                                 Period
1     ACM SIGMOD Record                                             Quarterly
2     ACM Transactions on Database Systems                          Quarterly
3     ACM Transactions on Software Engineering and Methodology      Quarterly
4     Automation in Construction                                    Bimonthly
5     Communications of the ACM                                     Monthly
6     Data & Knowledge Engineering                                  Monthly
7     Decision Support Systems                                      Monthly
8     European Journal of Operational Research                      Semimonthly
9     IEEE Computer                                                 Monthly
10    IEEE Software                                                 Bimonthly
11    IEEE Transactions on Education                                Quarterly
12    IEEE Transactions on Engineering Management                   Quarterly
13    IEEE Transactions on Fuzzy Systems                            Quarterly
14    IEEE Transactions on Information Technology in Biomedicine    Quarterly
15    IEEE Transactions on Knowledge and Data Engineering           Bimonthly
16    IEEE Transactions on Software Engineering                     Monthly
17    IEEE Transactions on Systems, Man and Cybernetics (Part C)    Quarterly
18    IEEE Transactions on Visualization and Computer Graphics      Quarterly
19    Industrial Management & Data Systems                          Monthly
20    Information & Management                                      Monthly
21    Information and Software Technology                           Monthly
22    Information Processing Letters                                Monthly
23    Information Sciences                                          Biweekly
24    Information Systems                                           Bimonthly
25    Information Systems Management                                Quarterly
26    International Journal of Cooperative Information Systems      Tri-annual
27    International Journal of Medical Informatics                  Monthly
28    Journal of Intelligent Information Systems                    Bimonthly
29    Journal of Systems Architecture                               Monthly
30    Journal of Systems and Software                               Monthly
31    VLDB Journal                                                  Quarterly
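The selection procedure described in Section 2 amounts to two filters applied in sequence: an index-coverage filter on the journal, then a manual relevance review of each article. A minimal Python sketch is given here for illustration only; the `Article` record, the `select_articles` function and the sample records are hypothetical and are not part of the study's own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    title: str
    journal: str
    focuses_on_dw: bool  # hypothetical flag recording the manual review outcome


def select_articles(articles, journal_indexes):
    """Keep articles whose journal is covered in SCI or SCIE and whose
    main subject is data warehousing."""
    kept = []
    for article in articles:
        indexes = journal_indexes.get(article.journal, set())
        if not indexes & {"SCI", "SCIE"}:
            continue  # journal not covered in SCI or SCIE (SSCI-only is excluded)
        if not article.focuses_on_dw:
            continue  # search words appear, but the paper is not about DW
        kept.append(article)
    return kept


# Hypothetical sample data, for illustration only.
JOURNAL_INDEXES = {
    "Information Systems": {"SCIE"},
    "Journal of Data Warehousing": set(),
    "International Journal of Information Management": {"SSCI"},
}
CANDIDATES = [
    Article("View maintenance study", "Information Systems", True),
    Article("Paper mentioning a data warehouse once", "Information Systems", False),
    Article("Practitioner case study", "Journal of Data Warehousing", True),
    Article("Management perspective", "International Journal of Information Management", True),
]

selected = select_articles(CANDIDATES, JOURNAL_INDEXES)
print([a.title for a in selected])
```

Keeping the coverage check separate from the relevance flag mirrors the order of the steps in the text: journals are filtered first, and only the surviving articles are reviewed by hand.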

3. Classification Framework
The data warehouse research classification framework is based on the business dimensional lifecycle (Kimball et al., 1998). The business dimensional lifecycle describes the phases required to implement a data warehouse project for an organization. It encompasses project planning, business requirements definition, technical architecture, completing implementation and deployment, and maintaining the data warehouse. Based on the lifecycle model, the data warehouse research framework is classified into six major categories: (1) project management, (2) data design, (3) architecture, (4) realization, (5) deployment and maintenance, and (6) others. The comprehensive classification framework of data warehouse research, including major categories and subcategories, is shown in Figure 1.

Research Topics of Data Warehouse
(1) Project Management: Project Planning; Project Control; Business Requirement
(2) Data Design: Dimensional Modeling
(3) Architecture: Novel Architecture; Product Selection; DW Software; Metadata; Security
(4) Realization: Physical Design; Data Staging; Query Processing; Data Quality; Applications
(5) Deployment and Maintenance
(6) Others: DW Implementation; Introduction / Overview; Integration
Figure 1 Classification framework of research topics in data warehouse

The research issues contained in each category are further described in the following:

(1) Project management
This category includes project planning, business requirement taking, and project control.
• Project planning: The subject describes the definition and scope of a data warehouse project, including assessing the organization’s readiness for the project, building the business justification (such as combining investments and returns to calculate ROI), and focusing on resource and capable staff requirements.
• Business requirement: It impacts virtually every phase of the data warehouse project lifecycle. Business requirements decide what data must reside in the data warehouse, how to organize the data, and how frequently to update the data.
• Project control: The subject focuses on keeping the data warehouse project on track through status monitoring, scope management and an ongoing communication strategy.

(2) Data design
The data design phase designs multidimensional models to hold aggregated data for queries.

(3) Architecture
This category is dedicated to the architectural design of data warehouses. It consists of five subcategories: novel architecture, DW software, the design of metadata, security of data warehouses, and guidelines for product selection.
• Novel architecture: The subject includes topics of advanced architecture design that differ from the traditional data warehouse architecture, such as moving the data warehouse to an Internet architecture.
• DW software: It includes commercial data warehouse systems for companies to implement and systems under development for academic research.
• Metadata: Metadata is like an index of the warehouse contents that keeps track of what data is where in the warehouse (Inmon, 2002). Metadata maintenance is an important issue, because it influences the entire warehouse, from the initial model through the data extraction and load processes to exploration and access by users.
• Security of data warehouses: Security issues are important to the data warehouse since much important data is collected in the system. The data warehouse must provide a mechanism to help users access data. The issues include data encryption, authentication, authorization, etc.
• Product selection: Given the various designs and architectures on the market, the subject discusses a formal procedure to decide which product better fits a company's needs. The factors to be considered include price, training, maintenance services, etc.

(4) Realization
The realization phase transforms the logical design of a data warehouse project into a physical implementation. The details of the physical implementation vary according to the application and the size of the project. This phase includes five subcategories: physical design of data, data staging, query processing, data quality and applications.
• Physical design: The phase converts the logical data design into a physical database. To speed up system performance, a set of materialized views can be computed. The research topics of materialized views include view selection, view maintenance and view synchronization.
• Data staging: The process collects operational source data and integrates the data into the data warehouse. It consists of three major steps: extraction, transformation and load (ETL) (Kimball & Ross, 2002). Extraction is the process of retrieving data from a variety of sources. Through modifications, validations and conversions of the source data, transformation makes sure the data is in a consistent state. Loading is the final step of the ETL process; it loads quality data into the warehouse.
• Query processing: A data warehouse typically involves the execution of complex queries with join, group-by, and sort operations over a large volume of data. To support these kinds of queries, a large variety of query processing techniques are used to increase query performance.
• Data quality: Since data quality affects the credibility of the data warehouse, the data gathering process and the full lifecycle of the data warehouse must be well designed to ensure quality data in the warehouse.
• Applications: Data warehouses can be applied to many areas and industries for better decision making. The applications cover health care management, construction management, marketing, web data, etc. The applications of data warehousing are seen to have considerable potential for different uses in the future.

(5) Deployment and maintenance
Deployment delivers the data warehouse related technology, data and applications to end users along with the necessary education and support. End user education must match the role the users play. After a data warehouse has been successfully deployed, attention should be focused on the ongoing support and education for the operation of the warehouse and its future growth. Data warehouse satisfaction criteria and performance metrics should be monitored to measure the success of the system. User satisfaction is the most useful way to measure the success of information systems (DeLone & McLean, 1992; Guimaraes & Gupta, 1988). As a data warehouse is a type of IS, user satisfaction is applicable to measuring the success of a data warehouse.

(6) Others
This category contains articles that discuss other aspects of data warehouse research. Three subcategories are related to these issues: DW implementation, introduction/overview and integration.
• DW implementation: This includes the methodology, strategy, factors to be considered, critical implementation factors, organizational culture change when implementing a data warehouse, etc.
• Introduction/overview: General introductions to data warehouse concepts and overviews of data warehouse related technologies.
• Integration: As advancements are made in decision support technologies and computer based information systems, there are more opportunities to integrate data warehouse systems with other systems or technologies such as AI (artificial intelligence), KM (knowledge management), data mining, etc.

4. Data Analysis
After we collected all the identified articles, they were analyzed and categorized using the classification framework in Figure 1. A total of 129 articles were classified. These articles are classified by year of publication, research topics and total number of articles in the major selected journals.

4.1 Articles classified according to year of publication
The articles classified by year of publication from 1995 to 2003 are shown in Figure 2. Although the data warehouse appeared in the early 1990s (Jarke et al., 2003), we found no data warehouse literature published earlier than 1995. There are only limited publications before 1997, but the number of journal articles has expanded substantially since then.

[Figure 2: bar chart of the number of articles per year. 1995: 2, 1996: 2, 1997: 2, 1998: 14, 1999: 10, 2000: 20, 2001: 19, 2002: 24, 2003: 36]
Figure 2 Articles classified according to year of publication
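The yearly counts in Figure 2 can be cross-checked against the stated total of 129 articles with a few lines of Python. This is a minimal sketch for illustration; the pairing of counts with years assumes that the values in the figure appear in chronological order, and the names `ARTICLES_PER_YEAR`, `total_articles` and `share_up_to` are hypothetical.

```python
# Article counts per publication year, as read from Figure 2
# (assumption: the figure lists the values in chronological order).
ARTICLES_PER_YEAR = {
    1995: 2, 1996: 2, 1997: 2, 1998: 14, 1999: 10,
    2000: 20, 2001: 19, 2002: 24, 2003: 36,
}


def total_articles(counts):
    """Total number of articles over all years."""
    return sum(counts.values())


def share_up_to(counts, last_year):
    """Fraction of all articles published in or before last_year."""
    early = sum(n for year, n in counts.items() if year <= last_year)
    return early / total_articles(counts)


print(total_articles(ARTICLES_PER_YEAR))                      # 129
print(round(100 * share_up_to(ARTICLES_PER_YEAR, 1997), 1))   # 4.7
```

Under this reading, the counts sum to the 129 classified articles, and fewer than 5% of them were published in 1997 or earlier.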

4.2 Articles classified according to journals
The articles classified by journal are shown in Figure 3. Although 31 journals are covered in this study, more than half of the data warehouse articles are concentrated in 6 journals, and 13 journals contain only one related article. Communications of the ACM and Data & Knowledge Engineering have the most articles related to data warehouse research. Information Systems ranks second in the number of data warehouse articles. The VLDB Journal has the third highest number of articles. Most of the journals lie in the field of IS (Information Systems). It is possible to publish data warehouse articles in a variety of journals, but IS journals are still the most popular choice. From Figure 3, the most widespread and prestigious journals covering the study of data warehouses include the following: Communications of the ACM, Data & Knowledge Engineering, Information Systems, VLDB Journal, Information Systems Management, and IEEE Transactions on Knowledge and Data Engineering.

[Figure 3: horizontal bar chart; x-axis: number of articles (0 to 16); one bar per journal, with journals that contributed a single article grouped as "Others"]
Figure 3 Articles classified according to journals

4.3 Articles classified according to research topics
The articles classified by research topic are shown in Figure 4. The most published research area is the realization phase of the data warehouse (78 articles, 61%), while only three articles discuss “deployment & maintenance” of the data warehouse. Although the project management category has three subcategories, current research focuses on project planning only. In this study, we did not find any article on the topics of project control and business requirement. The number of architecture articles is shown in Table 2. 46% of the articles (6 articles) are related to the metadata area, while 31% of the articles (4 articles) discuss novel architecture. According to the classification scheme, there is no article on the topics of security and product selection. Table 3 lists the number of articles on realization topics. 37% of the articles (29 articles) are related to physical design, followed by 25% (19 articles) related to the applications of data warehouses. Table 4 shows the number of articles not covered in the previous topics. There are 12 articles (52%) discussing data warehouse implementation, and 7 articles (30%) are general introductions to or overviews of data warehouses.

[Figure 4: bar chart; x-axis: research topics; y-axis: number of articles. Project Management: 4 (3%), Data Design: 8 (6%), Architecture: 13 (10%), Realization: 78 (61%), Deployment & Maintenance: 3 (2%), Others: 23 (18%)]
Figure 4 Articles classified according to research topics

Table 2 Number of architecture articles
Architecture          Number of articles
Novel architecture    4 (31%)
DW software           3 (23%)
Metadata              6 (46%)
Total                 13 (100%)

Table 3 Number of realization articles
Realization           Number of articles
Physical design       29 (37%)
Data staging          14 (18%)
Query processing      12 (15%)
Data quality          4 (5%)
Applications          19 (25%)
Total                 78 (100%)

Table 4 Number of general articles
Others                  Number of articles
DW implementation       12 (52%)
Introduction/overview   7 (30%)
Integration             4 (18%)
Total                   23 (100%)

A summary of all reviewed articles that correspond to the classification framework is shown in Table 5. This table can help anyone who wishes to look for data warehouse articles in a specific field.

Table 5 Summary of reviewed literature
[The table lists the reference numbers (001 to 129) of the reviewed articles under each category: Project management (project planning); Data design (dimensional modeling); Architecture (novel architecture, DW software, metadata); Realization (physical design, data staging, query processing, data quality, applications); Deployment & maintenance; Others (DW implementation, introduction/overview, integration). The assignment of reference numbers to rows is not recoverable in this copy.]
Note: The list of papers is shown in appendix.

5. Conclusion
The number of articles within the IS community on data warehouse research seems small compared to the size of the business it has generated and the extent of its use. The articles identified in this research originate from SCI and SCIE journals published between 1995 and 2003. In our opinion, the number of publications on data warehouses will grow progressively in the next few years, because many universities have created research areas and courses in data warehouse systems (Pierce, 1999) and some IS conferences have dedicated panels or mini-tracks to the subject, such as the ACM International Workshop on Data Warehousing and OLAP, the Hawaii International Conference on System Science, etc.

This study shows that data warehouse researchers mainly focus on issues related to the realization phase, especially on physical design and applications topics, while placing less emphasis on deployment and maintenance problems. The reason is that organizations spend much more effort and time implementing the data warehouse during the realization phase. Nevertheless, the greater weight of realization topics may be the result of the bias of the IS journals. According to our classification scheme, some areas discussed in this article have no related research.

Data warehouse systems have gained popularity among companies in a variety of industries. Most of the journals belong to the IS field. In addition to the IS field, a wide range of professional and scholarly communities pay attention to the applications of data warehouses. This suggests that data warehouse related research could or should be interdisciplinary. To advance data warehouse research, it is better to combine IS knowledge with domain knowledge of specific applications or industries.

Data warehousing offers many potential areas for research. This paper aims at setting up a business dimensional lifecycle based framework for classifying data warehouse research issues. The classification framework of research topics raised in this paper is intended for researchers and practitioners who are interested in data warehouse research. The information contained in this research is intended only to provide a general summary. This would encourage or motivate academics and practitioners to further explore the possible data warehouse research areas.

6. References
Aiguo, L., CALIS: Acquiring Electronic Resources, Library Collections, Acquisitions, & Technical Services, Vol. 27, pp. 261-267, 2003.
Bar-Ilan, J., B.C. Peritz and Y. Wolman, A Survey on the Use of Electronic Databases and Electronic Journals Accessed Through the Web by the Academic Staff of Israeli Universities, The Journal of Academic Librarianship, Vol. 29, No. 6, pp. 346-361, 2003.
DeLone, W.H. and E.R. McLean, Information Systems Success: the Quest for the Dependent Variable, Information Systems Research, Vol. 3, No. 1, pp. 60-95, 1992.
Guimaraes, T. and Y.P. Gupta, Measuring Top Management Satisfaction with the MIS Department, OMEGA, Vol. 16, No. 1, pp. 17-24, 1988.
Inmon, W.H., Building the Data Warehouse 3rd edition, John Wiley & Sons, New York, 2002.
Jarke, M., M. Lenzerini, Y. Vassiliou and P. Vassiliadis, Fundamentals of Data Warehouses 2nd edition, Springer, Germany, 2003.
Jin, B. and B. Wang, Chinese Science Citation Database: its Construction and Application, Scientometrics, Vol. 45, No. 2, pp. 325-332, 1999.
Katerattanakul, P., B. Han and S. Hong, Objective Quality Ranking of Computing Journals, Communications of the ACM, Vol. 46, No. 10, pp. 111-114, 2003.
Ke, H.R., R. Kwakkelaar, Y.M. Tai and L.C. Chen, Exploring Behavior of E-journal Users in Science and Technology: Transaction Log Analysis of Elsevier’s ScienceDirect OnSite in Taiwan, Library & Information Science Research, Vol. 24, pp. 265-291, 2002.
Kimball, R. and M. Ross, The Data Warehouse Toolkit: the Complete Guide to Dimensional Modeling 2nd edition, John Wiley & Sons, New York, 2002.
Kimball, R., L. Reeves, M. Ross and W. Thornthwaite, The Data Warehouse Lifecycle Toolkit, John Wiley & Sons, New York, 1998.
Kleijnen, J.P.C. and W. Van Groenendaal, Measuring the Quality of Publications: New Methodology and Case Study, Information Processing and Management, Vol. 36, No. 4, pp. 551-570, 2000.
Macdonald, B. and R. Dunkelberger, Full-text Database Dependency: an Emerging Trend among Undergraduate Library Users?, Research Strategies, Vol. 16, pp. 301-307, 2000.
Mukherjee, D. and D. D'Souza, Think Phased Implementation for Successful Data Warehousing, Information Systems Management, Vol. 20, No. 2, pp. 82-90, 2003.
Nord, J.H. and G.D. Nord, MIS Research: Journals Status and Analysis, Information & Management, Vol. 29, No. 1, pp. 29-42, 1995.
Pierce, E.M., Developing and Delivering a Data Warehousing and Mining Course, Communications of AIS, Vol. 2, pp. 1-23, 1999.
Shin, B., A Case of Data Warehousing Project Management, Information & Management, Vol. 39, No. 7, pp. 581-592, 2002.
Watson, H.J., D.L. Goodhue and B.H. Wixom, The Benefits of Data Warehousing: Why some Organizations Realize Exceptional Payoffs, Information & Management, Vol. 39, No. 6, pp. 491-502, 2002.

