12th August 2002 MBUS 626 INFORMATION SYSTEMS THEORY AND PRACTICE Dr. CHEN
Many corporations are experiencing significant business benefits from using data warehouse technology. Users report gains in market competitiveness through increased revenue and reduced costs through information management. Data warehousing is thus a major issue within most organizations, and developing a data warehouse on a strong base is essential. This paper aims to present the important concepts of data warehousing, such as data warehousing tools and the benefits of data warehousing, that a manager must understand in order to execute a successful data warehousing project in his/her company.
Keywords: Data Warehouse Technology, Market Competitiveness, Data Warehousing Tools, Benefits of Data Warehousing.
It has become a strategic imperative for corporations to know more about their customers and prospects than ever before. Corporations are competing in a world that is moving faster, and in more directions, than at any time in history. The need for information is growing at an increasing rate; the more we know, the more we need to know. Where once it was the job of the information technologists to study customer data, nowadays even the president of the company may need to sift through the corporation's databases to retrieve clues for better marketplace performance. (Rolleigh and Thomas, 2002)
The importance of data warehousing in the commercial segment arises from the need for enterprises to gather all of their information into a single place for in-depth analysis, and the desire to decouple such analysis from online transaction processing systems. (Widom, 1995)
But this isn't an easy endeavor. Before people can get ready access to data, and be able to make meaningful analysis of it, a lot of behind-the-scenes groundwork must be done. It would be a serious mistake to just go out and have your information technology organization buy a data warehouse. Any corporation that takes such an action will likely join the ranks of the many big-name corporations whose huge investments have failed to provide any return on investment (ROI). If a company does want to succeed with data warehousing, it has to build cross-organizational consensus and support for a way of doing business that is empowered by real customer data. The warehouse then has to be tailored to the specific requirements of the company, which means that a lot of careful, tedious and time-consuming steps are required.
(Rolleigh and Thomas, 2002)
Let's look at what goes into creating a rich data warehouse, and what we need to know about it. This paper will first introduce the concept of a data warehouse in a simple, straightforward manner, followed by the major components of a data warehouse and the various structures of a data warehouse. The paper will then present the data warehousing methodologies. Finally, it will discuss the advantages and disadvantages of data warehousing, ending with the conclusion.
WHAT IS A DATA WAREHOUSE
Data warehousing is a concept. It is a set of hardware and software components that can be used to better analyze the massive amounts of data that companies are accumulating, in order to make better business decisions. Bill Inmon, widely considered the 'father' of data warehousing, describes it as: "A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process." This implies that within a warehouse the data will be organized by entity (such as customer or product) rather than by application (sales or purchase), that both current and historical time-stamped data will be present, and that once stored in the warehouse the data cannot be changed (Figure 1). (Davis et al, 1999)
However, a simple definition of data warehouse, as Ralph Kimball puts it, is “(a) data warehouse is a copy of transaction data specifically structured for querying and analysis.”
But this definition does not encompass the entirety of data warehousing. Sometimes non-transaction data are stored in a data warehouse, though probably 95-99% of the data usually are transaction data. Additionally, "querying and reporting" rather than "querying and analysis" is key when talking about the functionality of data warehousing, because the main outputs from data warehouse systems are either tabular listings (queries) with minimal formatting or highly formatted "formal" reports. Queries and reports generated from data stored in a data warehouse may or may not be used for analysis. (Greenfield)
Nevertheless, data warehousing doesn't just make data available. Data warehousing is the process of making your operational data available to your business managers and decision support applications. Proper warehousing focuses on efficient information access. Of course, this efficiency doesn't happen magically. Corporations must first identify what they require from the data and the decision support applications, and then they must evaluate the current operational data to determine how to transform that data into something that adds value to the corporation's output. The tools that you choose for your warehousing solution will take data from your operational systems (extract it), convert your operational data into business information using your defined business rules (transform it), and create a data warehouse (load it).
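The extract, transform and load steps described above can be sketched in a few lines of Python. This is only an illustrative sketch: the record layout, the field names, the business rules and the load date are all assumptions made up for the example, not part of any particular warehousing tool.

```python
from datetime import date

# Hypothetical operational records; field names and values are invented for illustration.
OPERATIONAL_ORDERS = [
    {"order_id": 1, "cust": " alice ", "amount_cents": 1250, "region": None},
    {"order_id": 2, "cust": "BOB", "amount_cents": 990, "region": "west"},
]


def extract(source):
    """Extract: copy rows out of the operational system without changing them."""
    return [dict(row) for row in source]


def transform(rows, load_date):
    """Transform: apply (assumed) business rules to produce business information."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "order_id": row["order_id"],
            "customer": row["cust"].strip().title(),  # reformat names consistently
            "amount": row["amount_cents"] / 100,       # recalculate into currency units
            "region": row["region"] or "UNKNOWN",      # supply a default value
            "load_date": load_date.isoformat(),        # add a time element
        })
    return cleaned


def load(warehouse, rows):
    """Load: append the transformed rows to the warehouse table."""
    warehouse.extend(rows)
    return warehouse


warehouse = load([], transform(extract(OPERATIONAL_ORDERS), date(2002, 8, 12)))
```

The operational records are left untouched; only the copies that flow into the `warehouse` list are reshaped, which mirrors the separation between operational systems and the warehouse.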
Finally, we can explain a data warehouse by analogy with a physical warehouse. Operational systems create data "parts" that are loaded into the warehouse. Some of those parts are summarized into information "components" and stored in the warehouse. Data warehouse users make requests and are delivered information "products" that are created from the components and parts stored in the warehouse. A data warehouse is typically a blending of technologies, including relational and multidimensional databases, client/server architecture, extraction/transformation programs, graphical user interfaces, and more. Data warehousing is one of the hottest industry trends, and for good reason. A well-defined and properly implemented data warehouse can be a valuable competitive tool. (Perkins)
THE COMPONENTS OF A DATA WAREHOUSE
The following describes the components of a data warehouse (Figure 2).
Figure 2
Summarized Data :- There are two kinds of summarized data: lightly summarized data and highly summarized data. Lightly summarized data are the hallmark of a data warehouse. Not all departments in a corporation have the same information requirements, so effective data warehouse design provides customized, lightly summarized data for every department. Highly summarized data are primarily for the executives. Highly summarized data can come either from the lightly summarized data used by enterprise elements or from current detail. If executives require more detailed information, they can access increasing levels of detail through a "drill down" process.
Current Detail :- The heart of a data warehouse is its current detail, where the bulk of the data resides. Current detail comes directly from operational systems and may be stored as raw data or as aggregations of raw data. Every data entity in current detail is a snapshot, at a moment in time, representing the instant when the data are accurate. Current detail is typically two to five years old. Current detail is refreshed as frequently as necessary to support enterprise requirements.
System of Record :- A system of record is the source of the data that feed the data warehouse. Data in a data warehouse differ from operational systems data in that they can only be read, not modified. Thus, it is necessary that a data warehouse be populated with the highest quality data available, i.e., data that are most timely, complete and accurate, and that have the best structural conformance to the data warehouse.
Integration and Transformation Programs :- Even the highest quality operational data cannot usually be copied, as is, into a data warehouse. As operational data items pass from their systems of record to a data warehouse, integration and transformation programs convert them from application-specific data into enterprise data. These programs perform functions such as: reformatting, recalculating, or modifying key structures; adding time elements; identifying default values; supplying logic to choose between multiple data sources; and summarizing, tallying, and merging data from multiple sources. When either the operational or the data warehouse environment changes, the integration and transformation programs are modified to reflect that change.
Archives :- Data warehouse archives contain old data (normally over two years old) of significant, continuing interest and value to the enterprise. There is usually a massive amount of data stored in the archives, with a low incidence of access. Archive data are most often used for forecasting and trend analysis. Archives include not only old data (in raw or summarized form) but also the metadata that describe the old data's characteristics.
Metadata :- One of the most important parts of a data warehouse is its metadata, or data about data. Also called data warehouse architecture, metadata is integral to all levels of the data warehouse, but exists and functions in a different dimension from other warehouse data. Metadata that is used by data warehouse developers to manage and control data warehouse creation and maintenance resides outside the data warehouse. (Perkins)
Along with its various components, a data warehouse can also take various structures.
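The relationship between current detail, lightly summarized data, highly summarized data and the "drill down" process described in this section can be illustrated with a small Python sketch. The sales rows, department names and function names below are hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical current-detail rows: (department, product, sale amount).
CURRENT_DETAIL = [
    ("toys", "robot", 40.0),
    ("toys", "kite", 10.0),
    ("books", "atlas", 25.0),
    ("toys", "robot", 20.0),
]


def lightly_summarize(detail):
    """Total sales per (department, product): a department-level view."""
    totals = defaultdict(float)
    for department, product, amount in detail:
        totals[(department, product)] += amount
    return dict(totals)


def highly_summarize(light):
    """Total sales per department, built from the lightly summarized data."""
    totals = defaultdict(float)
    for (department, _product), amount in light.items():
        totals[department] += amount
    return dict(totals)


def drill_down(detail, department):
    """Return the current-detail rows behind one department's total."""
    return [row for row in detail if row[0] == department]


light = lightly_summarize(CURRENT_DETAIL)
high = highly_summarize(light)
```

An executive looking at `high["toys"]` could call `drill_down(CURRENT_DETAIL, "toys")` to see the individual rows that make up that figure.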
STRUCTURES OF A DATA WAREHOUSE
There are various structures of a data warehouse that a corporation can adopt based on its needs: the physical data warehouse, the logical data warehouse and the data mart.
Physical Data Warehouse :- A physical database in which all the data for the data warehouse are stored, along with metadata and processing logic for scrubbing, organizing, packaging and processing the detail data.
Logical Data Warehouse :- Like the physical data warehouse, it contains metadata, including enterprise rules and processing logic for scrubbing, organizing, packaging and processing the data, but it does not contain actual data. Instead, it contains the information necessary to access the data wherever they reside. This structure is effective only when there is a single source for the data and they are known to be accurate and timely.
Data Mart :- A subset of an enterprise-wide data warehouse, which typically supports an enterprise element (department, region, function, etc.). As part of an iterative data warehouse development process, an enterprise builds a series of physical (or logical) data marts over time and links them via an enterprise-wide logical data warehouse or feeds them from a single physical warehouse. (Perkins)
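The difference between a physical and a logical data warehouse can be made concrete with a short Python sketch. The class names, the `SALES_SYSTEM` source and its figures are assumptions invented for illustration; real products implement these ideas in far more elaborate ways.

```python
# Hypothetical source system: the place where the data actually reside.
SALES_SYSTEM = {"2002-Q1": 120, "2002-Q2": 150}


class PhysicalWarehouse:
    """Stores its own copy of the data, taken at load time."""

    def __init__(self, source):
        self.data = dict(source)  # the copy lives inside the warehouse

    def query(self, key):
        return self.data[key]


class LogicalWarehouse:
    """Stores only metadata: where each subject's data can be found."""

    def __init__(self):
        self.catalog = {}  # subject name -> reference to the source

    def register(self, subject, source):
        self.catalog[subject] = source

    def query(self, subject, key):
        # No local copy: the request is resolved against the source itself.
        return self.catalog[subject][key]


physical = PhysicalWarehouse(SALES_SYSTEM)
logical = LogicalWarehouse()
logical.register("sales", SALES_SYSTEM)

SALES_SYSTEM["2002-Q2"] = 155  # the source changes after the load
```

After the source changes, `physical.query("2002-Q2")` still answers from its stored copy, while `logical.query("sales", "2002-Q2")` reflects whatever the source currently holds. This is why the logical structure depends on the source being accurate and timely.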
DATA WAREHOUSING METHODS
Several warehousing methodologies are used throughout the warehousing community. All of these fall into one of two categories: the big bang approach or the iterative approach.
Big Bang Approach
A big bang methodology tries to solve all known problems by creating a huge data warehouse before it is released for evaluation and testing. Many people believe that this process is necessary to deliver on the project's objectives. Depending on those objectives, the amount of data to be incorporated, and the corporation's intimate knowledge of its business and data, it may be possible to accomplish a warehousing project with a big bang methodology. But there are some considerations to heed. To create a data warehouse, the corporation must plan its warehouse, evaluate and install the necessary software and hardware, collect business requirements, and become familiar with its corporate data. While these tasks are taking place:
The business goals of the corporation can change due to changes in the market or technology.
Management supporters can lose interest in this project if you don't keep them involved and show rapid results.
The corporate data could change (for example, the corporation may start collecting Web log data).
New releases of the firm's chosen software may become available (warehousing is still an evolving market and even the best tools continue to improve and change).
The items listed above are just a few of the business and technical changes that could impact your plans. If you are not plugged into the proper channels, any one of these changes could cause your project to fail because you cannot respond quickly to the necessary changes.
Iterative Approach
With an iterative methodology, the corporation breaks its warehousing project into small, manageable chunks, referred to as projects. In the iterative approach, the same planning tasks are performed as in the big bang approach, but evaluation of all of the deliverables up front is not required. The corporation must design its overall architecture, but when entering the planning phase it needs to concentrate only on its first project or iteration. After each project, the architecture, the development process, and the corporation's business requirements are reviewed. The value of smaller projects within the larger warehousing process is:
It shows a faster return on the company investment because it delivers one solution quickly. This keeps the management supporters involved and interested in the project.
It can adjust to changes in the business requirements faster because the team is small.
It produces manageable projects that have short delivery schedules.
Early involvement by the corporation's user community provides real-situation testing, which yields information on user needs and defect reports. Additionally, users give better feedback when they can see the system than when they have to envision it from a slide presentation. This feedback can improve the
Although the business may ask for everything to be delivered by the warehouse at once, taking a "big bang" approach may not be prudent. Instead, it is more appropriate to break the project into parts or releases and to establish a clear set of objectives and a definition of success for each release. Each release will need to go through the complete system development lifecycle of requirements, design, build, test, implement, and support. By delivering functionality and business value with each short release, a corporation can integrate its data, create the proper roles, train its users, gain insight and assimilate lessons learned. At the same time, an iterative approach lets a corporation adjust its data warehouse's content to add new sources that were not previously considered, as it gains insight into its business and analytics. Perhaps most important, an iterative approach enables the project team to demonstrate a few quick wins in terms of business value delivered. Because of its iterative nature, a data warehouse is a journey, not a destination: there is always more data that can be integrated. (Karakizis)
As shown in Figure 4, the iterative methodology proceeds through the following phases:
1. Initial Organization: identifying the corporation's readiness for undertaking a data warehousing project.
2. Analysis: evaluating the feasibility of the data warehouse, gathering business requirements, and getting agreement on the goals and purpose of the warehouse.
3. Design: analyzing and designing the warehouse system architecture.
4. Development: creating the actual data warehouse structure and populating the data warehouse.
5. Testing: data cleansing and parameter clarification (which may send the project back to the Design phase for another iteration).
6. Implementation: rolling out the production environment and providing user training.
7. Maintenance: monitoring, updating and cleansing data. (Wierschem et al)
Figure 4 (phase labels: Design, Development, Testing)
At this point, an easy path for user feedback should be established, the five steps should be reviewed, and any necessary adjustments should be made. After the review of the process, the next iteration or project should be started from the assessment phase. The assessment and requirements phases should require less time with each successive iteration.
GETTING THE RIGHT TOOLS TO BUILD A DATA WAREHOUSE
As in any endeavor, selecting the correct tools is paramount for success. As the old Chinese adage says, "To accomplish a goal, make sure the proper tools are selected."
Given the complexity of the data warehousing system and the cross-departmental implications of the project, it is easy to see why the proper selection of tools and personnel is very important. This section of the paper will present information on such selections. There are two steps that top management is concerned with when building a data warehouse. One step is choosing a vendor. However, in doing so there are certain basic but nonetheless critical issues that have to be evaluated.
Support :- What type of support is offered? It is industry standard for vendors to charge an annual support fee that is 15-20% of the software product license. But the question is, will any software issues be handled promptly by the vendor or not?
Professional Services :- This encompasses consulting and education. What type of consulting proposal does the vendor give? Are the personnel requirements and consulting rates reasonable? It might be wise to speak with members of the consulting team before signing on the dotted line. On the education front, what type of training is available? And how willing is the consulting team to do knowledge transfer? Does the consulting team purposely withhold information so that either 1) you will need to send more people to the vendor's education classes, or 2) you will need to hire additional consultants to make any changes to the system?
Stability :- This is probably the most important measure of all. It may even be more important than the current functionality that the tool itself provides, for the simple reason that it raises the question of whether the vendor is going to be around for a while and whether it will be able to make enhancements to its tool.
The other step is selecting the right team to build the data warehouse. There are two possibilities: one is to use external consultants and the other is to hire permanent employees.
The pros of hiring external consultants are:
1. They are usually more experienced in data warehousing implementations. Even today, people with extensive data warehousing backgrounds are difficult to find, so when there is a need to ramp up a team quickly, the easiest route is to hire external consultants.
The pros of hiring permanent employees are:
1. They are less expensive. With hourly rates for experienced data warehousing professionals running from $100/hr and up, and even more for Big-5 or vendor consultants, hiring permanent employees is a much more economical option.
2. They are less likely to leave. Consultants, whether they are on contract, from a Big-5 firm, or from one of the tool vendor firms, are likely to leave at a moment's notice. This makes knowledge transfer very important. Of course, the flip side is that these consultants are much easier to let go, too.
However, management must understand that various roles play important parts in a data warehouse project. They are:
1. Project Manager: This person will oversee the progress and be responsible for the success of the data warehousing project.
2. DBA: This role is responsible for keeping the database running smoothly. Additional tasks for this role may be to plan and execute a backup/recovery plan, as well as performance tuning.
3. Technical Architect: This role is responsible for developing and implementing the overall technical architecture of the data warehouse, from the backend hardware/software to the client desktop configurations.
4. ETL Developer: This role is responsible for planning, developing, and deploying the extraction, transformation, and loading routines for the data warehouse.
5. Front End Developer: This person is responsible for developing the front end, whether it be client-server or over the web.
6. OLAP (On-Line Analytical Processing) Developer: OLAP is the foundation for a range of essential business applications, including sales and marketing analysis, planning, budgeting, statutory consolidation, profitability analysis, balanced scorecard, performance measurement and data warehouse reporting (Pendse and Creeth). The role of the OLAP Developer is thus crucial. He/she is responsible for the development of OLAP cubes.
7. Trainer: The trainer plays a significant role. After the data warehouse is implemented, a person on the data warehouse team needs to work with the end users to familiarize them with how the front end is set up, so that they can get the most benefit out of the data warehouse system.
8. Data Modeler: This role is responsible for taking the data structure that exists in the enterprise and modeling it into a schema that is suitable for OLAP analysis.
(Chuo-Han Lee, 2002)
ADVANTAGES AND DISADVANTAGES OF DATA WAREHOUSING
Data warehousing has become increasingly popular in many organizations around the world. It is not out of blind belief that corporations are investing millions of dollars in data warehousing projects. A successful data warehouse project will provide numerous lasting benefits to a company. Data warehousing makes retrieving information easy: when a user query is submitted to the warehouse, the needed information is already there, with inconsistencies and differences already resolved. This makes it much easier and more efficient to run queries over data that originally came from different sources. (WHIPS)
Along with its ease-of-use benefits, data warehousing provides other qualitative advantages. A successful data warehousing project gives improved knowledge of the relationships among products and services and their performance, and the ability to make quick and proper analyses that pave the way for better decision making, and thus gives the company a strong advantage over the competition. (Smith)
The benefits of developing a data warehouse include:
• More accurate predictions of customer demand based on the use of trend analysis.
• Improved response in direct marketing campaigns through the use of household demographics and current customer analysis.
• Improved vendor relations and price reductions, achieved by targeting selected vendors with an increased level of purchasing across the enterprise.
• Significant savings from improved data quality across the enterprise. (Smith)
Additional benefits include:
The ability to run complex queries easily and efficiently since query execution does not involve data translation and communication with remote sources.
Convenience for end users since they can use a single data model and query language.
Simplicity of the system design. For example, there is no need to perform query optimization over heterogeneous sources, a very difficult problem faced by other approaches.
Information at the warehouse is under the control of the warehouse users; thus it can be stored safely and reliably for as long as necessary. (WHIPS)
From the management's point of view, the benefits and rewards abound for a company that builds and maintains a data warehouse correctly. The corporation can achieve dramatic cost savings and revenue growth. Add to this increased analysis of marketing databases to cross-sell products, less storage on the mainframe, and the ability to identify and keep the most profitable customers while getting a better picture of who they are, and it is easy to see why data warehousing is spreading so fast. For example, the telecom industry uses data warehouses to target customers who may want certain phone services rather than running "blanket" phone and mail campaigns and aggravating customers with unsolicited calls during dinner. Some of the soft benefits of data warehousing come from the technology's effect on users. When built and used correctly, a warehouse changes users' jobs, granting them faster access to more accurate data and allowing them to give better customer service. A company must not forget, however, that the goal of any data warehousing project is to lower operating costs and generate revenue. This is an investment, after all, and quantifiable ROI should be expected over time. (Wailgum, 2001)
Even though the benefits of data warehousing far outweigh the disadvantages, there are certain disadvantages that companies must heed. The most important disadvantages are:
• The initial data warehouse setup is expensive. However, after the system is in place the cost should be low and cover only the maintenance and future modification of the system. There is also a high cost in getting data translated and copied to existing databases in time to be useful to the end user.
• A data warehouse takes time to build. Time should be given to the project, and the difficulties in getting a data warehouse developed and running should not be underestimated.
• The re-education of the programmers often proves to be a disadvantage, as change is often resisted until familiarity is gained with the new approach. (OCS Consulting)
Cost and time are also incurred in developing the required new skill set for warehouse developers and end users.
A data warehouse is complex to develop. It cannot just be bought as an off-the-shelf product; it is designed specifically for an organization's needs. The choice of hardware, software and structure requires careful consideration, as does how they will work together in the future.
The data warehouse will require management, although overall less time is required under the data warehousing approach than for managing the current environment.
Data held in one place highlights data integrity problems and vulnerability to the public domain. Advanced security to prevent unwanted users, including competitors, from accessing the database will thus be of critical importance for the company. (Davis et al, 1999)
Thus, we see that data warehousing does not have to be an enigma to the managers of companies. Even though it is an exhausting task, a successful data warehousing project is crucial for companies to run a successful business. Management must believe and understand that data warehousing is of strategic importance to a company. Therefore, even if management has to invest a massive amount of capital to build a data warehouse, it must do so in view of the myriad benefits that will follow. Any company that doesn't see the importance and benefits of data warehousing, and that is blinded by the cost and daunted by the size of the task, will feel the devastating impact of competing businesses that have undertaken successful data warehousing projects.
REFERENCES
Louis Rolleigh and Joe Thomas, Data Integration: The Warehouse Foundation, White Papers, copyright 2002 Acxiom Corporation. Available at: http://www.acxiom.com/displayMain/0,1494,USA~en~383~197~0~0,00.html
Jennifer Widom (1995), Research Problems in Data Warehousing, Proceedings of the 4th Int'l Conference on Information and Knowledge Management (CIKM), Nov. 1995. Available at: http://www-db.stanford.edu/pub/papers/warehouse-research.ps
M. Davis, Z. Galzie, M. Silcox (1999), Data Warehousing. Available at: http://www.student.city.ac.uk/~dz542/cw.html
Ralph Kimball (1996), The Data Warehouse Toolkit, Wiley.
Larry Greenfield, A Definition of Data Warehousing, copyright 1995-2002. Available at: http://www.dwinfocenter.org/defined.html
The Data Warehousing Concept. Available at: http://www.sas.com/rnd/warehousing/concept.html
Alan Perkins, A Strategic Approach to Data Warehouse Engineering, White Papers, copyright 1997-1998 Visible Systems Corporation. Available at: http://www.dmreview.com/portal_ros.cfm?NavID=92&WhitePaperID=108&PortalID=12
Spyro D. Karakizis, How to Win With Your Data Warehouse: Advice from A Data Warehouse Expert, ©1996-2002 Accenture. Available at: http://www.accenture.com
David Wierschem, Ph.D., Jeremy McMillen, M.S., Randy McBroom, Ph.D., Methodology for Developing an Academic Data Warehouse, Office of Institutional Resource and Planning, Texas A&M University-Commerce. Available at: http://www.panda.auburn.edu/sair/3-3000-PP.pdf
Nigel Pendse and Richard Creeth, The OLAP Report. Available at: http://www.olapreport.com/
Chuo-Han Lee (2002), Data Warehouse and Data Warehousing, Copyright © 2001, 2002. Available at: http://www.1keydata.com/datawarehousing/datawarehouse.html
WHIPS (WareHouse Information Prototype at Stanford), Data Warehousing At Stanford. Available at: http://www-db.stanford.edu/warehousing/warehouse.html
Anna Marie Smith, Data Warehousing: A Short Overview. Available at: http://datawarehouse.ittoolbox.com/browse.asp?c=DWPeerPublishing&r=%2Fpub%2FAS120301a%2Epdf
Tom Wailgum (2001), What is a Data Warehouse? Available at: http://www.darwinmag.com/learn/curve/column.html?ArticleID=50
OCS Consulting, Data Warehousing Presentation To VIEWS Pharmaceutical SIG. Available at: http://www.ocs-consulting.com/businessintelligence_papers_cdw.asp