A data warehouse is a copy of transaction data specifically structured for querying and reporting. Abbreviated DW, a data warehouse is a collection of data designed to support management decision making, although data warehousing does not serve only "decision makers" or the decision-making process. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. A data warehouse is the main repository of an organization's historical data, its corporate memory, and it contains the raw material for management's decision support system. The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis, such as data mining, on the information without slowing down the operational systems. For example, a data warehouse might be used to find the day of the week on which a company sold the most books in May 1992, or how employee sick leave in the week before the winter break differed between California and New York from 2001 to 2005.
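To make the May 1992 example concrete, here is a minimal sketch using Python's standard-library sqlite3 module. The book_sales table, its columns, and the sample rows are hypothetical; a real warehouse would more likely hold this in a fact table joined to a date dimension.

```python
import sqlite3

# Hypothetical warehouse table: copies of books sold, by sale date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book_sales (sale_date TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO book_sales VALUES (?, ?)",
    [
        ("1992-05-01", 3),   # Friday
        ("1992-05-02", 10),  # Saturday
        ("1992-05-08", 4),   # Friday
        ("1992-05-09", 2),   # Saturday
        ("1992-06-06", 50),  # Saturday, but outside May 1992, so ignored
    ],
)

DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]

# SQLite's strftime('%w', ...) returns '0'..'6', where 0 is Sunday.
dow, total = conn.execute(
    """
    SELECT CAST(strftime('%w', sale_date) AS INTEGER) AS dow,
           SUM(quantity) AS total
    FROM book_sales
    WHERE sale_date >= '1992-05-01' AND sale_date < '1992-06-01'
    GROUP BY dow
    ORDER BY total DESC
    LIMIT 1
    """
).fetchone()
best_day = DAY_NAMES[dow]
print(best_day, total)
```

Because the query runs against the warehouse copy, the operational system that recorded these sales is never touched.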
The data warehouse is optimized for reporting and analysis (online analytical processing, or OLAP). Data in data warehouses is frequently heavily denormalized, summarized, or stored in a dimension-based model.
Technically, a data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates the analysis workload from the transaction workload and enables an organization to consolidate data from several sources. In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.
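The ETL flow described above can be sketched with two separate SQLite databases, one standing in for the OLTP system and one for the warehouse. The orders and sales_fact tables, their columns, and the extract, transform, and load helpers are all hypothetical; the transportation step is trivial here because both databases live in the same process.

```python
import sqlite3

# Hypothetical OLTP source system holding individual orders.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_ts TEXT, qty INTEGER, unit_price REAL)"
)
oltp.executemany(
    "INSERT INTO orders (order_ts, qty, unit_price) VALUES (?, ?, ?)",
    [("2005-03-01 09:15:00", 2, 9.99),
     ("2005-03-01 17:40:00", 1, 25.00),
     ("2005-03-02 11:05:00", 3, 4.50)],
)

# A physically separate warehouse database: reporting queries run here,
# not against the OLTP system.
dw = sqlite3.connect(":memory:")
dw.execute(
    "CREATE TABLE sales_fact (order_id INTEGER PRIMARY KEY, sale_date TEXT, qty INTEGER, revenue REAL)"
)

def extract(conn):
    """Read raw order rows from the source system."""
    return conn.execute("SELECT id, order_ts, qty, unit_price FROM orders").fetchall()

def transform(rows):
    """Reshape rows for analysis: keep only the date part and derive revenue."""
    return [(oid, ts[:10], qty, round(qty * price, 2)) for oid, ts, qty, price in rows]

def load(conn, rows):
    """Write the transformed rows into the warehouse in one transaction."""
    with conn:
        conn.executemany("INSERT OR REPLACE INTO sales_fact VALUES (?, ?, ?, ?)", rows)

load(dw, transform(extract(oltp)))

# Analysis runs against the warehouse copy only.
daily = dw.execute(
    "SELECT sale_date, SUM(revenue) FROM sales_fact GROUP BY sale_date ORDER BY sale_date"
).fetchall()
print(daily)
```

Each order is kept at its most detailed level in sales_fact; totals such as daily revenue are computed at query time.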
Recently, a set of significant new concepts and tools has evolved into a new technology that makes it possible to attack the problem of providing all the key people in the enterprise with access to whatever level of information the enterprise needs to survive and prosper in an increasingly competitive world. The term that has come to characterize this new technology is "data warehousing." Data warehousing has grown out of repeated attempts by various researchers and organizations to give their organizations flexible, effective, and efficient means of getting at the sets of data that have come to represent one of the organization's most critical and valuable assets. It is a field that has grown out of the integration of a number of different technologies and experiences over the last two decades. These experiences have allowed the IT industry to identify the key problems that have to be solved.
Data Warehouses became a distinct type of computer database during the late 1980s and early 1990s. They were developed to meet a growing demand for management information and analysis that could not be met by operational systems. Operational systems were unable to meet this need for a range of reasons:
• The processing load of reporting reduced the response time of the operational systems.
• The database designs of operational systems were not optimized for information analysis and reporting.
• Most organizations had more than one operational system, so company-wide reporting could not be supported from a single system.
• Developing reports in operational systems often required writing specific computer programs, which was slow and expensive.
As a result, separate computer databases began to be built that were specifically designed to support management information and analysis. These data warehouses were able to bring in data from a range of different sources, such as mainframe computers, minicomputers, personal computers, and office automation software such as spreadsheets, and integrate this information in a single place. This capability, coupled with user-friendly reporting tools and freedom from operational impacts, has led to the growth of this type of computer system. As technology improved (lower cost for more performance) and user requirements increased (faster data load cycle times and more features), data warehouses evolved through several fundamental stages:
• Offline Operational Databases: Data warehouses in this initial stage are developed by simply copying the database of an operational system to an offline server, where the processing load of reporting does not affect the operational system's performance.
• Offline Data Warehouse: Data warehouses in this stage are updated on a regular time cycle (usually daily, weekly, or monthly) from the operational systems, and the data is stored in an integrated, reporting-oriented data structure.
• Real Time Data Warehouse: Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g., an order, a delivery, or a booking).
• Integrated Data Warehouse: Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.
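The difference between the offline stage (periodic refresh) and the real-time stage can be illustrated with a batch refresh that copies only rows added since the previous run. This sketch assumes a hypothetical orders table whose integer id only ever increases and whose rows are never updated; tracking the highest loaded id (a "high-water mark") is one common technique, not one the text prescribes.

```python
import sqlite3

# Hypothetical operational database and its warehouse copy (both in memory here).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE orders_copy (id INTEGER PRIMARY KEY, amount REAL)")

def refresh(last_loaded_id: int) -> int:
    """One batch cycle: copy orders added since the last run, return the new watermark."""
    new_rows = src.execute(
        "SELECT id, amount FROM orders WHERE id > ? ORDER BY id", (last_loaded_id,)
    ).fetchall()
    with dw:  # commit the whole batch as one transaction
        dw.executemany("INSERT INTO orders_copy VALUES (?, ?)", new_rows)
    # The highest id loaded becomes the starting point for the next cycle.
    return new_rows[-1][0] if new_rows else last_loaded_id

# First cycle: two orders exist in the source and both are copied.
with src:
    src.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (20.0,)])
watermark = refresh(0)

# A later cycle copies only the order added since the previous run.
with src:
    src.execute("INSERT INTO orders (amount) VALUES (?)", (5.0,))
watermark = refresh(watermark)

copied = dw.execute("SELECT COUNT(*) FROM orders_copy").fetchone()[0]
print(watermark, copied)
```

A real-time warehouse would instead apply each new order as the operational system records it, rather than waiting for the next scheduled cycle.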
CHART SHOWING THE DEVELOPMENTS MADE IN DATABASE AND STORAGE TECHNOLOGY
Database technology has evolved from primitive file processing to the development of database management systems with query and transaction processing. Further progress has led to the increasing demand for efficient and effective data analysis and data understanding tools. This need is a result of the explosive growth in data collected from applications including business and management, government administration, science and engineering, and environmental control.
CHARACTERISTICS OF A DATA WAREHOUSE
A common way of introducing data warehousing is to refer to the characteristics of a data warehouse:
• Subject Oriented
• Integrated
• Nonvolatile
• Time Variant
Subject Oriented: Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.

Integrated: Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.

Nonvolatile: Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.

Time Variant: In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.
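Integration often comes down to mapping each source's names and units onto one agreed format. The sketch below assumes two hypothetical sources: one calls the customer key cust_no and reports weight in pounds, the other calls it customerId and uses kilograms. The Shipment type and the converter functions are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical target format shared by every warehouse load.
@dataclass(frozen=True)
class Shipment:
    customer_id: str
    weight_kg: float

LB_TO_KG = 0.45359237  # exact definition of the international avoirdupois pound

def from_source_a(rec: dict) -> Shipment:
    # Source A (hypothetical): key named "cust_no", weight in pounds, messy casing.
    return Shipment(customer_id=str(rec["cust_no"]).strip().upper(),
                    weight_kg=round(rec["weight_lb"] * LB_TO_KG, 3))

def from_source_b(rec: dict) -> Shipment:
    # Source B (hypothetical): key named "customerId", weight already in kilograms.
    return Shipment(customer_id=str(rec["customerId"]).strip().upper(),
                    weight_kg=round(rec["kg"], 3))

source_a = [{"cust_no": " c001", "weight_lb": 10.0}]
source_b = [{"customerId": "C001", "kg": 2.0}]

# After integration both records share one key format and one unit,
# so they can be combined for the same customer.
integrated = [from_source_a(r) for r in source_a] + [from_source_b(r) for r in source_b]
total_for_c001 = sum(s.weight_kg for s in integrated if s.customer_id == "C001")
print(integrated, total_for_c001)
```

Without the mapping step, " c001" and "C001" would look like two different customers, and pounds would be added to kilograms.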
STORAGE OF DATA IN A DATA WAREHOUSE
The goal of a data warehouse is to support decision making with data. Data mining can be used in conjunction with a data warehouse to help with certain types of decisions. Data mining can also be applied to operational databases containing individual transactions, but to make data mining more efficient, the data warehouse should hold an aggregated or summarized collection of data. Data mining helps in extracting meaningful new patterns that cannot necessarily be found merely by querying or processing data or metadata in the data warehouse. Data mining applications should therefore be strongly considered early, during the design of a data warehouse, and data mining tools should be designed to facilitate their use in conjunction with data warehouses. In fact, for very large databases running into terabytes of data, successful use of data mining applications will depend first on the construction of a data warehouse.
The result of mining may be to discover:
• Association rules: e.g., whenever a customer buys video equipment, he or she also buys another electronic gadget.
• Sequential patterns: e.g., a customer buys a camera, within three months buys photographic supplies, and within six months buys an accessory item. Similarly, a customer who buys more than twice in the lean periods may be likely to buy at least once during the Christmas period.
• Classification trees: e.g., customers may be classified by frequency of visits, by types of financing used, by amount of purchase, or by preference for types of items, and revealing statistics may be generated for such classes.
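Association rules are usually judged by support (the fraction of all transactions that contain every item in the rule) and confidence (among transactions containing the rule's left-hand side, the fraction that also contain the right-hand side). The sketch below computes both for a single rule and counts frequent item pairs by brute force. The basket data and item names are invented, and production systems use more scalable algorithms such as Apriori or FP-growth.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket data: each set is one customer transaction.
transactions = [
    {"video_camera", "tripod", "memory_card"},
    {"video_camera", "memory_card"},
    {"video_camera", "headphones"},
    {"tv", "headphones"},
    {"video_camera", "memory_card", "headphones"},
]

def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    support = both / n
    confidence = both / ante if ante else 0.0
    return support, confidence

def frequent_pairs(transactions, min_support):
    """Count every item pair and keep those whose support meets the threshold."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

support, confidence = rule_stats(transactions, {"video_camera"}, {"memory_card"})
pairs = frequent_pairs(transactions, min_support=0.4)
print(support, confidence, pairs)
```

Here "whenever a customer buys a video camera, he or she also buys a memory card" holds in 3 of the 4 baskets that contain a video camera, so the rule has confidence 0.75.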
Data mining is the task of discovering interesting patterns from large amounts of data, where the data can be stored in databases, data warehouses, or other information repositories. It is a young interdisciplinary field, drawing from areas such as database systems, data warehousing, statistics, machine learning, data visualization, information retrieval, and high-performance computing. Other contributing areas include neural networks, pattern recognition, spatial data analysis, image databases, signal processing, and many application fields, such as business, economics, and bioinformatics. A knowledge discovery process includes data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation. Data patterns can be mined from many different kinds of databases, such as relational databases, data warehouses, and transactional, object-relational, and object-oriented databases. Interesting data patterns can also be extracted from other kinds of information repositories, including spatial, time-related, text, multimedia, and legacy databases, and the World Wide Web.

A data warehouse is a repository for long-term storage of data from multiple sources, organized so as to facilitate management decision making. The data are stored under a unified schema and are typically summarized. Data warehouse systems provide some data analysis capabilities, collectively referred to as OLAP (On-Line Analytical Processing). Data mining functionalities include the discovery of concept/class descriptions, association, classification, prediction, clustering, trend analysis, deviation analysis, and similarity analysis. Characterization and discrimination are forms of data summarization. A pattern represents knowledge if it is easily understood by humans; valid on test data with some degree of certainty; and potentially useful, novel, or validating a hunch about which the user was curious. Measures of pattern interestingness, either objective or subjective, can be used to guide the discovery process. Data mining systems can be classified according to the kinds of databases mined, the kinds of knowledge mined, the techniques used, or the applications adapted.

Efficient and effective data mining in large databases poses numerous requirements and great challenges to researchers and developers. The issues involved include data mining methodology, user interaction, performance and scalability, and the processing of a large variety of data types. Other issues include the exploration of data mining applications and their social impacts.
In online transaction processing (OLTP) systems, relational database design uses the discipline of data modeling and generally follows the rules of data normalization in order to ensure absolute data integrity. Complex information is broken down into its simplest structures (tables), where all of the individual atomic-level elements relate to each other and satisfy the normalization rules. Fully normalized OLTP database designs often result in information from a single business transaction being stored in dozens to hundreds of tables. Relational database managers are efficient at managing the relationships between tables and deliver very fast insert/update performance, because only a little bit of data is affected in each relational transaction. OLTP databases are efficient because they typically deal only with the information around a single transaction. In reporting and analysis, however, thousands to billions of transactions may need to be reassembled, imposing a huge workload on the relational database. Given enough time, the software can usually return the requested results, but because of the negative performance impact on the machine and all of its hosted applications, data warehousing professionals recommend that reporting databases be physically separated from the OLTP database.

In addition, data warehousing suggests that data be restructured and reformatted to facilitate query and analysis by novice users. OLTP databases are designed to provide good performance for rigidly defined applications built by programmers fluent in the constraints and conventions of the technology. Add in frequent enhancements, and all too often a database becomes just a collection of cryptic names and seemingly unrelated, obscure structures that store data using incomprehensible coding schemes. These factors improve performance but complicate use by untrained people. Lastly, the data warehouse needs to support high volumes of data gathered over extended periods of time, handle complex queries, and accommodate formats and definitions inherited from independently designed packages and legacy systems. Designing the data warehouse's data architecture is the realm of data warehouse architects.
The goal of a data warehouse is to bring data together from a variety of existing databases to support management and reporting needs. The generally accepted principle is that data should be stored at its most elemental level, because this provides the most useful and flexible basis for reporting and information analysis. However, because of differing focus on specific requirements, there can be alternative methods for designing and implementing data warehouses. There are two leading approaches to organizing the data in a data warehouse: the dimensional approach advocated by Ralph Kimball and the normalized approach advocated by Bill Inmon. Whilst the dimensional approach is very useful in data mart design, it can result in a rat's nest of long-term data integration and abstraction complications when used in a data warehouse.

In the "dimensional" approach, transaction data is partitioned into either measured "facts", which are generally numeric data that capture specific values, or "dimensions", which contain the reference information that gives each transaction its context. As an example, a sales transaction would be broken up into facts such as the number of products ordered and the price paid, and dimensions such as date, customer, product, geographical location, and salesperson. The main advantage of a dimensional approach is that the data warehouse is easy for business staff with limited information technology experience to understand and use. Also, because the data is pre-joined into the dimensional form, the data warehouse tends to operate very quickly. The main disadvantage of the dimensional approach is that it is quite difficult to add or change later if the company changes the way in which it does business.

The "normalized" approach uses database normalization. In this method, the data in the data warehouse is stored in third normal form. Tables are then grouped together by subject areas that reflect the general definition of the data (customer, product, finance, etc.). The main advantage of this approach is that it is quite straightforward to add new information into the database; the primary disadvantage is that, because of the number of tables involved, it can be rather slow to produce information and reports.
Furthermore, since the segregation of facts and dimensions is not explicit in this type of data model, it is difficult for users to join the required data elements into meaningful information without a precise understanding of the data structure. Subject areas are just a method of organizing information and can be defined along any lines. The traditional approach has subjects defined as the subjects or nouns within a problem space.
For example, in a financial services business, you might have customers, products and contracts. An alternative approach is to organize around the business transactions, such as customer enrollment, sales and trades.
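The dimensional split of a sales transaction into facts and dimensions can be sketched in a few lines. Each dimension hands out integer surrogate keys, and each fact row keeps only those keys plus the numeric measures. The Dimension class, the field names, and the sample sales are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Dimension:
    """A dimension table mapping each distinct descriptive value to a surrogate key."""
    name: str
    keys: dict = field(default_factory=dict)

    def key_for(self, value) -> int:
        # Reuse the key for a value already seen; otherwise assign the next integer.
        if value not in self.keys:
            self.keys[value] = len(self.keys) + 1
        return self.keys[value]

DIMENSION_FIELDS = ["date", "customer", "product", "location", "salesperson"]
FACT_FIELDS = ["quantity", "price_paid"]

dimensions = {name: Dimension(name) for name in DIMENSION_FIELDS}
fact_rows = []

def load_sale(sale: dict) -> None:
    """Split one flat sales transaction into dimension keys plus numeric facts."""
    row = {f"{name}_key": dimensions[name].key_for(sale[name]) for name in DIMENSION_FIELDS}
    row.update({name: sale[name] for name in FACT_FIELDS})
    fact_rows.append(row)

# Two hypothetical sales by the same customer on different days.
load_sale({"date": "2005-04-01", "customer": "Acme Ltd", "product": "Stapler",
           "location": "Leeds", "salesperson": "J. Smith", "quantity": 5, "price_paid": 42.50})
load_sale({"date": "2005-04-02", "customer": "Acme Ltd", "product": "Paper",
           "location": "Leeds", "salesperson": "J. Smith", "quantity": 10, "price_paid": 30.00})

print(fact_rows)
```

The customer "Acme Ltd" is stored once in its dimension and referenced by key from both fact rows, which is what keeps fact tables narrow even when dimensions carry many descriptive attributes.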
DATA WAREHOUSE AND DATA MART
A Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated. This makes it much easier and more efficient to run queries over data that originally came from different sources. Making better business decisions quickly is the key to succeeding in today's competitive marketplace. Understandably, organizations seeking to improve their decision-making can be overwhelmed by the sheer volume and complexity of data available from their varied operational and production systems. Making this data available to a wide audience of business users is one of the most significant challenges for today's businesses. In response, Persys, Inc. has chosen Microsoft SQL Server Data Warehousing Framework to build data warehouses and data marts.
Enterprise data warehouse: The enterprise data warehouse contains corporate-wide information integrated from multiple operational data sources for consolidated data analysis. Typically it is composed of several subject areas, such as customers, products, and sales, and is used for both tactical and strategic decision making. An enterprise warehouse contains both detailed point-in-time data and summarized information and can range from 50 gigabytes to more than one terabyte in total data size. Enterprise data warehouses can be very expensive and time consuming to build and manage. They are usually created from the top down by centralized IS organizations.

Data mart: Data marts contain a subset of corporate-wide data that is built for use by an individual department or division of an organization. Unlike the enterprise warehouse, data marts are often built from the bottom up by departmental resources for a specific decision support application or group of users. Data marts contain summarized and often detailed data about the subject area. The information in a data mart can be a subset of an enterprise warehouse (dependent data mart) or, more likely, come directly from the operational data sources (independent data mart).
As an example, a data warehouse may pull information from Human Resources, Project and Procurement, Program Management, and other sources of data, and present that data in a usable format, such as reports, to the company's staff (executives, managers, and field staff), or make that information available to the public as a sales and marketing tool.
ARCHITECTURE OF A DATA WAREHOUSE
The concept of "data warehousing" dates back at least to the mid-1980s, and possibly earlier. In essence, it was intended to provide an architectural model for the flow of data from operational systems to decision support environments. It attempted to address the various problems associated with this flow, and the high costs associated with it. In the absence of such an architecture, there was usually an enormous amount of redundancy in the delivery of management information. In larger corporations it was typical for multiple decision support projects to operate independently, each serving different users but often requiring much of the same data. The process of gathering, cleaning, and integrating data from various sources, often legacy systems, was typically replicated for each project. Moreover, legacy systems were frequently being revisited as new requirements emerged, each requiring a subtly different view of the legacy data. Based on analogies with real-life warehouses, data warehouses were intended as large-scale collection/storage/staging areas for corporate data. From here data could be distributed to "retail stores" or "data marts", which were tailored for access by decision support users (or "consumers"). While the data warehouse was designed to manage the bulk supply of data from its suppliers (e.g., operational systems) and to handle the organization and storage of this data, the "retail stores" or "data marts" could be focused on packaging and presenting selections of the data to end users, to meet specific management information needs.
Somewhere along the way this analogy and architectural vision was lost, as some vendors and industry speakers redefined the data warehouse as simply a management reporting database. This is a subtle but important deviation from the original vision of the data warehouse as the hub of a management information architecture, where the decision support databases were actually the data marts or "retail stores".
Figure - Data Warehouse Architecture
A Data Warehouse Architecture (DWA) is a way of representing the overall structure of data, communication, processing, and presentation that exists for end-user computing within the enterprise. The architecture is made up of a number of interconnected parts:
• Operational Database / External Database Layer
• Information Access Layer
• Data Access Layer
• Data Directory (Metadata) Layer
• Process Management Layer
• Application Messaging Layer
• Data Warehouse Layer
• Data Staging Layer
DATA WAREHOUSE OPTIONS
There are perhaps as many ways to develop data warehouses as there are organizations. Moreover, there are a number of different dimensions that need to be considered:
• Scope of the data warehouse
• Data redundancy
• Type of end-user
The figure below shows a two-dimensional grid for analyzing the basic options, with the horizontal dimension indicating the scope of the warehouse and the vertical dimension showing the amount of redundant data that must be stored and maintained.
Figure - Data Warehouse Options
BASIC DATA WAREHOUSE ARCHITECTURE
The figure below shows a simple architecture for a data warehouse. End users directly access data derived from several source systems through the data warehouse.
Architecture of a Data Warehouse
This illustrates three things:
• Data Sources (operational systems and flat files)
• Warehouse (metadata, summary data, and raw data)
• Users (analysis, reporting, and mining)
In the figure above, the metadata and raw data of a traditional OLTP system are present, as is an additional type of data, summary data. Summaries are very valuable in data warehouses because they pre-compute long operations in advance. For example, a typical data warehouse query is to retrieve something like August sales; a summary can hold that total ready-made, so the query does not have to scan every individual sale. A summary in Oracle is called a materialized view.
DATA WAREHOUSE ARCHITECTURE (WITH A STAGING AREA)
In the basic architecture above, you need to clean and process your operational data before putting it into the warehouse. You can do this programmatically, although most data warehouses use a staging area instead. A staging area simplifies building summaries and general warehouse management. The figure below illustrates this typical architecture.
Architecture of a Data Warehouse with a Staging Area
This illustrates four things:
• Data Sources (operational systems and flat files)
• Staging Area (where data sources go before the warehouse)
• Warehouse (metadata, summary data, and raw data)
• Users (analysis, reporting, and mining)
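The staging-area flow can be sketched in SQLite: raw rows land in a staging table unchanged, a cleaning step trims, standardizes, and types them on the way into the warehouse table, and a summary table then pre-computes monthly totals so that a query for August sales reads only a few rows. SQLite has no materialized views, so CREATE TABLE ... AS SELECT stands in for one here; all table names and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staging area: raw rows land here exactly as received from the sources.
conn.execute("CREATE TABLE stg_sales (sale_date TEXT, region TEXT, amount TEXT)")
conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", [
    ("2005-08-03", " north ", "100.00"),
    ("2005-08-17", "North", "250.50"),
    ("2005-08-20", "south", "75"),
    ("2005-09-01", "north", "40"),
    ("2005-08-21", "south", ""),  # missing amount: rejected during cleaning
])

# Warehouse table holding clean, typed data.
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
with conn:
    # Cleaning step: standardize region names, convert amounts, drop bad rows.
    conn.execute("""
        INSERT INTO sales
        SELECT sale_date, UPPER(TRIM(region)), CAST(amount AS REAL)
        FROM stg_sales
        WHERE TRIM(amount) <> ''
    """)
    # Summary table standing in for a materialized view: monthly totals per region.
    conn.execute("""
        CREATE TABLE sales_by_month AS
        SELECT substr(sale_date, 1, 7) AS month, region, SUM(amount) AS total
        FROM sales
        GROUP BY month, region
    """)

august = conn.execute(
    "SELECT region, total FROM sales_by_month WHERE month = '2005-08' ORDER BY region"
).fetchall()
print(august)
```

Unlike an Oracle materialized view, this summary table is not refreshed automatically; it would have to be rebuilt after each load.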
DATA WAREHOUSE ARCHITECTURE (WITH A STAGING AREA AND DATA MARTS)
Although the architecture in the figure above is quite common, you may want to customize your warehouse's architecture for different groups within your organization. You can do this by adding data marts, which are systems designed for a particular line of business. The figure below illustrates an example where purchasing, sales, and inventories are separated. In this example, a financial analyst might want to analyze historical data for purchases and sales.
Architecture of a Data Warehouse with a Staging Area and Data Marts
This illustrates five things:
• Data Sources (operational systems and flat files)
• Staging Area (where data sources go before the warehouse)
• Warehouse (metadata, summary data, and raw data)
• Data Marts (purchasing, sales, and inventory)
• Users (analysis, reporting, and mining)
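Dependent data marts for the purchasing, sales, and inventory lines of business can be sketched as views over a single warehouse table. The activity table, its business_line column, and the view names are hypothetical; a real mart would more often be a separately stored and tuned subset of the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical central warehouse table holding activity from all lines of business.
conn.execute(
    "CREATE TABLE activity (activity_date TEXT, business_line TEXT, item TEXT, amount REAL)"
)
conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", [
    ("2005-01-10", "purchasing", "widget", 500.0),
    ("2005-01-15", "sales", "widget", 800.0),
    ("2005-02-02", "inventory", "widget", 120.0),
    ("2005-02-09", "sales", "gadget", 300.0),
])

# One view per line of business. Because each mart is derived from the warehouse,
# it is a dependent data mart. The names come from a fixed tuple, never user input,
# so building the SQL with an f-string is safe here.
for line in ("purchasing", "sales", "inventory"):
    conn.execute(
        f"CREATE VIEW {line}_mart AS "
        f"SELECT activity_date, item, amount FROM activity WHERE business_line = '{line}'"
    )

# A financial analyst comparing historical purchases and sales per item.
rows = conn.execute("""
    SELECT i.item,
           (SELECT COALESCE(SUM(amount), 0) FROM purchasing_mart p WHERE p.item = i.item) AS purchased,
           (SELECT COALESCE(SUM(amount), 0) FROM sales_mart s WHERE s.item = i.item) AS sold
    FROM (SELECT DISTINCT item FROM activity) AS i
    ORDER BY i.item
""").fetchall()
print(rows)
```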
DATA WAREHOUSING SCHEMAS
A schema is a collection of database objects, including tables, views, indexes, and synonyms. You can arrange schema objects in the schema models designed for data warehousing in a variety of ways. Most data warehouses use a dimensional model. The model of your source data and the requirements of your users help you design the data warehouse schema. You can sometimes get the source model from your company's enterprise data model and reverse-engineer the logical data model for the data warehouse from this. The physical implementation of the logical data warehouse model may require some changes to adapt it to your system parameters: size of machine, number of users, storage capacity, type of network, and software.
STAR SCHEMAS
The star schema is the simplest data warehouse schema. It is called a star schema because the diagram resembles a star, with points radiating from a center. The center of the star consists of one or more fact tables, and the points of the star are the dimension tables, as shown in the figure below.
Star Schema
The most natural way to model a data warehouse is as a star schema, in which only one join establishes the relationship between the fact table and any one of the dimension tables. A star schema optimizes performance by keeping queries simple and providing fast response time. All the information about each level is stored in one row.
OTHER SCHEMAS
Some schemas in data warehousing environments use third normal form rather than star schemas. Another schema that is sometimes useful is the snowflake schema, which is a star schema with normalized dimensions in a tree structure.
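The difference between a star schema and a snowflake schema shows up in the joins. The sketch below stores the same product hierarchy both ways in SQLite and answers the same question against each; the table names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: the product dimension is denormalized, so category sits in the same row.
CREATE TABLE products_star (prod_id INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);

-- Snowflake schema: the same dimension normalized into a small tree of tables.
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE products_snow (prod_id INTEGER PRIMARY KEY, prod_name TEXT,
                            category_id INTEGER REFERENCES categories(category_id));

-- Fact table at the center of the star: a foreign key plus a numeric measure.
CREATE TABLE sales (prod_id INTEGER, amount REAL);

INSERT INTO products_star VALUES (1, 'Novel', 'Books'), (2, 'Atlas', 'Books'), (3, 'Radio', 'Electronics');
INSERT INTO categories VALUES (10, 'Books'), (20, 'Electronics');
INSERT INTO products_snow VALUES (1, 'Novel', 10), (2, 'Atlas', 10), (3, 'Radio', 20);
INSERT INTO sales VALUES (1, 12.0), (2, 30.0), (3, 45.0), (1, 8.0);
""")

# Star: one join from the fact table reaches every product attribute.
star = conn.execute("""
    SELECT p.category, SUM(s.amount) FROM sales s
    JOIN products_star p ON p.prod_id = s.prod_id
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# Snowflake: the same question needs one extra join per normalized level.
snow = conn.execute("""
    SELECT c.category, SUM(s.amount) FROM sales s
    JOIN products_snow p ON p.prod_id = s.prod_id
    JOIN categories c ON c.category_id = p.category_id
    GROUP BY c.category ORDER BY c.category
""").fetchall()
print(star, snow)
```

Both queries return the same totals; the snowflake saves a little storage by not repeating category names, at the cost of an extra join for every query that needs them.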
DATA WAREHOUSING OBJECTS
Fact tables and dimension tables are the two types of objects commonly used in dimensional data warehouse schemas. Fact tables are the large tables in your warehouse schema that store business measurements. Fact tables typically contain facts and foreign keys to the dimension tables. Fact tables represent data, usually numeric and additive, that can be analyzed and examined. Examples include sales, cost, and profit. Dimension tables, also known as lookup or reference tables, contain the relatively static data in the warehouse. Dimension tables store the information you normally use to constrain queries. Dimension tables are usually textual and descriptive, and you can use them as the row headers of the result set. Examples are customers or products.
Example of Data Warehousing Objects and Their Relationships
The figure below illustrates a common example of a sales fact table and the dimension tables customers, products, promotions, times, and channels.
Typical Data Warehousing Objects
This illustrates a typical star schema with some columns and relationships detailed. In it, the dimension tables are:
• times
• channels
• products, which contains prod_id
• customers, which contains cust_id, cust_last_name, cust_city, and cust_state_province
The fact table is sales, which contains cust_id and prod_id.
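A typical query against this schema uses one dimension to constrain the query and another to supply the row headers. The sketch uses the cust_id, prod_id, cust_last_name, cust_city, and cust_state_province columns named in the text; prod_name, amount_sold, and all sample rows are assumptions, and the times and channels dimensions are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_last_name TEXT,
                        cust_city TEXT, cust_state_province TEXT);
CREATE TABLE products (prod_id INTEGER PRIMARY KEY, prod_name TEXT);  -- prod_name is assumed
CREATE TABLE sales (cust_id INTEGER REFERENCES customers(cust_id),
                    prod_id INTEGER REFERENCES products(prod_id),
                    amount_sold REAL);                                -- amount_sold is assumed

INSERT INTO customers VALUES
  (1, 'Ng', 'San Jose', 'CA'), (2, 'Ortiz', 'Fresno', 'CA'), (3, 'Kim', 'Albany', 'NY');
INSERT INTO products VALUES (100, 'Laptop'), (200, 'Mouse');
INSERT INTO sales VALUES (1, 100, 900.0), (2, 100, 1100.0), (3, 100, 950.0), (3, 200, 25.0);
""")

# The products dimension constrains the query (WHERE), the customers dimension
# supplies the row headers (GROUP BY), and the fact table supplies the additive measure.
report = conn.execute("""
    SELECT c.cust_state_province, SUM(s.amount_sold) AS total
    FROM sales s
    JOIN customers c ON c.cust_id = s.cust_id
    JOIN products  p ON p.prod_id = s.prod_id
    WHERE p.prod_name = 'Laptop'
    GROUP BY c.cust_state_province
    ORDER BY c.cust_state_province
""").fetchall()
print(report)
```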
DIFFERENT ASPECTS OF DATA WAREHOUSE ARCHITECTURE
This is a list of aspects of architecture that the data warehouse decision maker will have to deal with themselves. There are many other architecture issues that affect the data warehouse, e.g., network topology, but these have to be made with all of an organization's systems in mind (and with people other than the data warehouse team being the main decision makers.) This list will not attempt to provide detailed explanations of the different types of architecture. Rather, I am presenting this list because the data warehousing literature usually muddles the subject of architecture by lumping different types of decisions together or by forgetting certain types of decisions. Also, the literature makes these decisions seem much more black and white than they are. For example, in the area of what I call reporting and staging data store architecture, much of the literature discusses only the "enterprise" data warehouse, the dependent data mart, and the independent data mart options. In reality, there are many more variations being used that cannot easily be given a snappy label.
1. Data consistency architecture: The choice of what data sources, dimensions, business rules, semantics, and metrics an organization chooses to put into common usage. This is by far the hardest aspect of architecture to implement and maintain because it involves organizational politics. However, determining this architecture has more to do with determining the place of the data warehouse in your business than any other architectural decision. In my opinion, the decisions involved in determining this architecture should drive all other architectural decisions. Unfortunately, this architecture often seems to be backed into rather than consciously determined.
2. Reporting data store and staging data store architecture: The main reasons we store data in a data warehousing system are so that they can be 1) reported against, 2) cleaned up, and (sometimes) 3) transported to another data store where they can be reported against and/or cleaned up. Determining where we hold data to report against is what I call the reporting data store architecture. All other decisions are what I call staging data store architecture. As mentioned before, there are infinite variations of this architecture. Many writings on this aspect of architecture take on a religious overtone. That is, rather than discussing what will make most sense for the organization implementing the data warehouse, the discussion is often one of architectural purity and beauty or of the writer's conception of rightness and wrongness.
3. Data modeling architecture: This is the choice of whether you wish to use denormalized, normalized, object-oriented, proprietary multidimensional, or other data models. As you may guess, it makes perfect sense for an organization to use a variety of models.
4. Tool architecture: This is your choice of the tools you are going to use for reporting and for what I call infrastructure.
5. Processing tiers architecture: This is your choice of which physical platforms will do which pieces of the concurrent processing that takes place when using a data warehouse. This can range from an architecture as simple as host-based reporting to one as complicated as the architecture described in "The Data Webhouse Toolkit".
6. Security architecture: If you need to restrict access down to the row or field level, you will probably have to use some means other than the usual security mechanisms at your organization. Note that while security may not be technically difficult to implement, it can cause political consternation.
The decisions on data consistency architecture will probably have much more influence on the return on investment in the data warehouse than any other architectural decisions. To get the most return from a data warehouse (or any other system), business practices have to change in conjunction with, or as a result of, the system implementation. Conscious determination of data consistency architecture is almost always a prerequisite to using a data warehouse to effect business practice change.
DATA WAREHOUSE SCOPE
The scope of a data warehouse may be as broad as all the informational data for the entire enterprise from the beginning of time, or it may be as narrow as a personal data warehouse for a single manager for a single year. There is nothing that makes one of these more of a data warehouse than another. In practice, the broader the scope, the more valuable the data warehouse is to the enterprise, and the more expensive and time consuming it is to create and maintain. As a consequence, most organizations seem to start out with functional, departmental, or divisional data warehouses and then expand them as users provide feedback.
I. Central Data Warehouses
Central Data Warehouses are what most people think of when they are first introduced to the concept of a data warehouse. The central data warehouse is a single physical database that contains all of the data for a specific functional area, department, division, or enterprise. Central Data Warehouses are often selected where there is a common need for informational data and large numbers of end users are already connected to a central computer or network. A Central Data Warehouse may contain data for any specific period of time. Usually, Central Data Warehouses contain data from multiple operational systems. Central Data Warehouses are real, in that the data is physically stored in the warehouse: it is accessible from one place and must be loaded and maintained on a regular basis. Normally, data warehouses are built around advanced RDBMSs or some form of multidimensional informational database server.
II. Distributed Data Warehouses
Distributed Data Warehouses are just what their name implies: data warehouses in which certain components are distributed across a number of different physical databases. Increasingly, large organizations are pushing decision-making down to lower and lower levels of the organization, and in turn pushing the data needed for decision making down (or out) to the LAN or local computer serving the local decision-maker. Distributed Data Warehouses usually involve the most redundant data and, as a consequence, the most complex loading and updating processes.
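Both kinds of warehouse described above are loaded on a regular basis with data from multiple operational systems. As a rough illustration of that load step, the Python sketch below uses an in-memory SQLite database as a stand-in for the warehouse RDBMS; the two source extracts, their field names, and the `sales_fact` table are hypothetical. Each source is mapped onto one common schema, and `INSERT OR REPLACE` on a (source, source_key) primary key keeps a repeated load from duplicating rows.

```python
import sqlite3
from typing import Any, Dict, List

# Hypothetical extracts from two operational systems that describe the
# same kind of fact with different field names and units.
ORDER_SYSTEM: List[Dict[str, Any]] = [{"order_id": 1, "cust": "C1", "total_cents": 1999}]
WEB_SHOP: List[Dict[str, Any]] = [{"id": "W-7", "customer_code": "C2", "amount": 45.50}]


def create_warehouse() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_fact ("
        " source TEXT, source_key TEXT, customer TEXT, amount REAL, load_date TEXT,"
        " PRIMARY KEY (source, source_key))"
    )
    return conn


def load_central(conn: sqlite3.Connection, load_date: str) -> None:
    """Transform both extracts into the common schema and load them."""
    rows = [
        # The order system stores cents; the warehouse stores currency units.
        ("orders", str(r["order_id"]), r["cust"], r["total_cents"] / 100, load_date)
        for r in ORDER_SYSTEM
    ] + [
        ("webshop", r["id"], r["customer_code"], r["amount"], load_date)
        for r in WEB_SHOP
    ]
    # INSERT OR REPLACE overwrites a row with the same key instead of adding a copy.
    conn.executemany("INSERT OR REPLACE INTO sales_fact VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = create_warehouse()
load_central(conn, "2024-01-31")
load_central(conn, "2024-02-01")  # a later scheduled load; no duplicate rows
print(conn.execute("SELECT customer, amount FROM sales_fact ORDER BY customer").fetchall())
# [('C1', 19.99), ('C2', 45.5)]
```

In a distributed warehouse the same transformation would have to run against each physical database that holds a copy of the data, which is one reason the text calls those loading processes the most complex.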
ADVANTAGES OF DATA WAREHOUSING
There are many advantages to using a data warehouse; some of them are:
Enhances end-user access to a wide variety of data.
Lets decision support system users obtain specified trend reports, e.g. the item with the most sales in a particular area/country within the last two years.
Provides a standard enterprise view of information that is shared across the entire organization.
Serves as a strategic repository of information for supporting future business requirements and projects.
Streamlines methodology and processes for data management.
Centralizes information exchange between current and future systems.
Eliminates data redundancies and consolidates data processing.
Offers an extensible solution adaptable to changes in corporate technology direction.
Simplifies the processing and storage of information.
Allows highly flexible presentation of dissimilar information.
Supports information sharing and collaboration requirements.
Makes it simple to introduce new data sources.
Limits the impact of small changes in business requirements.
Reduces the overhead of data storage support.
Provides a repository of transaction processing system data that covers a longer span of time than can efficiently be held in a transaction processing system, and makes it possible to generate reports "as was" as of a previous point in time.
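The trend report mentioned in the list (the item with the most sales in a particular area within the last two years) is a typical aggregate query against a warehouse fact table. The sketch below is a minimal Python/SQLite version under stated assumptions: the `sales` table, its columns, and the helper names are hypothetical; dates are stored as ISO-8601 text so that string comparison matches date order; and "the last two years" is taken to mean sales on or after the same calendar day two years earlier.

```python
import sqlite3
from datetime import date
from typing import Optional, Tuple


def two_years_before(d: date) -> date:
    """Same calendar day two years earlier (29 February maps to the 28th)."""
    try:
        return d.replace(year=d.year - 2)
    except ValueError:
        return d.replace(year=d.year - 2, day=28)


def top_item(conn: sqlite3.Connection, area: str, today: date) -> Optional[Tuple[str, float]]:
    """Item with the highest total sales in `area` over the last two years."""
    cutoff = two_years_before(today).isoformat()
    return conn.execute(
        """
        SELECT item, SUM(amount) AS total
        FROM sales
        WHERE area = ? AND sale_date >= ?
        GROUP BY item
        ORDER BY total DESC
        LIMIT 1
        """,
        (area, cutoff),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (area TEXT, item TEXT, amount REAL, sale_date TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("CA", "khakis", 100.0, "2023-05-01"),
        ("CA", "jeans", 80.0, "2023-06-01"),
        ("CA", "jeans", 50.0, "2021-01-15"),  # older than two years: excluded
        ("NY", "jeans", 500.0, "2024-01-01"),
    ],
)
print(top_item(conn, "CA", date(2024, 6, 30)))
# ('khakis', 100.0)
```

Without the date window, jeans would win in California (80 + 50), which is why the cutoff matters for a "within the last two years" report.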
DISADVANTAGES OF DATA WAREHOUSING
Data warehousing systems, for the most part, store historical data that have been generated in internal transaction processing systems. This is a small part of the universe of data available to manage a business, and sometimes this part has limited value. Data warehousing systems can also complicate business processes significantly. Data warehousing may not be for your business if one or more of the following is true: most of your business needs are to report on data in one transaction processing system; all the historical data you need are in that system; the data in the system are clean; your hardware can support reporting against the live system data; the structure of the system data is relatively simple; or your firm does not have much interest in end-user ad hoc query/report tools. Data warehousing can have a learning curve that may be too long for impatient firms. Data warehousing can become an exercise in data for the sake of the data. In certain organizations, ad hoc end-user query/reporting tools do not "take". Many "strategic applications" of data warehousing have a short life span and require the developers to put together a technically inelegant system quickly, and some developers are reluctant to work this way. There is a limited number of people available who have worked through the full "life cycle" of a data warehousing project. Data warehousing systems can require a great deal of "maintenance", which many organizations cannot or will not support.
Sometimes the cost of capturing data, cleaning it up, and delivering it in a format and time frame that is useful to end users is simply too much to bear.
REAL-WORLD EXAMPLE: A DATA WAREHOUSE FOR LEVI STRAUSS
In 1998, ArsDigita Corporation built a Web service as a front end to an experimental custom clothing factory operated by Levi Strauss. Users would visit the site to choose a style of khaki pants, enter their waist, inseam, height, weight, and shoe size, and finally check out with their credit card. The server would attempt to authorize a charge on the credit card through CyberCash. The factory IT system would poll the server's Oracle database periodically so that it could start cutting pants within 10 minutes of a successfully authorized order. The whole purpose of the factory and Web service was to test and analyze consumer reaction to this method of buying clothing; therefore, a data warehouse was built into the project almost from the start. The public Web site was supported by a mid-range Hewlett-Packard Unix server that had ample leftover capacity to run the data warehouse.
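The polling arrangement described above, in which the factory system periodically checks the Web server's database for newly authorized orders, can be sketched as follows. This is a hypothetical reconstruction, not the ArsDigita or Levi Strauss code: SQLite stands in for Oracle, and the `orders` table, its `status` values, and the function names are assumptions. Marking each order as claimed after it is read keeps a later poll from sending the same order to the factory twice; any polling interval comfortably under 10 minutes would fit the turnaround described in the text.

```python
import sqlite3
import time
from typing import Callable, List, Tuple

Order = Tuple[int, str, int, int]  # (order_id, style, waist, inseam)


def claim_authorized_orders(conn: sqlite3.Connection) -> List[Order]:
    """Return authorized orders not yet sent to the factory and mark them.

    Assumes a single poller; concurrent pollers would need row locking.
    """
    rows = conn.execute(
        "SELECT order_id, style, waist, inseam FROM orders "
        "WHERE status = 'authorized' ORDER BY order_id"
    ).fetchall()
    with conn:  # commits the status updates, or rolls them back on error
        conn.executemany(
            "UPDATE orders SET status = 'in_production' WHERE order_id = ?",
            [(row[0],) for row in rows],
        )
    return rows


def poll_forever(
    conn: sqlite3.Connection,
    send_to_factory: Callable[[Order], None],
    interval_seconds: float = 60.0,
) -> None:
    """Check for new orders every `interval_seconds` (runs until interrupted)."""
    while True:
        for order in claim_authorized_orders(conn):
            send_to_factory(order)
        time.sleep(interval_seconds)


# One-off demonstration without the infinite loop.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, style TEXT,"
    " waist INTEGER, inseam INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [(1, "pleated", 32, 30, "authorized"), (2, "flat-front", 34, 32, "pending")],
)
conn.commit()
print(claim_authorized_orders(conn))  # [(1, 'pleated', 32, 30)]
print(claim_authorized_orders(conn))  # []
```

A production version would also need to handle a factory hand-off that fails after an order has already been marked, for example by marking orders only after the factory acknowledges them.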
Data warehousing is not a new phenomenon. All large organizations already have data warehouses, but they are just not managing them. Over the next few years, the growth of data warehousing is going to be enormous, with new products and technologies coming out frequently. To get the most out of this period, it is going to be important that data warehouse planners and developers have a clear idea of what they are looking for and then choose strategies and methods that will provide them with performance today and flexibility for tomorrow.