THE DATA ADMINISTRATION NEWSLETTER – TDAN.com ROBERT S. SEINER – PUBLISHER
Four Ways to Build a Data Warehouse
by Wayne Eckerson
Published: May 29, 2007

It has been said there are as many ways to build data warehouses as there are companies to build them. Each data warehouse is unique because it must adapt to the needs of business users in different functional areas, whose companies face different business conditions and competitive pressures.

Nonetheless, four major approaches to building a data warehousing environment exist. These architectures are generally referred to as 1) top-down, 2) bottom-up, 3) hybrid, and 4) federated. Most organizations, wittingly or not, follow one or another of these approaches as a blueprint for development.

Although we have been building data warehouses since the early 1990s, there is still a great deal of confusion about the similarities and differences among these architectures. This is especially true of the “top-down” and “bottom-up” approaches, which have existed the longest and occupy the polar ends of the development spectrum.

As a result, some companies fail to adopt a clear vision for the way the data warehousing environment can and should evolve. Others, paralyzed by confusion or fear of deviating from prescribed tenets for success, cling too rigidly to one approach or another, undermining their ability to respond flexibly to new or unexpected situations. Ideally, organizations need to borrow concepts and tactics from each approach to create environments that uniquely meet their needs.

Semantic and Substantive Differences

The two most influential approaches are championed by industry heavyweights Bill Inmon and Ralph Kimball, both prolific authors and consultants in the data warehousing field. Inmon, who is credited with coining the term “data warehousing” in the early 1990s, advocates a top-down approach, in which companies first build a data warehouse followed by data marts.
Kimball’s approach, on the other hand, is often called bottom-up because it starts and ends with data marts, negating the need for a physical data warehouse altogether.

On the surface, there is considerable friction between top-down and bottom-up approaches. But in reality, the differences are not as stark as they may appear. Both approaches advocate building a robust enterprise architecture that adapts easily to changing business needs and delivers a single version of the truth. In some cases, the differences are more semantic than substantive in nature. For example, both approaches collect data from source systems into a single data store, from which data marts are populated. But while “top-down” subscribers call this a data warehouse, “bottom-up” adherents often call this a “staging area.”

Nonetheless, significant differences exist between the two approaches (see chart.) Data warehousing professionals need to understand the substantial, subtle, and semantic differences among the approaches and which industry “gurus” or consultants advocate each approach. This will provide a clearer understanding of the different routes to achieve data warehousing success and how to translate between the advice and rhetoric of the different approaches.

Top-Down Approach

The top-down approach views the data warehouse as the linchpin of the entire analytic environment. The data warehouse holds atomic or transaction data that is extracted from one or more source systems and integrated within a normalized, enterprise data model. From there, the data is summarized, dimensionalized, and distributed to one or more “dependent” data marts. These data marts are “dependent” because they derive all their data from a centralized data warehouse.

Sometimes, organizations supplement the data warehouse with a staging area to collect and store source system data before it can be moved and integrated within the data warehouse.
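The top-down flow described above, atomic data in a normalized warehouse that is then summarized and dimensionalized into a “dependent” data mart, can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module and an in-memory database. All table names, column names, and sample rows (wh_customer, wh_sale, mart_dim_region, mart_fact_monthly_sales) are hypothetical and are not taken from the article; a real implementation would normally use an ETL tool and separate platforms.

```python
import sqlite3


def build_dependent_mart(conn):
    """Load a tiny normalized warehouse, then derive a summary mart from it."""
    cur = conn.cursor()
    # Hypothetical normalized warehouse tables holding atomic (transaction) data.
    cur.executescript("""
        CREATE TABLE wh_customer (
            customer_id INTEGER PRIMARY KEY,
            region      TEXT
        );
        CREATE TABLE wh_sale (
            sale_id     INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES wh_customer(customer_id),
            sale_date   TEXT,
            amount      REAL
        );
        INSERT INTO wh_customer VALUES (1, 'East'), (2, 'West');
        INSERT INTO wh_sale VALUES
            (1, 1, '2007-05-01', 100.0),
            (2, 1, '2007-05-15', 50.0),
            (3, 2, '2007-05-20', 75.0),
            (4, 2, '2007-06-02', 25.0);
    """)
    # The mart is "dependent": every row is derived from the warehouse tables,
    # summarized by month and keyed to a small region dimension.
    cur.executescript("""
        CREATE TABLE mart_dim_region (
            region_key INTEGER PRIMARY KEY,
            region     TEXT UNIQUE
        );
        CREATE TABLE mart_fact_monthly_sales (
            region_key   INTEGER REFERENCES mart_dim_region(region_key),
            month        TEXT,
            total_amount REAL
        );
        INSERT INTO mart_dim_region (region)
            SELECT DISTINCT region FROM wh_customer ORDER BY region;
        INSERT INTO mart_fact_monthly_sales
            SELECT d.region_key, substr(s.sale_date, 1, 7), SUM(s.amount)
            FROM wh_sale s
            JOIN wh_customer c ON c.customer_id = s.customer_id
            JOIN mart_dim_region d ON d.region = c.region
            GROUP BY d.region_key, substr(s.sale_date, 1, 7);
    """)
    conn.commit()


def monthly_sales(conn):
    """Query the mart the way a reporting tool might: fact joined to dimension."""
    return conn.execute("""
        SELECT d.region, f.month, f.total_amount
        FROM mart_fact_monthly_sales f
        JOIN mart_dim_region d ON d.region_key = f.region_key
        ORDER BY d.region, f.month
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_dependent_mart(conn)
rows = monthly_sales(conn)
for row in rows:
    print(row)
```

Because the atomic rows stay in the warehouse tables, a second mart with a different grain (for example, sales by customer) could be derived from the same data without another extract from the source systems.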
A separate staging area is particularly useful if there are numerous source systems, large volumes of data, or small batch windows with which to extract data from source systems.

The major benefit of a “top-down” approach is that it provides an integrated, flexible architecture to support downstream analytic data structures. First, this means the data warehouse provides a departure point for all data marts, enforcing consistency and standardization so that organizations can achieve a single version of the truth. Second, the atomic data in the warehouse lets organizations re-purpose that data in any number of ways to meet new and unexpected business needs. For example, a data warehouse can be used to create rich data sets for statisticians, deliver operational reports, or support operational data stores (ODS) and analytic applications. Moreover, users can query the data warehouse if they need cross-functional or enterprise views of the data.

On the downside, a top-down approach may take longer and cost more to deploy than other approaches, especially in the initial increments. This is because organizations must create a reasonably detailed enterprise data model as well as the physical infrastructure to house the staging area, data warehouse, and the marts before deploying their applications or reports. (Of course, depending on the size of an implementation, organizations can deploy all three “tiers” within a single database.) This initial delay may cause some groups with their own IT budgets to build their own analytic applications. Also, it may not be intuitive or seamless for end users to drill through from a data mart to a data warehouse to find the details behind the summary data in their reports.

Bottom-Up Approach

In a bottom-up approach, the goal is to deliver business value by deploying dimensional data marts as quickly as possible. Unlike the top-down approach, these data marts contain all the data, both atomic and summary, that users may want or need, now or in the future. Data is modeled in a star schema design to optimize usability and query performance. Each data mart builds on the next, reusing dimensions and facts so users can query across data marts, if desired, to obtain a single version of the truth as well as both summary and atomic data.

The “bottom-up” approach consciously tries to minimize back-office operations, preferring to focus an organization’s effort on developing dimensional designs that meet end-user requirements. The “bottom-up” staging area is non-persistent, and may simply stream flat files from source systems to data marts using the file transfer protocol. In most cases, dimensional data marts are logically stored within a single database. This approach minimizes data redundancy and makes it easier to extend existing dimensional models to accommodate new subject areas.

Pros/Cons. The major benefit of a bottom-up approach is that it focuses on creating user-friendly, flexible data structures using dimensional, star schema models. It also delivers value rapidly because it doesn’t lay down a heavy infrastructure up front.

Without an integration infrastructure, the bottom-up approach relies on a “dimensional bus” to ensure that data marts are logically integrated and stovepipe applications are avoided. To integrate data marts logically, organizations use “conformed” dimensions and facts when building new data marts. Thus, each new data mart is integrated with others within a logical enterprise dimensional model.

Another advantage of the bottom-up approach is that since the data marts contain both summary and atomic data, users do not have to “drill through” from a data mart to another structure to obtain detailed or transaction data. The use of a staging area also eliminates redundant extracts and overhead required to move source data into the dimensional data marts.

One problem with a bottom-up approach is that it requires organizations to enforce the use of standard dimensions and facts to ensure integration and deliver a single version of the truth. When data marts are logically arrayed within a single physical database, this integration is easily done. But in a distributed, decentralized organization, it may be too much to ask departments and business units to adhere and reuse references and rules for calculating facts. There can be a tendency for organizations to create “independent” or non-integrated data marts.

In addition, dimensional marts are designed to optimize queries, not support batch or transaction processing. Thus, organizations that use a bottom-up approach need to create additional data structures outside of the bottom-up architecture to accommodate data mining, ODSs, and operational reporting requirements. However, this may be achieved simply by pulling a subset of data from a data mart at night when users are not active on the system.

Hybrid Approach

The hybrid approach tries to blend the best of both “top-down” and “bottom-up” approaches. It attempts to capitalize on the speed and user-orientation of the “bottom-up” approach without sacrificing the integration enforced by a data warehouse in a “top down” approach. Pieter Mimno, an independent consultant who teaches at TDWI conferences, is currently the most vocal proponent of this approach.

The hybrid approach recommends spending about two weeks developing an enterprise model in third normal form before developing the first data mart. The first several data marts are also designed in third normal form but deployed using star schema physical models. This dual modeling approach fleshes out the enterprise model without sacrificing the usability and query performance of a star schema.

The hybrid approach relies on an extraction, transformation, and load (ETL) tool to store and manage the enterprise and local models in the data marts as well as synchronize the differences between them. This lets local groups, for example, develop their own definitions or rules for data elements that are derived from the enterprise model without sacrificing long-term integration. Organizations also use the ETL tool to extract and load data from source systems into the dimensional data marts at both the atomic and summary levels. Most ETL tools today can create summary tables on the fly.

After deploying the first few “dependent” data marts, an organization then backfills a data warehouse behind the data marts, instantiating the “fleshed out” version of the enterprise data model. The organization then transfers atomic data from the data marts to the data warehouse and consolidates redundant data feeds, saving the organization time, money, and processing resources. Organizations typically backfill a data warehouse once business users request views of atomic data across multiple data marts.

The major benefit of a hybrid approach is that it combines rapid development techniques within an enterprise architecture framework. It develops an enterprise data model iteratively and only develops a heavyweight infrastructure once it’s really needed (e.g., when executives start asking for reports that cross data mart boundaries.)

However, backfilling a data warehouse can be a highly disruptive process that delivers no ostensible value and therefore may never be funded. In addition, few query tools can dynamically and intelligently query atomic data in one database (i.e., the data warehouse) and summary data in another database (i.e., the data marts.) Users may be confused when to query which database.

This approach also relies heavily on an ETL tool to synchronize meta data between enterprise and local versions, develop aggregates, load detail data, and orchestrate the transition to a data warehousing infrastructure. Although ETL tools have matured considerably, they can never enforce adherence to architecture. The hybrid approach may make it too easy for local groups to stray irrevocably from the enterprise data model.

Federated Approach

The federated approach is sometimes confused with the hybrid approach above or “hub-and-spoke” data warehousing architectures that are a reflection of a top-down approach. However, the federated approach, as defined by its most vocal proponent, Doug Hackney, is not a methodology or architecture per se, but a concession to the natural forces that undermine the best laid plans for deploying a perfect system. A federated approach rationalizes the use of whatever means possible to integrate analytical resources to meet changing needs or business conditions. In short, it’s a salve for the soul of the stressed out data warehousing project manager who must sacrifice architectural purity to meet the immediate (and ever-changing) needs of his business users.

Hackney says the federated approach is “an architecture of architectures.” It recommends how to integrate a multiplicity of heterogeneous data warehouses, data marts, and packaged applications that companies have already deployed and will continue to implement in spite of the IT group’s best effort to enforce standards and adhere to a specific architecture.

Hackney concedes that a federated architecture will never win awards for elegance or be drawn up on clean white boards as an “optimal solution.” He says it provides the “maximum amount of architecture possible in a given political and implementation reality.” The approach merely encourages organizations to share the “highest value” metrics, dimensions, definitions, and data wherever possible, however possible. This may mean, for example, creating a common staging area to eliminate redundant data feeds or building a data warehouse that sources data from multiple data marts, data warehouses, or analytic applications.

The major problem with the federated approach is that it is not well documented. There are only a few columns written on the subject. But perhaps this is enough, as it doesn’t prescribe a specific end-state or approach. Another potential problem is that without a specific architecture in mind, a federated approach can perpetuate the continued decentralization and fragmentation of analytical resources, making it harder to deliver an enterprise view in the end. Also, integrating meta data is a pernicious problem in a heterogeneous, ever-changing environment.

Summary

The four approaches described here represent the dominant strains of data warehousing methodologies. Data warehousing managers need to be aware of these methodologies but not wedded to them. These methodologies have shaped the debate about data warehousing best practices, and comprise the building blocks for methodologies developed by practicing consultants. Ultimately, organizations need to understand the strengths and limitations of each methodology and then pursue their own way through the data warehousing thicket. Since each organization must respond to unique needs and business conditions, having a foundation of best practice models to start with augurs a successful outcome.

Top-Down

Major Characteristics
· Emphasizes the DW.
· Starts by designing an enterprise model for a DW.
· Deploys multi-tier architecture comprised of a staging area, a DW, and “dependent” data marts.
· The staging area is persistent.
· The DW is enterprise-oriented; data marts are function-specific.
· The DW has atomic-level data; data marts have summary data.
· The DW uses an enterprise-based normalized model; data marts use a subject-specific dimensional model.
· Users can query the data warehouse and data marts.

Pros
· Enforces a flexible, enterprise architecture.
· Once built, minimizes the possibility of renegade “independent” data marts.
· Supports other analytical structures in an architected environment, including data mining sets, ODSs, and operational reports.
· Keeps detailed data in normalized form so it can be flexibly re-purposed to meet new and unexpected needs.
· Data warehouse eliminates redundant extracts.

Cons
· Upfront modeling and platform deployment mean the first increments take longer to deploy and cost more.
· Requires building and managing multiple data stores and platforms.
· Difficult to drill through from summary data in marts to detail data in DW.
· Might need to store detail data in data marts anyway.

Major Proponents
Bill Inmon and co-authors

Bottom-Up

Major Characteristics
· Emphasizes data marts.
· Starts by designing a dimensional model for a data mart.
· Uses a “flat” architecture consisting of a staging area and data marts.
· The staging area is largely non-persistent.
· Data marts contain both atomic and summary data.
· Data marts can provide both enterprise and function-specific views.
· A data mart consists of a single star schema, logically or physically deployed.
· Data marts are deployed incrementally and “integrated” using conformed dimensions.

Pros
· Focuses on creating user-friendly, flexible data structures.
· Minimizes “back office” operations and redundant data structures to accelerate deployment and reduce cost.
· No drill-through required since atomic data is always stored in the data marts.
· Creates new views by extending existing stars or building new ones within the same logical model.
· Staging area eliminates redundant extracts.

Cons
· Few query tools can easily join data across multiple, physically distinct marts.
· Requires groups throughout an organization to consistently use dimensions and facts to ensure a consolidated view.
· Not designed to support operational data stores or operational reporting data structures or processes.

Major Proponents
Ralph Kimball and co-authors

Hybrid

Major Characteristics
· Emphasizes DW and data marts; blends “top-down” and “bottom-up” methods.
· Starts by designing enterprise and local models synchronously.
· Spends 2–3 weeks creating a high-level, normalized, enterprise model; fleshes out model with initial marts.
· Populates marts with atomic and summary data via a non-persistent staging area.
· Models marts as one or more star schemas.
· Uses ETL tool to populate data marts and exchange meta data between ETL tool and data marts.
· Backfills a DW behind the marts when users want views at atomic level across marts; instantiates the “fleshed out” enterprise model, and moves atomic data to the DW.

Pros
· Provides rapid development within an enterprise architecture framework.
· Avoids creation of renegade “independent” data marts.
· Instantiates enterprise model and architecture only when needed and once data marts deliver real value.
· Synchronizes meta data and database models between enterprise and local definitions.
· Backfilled DW eliminates redundant extracts.

Cons
· Backfilling a DW is disruptive, requiring corporate commitment, funding, and application rewrites.
· Few query tools can dynamically query atomic and summary data in different databases.
· Requires organizations to enforce standard use of entities and rules.

Major Proponents
Many practitioners

Federated

Major Characteristics
· Emphasizes the need to integrate new and existing heterogeneous BI environments.
· An architecture of architectures.
· Acknowledges the reality of change in organizations and systems that make it difficult to implement a formalized architecture.
· Rationalizes the use of whatever means possible to implement or integrate analytical resources to meet changing needs or business conditions.
· Encourages organizations to share dimensions, facts, rules, and measures wherever possible, however possible.

Pros
· Provides a rationale for “band aid” approaches that solve real business problems.
· Alleviates the guilt and stress data warehousing managers might experience by not adhering to formalized architectures.
· Provides pragmatic way to share data and resources.

Cons
· The approach is not fully articulated.
· With no predefined end-state or architecture in mind, it may give way to unfettered chaos.
· It might encourage rather than rein in independent development and perpetuate the disintegration of standards and controls.

Major Proponents
Doug Hackney

Wayne Eckerson has been a thought leader and consultant in the business intelligence (BI) field since 1995. He has conducted numerous in-depth research studies and is a noted speaker and blogger. He is the author of the best-selling book Performance Dashboards: Measuring, Monitoring, and Managing Your Business. For many years, he served as director of education and research at The Data Warehousing Institute (TDWI) where he chaired its BI Executive Summit and created a popular BI Maturity Model and Assessment. Wayne is currently director of research at TechTarget and president of BI Leader Consulting, which provides advisory services to user and vendor organizations. He can be reached at weckerson@techtarget.com.

© Copyright 1997-2011, The Data Administration Newsletter, LLC -- www.TDAN.com