
Data Warehousing & ERP - A Combination of Forces

by Anne Marie Smith, Ph.D. Published: April 1, 2002.

Since the introduction of the term “data warehousing” in 1990, companies have explored the ways they can capture, store and manipulate data for analysis and decision support. At the same time, many companies have been instituting enterprise resource planning (ERP) software to coordinate the common functions of an enterprise. ERP software usually has a central database as its hub, allowing applications to share and reuse data more efficiently than previously permitted by separate applications. The use of ERP has led to an explosion in source data capture, and the existence of a central ERP database has created the opportunity to develop enterprise data warehouses for manipulating that data for analysis. This paper provides an overview of the issues and challenges that the intersection of these two IS concepts is creating.

Data warehouses are one of the foundations of the decision support systems of many IS operations. They serve as the storage facility for millions of transactions, formatted to allow analysis and comparison. As defined by the “father of data warehouse”, William H. Inmon, a data warehouse is “a collection of integrated, subject-oriented databases where each unit of data is specific to some period of time. Data Warehouses can contain detailed data, lightly summarized data and highly summarized data, all formatted for analysis and decision support” (“Building the Data Warehouse”, Inmon, W. H.; Wiley, 1996). In “The Data Warehouse Toolkit”, Ralph Kimball gives a more succinct definition: “a copy of transaction data specifically structured for query and analysis” (“The Data Warehouse Toolkit”, Kimball, R.; Wiley, 2000). Both definitions stress the data warehouse’s analysis focus, and highlight the historical nature of the data found in a data warehouse.

Enterprise Resource Planning software is a recent addition to the manufacturing and information systems that have been designed to organize the flow of data from process start to finish. This flow of information has existed since the first manufacturers traded with the first merchants, but until the advent of ERP software and the processes that accompany it, this information was largely ignored and not captured. ERP software attempts to link all internal company processes into a common set of applications that share a common database. It is the common database that allows an ERP system to serve as a source for a robust data warehouse that can support sophisticated decision support and analysis.

ERP software is divided into functional areas of operation; each functional area consists of a variety of business processes. The main functional areas of operation common to most companies include: Marketing and Sales; Production and Operations (Materials Management, Inventory, etc.); Accounting and Finance; and Human Resources. Historically, businesses have had clear divisions among these areas, and IS development was also clearly delineated, so that systems did not share data or processes and cross-functional analysis of information was not possible. Since all functional areas are interdependent, this separation was not a valid representation of a business’s activities, and the divisions among the many information systems created artificial barriers that needed to be overcome.

ERP software was designed to eliminate the barriers to sharing data and processes that occur when companies design and implement information systems for a single function or activity. ERP software coordinates the entire business process, and stores all the captured data in a common database, accessible to all the integrated applications of the ERP suite. As explained in “Concepts in Enterprise Resource Planning” (Brady, Monk, Wagner; Course Technology, 2001), companies can achieve many cost savings and related benefits from the use of ERP for transaction processing and management reporting through the use of the ERP’s common database and integrated management reporting tools.

However, much of the work performed by managers and knowledge workers in the 21st century is not transaction or management reporting-based. The main activity of knowledge and management staff is analysis, and this analysis is supported by the development and use of decision support systems (DSS). The most common application of DSS in companies today is the data warehouse (DW). With the use of the ERP’s common database and the implementation of DSS/DW user support products, companies can design a decision support/data warehouse database that allows cross-functional area analysis and comparisons for better decision-making.

Since companies usually implement an ERP in addition to their current applications, integrating data from the various sources into the data warehouse becomes an issue. In fact, the existence of multiple data sources is a problem with ERP implementation regardless of whether a company plans to develop a data warehouse; this issue must be addressed and resolved at the ERP project initiation phase to avoid serious complications from multiple sources of data for analysis and transaction activity. In data warehouse development, data is usually targeted from the most reliable or stable source system and moved into the data warehouse database as needed. Identification of the correct source system is essential for any data warehouse development, and is even more critical in a data warehouse that includes an ERP along with more traditional transaction systems. Integration of data from multiple sources (ERP database and others) requires rigorous attention to the metadata and business logic that populated the source data elements, so that the “right” source is chosen.

Another troubling issue with ERP data is the need for historical data within the enterprise’s data warehouse. Traditionally, the enterprise data warehouse needs historical data (see Inmon’s definition), and traditionally ERP technology does not store historical data, at least not to the extent that is needed in the enterprise data warehouse. When a large amount of historical data starts to stack up in the ERP environment, the ERP environment is usually purged, or the data is archived to a remote storage facility. For example, suppose an enterprise data warehouse needs to be loaded with five years of historical data while the ERP holds, at most, six months’ worth of detail data. As long as the corporation is satisfied with collecting a historical set of data as time passes, there is no problem with ERP as a source for data warehouse data. But when the enterprise data warehouse needs to go back in time and bring in historical data that has not been previously collected and saved by the ERP, using the ERP environment as a primary source for the data warehouse is not a viable option.

Metadata in the ERP is another consideration when building a data warehouse in the ERP environment. As the metadata passes from the ERP to the data warehouse environment, it must be moved and transformed into the format and structure required by the data warehouse infrastructure. There is a significant difference between operational metadata and DSS/DW metadata. Operational metadata is primarily for the developer and programmer. DSS metadata is primarily for the end user. The metadata that exists in the ERP application’s database must be converted; such a conversion is not always easy, and it requires experienced data administrators and users to collaborate in the effort.
Inmon suggests some guidelines for using the ERP database as a source for a data warehouse. They include the existence of a solid interface that pulls data from the ERP environment to the data warehouse environment. The ERP to enterprise data warehouse interface needs to:

- be easy to use
- enable the access of ERP data
- capture the meaning of the data that is being transported into the data warehouse
- be aware of restrictions within the ERP that might exist when it comes to the accessing of ERP data
- be aware of referential integrity
- be aware of hierarchical relationship
- be aware of logically defined - implicit - relationships
- be aware of application conventions
- be aware of any structures of data supported by the ERP
- be efficient in accessing ERP data, supporting:
  - direct data movement
  - change data capture
- be supportive of timely access of ERP data
- understand the format of data

(taken from “Data Warehousing and ERP”, a white paper by Wm. H. Inmon, Kiva Productions, LLC, 1999)

In summary, the development of data warehouses and the emergence of ERP as factors in the information systems explosion must be addressed and resolved by experienced information systems professionals with a clear understanding of the challenges each environment poses. Integrating ERP data into a data warehouse can lead to a superior source of data for analysis and decision-making if the data is formatted for query and reporting, and if the ERP environment is coordinated with the decision support needs of the organization. To ignore the wealth of data and information that is available from an ERP is to ignore a valuable corporate resource, one that can serve as a foundation for a superior data warehouse.

Designing a Data Warehouse
SQL Server 2000

Designing a data warehouse is very different from designing an online transaction processing (OLTP) system. In contrast to an OLTP system in which the purpose is to capture high rates of data changes and additions, the purpose of a data warehouse is to organize large amounts of stable data for ease of analysis and retrieval. Because of these differing purposes, there are many considerations in data warehouse design that differ from OLTP database design.

Data warehouse data must be organized to meet the purpose of the data warehouse, which is rapid access to information for analysis and reporting. Dimensional modeling is used in the design of data warehouse databases to organize the data for efficiency of queries that are intended to analyze and summarize large volumes of data. The data warehouse schema is almost always very different and much simpler than the schema of an OLTP system designed using entity-relation modeling.

Verification tables used in OLTP systems to validate data entry transactions are not necessary in the data warehouse database. This is because the data warehouse data has been cleansed and verified before it is posted to the data warehouse database, and historical data is not expected to change frequently once it is in the data warehouse.

Backup and restore strategies also differ in a data warehouse from those necessary for an OLTP system. Much of the data in a data warehouse is unchanging history and does not need repetitive backup. Backup of new data can be accomplished at the time of update, and in some situations it is feasible to do these backups from the data preparation database to minimize performance impact on the data warehouse database. Restore policies for a data warehouse might also differ from those for an OLTP system, depending on how critical it is for an organization to have uninterrupted access to data warehouse data.

Transaction locking considerations, and transactions themselves, play very small roles in data warehouse databases. OLTP systems specialize in large volumes of data update transactions. In contrast, data warehouses specialize in rapid retrieval of information from stable data, and data updates consist primarily of periodic additions of new data.

There are some considerations to take into account when designing the data warehouse if you are planning to use Microsoft® SQL Server™ 2000 Analysis Services for OLAP and data mining. For more information, see Analysis Services Overview and OLAP and Data Warehouses.

Data Mart Design

There are two approaches to creating a data warehouse system for an organization. A central data warehouse can be developed and implemented first with data marts created later, or data marts can be implemented such that they make up the data warehouse when their information is joined. In either approach, design must be centralized so that all of the organization's data warehouse information is consistent and usable. Data marts that adhere to central design specifications produce reports that are consistent even though the data resides in different places. For example, a sales data mart must use the same product table arranged in the same way as the inventory data mart or summary information will be inconsistent between the two.

Using Dimensional Modeling

Entity-relation modeling is often used to create a single complex model of all of the organization's processes. This approach has proven effective in creating efficient online transaction processing (OLTP) systems. In contrast, dimensional modeling creates individual models to address discrete business processes. For example, sales information may go to one model, inventory to another, and customer accounts to yet another. Each model captures facts in a fact table and attributes of those facts in dimension tables linked to the fact table.

The schemas produced by these arrangements are called star or snowflake schemas, and have been proven effective in data warehouse design. In a star schema, each dimension table has a single-part primary key that links to one part of the multipart primary key in the fact table. In a snowflake schema, one or more dimension tables are decomposed into multiple tables with the subordinate dimension tables joined to a primary dimension table instead of to the fact table. In most designs, star schemas are preferable to snowflake schemas because they involve fewer joins for information retrieval and are easier to manage.

Dimensional modeling organizes information into structures that often correspond to the way analysts want to query data warehouse data. For example, the question, "What were the sales of food items in the northwest region in the third quarter of 1999?" represents the use of three dimensions (product, geography, time) to specify the information to be summarized.

A Data Warehouse Model

A simple dimensional model of sales information might include a fact table named Sales_Fact that contains one record for each line item of each sale, capturing the quantity sold, the unit cost, and the sale value. Varieties of information about a sales record might include the customer, the time and date of the sale, the store where the sale occurred, and the product sold. Each of these categories of information is organized into its own dimension table. Customer information is placed in a Customer dimension table, time and date information in a Time dimension table, store information in a Store dimension table, and product information in a Product dimension table.

Fact Tables

Each data warehouse or data mart includes one or more fact tables. Central to a star or snowflake schema, a fact table captures the data that measures the organization's business operations. A fact table might contain business sales events such as cash register transactions or the contributions and expenditures of a nonprofit organization. Fact tables usually contain large numbers of rows, sometimes in the hundreds of millions of records when they contain one or more years of history for a large organization.

A key characteristic of a fact table is that it contains numerical data (facts) that can be summarized to provide information about the history of the operation of the organization.
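The Sales_Fact model described above can be sketched as a small star schema. The following is a minimal illustration only, using Python's standard `sqlite3` module rather than SQL Server; the `_Dim` table names, the column names, and the sample rows are assumptions made for the example and do not come from any real database. It answers a question of the same shape as the "food items in the northwest region in the third quarter of 1999" query by constraining three dimensions and summing one measure.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema: one fact table, four dimension tables.

    All names and rows here are hypothetical and exist only for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Product_Dim  (product_key  INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE Store_Dim    (store_key    INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE Time_Dim     (time_key     INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
        CREATE TABLE Customer_Dim (customer_key INTEGER PRIMARY KEY, name TEXT);
        -- One row per line item of each sale; every key column points at a dimension.
        CREATE TABLE Sales_Fact (
            product_key   INTEGER NOT NULL REFERENCES Product_Dim(product_key),
            store_key     INTEGER NOT NULL REFERENCES Store_Dim(store_key),
            time_key      INTEGER NOT NULL REFERENCES Time_Dim(time_key),
            customer_key  INTEGER NOT NULL REFERENCES Customer_Dim(customer_key),
            quantity_sold INTEGER,
            unit_cost     REAL,
            sale_value    REAL
        );
    """)
    conn.executemany("INSERT INTO Product_Dim VALUES (?, ?, ?)",
                     [(1, "Bread", "Food"), (2, "Soda", "Drink")])
    conn.executemany("INSERT INTO Store_Dim VALUES (?, ?, ?)",
                     [(1, "Store 1", "Northwest"), (2, "Store 2", "Southeast")])
    conn.executemany("INSERT INTO Time_Dim VALUES (?, ?, ?)",
                     [(1, 1999, 3), (2, 1999, 4)])
    conn.executemany("INSERT INTO Customer_Dim VALUES (?, ?)", [(1, "A. Customer")])
    conn.executemany("INSERT INTO Sales_Fact VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (1, 1, 1, 1, 2, 1.0, 5.0),   # Food,  Northwest, Q3 1999
        (1, 1, 1, 1, 1, 1.0, 2.5),   # Food,  Northwest, Q3 1999
        (1, 1, 2, 1, 1, 1.0, 2.5),   # Food,  Northwest, Q4 1999
        (2, 1, 1, 1, 3, 0.5, 4.5),   # Drink, Northwest, Q3 1999
        (1, 2, 1, 1, 4, 1.0, 10.0),  # Food,  Southeast, Q3 1999
    ])
    return conn


def sales_total(conn, category, region, year, quarter):
    """Sum sale_value for the facts selected by three dimensions."""
    row = conn.execute("""
        SELECT SUM(f.sale_value)
        FROM Sales_Fact AS f
        JOIN Product_Dim AS p ON p.product_key = f.product_key
        JOIN Store_Dim   AS s ON s.store_key   = f.store_key
        JOIN Time_Dim    AS t ON t.time_key    = f.time_key
        WHERE p.category = ? AND s.region = ? AND t.year = ? AND t.quarter = ?
    """, (category, region, year, quarter)).fetchone()
    return row[0] or 0.0


conn = build_star_schema()
food_nw_q3 = sales_total(conn, "Food", "Northwest", 1999, 3)
```

Each join goes directly from the fact table to one dimension table, which is the property that distinguishes a star schema from a snowflake schema.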

Each fact table also includes a multipart index that contains as foreign keys the primary keys of related dimension tables, which contain the attributes of the fact records. Fact tables should not contain descriptive information or any data other than the numerical measurement fields and the index fields that relate the facts to corresponding entries in the dimension tables.

In the FoodMart 2000 sample database provided with Microsoft® SQL Server™ 2000 Analysis Services, one fact table, sales_fact_1998, contains the following columns.

Column          Description
product_id      Foreign key for dimension table product.
time_id         Foreign key for dimension table time_by_day.
customer_id     Foreign key for dimension table customer.
promotion_id    Foreign key for dimension table promotion.
store_id        Foreign key for dimension table store.
store_sales     Currency column containing the value of the sale.
store_cost      Currency column containing the cost to the store of the sale.
unit_sales      Numeric column containing the quantity sold.

In this fact table, each entry represents the sale of a specific product on a specific day to a specific customer in accordance with a specific promotion at a specific store. The business measurements captured are the value of the sale, the cost to the store, and the quantity sold.

The most useful measures to include in a fact table are numbers that are additive. Additive measures allow summary information to be obtained by adding various quantities of the measure, such as the sales of a specific item at a group of stores for a particular time period. Nonadditive measures such as inventory quantity-on-hand values can also be used in fact tables, but different summarization techniques must then be used.
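The difference between additive and nonadditive measures can be made concrete with a few rows. This is a small sketch with invented data: units sold can be summed across every dimension, while a quantity-on-hand value can be summed across stores for a single day but not across days. Averaging the daily totals is shown as one possible alternative summarization; the text does not prescribe a particular technique, so treat that choice as an assumption.

```python
from collections import defaultdict

# Hypothetical daily rows: (day, store, units_sold, quantity_on_hand)
rows = [
    ("1999-07-01", "store_a", 10, 100),
    ("1999-07-02", "store_a", 15, 85),
    ("1999-07-01", "store_b", 5, 40),
    ("1999-07-02", "store_b", 8, 32),
]


def total_units_sold(rows):
    """Additive measure: summing across stores and days is meaningful."""
    return sum(units for _, _, units, _ in rows)


def on_hand_by_day(rows):
    """On-hand stock can be added across stores, but only within one day."""
    per_day = defaultdict(int)
    for day, _, _, on_hand in rows:
        per_day[day] += on_hand
    return dict(per_day)


def average_on_hand(rows):
    """Across time, average the daily totals instead of adding them."""
    per_day = on_hand_by_day(rows)
    return sum(per_day.values()) / len(per_day)


units = total_units_sold(rows)
daily_stock = on_hand_by_day(rows)
typical_stock = average_on_hand(rows)
```

Adding the four on-hand values together would give 257, a figure that corresponds to no real stock level on any day, which is why the measure needs a different summarization over time.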

Aggregation in Fact Tables

Aggregation is the process of calculating summary data from detail records. It is often tempting to reduce the size of fact tables by aggregating data into summary records when the fact table is created. However, when data is summarized in the fact table, detailed information is no longer directly available to the analyst. If detailed information is needed, the detail rows that were summarized will have to be identified and located, possibly in the source system that provided the data. Fact table data should be maintained at the finest granularity possible. Aggregating data in the fact table should only be done after considering the consequences.

Mixing aggregated and detailed data in the fact table can cause issues and complications when using the data warehouse. For example, a sales order often contains several line items and may contain a discount, tax, or shipping cost that is applied to the order total instead of individual line items, yet the quantities and item identification are recorded at the line item level. Summarization queries become more complex in this situation, and tools such as Analysis Services often require the creation of special filters to deal with the mixture of granularity.

There are two approaches that can be used in this situation. One approach is to allocate the order level values to line items based on value, quantity, or shipping weight. Another approach is to create two fact tables, one containing data at the line item level, the other containing the order level information. The order identification key should be carried in the detail fact table so the two tables can be related. The order table can then be used as a dimension table to the detail table, with the order-level values considered as attributes of the order level in the dimension hierarchy.

Aggregation Tables

Aggregation tables are tables that contain summaries of fact table information. These tables are used to improve query performance when SQL is used as the query mechanism. OLAP technology, such as that provided by Microsoft® SQL Server™ 2000 Analysis Services, eliminates the need for such tables. Analysis Services creates OLAP cubes that contain preaggregated summaries so that queries can be answered quickly, regardless of the level of summarization required to answer the query. For more information, see Analysis Services Overview.

It is not necessary to create aggregation tables in the data warehouse when Analysis Services is used to provide presentation services. Analysis Services creates aggregations as necessary and stores them in tables in the data warehouse database or in internal multidimensional structures.
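As an illustration of the first approach to mixed granularity described under Aggregation in Fact Tables (allocating order level values to line items based on value), the sketch below spreads a single order-level shipping charge across line items in proportion to each line's value. The function name and the rounding policy (round each share to the cent, let the last line absorb the remainder) are assumptions chosen for the example; a real warehouse would apply whatever allocation rule the business has agreed on.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def allocate_by_value(line_values, order_level_amount):
    """Split an order-level amount across line items in proportion to line value.

    Every share except the last is rounded to the cent; the last line item
    receives the remainder so that the shares always sum to the original amount.
    """
    total = sum(line_values)
    if total == 0:
        raise ValueError("cannot allocate by value when all line values are zero")
    shares = []
    for value in line_values[:-1]:
        share = (order_level_amount * value / total).quantize(CENT, rounding=ROUND_HALF_UP)
        shares.append(share)
    shares.append(order_level_amount - sum(shares))
    return shares


# Hypothetical order: three line items and one shipping charge on the order total.
line_values = [Decimal("60.00"), Decimal("30.00"), Decimal("10.00")]
shipping = Decimal("9.99")
allocated = allocate_by_value(line_values, shipping)
```

After allocation every fact row sits at line item granularity, so the shipping cost can be summed along any dimension together with the other measures.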

Dimension Tables

Dimension tables contain attributes that describe fact records in the fact table. Some of these attributes provide descriptive information; others are used to specify how fact table data should be summarized to provide useful information to the analyst. Dimension tables contain hierarchies of attributes that aid in summarization. For example, a dimension containing product information would often contain a hierarchy that separates products into categories such as food, drink, and nonconsumable items, with each of these categories further subdivided a number of times until the individual product SKU is reached at the lowest level.

Dimensional modeling produces dimension tables in which each table contains fact attributes that are independent of those in other dimensions. For example, a customer dimension table contains data about customers, a product dimension table contains information about products, and a store dimension table contains information about stores. Queries use attributes in dimensions to specify a view into the fact information. For example, a query might use the product, store, and time dimensions to ask the question "What was the cost of nonconsumable goods sold in the northeast region in 1999?" Subsequent queries might drill down along one or more dimensions to examine more detailed data, such as "What was the cost of kitchen products in New York City in the third quarter of 1999?" In these examples, the dimension tables are used to specify how a measure (cost) in the fact table is to be summarized.

Columns in a dimension table can be used to categorize the information into hierarchical levels. For example, a dimension table for stores in the FoodMart 2000 sample database includes the following columns that specify the hierarchy levels.

Column          Description
store_country   Specifies the country or region in which the store is located. This is the country level of the hierarchy.
store_state     Specifies the state in which the store is located. This is the state level of the hierarchy.
store_city      Specifies the city or province in which the store is located. This is the city level of the hierarchy.
store_id        Specifies the individual store. This is the lowest level of the hierarchy.
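Drilling down along a dimension hierarchy amounts to grouping the same measure by a lower level column. The sketch below reuses the store hierarchy column names listed above (store_country, store_state, store_city, store_id) but is otherwise hypothetical: it runs on Python's `sqlite3` module, the table contents are invented, and `cost_by_level` is an illustrative helper, not part of FoodMart 2000 or Analysis Services.

```python
import sqlite3

# Hierarchy levels from highest to lowest, matching the store dimension columns.
LEVELS = ("store_country", "store_state", "store_city", "store_id")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (
        store_id      INTEGER PRIMARY KEY,
        store_name    TEXT,
        store_city    TEXT,
        store_state   TEXT,
        store_country TEXT
    );
    CREATE TABLE sales_fact (
        store_id   INTEGER NOT NULL REFERENCES store(store_id),
        store_cost REAL
    );
""")
conn.executemany("INSERT INTO store VALUES (?, ?, ?, ?, ?)", [
    (1, "Store 1", "Seattle", "WA", "USA"),
    (2, "Store 2", "Tacoma", "WA", "USA"),
    (3, "Store 3", "Portland", "OR", "USA"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                 [(1, 10.0), (1, 5.0), (2, 20.0), (3, 7.5)])


def cost_by_level(conn, level):
    """Summarize store_cost at one level of the store hierarchy."""
    if level not in LEVELS:
        # The column name is interpolated into SQL, so only known names are accepted.
        raise ValueError(f"unknown hierarchy level: {level}")
    sql = f"""
        SELECT s.{level}, SUM(f.store_cost)
        FROM sales_fact AS f
        JOIN store AS s ON s.store_id = f.store_id
        GROUP BY s.{level}
        ORDER BY s.{level}
    """
    return conn.execute(sql).fetchall()


by_country = cost_by_level(conn, "store_country")
by_state = cost_by_level(conn, "store_state")
by_city = cost_by_level(conn, "store_city")
```

Grouping by a single level column is a simplification: two cities with the same name in different states would be merged, so a fuller version would group by the level together with all of its parent levels.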

In the store dimension table, the store_id field contains the primary key of the table and is used to join the dimension table to the fact table. The store_name column specifies the name of the store; the values in this column are used to identify the store to users in a readable form. Other columns not shown provide additional attribute information. For information about how dimension tables are used in OLAP cubes built using Microsoft® SQL Server™ 2000 Analysis Services, see Dimensions.

Varieties of Dimension Tables

The preceding example illustrates a dimension table that contains a balanced hierarchy that is separated into regular levels. Other types of dimension tables contain less balanced information, such as part-breakdown structures or organization charts in which the hierarchy is represented by parent-child relationships instead of an array of levels.

Surrogate Keys

It is important that primary keys of dimension tables remain stable. It is strongly recommended that surrogate keys be created and used for primary keys for all dimension tables. Surrogate keys are keys that are maintained within the data warehouse instead of keys taken from source data systems. There are several reasons for the use of surrogate keys:

- Data tables in various source systems may use different keys for the same entity. Legacy systems that provide historical data might have used a different numbering system than a current online transaction processing system. A surrogate key uniquely identifies each entity in the dimension table regardless of its source key. A separate field can be used to contain the key used in the source system. Systems developed independently in company divisions may not use the same keys, or they may use keys that conflict with data in the systems of other divisions. This situation may not cause problems when each division independently reports summary data, but it cannot be permitted in the data warehouse where data is consolidated.

- Keys may change or be reused in the source data systems. This situation is usually less likely than others, but some systems have been known to reuse keys belonging to obsolete data. However, the key may still be in use in historical data in the data warehouse, and the same key cannot be used to identify different entities.

- Changes in organizational structures may move keys in the hierarchy. This can be a common situation. For example, if a salesperson is transferred from one region to another, the company may prefer to track two things: sales data for the salesperson with the person's original region for data prior to the transfer date, and sales data for the salesperson in the person's new region after the transfer date. To represent this organization of data, the salesperson's record must exist in two places in the sales force dimension table, which is not possible if the salesperson's company employee identification number is used as the primary key for the dimension table. A surrogate key allows the same salesperson to participate in different locations in the dimension hierarchy. In this case, the salesperson will be represented twice in the dimension table with two different surrogate keys. These surrogate keys are used to join the salesperson's records to the sets of facts appropriate to the various locations in the hierarchy occupied by the salesperson. The employee's identification number should be carried in a separate column in the table so information about the employee can be reviewed or summarized regardless of the number of times the employee's record appears in the dimension table.

  Dimensions that exhibit this type of change are called slowly changing dimensions. Another example of a situation that causes this type of change is the creation of a new version of a product, such as a reduced-fat version of a food item. The item will receive a new SKU or Uniform Product Code (UPC), but may retain most of the same attributes of the original item, which is still manufactured and sold. The appropriate use of surrogate keys can allow the two versions of the item to be summarized together or separately.

The implementation and management of surrogate keys is the responsibility of the data warehouse. OLTP systems are rarely affected by these situations, and the purpose of these keys is to accurately track history in the data warehouse. Surrogate keys are maintained in the data preparation area during the data transformation process.
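The salesperson transfer described above can be sketched in a few lines. This is an illustrative model only: the `SalespersonDimension` class, its method names, and the sample data are all hypothetical. The point it demonstrates is that the warehouse issues a new surrogate key whenever a tracked attribute (here, the region) changes, while the source system's employee identification number is carried in an ordinary column.

```python
from itertools import count


class SalespersonDimension:
    """Hypothetical dimension table keyed by warehouse-generated surrogate keys."""

    def __init__(self):
        self._next_key = count(1)
        self.rows = {}       # surrogate key -> {"employee_id", "name", "region"}
        self._current = {}   # employee_id -> surrogate key of that person's latest row

    def key_for(self, employee_id, name, region):
        """Return the surrogate key to stamp on new facts for this salesperson.

        A new dimension row (and key) is created the first time an employee is
        seen and again whenever the employee's region changes.
        """
        current_key = self._current.get(employee_id)
        if current_key is not None and self.rows[current_key]["region"] == region:
            return current_key
        new_key = next(self._next_key)
        self.rows[new_key] = {"employee_id": employee_id, "name": name, "region": region}
        self._current[employee_id] = new_key
        return new_key


def sales_by(dim, facts, attribute):
    """Summarize fact amounts by any attribute of the dimension row they join to."""
    totals = {}
    for surrogate_key, amount in facts:
        group = dim.rows[surrogate_key][attribute]
        totals[group] = totals.get(group, 0) + amount
    return totals


dim = SalespersonDimension()
facts = []  # (surrogate key, sale amount)

facts.append((dim.key_for("E100", "Pat", "East"), 500))  # before the transfer
facts.append((dim.key_for("E100", "Pat", "West"), 300))  # after the transfer
facts.append((dim.key_for("E100", "Pat", "West"), 200))  # same region, same key

by_region = sales_by(dim, facts, "region")
by_employee = sales_by(dim, facts, "employee_id")
```

Because old facts keep the old surrogate key, history stays attached to the original region, and the separate employee_id column still allows all of the person's sales to be summarized together.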

Referential Integrity

Referential integrity must be maintained between all dimension tables and the fact table. Each fact record contains foreign keys that relate to primary keys in the dimension tables. Every fact record must have a related record in every dimension table used with that fact table. Missing records in a dimension table can cause facts to be ignored when the dimension table is joined to the fact table to respond to queries or for the population of OLAP cubes. Queries can return inconsistent results if records are missing in one or more dimension tables. Queries that join a defective dimension table to the fact table will exclude facts whereas queries that do not join the defective dimension table will include those facts.

Shared Dimensions

A data warehouse must provide consistent information for similar queries. One method to maintain consistency is to create dimension tables that are shared and used by all components and data marts in the data warehouse. Candidates for shared dimensions include customers, time, products, and geographical dimensions such as the store dimension in the example earlier in this topic. For example, requiring that all OLAP cubes and data marts use the same shared time dimension enforces consistency of results summarized by time.

Indexes

Indexes play an important role in data warehouse performance, as they do in any relational database. Every dimension table must be indexed on its primary key. Indexes on other columns such as those that identify levels in the hierarchical structure can also be useful in the performance of some specialized queries. The fact table must be indexed on the composite primary key made up of the foreign keys of the dimension tables. These are the primary indexes needed for most data warehouse applications because of the simplicity of star and snowflake schemas. Special query and reporting requirements may indicate the need for additional indexes.
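The inconsistency described under Referential Integrity is easy to reproduce. In this sketch (Python `sqlite3`, invented table contents, and deliberately no foreign key constraint so that a defective load can be simulated), one fact row refers to a product that is missing from the dimension table. A query that joins the dimension silently drops that fact, while a query on the fact table alone includes it. The last query is one possible check a load process could run to find such orphaned facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    -- No REFERENCES clause on purpose: this simulates a load that skipped validation.
    CREATE TABLE sales_fact (product_id INTEGER, store_sales REAL);
""")
conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "Bread"), (2, "Milk")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                 [(1, 10.0), (2, 4.0), (3, 6.0)])  # product 3 has no dimension row

# Total computed from the fact table alone: includes the orphaned fact.
total_all = conn.execute("SELECT SUM(store_sales) FROM sales_fact").fetchone()[0]

# Total computed through the dimension join: the orphaned fact is excluded.
total_joined = conn.execute("""
    SELECT SUM(f.store_sales)
    FROM sales_fact AS f
    JOIN product AS p ON p.product_id = f.product_id
""").fetchone()[0]

# Integrity check: fact rows whose key has no match in the dimension table.
orphans = conn.execute("""
    SELECT f.product_id, f.store_sales
    FROM sales_fact AS f
    LEFT JOIN product AS p ON p.product_id = f.product_id
    WHERE p.product_id IS NULL
""").fetchall()
```

The two totals differ by exactly the value of the orphaned row, which is the kind of discrepancy that makes otherwise similar reports disagree.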

Creating and Using Data Warehouses Overview

Organizations collect data in the normal course of business operations. The purpose of a data warehouse is to consolidate and organize this data so it can be analyzed and used to support business decisions. Data warehouses usually contain historical data, often collected from a variety of disparate sources such as online transaction processing (OLTP) systems, legacy systems, text files, or spreadsheets. A data warehouse combines this data, cleanses it for accuracy and consistency, and organizes it for ease and efficiency of querying. In many cases a data warehouse contains the living history of the organization.

Data warehousing is an advanced and complex technology. A complete treatment of data warehousing is not possible in this document, but many excellent books and training courses are available to enhance your understanding. The topics in this section discuss the elements and processes of data warehousing and identify the many powerful tools provided by Microsoft® SQL Server™ 2000 to help you in the task of creating, using, and maintaining a data warehouse.

Some definitions of a data warehouse include several elements such as a data preparation area, the cleansing process, the database that holds the data warehouse data, and the tools that organize and present the data to client applications. Other definitions restrict the data warehouse to the database that contains the data warehouse data. In large data warehousing applications, data is often segmented into specialized components, called data marts, that address individual components of the organization. Some definitions consider data marts to be part of the data warehouse; other definitions consider them to be separate entities. The intended meaning of the term data warehouse is usually clear from the context in which it is used. The topics in this section generally use the broadest definition and address individual elements as components within the context of the data warehouse.

Topic                                      Description
SQL Server 2000 Tools for Data Warehouses  Describes tools provided by SQL Server 2000 that are commonly used in data warehouse applications.
Parts of a Data Warehouse                  Describes the elements that make up a data warehouse.
Creating a Data Warehouse                  Describes the steps in creating a data warehouse.
Using a Data Warehouse                     Describes the tools and methods used to prepare data for presentation and to provide client access to the information.
Maintaining a Data Warehouse               Describes the processes involved in updating data in the data warehouse and modifying the presentation of information to users.

SQL Server 2000 Tools for Data Warehouses

Microsoft® SQL Server™ 2000 provides many tools that support database applications, regardless of the applications for which the databases are used. Some of these tools are used more often than others in data warehouse applications, and some are specifically designed to address special needs of data warehouses. General descriptions of the tools commonly used in data warehouse applications are provided here with links to more detailed information about the tools themselves. The uses of these tools in data warehouse applications are specifically discussed in other topics in this section.

The tools listed here are commonly used in data warehouse applications, although most are also applicable to other database applications. Many other tools not mentioned here can often be used to solve specific problems in data warehouse applications. Information about the many relational database management tools is provided throughout the SQL Server 2000 documentation.

Relational Database

Data warehouses use relational database technology as the foundation for their design, construction, and maintenance. The core component of SQL Server 2000 is a powerful and full-featured relational database engine. SQL Server 2000 provides many tools for design and manipulation of relational databases. For more information, see SQL Server 2000 Features.

Data Transformation Services

Data warehouse applications require the transformation of data from many sources into a cohesive, consistent set of data configured appropriately for use in data warehouse operations. SQL Server 2000 provides a powerful tool for such tasks, Data Transformation Services (DTS). DTS can access data from a wide variety of sources and transform it using built-in and custom transformation specifications. For more information, see DTS Overview.

see Analysis Services Overview. Meta Data Services Many of the various tools in SQL Server 2000 store meta data in a centralized repository in the msdb system database. SQL Server 2000 Meta Data Services provides a browser for viewing this meta data and application interfaces for use in developing custom meta data applications. and the updating of data warehouse data from the data preparation area. For more information. Analysis Services Data warehouses collect and organize enterprise data to support organizational decision-making through analysis. replication can also be used in data warehouses. For more information. Parts of a Data Warehouse SQL Server 2000 There are several physical and functional elements that make up a data warehouse. For more information. see Meta Data Services Overview. and sophisticated data mining technology to analyze and discover information within the data warehouse data. English Query English Query provides access to data warehouse data using English language queries such as "Show me the sales for stores in California for 1996 through 1998. see English Query Overview. Some potential data warehouse applications of replication are the distribution of data from a central data warehouse to data marts. You can develop English Query models specific to your data warehouse to reduce sophisticated and complex SQL or MDX queries to simple English questions. For more information. SQL Server 2000 Analysis Services provides online analytical processing (OLAP) technology to organize massive amounts of data warehouse data for rapid analysis by client tools. seeReplication Overview.Database replication is a powerful tool with many uses." English Query is a development tool for creating client applications that transform English language into the syntax of SQL to query relational databases or the syntax of Multidimensional Expressions (MDX) to query OLAP cubes. 
Often used to distribute data and coordinate updates of distributed data in online transaction processing systems (OLTP). The topics in this section discuss these elements. Topic Description .

they must be designed as components of the master data warehouse so that data organization. Typical examples are data marts for the sales department. Data marts are sometimes designed as complete individual data warehouses and contribute to the overall organization as a member of a distributed data warehouse. each with its own data mart that contributes to the master data warehouse. which contain portions of data warehouse data for specialized purposes. Inconsistent table designs. and schemas are consistent throughout the data warehouse. a large service organization may treat regional operating centers as individual business units. For example. a data mart is a miniature data warehouse. Relational Databases Data Sources Data Preparation Area Presentation Services End-User Analysis Data Marts SQL Server 2000 In some data warehouse implementations. upper level management. in others. the inventory and shipping department. Describes the use of client applications to access and analyze information in a data warehouse. the finance department. Describes the services that organize and analyze data warehouse information and make it available to client applications. Describes the roles and uses of relational databases in data warehouses. Regardless of the functionality provided by data marts. Data marts are often used to provide information to functional segments of the organization. Data marts can also be used to segment data warehouse data to reflect a geographically compartmentalized business in which each region is relatively autonomous.Data Marts Describes data marts. and they can result in . In other designs. update mechanisms. and so on. format. data marts receive data from a master data warehouse through periodic updates. Describes the area where data extracted from data sources is prepared for use in a data warehouse. Describes various sources of organizational data typically used in data warehouses. 
or dimension hierarchies can prevent data from being reused throughout the data warehouse. it is just one segment of the data warehouse. in which case the data mart functionality is often limited to presentation services for clients.

For example. only a subset of the tools may be involved. Data Sources SQL Server 2000 . including filtering data appropriate to the data mart and updating the appropriate tables in the data mart. If the data mart is a local access point for data distributed from a central data warehouse. This provides consistency and usability of information throughout the organization. Distributing Data Warehouse Data to Data Marts If data warehouse data is maintained in a central data warehouse.inconsistent reports from the same data. If the data mart is created and maintained locally and participates in the organization's data warehouse as an independent contributor. or financial analysis. it is unlikely that summary reports produced from a finance department data mart that organizes the sales force by management reporting structure will agree with summary reports produced from a sales department data mart that organizes the same sales force by geographical region. Some data warehouse distribution scenarios may also use replication to coordinate and maintain data mart data. SQL Server Agent and Data Transformation Services (DTS) can be used to schedule and perform data transfers. Data marts should be designed from the perspective that they are components of the data warehouse regardless of their individual functionality or construction. It is not necessary to impose one view of data on all data marts to achieve consistency. For example. and product data does not preclude data marts from presenting information in the diverse perspectives of inventory. the data is prepared and loaded into the data warehouse at the central site and then distributed to local data marts. the use of a standard format and organization for time. its creation and maintenance will involve all the operations of a data warehouse. sales. it is usually possible to design consistent schemas and data formats that permit rich varieties of data views without sacrificing interoperability. customer. 
Microsoft® SQL Server™ 2000 tools used for a data mart may include any of the tools used for data warehouses depending on how the data mart is designed. DTS packages can also be created and scheduled to update OLAP cubes in the data mart after new data is received from the central data warehouse.

Data warehouses are intended to provide information to decision makers. To do so, data warehouses must gather and consolidate data from many sources in the organization into a consistent set of data that accurately reflects the organization's business operation and history. Sources of data to be used in the data warehouse must be identified and techniques developed for extracting the data from them.

Organizations often have multiple online transaction processing (OLTP) systems to capture daily business operations. These OLTP systems are seldom designed at the same time as data warehouses. They may even be designed by different organizations, which is often the case when organizations grow through acquisitions and mergers. Database schemas and data element identification keys often vary from database to database. For example, the customer table in the OLTP of an acquired company may contain many of the same customers and products as the acquiring company but use a different identification system. Data extracted from these OLTP systems must be transformed into a common representation.

Legacy systems that have been in use for many years often contain denormalized data as well as unusual data identification designs and limited query flexibility. Data critical for business analysis may even reside on individual desktop computers in personal databases and spreadsheets, especially in organizations that developed and grew without a central information technology group. Such data must also be captured into the data warehouse.

Data Transformation Services (DTS) provides powerful tools for extracting and transforming data from diverse data sources. For more information, see DTS Overview and DTS Basics.

Data Preparation Area

Data to be used in the data warehouse must be extracted from the data sources, cleansed and formatted for consistency, and transformed into the data warehouse schema. The data preparation area, sometimes called the data staging area, is a relational database into which data is extracted from the data sources, transformed into common formats, checked for consistency and referential integrity, and made ready for loading into the data warehouse database. The data preparation area and the data warehouse database can be combined in some data warehouse implementations as long as the cleansing and transformation operations do not interfere with the performance or operation of serving the end users of the data warehouse data.

Performing the preparation operations in source databases is rarely an option because of the diversity of data sources and the processing load that data preparation can impose on online transaction processing systems. After the initial load of a data warehouse, the data preparation area is used on an ongoing basis to prepare new data for updating the data warehouse. In most data warehouse systems, these ongoing operations are performed on a periodic basis, often scheduled to minimize performance impact on the operational data source systems.

The use of a data preparation area that is separated from the data sources and the data warehouse promotes effective data warehouse management. Attempting to transform data in the data source systems can interfere with online transaction processing (OLTP) performance, and many legacy systems do not have effective or easily implemented transformation capabilities. Many data warehouse database operations require sophisticated queries and the processing of large amounts of data; data cleansing can interfere with these operations. Reconciliation of inconsistencies in data extracted from various sources can rarely be accomplished until the data is collected in a common database, at which time data integrity errors can more easily be identified and rectified.

The data preparation area should isolate raw data from the data warehouse data to preserve the integrity of the data warehouse and permit it to perform its primary function of preparing information for presentation and supporting access by clients. If the data warehouse database is used for data preparation, care should be taken to avoid introducing errors into the data warehouse data and to minimize the effect of data preparation processing on the performance of the data warehouse.

The data preparation area is a relational database that serves as a general work area for the data preparation operations. It will contain tables that relate source data keys to surrogate keys used in the data warehouse, tables of transformation data, and many temporary tables. It will also contain the processes and procedures, such as Data Transformation Services (DTS) packages, that extract data from source data systems.

The relational database used for data preparation, regardless of where it is performed, must have powerful data manipulation and transformation capabilities such as those provided by Microsoft® SQL Server™ 2000.

OLAP Terminology

Let's take a simple sales database and convert it to a multidimensional database. Every time company XYZ makes a sale, the salesperson records the date, the product sold, the sales amount, and the customer that bought it. A multidimensional database organizes the attributes of a sale (e.g., date, product, and customer) into dimensions (e.g., a time dimension, a product dimension, and a customer dimension). When we convert this information into a multidimensional database, the numeric values become cells, or measures. In this example, the only measure is sales amount.

A multidimensional database stores measures, dimensions, and their members in a cube. A cube is similar to a relational table but can have more than two dimensions. (For an illustration of a cube, see Bob Pfeiff's "OLAP: Resistance Is Futile," page 22.) The individual items within a dimension are members. For example, split pea soup is a member in the product dimension.

The members in the dimensions are stored in hierarchies. The original simple sales database has a date entry for each product sold. When we add that date information to a cube, we organize all the individual dates into a hierarchy. We can choose to organize the dates into a standard calendar or a fiscal calendar. A standard calendar combines all the dates in May into a time member called May. We can group May with April and June to form Quarter 2, and we can group Quarter 2 with the other three quarters to form a member called Year 1999.

The terms roll up and drill down describe actions we can perform on members of a dimension. For example, if we roll up August, we see August's parent in the hierarchy (i.e., Quarter 3) and the siblings of August's parent (i.e., Quarter 1, Quarter 2, and Quarter 4). If we drill down on Quarter 3, we see Quarter 3's direct descendants in the hierarchy (i.e., July, August, and September). We can use rolled up as a synonym for aggregated to. For example, July, August, and September are rolled up to Quarter 3.
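The roll-up and drill-down actions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any OLAP server stores its data: the `PARENT` mapping, the sample sales amounts, and the helper names `drill_down`, `roll_up`, and `aggregate` are all invented for the example.

```python
# Hypothetical time hierarchy: each member maps to its parent member.
PARENT = {
    "July": "Quarter 3", "August": "Quarter 3", "September": "Quarter 3",
    "Quarter 1": "Year 1999", "Quarter 2": "Year 1999",
    "Quarter 3": "Year 1999", "Quarter 4": "Year 1999",
}

# Made-up sales amounts (the only measure), recorded at the month level.
SALES = {"July": 100.0, "August": 150.0, "September": 50.0}


def drill_down(member):
    """Return the direct descendants (children) of a member."""
    return [child for child, parent in PARENT.items() if parent == member]


def roll_up(member):
    """Return the member's parent and the siblings of that parent."""
    parent = PARENT[member]
    grandparent = PARENT.get(parent)
    siblings = []
    if grandparent is not None:
        siblings = [m for m in drill_down(grandparent) if m != parent]
    return parent, siblings


def aggregate(member):
    """Sum the measure over a member and all of its descendants."""
    total = SALES.get(member, 0.0)
    for child in drill_down(member):
        total += aggregate(child)
    return total
```

With this sketch, `roll_up("August")` returns Quarter 3 together with Quarters 1, 2, and 4, and `aggregate("Quarter 3")` adds up the three monthly amounts, which mirrors the statement that July, August, and September are rolled up to Quarter 3.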

In a cube, any measure is available for any combination of members. This availability means that we can ask for the sales for product A in Quarter 3 or the sales for product A on May 23. The cube aggregates the sales values to determine the values at higher levels in the hierarchy. A cube doesn't store the original transactions; it stores only the aggregations. Even sales on May 23 is an aggregate because multiple transactions might have occurred on that day. When you create a cube, you choose how many aggregations the cube will have. The number of aggregations affects your MDX queries' performance.

PIVOTING, AND SLICING & DICING ANALYSIS

Multidimensional analysis tools organize the data in two primary ways: in multiple dimensions and in hierarchies. Slicing and dicing refers to the ability to combine and re-combine the dimensions to see different slices of the information, that is, the ability to move between different combinations of dimensions when viewing data with an OLAP browser or report browser. Picture slicing a three-dimensional cube of information in order to see what values are contained in the middle layer. Slicing and dicing a cube allows an end user to do the same thing with multiple dimensions. For example, if one user at executive level is looking at sales at the country level, another may want the same data at the city level for a particular country. In this case, slice and dice provides the option of just changing the dimension from "Country" to "City"; the rest remains the same.

Slicing means taking out a slice of a cube, given a certain set of selected dimensions (product), values (home furnishings...), and measures (sales value, sales units...). Dicing means viewing the slices from different angles. For example: revenue for different products within a given state, or revenue for different states for a given product.

Slicing means taking out a slice of a cube, given a certain set of selected dimensions (customer segment), values (home furnishings...), and measures (sales revenue, sales units...) or KPIs (Sales Productivity). Dicing means viewing the slices from different angles. For example: revenue for different products within a given state, or revenue for different states for a given product.

Slicing and dicing leads to what you can call a pivot. One form of slicing and dicing is called pivoting. Pivot is known from the Excel context. Pivot is the standard and basic look and feel of the views you create on the OLAP cubes. A pivot is a two-dimensional layout of the summary data. The x and y axes are the dimensions, and the intersection cells for any two dimension values contain the value of the measures. A pivot gives you the ability to create width and depth in your view of the data. Here is an example of how you can slice and dice through a pivot:

Step 1: Starting layout. You can have the product list on the Y axis (say 10 products) and the quarters (say four quarters) on the X axis. You can have sales value as the measure shown in the table at the intersection of a given product and a quarter. You will have a 10 X 4 matrix.

Step 2: Adding depth cross-dimensionally. Taking a step further, you can add a dimension of locations under the product to give it more depth. You can now have different locations (say 3 locations) for each row of product. You will now have a 30 (3 locations for each of the 10 products) X 4 (quarters) matrix. You can also specify whether you want sub-totals for every dimension. For example, you can have sub-totals for locations, products, months, and quarters.

Step 3: Adding depth within a single dimension. You can also add another level, such as months, under quarters. Now you will have a 30 X 12 matrix (3 months for each quarter).

Step 4: Pivoting on an axis. You can also pivot your view and transpose the product + location combination onto the X axis and the quarter + month combination onto the Y axis.

Step 5: Adding width. Referring to the starting layout, you can also add dimensions in "width" instead of "depth". For example, instead of having the location dimension under the product, you can add the location dimension adjacent to the product dimension. You will then have a matrix which on the Y axis has 10 rows (for 10 products) and 3 rows (for 3 locations), giving a 13 X 4 matrix.

Session #1 – The Structure of OLAP Data
From a Two Part Seminar on Getting Started With OLAP

The goal of this seminar is to help you get started with OLAP, so you learn how to bring your data from a relational database into cubes. But before we can do that we have to describe the OLAP structures themselves – what are we trying to create in an OLAP database? The goal of this first session is to describe the structure of OLAP data. In the second session we will be describing how to find the OLAP structures in a relational database.

This seminar was originally based on Analysis Services in SQL Server 2000. It has been updated to include information from Analysis Services in SQL Server 2005.

You need to be familiar with the look and feel of OLAP data. OLAP presents data in a different way. If you haven’t ever browsed an OLAP cube, you need to experience what it’s like. Try the cubes on this dashboard.

You also need to be familiar with some OLAP concepts. The main content

of this session is a definition/description of 30 OLAP terms. For each one I give a short-hand definition (which is highlighted) and a lengthier description. Many of these concepts are needed by everybody who uses OLAP. Others are only important for people who are developing an OLAP system.

If you’re just starting out with OLAP, try browsing a cube as you’re reading these definitions. You need to see how dimensions, levels, measures, etc. actually look to an OLAP end-user.

Here is my Top 30 List of OLAP Concepts.

OLAP
OLAP Browsing
Data Mining
Cube
Dimension
Level
Drilling Up
Drilling Down
Drill-Through
Hierarchy
Member
Set
Member Property (Attribute)
Child Members
Slicing
Dicing
Tuple
Measure (Fact)
Calculated Measures
Cell
Current Member
Dimension Table
Fact Table
Star Schema
Snowflaking
MDX
Unbalanced Hierarchy
Ragged Hierarchy
Actions
Local Cube

OLAP (The Acronym) – OnLine Analytical Processing – The logical meaning of OLAP would seem to include any computer application that is used to analyze data, but nobody really uses it like that. OLAP is one of the worst acronyms I have ever seen – when you tell someone what the letters stand for, they still don’t have the slightest idea what the software actually looks like.

OLAP (Popular Definition) – A Million Spreadsheets in A Box – This is what you can tell people about OLAP who have never seen it before. OLAP is basically a spreadsheet tool – pretty powerful and flexible – but its basic purpose is to show spreadsheets. The key for OLAP is the ability to navigate to different views of the data. Do you see some information that interests you? With OLAP you can look at that data from a more detailed (or a more general) perspective. Your OLAP tool allows you to move quickly and easily from one perspective to another. You don’t have to ask your technical people every time you want to see your data in a new way.

OLAP (Another Popular Definition) – Reporting With Something Extra – A standard paper-based executive report shows two dimensions – a cost center displayed on each column, an account displayed on each row, with dollar figures in the middle. An OLAP

cube adds extra dimensions – trends over time, accounts displayed for different geographical areas or for different products.

OLAP (Technical Definition) – Fast, Interactive Browsing of Multidimensional, Multi-Level Data – OLAP always involves multiple dimensions, which should have multiple levels. (If you don’t have levels, your OLAP browsing doesn’t have much power.) OLAP has to be fast. If you have a data analysis application that doesn’t return results of new queries (almost) immediately, you don’t have OLAP. How fast is fast enough? Less than 5 seconds is OK, but less than 1 second is a lot better. The value of Analysis Services is that it makes this fast query response possible. The data is organized in multidimensional structures, with some of the summary (aggregated) views in the data calculated ahead of time. OLAP is interactive. As the analysts view the data, they can ask new questions of the data and receive immediate answers. You can easily move among different perspectives and between a more detailed or a more general perspective.

If I had my way, we would throw the term OLAP away and replace it with something like Interactive Spreadsheeting. Or maybe we should replace the term “OLAP Browsing” with “Data Browsing”.

Data Mining – Computer Analysis of Data that is Not Interactive – You can use Analysis Services for both OLAP and data mining. Both are methods of analyzing data. The big difference is that OLAP is used interactively. OLAP browsing is something done by a human analyst. A person browses OLAP data to find the significant information. Data mining is a process where the computer analyzes data and then reports the significant results back to the analyst.

(NOTE: There is at least one Analysis Services client tool that allows a user to switch back and forth between OLAP and data mining. The user can start out with data mining and then analyze the significant findings with OLAP. Insights gained from OLAP browsing can then be used for additional data mining.)

Cube – A Multidimensional Way to Look at Data – The cube is the primary OLAP structure used to view data. It is analogous to a table in the relational database world. The term cube implies three dimensions, but OLAP folk have stretched the concept a little bit. An Analysis Services cube can have 128 dimensions (though it’s probably not a good idea to have that many).

Dimensions (Definition #1) – The Perspectives Used for Looking at the Data – Dimensions are the answer to the question “How do you want to see your data?” Here are some examples of dimensions –

Product
Time
Store
Customer Age
Customer Income
Employee

Dimensions (Definition #2) – The Categories You Use for the Columns and Rows in the Spreadsheet – Do you want to see how different products have done in different time periods? Put the different products on the columns and the different time periods on the rows (or the other way around). How about looking at the age of customers who are shopping at different stores? Use a different row for each age group and a separate column for each store.

Dimensions Have Levels Which Are Used for Drilling Down and Drilling Up – If you just had dimensions, OLAP browsing would be kind of dull. The value of OLAP comes from having good levels for your dimensions. Levels let you see the general view of things and the detailed view of things. If you notice that sales are higher in a particular month, you will maybe want to drill down to see if they were higher in a particular part of the month. You also might want to drill up to a higher level, to see if the data patterns are valid on a wider scale.

Levels are Organized Into Hierarchies – We should really say that dimensions have hierarchies and it’s the hierarchies that have levels. But many dimensions have only one hierarchy, and when they do, for the most part, that hierarchy can be ignored. But there are times when a single dimension has more than one hierarchy – such as when the levels of a Time dimension are organized by Calendar Year and by Fiscal Year.

Note: Some OLAP tools don’t handle hierarchies very well. You might not see them in the one you’re using, or they might just appear as if they were separate dimensions.

Levels Have Members – the Labels for the Columns and Rows in the Spreadsheet – If you have a dimension called Time, one of the Levels might be called Month, and the members of the level Month would be the members January, February, March, etc. If you have a dimension called Product, you could have a level called Department, and the members of the level Department could be Hardware, Lumber, Plumbing, etc. A whole variety of family terms are used to describe the relationship between members within a dimension – parent, sibling, cousin, ancestors, descendants, etc.

Drilling Down to Child Members – Drilling down is what you do when you’re looking at data at a particular level and you want to see more detailed data. When you drill down to see the details at the next lower level, the members you see are called child members. For example, if you drill down on Quarter 1, you will see the child members January, February, and March.
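To make the idea of one dimension with two hierarchies more concrete, here is a minimal Python sketch. It is not taken from Analysis Services; the July start of the fiscal year, the member labels, and the function names `calendar_quarter`, `fiscal_quarter`, `drill_up`, and `child_members` are all assumptions chosen for illustration.

```python
# Month-level members of a hypothetical Time dimension, in calendar order.
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Assumption for illustration only: the fiscal year starts in July.
FISCAL_START = "July"


def calendar_quarter(month):
    """Parent of a month in the Calendar hierarchy."""
    return "Quarter {}".format(MONTHS.index(month) // 3 + 1)


def fiscal_quarter(month):
    """Parent of a month in the Fiscal hierarchy."""
    offset = (MONTHS.index(month) - MONTHS.index(FISCAL_START)) % 12
    return "Fiscal Quarter {}".format(offset // 3 + 1)


def drill_up(sales_by_month, parent_of):
    """Aggregate month-level values to the quarter level of one hierarchy."""
    totals = {}
    for month, amount in sales_by_month.items():
        parent = parent_of(month)
        totals[parent] = totals.get(parent, 0) + amount
    return totals


def child_members(quarter, parent_of):
    """Drill down: list the months whose parent is the given quarter."""
    return [m for m in MONTHS if parent_of(m) == quarter]
```

The same month-level data can be drilled up through either hierarchy by passing `calendar_quarter` or `fiscal_quarter` as `parent_of`, and `child_members("Quarter 1", calendar_quarter)` lists January, February, and March, as in the drill-down example in the text.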

Drill-through – Jumping from OLAP Back to the Source Data – In OLAP you can drill up and drill down to view different levels in the data. Some OLAP systems also allow you to jump back to the source data. You can select data you find interesting in the cube and drill through to the source data to view extra detail.

A Collection of Members is Called a Set – You nearly always want to see more than one column and more than one row in a spreadsheet. The group of members that is shown on the columns and the group of members that is shown on the rows are called sets. Sets can be defined in OLAP in a large number of different ways. Here are a few examples:

1. By listing each member – {January, February, March}
2. By a family relationship – the children of Quarter 1 – [Quarter 1].Children
3. All the members of a level – Time.Month.Members
4. All the members of a dimension – Time.Members

Members Have Extra Information Called Member Properties (or Attributes) – You can display extra information about members of any level by creating member properties. An Employee level could have member properties such as Phone Number, Birth Date, and Hire Date. A City level could have a member property Population, which could be used to calculate per capita sales.

OLAP Filtering Is Called Slicing – You can use a dimension for slicing when it isn’t being displayed on the columns or rows of the spreadsheet. You can slice on any member from any level of the dimension. If you slice on the member January in the Month level of the Time dimension, you will only see data from January.

Dimensions Can Be Combined For Dicing – You can put more than one dimension on the rows or on the columns. You will see one row for every combination of the members from each of the dimensions. If you put the children of Quarter 1 from the Time dimension and the members of the Product Family level of the Product dimension on the rows, you would have nine rows, as follows:

January Food
January Drink
January Non-Consumable
February Food
February Drink
February Non-Consumable
March Food
March Drink
March Non-Consumable

When you are cutting the data with one member it’s called slicing. When you cut your data with sets of members from two or more dimensions, it’s called dicing, like you’re cutting up vegetables from every direction.

The Members Defining a Row (or a Column) Are Called a Tuple – In geometry, a point is defined by its x-y coordinates (x, y). You can think of x and y as being members of the x-dimension and the y-dimension. In OLAP, each row or column is defined by the members that are used for that row. The first row in the previous example is defined by the tuple (January, Food).

A Group of Tuples Are Called a Set – The 9 tuples in the example are also called a set. Before I said that a group of members make a set. That’s true, sort of, but it’s really a group of tuples that make a set. In the simplest case, those tuples only have a single member.

To summarize members, tuples, and sets: The rows and the columns of an OLAP spreadsheet are always defined by a set of tuples. Each row or column is defined by a single tuple. Each tuple is defined by a single member from one or more dimensions. In a simple case, when only a single dimension is used for a row or column, the tuple has only a single member.

(NOTE: You don’t really have to understand the concepts of tuples and sets when you’re first starting with OLAP, but they’re very important when creating calculations, so you might as well start thinking about them. If you don’t see the point of these concepts at first, don’t worry. But if you start hearing about them now, you’ll have an easier time when creating calculations a little later.)

The Numbers Are Called Measures (or Facts) – The numbers in the OLAP spreadsheet are called measures. When setting up OLAP cubes, these values are also often called facts. Typical measures or facts would be:

Sales Dollars
Sales Count
Profit
Hours of Work

The Numbers are Displayed in the Cells of a Cube – The spaces in the OLAP spreadsheet are called cells (like the cells in an Excel spreadsheet).

Calculated Measures – Applying Formulas to Multidimensional Data – Some measures are more interesting when you combine them with other measures or analyze them from the perspective of a particular dimension. With calculated measures you can view data across different time periods or calculate the percentage one particular product contributes to total sales.
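The relationship between sets, tuples, slicing, dicing, and calculated measures can be sketched with plain Python. This is only an illustration and does not use MDX or any Analysis Services API; the sales figures are made up, and the helper names `slice_on` and `percent_of_total` are hypothetical.

```python
from itertools import product

# Two sets of members, as in the text: the children of Quarter 1 and the
# members of the Product Family level.
months = ["January", "February", "March"]
families = ["Food", "Drink", "Non-Consumable"]

# Dicing: one tuple per combination of members from the two sets.
row_tuples = list(product(months, families))

# Made-up Sales Dollars values, one cell per tuple.
sales = {t: 100.0 * (i + 1) for i, t in enumerate(row_tuples)}


def slice_on(cells, month):
    """Slicing: keep only the cells whose Time member is `month`."""
    return {t: v for t, v in cells.items() if t[0] == month}


def percent_of_total(cells):
    """Calculated measure: each cell's share of the total, in percent."""
    total = sum(cells.values())
    return {t: 100.0 * v / total for t, v in cells.items()}
```

Here `row_tuples` holds the nine (month, product family) tuples in the same order as the nine rows listed in the text, and `percent_of_total(slice_on(sales, "January"))` shows how a calculated measure can express each product family's contribution to January sales.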

Each Cell Has One (And Only One!) Current Member From Every Dimension – Every cell in a cube is defined by one member from every dimension in the cube. That one member is called the current member for that dimension. The current member for each dimension is determined as follows:

1. If the dimension is used on the columns or on the rows, the current member is the member for that dimension used in the tuple defining the column or row.
2. If the dimension is used for slicing, the current member is the member used in the slice.
3. If the dimension is not used for columns, rows, or slicing, the current member is the default member for the dimension. The default member is often the All level member. When the All level member is used, the practical effect is to ignore that particular dimension in the display of the data.

Dimension Information Is Stored In Dimension Tables – The information used for OLAP dimensions is contained in relational database tables which are often called dimension tables. Dimension tables have the following types of fields:

1. Key fields, used to join the dimension tables to the fact table (and to other dimension tables when snowflaking).
2. Level name fields, used to store the member names for the levels. For example, the Time dimension table could have a field called Month, which would have values such as January, February, March, etc.
3. Level Order Key fields, used to store integer values used to order the members of the levels (if necessary). For example, the Time dimension table could have a field called Month Order Key, which could have a value of 1 for January, 2 for February, 3 for March, etc.
4. Member Property fields, used to store the member property information. A Time dimension could have a field called Day Count, which would store the number of days for each month.

Information for the Measures Is Stored in the Fact Table – The information used for the measures is contained in a relational database table which is called the fact table. Whenever you make a new cube in the Analysis Manager, the first question you’re asked is to identify the fact table. Every fact must be connected to one particular member from the lowest level of every one of the dimensions in the cube. Fact tables have the following types of fields:

1. Key fields connecting the fact table to a dimension table for each of the dimensions.
2. Measure fields, containing the numeric values used for the measures.

The Star Schema – A Multidimensional Data Structure in a Relational Database – The combination of a fact table with its dimension tables is called a star schema. The fact table is at the center of the star, while each dimension table represents a point of the star.

Snowflaking the Star Schema (The Snowflake Schema) – The basic star schema has a simple structure where all the information for each dimension is contained in a single table. Snowflaking is the process of dividing information for one dimension among two or more tables. The resulting data structure is often called a snowflake schema. (I actually prefer to call this structure a star schema that has some snowflaking, but I think my preference for this terminology is a minority opinion.)

Analysis Services comes with a sample database called FoodMart 2000. This OLAP database is based on a Microsoft Access database, which is located, by default, at C:\Program Files\Microsoft Analysis Services\Samples\foodmart 2000.mdb. Take a look at this database to see dimension tables and fact tables.
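The dimension-table and fact-table field types described above can be sketched as a tiny star schema. This is an illustration only, using Python's built-in sqlite3 module with an in-memory database instead of the FoodMart 2000 Access file; the table names, column names, and sample rows are hypothetical and are not taken from FoodMart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    time_key        INTEGER PRIMARY KEY,  -- key field
    month           TEXT,                 -- level name field
    month_order_key INTEGER,              -- level order key field
    day_count       INTEGER               -- member property field
);
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,     -- key field
    product_name TEXT                     -- level name field
);
CREATE TABLE sales_fact (
    time_key      INTEGER REFERENCES time_dim(time_key),        -- key field
    product_key   INTEGER REFERENCES product_dim(product_key),  -- key field
    sales_dollars REAL,                   -- measure field
    sales_count   INTEGER                 -- measure field
);
""")

conn.executemany(
    "INSERT INTO time_dim VALUES (?, ?, ?, ?)",
    [(1, "January", 1, 31), (2, "February", 2, 28), (3, "March", 3, 31)],
)
conn.executemany(
    "INSERT INTO product_dim VALUES (?, ?)",
    [(1, "Bread"), (2, "Milk")],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [(1, 1, 120.0, 40), (1, 2, 80.0, 20), (2, 1, 180.0, 60), (3, 2, 90.0, 30)],
)

# Join the fact table (center of the star) to one dimension table (a point)
# and order the months by the level order key instead of alphabetically.
monthly_sales = conn.execute("""
    SELECT t.month, SUM(f.sales_dollars), SUM(f.sales_count)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_key = f.time_key
    GROUP BY t.month, t.month_order_key
    ORDER BY t.month_order_key
""").fetchall()

for row in monthly_sales:
    print(row)
```

Sorting on month_order_key keeps January ahead of February, which an alphabetical sort on the month name would not do; that is the job of a Level Order Key field.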

Right click on the Sales cube from the FoodMart 2000 database in the Analysis Manager and select Edit. Switch to the Schema tab. You will see a star schema that has snowflaking in the Product dimension.

MDX is the Querying Language for OLAP – A special language has been developed for OLAP. This language looks similar to SQL, but it has many unique features. MDX is used for several purposes:

1. To retrieve data from a cube.
2. To create calculated members – members which are not in the data, but can be derived from the existing members and measures.
3. To create calculated sets.
4. To define security rules for accessing data.
5. To create local cube files.
6. To create actions.

Unbalanced Hierarchy (Parent-Child Dimension) – A Dimension With Levels That Can Appear Or Disappear – Most dimensions in Analysis Services have a fixed number of levels. But a parent-child dimension can display levels that have an indefinite depth. This is useful for hierarchical structures that are constantly changing, such as who's supervising who in a company.

Ragged Hierarchy – A Dimension Where Some Parents Have Missing Children (But Do Have Grandchildren) – It's fine in the United States to show a hierarchy that has the Country level, the State level, and the City level. But what happens to countries that don't have states? A ragged hierarchy allows you to skip levels where there aren't any real values.

Actions – Moving Beyond OLAP Browsing To Doing Something – OLAP browsing has often been a fairly passive activity, but Analysis Services actions allow you to accomplish things right from inside your cubes. An action could allow you to open a web page for a customer. You could find a salesperson with a high level of sales and send a congratulatory e-mail. You could find a product that needed ordering and order it on the spot. You can do it all with actions inside your cubes.

Local Cubes Are Files That Can Be Used For Disconnected Access to Cube Data – OLAP cubes are stored in an Analysis Server. Applications can browse these cubes by connecting to the Analysis Server. But cube data can also be put into local cube files, which can be used when users do not have a connection to the Analysis Server. Browsing a local cube file appears to the users to be the same thing as browsing a cube stored in an Analysis Server. I think local cubes are particularly useful in a pilot project, when not everyone is hooked up to an Analysis Server yet.

The OLAP 30

Those are my top 30 OLAP concepts. If you understand those terms, you’re a long way toward understanding how OLAP data is structured and used.

Finding the Measures, Dimensions, Levels, and Member Properties

The next step is finding measures, dimensions, and levels in a relational database. It’s easy to build cubes if you already have your data organized into a star schema, as is the case with the FoodMart 2000 database. If you have your data in a normal relational database, it’s not so obvious what should be used for measures, dimensions, levels, and member properties. There are two ways of deciding on the structures that you are going to include in your cubes:

1. Deciding on OLAP structures by looking at the spreadsheets that are currently being used.

2. Deciding on OLAP structures by looking at the elements that are available in the source data.

Both strategies are important. If the current spreadsheets are being used, the information in them should probably be included in the OLAP cubes. But the source data may have additional information which has been ignored in existing reports. When you’re building an OLAP system, you have the opportunity to give your users more of the data than they’ve seen before.
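As a rough illustration of the second strategy, looking at what the source data offers, the sketch below scans a few source rows and proposes which columns might become measures and which might become dimension levels. The rule it uses (numeric columns are candidate measures, everything else is a candidate level) is a simplifying assumption of this example, not a method prescribed by Analysis Services, and the column names and rows are hypothetical.

```python
def suggest_olap_roles(rows):
    """Split column names into candidate measures and candidate dimension levels.

    Assumed heuristic: a column whose values are all numeric is a candidate
    measure; any other column is a candidate dimension level.
    """
    measures, levels = [], []
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        (measures if is_numeric else levels).append(name)
    return measures, levels


# Hypothetical rows as they might come out of a normal relational database.
SOURCE_ROWS = [
    {"country": "USA", "state": "WA", "city": "Seattle",
     "product": "Bread", "sales_dollars": 120.0, "sales_count": 40},
    {"country": "USA", "state": "OR", "city": "Portland",
     "product": "Milk", "sales_dollars": 80.0, "sales_count": 20},
]

measures, levels = suggest_olap_roles(SOURCE_ROWS)
print("Candidate measures:", measures)
print("Candidate dimension levels:", levels)
```

A heuristic like this only produces a starting list. Numeric columns such as key values or years would be wrongly proposed as measures, so the result still has to be checked against the spreadsheets and reports that people actually use.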