2

: Data Warehouse: The Building Blocks

2011-03-31

Data Warehousing

1

Chapter Objectives
‡ Review formal definition of a data warehouse ‡ Discuss the defining features ‡ Distinguish between data warehouses and data marts ‡ Study each component or building block that makes up a data warehouse ‡ Introduce metadata and highlight its significance

2011-03-31

Data Warehousing

2

Data Warehouse
‡ Information delivery system ‡ Integrate and transform enterprise data into information
± suitable for strategic decision making

‡ Take all the historic data from the various operational systems
± Combine this internal data with any relevant data from outside sources ± Pull them together
2011-03-31 Data Warehousing 3

Set up information delivery system
‡ Need different components or building blocks
± Arranged together in the most optimal way ± Arranged in a suitable architecture

2011-03-31

Data Warehousing

4

Bill Inmon¶s Definition of DW
‡ The father of Data Warehouse ‡ ³A Data Warehouse is a subject oriented, integrated, nonvolatile, and time variant collection of data in support of management¶s decisions.´

2011-03-31

Data Warehousing

5

Sean Kelly ‡ Another leading data warehouse practitioner ‡ The data in the data warehouse is: ± ± ± ± ± ± ± 2011-03-31 Separate Available Integrated Time stamped Subject oriented Nonvolatile Accessible Data Warehousing 6 .

DEFINING FEATURES ‡ What about the nature of the data in the data warehouse? ‡ How is this data different from the data in any operational system? ‡ Why does it have to be different? ‡ How is the data content in the data warehouse used? 2011-03-31 Data Warehousing 7 .

DEFINING FEATURES ‡ Some of Key Defining Features of the Data Warehouse ± Subject-Oriented ± Integrated Data ± Time-Variant Data ± Nonvolatile Data ± Data Granularity 2011-03-31 Data Warehousing 8 .

Subject-Oriented Data ‡ Data is stored by subjects. not by applications ‡ The subjects are critical for the enterprise ± Sales. shipments and inventory for a manufacturing company ± Figure 2-1 ‡ There is no application flavor ‡ The data in a data warehouse cut across applications 2011-03-31 Data Warehousing 9 .

2011-03-31 Data Warehousing 10 .

consolidation.Integrated Data ‡ Need to pull together all the relevant data from the various systems ± Data from internal operational systems ± Data from outside sources ‡ Before the data can be stored in a DW. and integration of the source data 2011-03-31 Data Warehousing 11 . ± Remove the inconsistencies ± Standardize the various data elements ± Go through a process of transformation.

2011-03-31 Data Warehousing 12 .

Standardization ‡ Some of the items that would need standardization: ± Naming Conventions ± Codes ± Data attributes ± Measurements 2011-03-31 Data Warehousing 13 .

2011-03-31 Data Warehousing 14 . not just current values. ± the stored data contains the current values ‡ The data in the data warehouse is meant for analysis and decision making. but on the past purchases. ‡ A data warehouse has to contain historical data. ± The use needs data not only about the current purchase.Time-Variant Data ‡ For an operational system.

Time-variant nature ‡ The time-variant nature of the data ± Allows for analysis of the past ± Relates information to the present ± Enables forecasts for the future 2011-03-31 Data Warehousing 15 .

not update 2011-03-31 Data Warehousing 16 .Nonvolatile Date ‡ The data in the data warehouse is not intended to run the day-to-day business. ‡ Figure 2-3. ± You do not update the data warehouse every time you process a single order. ‡ Data from the operational systems are moved into the data warehouse at specific intervals.

Data Granularity ‡ The analysis begins at a high level and moves down to lower levels of detail ± Start by looking at summary data ± Look at the breakdown ‡ Data granularity in a data warehouse refers to the level of detail ± The lower the level of detail. the finer the data granularity ± The lowest level of detail p a lot of data in the data warehouse 2011-03-31 Data Warehousing 17 .

2011-03-31 Data Warehousing 18 .

´ 2011-03-31 Data Warehousing 19 .DATA WAREHOUSES AND DATA MARTS ‡ In 1998. Bill Inmon stated. ³The single most important issue facing the IT manager this year is whether to build the data warehouse first or the data mart first.

you need to ask: ± Top-down or bottom-up approach? ± Enterprise-wide or department? ± Which first ± data warehouse or data mart? ± Build pilot or go with a full-fledged implementation? ± Dependent or independent data marts? 2011-03-31 Data Warehousing 20 .DATA WAREHOUSES AND DATA MARTS ‡ Before deciding to build a data warehouse.

2011-03-31 Data Warehousing 21 .

How are They Different? ‡ ‡ Figure 2-5 Two different basic approaches 1) Overall data warehouse feeding dependent data marts 2) Several departmental or local data marts combining into a data warehouse 2011-03-31 Data Warehousing 22 .

2011-03-31 Data Warehousing 23 .

central storage of data about the content ‡ Centralized rules and control ‡ May see quick results if implemented with iterations 2011-03-31 Data Warehousing 24 . an enterprise view of data ‡ Inherently architected ± not a union of disparate data marts ‡ Single.Top-Down Approach: Advantages ‡ A truly corporate effort.

Top-Down Approach: Disadvantages ‡ Takes longer to build even with an iterative method ‡ High exposure/risk to failure ‡ Needs high level of cross-functional skills ‡ High outlay without proof of concept 2011-03-31 Data Warehousing 25 .

Bottom-Up Approach: Advantages ‡ Faster and easier implementation of manageable pieces ‡ Favorable return on investment and proof of concept ‡ Less risk of failure ‡ Inherently incremental. can schedule important data marts first ‡ Allows project team to learn and grow 2011-03-31 Data Warehousing 26 .

Bottom-Up Approach: Disadvantages ‡ Each data mart has its own narrow view of data ‡ Permeates redundant data in every data mart ‡ Perpetuates inconsistent and irreconcilable data ‡ Proliferates unmanageable interfaces 2011-03-31 Data Warehousing 27 .

one at a time ‡ 2011-03-31 Supermarts are carefully architected data marts Data Warehousing 28 . Conform and standardize the data content 4. Create a surrounding architecture for a complete warehouse 3. Plan and define requirements at the overall corporate level 2.A Practical Approach by Ralph Kimball 1. Implement the data warehouse as a series of supermarts.

An Enterprise Data Warehouse ‡ A data mart is a logical subset of the complete data warehouse ‡ A data warehouse is a conformed union of all data marts ‡ Individual data marts are targeted to particular business groups ‡ The collection of all the data marts form an integrated whole. called the enterprise data warehouse 2011-03-31 Data Warehousing 29 .

OVERVIEW OF THE COMPONENTS ‡ Architecture is the proper arrangement of the components ‡ Build a data warehouse with software and hardware components ‡ Arrange the building blocks for maximum benefit ‡ May lay special emphasis on one component 2011-03-31 Data Warehousing 30 .

keep track of the data by means of the metadata repository ± Information Delivery Component ± Metadata Component ± Management and Control Component 2011-03-31 Data Warehousing 31 .Basic Components of a typical warehouse ‡ Figure 2-6: building blocks or components ± Source Data Component ± Data Staging Component ± Data Storage Component ‡ Store and manage the data.

2011-03-31 Data Warehousing 32 .

Source Data Component ‡ ‡ ‡ ‡ Production Data Internal Data Archived Data External Data 2011-03-31 Data Warehousing 33 .

transform. convert.Production Data ‡ Data from the various operational systems ± on different hardware platforms ± by different database systems and operating systems ± from many vertical applications ‡ No conformance of data among the various operational systems ‡ The significant and disturbing characteristic of production data is disparity ± Standardize. and integrate the disparate data 2011-03-31 Data Warehousing 34 .

documents. and sometimes even departmental database ‡ Add additional complexity to the process of transforming and integrating the data ± Determine strategies for collecting data from spreadsheets ± Find ways of taking data from textual documents ± Tie into departmental databases to gather pertinent data from these sources 2011-03-31 Data Warehousing 35 . customer profiles.Internal Data ‡ Data from users¶ ³private´ spreadsheets.

Archived Data ‡ Periodically take the old data and store it in archived files in an operational system ‡ Many different methods of archiving ± A separate archival database ± Flat files on disk storage ± Tape cartridges or microfilm and even off-site ‡ A data warehouse keeps historical snapshots of data ± Look into your archived data sets ± Useful for discerning patterns and analyzing trends 2011-03-31 Data Warehousing 36 .

data from outside sources do not conform to your formats 2011-03-31 Data Warehousing 37 .External Data ‡ Data from external sources for information that most executives use ± Statistics relating to their industry produced by external agencies ± Market share data of competitors ± Standard values of financial indicators for their business ‡ To spot industry trends and compare performance against other organizations ‡ Usually.

Data Staging Component ‡ Three major functions need to be performed for getting the data ready ± extract the data ± transform the data ± and then load the data into the data warehouse storage ‡ ETT: ± ± ± 2011-03-31 (Extraction) (Transformation) (Transportation) Data Warehousing 38 .

deduplicate. change. and prepare source data for storage and use in the data warehouse 2011-03-31 Data Warehousing 39 .Data Staging ‡ Provide a place and an area with a set of functions to clean. combine. convert.

± or a combination of both 2011-03-31 Data Warehousing 40 . ± or a data-staging relational database.Data Extraction ‡ Deal with numerous data sources ‡ Tools for data extraction ± Purchasing outside tools ± Developing in-house programs ‡ Extract the source data into ± a group of flat files.

and summarized 2011-03-31 Data Warehousing 41 .Data Transformation ‡ Perform a number of individual tasks ± ± ± ± ± ± Clean Standardization Combine Purging and separating out Sorting and merging Assignment of surrogate keys ‡ Results: a collection of integrated data that is cleaned. standardized.

Data Loading ‡ Two distinct groups of tasks ± The initial loading of the data into the data warehouse ± Refresh cycles ‡ Extract the changes to the source data ‡ Transform the data revisions ‡ And feed the incremental data revisions on an ongoing basis ‡ Figure 2-7 2011-03-31 Data Warehousing 42 .

2011-03-31 Data Warehousing 43 .

Data Storage Component ‡ A separate repository ± To keep large volume of historical data for analysis ± To keep the data in structures suitable for analysis ‡ The data warehouses are ³read-only´ data repositories ± The data is stable and it represents snapshots at specified periods ‡ The database in a data warehouse must be open ± Must be open to different tools ± RDBMSs or MDDBs 2011-03-31 Data Warehousing 44 .

the business analysts. statistical analysis. e-mail 2011-03-31 Data Warehousing 45 . internet. EIS feed. the casual users. data-mining applications ‡ Information delivery mechanism ± Online. multidimensional analysis. intranet. complex queries. and the power users ‡ Different methods of information delivery ± Ad hoc reports.Information Delivery Component ‡ Who are the users? ± The novices.

2011-03-31 Data Warehousing 46 .

Metadata Component ‡ The data about the data in the data warehouse ‡ Similar to a data dictionary. but much more than a data dictionary ‡ (Later. in a separate section) 2011-03-31 Data Warehousing 47 .

Management and Control Component ‡ Sit on top of all the other components ± Coordinate the services and activities ± Control the data transformation and the data transfer into the data warehouse storage ± Moderate the information delivery to the users ± Monitor the movements of data into the staging area and from there into the data warehouse storage ‡ The metadata is the source of information for the management module 2011-03-31 Data Warehousing 48 .

METADATA IN THE DATA WAREHOUSE ‡ The Yellow Pages ± A directory with data about the institutions ‡ Types of Metadata ± Operational Metadata ± Extraction and Transformation Metadata ± End-user Metadata 2011-03-31 Data Warehousing 49 .

Operational Metadata ‡ Contain all of next information about the operational data sources ± Data for the data warehouse comes from several operational systems ± The data elements have various field lengths and data types ± You split records. combine parts of records from different source files. and deal with multiple coding schemes and field lengths 2011-03-31 Data Warehousing 50 .

Extraction and Transformation Metadata ‡ Contain data about the extraction of data from the source systems ± the extraction frequencies ± extraction methods. ± and business rules for the data extractions 2011-03-31 Data Warehousing 51 .

End-User Metadata ‡ The navigational map ± Enable the end-users to find information ± Allow the end-users to use their own business terminology 2011-03-31 Data Warehousing 52 .

Special Significance of Metadat ‡ Act as the glue that connects all parts of the data warehouse ‡ Provide information about the contents and structures to the developers ‡ Open the door to the end-users and make the contents recognizable in their own terms 2011-03-31 Data Warehousing 53 .

Sign up to vote on this title
UsefulNot useful