This action might not be possible to undo. Are you sure you want to continue?
: Data Warehouse: The Building Blocks
Review formal definition of a data warehouse Discuss the defining features Distinguish between data warehouses and data marts Study each component or building block that makes up a data warehouse Introduce metadata and highlight its significance
Information delivery system Integrate and transform enterprise data into information
± suitable for strategic decision making
Take all the historic data from the various operational systems
± Combine this internal data with any relevant data from outside sources ± Pull them together
2011-03-31 Data Warehousing 3
Set up information delivery system
Need different components or building blocks
± Arranged together in the most optimal way ± Arranged in a suitable architecture
Bill Inmon¶s Definition of DW
The father of Data Warehouse ³A Data Warehouse is a subject oriented, integrated, nonvolatile, and time variant collection of data in support of management¶s decisions.´
Sean Kelly Another leading data warehouse practitioner The data in the data warehouse is: ± ± ± ± ± ± ± 2011-03-31 Separate Available Integrated Time stamped Subject oriented Nonvolatile Accessible Data Warehousing 6 .
DEFINING FEATURES What about the nature of the data in the data warehouse? How is this data different from the data in any operational system? Why does it have to be different? How is the data content in the data warehouse used? 2011-03-31 Data Warehousing 7 .
DEFINING FEATURES Some of Key Defining Features of the Data Warehouse ± Subject-Oriented ± Integrated Data ± Time-Variant Data ± Nonvolatile Data ± Data Granularity 2011-03-31 Data Warehousing 8 .
shipments and inventory for a manufacturing company ± Figure 2-1 There is no application flavor The data in a data warehouse cut across applications 2011-03-31 Data Warehousing 9 .Subject-Oriented Data Data is stored by subjects. not by applications The subjects are critical for the enterprise ± Sales.
2011-03-31 Data Warehousing 10 .
and integration of the source data 2011-03-31 Data Warehousing 11 . ± Remove the inconsistencies ± Standardize the various data elements ± Go through a process of transformation. consolidation.Integrated Data Need to pull together all the relevant data from the various systems ± Data from internal operational systems ± Data from outside sources Before the data can be stored in a DW.
2011-03-31 Data Warehousing 12 .
Standardization Some of the items that would need standardization: ± Naming Conventions ± Codes ± Data attributes ± Measurements 2011-03-31 Data Warehousing 13 .
not just current values. A data warehouse has to contain historical data. ± The use needs data not only about the current purchase. 2011-03-31 Data Warehousing 14 .Time-Variant Data For an operational system. ± the stored data contains the current values The data in the data warehouse is meant for analysis and decision making. but on the past purchases.
Time-variant nature The time-variant nature of the data ± Allows for analysis of the past ± Relates information to the present ± Enables forecasts for the future 2011-03-31 Data Warehousing 15 .
± You do not update the data warehouse every time you process a single order. not update 2011-03-31 Data Warehousing 16 . Data from the operational systems are moved into the data warehouse at specific intervals. Figure 2-3.Nonvolatile Date The data in the data warehouse is not intended to run the day-to-day business.
Data Granularity The analysis begins at a high level and moves down to lower levels of detail ± Start by looking at summary data ± Look at the breakdown Data granularity in a data warehouse refers to the level of detail ± The lower the level of detail. the finer the data granularity ± The lowest level of detail p a lot of data in the data warehouse 2011-03-31 Data Warehousing 17 .
2011-03-31 Data Warehousing 18 .
³The single most important issue facing the IT manager this year is whether to build the data warehouse first or the data mart first. Bill Inmon stated.DATA WAREHOUSES AND DATA MARTS In 1998.´ 2011-03-31 Data Warehousing 19 .
you need to ask: ± Top-down or bottom-up approach? ± Enterprise-wide or department? ± Which first ± data warehouse or data mart? ± Build pilot or go with a full-fledged implementation? ± Dependent or independent data marts? 2011-03-31 Data Warehousing 20 .DATA WAREHOUSES AND DATA MARTS Before deciding to build a data warehouse.
2011-03-31 Data Warehousing 21 .
How are They Different? Figure 2-5 Two different basic approaches 1) Overall data warehouse feeding dependent data marts 2) Several departmental or local data marts combining into a data warehouse 2011-03-31 Data Warehousing 22 .
2011-03-31 Data Warehousing 23 .
Top-Down Approach: Advantages A truly corporate effort. central storage of data about the content Centralized rules and control May see quick results if implemented with iterations 2011-03-31 Data Warehousing 24 . an enterprise view of data Inherently architected ± not a union of disparate data marts Single.
Top-Down Approach: Disadvantages Takes longer to build even with an iterative method High exposure/risk to failure Needs high level of cross-functional skills High outlay without proof of concept 2011-03-31 Data Warehousing 25 .
Bottom-Up Approach: Advantages Faster and easier implementation of manageable pieces Favorable return on investment and proof of concept Less risk of failure Inherently incremental. can schedule important data marts first Allows project team to learn and grow 2011-03-31 Data Warehousing 26 .
Bottom-Up Approach: Disadvantages Each data mart has its own narrow view of data Permeates redundant data in every data mart Perpetuates inconsistent and irreconcilable data Proliferates unmanageable interfaces 2011-03-31 Data Warehousing 27 .
A Practical Approach by Ralph Kimball 1. Implement the data warehouse as a series of supermarts. Create a surrounding architecture for a complete warehouse 3. Plan and define requirements at the overall corporate level 2. Conform and standardize the data content 4. one at a time 2011-03-31 Supermarts are carefully architected data marts Data Warehousing 28 .
called the enterprise data warehouse 2011-03-31 Data Warehousing 29 .An Enterprise Data Warehouse A data mart is a logical subset of the complete data warehouse A data warehouse is a conformed union of all data marts Individual data marts are targeted to particular business groups The collection of all the data marts form an integrated whole.
OVERVIEW OF THE COMPONENTS Architecture is the proper arrangement of the components Build a data warehouse with software and hardware components Arrange the building blocks for maximum benefit May lay special emphasis on one component 2011-03-31 Data Warehousing 30 .
Basic Components of a typical warehouse Figure 2-6: building blocks or components ± Source Data Component ± Data Staging Component ± Data Storage Component Store and manage the data. keep track of the data by means of the metadata repository ± Information Delivery Component ± Metadata Component ± Management and Control Component 2011-03-31 Data Warehousing 31 .
2011-03-31 Data Warehousing 32 .
Source Data Component Production Data Internal Data Archived Data External Data 2011-03-31 Data Warehousing 33 .
convert. transform.Production Data Data from the various operational systems ± on different hardware platforms ± by different database systems and operating systems ± from many vertical applications No conformance of data among the various operational systems The significant and disturbing characteristic of production data is disparity ± Standardize. and integrate the disparate data 2011-03-31 Data Warehousing 34 .
customer profiles. and sometimes even departmental database Add additional complexity to the process of transforming and integrating the data ± Determine strategies for collecting data from spreadsheets ± Find ways of taking data from textual documents ± Tie into departmental databases to gather pertinent data from these sources 2011-03-31 Data Warehousing 35 . documents.Internal Data Data from users¶ ³private´ spreadsheets.
Archived Data Periodically take the old data and store it in archived files in an operational system Many different methods of archiving ± A separate archival database ± Flat files on disk storage ± Tape cartridges or microfilm and even off-site A data warehouse keeps historical snapshots of data ± Look into your archived data sets ± Useful for discerning patterns and analyzing trends 2011-03-31 Data Warehousing 36 .
data from outside sources do not conform to your formats 2011-03-31 Data Warehousing 37 .External Data Data from external sources for information that most executives use ± Statistics relating to their industry produced by external agencies ± Market share data of competitors ± Standard values of financial indicators for their business To spot industry trends and compare performance against other organizations Usually.
Data Staging Component Three major functions need to be performed for getting the data ready ± extract the data ± transform the data ± and then load the data into the data warehouse storage ETT: ± ± ± 2011-03-31 (Extraction) (Transformation) (Transportation) Data Warehousing 38 .
combine. deduplicate. convert. change.Data Staging Provide a place and an area with a set of functions to clean. and prepare source data for storage and use in the data warehouse 2011-03-31 Data Warehousing 39 .
± or a data-staging relational database. ± or a combination of both 2011-03-31 Data Warehousing 40 .Data Extraction Deal with numerous data sources Tools for data extraction ± Purchasing outside tools ± Developing in-house programs Extract the source data into ± a group of flat files.
Data Transformation Perform a number of individual tasks ± ± ± ± ± ± Clean Standardization Combine Purging and separating out Sorting and merging Assignment of surrogate keys Results: a collection of integrated data that is cleaned. standardized. and summarized 2011-03-31 Data Warehousing 41 .
Data Loading Two distinct groups of tasks ± The initial loading of the data into the data warehouse ± Refresh cycles Extract the changes to the source data Transform the data revisions And feed the incremental data revisions on an ongoing basis Figure 2-7 2011-03-31 Data Warehousing 42 .
2011-03-31 Data Warehousing 43 .
Data Storage Component A separate repository ± To keep large volume of historical data for analysis ± To keep the data in structures suitable for analysis The data warehouses are ³read-only´ data repositories ± The data is stable and it represents snapshots at specified periods The database in a data warehouse must be open ± Must be open to different tools ± RDBMSs or MDDBs 2011-03-31 Data Warehousing 44 .
internet. intranet. complex queries. statistical analysis. the business analysts. and the power users Different methods of information delivery ± Ad hoc reports.Information Delivery Component Who are the users? ± The novices. data-mining applications Information delivery mechanism ± Online. EIS feed. the casual users. e-mail 2011-03-31 Data Warehousing 45 . multidimensional analysis.
2011-03-31 Data Warehousing 46 .
but much more than a data dictionary (Later. in a separate section) 2011-03-31 Data Warehousing 47 .Metadata Component The data about the data in the data warehouse Similar to a data dictionary.
Management and Control Component Sit on top of all the other components ± Coordinate the services and activities ± Control the data transformation and the data transfer into the data warehouse storage ± Moderate the information delivery to the users ± Monitor the movements of data into the staging area and from there into the data warehouse storage The metadata is the source of information for the management module 2011-03-31 Data Warehousing 48 .
METADATA IN THE DATA WAREHOUSE The Yellow Pages ± A directory with data about the institutions Types of Metadata ± Operational Metadata ± Extraction and Transformation Metadata ± End-user Metadata 2011-03-31 Data Warehousing 49 .
Operational Metadata Contain all of next information about the operational data sources ± Data for the data warehouse comes from several operational systems ± The data elements have various field lengths and data types ± You split records. and deal with multiple coding schemes and field lengths 2011-03-31 Data Warehousing 50 . combine parts of records from different source files.
Extraction and Transformation Metadata Contain data about the extraction of data from the source systems ± the extraction frequencies ± extraction methods. ± and business rules for the data extractions 2011-03-31 Data Warehousing 51 .
End-User Metadata The navigational map ± Enable the end-users to find information ± Allow the end-users to use their own business terminology 2011-03-31 Data Warehousing 52 .
Special Significance of Metadat Act as the glue that connects all parts of the data warehouse Provide information about the contents and structures to the developers Open the door to the end-users and make the contents recognizable in their own terms 2011-03-31 Data Warehousing 53 .