Definition of DWH

Bill Inmon, father of Data Warehousing:
“A Data Warehouse is a subject oriented, integrated, nonvolatile, and time variant collection of data in support of management’s decisions.”

Sean Kelly: The data in the data warehouse is:
- Separate
- Available
- Integrated
- Time stamped
- Subject oriented
- Nonvolatile
- Accessible

Key defining features of the data warehouse

Subject-Oriented Data

Data is stored by subjects, not by applications. Data is stored by business subjects, which differ from enterprise to enterprise.

Example: For a manufacturing company, sales, shipments, and inventory are critical business subjects. For a retail store, sales at the check-out counter is a critical subject.

Figure 2-1 distinguishes between how data is stored in operational systems and in the data warehouse. In the operational systems shown, data for each application is organized separately by application: order processing, consumer loans, customer billing, accounts receivable, claims processing, and savings accounts.

For example, Claims is a critical business subject for an insurance company. Claims under automobile insurance policies are processed in the Auto Insurance application. Claims data for automobile insurance is organized in that application. Similarly, claims data for workers’ compensation insurance is organized in the Workers’ Comp Insurance application. But in the data warehouse for an insurance company, claims data are organized around the subject of claims and not by individual applications of Auto Insurance and Workers’ Comp.

In a data warehouse, there is no application flavor. The data in a data warehouse cut across applications.

Integrated Data

Data comes from several operational systems.

Source data are in different databases, files, and data segments. These are disparate applications. The file layouts, character code representations, and field naming conventions all can be different. In addition to internal operational systems, data also comes from outside sources.

Example: Figure 2-2 illustrates a simple process of data integration for a banking institution.

Before data from various disparate sources is stored in a data warehouse, we have to remove the inconsistencies. Standardize the data elements and make sure of the meanings of data names in each source application. Before moving the data into the data warehouse, you have to go through a process of transformation, consolidation, and integration of the source data.

Here are some of the items that would need standardization:
- Naming conventions
- Codes
- Data attributes
- Measurements

Time-Variant Data

The data in the DWH is meant for analysis and decision making. A data warehouse has to contain historical data, not just current values. Data is stored as snapshots over past and current periods. Every data structure in the data warehouse contains the time element. You will find historical snapshots of the operational data in the data warehouse. This is quite significant for both the design and the implementation phases.

For example, in a data warehouse containing units of sale, the quantity stored in each file record or table row relates to a specific time element. Depending on the level of the details in the data warehouse, the sales quantity in a record may relate to a specific date, week, month, or quarter.

The time-variant nature of the data in a data warehouse:
- Allows for analysis of the past
- Relates information to the present
- Enables forecasts for the future

Nonvolatile Data

The data in the data warehouse is not intended to run the day-to-day business. When you want to process the next order received from a customer, you do not look into the data warehouse to find the current stock status. The operational order entry application is meant for that purpose. In the data warehouse, you keep the extracted stock status data as snapshots over time. You do not update the data warehouse every time you process a single order.
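The snapshot behaviour described in these notes can be illustrated with a minimal Python sketch. It assumes a hypothetical `StockSnapshot` row layout and a hypothetical `SnapshotStore` class; neither comes from the source, and a real warehouse would use database tables and a scheduled load job rather than in-memory lists. The point is only that each load appends dated rows and never changes earlier ones.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StockSnapshot:
    """One warehouse row: a stock level as of a given date (hypothetical layout)."""
    snapshot_date: date
    product: str
    quantity_on_hand: int


class SnapshotStore:
    """Append-only store: each load adds rows, nothing is updated in place."""

    def __init__(self) -> None:
        self._rows: list[StockSnapshot] = []

    def load(self, snapshot_date: date, operational_stock: dict[str, int]) -> None:
        # Copy the current operational values in as new, dated snapshot rows.
        for product, quantity in operational_stock.items():
            self._rows.append(StockSnapshot(snapshot_date, product, quantity))

    def history(self, product: str) -> list[tuple[date, int]]:
        # Return every stored snapshot for one product, oldest first.
        rows = [row for row in self._rows if row.product == product]
        rows.sort(key=lambda row: row.snapshot_date)
        return [(row.snapshot_date, row.quantity_on_hand) for row in rows]


# The operational system holds only the current value and changes with each order.
operational_stock = {"widget": 100}

store = SnapshotStore()
store.load(date(2024, 1, 1), operational_stock)

operational_stock["widget"] -= 30  # orders processed during the day
store.load(date(2024, 1, 2), operational_stock)

widget_history = store.history("widget")
```

After the two loads, `widget_history` holds both dated values, while `operational_stock` holds only the latest one. That is the difference between the time-variant warehouse and the operational system.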

Data from the operational systems are moved into the data warehouse at specific intervals. Depending on the requirements of the business, these data movements take place twice a day, once a day, once a week, or once in two weeks.

Data Granularity

When a user queries the data warehouse for analysis, he or she usually starts by looking at summary data. The user may start with total sale units of a product in an entire region. Then the user may want to look at the breakdown by states in the region. The next step may be the examination of sale units by the next level of individual stores. Frequently, the analysis begins at a high level and moves down to lower levels of detail.

In a data warehouse, therefore, you find it efficient to keep data summarized at different levels. Depending on the query, you can then go to the particular level of detail and satisfy the query. Data granularity in a data warehouse refers to the level of detail. The lower the level of detail, the finer the data granularity. If you want to keep data in the lowest level of detail, you have to store a lot of data in the data warehouse. You will have to decide on the granularity levels based on the data types and the expected system performance for queries. Figure 2-4 shows examples of data granularity in a typical data warehouse.

Difference between Data warehouse and Data mart

Data warehouse
- Corporate/Enterprise-wide
- Union of all data marts
- Data received from staging area
- Queries on presentation resource
- Structure for corporate view of data
- Organized on E-R model

Data mart
- Departmental
- A single business process
- Star-join (facts & dimensions)
- Technology optimal for data access and analysis
- Structure to suit the departmental view of data

Two design methodologies for building a Data warehouse

Top Down Approach

Advantages
- Enterprise view of data

- Inherently architected
- Single, central storage of data
- Centralized rules and control
- Quick results if implemented with iterations

Disadvantages
- Takes longer to build
- Risk of failure
- Needs high level of cross-functional skills
- High outlay

This is the big-picture approach: you build a big, enterprise-wide data warehouse and do not have a collection of fragmented information. If you do not have experienced professionals on your team, this approach could be dangerous. It is also difficult to sell this approach to senior management and sponsors.

Bottom Up Approach

Advantages
- Faster and easier implementation of pieces
- Return on investment
- Less risk of failure
- Inherently incremental
- Allows project team to learn and grow

Disadvantages
- Each data mart has a narrow view of data
- Permeates redundant data in every data mart
- Perpetuates inconsistent and irreconcilable data
- Proliferates unmanageable interfaces
- Data fragmentation

In this approach you build your departmental data marts one by one. You would set a priority scheme to determine which data marts you must build first. Each independent data mart will be blind to the overall requirements of the entire organization.
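As a small illustration of the data granularity discussion earlier in these notes, the sketch below rolls the same detail rows up to region, state, and store level. The sample rows, the `LEVELS` tuple, and the `summarize` function are all hypothetical and chosen only for illustration. A real warehouse would usually keep such summaries as stored tables rather than computing them on the fly.

```python
from collections import defaultdict

# Hypothetical detail rows at the finest grain: (region, state, store, units sold).
sales = [
    ("East", "NY", "Store 1", 120),
    ("East", "NY", "Store 2", 80),
    ("East", "NJ", "Store 3", 50),
    ("West", "CA", "Store 4", 200),
]

# Levels of detail, from coarsest to finest.
LEVELS = ("region", "state", "store")


def summarize(rows, level):
    """Total the units sold at the given level of detail."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for *keys, units in rows:
        # Keep only as many key columns as the requested level needs.
        totals[tuple(keys[:depth])] += units
    return dict(totals)


by_region = summarize(sales, "region")  # coarsest: one total per region
by_state = summarize(sales, "state")    # breakdown by states within a region
by_store = summarize(sales, "store")    # finest: one total per individual store
```

The number of rows grows at each step from `by_region` to `by_store`. This is the trade-off the notes describe: finer granularity answers more detailed queries but means more data to store.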
