
SS ZG515: Data Warehousing

DATA WAREHOUSE COMPONENTS & ARCHITECTURE
Lecture Note # 02

The data in a data warehouse comes from the operational systems of the organization as well as from external sources. These are collectively referred to as source systems. The data extracted from source systems is stored in an area called the data staging area, where it is cleaned, transformed, combined, and deduplicated to prepare it for use in the data warehouse. The data staging area is generally a collection of machines where simple activities like sorting and sequential processing take place. The data staging area does not provide any query or presentation services. As soon as a system provides query or presentation services, it is categorized as a presentation server. A presentation server is the target machine on which the data loaded from the data staging area is organized and stored for direct querying by end users, report writers and other applications.

The three different kinds of systems that are required for a data warehouse are:
1. Source Systems
2. Data Staging Area
3. Presentation Servers

The data travels from source systems to presentation servers via the data staging area. The entire process is popularly known as ETL (extract, transform, and load) or ETT (extract, transform, and transfer). Oracle's ETL tool is called Oracle Warehouse Builder (OWB) and MS SQL Server's ETL tool is called Data Transformation Services (DTS).

A typical architecture of a data warehouse is shown below:

[Figure: typical data warehouse architecture. Components shown: Operational Source, Load Manager, Warehouse Manager, Query Manager, Meta Data, Detailed Data, Lightly Summarized Data, Highly Summarized Data, Archive / Back Up, End User Access Tools.]

Dr. Navneet Goyal, BITS, Pilani


Each component and the tasks it performs are explained below:

1. OPERATIONAL DATA
The data for the data warehouse is supplied from the following sources:
(i) Mainframe systems holding data in the traditional network and hierarchical formats.
(ii) Relational DBMSs such as Oracle and Informix.
(iii) In addition to this internal data, operational data also includes external data obtained from commercial databases and from databases associated with suppliers and customers.

2. LOAD MANAGER
The load manager performs all the operations associated with extracting data and loading it into the data warehouse. These operations include simple transformations of the data to prepare it for entry into the warehouse. The size and complexity of this component vary between data warehouses, and it may be constructed using a combination of vendor data loading tools and custom-built programs.

3. WAREHOUSE MANAGER


The warehouse manager performs all the operations associated with the management of data in the warehouse. This component is built using vendor data management tools and custom-built programs. The operations performed by the warehouse manager include:
(i) Analysis of data to ensure consistency
(ii) Transformation and merging of the source data from temporary storage into data warehouse tables
(iii) Creation of indexes and views on the base tables
(iv) Denormalization
(v) Generation of aggregations
(vi) Backing up and archiving of data
In certain situations, the warehouse manager also generates query profiles to determine which indexes and aggregations are appropriate.

4. QUERY MANAGER
The query manager performs all operations associated with the management of user queries. This component is usually constructed using vendor end-user access tools, data warehousing monitoring tools, database facilities and custom-built programs. The complexity of a query manager is determined by the facilities provided by the end-user access tools and the database.

5. DETAILED DATA
This area of the warehouse stores all the detailed data in the database schema. In most cases detailed data is not stored online but is aggregated to the next level of detail. However, detailed data is added regularly to the warehouse to supplement the aggregated data.

6. LIGHTLY AND HIGHLY SUMMARIZED DATA
This area of the data warehouse stores all the predefined lightly and highly summarized (aggregated) data generated by the warehouse manager. This area of the warehouse is transient, as it is subject to change on an ongoing basis in order to respond to changing query profiles. The purpose of the summarized information is to speed up query performance. The summarized data is updated continuously as new data is loaded into the warehouse.

7. ARCHIVE AND BACK UP DATA
This area of the warehouse stores detailed and summarized data for the purpose of archiving and back up. The data is transferred to storage archives such as magnetic tapes or optical disks.

8. META DATA


The data warehouse also stores all the meta data (data about data) definitions used by all processes in the warehouse. Meta data is used for a variety of purposes, including:
(i) The extraction and loading process: meta data is used to map data sources to a common view of information within the warehouse.
(ii) The warehouse management process: meta data is used to automate the production of summary tables.
(iii) The query management process: meta data is used to direct a query to the most appropriate data source.
The structure of meta data differs in each process, because the purpose is different. More about meta data will be discussed in later Lecture Notes.

9. END-USER ACCESS TOOLS
The principal purpose of a data warehouse is to provide information to business managers for strategic decision-making. These users interact with the warehouse using end-user access tools. Examples of end-user access tools are:
(i) Reporting and Query Tools
(ii) Application Development Tools
(iii) Executive Information Systems Tools
(iv) Online Analytical Processing Tools
(v) Data Mining Tools

THE ETL (EXTRACT, TRANSFORM, LOAD) PROCESS
In this section we discuss the four major processes of the data warehouse. They are extract (take data from the operational systems and bring it to the data warehouse), transform (convert the data into the internal format and structure of the data warehouse), cleanse (make sure the data is of sufficient quality to be used for decision making) and load (put the cleansed data into the data warehouse). The four processes from extraction through loading are often referred to collectively as Data Staging.

EXTRACT
Some of the data elements in the operational database can reasonably be expected to be useful in decision making, but others are of less value for that purpose. For this reason, it is necessary to extract the relevant data from the operational database before bringing it into the data warehouse.

Many commercial tools are available to help with the extraction process. Data Junction is one such commercial product. The user of one of these tools typically has an easy-to-use windowed interface by which to specify the following:


(i) Which files and tables are to be accessed in the source database?
(ii) Which fields are to be extracted from them? This is often done internally by an SQL SELECT statement.
(iii) What are the extracted fields to be called in the resulting database?
(iv) What is the target machine and database format of the output?
(v) On what schedule should the extraction process be repeated?
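As an illustration of points (i) to (iii), the sketch below shows how an extraction specification might be turned into an SQL SELECT statement. It is a minimal sketch, not the behaviour of any particular commercial tool: the `employee` table, its columns, the output names, and the `EXTRACT_SPEC` layout are all assumptions made for the example, and an in-memory SQLite database stands in for the operational source. The target machine (iv) and the schedule (v) are not modelled.

```python
import sqlite3

# Hypothetical operational database, built in memory so the sketch is runnable.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE employee (emp_id INTEGER, ename TEXT, sal REAL, notes TEXT)"
)
source.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", 52000.0, "internal note"), (2, "Ravi", 48000.0, None)],
)

# Extraction specification (assumed layout) covering points (i)-(iii):
# which table, which fields, and what each field is called in the output.
EXTRACT_SPEC = {
    "table": "employee",
    "fields": {"emp_id": "employee_id", "ename": "employee_name", "sal": "salary"},
}


def build_select(spec):
    """Turn the specification into the SQL SELECT statement that does the extraction."""
    columns = ", ".join(f"{src} AS {dst}" for src, dst in spec["fields"].items())
    return f"SELECT {columns} FROM {spec['table']}"


def extract(conn, spec):
    """Run the SELECT and return one dict per extracted row."""
    cursor = conn.execute(build_select(spec))
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


extracted_rows = extract(source, EXTRACT_SPEC)
print(build_select(EXTRACT_SPEC))
print(extracted_rows)
```

The `notes` column is deliberately left out of the specification, which mirrors the idea that only the data elements useful for decision making are extracted.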

TRANSFORM
Operational databases can be developed according to any set of priorities, and these priorities keep changing with the requirements. Therefore, those who develop a data warehouse based on these databases are typically faced with inconsistency among their data sources. The transformation process deals with rectifying any such inconsistency.

One of the most common transformation issues is "Attribute Naming Inconsistency". It is common for a given data element to be referred to by different data names in different databases. Employee Name may be EMP_NAME in one database and ENAME in another. Thus one set of data names is picked and used consistently in the data warehouse.

Once all the data elements have the right names, they must be converted to common formats. The conversion may encompass the following:
(i) Characters must be converted from ASCII to EBCDIC or vice versa.
(ii) Mixed-case text may be converted to all uppercase for consistency.
(iii) Numerical data must be converted into a common format.
(iv) Data format has to be standardized.
(v) Measurements may have to be converted (e.g. Rs/$).
(vi) Coded data (Male/Female, M/F) must be converted into a common format.
All these transformation activities can be automated, and many commercial products are available to perform the tasks. DATAMAPPER from Applied Database Technologies is one such comprehensive tool.

CLEANSING
Information quality is the key consideration in determining the value of the information. The developer of the data warehouse is not usually in a position to change the quality of its underlying historic data, though a data warehousing project can put the spotlight on data quality issues and lead to improvements for the future. It is, therefore, usually necessary to go through the data entered into the data warehouse and make it as error free as possible. This process is known as Data Cleansing.
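Returning briefly to the TRANSFORM step: the attribute renaming and format conversions listed there can be sketched as follows. This is a minimal sketch under stated assumptions. The names EMP_NAME and ENAME come from the text; the warehouse names (`employee_name`, `gender`), the `SEX`/`GENDER` source attributes, and the `NAME_MAP` and `GENDER_CODES` tables are hypothetical and chosen only for illustration. Only conversions (ii) and (vi) are shown.

```python
# Hypothetical mapping from each source's attribute names to the one set of
# names chosen for the warehouse (EMP_NAME and ENAME come from the text).
NAME_MAP = {
    "EMP_NAME": "employee_name",
    "ENAME": "employee_name",
    "SEX": "gender",
    "GENDER": "gender",
}

# Hypothetical common coding for gender values.
GENDER_CODES = {"MALE": "M", "M": "M", "FEMALE": "F", "F": "F"}


def transform(record):
    """Return a copy of one source record using warehouse names and formats."""
    result = {}
    for source_name, value in record.items():
        # Attribute naming: map the source name to the warehouse name.
        target_name = NAME_MAP.get(source_name.upper(), source_name.lower())
        # Mixed-case text is converted to uppercase for consistency.
        if isinstance(value, str):
            value = value.strip().upper()
        result[target_name] = value
    # Coded data is converted into a common format; unknown codes pass through.
    if "gender" in result:
        result["gender"] = GENDER_CODES.get(result["gender"], result["gender"])
    return result


# Two records describing the same employee in two differently designed sources.
row_a = transform({"EMP_NAME": "Asha Rao", "SEX": "Female"})
row_b = transform({"ENAME": "asha rao", "GENDER": "f"})
print(row_a)
print(row_b)
```

After transformation both records use the same attribute names and the same coding, so they can be compared and merged in the warehouse.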


Data Cleansing must deal with many types of possible errors. These include missing data and incorrect data at one source, and inconsistent data and conflicting data when two or more sources are involved. There are several algorithms for cleaning the data, which will be discussed in the coming lecture notes.

LOADING
Loading often implies physical movement of the data from the computer(s) storing the source database(s) to the computer that will store the data warehouse database, assuming it is different. This takes place immediately after the extraction phase. The most common channel for data movement is a high-speed communication link.
Ex: Oracle Warehouse Builder is Oracle's tool that provides the features needed to perform the ETL task on an Oracle data warehouse.
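The cleansing checks and the load step described above can be illustrated together. The sketch below is not one of the cleansing algorithms promised for later lecture notes; it only flags two of the error types named in the text (missing data at one source, conflicting data between two sources) and then loads the rows that pass. The `payroll` and `hr` sources, the `employee_dim` table, and the helper names are assumptions, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3


def find_missing(rows, required):
    """Return the rows that lack a value for any required field."""
    return [r for r in rows if any(r.get(f) in (None, "") for f in required)]


def find_conflicts(rows_a, rows_b, key, field):
    """Return keys whose value for `field` differs between two sources."""
    b_by_key = {r[key]: r for r in rows_b}
    return [
        r[key]
        for r in rows_a
        if r[key] in b_by_key and r[field] != b_by_key[r[key]][field]
    ]


# Hypothetical rows from two source systems.
payroll = [
    {"employee_id": 1, "employee_name": "ASHA RAO"},
    {"employee_id": 2, "employee_name": ""},
    {"employee_id": 3, "employee_name": "RAVI KUMAR"},
]
hr = [
    {"employee_id": 1, "employee_name": "ASHA RAO"},
    {"employee_id": 3, "employee_name": "RAVI K"},
]

missing = find_missing(payroll, ["employee_name"])
conflicts = find_conflicts(payroll, hr, "employee_id", "employee_name")

# Only rows that pass both checks are loaded; the rest would need review.
rejected = {r["employee_id"] for r in missing} | set(conflicts)
clean = [r for r in payroll if r["employee_id"] not in rejected]

warehouse = sqlite3.connect(":memory:")  # stands in for the warehouse database
warehouse.execute(
    "CREATE TABLE employee_dim (employee_id INTEGER PRIMARY KEY, employee_name TEXT)"
)
with warehouse:  # commit the whole batch, or roll it back on error
    warehouse.executemany(
        "INSERT INTO employee_dim VALUES (:employee_id, :employee_name)", clean
    )

loaded = warehouse.execute(
    "SELECT employee_id, employee_name FROM employee_dim"
).fetchall()
print(missing, conflicts, loaded)
```

Rejecting a row outright is only one possible policy; a real project might instead repair the value or route the row to a review queue.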
