
Good Evening to All
III & II
IT
Characteristics of Data-warehousing
Goals Of Data-Warehousing
Architecture Of Data-Warehousing
The Phases Involved
• Accessibility: Getting the required information whenever it is needed
• Timeliness: Time taken to submit the report
• Formats: Formats like spreadsheets, graphs, maps etc.
• Integrity: Accuracy and reliability of data
A Data Warehouse is a database of data gathered from many systems
and intended to support management reporting and decision making.
The process of gathering this data is called Data Warehousing.
• Subject oriented: The Data Warehouse is organized around the subjects
of corporate data, e.g. sales, finance, customers etc.

• Integrated: It integrates data from different database systems
(heterogeneous data) into a single homogeneous form.
• Non-volatile: The Data Warehouse is a read-only database. Its data
cannot be overwritten or deleted, so it is non-volatile.
• Time variant: Data has chronological importance, i.e. historical
data is maintained and analysed to support future decisions.
• To provide a reliable, single, integrated source of
information

• To give end users access to their data without a reliance on
reports produced by Information System (IS)
department.

• To allow users to analyze corporate data, build predictive models
and improve Business Intelligence.
• Four data structures for the storage of data are:
1. DATA STORE 1, called Online Transaction Processing (OLTP)
2. DATA STORE 2, called Integration Layer or Data Warehouse
3. DATA STORE 3, called Data Mart or High Performance Query
Structures (HPQS)
4. DATA STORE 4, called Online Analytical Processing (OLAP)

• Three data flow paths between the four data structures are:
1. FLOW 1, from DATA STORE 1 to DATA STORE 2
2. FLOW 2, from DATA STORE 2 to DATA STORE 3
3. FLOW 3, from DATA STORE 3 to DATA STORE 4
The architecture is divided into three phases:
1. Extract Phase
2. Transform Phase
3. Loading Phase
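The three phases can be sketched as three small functions chained together. This is only a minimal illustration, not a real warehouse loader: the row layout (`dept`, `amount`), the function names and the sample departments are all assumptions made for the example.

```python
def extract(sources):
    """Extract phase: pull rows out of every source system (DATA STORE 1)."""
    return [row for source in sources for row in source]


def transform(rows):
    """Transform phase: bring all rows into one common form (DATA STORE 2)."""
    return [
        {"dept": r["dept"].strip().upper(), "amount": float(r["amount"])}
        for r in rows
    ]


def load(rows):
    """Loading phase: build a small query structure (DATA STORE 3).

    Here the structure is simply the total amount per department.
    """
    totals = {}
    for r in rows:
        totals[r["dept"]] = totals.get(r["dept"], 0.0) + r["amount"]
    return totals


def run_pipeline(sources):
    """Chain the three phases: extract, then transform, then load."""
    return load(transform(extract(sources)))


# Hypothetical source systems with slightly different conventions.
finance = [{"dept": "finance ", "amount": "100.5"}]
sales = [{"dept": "Sales", "amount": 20}, {"dept": "sales", "amount": "30"}]

totals = run_pipeline([finance, sales])
print(totals)
```

Each function corresponds to one phase, so the output of one phase is the input of the next, in the same order as the data flows between the data stores.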

Transfer data
Data Store 1---------------Data Store 2
There are different mechanisms for extracting data out of its
sources. This is called Data Extraction.
The art of determining which records to extract from the
source system is frequently called Change Data Capture.
Some general techniques are used to recognize changes to
source database tables. They are:
• Timestamps: The lucky among us extract data from
systems that timestamp records whenever
they are inserted or updated.
• Triggers: Every time a record is inserted into,
updated in or deleted from a source table,
a trigger writes a corresponding
message to a log file.
• File compares: One way to identify changes in your data is to
compare the file as it appears today to a
copy of how it appeared when you last
loaded the warehouse.
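Two of these techniques are easy to sketch in a few lines. The example below is illustrative only: it assumes each source row carries an `updated_at` timestamp, and that a snapshot of a file can be held as a dictionary keyed by record id. The function names and sample data are hypothetical.

```python
from datetime import datetime


def changed_since(rows, last_load):
    """Timestamp technique: keep rows modified after the previous load."""
    return [r for r in rows if r["updated_at"] > last_load]


def compare_snapshots(previous, current):
    """File-compare technique: diff two snapshots keyed by record id."""
    inserted = [k for k in current if k not in previous]
    deleted = [k for k in previous if k not in current]
    updated = [
        k for k in current
        if k in previous and current[k] != previous[k]
    ]
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Timestamp technique on hypothetical rows.
rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 10)},
]
recent = changed_since(rows, last_load=datetime(2024, 1, 5))

# File-compare technique: the copy kept at the last load versus today's file.
yesterday = {1: "Asha,Hyderabad", 2: "Ravi,Chennai", 3: "Meena,Pune"}
today = {1: "Asha,Hyderabad", 2: "Ravi,Mumbai", 4: "John,Delhi"}
changes = compare_snapshots(yesterday, today)
print(recent, changes)
```

The timestamp approach only needs the time of the last load, while the file-compare approach needs a full copy of the previous file but works even when the source system records no timestamps.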
The Transform phase is where the extracted data is transformed into the required
form in DATA STORE 2. Some of the fundamental steps in the Transformation phase are:
1. Converting heterogeneous data to homogeneous data:
The data arriving in DATA STORE 2 comes from the different source
systems of DATA STORE 1, so it is heterogeneous and must be
converted to a single homogeneous form.
DATA STORE 2 is called the Integration Layer or Warehouse.
2. Adding surrogate keys:
For example, rather than using the customer number as
the key on the CUSTOMER table, you might use a
surrogate key that is simply a sequential number generated
by your warehouse load programs.
3. Removing dirty data:
Bad records can be handled by:
a. Ignoring them.
b. Rejecting bad records, but saving them in a separate file
for manual review.
c. Loading as much of the bad record as possible and pointing
out the errors for later correction.
4. Normalization:
A normalized database is like a flat file that is broken up into
smaller files or tables in order to store the data more efficiently.
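The first three steps above can be sketched together in one small function. This is a sketch under stated assumptions, not a production load program: the record fields (`source`, `customer_number`, `name`), the rule for what counts as a bad record, and the function name `transform_customers` are all hypothetical. It follows option (b) for dirty data, setting rejected records aside for manual review.

```python
from itertools import count


def transform_customers(records, key_counter=None):
    """Homogenize records, assign surrogate keys and set bad records aside."""
    if key_counter is None:
        key_counter = count(1)  # sequential surrogate keys: 1, 2, 3, ...
    key_map = {}                # (source, customer number) -> surrogate key
    clean, rejected = [], []
    for rec in records:
        name = (rec.get("name") or "").strip()
        number = rec.get("customer_number")
        if not name or number is None:
            rejected.append(rec)  # kept for manual review, not loaded
            continue
        natural_key = (rec["source"], str(number))
        if natural_key not in key_map:
            key_map[natural_key] = next(key_counter)
        clean.append({
            "customer_key": key_map[natural_key],  # surrogate key
            "customer_number": str(number),        # one common type
            "name": name.title(),                  # one common format
        })
    return clean, rejected


# Hypothetical records from two source systems with different conventions.
records = [
    {"source": "billing", "customer_number": 1001, "name": " asha rao "},
    {"source": "crm", "customer_number": "C-77", "name": "RAVI KUMAR"},
    {"source": "crm", "customer_number": "C-78", "name": ""},
]
clean, rejected = transform_customers(records)
print(clean, rejected)
```

Keeping the mapping from source key to surrogate key means the same source customer receives the same warehouse key every time it appears.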
Transformed data is sent to DATA STORE 3, which is called the DATA MART.
DEFINITION OF DATA MART:
Data Marts are databases that share many of the
features of data warehouses but are smaller in scope.

The Loading phase can use several schemas. Two of them are:

Star Schema: Data is maintained in one fact
table and multiple dimension tables.
Snowflake Schema: Data is maintained in a fact table whose
dimension tables are normalized.
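A star schema can be sketched with an in-memory SQLite database from Python's standard library. The table names, column names and sample rows below are assumptions chosen for illustration only.

```python
import sqlite3

# One fact table (fact_sales) surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    amount REAL
);
INSERT INTO dim_customer VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO fact_sales VALUES
    (1, 20240101, 100.0), (1, 20240201, 50.0), (2, 20240101, 75.0);
""")

# A typical report query: join the fact table to its dimensions and aggregate.
rows = conn.execute("""
SELECT c.name, d.year, SUM(f.amount)
FROM fact_sales AS f
JOIN dim_customer AS c ON c.customer_key = f.customer_key
JOIN dim_date AS d ON d.date_key = f.date_key
GROUP BY c.name, d.year
ORDER BY c.name
""").fetchall()
print(rows)
conn.close()
```

In a snowflake schema the dimension tables would be normalized further, for example by moving a customer's city into its own table that `dim_customer` references.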
DATA STORE 3 is also called High Performance Query
Structures [HPQS].

• DATA FLOW 3 is the transfer of data from the High Performance
Query Structures to the end user reporting applications,
DATA STORE 4.

• DATA STORE 4 is the data in the end user’s hands. This report in
users’ hands is the end of the information utility. It is, also, the
A centralized Data Warehouse Server is maintained at a
particular place. The transactions of all the Government
Departments are transferred to this centralized server.
The topology of the network corresponds to the
architecture of the Data Warehouse, as shown in the figure.
DWHS: Data Warehousing Server
OLPS: Online Analytical Processing System

• In the above example, data from three departments is
extracted, transformed and loaded into the centralized server [DWHS].
• Data Marts can answer most complex queries, and
report generation will be immediate.

This data can be checked further for corrections. Any incorrect
data found in the Data Warehouse can be reported to the
government.

Thus, Data Warehousing can take both
private and public sectors to a top level.