
DATA WAREHOUSING

Definition
 “A data warehouse is a collection of computerized
data that is organized to most optimally support
reporting and analysis activity”.
 A data warehouse is a repository of an
organization's data, where the organization's
informational assets are stored and managed to
support reporting, analysis and decision-making,
as well as the optimization of operational
processes.

Evolution Stages Of Data Warehouse
 Offline Operational Databases - Data
warehouses in this initial stage are developed
by simply copying the database of an
operational system to an off-line server where
the processing load of reporting does not
impact on the operational system's
performance.
 Offline Data Warehouse - Data warehouses in
this stage of evolution are updated on a regular
time cycle (usually daily, weekly or monthly)
from the operational systems and the data is
stored in an integrated reporting-oriented data
structure.

Evolution Stages Of Data Warehouse (Cont.)
 Real Time Data Warehouse - Data
warehouses at this stage are updated on a
transaction or event basis, every time an
operational system performs a transaction
(e.g. an order, a delivery or a booking).
 Integrated Data Warehouse - Data
warehouses at this stage are used to
generate activity or transactions that are
passed back into the operational systems
for use in the daily activity of the
organization.

Components of a data warehouse
 Data Sources: Refers to any electronic repository of
information that contains data of interest for
management use or analytics.
 Data Transformation: This layer receives data from
the data sources, cleans and standardizes it, and loads
it into the data repository. This is often called
"staging" data as data often passes through a
temporary database whilst it is being transformed.
This transformation can be performed either by
manually written code or by a dedicated type of
software called an ETL (Extract, Transform & Load)
tool.

Data Transformation
 Activities occurring during data transformation
 Comparing data from different systems to improve data
quality (e.g. date of birth for a customer may be blank
in one system but contain valid data in a second
system. In this instance, the data warehouse would
retain the date of birth field from the second system).
 Standardizing data and codes (e.g. if one system refers
to "Male" and "Female", but a second refers to only
"M" and "F", these code sets would need to be
standardized).
 Integrating data from different systems (e.g. if one
system keeps orders and another stores customers,
these data elements need to be linked).
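The three activities above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems (`crm_customers`, `billing_customers`), the `orders` list, and every field name and value are hypothetical, and a real ETL tool would handle far more cases.

```python
# Hypothetical records from two source systems; all names and values are illustrative.
crm_customers = [
    {"customer_id": 1, "gender": "Male", "date_of_birth": None},
    {"customer_id": 2, "gender": "Female", "date_of_birth": "1985-07-14"},
]
billing_customers = [
    {"customer_id": 1, "gender": "M", "date_of_birth": "1979-03-02"},
    {"customer_id": 2, "gender": "F", "date_of_birth": None},
]
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 250},
    {"order_id": 101, "customer_id": 2, "amount": 90},
]

# Standardizing codes: map every known spelling to a single code set.
GENDER_CODES = {"male": "M", "m": "M", "female": "F", "f": "F"}


def standardize_gender(value):
    """Return the standard code, or None if the value is not recognized."""
    return GENDER_CODES.get(value.strip().lower())


def merge_customers(primary, secondary):
    """Comparing systems: fill a blank date of birth from the second system."""
    secondary_by_id = {row["customer_id"]: row for row in secondary}
    merged = {}
    for row in primary:
        other = secondary_by_id.get(row["customer_id"], {})
        merged[row["customer_id"]] = {
            "customer_id": row["customer_id"],
            "gender": standardize_gender(row["gender"]),
            "date_of_birth": row["date_of_birth"] or other.get("date_of_birth"),
        }
    return merged


def link_orders(order_rows, customers_by_id):
    """Integrating systems: attach the customer record to each order."""
    return [
        {**order, "customer": customers_by_id[order["customer_id"]]}
        for order in order_rows
    ]


customers = merge_customers(crm_customers, billing_customers)
linked_orders = link_orders(orders, customers)
```

Here customer 1 ends up with the date of birth held by the second system, and both customers carry the standardized "M"/"F" codes before their orders are linked.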


Components of a data warehouse
 Data Warehouse: The data warehouse need not be
a relational database, as it must be organized
to hold information in a structure that best
supports not only query and reporting, but also
advanced analysis techniques such as data mining.
Most data warehouses hold information for at
least 1 year, and some retain as much as half a
century of data, depending on the
business/operations data retention requirement.
As a result, these databases can become very large.

Components of a data warehouse

 Reporting: The data in the data warehouse
must be available to the organization's
staff if the data warehouse is to be useful.
There are a very large number of software
applications that perform this function, or
reporting can be custom-developed.

Reporting
 Types of reporting tools include:
 Business Intelligence tools: These are software applications
that simplify the process of development and production of
business reports based on data warehouse data.
 Executive Information Systems (known more widely as
business dashboards): These are software applications that
are used to display complex business metrics and
information in a graphical way to allow rapid understanding.
 OLAP Tools: OLAP (On-Line Analytical Processing) tools
form data into logical multi-dimensional structures and
allow users to select which dimensions to view data by.
 Data Mining: Data mining tools are software that allow users
to perform detailed mathematical and statistical calculations
on detailed data warehouse data to detect trends, identify
patterns and analyze data.
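The idea of letting users "select which dimensions to view data by" can be illustrated with a small roll-up function. This is a minimal sketch, not how any particular OLAP product works; the `sales` rows, the dimension names and the `roll_up` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales rows: three dimensions (month, region, product) and one measure (units).
sales = [
    {"month": "Jan", "region": "North", "product": "Pants", "units": 10},
    {"month": "Jan", "region": "South", "product": "Shirts", "units": 5},
    {"month": "Feb", "region": "North", "product": "Pants", "units": 7},
    {"month": "Feb", "region": "North", "product": "Shirts", "units": 3},
]


def roll_up(rows, dimensions, measure):
    """Sum `measure` for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[name] for name in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# The same data viewed by different dimension choices.
by_month = roll_up(sales, ["month"], "units")
# {("Jan",): 15, ("Feb",): 10}
by_region_product = roll_up(sales, ["region", "product"], "units")
# {("North", "Pants"): 17, ("South", "Shirts"): 5, ("North", "Shirts"): 3}
```

Changing the `dimensions` argument is the equivalent of a user picking a different view of the same underlying data.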

Components of a data warehouse
 Metadata, or "data about data", is used to inform
operators and users of the data warehouse about
its status and the information held within the
data warehouse.
Examples of data warehouse metadata include:
 table and column names,
 their detailed descriptions,
 their connection to business-meaningful
names,
 the most recent data load date,
 the business meaning of a data item,
 the number of users that are currently
logged in.
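As a rough illustration of the metadata items listed above, a catalog entry for one column might be modelled as follows. The `ColumnMetadata` class, the `describe` helper, and the table, column and date values are all hypothetical; real warehouses usually keep this information in a dedicated metadata repository.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ColumnMetadata:
    """One catalog entry; every field name here is illustrative."""
    table_name: str
    column_name: str
    business_name: str      # connection to a business-meaningful name
    description: str        # detailed description / business meaning
    last_load_date: date    # most recent data load date


catalog = [
    ColumnMetadata(
        table_name="customer",
        column_name="dob",
        business_name="Date of birth",
        description="Customer's date of birth as recorded in the source system",
        last_load_date=date(2024, 1, 31),  # made-up example value
    ),
]


def describe(entries, table_name, column_name):
    """Return a readable summary for a column, or None if it is not cataloged."""
    for entry in entries:
        if entry.table_name == table_name and entry.column_name == column_name:
            return (
                f"{entry.business_name}: {entry.description} "
                f"(last loaded {entry.last_load_date.isoformat()})"
            )
    return None


summary = describe(catalog, "customer", "dob")
```

A user who sees the cryptic column name `dob` can look up its business name, meaning and freshness without reading the load scripts.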

Components of a data warehouse
 Operations: Data warehouse operations comprise
the processes of loading, manipulating and
extracting data from the data warehouse.
Operations also cover user management,
security, capacity management and related
functions.

Organizing Data in a Data Warehouse
 The dimensional approach
 Transaction data is partitioned into either measured "facts",
which are generally numeric data that capture specific
values, or "dimensions", which contain the reference
information that gives each transaction its context.
 For example, a sales transaction would be broken up into
facts such as the number of products ordered and the price
paid, and dimensions such as date, customer, product,
geographical location and salesperson.
 The normalized approach
 Data in the data warehouse is stored in third normal form.
 Tables are then grouped together by subject areas that reflect
the general definition of the data (customer, product,
finance, etc.).
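To make the fact/dimension split concrete, the sketch below loads sales transactions into a fact list whose rows hold numeric measures plus small integer keys pointing at dimension values. It is a simplified illustration: the `Dimension` class, the field names and the sample transactions are assumptions, and a real dimensional model would store richer attributes in each dimension table.

```python
class Dimension:
    """Maps each distinct reference value to a small integer key."""

    def __init__(self):
        self.keys = {}

    def key_for(self, value):
        # Assign the next key the first time a value is seen.
        if value not in self.keys:
            self.keys[value] = len(self.keys) + 1
        return self.keys[value]


# Hypothetical field names, following the sales example in the text.
DIMENSION_FIELDS = ("date", "customer", "product", "location", "salesperson")
FACT_FIELDS = ("quantity", "price_paid")

dimensions = {name: Dimension() for name in DIMENSION_FIELDS}
fact_rows = []


def load_transaction(txn):
    """Split one transaction into dimension keys and numeric facts."""
    row = {f"{name}_key": dimensions[name].key_for(txn[name]) for name in DIMENSION_FIELDS}
    row.update({name: txn[name] for name in FACT_FIELDS})
    fact_rows.append(row)
    return row


first = load_transaction({
    "date": "2024-01-15", "customer": "Acme", "product": "Pants",
    "location": "North", "salesperson": "Lee", "quantity": 3, "price_paid": 2400,
})
second = load_transaction({
    "date": "2024-01-16", "customer": "Acme", "product": "Shirts",
    "location": "North", "salesperson": "Kim", "quantity": 2, "price_paid": 1200,
})
```

Both fact rows share the same `customer_key` because the customer is the same, while the measures (`quantity`, `price_paid`) stay on the fact row where they can be summed.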

The dimensional approach

Month   Units   Revenue   Cost
Jan.      280      1560   1200
Feb.      200      1260    980
March     350      2490   2200
April     600      4560   3980
May       550      4320   3750

The normalized approach

Product Code   Description   Cost
P001           Pants          800
P002           Shirts         600
P003           T-Shirts       550

Product Code   Quantity
P001                200
P001                100
P003                120
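The two normalized tables above can be joined on the product code to answer reporting questions. The sketch below uses Python's built-in `sqlite3` module with the values from the slide; the table names `product` and `order_line` and the column names are assumptions, since the slide does not name them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table and column names are hypothetical; the rows come from the slide.
conn.executescript("""
CREATE TABLE product (
    product_code TEXT PRIMARY KEY,
    description  TEXT,
    cost         INTEGER
);
CREATE TABLE order_line (
    line_id      INTEGER PRIMARY KEY,
    product_code TEXT REFERENCES product(product_code),
    quantity     INTEGER
);
""")

conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("P001", "Pants", 800), ("P002", "Shirts", 600), ("P003", "T-Shirts", 550)],
)
conn.executemany(
    "INSERT INTO order_line (product_code, quantity) VALUES (?, ?)",
    [("P001", 200), ("P001", 100), ("P003", 120)],
)

# Product details are stored once and combined with quantities at query time.
totals = conn.execute("""
    SELECT p.product_code,
           p.description,
           SUM(o.quantity)          AS total_quantity,
           SUM(o.quantity * p.cost) AS total_cost
    FROM order_line AS o
    JOIN product AS p ON p.product_code = o.product_code
    GROUP BY p.product_code, p.description
    ORDER BY p.product_code
""").fetchall()
# [("P001", "Pants", 300, 240000), ("P003", "T-Shirts", 120, 66000)]
```

Because the description and cost are stored only in the product table, a price correction touches one row, at the price of needing a join for every report.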

Advantages of using data warehouse
 Enhances end-user access to a wide variety of data.
 Business decision makers can get trend reports, e.g.
the item that sold the most in a particular area/country
over the last 2 years. This may be helpful for future
investment in a particular item.
 Increases data consistency.
 Increases productivity and decreases computing costs.
 Is able to combine data from different sources in one
place.
 Provides an infrastructure that could support changes to
data and replication of the changed data back into the
operational systems.

Concerns in using data warehouse
 Extracting, cleaning and loading data could be
time consuming.
 Data warehousing project scope might increase.
 Problems with compatibility with systems already
in place e.g. transaction processing system.
 Providing training to end-users, who end up not
using the data warehouse.
 Security could develop into a serious issue,
especially if the data warehouse is web
accessible.