
DATA WAREHOUSE

Sara Qaid Al-Gwill Silwan Ahmed Al-Eriani Noha Ali Al-Gharsi


Content
Definition of Data warehouse
Why data warehouse
Data warehouse Architectures
Goals of data warehouse
Benefits of data warehouse
Disadvantages of data warehouse
Definition
 A data warehouse is a repository of an
organization's electronically stored data. Data
warehouses are designed to facilitate reporting and
analysis.
 This definition of the data warehouse focuses on data
storage. However, the means to retrieve and analyze data,
to extract, transform and load data, and to manage
the data dictionary are also considered essential
components of a data warehousing system.

Why warehouse?
 For analysis and decision support, end users require
access to data captured and stored in an
organization's operational or production systems.
 This data is stored in multiple formats, on multiple
platforms, in multiple data structures, with multiple
names, and probably using different business rules.
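The integration problem described above can be sketched in a few lines of Python. This is only an illustration: the two source systems, their field names, and the `from_billing` and `from_claims` helpers are hypothetical and do not come from the slides.

```python
from datetime import date, datetime

# Hypothetical records from two operational systems. They describe the same
# kind of fact with different field names, date formats, and units.
billing_rows = [{"cust_no": "17", "amt_cents": 125000, "dt": "2024-03-01"}]
claims_rows = [{"CustomerID": 17, "Amount": "980.50", "Date": "01/03/2024"}]


def from_billing(row: dict) -> dict:
    """Map a billing record to the common warehouse layout."""
    return {
        "customer_id": int(row["cust_no"]),
        "amount": row["amt_cents"] / 100,  # cents -> currency units
        "event_date": date.fromisoformat(row["dt"]),
        "source": "billing",
    }


def from_claims(row: dict) -> dict:
    """Map a claims record to the same layout (day/month/year dates)."""
    return {
        "customer_id": int(row["CustomerID"]),
        "amount": float(row["Amount"]),
        "event_date": datetime.strptime(row["Date"], "%d/%m/%Y").date(),
        "source": "claims",
    }


# One integrated list with a single naming and typing convention.
integrated = [from_billing(r) for r in billing_rows]
integrated += [from_claims(r) for r in claims_rows]
```

Once both sources share one layout, a question such as "what happened for customer 17 on 1 March 2024" can be answered without knowing which system recorded each row.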
The Need for Data Warehousing
 Who are my customers and what products are they buying?
 What is the most effective distribution channel?
 What product promotions have the biggest impact on revenue?
 Which customers are most likely to go to the competition?
 What impact will new products/services have on revenue and margins?
Why do we want a central data store?
[Figure: A Good Reason for a Central Data Store. Product lines shown: Life, LTD, Voluntary, Non-Medical, Benefits, Individual, Financial, Disability. Business functions shown: Account Management, Underwriting, Customer Service, Sales/Marketing, Financial Analysis.]



Data Warehouse Goals
 Speed up reporting
 Reduce reporting load on transactional systems
 Make institutional data more user-friendly and accessible
 Integrate data from different source systems
 Enable 'point-in-time' analysis and trending over time
 Help identify and resolve data integrity issues, either in the
warehouse itself or in the source systems that collect the data
Types of Databases
(classified by use)
 Transactional (DBMS):
 A transaction is a collection of operations that performs a
single logical function in a database application
 Creates, reads, and updates database records
 Supports a company's day-to-day operations
 Data warehouse:
 Stores data used to generate the information required to make
strategic decisions
 Often used to store historical data
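The idea of a transaction as "a collection of operations that performs a single logical function" can be illustrated with Python's built-in `sqlite3` module. The `account` table and the `transfer` function are hypothetical examples, not part of the slides. The sketch assumes that a connection used as a context manager commits on success and rolls back when an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit: both updates or neither."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit made by the
        # first UPDATE has been rolled back as well.
        return False


def balances(conn):
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM account"))
```

A warehouse workload is different in kind: it mostly reads large volumes of historical rows rather than applying many small updates like this one.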
Transaction System vs. Data Warehouse
♦Transaction System
oSupports day-to-day operational processes
oContains raw, detailed data that has not been refined or cleansed
oVolatile -- data changes from day-to-day, with frequent updates
oTechnical issues drive the data structure and system design
oDisparate data structures, physical locations, query types, etc.
oUsers rely on technical analysts for reporting needs
♦Data Warehouse
oSupports management analysis and decision-making processes
oContains summarized, refined, and cleansed information
oNon-volatile -- provides a data "snapshot"; adjustments are not permitted, or are limited
oBusiness analysis requirements drive the data structure and system design
oIntegrated, consistent information on a single technology platform
Data Warehouse Characteristics

§ Key Characteristics of a Data Warehouse

§ Subject-oriented
§ Integrated
§ Time-variant
§ Non-volatile
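Two of these characteristics, time-variant and non-volatile, can be shown with a small sketch. It assumes a warehouse that appends a dated snapshot on every load and never overwrites earlier rows. The region names, the counts, and both helper functions are hypothetical.

```python
from datetime import date

# Hypothetical warehouse rows. Each load appends a dated snapshot and never
# changes earlier rows (non-volatile). The snapshot date is what makes the
# data time-variant.
snapshots = []


def load_snapshot(as_of, policy_counts):
    """Append one row per region for the given snapshot date."""
    for region, count in policy_counts.items():
        snapshots.append({"as_of": as_of, "region": region, "policies": count})


def policies_as_of(region, when):
    """Return the count from the latest snapshot taken on or before `when`."""
    rows = [r for r in snapshots if r["region"] == region and r["as_of"] <= when]
    if not rows:
        return None  # no snapshot existed yet at that point in time
    return max(rows, key=lambda r: r["as_of"])["policies"]


load_snapshot(date(2024, 1, 31), {"north": 120, "south": 80})
load_snapshot(date(2024, 2, 29), {"north": 135, "south": 78})
```

Because old snapshots are kept, the same question can be asked for any past date, which is what makes point-in-time analysis and trending possible.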
•Example for an insurance company:
[Figure: two columns. Applications Area: Auto and Fire Policy Processing Systems, Commercial and Life Insurance Systems, Accounting System, Claims Processing System, Billing System. Data Warehouse: Customer Data, Policy, Losses, Premium.]
DATA MODELING FOR DW

►Dimensional Modeling
§ A data warehouse is based on a multidimensional data model, which
views data in the form of a data cube.
§ Dimensional modeling is a specific discipline for modeling data that is
an alternative to entity-relationship (E/R) modeling; it is usually
employed in data warehouses and OLAP.


Data warehouse and data base

Product by Region (with "all" totals):

Product   Reg1  Reg2  Reg3  Reg4   all
P12         20    14    20     8    62
P13          2     7    10    35    54
P14          5    28     8    10    48
all         27    49    35    53   164
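A cross-tabulation with "all" totals is a two-dimensional view of a data cube: every cell is an aggregate over one product and one region, and the "all" entries are roll-ups along a dimension. The sketch below shows one way to compute such roll-ups. The fact rows are hypothetical illustration data, not the figures from the slide, and `cube` is an assumed helper name.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, units sold).
facts = [
    ("P1", "Reg1", 10),
    ("P1", "Reg2", 4),
    ("P2", "Reg1", 3),
    ("P2", "Reg2", 7),
    ("P1", "Reg1", 5),
]


def cube(rows):
    """Sum units for every (product, region) cell and every 'all' roll-up."""
    totals = defaultdict(int)
    for product, region, units in rows:
        # Each fact contributes to its own cell, to the product total,
        # to the region total, and to the grand total.
        for p in (product, "all"):
            for r in (region, "all"):
                totals[(p, r)] += units
    return dict(totals)


cells = cube(facts)
```

Looking up `cells[("P1", "all")]` gives the total for product P1 across regions, and `cells[("all", "all")]` gives the grand total.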
DATA WAREHOUSE ARCHITECTURE

oData Warehouse Architecture (DWA)
§is a way of representing the overall structure of data,
communication, processing and presentation that exists for
end-user computing within the enterprise.
DATA WAREHOUSE ARCHITECTURE
§ General Architecture for Data Warehousing
 Source systems
 Extraction, (Clean), Transformation, & Load (ETL)
 Metadata repository
 Data marts
 Operational feedback (OLAP)
 Data Mining tools
 End users (business)
Data warehouse process

Data goes through a series of steps as it is moved to the warehouse:

Extract programs

Write Natural programs to extract data
from the mainframe database

Verify data

Verify accuracy and consistency of data --
ensure “data legibility”

Create tables

Create “normalized” tables on the warehouse --
eliminate data redundancy (e.g., an address appears in
one place only)



Load tables

Load warehouse tables with extracted data

Refresh data

Establish a schedule to refresh the data.
Frequency depends on the volatility of the
data. Some data is refreshed weekly, some once
per semester.
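The five steps above (extract, verify, create tables, load, refresh) can be sketched end to end with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions: the `student` table, the sample rows, and the verification rule are hypothetical, and an in-memory list stands in for the extract that the slides describe as Natural programs reading a mainframe database.

```python
import sqlite3

# Step 1 (extract): hypothetical rows pulled from a source system.
extracted = [
    {"student_id": "1001", "name": "A. Example", "city": "Aden"},
    {"student_id": "1002", "name": "B. Example", "city": ""},  # incomplete
    {"student_id": "1001", "name": "A. Example", "city": "Aden"},  # duplicate
]


def verify(rows):
    """Step 2: keep rows with a numeric id and no empty fields."""
    return [r for r in rows if r["student_id"].isdigit() and all(r.values())]


def create_tables(conn):
    """Step 3: the primary key ensures each student is stored only once."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS student ("
        "student_id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL, "
        "city TEXT NOT NULL)"
    )


def load(conn, rows):
    """Steps 4 and 5: load rows; running it again refreshes existing rows."""
    with conn:  # one transaction per load
        conn.executemany(
            "INSERT OR REPLACE INTO student "
            "VALUES (:student_id, :name, :city)",
            rows,
        )


conn = sqlite3.connect(":memory:")
create_tables(conn)
load(conn, verify(extracted))
```

A scheduled refresh would call `load(conn, verify(new_extract))` again at whatever interval suits the data, for example weekly or once per semester.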

Data Warehouse Metadata
 The metadata repository is a key data warehouse
component. The metadata repository includes both
technical and business metadata.
 technical metadata:
 covers details of acquisition processing, storage structures,
data descriptions, warehouse operations and maintenance,
and access support functionality.
 business metadata:
 includes the relevant business rules and organizational details
supporting the warehouse.
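The split between technical and business metadata can be pictured as two kinds of records kept per warehouse object. The sketch below is an assumption about how such a repository might be organized; every class name, field name, and sample value is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalMetadata:
    """How the data is acquired and stored."""
    source_system: str
    target_table: str
    column_types: dict
    refresh_schedule: str


@dataclass
class BusinessMetadata:
    """What the data means to the organization."""
    description: str
    owner: str
    business_rules: list = field(default_factory=list)


@dataclass
class MetadataRepository:
    technical: dict = field(default_factory=dict)
    business: dict = field(default_factory=dict)

    def register(self, name, tech, biz):
        """Store both kinds of metadata under one warehouse object name."""
        self.technical[name] = tech
        self.business[name] = biz

    def describe(self, name):
        """Combine both views into a one-line description."""
        t, b = self.technical[name], self.business[name]
        return (
            f"{name}: {b.description} "
            f"(from {t.source_system}, refreshed {t.refresh_schedule})"
        )


repo = MetadataRepository()
repo.register(
    "premium",
    TechnicalMetadata("billing", "fact_premium", {"amount": "REAL"}, "weekly"),
    BusinessMetadata("Premium billed per policy", "Finance",
                     ["Amounts are in local currency"]),
)
```

Calling `repo.describe("premium")` then returns a short description that draws on both the technical and the business record.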
Costs of Data Warehousing

 1. Time spent in careful analysis of measurable needs
 2. Design and implementation effort
 3. Hardware costs
 4. Software costs
 5. On-going support and maintenance
 6. Resulting re-engineering effort
Benefits

 Has a subject area orientation
 Integrates data from multiple, diverse sources
 Allows for analysis of data over time
 Adds ad hoc reporting and enquiry
 Provides analysis capabilities to decision makers
 Relieves the development burden on IT
 Provides improved performance for complex analytical
queries
 Relieves processing burden on transaction-oriented
databases
 Allows for a continuous planning process

Disadvantages

 There are also disadvantages to using a data warehouse.
Some of them are:
 Data warehouses are not the optimal environment for
unstructured data.
 Because data must be extracted, transformed and loaded
into the warehouse, there is an element of latency in
data warehouse data.
 Over their life, data warehouses can have high costs.
Maintenance costs are high.
 Data warehouses can get outdated relatively quickly. There
is a cost of delivering suboptimal information to the
organization.
 There is often a fine line between data warehouses and
operational systems. Duplicate, expensive functionality
may be developed. Or functionality may be developed in
the data warehouse that, in retrospect, should have been
developed in the operational systems, and vice versa.
Example: health care
Business Issues:

How many hospitals to open?

Where to open hospitals?

Are services consistent across the population?

Thank You For Your Attention