
An Introduction to Data Warehousing

Use or disclosure of data contained on this sheet is subject to the restriction on the title page of this
In the Beginning, life was simple…

But…

Our information needs…

Kept growing. (The Spider web)

SOURCE: William H. Inmon
Purpose

To explore and discuss the purpose and principles of data warehousing.

A producer wants to know…

• Which are our lowest/highest margin customers?
• Who are my customers and what products are they buying?
• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• Which customers are most likely to go to the competition?
• What impact will new products/services have on revenue and margins?
Data, Data everywhere yet ...
• I can’t find the data I need
o data is scattered over the network
o many versions, subtle differences
• I can’t get the data I need
o need an expert to get the data
• I can’t understand the data I found
o available data poorly documented
• I can’t use the data I found
o results are unexpected
o data needs to be transformed from one form to another
What is a Data Warehouse?

A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context.

What are the users saying...

• Data should be
integrated across the
enterprise
• Summary data has a
real value to the
organization
• Historical data holds the
key to understanding
data over time
• What-if capabilities are
required

What is Data Warehousing?

A process of transforming data into information and making it available to users in a timely enough manner to make a difference

Figure: Data → Information

Data Warehousing -- It is a process

• Technique for assembling and managing data from various sources for the purpose of answering business questions, thus enabling decisions that were not previously possible
• A decision support database maintained separately from the organization’s operational database

Data Warehouse

• A data warehouse is a
o subject-oriented
o integrated
o time-varying
o non-volatile
collection of data that is used primarily in
organizational decision making.
-- Bill Inmon, Building the Data Warehouse 1996

Briefing Contents

Data Warehouse?

• Definition: A data warehouse is the data repository of an enterprise. It is generally used for research and decision support.
• By comparison: an OLTP (online transaction processing) or operational system is used to deal with the everyday running of one aspect of an enterprise.
• OLTP systems are usually designed independently of each other and it is difficult for them to share information.
Why Data Warehouse?

Scenario 1

ABC Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.

Scenario 1: ABC Pvt Ltd.

Figure: the Sales Manager needs "Sales per item type per branch for first quarter" from the separate Mumbai, Delhi, Chennai and Bangalore systems.

Solution 1: ABC Pvt Ltd.

• Extract sales information from each database.
• Store the information in a common repository at a single site.
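
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the branch names come from the scenario, but the item types, the amounts, the `sales` table layout and the function names are all hypothetical, and an in-memory SQLite database stands in for the common repository.

```python
import sqlite3

# Hypothetical per-branch extracts: (item_type, quarter, amount).
# The figures are invented purely for illustration.
BRANCH_SALES = {
    "Mumbai": [("Widgets", "Q1", 120.0), ("Widgets", "Q1", 30.0), ("Gadgets", "Q1", 80.0)],
    "Delhi": [("Widgets", "Q1", 95.0), ("Widgets", "Q2", 40.0)],
    "Chennai": [("Gadgets", "Q1", 60.0)],
    "Bangalore": [("Widgets", "Q1", 70.0)],
}


def build_repository(branch_sales):
    """Load every branch extract into one table at a single site."""
    repo = sqlite3.connect(":memory:")
    repo.execute(
        "CREATE TABLE sales (branch TEXT, item_type TEXT, quarter TEXT, amount REAL)"
    )
    for branch, rows in branch_sales.items():
        repo.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            [(branch, item, quarter, amount) for item, quarter, amount in rows],
        )
    repo.commit()
    return repo


def quarterly_report(repo, quarter):
    """Sales per item type per branch for one quarter."""
    return repo.execute(
        "SELECT branch, item_type, SUM(amount) FROM sales "
        "WHERE quarter = ? GROUP BY branch, item_type "
        "ORDER BY branch, item_type",
        (quarter,),
    ).fetchall()
```

Once the data sits in one place, the Sales Manager's report becomes a single query, for example `quarterly_report(build_repository(BRANCH_SALES), "Q1")`.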

Solution 1: ABC Pvt Ltd.

Figure: Mumbai, Delhi, Chennai and Bangalore → Data Warehouse → Query & Analysis tools → Report → Sales Manager

Scenario 2

One Stop Shopping Super Market has a huge operational database. Whenever executives want a report, the OLTP system becomes slow and data entry operators have to wait for some time.

Scenario 2: One Stop Shopping

Figure: Data Entry Operators wait on the Operational Database while it produces a Report for Management.

Solution 2

• Extract data needed for analysis from the operational database.
• Store it in the warehouse.
• Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis.
• The warehouse will contain data with a historical perspective.
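
A periodic refresh of this kind might look like the following sketch. It assumes, purely for illustration, an `orders` table whose `id` only ever increases, so "rows added since the last refresh" can be found with a simple high-water mark; the table names, columns and functions are hypothetical, and updates to existing rows are not handled.

```python
import sqlite3


def make_operational_db():
    """Stand-in for the OLTP database used by the data entry operators."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, amount REAL)")
    return db


def make_warehouse():
    """Separate store that keeps history and serves the reports."""
    wh = sqlite3.connect(":memory:")
    wh.execute(
        "CREATE TABLE order_history "
        "(id INTEGER PRIMARY KEY, item TEXT, amount REAL, loaded_on TEXT)"
    )
    return wh


def refresh(operational, warehouse, load_date):
    """Copy rows added since the last refresh; return how many were loaded."""
    last_id = warehouse.execute(
        "SELECT COALESCE(MAX(id), 0) FROM order_history"
    ).fetchone()[0]
    new_rows = operational.execute(
        "SELECT id, item, amount FROM orders WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    warehouse.executemany(
        "INSERT INTO order_history VALUES (?, ?, ?, ?)",
        [(row_id, item, amount, load_date) for row_id, item, amount in new_rows],
    )
    warehouse.commit()
    return len(new_rows)
```

Because `refresh` only reads a narrow slice of the operational table, reporting load moves to the warehouse, and rows later purged from the operational system remain available there for historical analysis.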

Solution 2

Figure: Data Entry Operators → Transaction → Operational database → Extract data → Data Warehouse → Report → Manager

Scenario 3

Cakes & Cookies is a small, new company. The President of the company wants the company to grow. He needs information so that he can make correct decisions.

Solution 3

• Improve the quality of data before loading it into the warehouse.
• Perform data cleaning and transformation before loading the data.
• Use query analysis tools to support ad hoc queries.
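
As a rough illustration of cleaning before loading, the sketch below normalises a customer name and an amount and sets aside any record that fails a check. The field names (`customer`, `amount`) and the three rules are assumptions made for the example, not rules taken from the slides.

```python
def clean_record(raw):
    """Return (clean_row, problems) for one raw sales record (a dict of strings)."""
    problems = []

    # Collapse stray whitespace and use consistent capitalisation.
    customer = " ".join(raw.get("customer", "").split()).title()
    if not customer:
        problems.append("missing customer")

    # Accept amounts written with thousands separators, e.g. "1,250.50".
    try:
        amount = float(str(raw.get("amount", "")).replace(",", ""))
    except ValueError:
        amount = None
        problems.append("amount is not a number")
    if amount is not None and amount < 0:
        problems.append("negative amount")

    return {"customer": customer, "amount": amount}, problems


def split_clean_and_rejected(raw_records):
    """Keep clean rows for loading; keep rejected rows with their reasons for review."""
    clean, rejected = [], []
    for raw in raw_records:
        row, problems = clean_record(raw)
        if problems:
            rejected.append((raw, problems))
        else:
            clean.append(row)
    return clean, rejected
```

Only the `clean` list would be loaded into the warehouse; the `rejected` list records why each bad record was held back so it can be corrected at the source.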

Solution 3

Figure: Data Warehouse → Query and Analysis tool → President, shown with a chart of sales over time and the outcomes Expansion and Improvement.

Why Do We Need Data Warehouses?

• Consolidation of information resources
• Improved query performance
• Separate research and decision support functions from the operational systems
• Foundation for data mining, data visualization, advanced reporting and OLAP tools

Need for Data Warehousing

• Industry has a huge amount of operational data.
• Knowledge workers want to turn this data into useful information.
• They use this information to support strategic decision making.

Need for Data Warehousing (contd.)

• It is a platform for consolidated historical data for analysis.
• It stores data of good quality so that knowledge workers can make correct decisions.

Need for Data Warehousing (contd.)

• From a business perspective
o it is the latest marketing weapon
o it helps to keep customers by learning more about their needs
o it is a valuable tool in today’s competitive, fast-evolving world

What Is a Data Warehouse Used for?

• Knowledge discovery
o Making consolidated reports
o Finding relationships and correlations
o Data mining
o Examples
▪ Banks identifying credit risks
▪ Insurance companies searching for fraud
▪ Medical research

How Do Data Warehouses Differ From
Operational Systems?

• Goals
• Structure
• Size
• Performance optimization
• Technologies used

Comparison Chart of Database Types

Data warehouse | Operational system
Subject oriented | Transaction oriented
Large (hundreds of GB up to several TB) | Small (MB up to several GB)
Historic data | Current data
De-normalized table structure (few tables, many columns per table) | Normalized table structure (many tables, few columns per table)
Batch updates | Continuous updates
Usually very complex queries | Simple to complex queries

Design Differences

Operational System | Data Warehouse
ER Diagram | Star Schema

Supporting a Complete Solution

Operational System - Data Entry
Data Warehouse - Data Retrieval

Data Warehouses, Data Marts, and
Operational Data Stores
• Data Warehouse – The queryable source
of data in the enterprise. It is comprised of
the union of all of its constituent data
marts.
• Data Mart – A logical subset of the
complete data warehouse. Often viewed
as a restriction of the data warehouse to a
single business process or to a group of
related business processes targeted
toward a particular business group.
• Operational Data Store (ODS) – A point of integration for operational systems that developed independently of each other. Since an ODS supports day to day ...
SOURCE: Ralph Kimball
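
The idea of a data mart as a "logical subset" of the warehouse can be illustrated with a database view. This is only one possible reading of the definition above, sketched with SQLite; the `facts` table, its columns, the sample rows and the `sales_mart` view name are all hypothetical.

```python
import sqlite3


def build_warehouse():
    """Create a tiny warehouse table plus a view restricted to one business process."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE facts (process TEXT, region TEXT, amount REAL)")
    wh.executemany(
        "INSERT INTO facts VALUES (?, ?, ?)",
        [
            ("sales", "North", 100.0),
            ("sales", "South", 40.0),
            ("shipping", "North", 12.0),
            ("billing", "South", 7.5),
        ],
    )
    # The mart holds no data of its own: it is a restriction of the warehouse
    # to the rows that belong to the sales process.
    wh.execute(
        "CREATE VIEW sales_mart AS "
        "SELECT region, amount FROM facts WHERE process = 'sales'"
    )
    wh.commit()
    return wh
```

A sales analyst would then query `sales_mart` without needing to know about the shipping or billing rows held in the full warehouse.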
Data Mining works with Warehouse Data

• Data Warehousing provides the Enterprise with a memory
• Data Mining provides the Enterprise with intelligence

We want to know ...

• Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
• Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer?
• If I raise the price of my product by Rs. 2, what is the effect on my ROI?
• If I offer only 2,500 airline miles as an incentive to purchase rather than 5,000, how many lost responses will result?
• If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues?
• Which of my customers are likely to be the most loyal?

Data Mining helps extract such information
Application Areas

Industry | Application
Finance | Credit Card Analysis
Insurance | Claims, Fraud Analysis
Telecommunication | Call record analysis
Transport | Logistics management
Consumer goods | Promotion analysis
Data Service providers | Value added data
Utilities | Power usage analysis

Data Mining in Use

• The US Government uses Data Mining to track fraud
• A Supermarket becomes an information
broker
• Basketball teams use it to track game
strategy
• Cross Selling
• Warranty Claims Routing
• Holding on to Good Customers
• Weeding out Bad Customers

Data Warehousing Tools

• Data Warehouse
o SQL Server 2000 DTS
o Oracle 8i Warehouse Builder
• OLAP tools
o SQL Server Analysis Services
o Oracle Express Server
• Reporting tools
o MS Excel Pivot Chart
o VB Applications

RDBMS used for OLTP

• Database Systems have been used traditionally for OLTP
o clerical data processing tasks
o detailed, up to date data
o structured repetitive tasks
o read/update a few records
o isolation, recovery and integrity are
critical
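
A typical OLTP unit of work, touching a few records with integrity enforced and a clean rollback on failure, might look like the sketch below. The `stock` table, its `CHECK` constraint and the `sell` function are hypothetical names chosen for the example; SQLite is used only because it ships with Python.

```python
import sqlite3


def make_inventory():
    """Small operational table; the CHECK constraint guards integrity."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE stock "
        "(item TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))"
    )
    db.executemany("INSERT INTO stock VALUES (?, ?)", [("pen", 10), ("ink", 2)])
    db.commit()
    return db


def sell(db, items):
    """Decrement stock for each (item, qty) pair as one all-or-nothing transaction."""
    try:
        # As a context manager the connection commits on success and
        # rolls back if an exception escapes the block.
        with db:
            for item, qty in items:
                db.execute(
                    "UPDATE stock SET qty = qty - ? WHERE item = ?", (qty, item)
                )
        return True
    except sqlite3.IntegrityError:
        return False
```

If any line of a sale would drive stock below zero, the whole sale is rolled back, so a half-applied transaction is never visible to other users.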

OLTP vs Data Warehouse

OLTP | Warehouse (DSS)
Application Oriented | Subject Oriented
Used to run business | Used to analyze business
Detailed data | Summarized and refined
Current up to date | Snapshot data
Isolated Data | Integrated Data
Repetitive access | Ad-hoc access
Clerical User | Knowledge User (Manager)

OLTP vs Data Warehouse

OLTP | Data Warehouse
Transaction throughput is the performance metric | Query throughput is the performance metric
Thousands of users | Hundreds of users
Managed in entirety | Managed by subsets

To summarize ...

• OLTP Systems are used to “run” a business
• The Data Warehouse helps to “optimize” the business

Briefing Contents

Building a Data Warehouse

Data Warehouse
Lifecycle
• Analysis
• Design
• Import data
• Install front-end tools
• Test and deploy

Stage 1: Analysis

• Identify:
o Target Questions
o Data needs
o Timeliness of data
o Granularity
• Create an enterprise-level data dictionary
• Dimensional analysis
o Identify facts and dimensions

Stage 2: Design

• Star schema
• Data Transformation
• Aggregates
• Pre-calculated Values
• HW/SW Architecture

Slide callout: Dimensional Modeling
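
To make the "Aggregates" and "Pre-calculated Values" items concrete, here is a minimal sketch that rolls daily detail rows up to one total per month and product, so that monthly reports need not rescan the detail. The row layout (ISO date string, product, amount) and the function name are assumptions made for the example.

```python
from collections import defaultdict


def build_monthly_aggregate(detail_rows):
    """Roll daily detail rows (date 'YYYY-MM-DD', product, amount) up to month level."""
    totals = defaultdict(float)
    for date, product, amount in detail_rows:
        month = date[:7]  # 'YYYY-MM' prefix of the ISO date
        totals[(month, product)] += amount
    return dict(totals)
```

In a real warehouse the result would normally be stored as a summary table and rebuilt or extended at each load; the trade-off is extra storage and load time in exchange for faster queries.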

Dimensional Modeling

• Fact Table – The primary table in a dimensional model that is meant to contain measurements of the business.
• Dimension Table – One of a set of companion tables to a fact table. Most dimension tables contain many textual attributes that are the basis for constraining and grouping within data warehouse queries.
SOURCE: Ralph Kimball
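
A small star schema makes these two definitions easier to picture. The sketch below is illustrative only: the table names (`fact_sales`, `dim_product`, `dim_date`), their columns and the sample rows are invented, and SQLite stands in for a warehouse database.

```python
import sqlite3


def build_star_schema():
    """One fact table holding measurements, surrounded by two dimension tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key INTEGER REFERENCES dim_date(date_key),
            units INTEGER,
            revenue REAL
        );
        INSERT INTO dim_product VALUES (1, 'Sponge cake', 'Cakes'), (2, 'Oat cookie', 'Cookies');
        INSERT INTO dim_date VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
        INSERT INTO fact_sales VALUES
            (1, 1, 10, 50.0), (2, 1, 20, 40.0), (1, 2, 5, 25.0), (2, 2, 8, 16.0);
        """
    )
    return db


def revenue_by_category(db, year):
    """Constrain on one dimension attribute (year) and group by two others."""
    return db.execute(
        "SELECT p.category, d.quarter, SUM(f.revenue) "
        "FROM fact_sales f "
        "JOIN dim_product p ON p.product_key = f.product_key "
        "JOIN dim_date d ON d.date_key = f.date_key "
        "WHERE d.year = ? "
        "GROUP BY p.category, d.quarter "
        "ORDER BY p.category, d.quarter",
        (year,),
    ).fetchall()
```

The numeric measurements (`units`, `revenue`) live in the fact table, while the textual attributes used in `WHERE` and `GROUP BY` come from the dimension tables, which matches the roles described in the two definitions.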
Stage 3: Import Data

• Identify data sources
• Extract the needed data from existing systems to a data staging area
• Transform and Clean the data
o Resolve data type conflicts
o Resolve naming and key conflicts
o Remove, correct, or flag bad data
o Conform Dimensions
• Load the data into the warehouse
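
Resolving naming, key and data type conflicts between sources could be sketched as follows. Everything named here is hypothetical: the two source systems (`billing`, `web_shop`), their field names, and the `KEY_MAP` that assigns one shared warehouse key to the same customer known under different local keys.

```python
# Hypothetical source layouts: each system names the same fields differently.
COLUMN_MAPS = {
    "billing": {"cust_no": "customer_id", "cust_nm": "customer_name", "amt": "amount"},
    "web_shop": {"CustomerID": "customer_id", "Name": "customer_name", "Total": "amount"},
}

# Hypothetical key map: both local identifiers refer to the same customer,
# so both resolve to warehouse key 1.
KEY_MAP = {
    ("billing", "17"): 1,
    ("web_shop", "C-17"): 1,
}


def conform(source, record, key_map):
    """Rename fields to shared names, unify types, and map the local key to a warehouse key."""
    renamed = {COLUMN_MAPS[source][field]: value for field, value in record.items()}
    local_key = (source, str(renamed["customer_id"]))
    return {
        "customer_key": key_map[local_key],
        "customer_name": renamed["customer_name"].strip(),
        "amount": float(renamed["amount"]),
    }
```

After this step, rows from either system share one set of column names, one customer key and one numeric type for amounts, so they can be loaded into the same warehouse tables.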
Importing Data Into the Warehouse

Figure: Operational Systems (source systems)

Stage 4: Install Front-end Tools

• Reporting tools
• Data mining tools
• GIS
• Etc.

Stage 5: Test and Deploy

• Usability tests
• Software installation
• User training
• Performance tweaking based on usage

Special Concerns

• Time and expense
• Managing the complexity
• Update procedures and maintenance
• Changes to source systems over time
• Changes to data needs over time

Briefing Contents

Goals of the STORET Central Warehouse

• Improved performance and faster data retrieval
• Ability to produce larger reports
• Ability to provide more data query
options
• Streamlined application navigation

Old Web Application Flow

Central Warehouse Application Flow

Figure: Search Criteria Selection → Report Size Feedback/Report Customization → Report Generation

Web Application Demo

STORET Central Warehouse: http://epa.gov/storet/dw_home.html

STORET Central Warehouse – Potential Future Enhancements

• More query functionality
• Additional report types
• Web Services
• Additional source systems?

Data Warehouse Components

SOURCE: Ralph Kimball
Data Warehouse Components – Detailed

SOURCE: Ralph Kimball
Briefing Contents
