An Introduction to Data Warehousing

Use or disclosure of data contained on this sheet is subject to the restriction on the title page of this proposal or quotation.

In the beginning, life was simple…

But…

Our information needs…

…kept growing. (The spider web)

SOURCE: William H. Inmon

Purpose

To explore and discuss the purpose and principles of data warehousing.


A producer wants to know…

• Which are our lowest/highest margin customers?
• What is the most effective distribution channel?
• Who are my customers and what products are they buying?
• What product promotions have the biggest impact on revenue?
• What impact will new products/services have on revenue and margins?
• Which customers are most likely to go to the competition?

Data, data everywhere, yet...

• I can't find the data I need
  o data is scattered over the network
  o many versions, subtle differences
• I can't get the data I need
  o need an expert to get the data
• I can't understand the data I found
  o available data poorly documented
• I can't use the data I found
  o results are unexpected
  o data needs to be transformed from one form to another

What is a Data Warehouse?

A single, complete and consistent store of data obtained from a variety of different sources and made available to end users in a form they can understand and use in a business context.

What are the users saying...

• Data should be integrated across the enterprise
• Summary data has a real value to the organization
• Historical data holds the key to understanding data over time
• What-if capabilities are required

What is Data Warehousing?

A process of transforming data into information and making it available to users in a timely enough manner to make a difference.

[Figure: data is transformed into information]

Data Warehousing - It is a process

• A technique for assembling and managing data from various sources for the purpose of answering business questions. This makes possible decisions that were not previously possible.
• A decision support database maintained separately from the organization's operational database

Data Warehouse

• A data warehouse is a
  o subject-oriented
  o integrated
  o time-varying
  o non-volatile
  collection of data that is used primarily in organizational decision making.
  -- Bill Inmon, Building the Data Warehouse, 1996

Briefing Contents


Data Warehouse?

• Definition: A data warehouse is the data repository of an enterprise. It is generally used for research and decision support.
• By comparison: an OLTP (on-line transaction processing) or operational system deals with the everyday running of one aspect of an enterprise.
• OLTP systems are usually designed independently of each other, which makes it difficult for them to share information.

Why Data Warehouse?


Scenario 1

ABC Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.

Scenario 1: ABC Pvt Ltd.

[Figure: the Mumbai, Delhi, Chennai and Bangalore branch systems; the Sales Manager needs sales per item type per branch for the first quarter.]

Solution 1: ABC Pvt Ltd.

• Extract sales information from each database.
• Store the information in a common repository at a single site.
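
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not ABC Pvt Ltd's actual system: the `sales` table layout, the item types, the amounts and the 2024 dates are all made up, and in-memory SQLite databases stand in for the four branch systems.

```python
import sqlite3


def make_branch_db(rows):
    """Stand-in for one branch's operational database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (sale_date TEXT, item_type TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return db


def consolidate(branch_dbs):
    """Extract sales rows from every branch and store them in one repository."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE sales (branch TEXT, sale_date TEXT, item_type TEXT, amount REAL)"
    )
    for branch, db in branch_dbs.items():
        rows = db.execute("SELECT sale_date, item_type, amount FROM sales").fetchall()
        # Tag each extracted row with the branch it came from.
        warehouse.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            [(branch, *row) for row in rows],
        )
    warehouse.commit()
    return warehouse


def first_quarter_report(warehouse, year):
    """Sales per item type per branch for the first quarter of `year`."""
    return warehouse.execute(
        """
        SELECT branch, item_type, SUM(amount)
        FROM sales
        WHERE sale_date >= ? AND sale_date < ?
        GROUP BY branch, item_type
        ORDER BY branch, item_type
        """,
        (f"{year}-01-01", f"{year}-04-01"),
    ).fetchall()


# Made-up sample data for the four branches.
branch_dbs = {
    "Mumbai": make_branch_db([("2024-01-15", "widgets", 120.0), ("2024-02-03", "widgets", 80.0)]),
    "Delhi": make_branch_db([("2024-03-20", "gadgets", 50.0), ("2024-05-01", "gadgets", 70.0)]),
    "Chennai": make_branch_db([("2024-04-02", "gadgets", 99.0)]),
    "Bangalore": make_branch_db([("2024-01-10", "widgets", 30.0)]),
}
warehouse = consolidate(branch_dbs)
report = first_quarter_report(warehouse, 2024)
for branch, item_type, total in report:
    print(branch, item_type, total)
```

Once the rows sit in one place, the Sales Manager's report is a single query instead of four separate extracts that have to be merged by hand.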

Solution 1: ABC Pvt Ltd.

[Figure: data from the Mumbai, Delhi, Chennai and Bangalore systems flows into the data warehouse; the Sales Manager gets reports through query and analysis tools.]

Scenario 2

One Stop Shopping Super Market has a huge operational database. Whenever executives want a report, the OLTP system slows down and data entry operators have to wait.

Scenario 2: One Stop Shopping

[Figure: data entry operators wait on the operational database while management runs a report.]

Solution 2

• Extract the data needed for analysis from the operational database.
• Store it in the warehouse.
• Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis.
• The warehouse will contain data with a historical perspective.
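
One way to keep the warehouse refreshed is to copy only the rows added since the previous refresh. The sketch below assumes a hypothetical `transactions` table with an ever-increasing `txn_id`; the slides do not say how the refresh is done, so treat this high-water-mark approach as one option among several. The sample rows are made up.

```python
import sqlite3


def refresh(operational, warehouse):
    """Copy rows added since the last refresh; return how many were copied."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales_history "
        "(txn_id INTEGER PRIMARY KEY, sold_on TEXT, amount REAL)"
    )
    # High-water mark: the largest transaction id already in the warehouse.
    last_id = warehouse.execute(
        "SELECT COALESCE(MAX(txn_id), 0) FROM sales_history"
    ).fetchone()[0]
    new_rows = operational.execute(
        "SELECT txn_id, sold_on, amount FROM transactions "
        "WHERE txn_id > ? ORDER BY txn_id",
        (last_id,),
    ).fetchall()
    warehouse.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", new_rows)
    warehouse.commit()
    return len(new_rows)


# Hypothetical operational database with two transactions.
operational = sqlite3.connect(":memory:")
operational.execute(
    "CREATE TABLE transactions (txn_id INTEGER PRIMARY KEY, sold_on TEXT, amount REAL)"
)
operational.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "2024-01-05", 40.0), (2, "2024-01-06", 15.5)],
)
warehouse = sqlite3.connect(":memory:")

first = refresh(operational, warehouse)  # initial load
operational.execute("INSERT INTO transactions VALUES (3, '2024-01-07', 22.0)")
operational.execute("DELETE FROM transactions WHERE txn_id = 1")  # OLTP drops old data
second = refresh(operational, warehouse)  # picks up only the new row
history_size = warehouse.execute("SELECT COUNT(*) FROM sales_history").fetchone()[0]
print(first, second, history_size)
```

Reports now run against `sales_history`, so they no longer compete with data entry, and rows removed from the operational system remain available in the warehouse for historical analysis.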

Solution 2

[Figure: data entry operators run transactions against the operational database; data is extracted into the data warehouse, where the manager runs reports.]

Scenario 3

Cakes & Cookies is a small, new company. The President wants the company to grow and needs information in order to make correct decisions.

Solution 3

• Improve the quality of data before loading it into the warehouse.
• Perform data cleaning and transformation before loading the data.
• Use query analysis tools to support ad hoc queries.

Solution 3

[Figure: the President uses a query and analysis tool on the data warehouse; chart labels: sales, time, Expansion, Improvement.]

Why Do We Need Data Warehouses?

• Consolidation of information resources
• Improved query performance
• Separate research and decision support functions from the operational systems
• Foundation for data mining, data visualization, advanced reporting and OLAP tools

Need for Data Warehousing

• Industry has huge amounts of operational data.
• Knowledge workers want to turn this data into useful information.
• They use this information to support strategic decision making.

Need for Data Warehousing (contd.)

• It is a platform for consolidated historical data for analysis.
• It stores data of good quality so that knowledge workers can make correct decisions.

Need for Data Warehousing (contd.)

• From a business perspective
  o it is the latest marketing weapon
  o it helps to keep customers by learning more about their needs
  o it is a valuable tool in today's competitive, fast-evolving world

What Is a Data Warehouse Used for?

• Knowledge discovery
  o Making consolidated reports
  o Finding relationships and correlations
  o Data mining
  o Examples
    - Banks identifying credit risks
    - Insurance companies searching for fraud
    - Medical research

How Do Data Warehouses Differ From Operational Systems?

• Goals
• Structure
• Size
• Performance optimization
• Technologies used

Comparison Chart of Database Types

Data warehouse                             | Operational system
-------------------------------------------+-------------------------------------------
Subject oriented                           | Transaction oriented
Large (hundreds of GB up to several TB)    | Small (MB up to several GB)
Historic data                              | Current data
De-normalized table structure              | Normalized table structure
  (few tables, many columns per table)     |   (many tables, few columns per table)
Batch updates                              | Continuous updates
Usually very complex queries               | Simple to complex queries

Design Differences

Operational System: ER Diagram

Data Warehouse: Star Schema

Supporting a Complete Solution

Operational System: Data Entry

Data Warehouse: Data Retrieval

Data Warehouses, Data Marts, and Operational Data Stores

• Data Warehouse: The queryable source of data in the enterprise. It is comprised of the union of all of its constituent data marts.
• Data Mart: A logical subset of the complete data warehouse. Often viewed as a restriction of the data warehouse to a single business process or to a group of related business processes targeted toward a particular business group.
• Operational Data Store (ODS): A point of integration for operational systems that developed independent of each other. Since an ODS supports day to day operations, it needs to be continually

SOURCE: Ralph Kimball

Data Mining works with Warehouse Data

• Data Warehousing provides the Enterprise with a memory

• Data Mining provides the Enterprise with intelligence

We want to know ...
• Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
• Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer?
• If I raise the price of my product by Rs. 2, what is the effect on my ROI?
• If I offer only 2,500 airline miles as an incentive to purchase rather than 5,000, how many lost responses will result?
• If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues?
• Which of my customers are likely to be the most loyal?

Data Mining helps extract such information.

Application Areas

Industry                | Application
------------------------+-------------------------
Finance                 | Credit card analysis
Insurance               | Claims, fraud analysis
Telecommunication       | Call record analysis
Transport               | Logistics management
Consumer goods          | Promotion analysis
Data service providers  | Value added data
Utilities               | Power usage analysis

Data Mining in Use

• The US Government uses Data Mining to track fraud
• A Supermarket becomes an information broker
• Basketball teams use it to track game strategy
• Cross Selling
• Warranty Claims Routing
• Holding on to Good Customers
• Weeding out Bad Customers

Data Warehousing Tools

• Data Warehouse
  o SQL Server 2000 DTS
  o Oracle 8i Warehouse Builder
• OLAP tools
  o SQL Server Analysis Services
  o Oracle Express Server
• Reporting tools
  o MS Excel Pivot Chart
  o VB Applications

RDBMS used for OLTP

• Database systems have traditionally been used for OLTP
  o clerical data processing tasks
  o detailed, up-to-date data
  o structured, repetitive tasks
  o read/update a few records
  o isolation, recovery and integrity are critical

OLTP vs Data Warehouse

• OLTP
  o Application oriented
  o Used to run business
  o Detailed data
  o Current, up to date
  o Isolated data
  o Repetitive access
  o Clerical user
• Warehouse (DSS)
  o Subject oriented
  o Used to analyze business
  o Summarized and refined
  o Snapshot data
  o Integrated data
  o Ad-hoc access
  o Knowledge user (manager)

OLTP vs Data Warehouse

• OLTP
  o Transaction throughput is the performance metric
  o Thousands of users
  o Managed in entirety
• Data Warehouse
  o Query throughput is the performance metric
  o Hundreds of users
  o Managed by subsets

To summarize ...

• OLTP Systems are used to "run" a business

• The Data Warehouse helps to "optimize" the business

Briefing Contents


Building a Data Warehouse

Data Warehouse Lifecycle

• Analysis
• Design
• Import data
• Install front-end tools
• Test and deploy

Stage 1: Analysis

• Identify:
  o Target questions
  o Data needs
  o Timeliness of data
  o Granularity
• Create an enterprise-level data dictionary
• Dimensional analysis
  o Identify facts and dimensions

Stage 2: Design

• Star schema
• Data transformation
• Aggregates
• Pre-calculated values
• HW/SW architecture

Dimensional Modeling


Dimensional Modeling

• Fact Table: The primary table in a dimensional model that is meant to contain measurements of the business.
• Dimension Table: One of a set of companion tables to a fact table. Most dimension tables contain many textual attributes that are the basis for constraining and grouping within data warehouse queries.

SOURCE: Ralph Kimball
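
To make the two definitions concrete, here is a small star schema built in an in-memory SQLite database. It is only a sketch: the table names (`fact_sales`, `dim_product`, `dim_store`), their columns and the sample rows are invented for illustration and do not come from Kimball's text.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- Dimension tables: textual attributes used to constrain and group.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: numeric measurements, keyed by the dimensions.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        store_key INTEGER REFERENCES dim_store(store_key),
        quantity INTEGER,
        revenue REAL
    );

    INSERT INTO dim_product VALUES (1, 'Sponge cake', 'Cakes'), (2, 'Oat cookie', 'Cookies');
    INSERT INTO dim_store VALUES (1, 'Mumbai'), (2, 'Delhi');
    INSERT INTO fact_sales VALUES
        (1, 1, 3, 30.0), (2, 1, 10, 20.0), (1, 2, 1, 10.0), (2, 2, 4, 8.0);
    """
)


def revenue_by_category(db, city):
    """Constrain on one dimension (store city), group by another (product category)."""
    return db.execute(
        """
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_store AS s ON s.store_key = f.store_key
        WHERE s.city = ?
        GROUP BY p.category
        ORDER BY p.category
        """,
        (city,),
    ).fetchall()


print(revenue_by_category(db, "Mumbai"))
```

The query shape is typical of a star schema: the measurements come from the fact table, while every `WHERE` and `GROUP BY` term refers to a dimension attribute.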

Stage 3: Import Data

• Identify data sources
• Extract the needed data from existing systems to a data staging area
• Transform and clean the data
  o Resolve data type conflicts
  o Resolve naming and key conflicts
  o Remove, correct, or flag bad data
  o Conform dimensions
• Load the data into the warehouse
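
The transform-and-clean step can be sketched as a single function applied to each staged record. Everything specific here is an assumption made for illustration: the field aliases, the city spellings and the rule that negative amounts are bad data are hypothetical, and a real warehouse would drive these rules from its data dictionary.

```python
# Hypothetical naming conflicts between source systems, mapped to one name.
FIELD_ALIASES = {"cust_id": "customer_id", "customer_no": "customer_id", "amt": "amount"}

# Hypothetical conformed city dimension: every spelling maps to one value.
CITY_NAMES = {
    "bombay": "Mumbai",
    "mumbai": "Mumbai",
    "madras": "Chennai",
    "chennai": "Chennai",
}


def clean(record):
    """Return a cleaned copy of one staged record, flagging bad data."""
    # Resolve naming conflicts by renaming fields to the warehouse names.
    row = {FIELD_ALIASES.get(key, key): value for key, value in record.items()}
    problems = []

    # Resolve data type conflicts: amounts may arrive as numbers or strings.
    try:
        row["amount"] = float(str(row.get("amount", "")).replace(",", ""))
    except ValueError:
        row["amount"] = None
        problems.append("amount is not numeric")
    else:
        if row["amount"] < 0:
            problems.append("amount is negative")

    # Conform the city dimension so every source uses the same value.
    city = str(row.get("city", "")).strip().lower()
    if city in CITY_NAMES:
        row["city"] = CITY_NAMES[city]
    else:
        problems.append("unknown city")

    # Flag rather than drop, so bad rows can be reviewed before loading.
    row["is_bad"] = bool(problems)
    row["problems"] = problems
    return row


staged = [
    {"cust_id": 7, "amt": "1,200.50", "city": " Bombay "},
    {"customer_no": 8, "amount": "n/a", "city": "Pune"},
]
cleaned = [clean(record) for record in staged]
loadable = [row for row in cleaned if not row["is_bad"]]
print(len(cleaned), len(loadable))
```

Flagging instead of deleting is a design choice: it keeps the load step simple (load only rows where `is_bad` is false) while leaving a record of what was rejected and why.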

Importing Data Into the Warehouse

Operational Systems (source systems)


Stage 4: Install Front-end Tools

• Reporting tools
• Data mining tools
• GIS
• Etc.

Stage 5: Test and Deploy

• Usability tests
• Software installation
• User training
• Performance tweaking based on usage

Special Concerns

• Time and expense
• Managing the complexity
• Update procedures and maintenance
• Changes to source systems over time
• Changes to data needs over time

Briefing Contents


Goals of the STORET Central Warehouse

• Improved performance and faster data retrieval
• Ability to produce larger reports
• Ability to provide more data query options
• Streamlined application navigation

Old Web Application Flow


Central Warehouse Application Flow

Search Criteria Selection

Report Size Feedback / Report Customization

Report Generation

Web Application Demo

STORET Central Warehouse:

http://epa.gov/storet/dw_home.html

STORET Central Warehouse: Potential Future Enhancements

• More query functionality
• Additional report types
• Web Services
• Additional source systems?

Data Warehouse Components

SOURCE: Ralph Kimball

Data Warehouse Components: Detailed

SOURCE: Ralph Kimball

Briefing Contents
