- Why Warehouse?
- What is a Warehouse?
- Data Warehouse
- Introduction to Data Warehousing and Mining
- Introducing Data Warehousing and Mining in your organization
Peter Drucker believes ...

The two most important people in the 21st century will be the CFO (managing the Cash Flow) and the CIO (managing the Information Flow).
Data, Data everywhere yet ...

- I can't find the data I need
  - data is scattered over the network
  - many versions, subtle differences
- I can't get the data I need
  - need an expert to get the data
- I can't understand the data I found
  - available data poorly documented
- I can't use the data I found
  - results are unexpected
  - data needs to be transformed from one form to another
Why Data Warehousing?

- Which are our lowest/highest margin customers?
- Who are my customers and what products are they buying?
- What is the most effective distribution ...
- What product promotions have the biggest impact on revenue?
- Which customers are most likely to go to the competition?
- What impact will new products/services have on revenue and ...


- Putting information technology to work to help the knowledge worker make faster and better decisions
  - Which of my customers are most likely to go to the competition?
  - What product promotions have the biggest impact on revenue?
  - How did the share price of software companies correlate with profits over the last 10 years?

- Used to manage and control business
- Data is historical or point-in-time
- Optimized for inquiry rather than update
- Use of the system is loosely defined and can be ad-hoc
- Used by managers and end-users to understand the business and make ...

Evolution of Decision Support

- 60's: Batch reports
  - hard to find and analyze information
  - inflexible and expensive, reprogram every request
- 70's: Terminal based DSS and EIS
- 80's: Desktop data access and analysis tools
  - query tools, spreadsheets, GUIs
  - easy to use, but access only operational db
- 90's: Data warehousing with integrated OLAP engines and tools
What are the users saying ...

- Data should be integrated across the enterprise
- Summary data had a real value to the organization
- Historical data held the key to understanding data over time
- What-if capabilities are ...
What is Data Warehousing?

A process of transforming data into information and making it available to users in a timely enough manner to make a difference.

[Forrester Research, April 1996]


What is a Data Warehouse?

A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a way they can understand and use in a business context.

[Barry Devlin]

Data Warehousing Market

- Hardware -- servers, storage, clients
- Warehouse -- DBMS
- Systems Integration and Consulting
- Market growing from
  - $2B in 1995 to $8B in 1998 [Meta Group]
  - $1.5B in 1995 to $6.9B in 1999 [Gartner]
[Chart: figures attributed to Meta Group; values not recoverable from the source]
Warehouses are Very Large Databases




Very Large Data Bases

- Terabytes -- 10^12 bytes: Walmart -- 24 Terabytes
- Petabytes -- 10^15 bytes: Geographic Information
- Exabytes -- 10^18 bytes: National Medical Records
- Zettabytes -- 10^21 bytes: Weather images
- Yottabytes -- 10^24 bytes: Intelligence Agency
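
The unit scale above is plain powers-of-ten arithmetic. A minimal sketch, assuming decimal (not binary) prefixes as on the slide; the function names `to_bytes` and `convert` are illustrative only:

```python
# Decimal storage units from the list above, expressed as powers of ten.
UNITS = {
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
    "yottabyte": 10**24,
}


def to_bytes(amount, unit):
    """Convert an amount in the given unit to a whole number of bytes."""
    return int(amount * UNITS[unit])


def convert(amount, src, dst):
    """Convert an amount between two of the units above."""
    return amount * UNITS[src] / UNITS[dst]


# The 24-terabyte warehouse quoted on the slide, in bytes and in petabytes.
walmart_bytes = to_bytes(24, "terabyte")
walmart_petabytes = convert(24, "terabyte", "petabyte")
print(walmart_bytes, walmart_petabytes)
```

Each step up the list is a factor of 1,000, so a 24-terabyte warehouse is 0.024 petabytes.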


Data Warehousing -- It is a process

- Technique for assembling and managing data from various sources for the purpose of answering business questions, thus making decisions that were not previously possible
- A decision support database maintained separately from the organization's operational database
Data Warehouse

- A data warehouse is a collection of data that is used primarily in organizational decision making.
  -- Bill Inmon, Building the Data Warehouse, 1996

Traditional RDBMS used for OLTP

- Database Systems have been used traditionally for OLTP
  - clerical data processing tasks
  - detailed, up to date data
  - structured repetitive tasks
  - read/update a few records
  - isolation, recovery and integrity are critical
- We will call these operational systems

Operational Systems

- Run the business in real time
- Based on up-to-the-second data
- Optimized to handle large numbers of simple read/write transactions
- Optimized for fast response to predefined transactions
- Used by people who deal with customers, products -- clerks, salespeople etc.
- They are increasingly used by ...

Examples of Operational Data

Data               | Industry           | Usage                      | Technology                                             | Volumes
Customer File      | All                | Track Customer             | Legacy application, flat files, mainframes             | Small-medium
Account Balance    | Finance            | Control account activities | Legacy applications, hierarchical databases, mainframe | Large
Point-of-Sale data | Retail             | Generate bills, manage ... | Client/Server, relational databases                    | Very Large
Call Record        | Telecommunications | Billing                    | Legacy application, hierarchical database, ...         | Very Large
Production Record  | Manufacturing      | Control Production         | New applications, relational databases, ...            | Medium
Application-Orientation vs. Subject-Orientation

Application-Orientation: Loans, Savings
Subject-Orientation: Customer, Activity
OLTP vs. Data Warehouse

- OLTP systems are tuned for known transactions and workloads, while the workload is not known a priori in a data warehouse
- Special data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries)
  - e.g., average amount spent on phone calls between 9AM-5PM in California during the month of December
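
The example query above restricts three dimensions at once: location, time of day, and month. A minimal sketch using Python's built-in `sqlite3` module; the `calls` table, its columns, and the sample rows are assumptions made up for illustration, and 9AM-5PM is read here as call start hours 9 through 16:

```python
import sqlite3

# Hypothetical call-detail table; the schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (state TEXT, started_at TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        ("CA", "1997-12-01 09:30:00", 2.00),
        ("CA", "1997-12-15 16:45:00", 4.00),
        ("CA", "1997-12-15 20:10:00", 9.00),  # outside 9AM-5PM
        ("CA", "1997-11-20 10:00:00", 7.00),  # not December
        ("NY", "1997-12-02 11:00:00", 5.00),  # not California
    ],
)


def avg_daytime_spend(conn, state, month):
    """Average call amount for one state and month, calls starting 09:00-16:59."""
    row = conn.execute(
        """
        SELECT AVG(amount)
        FROM calls
        WHERE state = ?
          AND strftime('%m', started_at) = ?
          AND CAST(strftime('%H', started_at) AS INTEGER) BETWEEN 9 AND 16
        """,
        (state, f"{month:02d}"),
    ).fetchone()
    return row[0]


result = avg_daytime_spend(conn, "CA", 12)
print(result)
```

Only the first two rows satisfy all three restrictions. A query like this scans and aggregates many rows, which is the opposite of the few-record transactions an OLTP system is tuned for.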
OLTP vs. Data Warehouse

- Complex Data Warehouse queries would degrade performance of operational DBMS
- Data Warehouse requires historical data; not typically maintained by operational databases
- Decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources: operational DBMS, external sources, legacy systems
- Different sources typically use different representations, codes and formats, which have to be reconciled
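
Reconciling representations usually means mapping each source's codes and formats onto one agreed form. A minimal sketch under assumed inputs: the gender codes, the date formats, and the record layout are all hypothetical and are not taken from any particular source system:

```python
from datetime import datetime

# Hypothetical encodings used by different source systems.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "2": "F"}
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def normalize_gender(value):
    """Map any known source code to 'M' or 'F'; anything else becomes 'U'."""
    return GENDER_CODES.get(str(value).strip().lower(), "U")


def normalize_date(value):
    """Try each known source format in turn and return an ISO date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


def reconcile(record):
    """Bring one source record into the warehouse's common representation."""
    return {
        "customer_id": str(record["id"]).lstrip("0"),
        "gender": normalize_gender(record["gender"]),
        "opened": normalize_date(record["opened"]),
    }


legacy = reconcile({"id": "00042", "gender": "male", "opened": "03/01/1997"})
modern = reconcile({"id": 42, "gender": "M", "opened": "1997-01-03"})
print(legacy == modern)
```

Both sample records describe the same customer, and after reconciliation they compare equal.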

OLTP vs Data Warehouse

OLTP                  | Warehouse (DSS)
Application Oriented  | Subject Oriented
Used to run business  | Used to analyze business
Detailed data         | Summarized and refined
Current up to date    | Snapshot data
Isolated Data         | Integrated Data
Repetitive access     | Ad-hoc access
Clerical User         | Knowledge User

OLTP vs Data Warehouse

OLTP                                  | Data Warehouse
Performance Sensitive                 | Performance relaxed
Few Records accessed at a time (tens) | Large volumes accessed at a time
Read/Update Access                    | Mostly Read (Batch Update)
No data redundancy                    | Redundancy present
Database Size 100 MB - 100 GB         | Database Size 100 GB - few terabytes

OLTP vs Data Warehouse

OLTP                                             | Data Warehouse
Transaction throughput is the performance metric | Query throughput is the performance metric
Thousands of users                               | Hundreds of users
Managed in entirety                              | Managed by subsets
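
The contrast in the comparison above can be made concrete with two access patterns against an in-memory SQLite database. This is a sketch only: the `account` and `sales_history` tables and their rows are invented for illustration, and a real warehouse would run on a separate system from the operational one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE sales_history (year INTEGER, region TEXT, amount REAL);
    """
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.executemany(
    "INSERT INTO sales_history VALUES (?, ?, ?)",
    [(1996, "East", 10.0), (1996, "West", 20.0),
     (1997, "East", 30.0), (1997, "West", 40.0)],
)


def oltp_withdraw(conn, account_id, amount):
    """OLTP style: a short predefined transaction touching a single record."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )


def warehouse_sales_by_year(conn):
    """Warehouse style: a read-only query that summarizes all history."""
    return conn.execute(
        "SELECT year, SUM(amount) FROM sales_history GROUP BY year ORDER BY year"
    ).fetchall()


oltp_withdraw(conn, 1, 25.0)
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
totals = warehouse_sales_by_year(conn)
print(balance, totals)
```

The first function reads and updates one row by key; the second touches every row of the history table and never writes.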


- load/index time
- query response time
- database size
- ratio of raw data size to full database size (including indices, temp space, etc.)
- parallel capabilities
- company DBMS standardization

Data Warehouse

[Diagram: content not recoverable from the source]


Data Extraction and Cleansing

- Extract data from existing operational and legacy data
- Sources of data for the warehouse
- Data quality at the sources
- Merging different data sources
- Data Transformation
- How to propagate updates (on the sources) to the warehouse
- Terabytes of data to be loaded
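
One common way to propagate source updates without reloading everything is to keep a watermark of the last load and extract only rows changed since then. The slide does not prescribe a method, so this is a sketch of one option; the `updated_at` column, the sample rows, and the upper-casing transformation are all assumptions:

```python
# Hypothetical source rows; "updated_at" is an assumed last-modified column.
source_rows = [
    {"id": 1, "name": "Acme", "updated_at": "1997-03-01T10:00:00"},
    {"id": 2, "name": " Bolt ", "updated_at": "1997-03-05T08:30:00"},
    {"id": 3, "name": "Crux", "updated_at": "1997-03-07T17:45:00"},
]


def extract_changes(rows, watermark):
    """Return rows modified after the last load (ISO timestamps sort as text)."""
    return [row for row in rows if row["updated_at"] > watermark]


def load(warehouse, changes, watermark):
    """Upsert changed rows into the warehouse copy and return the new watermark."""
    for row in changes:
        # Toy transformation step: trim whitespace and upper-case the name.
        warehouse[row["id"]] = {"name": row["name"].strip().upper()}
        watermark = max(watermark, row["updated_at"])
    return watermark


warehouse = {1: {"name": "ACME"}}   # state left by an earlier load
watermark = "1997-03-02T00:00:00"   # time of that earlier load
changes = extract_changes(source_rows, watermark)
watermark = load(warehouse, changes, watermark)
print(len(changes), watermark)
```

Only rows 2 and 3 are extracted, which matters when the full source holds terabytes. This approach misses deletions at the source and relies on the source maintaining a trustworthy modification timestamp.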

Transformation Tools

- Carleton Corporation -- Passport
- Evolutionary Technologies Inc. -- Extract
- Informatica -- OpenBridge
- Information Builders Inc. -- EDA Copy
- Platinum Technology -- InfoRefiner
- Prism Solutions -- Prism Warehouse

- Sophisticated transformation
- Used to clean the data and improve its quality
- Clean data is vital for the success of the warehouse
- Seshadri, Sheshadri, Sesadri, Seshadri S., Srinivasan Seshadri, etc. are the same person
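
Spotting that the name variants above refer to the same person is a fuzzy-matching problem. A minimal sketch using `difflib.SequenceMatcher` from the Python standard library; the token-based rule and the 0.85 threshold are assumptions chosen for this example, not how any of the scrubbing products work:

```python
import re
from difflib import SequenceMatcher


def tokens(name):
    """Split a name into lower-case alphabetic tokens, dropping punctuation."""
    return re.findall(r"[a-z]+", name.lower())


def similarity(a, b):
    """Similarity ratio between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def matches_surname(name, surname, threshold=0.85):
    """True if any token of the name is close enough to the reference surname."""
    target = surname.lower()
    return any(similarity(tok, target) >= threshold for tok in tokens(name))


variants = ["Seshadri", "Sheshadri", "Sesadri", "Seshadri S.", "Srinivasan Seshadri"]
flags = [matches_surname(v, "Seshadri") for v in variants]
print(flags)
```

All five variants match, while an unrelated surname such as "Sharma" does not. A surname-only rule would also merge different people who share a surname, so real scrubbing needs more evidence (address, date of birth, account history) before declaring two records the same.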

- Apertus -- Enterprise/Integrator
- Vality -- IE
- Postal Soft

The Data Warehouse

- Heart of the data warehouse is the data
- Single version of the truth
- Corporate memory
- Data is organized in a way that represents business -- subject orientation

Warehouse Products

- Computer Associates -- CA-Ingres
- Hewlett-Packard -- Allbase/SQL
- Informix -- Informix, Informix XPS
- Microsoft -- SQL Server
- Oracle -- Oracle7, Oracle Parallel Server
- Red Brick -- Red Brick Warehouse
- SAS Institute -- SAS
- Software AG -- ADABAS
- Sybase -- SQL Server, IQ, MPP
Users are of three types

- Tourists: Browse information harvested by farmers
- Farmers: Harvest information from known access paths
- Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data

From the Data Warehouse to Data Marts

[Diagram: Individually Structured, Departmentally Structured, and Organizationally Structured (Data Warehouse) levels, on a scale from Less to More; labels: History, Normalized]

OLAP: 3 Tier DSS

- Data Warehouse: Store atomic data in industry standard Data Warehouse.
- OLAP engine: Generate SQL execution plans in the OLAP engine to obtain OLAP ...
- DSS Client: Obtain multi-dimensional reports from the DSS Client.
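
The three tiers above can be mimicked in a few lines to show how responsibilities divide. This is a toy sketch, not any vendor's architecture: SQLite stands in for the warehouse, and the `sales` table, `build_cube_sql`, `run_cube`, and `crosstab` are hypothetical names introduced for illustration.

```python
import sqlite3

# Tier 1: the warehouse stores atomic (unaggregated) facts.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, quarter TEXT, amount REAL)"
)
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("East", "Widget", "Q1", 100.0),
        ("East", "Widget", "Q2", 150.0),
        ("East", "Gadget", "Q1", 80.0),
        ("West", "Widget", "Q1", 120.0),
        ("West", "Gadget", "Q2", 60.0),
    ],
)

ALLOWED_DIMENSIONS = {"region", "product", "quarter"}


# Tier 2: the engine turns a dimensional request into SQL and runs it.
def build_cube_sql(dimensions):
    """Generate a GROUP BY query for the requested dimensions."""
    unknown = set(dimensions) - ALLOWED_DIMENSIONS
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    cols = ", ".join(dimensions)
    return f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"


def run_cube(conn, dimensions):
    """Execute the generated query against the warehouse."""
    return conn.execute(build_cube_sql(dimensions)).fetchall()


# Tier 3: the client arranges the result as a two-dimensional report.
def crosstab(rows):
    """Pivot (row_key, col_key, total) tuples into a nested dictionary."""
    table = {}
    for row_key, col_key, total in rows:
        table.setdefault(row_key, {})[col_key] = total
    return table


report = crosstab(run_cube(warehouse, ["region", "quarter"]))
print(report)
```

The client never writes SQL and the warehouse never formats reports; changing the requested dimensions (for example `["product", "quarter"]`) only changes the query the middle tier generates.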
Strengths of OLAP

- It is a powerful visualization
- It provides fast, interactive response times
- It is good for analyzing time
- It can be useful to find some clusters and outliers
- Many vendors offer OLAP
Query Tools
- Andyne Computing -- GQL
- Brio -- BrioQuery
- Business Objects -- Business Objects
- Cognos -- Impromptu
- Information Builders Inc. -- Focus for Windows
- Oracle -- Discoverer2000
- Platinum Technology -- SQL*Assist, ProReports
- PowerSoft -- InfoMaker
- SAS Institute -- SAS/Assist
- Software AG -- Esperant
- Sterling Software -- VISION:Data
OLAP and Executive Information Systems

- Andyne Computing -- Pablo
- Arbor Software -- Essbase
- Cognos -- PowerPlay
- Comshare -- Commander OLAP
- Holistic Systems -- Holos
- Information Advantage --
- Informix -- Metacube
- Microstrategies --
- Oracle -- Express
- Pilot -- LightShip
- Planning Sciences -- Gentium
- Platinum Technology -- ProdeaBeacon, Forest & Trees
- SAS Institute -- SAS/EIS, ...
- Speedware -- Media
4GLs, GUI Builders, and PC Databases

- Information Builders -- Focus
- Lotus -- Approach
- Microsoft -- Access, Visual Basic
- MITI -- SQR/Workbench
- PowerSoft -- PowerBuilder
- SAS Institute -- SAS/AF

Data Mining works with Warehouse Data

- Data Warehousing provides the Enterprise with a memory
- Data Mining provides the Enterprise with intelligence

We want to know ...

- Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
- Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer?
- If I raise the price of my product by Rs. 2, what is the effect on my ROI?
- If I offer only 2,500 airline miles as an incentive to purchase rather than 5,000, how many lost responses will result?
- If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues?
- Which of my customers are likely to be the most loyal?

Application Areas

Industry               | Application
Finance                | Credit Card Analysis
Insurance              | Claims, Fraud Analysis
Telecommunication      | Call record analysis
Transport              | Logistics management
Consumer goods         | Promotion analysis
Data Service providers | Value added data
Utilities              | Power usage analysis

Data Mining in Use

- The US Government uses Data Mining to track ...
- A Supermarket becomes an information broker
- Basketball teams use it to track game strategy
- Cross Selling
- Warranty Claims Routing
- Holding on to Good Customers
- Weeding out Bad Customers
OWeeding out Bad Customers



- Marketing efforts based on targeting the most likely customers empower companies to achieve their goals with remarkable precision and substantially lower costs.

What makes data mining possible?

- Advances in the following areas are making data mining deployable:
  - data warehousing
  - better and more data (i.e., operational, behavioral, and demographic)
  - the emergence of easily deployed data mining tools and
  - the advent of new data mining techniques.
  -- Gartner Group