
DATA WAREHOUSING AND DATA MINING

S. Sudarshan Krithi Ramamritham IIT Bombay


sudarsha@cse.iitb.ernet.in krithi@cse.iitb.ernet.in

Course Overview
• The course: what and how
• 0. Introduction
• I. Data Warehousing
• II. Decision Support and OLAP
• III. Data Mining
• IV. Looking Ahead
• Demos and Labs

0. Introduction
• Data Warehousing, OLAP and data mining: what and why (now)?
• Relation to OLTP
• A case study
• demos, labs

A producer wants to know....

• Which are our lowest/highest margin customers?
• What is the most effective distribution channel?
• Who are my customers and what products are they buying?
• What product promotions have the biggest impact on revenue?
• What impact will new products/services have on revenue and margins?
• Which customers are most likely to go to the competition?

Data, Data everywhere yet ...
• I can't find the data I need
  - data is scattered over the network
  - many versions, subtle differences
• I can't get the data I need
  - need an expert to get the data
• I can't understand the data I found
  - available data poorly documented
• I can't use the data I found
  - results are unexpected
  - data needs to be transformed from one form to other

What is a Data Warehouse?

A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a way they can understand and use in a business context. [Barry Devlin]

What are the users saying...
• Data should be integrated across the enterprise
• Summary data has a real value to the organization
• Historical data holds the key to understanding data over time
• What-if capabilities are required

What is Data Warehousing?
A process of transforming data into information and making it available to users in a timely enough manner to make a difference
[Forrester Research, April 1996]

[Figure: Data -> Information]

Evolution
• 60's: Batch reports
  - hard to find and analyze information
  - inflexible and expensive, reprogram every new request
• 70's: Terminal-based DSS and EIS (executive information systems)
  - still inflexible, not integrated with desktop tools
• 80's: Desktop data access and analysis tools
  - query tools, spreadsheets, GUIs
  - easier to use, but only access operational databases
• 90's: Data warehousing with integrated OLAP engines and tools

Warehouses are Very Large Databases

[Figure: bar chart of the percentage of respondents (0% to 35%) by warehouse size (5GB, 5-9GB, 10-19GB, 20-49GB, 50-99GB, 100-249GB, 250-499GB, 500GB-1TB); series: Initial, Projected 2Q96]

Source: META Group, Inc.

Very Large Data Bases
• Terabytes -- 10^12 bytes: Walmart -- 24 Terabytes
• Petabytes -- 10^15 bytes: Geographic Information Systems
• Exabytes -- 10^18 bytes: National Medical Records
• Zettabytes -- 10^21 bytes: Weather images
• Yottabytes -- 10^24 bytes: Intelligence Agency Videos

Data Warehousing -- It is a process
• Technique for assembling and managing data from various sources for the purpose of answering business questions, thus making decisions that were not previously possible
• A decision support database maintained separately from the organization's operational database

Data Warehouse
• A data warehouse is a
  - subject-oriented
  - integrated
  - time-varying
  - non-volatile
  collection of data that is used primarily in organizational decision making.

-- Bill Inmon, Building the Data Warehouse 1996

Explorers, Farmers and Tourists
• Tourists: Browse information harvested by farmers
• Farmers: Harvest information from known access paths
• Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data

Data Warehouse Architecture
[Figure: data warehouse architecture. Labels: Relational Databases, ERP Systems, Purchased Data, Legacy Data; Extraction, Cleansing; Optimized Loader; Data Warehouse Engine; Analyze, Query; Metadata Repository]

Data Warehouse for Decision Support & OLAP
• Putting Information technology to help the knowledge worker make faster and better decisions
  - Which of my customers are most likely to go to the competition?
  - What product promotions have the biggest impact on revenue?
  - How did the share price of software companies correlate with profits over last 10 years?

Decision Support
• Used to manage and control business
• Data is historical or point-in-time
• Optimized for inquiry rather than update
• Use of the system is loosely defined and can be ad-hoc
• Used by managers and end-users to understand the business and make judgements

Data Mining works with Warehouse Data
• Data Warehousing provides the Enterprise with a memory
• Data Mining provides the Enterprise with intelligence

We want to know ...
• Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
• Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer?
• If I raise the price of my product by Rs. 2, what is the effect on my ROI?
• If I offer only 2,500 airline miles as an incentive to purchase rather than 5,000, how many lost responses will result?
• If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues?
• Which of my customers are likely to be the most loyal?

Data Mining helps extract such information

Application Areas

Industry | Application
Finance | Credit Card Analysis
Insurance | Claims, Fraud Analysis
Telecommunication | Call record analysis
Transport | Logistics management
Consumer goods | promotion analysis
Data Service providers | Value added data
Utilities | Power usage analysis

Data Mining in Use
• The US Government uses Data Mining to track fraud
• A Supermarket becomes an information broker
• Basketball teams use it to track game strategy
• Cross Selling
• Warranty Claims Routing
• Holding on to Good Customers
• Weeding out Bad Customers

What makes data mining possible?
• Advances in the following areas are making data mining deployable:
  - data warehousing
  - better and more data (i.e., operational, behavioral, and demographic)
  - the emergence of easily deployed data mining tools and
  - the advent of new data mining techniques.

-- Gartner Group

Why Separate Data Warehouse?
• Performance
  - Op dbs designed & tuned for known txs & workloads.
  - Complex OLAP queries would degrade perf. for op txs.
  - Special data organization, access & implementation methods needed for multidimensional views & queries.
• Function
  - Missing data: Decision support requires historical data, which op dbs do not typically maintain.
  - Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: op dbs, external sources.
  - Data quality: Different sources typically use inconsistent data representations, codes, and formats which have to be reconciled.

What are Operational Systems?
• They are OLTP systems
• Run mission critical applications
• Need to work with stringent performance requirements for routine tasks
• Used to run a business!

RDBMS used for OLTP
• Database Systems have been used traditionally for OLTP
  - clerical data processing tasks
  - detailed, up to date data
  - structured repetitive tasks
  - read/update a few records
  - isolation, recovery and integrity are critical

Operational Systems
• Run the business in real time
• Based on up-to-the-second data
• Optimized to handle large numbers of simple read/write transactions
• Optimized for fast response to predefined transactions
• Used by people who deal with customers, products -- clerks, salespeople etc.
• They are increasingly used by customers

Examples of Operational Data

Data | Industry | Usage | Technology | Volumes
Customer File | All | Track Customer Details | Legacy application, flat files, main frames | Small-medium
Account Balance | Finance | Control account activities | Legacy applications, hierarchical databases, mainframe | Large
Point-of-Sale data | Retail | Generate bills, manage stock | ERP, Client/Server, relational databases | Very Large
Call Record | Telecommunications | Billing | Legacy application, hierarchical database, mainframe | Very Large
Production Record | Manufacturing | Control Production | ERP, relational databases, AS/400 | Medium

So, what's different?

Application-Orientation vs. Subject-Orientation

[Figure: Application-Orientation (Operational Database): Loans, Credit Card, Trust, Savings. Subject-Orientation (Data Warehouse): Customer, Vendor, Product, Activity]

OLTP vs. Data Warehouse
• OLTP systems are tuned for known transactions and workloads while workload is not known a priori in a data warehouse
• Special data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries)
  - e.g., average amount spent on phone calls between 9AM-5PM in Pune during the month of December

OLTP vs Data Warehouse
• OLTP
  - Application Oriented
  - Used to run business
  - Detailed data
  - Current up to date
  - Isolated Data
  - Repetitive access
  - Clerical User
• Warehouse (DSS)
  - Subject Oriented
  - Used to analyze business
  - Summarized and refined
  - Snapshot data
  - Integrated Data
  - Ad-hoc access
  - Knowledge User (Manager)

OLTP vs Data Warehouse
• OLTP
  - Performance Sensitive
  - Few Records accessed at a time (tens)
  - Read/Update Access
  - No data redundancy
  - Database Size 100MB - 100 GB
• Data Warehouse
  - Performance relaxed
  - Large volumes accessed at a time (millions)
  - Mostly Read (Batch Update)
  - Redundancy present
  - Database Size 100 GB - few terabytes

OLTP vs Data Warehouse
• OLTP
  - Transaction throughput is the performance metric
  - Thousands of users
  - Managed in entirety
• Data Warehouse
  - Query throughput is the performance metric
  - Hundreds of users
  - Managed by subsets

To summarize ...
• OLTP Systems are used to "run" a business
• The Data Warehouse helps to "optimize" the business

Why Now?
• Data is being produced
• ERP provides clean data
• The computing power is available
• The computing power is affordable
• The competitive pressures are strong
• Commercial products are available

Myths surrounding OLAP Servers and Data Marts
• Data marts and OLAP servers are departmental solutions supporting a handful of users
• Million dollar massively parallel hardware is needed to deliver fast time for complex queries
• OLAP servers require massive and unwieldy indices
• Complex OLAP queries clog the network with data
• Data warehouses must be at least 100 GB to be effective

Source -- Arbor Software Home Page

Wal*Mart Case Study
• Founded by Sam Walton
• One of the largest Super Market Chains in the US
• Wal*Mart: 2000+ Retail Stores
• SAM's Clubs: 100+ Wholesalers Stores
  · This case study is from Felipe Carino's (NCR Teradata) presentation made at Stanford Database Seminar

Old Retail Paradigm
• Wal*Mart
  - Inventory Management
  - Merchandise Accounts Payable
  - Purchasing
  - Supplier Promotions: National, Region, Store Level
• Suppliers
  - Accept Orders
  - Promote Products
  - Provide special Incentives
  - Monitor and Track The Incentives
  - Bill and Collect Receivables
  - Estimate Retailer Demands

New (Just-In-Time) Retail Paradigm
• No more deals
• Shelf-Pass Through (POS Application)
  - One Unit Price
    · Suppliers paid once a week on ACTUAL items sold
  - Wal*Mart Manager
    · Daily Inventory Restock
    · Suppliers (sometimes SameDay) ship to Wal*Mart
• Warehouse-Pass Through
  - Stock some Large Items
    · Delivery may come from supplier
  - Distribution Center
    · Supplier's merchandise unloaded directly onto Wal*Mart Trucks

Wal*Mart System
• NCR 5100M 96 Nodes; 24 TB Raw Disk; 700 - 1000 Pentium CPUs
• Number of Rows: > 5 Billions
• Historical Data: 65 weeks (5 Quarters)
• New Daily Volume: Current Apps: 75 Million; New Apps: 100 Million +
• Number of Users: Thousands
• Number of Queries: 60,000 per week

Course Overview
• 0. Introduction
• I. Data Warehousing
• II. Decision Support and OLAP
• III. Data Mining
• IV. Looking Ahead
• Demos and Labs

I. Data Warehouses: Architecture, Design & Construction
• DW Architecture
• Loading, refreshing
• Structuring/Modeling
• DWs and Data Marts
• Query Processing
• demos, labs

Data Warehouse Architecture
[Figure: data warehouse architecture. Labels: Relational Databases, ERP Systems, Purchased Data, Legacy Data; Extraction, Cleansing; Optimized Loader; Data Warehouse Engine; Analyze, Query; Metadata Repository]

Components of the Warehouse
• Data Extraction and Loading
• The Warehouse
• Analyze and Query -- OLAP Tools
• Metadata
• Data Mining tools

Loading the Warehouse

Cleaning the data before it is loaded

Source Data
[Figure: Operational/Source Data: Sequential, Legacy, Relational, External]
• Typically host based, legacy applications
  - Customized applications, COBOL, 3GL, 4GL
• Point of Contact Devices
  - POS, ATM, Call switches
• External Sources
  - Nielsen's, Acxiom, CMIE, Vendors, Partners

Data Quality - The Reality
• Tempting to think creating a data warehouse is simply extracting operational data and entering into a data warehouse
• Nothing could be farther from the truth
• Warehouse data comes from disparate questionable sources

Data Quality - The Reality
• Legacy systems no longer documented
• Outside sources with questionable quality procedures
• Production systems with no built in integrity checks and no integration
  - Operational systems are usually designed to solve a specific business problem and are rarely developed to a corporate plan
    · "And get it done quickly, we do not have time to worry about corporate standards..."

Data Integration Across Sources
[Figure: four sources (Savings, Loans, Trust, Credit card) annotated with: Same data, different name; Different data, same name; Data found here, nowhere else; Different keys, same data]

Data Transformation Example
[Figure: four applications feeding the Data Warehouse]
• encoding
  - appl A - m, f
  - appl B - 1, 0
  - appl C - x, y
  - appl D - male, female
• unit
  - appl A - pipeline - cm
  - appl B - pipeline - in
  - appl C - pipeline - feet
  - appl D - pipeline - yds
• field
  - appl A - balance
  - appl B - bal
  - appl C - currbal
  - appl D - balcurr
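The three kinds of mismatch on this slide (encoding, unit, field name) can be reconciled with per-application lookup tables. The following is a minimal sketch, not the method of any particular tool. The function name `to_warehouse`, the target field names, and the choice of centimetres as the warehouse unit are illustrative assumptions. The slide also does not say which code stands for which gender, so pairing `m`, `1` and `x` with "male" is an assumption as well.

```python
# Assumed mapping of each application's gender codes to one warehouse encoding.
GENDER_CODES = {
    "A": {"m": "male", "f": "female"},
    "B": {"1": "male", "0": "female"},
    "C": {"x": "male", "y": "female"},
    "D": {"male": "male", "female": "female"},
}
# Conversion factors from each application's pipeline unit to centimetres.
TO_CM = {"A": 1.0, "B": 2.54, "C": 30.48, "D": 91.44}  # cm, in, feet, yds
# Name of the balance field in each application.
BALANCE_FIELD = {"A": "balance", "B": "bal", "C": "currbal", "D": "balcurr"}


def to_warehouse(app, record):
    """Convert one source record into the (hypothetical) warehouse layout."""
    return {
        "gender": GENDER_CODES[app][str(record["gender"]).lower()],
        "pipeline_cm": round(record["pipeline"] * TO_CM[app], 2),
        "balance": record[BALANCE_FIELD[app]],
    }


row_b = to_warehouse("B", {"gender": 1, "pipeline": 10, "bal": 250.0})
row_c = to_warehouse("C", {"gender": "y", "pipeline": 3, "currbal": 80.0})
```

Keeping the rules in tables rather than in code means a new source application only adds one entry per table.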

Data Integrity Problems
• Same person, different spellings
  - Agarwal, Agrawal, Aggarwal etc...
• Multiple ways to denote company name
  - Persistent Systems, PSPL, Persistent Pvt. LTD.
• Use of different names
  - mumbai, bombay
• Different account numbers generated by different applications for the same customer
• Required fields left blank
• Invalid product codes collected at point of sale
  - manual entry leads to mistakes
  - "in case of a problem use 9999999"

Data Transformation Terms
• Extracting
• Conditioning
• Scrubbing
• Merging
• Householding
• Enrichment
• Scoring
• Loading
• Validating
• Delta Updating

Data Transformation Terms
• Extracting
  - Capture of data from operational source in "as is" status
  - Sources for data generally in legacy mainframes in VSAM, IMS, IDMS, DB2; more data today in relational databases on Unix
• Conditioning
  - The conversion of data types from the source to the target data store (warehouse) -- always a relational database

Data Transformation Terms
• Householding
  - Identifying all members of a household (living at the same address)
  - Ensures only one mail is sent to a household
  - Can result in substantial savings: 1 lakh catalogues at Rs. 50 each costs Rs. 50 lakhs. A 2% savings would save Rs. 1 lakh.
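Householding and the savings arithmetic above can be sketched as follows. This is an illustrative sketch only: the `household` function, the sample customers, and the simple address normalization (lower-casing and collapsing whitespace) are assumptions. Real householding tools use far more robust address matching.

```python
from collections import defaultdict


def household(customers):
    """Group customer names by a normalized address key."""
    groups = defaultdict(list)
    for name, address in customers:
        key = " ".join(address.lower().split())  # ignore case and extra spaces
        groups[key].append(name)
    return dict(groups)


# Hypothetical sample data: two customers share one address.
customers = [
    ("A. Rao", "12 MG Road, Pune"),
    ("S. Rao", "12  mg road, pune"),
    ("K. Iyer", "4 Hill View, Mumbai"),
]
households = household(customers)
mailings_saved = len(customers) - len(households)

# The slide's arithmetic: 1 lakh catalogues at Rs. 50 each, 2% saved.
LAKH = 100_000
total_cost = 1 * LAKH * 50      # Rs. 50 lakhs
saving = total_cost * 2 // 100  # Rs. 1 lakh
```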

Data Transformation Terms
• Enrichment
  - Bring data from external sources to augment/enrich operational data. Data sources include Dun and Bradstreet, A. C. Nielsen, CMIE, IMRA etc...
• Scoring
  - computation of a probability of an event, e.g., chance that a customer will defect to AT&T from MCI, chance that a customer is likely to buy a new product

Loads
• After extracting, scrubbing, cleaning, validating etc. need to load the data into the warehouse
• Issues
  - huge volumes of data to be loaded
  - small time window available when warehouse can be taken off line (usually nights)
  - when to build index and summary tables
  - allow system administrators to monitor, cancel, resume, change load rates
  - Recover gracefully -- restart after failure from where you were and without loss of data integrity

Load Techniques
• Use SQL to append or insert new data
  - record at a time interface
  - will lead to random disk I/O's
• Use batch load utility

Load Taxonomy
• Incremental versus Full loads
• Online versus Offline loads

Refresh
• Propagate updates on source data to the warehouse
• Issues:
  - when to refresh
  - how to refresh -- refresh techniques

When to Refresh?
• periodically (e.g., every night, every week) or after significant events
• on every update: not warranted unless warehouse data require current data (up to the minute stock quotes)
• refresh policy set by administrator based on user needs and traffic
• possibly different policies for different sources

Refresh Techniques
• Full Extract from base tables
  - read entire source table: too expensive
  - maybe the only choice for legacy systems

How To Detect Changes
• Create a snapshot log table to record ids of updated rows of source data and timestamp
• Detect changes by:
  - Defining after row triggers to update snapshot log when source table changes
  - Using regular transaction log to detect changes to source data
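The trigger-based option can be tried out with SQLite from the Python standard library. This is a minimal sketch under assumptions: the table names `account` and `snapshot_log`, the trigger names, and the sample rows are all hypothetical, and a production warehouse would use its own DBMS's trigger or log-mining facilities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);
-- Snapshot log: id of each changed source row plus a timestamp.
CREATE TABLE snapshot_log (row_id INTEGER, changed_at TEXT);

CREATE TRIGGER account_after_insert AFTER INSERT ON account
FOR EACH ROW BEGIN
    INSERT INTO snapshot_log VALUES (NEW.id, datetime('now'));
END;

CREATE TRIGGER account_after_update AFTER UPDATE ON account
FOR EACH ROW BEGIN
    INSERT INTO snapshot_log VALUES (NEW.id, datetime('now'));
END;
""")

conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [(1, 100.0), (2, 250.0), (3, 75.0)])
# Pretend the initial load has already been propagated to the warehouse.
conn.execute("DELETE FROM snapshot_log")

# A later change on the source...
conn.execute("UPDATE account SET balance = 120.0 WHERE id = 2")

# ...is found by reading the log instead of rescanning the source table.
changed_ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT row_id FROM snapshot_log ORDER BY row_id")]
```

An incremental refresh would then re-extract only the rows in `changed_ids` and clear the log.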

Data Extraction and Cleansing
• Extract data from existing operational and legacy data
• Issues:
  - Sources of data for the warehouse
  - Data quality at the sources
  - Merging different data sources
  - Data Transformation
  - How to propagate updates (on the sources) to the warehouse
  - Terabytes of data to be loaded

Scrubbing Data
• Sophisticated transformation tools.
• Used for cleaning the quality of data
• Clean data is vital for the success of the warehouse
• Example
  - Seshadri, Sheshadri, Sesadri, Seshadri S., Srinivasan Seshadri, etc. are the same person

Scrubbing Tools
• Apertus -- Enterprise/Integrator
• Vality -- IPE
• Postal Soft

Structuring/Modeling Issues

Data -- Heart of the Data Warehouse
• Heart of the data warehouse is the data itself!
• Single version of the truth
• Corporate memory
• Data is organized in a way that represents business -- subject orientation

Data Warehouse Structure
• Subject Orientation -- customer, product, policy, account etc... A subject may be implemented as a set of related tables. E.g., customer may be five tables

Data Warehouse Structure
  - base customer (1985-87)
    · custid, from date, to date, name, phone, dob
  - base customer (1988-90)
    · custid, from date, to date, name, credit rating, employer
  - customer activity (1986-89) -- monthly summary
  - customer activity detail (1987-89)
    · custid, activity date, amount, clerk id, order no
  - customer activity detail (1990-91)
    · custid, activity date, amount, line item no, order no

Time is part of key of each table

Data Granularity in Warehouse
• Summarized data stored
  - reduce storage costs
  - reduce cpu usage
  - increases performance since smaller number of records to be processed
  - design around traditional high level reporting needs
  - tradeoff with volume of data to be stored and detailed usage of data

Granularity in Warehouse
• Can not answer some questions with summarized data
  - Did Anand call Seshadri last month? Not possible to answer if total duration of calls by Anand over a month is only maintained and individual call details are not.
• Detailed data too voluminous

Granularity in Warehouse
• Tradeoff is to have dual level of granularity
  - Store summary data on disks
    · 95% of DSS processing done against this data
  - Store detail on tapes
    · 5% of DSS processing against this data

Vertical Partitioning
[Figure: the original table (Acct. No, Name, Balance, Date Opened, Interest Rate, Address) is split in two]
• Frequently accessed: Acct. No, Balance
• Rarely accessed: Acct. No, Name, Date Opened, Interest Rate, Address

Smaller table and so less I/O

Derived Data
• Introduction of derived (calculated data) may often help
• Have seen this in the context of dual levels of granularity
• Can keep auxiliary views and indexes to speed up query processing

Schema Design
• Database organization
  - must look like business
  - must be recognizable by business user
  - approachable by business user
  - Must be simple
• Schema Types
  - Star Schema
  - Fact Constellation Schema
  - Snowflake schema

Dimension Tables
• Dimension tables
  - Define business in terms already familiar to users
  - Wide rows with lots of descriptive text
  - Small tables (about a million rows)
  - Joined to fact table by a foreign key
  - heavily indexed
  - typical dimensions
    · time periods, geographic region (markets, cities), products, customers, salesperson, etc.

Fact Table
• Central table
  - mostly raw numeric items
  - narrow rows, a few columns at most
  - large number of rows (millions to a billion)
  - Access via dimensions

Star Schema
• A single fact table and for each dimension one dimension table
• Does not capture hierarchies directly

[Figure: central fact table (date, custno, prodno, cityname, ...) linked to the dimension tables time, cust, prod and city]
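A star schema of this shape can be sketched with SQLite from the Python standard library. The fact columns `date`, `custno`, `prodno` and `cityname` follow the figure. Everything else is an assumption for illustration: the `sales` measure, the attributes of each dimension table, the name `time_dim` (used instead of `time` purely as a precaution), and the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small and descriptive.
CREATE TABLE time_dim (date TEXT PRIMARY KEY, month TEXT);
CREATE TABLE cust (custno INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE prod (prodno INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE city (cityname TEXT PRIMARY KEY, region TEXT);
-- Fact table: narrow rows, one foreign key per dimension plus a measure.
CREATE TABLE fact (
    date TEXT REFERENCES time_dim(date),
    custno INTEGER REFERENCES cust(custno),
    prodno INTEGER REFERENCES prod(prodno),
    cityname TEXT REFERENCES city(cityname),
    sales REAL
);
""")
conn.executemany("INSERT INTO time_dim VALUES (?, ?)",
                 [("1998-12-01", "Dec"), ("1998-12-02", "Dec")])
conn.executemany("INSERT INTO city VALUES (?, ?)",
                 [("Pune", "West"), ("Delhi", "North")])
conn.executemany("INSERT INTO fact VALUES (?, ?, ?, ?, ?)", [
    ("1998-12-01", 1, 1, "Pune", 100.0),
    ("1998-12-02", 1, 2, "Pune", 50.0),
    ("1998-12-01", 2, 1, "Delhi", 70.0),
])

# Typical star query: join the fact table to dimensions, then aggregate.
sales_by_region_month = conn.execute("""
    SELECT c.region, t.month, SUM(f.sales)
    FROM fact f
    JOIN city c ON f.cityname = c.cityname
    JOIN time_dim t ON f.date = t.date
    GROUP BY c.region, t.month
    ORDER BY c.region, t.month
""").fetchall()
```

Here `region` sits inside the `city` table, which matches the point that a star schema does not capture the city-to-region hierarchy as a table of its own.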

Snowflake schema
• Represent dimensional hierarchy directly by normalizing tables.
• Easy to maintain and saves storage

[Figure: central fact table (date, custno, prodno, cityname, ...) linked to the dimension tables time, cust, prod and city; city is further linked to region]

Fact Constellation
• Fact Constellation
  - Multiple fact tables that share many dimension tables
  - Booking and Checkout may share many dimension tables in the hotel industry

[Figure: fact tables Booking and Checkout sharing the dimension tables Hotels, Customer, Promotion, Travel Agents and Room Type]

De-normalization
• Normalization in a data warehouse may lead to lots of small tables
• Can lead to excessive I/O's since many tables have to be accessed
• De-normalization is the answer especially since updates are rare

Creating Arrays
• Many times each occurrence of a sequence of data is in a different physical location
• Beneficial to collect all occurrences together and store as an array in a single row
• Makes sense only if there are a stable number of occurrences which are accessed together
• In a data warehouse, such situations arise naturally due to time based orientation
  - can create an array by month

Selective Redundancy
• Description of an item can be stored redundantly with order table -- most often item description is also accessed with order table
• Updates have to be careful

Partitioning
• Breaking data into several physical units that can be handled separately
• Not a question of whether to do it in data warehouses but how to do it
• Granularity and partitioning are key to effective implementation of a warehouse

Why Partition?
• Flexibility in managing data
• Smaller physical units allow
  - easy restructuring
  - free indexing
  - sequential scans if needed
  - easy reorganization
  - easy recovery
  - easy monitoring

Criterion for Partitioning
• Typically partitioned by
  - date
  - line of business
  - geography
  - organizational unit
  - any combination of above

Where to Partition?
• Application level or DBMS level
• Makes sense to partition at application level
  - Allows different definition for each year
    · Important since warehouse spans many years and as business evolves definition changes
  - Allows data to be moved between processing complexes easily

Data Warehouse vs. Data Marts

What comes first

From the Data Warehouse to Data Marts

[Figure: levels from Data (bottom) to Information (top): Organizationally Structured (Data Warehouse), Departmentally Structured, Individually Structured; a Less/More scale labelled History, Normalized, Detailed]

Data Warehouse and Data Marts
[Figure: OLAP Data Mart: lightly summarized, departmentally structured. Data Warehouse: organizationally structured, atomic, detailed Data Warehouse data]

Characteristics of the Departmental Data Mart
• OLAP
• Small
• Flexible
• Customized by Department
• Source is departmentally structured data warehouse

Techniques for Creating Departmental Data Mart
• OLAP
• Subset
• Summarized
• Superset
• Indexed
• Arrayed

[Figure: departmental data marts: Sales, Finance, Mktg.]

Data Mart Centric
[Figure: Data Sources -> Data Marts -> Data Warehouse]

Problems with Data Mart Centric Solution

If you end up creating multiple warehouses, integrating them is a problem

True Warehouse
[Figure: Data Sources -> Data Warehouse -> Data Marts]

Query Processing
• Indexing
• Pre computed views/aggregates
• SQL extensions

Indexing Techniques
• Exploiting indexes to reduce scanning of data is of crucial importance
• Bitmap Indexes
• Join Indexes
• Other Issues
  - Text indexing
  - Parallelizing and sequencing of index builds and incremental updates

Indexing Techniques
• Bitmap index:
  - A collection of bitmaps -- one for each distinct value of the column
  - Each bitmap has N bits where N is the number of rows in the table
  - A bit corresponding to a value v for a row r is set if and only if r has the value v for the indexed attribute

BitMap Indexes
• An alternative representation of RID-list
• Specially advantageous for low-cardinality domains
• Represent each row of a table by a bit and the table as a bit vector
• There is a distinct bit vector Bv for each value v for the domain
• Example: the attribute sex has values M and F. A table of 100 million people needs 2 lists of 100 million bits

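Bitmap Index

Customer table and bitmaps (one position per row):
gender:             M F F M F F
vote:               Y Y N N Y N
bitmap gender = F:  0 1 1 0 1 1
bitmap vote = Y:    1 1 0 0 1 0
AND of the two:     0 1 0 0 1 0

Query: select * from customer where gender = 'F' and vote = 'Y'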

Bit Map Index

Base Table
Cust | Region | Rating
C1 | N | H
C2 | S | M
C3 | W | L
C4 | W | H
C5 | S | L
C6 | W | L
C7 | N | H

Region Index
Row ID | N | S | E | W
1 | 1 | 0 | 0 | 0
2 | 0 | 1 | 0 | 0
3 | 0 | 0 | 0 | 1
4 | 0 | 0 | 0 | 1
5 | 0 | 1 | 0 | 0
6 | 0 | 0 | 0 | 1
7 | 1 | 0 | 0 | 0

Rating Index
Row ID | H | M | L
1 | 1 | 0 | 0
2 | 0 | 1 | 0
3 | 0 | 0 | 1
4 | 1 | 0 | 0
5 | 0 | 0 | 1
6 | 0 | 0 | 1
7 | 1 | 0 | 0

Customers where Region = W And Rating = M
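The index lookup on this slide can be sketched in a few lines using Python integers as bit vectors. This is an illustrative sketch, not how any of the products named in this deck store their bitmaps. The helper names `build_bitmaps` and `select` are assumptions, and the rows are the base table as decoded from the slide (C1..C7). With that data, Region = W and Rating = M selects no customer, while Region = W and Rating = L selects C3 and C6.

```python
# Base table as (cust, region, rating); row i is bit i of every bitmap.
rows = [
    ("C1", "N", "H"), ("C2", "S", "M"), ("C3", "W", "L"), ("C4", "W", "H"),
    ("C5", "S", "L"), ("C6", "W", "L"), ("C7", "N", "H"),
]


def build_bitmaps(values):
    """Return {value: int} where bit i is set iff row i holds that value."""
    bitmaps = {}
    for i, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps


region_idx = build_bitmaps([r[1] for r in rows])
rating_idx = build_bitmaps([r[2] for r in rows])


def select(bitmap):
    """List the customers whose row bit is set in the bitmap."""
    return [rows[i][0] for i in range(len(rows)) if (bitmap >> i) & 1]


# A conjunctive predicate becomes a single bitwise AND of two bitmaps.
w_and_m = select(region_idx.get("W", 0) & rating_idx.get("M", 0))
w_and_l = select(region_idx.get("W", 0) & rating_idx.get("L", 0))
```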

BitMap Indexes
• Comparison, join and aggregation operations are reduced to bit arithmetic with dramatic improvement in processing time
• Significant reduction in space and I/O (30:1)
• Adapted for higher cardinality domains as well.
• Compression (e.g., run-length encoding) exploited
• Products that support bitmaps: Model 204, TargetIndex (Redbrick), IQ (Sybase), Oracle 7.3
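Run-length encoding helps because a bitmap for a high-cardinality column is mostly zeros. The sketch below shows the idea only. The function names are illustrative, and real products use more elaborate, word-aligned compression schemes.

```python
from itertools import groupby


def rle_encode(bits):
    """Compress a bit list into (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


def rle_decode(runs):
    """Expand (bit, run_length) pairs back into the bit list."""
    return [bit for bit, length in runs for _ in range(length)]


# A sparse bitmap: 24 bits collapse into three runs.
bits = [0] * 12 + [1] * 3 + [0] * 9
runs = rle_encode(bits)
```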

Join Indexes
• Pre-computed joins
• A join index between a fact table and a dimension table correlates a dimension tuple with the fact tuples that have the same value on the common dimensional attribute
  - e.g., a join index on city dimension of calls fact table
  - correlates for each city the calls (in the calls table) from that city
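A join index of this kind can be pictured as a map from each dimension value to the positions of the matching fact rows. The sketch below is a simplified in-memory illustration. The `calls` rows, the `city_dim` table and the `minutes` measure are hypothetical sample data.

```python
from collections import defaultdict

# Dimension table: city_id -> city name (hypothetical).
city_dim = {1: "Pune", 2: "Mumbai"}
# Fact table rows: (call_id, city_id, minutes) (hypothetical).
calls = [(101, 1, 5), (102, 2, 3), (103, 1, 12), (104, 2, 7)]

# Build the join index once: for each city, the positions of its calls.
join_index = defaultdict(list)
for position, (_call_id, city_id, _minutes) in enumerate(calls):
    join_index[city_dim[city_id]].append(position)

# A query on one city now touches only the indexed fact rows, with no join.
pune_minutes = sum(calls[p][2] for p in join_index["Pune"])
```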

Join Indexes
• Join indexes can also span multiple dimension tables
  - e.g., a join index on city and time dimension of calls fact table

Star Join Processing
• Use join indexes to join dimension and fact table

[Figure: Calls joined with Time gives C+T; joined with Location gives C+T+L; joined with Plan gives C+T+L+P]

Optimized Star Join Processing

[Figure: apply selections to Time, Location and Plan; form the virtual cross product of T, L and P; join it with Calls]

Bitmapped Join Processing

[Figure: bitmaps from Time, Location and Plan, each over the rows of Calls, are combined with AND to select the matching Calls rows]

Intelligent Scan
• Piggyback multiple scans of a relation (Redbrick)
  - piggybacking also done if second scan starts a little while after the first scan

Parallel Query Processing
• Three forms of parallelism
  - Independent
  - Pipelined
  - Partitioned and "partition and replicate"
• Deterrents to parallelism
  - startup
  - communication

Parallel Query Processing
• Partitioned Data
  - Parallel scans
  - Yields I/O parallelism
• Parallel algorithms for relational operators
  - Joins, Aggregates, Sort
• Parallel Utilities
  - Load, Archive, Update, Parse, Checkpoint, Recovery
• Parallel Query Optimization

Pre-computed Aggregates
• Keep aggregated data for efficiency (pre-computed queries)
• Questions
  - Which aggregates to compute?
  - How to update aggregates?
  - How to use pre-computed aggregates in queries?

Pre-computed Aggregates
• Aggregated table can be maintained by the
  - warehouse server
  - middle tier
  - client applications
• Pre-computed aggregates -- special case of materialized views -- same questions and issues remain

SQL Extensions
• Extended family of aggregate functions
  - rank (top 10 customers)
  - percentile (top 30% of customers)
  - median, mode
  - Object Relational Systems allow addition of new aggregate functions
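The aggregates listed here are easy to state outside SQL, which makes their meaning concrete. The sketch below uses only the Python standard library. The customer sales figures are hypothetical, and "top 30%" is interpreted as the first 30% of customers in ranked order (at least one), which is one of several reasonable readings.

```python
import statistics

# Hypothetical total sales per customer.
sales = {"c1": 500, "c2": 120, "c3": 900, "c4": 120, "c5": 300}

# rank: customers ordered by sales, best first.
ranked = sorted(sales, key=sales.get, reverse=True)
top_3 = ranked[:3]

# percentile: the top 30% of customers by rank (integer arithmetic).
top_30_percent = ranked[: max(1, len(ranked) * 30 // 100)]

# median and mode of the sales values.
median_sales = statistics.median(sales.values())
mode_sales = statistics.mode(sales.values())
```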

SQL Extensions
• Reporting features
  - running total, cumulative totals
• Cube operator
  - group by on all subsets of a set of attributes (month,city)
  - redundant scan and sorting of data can be avoided
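What the cube operator returns for (month, city) can be sketched as one group-by per subset of the attributes. This is a deliberately naive illustration: it rescans the rows for every subset, which is exactly the redundant work a real cube implementation avoids by deriving coarser groupings from finer ones. The function name `cube` and the sample rows are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Sum `measure` grouped by every subset of `dims` (2**len(dims) group-bys)."""
    result = {}
    for size in range(len(dims), -1, -1):
        for group in combinations(dims, size):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in group)] += row[measure]
            result[group] = dict(totals)
    return result


# Hypothetical sales rows.
sales = [
    {"month": "Jan", "city": "Pune", "amount": 10},
    {"month": "Jan", "city": "Mumbai", "amount": 20},
    {"month": "Feb", "city": "Pune", "amount": 5},
]
c = cube(sales, ("month", "city"), "amount")
```

The result has four entries: by (month, city), by month, by city, and the grand total under the empty group `()`.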

Red Brick has Extended set of Aggregates
• Select month, dollars, cume(dollars) as run_dollars,
         weight, cume(weight) as run_weights
  from sales, market, product, period t
  where year = 1993
    and product like 'Columbian%'
    and city like 'San Fr%'
  order by t.perkey
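The effect of a running-total aggregate such as `cume` can be illustrated with `itertools.accumulate`. This sketch only mimics the idea of a cumulative sum over rows already ordered by period; it is not Red Brick's implementation, and the monthly figures are hypothetical.

```python
from itertools import accumulate

# Hypothetical rows ordered by period: (month, dollars, weight).
rows = [("1993-01", 100.0, 10), ("1993-02", 150.0, 12), ("1993-03", 50.0, 8)]

# Running totals, one value per input row.
run_dollars = list(accumulate(r[1] for r in rows))
run_weights = list(accumulate(r[2] for r in rows))

# Report rows: month, dollars, run_dollars, weight, run_weights.
report = [
    (month, dollars, rd, weight, rw)
    for (month, dollars, weight), rd, rw in zip(rows, run_dollars, run_weights)
]
```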

RISQL (Red Brick Systems) Extensions
• Aggregates
  - CUME
  - MOVINGAVG
  - MOVINGSUM
  - RANK
  - TERTILE
  - RATIOTOREPORT
• Calculating Row Subtotals
  - BREAK BY
• Sophisticated Date Time Support
  - DATEDIFF
• Using SubQueries in calculations

Using SubQueries in Calculations

select product, dollars as jun97_sales,
       (select sum(s1.dollars)
        from market m1, product p1, period t1, sales s1
        where p1.product = product.product
          and t1.year = period.year
          and m1.city = market.city) as total97_sales,
       100 * dollars /
       (select sum(s1.dollars)
        from market m1, product p1, period t1, sales s1
        where p1.product = product.product
          and t1.year = period.year
          and m1.city = market.city) as percent_of_yr
from market, product, period, sales
where year = 1997 and month = 'June' and city like 'Ahmed%'
order by product;

Course Overview
• The course: what and how
• 0. Introduction
• I. Data Warehousing
• II. Decision Support and OLAP
• III. Data Mining
• IV. Looking Ahead
• Demos and Labs

II. On-Line Analytical Processing (OLAP)

Making Decision Support Possible

Limitations of SQL
"A Freshman in Business needs a Ph.D. in SQL"

-- Ralph Kimball

Typical OLAP Queries
• Write a multi-table join to compare sales for each product line YTD this year vs. last year.
• Repeat the above process to find the top 5 product contributors to margin.
• Repeat the above process to find the sales of a product line to new vs. existing customers.
• Repeat the above process to find the customers that have had negative sales growth.

What Is OLAP?
• Online Analytical Processing - coined by EF Codd in 1994 paper contracted by Arbor Software*
• Generally synonymous with earlier terms such as Decisions Support, Business Intelligence, Executive Information System
• OLAP = Multidimensional Database
• MOLAP: Multidimensional OLAP (Arbor Essbase, Oracle Express)
• ROLAP: Relational OLAP (Informix MetaCube, Microstrategy DSS Agent)

* Reference: http://www.arborsoft.com/essbase/wht_ppr/coddTOC.html

The OLAP Market
• Rapid growth in the enterprise market
  - 1995: $700 Million
  - 1997: $2.1 Billion
• Significant consolidation activity among major DBMS vendors
  - 10/94: Sybase acquires ExpressWay
  - 7/95: Oracle acquires Express
  - 11/95: Informix acquires Metacube
  - 1/97: Arbor partners up with IBM
  - 10/96: Microsoft acquires Panorama
• Result: OLAP shifted from small vertical niche to mainstream DBMS category

Strengths of OLAP
• It is a powerful visualization paradigm
• It provides fast, interactive response times
• It is good for analyzing time series
• It can be useful to find some clusters and outliers
• Many vendors offer OLAP tools

OLAP Is FASMI
• Fast
• Analysis
• Shared
• Multidimensional
• Information

Nigel Pendse, Richard Creeth - The OLAP Report

Multi-dimensional Data
• "Hey...I sold $100M worth of goods"

[Figure: data cube with axes Product (Juice, Cola, Milk, Cream, Toothpaste, Soap), Region (W, S, N) and Month (1 to 7)]

Dimensions: Product, Region, Time
Hierarchical summarization paths:
• Product: Industry -> Category -> Product
• Region: Country -> Region -> City -> Office
• Time: Year -> Quarter -> Month or Week -> Day

Data Cube Lattice
• Cube lattice
  - ABC
  - AB, AC, BC
  - A, B, C
  - none
• Can materialize some groupbys, compute others on demand
• Question: which groupbys to materialize?
• Question: what indices to create
• Question: how to organize data (chunks, etc)

Visualizing Neighbors is simpler

[Figure: the same sales data shown two ways: as a grid of Store (1 to 8) by Month (Apr to Mar), and as a flat table with columns Month, Store, Sales]

A Visual Operation: Pivot (Rotate)

[Figure: a data cube with axes Product (Juice, Cola, Milk, Cream), Region and Date (Month) being rotated]

"Slicing and Dicing"

[Figure: cube with axes Product (Household, Telecomm, Video, Audio), Regions (Europe, Far East, India) and Sales Channel (Retail, Direct, Special), with "The Telecomm Slice" highlighted]
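As a rough illustration of the two operations, a slice fixes one dimension to a single value (the Telecomm slice in the figure), while a dice restricts several dimensions at once. The rows and function names below are assumptions made for this sketch.

```python
# Hypothetical facts: (product, region, channel, sales)
facts = [
    ("Telecomm", "Europe", "Retail", 5),
    ("Telecomm", "India", "Direct", 7),
    ("Video", "Europe", "Retail", 3),
    ("Audio", "Far East", "Special", 4),
]

PRODUCT, REGION, CHANNEL = 0, 1, 2

def slice_cube(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]

def dice_cube(rows, allowed):
    """Dice: keep rows whose value on each listed dimension is in the allowed set."""
    return [r for r in rows
            if all(r[dim] in values for dim, values in allowed.items())]

telecomm_slice = slice_cube(facts, PRODUCT, "Telecomm")
retail_europe = dice_cube(facts, {REGION: {"Europe"}, CHANNEL: {"Retail"}})
```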

Roll-up and Drill Down

Higher Level of Aggregation (reached by Roll Up)
- Sales Channel
- Region
- Country
- State
- Location Address
- Sales Representative
Low-level Details (reached by Drill-Down)
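Rolling up and drilling down amount to grouping by a shorter or longer prefix of the hierarchy. The sketch below uses only three levels (region, country, state) and invented sales figures; the function name `aggregate` is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (region, country, state, sales), ordered from the
# highest level of the hierarchy to the lowest.
rows = [
    ("Asia", "India", "Maharashtra", 10),
    ("Asia", "India", "Karnataka", 5),
    ("Asia", "Japan", "Tokyo", 8),
    ("Europe", "France", "Ile-de-France", 4),
]

def aggregate(rows, depth):
    """Total sales grouped by the first `depth` levels of the hierarchy."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[:depth]] += row[-1]
    return dict(totals)

by_region = aggregate(rows, 1)    # rolled up to the Region level
by_country = aggregate(rows, 2)   # drilled down one level to Country
```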

Nature of OLAP Analysis
- Aggregation -- (total sales, percent-to-total)
- Comparison -- Budget vs. Expenses
- Ranking -- Top 10, quartile analysis
- Access to detailed and aggregate data
- Complex criteria specification
- Visualization
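Two of the analysis styles listed above, percent-to-total and top-N ranking, can be sketched in a few lines. The product totals are invented for illustration and `top_n` is a hypothetical helper.

```python
# Hypothetical total sales per product.
sales = {"Juice": 50, "Cola": 100, "Milk": 20, "Cream": 30}

# Aggregation: each product's share of total sales, in percent.
total = sum(sales.values())
percent_to_total = {p: 100 * v / total for p, v in sales.items()}

def top_n(values, n):
    """Ranking: return the n best-selling products, highest first."""
    return sorted(values, key=values.get, reverse=True)[:n]

top_two = top_n(sales, 2)
```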

Organizationally Structured Data
- Different departments look at the same detailed data in different ways. Without the detailed, organizationally structured data as a foundation, there is no reconcilability of data.

[Figure: the same detailed data viewed by marketing, sales, finance and manufacturing]

Multidimensional Spreadsheets
- Analysts need spreadsheets that support
  - pivot tables (cross-tabs)
  - drill-down and roll-up
  - slice and dice
  - sort
  - selections
  - derived attributes
- Popular in retail domain

OLAP - Data Cube
- Idea: analysts need to group data in many different ways
  - eg. Sales(region, product, prodtype, prodstyle, date, saleamount)
  - saleamount is a measure attribute, rest are dimension attributes
  - groupby every subset of the other attributes
    - materialize (precompute and store) groupbys to give online response
  - Also: hierarchies on attributes: date -> weekday, date -> month -> quarter -> year
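The idea of grouping by every subset of the dimension attributes can be sketched as follows. This is an illustration only: it uses three of the Sales attributes, made-up rows, and a hypothetical function `materialize_cube` that holds everything in memory.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("region", "product", "date")

# Hypothetical Sales rows; saleamount is the measure attribute.
sales = [
    {"region": "East", "product": "Cola", "date": "1998-01-05", "saleamount": 10},
    {"region": "East", "product": "Milk", "date": "1998-01-05", "saleamount": 4},
    {"region": "West", "product": "Cola", "date": "1998-02-01", "saleamount": 6},
]

def materialize_cube(rows, dimensions):
    """Precompute total saleamount for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for group in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in group)] += row["saleamount"]
            cube[group] = dict(totals)
    return cube

cube = materialize_cube(sales, DIMENSIONS)
```

Once the cube is stored, a query such as total sales by region becomes a lookup of `cube[("region",)]` rather than a scan of the detailed rows.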

SQL Extensions
- Front-end tools require
  - Extended Family of Aggregate Functions
    - rank, median, mode
  - Reporting Features
    - running totals, cumulative totals
  - Results of multiple group by
    - total sales by month and total sales by product
  - Data Cube
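To make the requested functions concrete, here is a small sketch of running totals, rank, median and mode over one column of figures. It is written in Python rather than SQL, the numbers are invented, and the tie-handling rule in `rank` is an assumption.

```python
from itertools import accumulate
from statistics import median, mode

# Hypothetical monthly sales figures, in calendar order.
monthly_sales = [30, 10, 20, 10, 50]

# Reporting feature: running (cumulative) totals.
running_totals = list(accumulate(monthly_sales))

def rank(values):
    """Rank each value, 1 for the largest; tied values share a rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

ranks = rank(monthly_sales)
middle = median(monthly_sales)
most_common = mode(monthly_sales)
```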

Relational OLAP: 3 Tier DSS
- Database Layer (Data Warehouse): store atomic data in an industry standard RDBMS.
- Application Logic Layer (ROLAP Engine): generate SQL execution plans in the ROLAP engine to obtain OLAP functionality.
- Presentation Layer (Decision Support Client): obtain multidimensional reports from the DSS client.
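The middle tier's job of turning a multidimensional request into SQL might look roughly like the sketch below. It uses SQLite from the Python standard library as a stand-in for the RDBMS; the table, its columns and the function `build_group_by` are assumptions for illustration, not a description of any vendor's engine.

```python
import sqlite3

def build_group_by(table, dimensions, measure):
    """Translate a multidimensional request into a SQL GROUP BY statement.

    Identifiers are interpolated directly, so this sketch assumes they
    come from trusted metadata rather than from end users.
    """
    cols = ", ".join(dimensions)
    return (f"SELECT {cols}, SUM({measure}) FROM {table} "
            f"GROUP BY {cols} ORDER BY {cols}")

# Database layer: atomic data in a relational table (in-memory for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Cola", 10), ("East", "Milk", 4), ("West", "Cola", 6)],
)

# Application logic layer: generate and run the SQL for "sales by region".
query = build_group_by("sales", ["region"], "amount")
result = conn.execute(query).fetchall()
```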

MD-OLAP: 2 Tier DSS
- Database Layer and Application Logic Layer (MDDB Engine): store atomic data in a proprietary data structure (MDDB), pre-calculate as many outcomes as possible, obtain OLAP functionality via proprietary algorithms running against this data.
- Presentation Layer (Decision Support Client): obtain multidimensional reports from the DSS client.

Typical OLAP Problems
Data Explosion

[Figure: "Data Explosion Syndrome" chart (4 levels in each dimension). Number of Aggregations by Number of Dimensions: 2: 16, 3: 81, 4: 256, 5: 1024, 6: 4096, 7: 16384, 8: 65536. Source: Microsoft TechEd '98]
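The growth in the chart appears to follow levels raised to the power of the number of dimensions. The sketch below assumes that formula; the function name is hypothetical. Under this assumption the values match the chart except at three dimensions, where 4^3 is 64 while the chart shows 81.

```python
def aggregation_count(levels, dimensions):
    """Number of aggregations when every dimension has `levels` levels."""
    return levels ** dimensions

# 4 levels in each dimension, for 2 through 8 dimensions.
counts = {d: aggregation_count(4, d) for d in range(2, 9)}
for dims, count in counts.items():
    print(dims, count)
```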

Metadata Repository
- Administrative metadata
  - source databases and their contents
  - gateway descriptions
  - warehouse schema, view & derived data definitions
  - dimensions, hierarchies
  - pre-defined queries and reports
  - data mart locations and contents
  - data partitions
  - data extraction, cleansing, transformation rules, defaults
  - data refresh and purging rules
  - user profiles, user groups
  - security: user authorization, access control

Metadata Repository .. 2
- Business data
  - business terms and definitions
  - ownership of data
  - charging policies
- Operational metadata
  - data lineage: history of migrated data and sequence of transformations applied
  - currency of data: active, archived, purged
  - monitoring information: warehouse usage statistics, error reports, audit trails

Recipe for a Successful Warehouse

For a Successful Warehouse
From Larry Greenfield, http://pwp.starnetinc.com/larryg/index.htm
- From day one establish that warehousing is a joint user/builder project
- Establish that maintaining data quality will be an ONGOING joint user/builder responsibility
- Train the users one step at a time
- Consider doing a high level corporate data model in no more than three weeks

For a Successful Warehouse
- Look closely at the data extracting, cleaning, and loading tools
- Implement a user accessible automated directory to information stored in the warehouse
- Determine a plan to test the integrity of the data in the warehouse
- From the start get warehouse users in the habit of "testing" complex queries

For a Successful Warehouse
- Coordinate system roll-out with network administration personnel
- When in a bind, ask others who have done the same thing for advice
- Be on the lookout for small, but strategic, projects
- Market and sell your data warehousing systems

Data Warehouse Pitfalls
- You are going to spend much time extracting, cleaning, and loading data
- Despite best efforts at project management, data warehousing project scope will increase
- You are going to find problems with systems feeding the data warehouse
- You will find the need to store data not being captured by any existing system
- You will need to validate data not being validated by transaction processing systems

Data Warehouse Pitfalls
- Some transaction processing systems feeding the warehousing system will not contain detail
- Many warehouse end users will be trained and never or seldom apply their training
- After end users receive query and report tools, requests for IS written reports may increase
- Your warehouse users will develop conflicting business rules
- Large scale data warehousing can become an exercise in data homogenizing

Data Warehouse Pitfalls
- "Overhead" can eat up great amounts of disk space
- The time it takes to load the warehouse will expand to the amount of time in the available window... and then some
- Assigning security cannot be done with a transaction processing system mindset
- You are building a HIGH maintenance system
- You will fail if you concentrate on resource optimization to the neglect of project, data, and customer management issues and an understanding of what adds value to the customer

DW and OLAP Research Issues
- Data cleaning
  - focus on data inconsistencies, not schema differences
  - data mining techniques
- Physical Design
  - design of summary tables, partitions, indexes
  - tradeoffs in use of different indexes
- Query processing
  - selecting appropriate summary tables
  - dynamic optimization with feedback
  - acid test for query optimization: cost estimation, use of transformations, search strategies
  - partitioning query processing between OLAP server and backend server
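One simple reading of "selecting appropriate summary tables" is to answer a query from the smallest precomputed table that still contains every attribute the query needs. The sketch below assumes that heuristic; the table sizes and the function `pick_summary_table` are invented, and real optimizers weigh many more factors.

```python
# Hypothetical materialized summary tables: group-by attributes -> row count.
summary_tables = {
    ("region", "product", "date"): 1_000_000,
    ("region", "product"): 5_000,
    ("product",): 50,
}

def pick_summary_table(query_dims, tables):
    """Return the smallest table whose attributes cover the query, or None."""
    candidates = [dims for dims in tables if set(query_dims) <= set(dims)]
    return min(candidates, key=tables.get, default=None)

by_product = pick_summary_table(("product",), summary_tables)
by_region = pick_summary_table(("region",), summary_tables)
by_date = pick_summary_table(("date",), summary_tables)
by_store = pick_summary_table(("store",), summary_tables)
```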

DW and OLAP Research Issues .. 2
- Warehouse Management
  - detecting runaway queries
  - resource management
  - incremental refresh techniques
  - computing summary tables during load
  - failure recovery during load and refresh
  - process management: scheduling queries, load and refresh
  - Query processing, caching
  - use of workflow technology for process management

Products, References, Useful Links

Reporting Tools
- Andyne Computing -- GQL
- Brio -- BrioQuery
- Business Objects -- Business Objects
- Cognos -- Impromptu
- Information Builders Inc. -- Focus for Windows
- Oracle -- Discoverer2000
- Platinum Technology -- SQL*Assist, ProReports
- PowerSoft -- InfoMaker
- SAS Institute -- SAS/Assist
- Software AG -- Esperant
- Sterling Software -- VISION:Data

OLAP and Executive Information Systems
- Andyne Computing -- Pablo
- Arbor Software -- Essbase
- Cognos -- PowerPlay
- Comshare -- Commander OLAP
- Holistic Systems -- Holos
- Information Advantage -- AXSYS, WebOLAP
- Informix -- Metacube
- MicroStrategy -- DSS/Agent
- Microsoft -- Plato
- Oracle -- Express
- Pilot -- LightShip
- Planning Sciences -- Gentium
- Platinum Technology -- ProdeaBeacon, Forest & Trees
- SAS Institute -- SAS/EIS, OLAP++
- Speedware -- Media

Other Warehouse Related Products
- Data extract, clean, transform, refresh
  - CA-Ingres replicator
  - Carleton Passport
  - Prism Warehouse Manager
  - SAS Access
  - Sybase Replication Server
  - Platinum Inforefiner, Infopump

Extraction and Transformation Tools
- Carleton Corporation -- Passport
- Evolutionary Technologies Inc. -- Extract
- Informatica -- OpenBridge
- Information Builders Inc. -- EDA Copy Manager
- Platinum Technology -- InfoRefiner
- Prism Solutions -- Prism Warehouse Manager
- Red Brick Systems -- DecisionScape Formation

Scrubbing Tools
- Apertus -- Enterprise/Integrator
- Vality -- IPE
- Postal Soft

Warehouse Products
- Computer Associates -- CA-Ingres
- Hewlett-Packard -- Allbase/SQL
- Informix -- Informix, Informix XPS
- Microsoft -- SQL Server
- Oracle -- Oracle7, Oracle Parallel Server
- Red Brick -- Red Brick Warehouse
- SAS Institute -- SAS
- Software AG -- ADABAS
- Sybase -- SQL Server, IQ, MPP

Warehouse Server Products
- Oracle 8
- Informix
  - Online Dynamic Server
  - XPS -- Extended Parallel Server
  - Universal Server for object relational applications
- Sybase
  - Adaptive Server 11.5
  - Sybase MPP
  - Sybase IQ

Warehouse Server Products
- Red Brick Warehouse
- Tandem Nonstop
- IBM
  - DB2 MVS
  - Universal Server
  - DB2 400
- Teradata

Other Warehouse Related Products
- Connectivity to Sources
  - Apertus
  - Information Builders EDA/SQL
  - Platinum Infohub
  - SAS Connect
  - IBM Data Joiner
  - Oracle Open Connect
  - Informix Express Gateway

Other Warehouse Related Products
- Query/Reporting Environments
  - Brio/Query
  - Cognos Impromptu
  - Informix Viewpoint
  - CA Visual Express
  - Business Objects
  - Platinum Forest and Trees

4GLs, GUI Builders, and PC Databases
- Information Builders -- Focus
- Lotus -- Approach
- Microsoft -- Access, Visual Basic
- MITI -- SQR/Workbench
- PowerSoft -- PowerBuilder
- SAS Institute -- SAS/AF

Data Mining Products
- DataMind -- neurOagent
- Information Discovery -- IDIS
- SAS Institute -- SAS/Neuronets

Data Warehouse
- W.H. Inmon, Building the Data Warehouse, Second Edition, John Wiley and Sons, 1996
- W.H. Inmon, J. D. Welch, Katherine L. Glassey, Managing the Data Warehouse, John Wiley and Sons, 1997
- Barry Devlin, Data Warehouse from Architecture to Implementation, Addison Wesley Longman, Inc., 1997

Data Warehouse
- W.H. Inmon, John A. Zachman, Jonathan G. Geiger, Data Stores Data Warehousing and the Zachman Framework, McGraw Hill Series on Data Warehousing and Data Management, 1997
- Ralph Kimball, The Data Warehouse Toolkit, John Wiley and Sons, 1996

OLAP and DSS
- Erik Thomsen, OLAP Solutions, John Wiley and Sons, 1997
- Microsoft TechEd Transparencies from Microsoft TechEd 98
- Essbase Product Literature
- Oracle Express Product Literature
- Microsoft Plato Web Site
- Microstrategy Web Site

Data Mining
- Michael J.A. Berry and Gordon Linoff, Data Mining Techniques, John Wiley and Sons, 1997
- Peter Adriaans and Dolf Zantinge, Data Mining, Addison Wesley Longman Ltd., 1996
- KDD Conferences

Other Tutorials
- Donovan Schneider, Data Warehousing Tutorial, Tutorial at International Conference for Management of Data (SIGMOD 1996) and International Conference on Very Large Data Bases 97
- Umeshwar Dayal and Surajit Chaudhuri, Data Warehousing Tutorial at International Conference on Very Large Data Bases 1996
- Anand Deshpande and S. Seshadri, Tutorial on Datawarehousing and Data Mining, CSI-97

Useful URLs
- Ralph Kimball's home page
  - http://www.rkimball.com
- Larry Greenfield's Data Warehouse Information Center
  - http://pwp.starnetinc.com/larryg/
- Data Warehousing Institute
  - http://www.dw-institute.com/
- OLAP Council
  - http://www.olapcouncil.com/
