Definition of Business Intelligence

In this section you will become familiar with the definition of BI and the applications involved in it. Business intelligence (BI) is a broad category of applications and technologies for gathering, storing, analysing, and providing access to data.

It helps enterprise users make better business decisions. BI applications include decision support systems, query and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining.

Business intelligence applications can be:

• Mission-critical and important to an enterprise's functions, or occasional, to meet a unique need.
• Enterprise-wide, or restricted to one division, subdivision, or project.
• Centrally initiated, or driven by user demand.

Business intelligence is a term used to refer to a number of activities a company may undertake to gather information about its market or its competitors.
Some of these activities are:

• Competition analysis.
• Market analysis.
• Industry analysis.
• Industrial intelligence.
• Predictive analysis.
• Online analysis.
• Processing analysis.
• Data mining.
• Business performance management.
• Benchmarking.
• Text mining.

Business intelligence (BI) provides in-depth knowledge about performance indicators, such as:

• Customers
• Competitors
• Business counterparts
• Economic environment
• Internal operations

The resulting improvement in business pace allows a company to keep growing under diverse market conditions.
According to Microsoft's vision of BI for SQL Server 2003, "BI is a method of storing and presenting key enterprise data so that anyone in your company can quickly and easily ask questions of accurate and timely data." Effective BI allows you to use information to understand why your business got particular results, to decide on courses of action based on past facts, and to project potential results accurately.
BI data is displayed in a way that suits each type of user: specialists can drill into detailed data, executives concentrate on timely summaries, and middle managers look at data presented in enough detail to help them make good business decisions.
Example: Microsoft's BI uses cubes, rather than tables, to store information, and presents information via reports. The information can be made accessible to end users in a range of formats: Windows applications, Web applications, and Microsoft BI client tools, such as Excel or SQL Reporting Services.

It is important to explain the term insight.

Insights are the final goals of authors, vendors and IT consultants when commencing BI projects. They are sometimes called 'moments of clarity' that drive the enterprise forward. When a business overview is presented by BI, it is possible to see a previously unseen fact or aspect of the business.

The main challenge of business intelligence is to gather and serve well-organised information about all the factors that make the business profitable, and to let you access that knowledge easily and efficiently, thereby maximising the success of the organisation. Example: Accountant's Armoury: Work Inc is a group finance company, and Gavin Leverett has been its planning director since January 2007. According to him, consolidation and reporting tools are key weapons in the accountant's armoury, so one of his first priorities was to get a strong set of reporting and enquiry tools in place as quickly as possible.
A priority for Leverett was to put business intelligence tools in place that would automate the work involved in collecting the monthly profit and loss data from each business unit, consolidating it, and comparing it against the corporate budget.
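The consolidation Leverett wanted to automate (collect each unit's monthly profit and loss, consolidate it, compare it with the corporate budget) can be sketched in a few lines of Python. This is an illustration only: the business-unit names and every figure are hypothetical, and the tools actually used in the example are not represented.

```python
# Hypothetical monthly profit-and-loss figures per business unit.
unit_pnl = {
    "north": {"revenue": 120_000, "costs": 95_000},
    "south": {"revenue": 80_000, "costs": 70_000},
    "online": {"revenue": 60_000, "costs": 38_000},
}

# Hypothetical corporate budget for the same month.
budget = {"revenue": 250_000, "costs": 200_000, "profit": 50_000}


def consolidate(pnl_by_unit):
    """Sum each P&L line across all business units and derive profit."""
    total = {}
    for lines in pnl_by_unit.values():
        for line, amount in lines.items():
            total[line] = total.get(line, 0) + amount
    total["profit"] = total["revenue"] - total["costs"]
    return total


def variance(actual, planned):
    """Actual minus budget for every line that appears in the budget."""
    return {line: actual.get(line, 0) - planned[line] for line in planned}


consolidated = consolidate(unit_pnl)
print(consolidated)
print(variance(consolidated, budget))
```

In practice the per-unit figures would come from the finance systems rather than a literal dictionary; the point is that consolidation is a sum across units and the budget comparison is a line-by-line difference.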
During 2008, the company worked with its systems integration partner Datel to give finance staff the ability to report against data held in the company's customer relationship management system, also from Sage, to get visibility into Work Inc's sales pipeline. Tile manufacturer P Group used business intelligence tools to analyse profitability right down to the level of the individual product.
This involved accumulating information held in several back-end systems, including product lists, customer databases and sales reports, into a single 'cube' of data, based on the Applix TM1 online analytical processing (OLAP) engine.

From there, the data was used to make pricing decisions for P Group's extensive product line. The next step was to create annual budgets. After that, the cube was used to analyse stock management data, so that P Group could produce more accurate demand forecasts, which enabled the company to decide how much raw material to buy.

Further, the company had intranet-based dashboards to let different departments benefit from some of the insight mined by the finance department, alerting them to where they need to focus their efforts in order to meet financial targets.
As shown in the figure, a data warehouse is designed for query and analysis rather than for transactional processing. It brings together data from various sources, giving leaders and decision makers greater insight into the project. The key tools provided through BI solutions include:

• Performance Management.
• Enterprise Reporting and Data Mining.

BI provides an integrated business intelligence and data warehousing solution that meets the organisation's business needs. These solutions give an organisation the ability to plan performance against its business strategy, goals and objectives. By connecting performance to corporate goals, leadership can use the day-to-day data generated throughout the organisation to monitor key indicators and take decisions that make a difference. With a performance management solution that uses accessible information to assess performance, an organisation can quickly determine where it stands. A performance management solution generally covers the definition of metrics and how to track them, and the development of dashboards to communicate performance results effectively.
Enterprise reporting provides the organisation with the information needed to monitor and control behaviour and the associated performance, while data mining is used to identify trends, to understand customers, and to uncover the inter-relationships within organisational performance. These solutions use web-based tools and processes suited to the size and needs of each organisation.
Transforming Data into Information
When a service consumer requests data from a service provider, the data sent across by the provider is still raw, in the sense that the consumer still needs to massage and process it to convert it into useful information. The implication here is that there is an evolutionary process by which data gets transformed into useful information. The following stages detail that process.

Raw data refers to the contents of enterprise databases and other such artifacts within the enterprise where data is stored. It is raw because it has not been processed and is not yet in a sufficiently usable form.

Processed Information
Processed information can be thought of as data that has evolved from its most rudimentary state into something more meaningful to the enterprise. Typically, software applications consume the raw data and present it as more meaningful business entities and their inter-relationships.
Business Knowledge
Business knowledge is a stage of data evolution at which a deeper and more comprehensive understanding of the dynamic environment in which the business operates has been captured. Business knowledge can be thought of as processed information, but at a higher level of evolution, with richer semantic meaning and cross-references.
Business Intelligence
At the business intelligence stage, data has evolved to its most refined state and presents an actionable picture to a decision-maker, helping them arrive at the right decisions, which results in reduced risk, effective utilization of resources and greater enterprise efficiency in business operations.
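A minimal Python sketch of this progression, under stated assumptions: the raw rows and the per-region revenue targets are hypothetical, processed information is taken to mean revenue per region, and the "actionable" output is simply the list of regions below target. The business-knowledge stage is not modelled.

```python
from collections import defaultdict

# Raw data: rows as they might sit in an operational database (hypothetical).
# Each row: (date, region, product, quantity, unit_price)
raw_sales = [
    ("2024-01-03", "north", "widget", 3, 10.0),
    ("2024-01-03", "south", "widget", 1, 10.0),
    ("2024-01-04", "north", "gadget", 2, 25.0),
    ("2024-01-05", "south", "gadget", 1, 25.0),
]


def to_information(rows):
    """Processed information: revenue per region, a business-level entity."""
    revenue = defaultdict(float)
    for _date, region, _product, qty, unit_price in rows:
        revenue[region] += qty * unit_price
    return dict(revenue)


def to_intelligence(revenue_by_region, targets):
    """Actionable view: regions whose revenue is below a (hypothetical) target."""
    return sorted(
        region for region, target in targets.items()
        if revenue_by_region.get(region, 0.0) < target
    )


info = to_information(raw_sales)
print(info)
print(to_intelligence(info, {"north": 70.0, "south": 50.0}))
```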

Every enterprise has data in each of the above stages of information evolution, so it is critically important to consider strategies for evolving data into useful information, especially when considering a transformational activity like adopting a service-oriented architecture (SOA).
How to make information out of raw data
A further problem arises when multiple stakeholders in the enterprise are interested in this information (ostensibly, to aid their own decision-making) but interpret its meaning very differently. If the information were universally understood and accepted by everyone within the organization, all would be well. However, this is typically NOT the case. Each consumer of the information has its own narrow interpretation of the raw data and of how it needs to evolve before it can be presented as actionable intelligence.
At first glance, this problem seems easily resolved by developing a universal, canonical information model (metadata) and deploying it across the enterprise. In practice, however, this is akin to forcing everybody in the world to speak English, which is not acceptable from a functional, cultural, political and perhaps even a legal point of view. This necessitates allowing individual groups of consumers a certain amount of leeway to consume the data by transforming it to suit their specific needs. In fact, this leeway can be instituted as a design pattern or a best practice.
A further refined model might involve a federated control mechanism over the information metadata model, in which a common, global, abstract definition exists, but each consumer may define transformational contracts on the global model. These transformational contracts allow the common, global definition of the data to be modified into a domain-specific data definition. When applied to actual data, the contracts then transform the data so that it is easily consumable within the consumer's local domain. Thus, each consumer is empowered to be responsible for transforming the common stream of data into a contextual definition for its own consumption. Also note that once a consumer defines its own set of transformational contracts, those contracts are artifacts that can be reused by subsequent interested parties.
The following illustration shows an example where sales data is consumed by various interested parties within an enterprise, but each party has its own unique view of the data. A party might be interested in only certain parts of that data, or may view it in other formats, by applying individual transformations.

This model fits very well with the Enterprise Service Bus (ESB) design pattern, which is used in most SOA designs. The ESB not only provides the messaging backbone through which the service provider and the consumer communicate, but also provides data mediation and information transformation capabilities. It can thus serve as a centralized repository of the transformational contracts (e.g., XML schemas and XSL style sheets).
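The contract idea can be sketched in Python, with plain functions standing in for the XSL style sheets an ESB would typically hold. The record fields, consumer names, and the 0.9 USD-to-EUR rate are hypothetical; the `registry` dictionary merely plays the role of the central repository of contracts.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Contract = Callable[[Record], Record]

# Common, global definition of sales data (hypothetical canonical model).
canonical_sales: List[Record] = [
    {"order_id": 1, "region": "EU", "amount_usd": 120.0, "customer": "Acme"},
    {"order_id": 2, "region": "US", "amount_usd": 80.0, "customer": "Globex"},
]

# Central repository of transformational contracts, keyed by consumer.
registry: Dict[str, Contract] = {}


def register(consumer: str, contract: Contract) -> None:
    """Store a consumer's contract so that it can be applied and reused later."""
    registry[consumer] = contract


def consume(consumer: str, stream: List[Record]) -> List[Record]:
    """Apply one consumer's contract to every record of the common stream."""
    contract = registry[consumer]
    return [contract(record) for record in stream]


# Finance: only order ids and amounts, converted at an assumed USD-to-EUR rate of 0.9.
register("finance", lambda r: {"order": r["order_id"], "amount_eur": round(r["amount_usd"] * 0.9, 2)})
# Marketing: only which customer bought in which region.
register("marketing", lambda r: {"customer": r["customer"], "region": r["region"]})
# Reuse: a later party adopts the existing finance contract unchanged.
register("emea_reporting", registry["finance"])

print(consume("finance", canonical_sales))
print(consume("marketing", canonical_sales))
```

The last registration shows the reuse point: a later consumer adopts an existing contract instead of writing its own, while the canonical stream itself is never altered.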

What do you mean by data warehouse? What are the major concepts and terminology used in the study of data warehouses?

Data Warehouse

A data warehouse is a part of the data warehousing system. It provides a consolidated, accessible and flexible collection of data for end-user analysis and reporting.
Inmon defines a data warehouse as 'a subject-oriented, integrated, non-volatile, time-variant collection of data that supports the decision-making process of an organisation.'

He explains the terms in the above definition as follows:

• Subject-Oriented:
The data warehouse is subject-oriented because its data gives information about a particular subject rather than about a company's ongoing operations.
• Integrated:
The data warehouse is integrated because data is gathered from a variety of sources and merged into a coherent whole.
• Time-Variant:
The data warehouse is time-variant because all the data in it is identified with a particular time period.
• Non-Volatile:
Data in a data warehouse is stable. More data is added, but data is never removed. Thus, management can gain a consistent picture of the business. Hence the data warehouse is non-volatile (long-term storage).
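The time-variant and non-volatile properties can be illustrated with an append-only store in Python. This is a sketch under assumptions: the `Snapshot` and `Warehouse` names and the balance data are hypothetical, and a real warehouse would use a database rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    """One warehouse row: a fact plus the point in time it describes."""
    as_of: date        # time-variant: every row is tied to a time period
    customer: str
    balance: float


class Warehouse:
    """Append-only store: loads add rows; nothing is updated or deleted."""

    def __init__(self) -> None:
        self._rows: List[Snapshot] = []

    def load(self, rows: List[Snapshot]) -> None:
        self._rows.extend(rows)  # non-volatile: existing rows are left untouched

    def balance_as_of(self, customer: str, when: date) -> Optional[float]:
        """Latest balance recorded on or before `when`, or None if there is none."""
        hits = [r for r in self._rows if r.customer == customer and r.as_of <= when]
        return max(hits, key=lambda r: r.as_of).balance if hits else None


wh = Warehouse()
wh.load([Snapshot(date(2024, 1, 31), "acme", 100.0)])
wh.load([Snapshot(date(2024, 2, 29), "acme", 140.0)])
print(wh.balance_as_of("acme", date(2024, 2, 15)))  # January's figure is still available
print(wh.balance_as_of("acme", date(2024, 3, 1)))
```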


A fundamental concept of a data warehouse is the distinction between data and information. Data is composed of observable and recordable facts that are often found in operational or transactional systems. At Rutgers, these systems include the registrar's data on students (widely known as the SRDB), human resource and payroll databases, course scheduling data, and data on financial aid. In a data warehouse environment, data only comes to have value to end users when it is organized and presented as information. Information is an integrated collection of facts and is used as the basis for decision making. For example, an academic unit needs diachronic information about the instructional output of its different faculty members to gauge whether it is becoming more or less reliant on part-time faculty.

Data Warehouse:

A data structure that is optimized for distribution. It collects and stores integrated sets of historical data from multiple operational systems and feeds them to one or more data marts. It may also provide end-user access to support enterprise views of data.

Data Mart:

A data structure that is optimized for access. It is designed to facilitate end-user analysis of data. It typically supports a single analytic application used by a distinct set of workers.

Staging Area:

Any data store that is designed primarily to receive data into a warehousing environment.

Operational Data Store:

A collection of data that addresses the operational needs of various operational units. It is not a component of a data warehousing architecture, but a solution to operational needs.

OLAP (On-Line Analytical Processing):

A method by which multidimensional analysis occurs.

Multidimensional Analysis:

The ability to manipulate information by a variety of relevant categories or "dimensions" to facilitate analysis and understanding of the underlying data. It is also sometimes referred to as "drilling down", "drilling across" and "slicing and dicing".
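The slicing, dicing, and drilling operations named above can be sketched over a handful of hypothetical fact rows in Python. Drilling down is shown as re-aggregating at a finer set of dimensions; drilling across (combining facts from different cubes) is not shown, and all function names are illustrative.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales)
facts = [
    (2023, "Q1", "north", "widget", 100),
    (2023, "Q1", "south", "widget", 60),
    (2023, "Q2", "north", "gadget", 40),
    (2024, "Q1", "north", "widget", 120),
    (2024, "Q1", "south", "gadget", 30),
]
DIMS = ("year", "quarter", "region", "product")


def aggregate(rows, by):
    """Roll the sales measure up over the chosen dimensions."""
    idx = [DIMS.index(d) for d in by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose values fall in the given set for each named dimension."""
    return [r for r in rows
            if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())]


print(aggregate(facts, ["year"]))             # summary view
print(aggregate(facts, ["year", "quarter"]))  # drill down into quarters
print(aggregate(slice_(facts, "region", "north"), ["product"]))
print(aggregate(dice(facts, year={2024}, product={"widget", "gadget"}), ["region"]))
```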


A means of visually representing multidimensional data.

Star Schema:

A means of aggregating data based on a set of known dimensions. It stores data multidimensionally in a two-dimensional Relational Database Management System (RDBMS), such as Oracle.
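A star schema can be sketched with Python's built-in `sqlite3` module: one fact table holding the measures and foreign keys, surrounded by dimension tables. The table names, columns, and rows are hypothetical; SQLite stands in for an RDBMS such as Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per member of each known dimension.
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT);

-- Fact table: a foreign key to every dimension plus the numeric measures.
CREATE TABLE fact_sales (
    date_id    INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    units      INTEGER,
    revenue    REAL
);

INSERT INTO dim_date    VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO dim_product VALUES (1, 'widget', 'hardware'), (2, 'manual', 'books');
INSERT INTO dim_store   VALUES (1, 'Leeds'), (2, 'York');
INSERT INTO fact_sales  VALUES
    (1, 1, 1, 10, 100.0), (1, 2, 2, 5, 50.0), (2, 1, 2, 7, 70.0);
""")

# Typical star join: aggregate the facts by attributes taken from the dimensions.
rows = conn.execute("""
    SELECT d.month, p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_date d    ON f.date_id = d.date_id
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
print(rows)
```

Every dimension is one join away from the fact table, which is what keeps star-schema queries simple to write and to read.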

Snowflake Schema:

An extension of the star schema that applies additional dimensions to the dimensions of a star schema in a relational environment.
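Under the common reading that snowflaking normalises a dimension's attributes into further tables, the product dimension below points to a separate category table. Again this is a hypothetical `sqlite3` sketch, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaking: the category attribute moves out of the product dimension
-- into its own table, which the product dimension references.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT,
                           category_id INTEGER REFERENCES dim_category(category_id));
CREATE TABLE fact_sales   (product_id INTEGER REFERENCES dim_product(product_id),
                           revenue REAL);

INSERT INTO dim_category VALUES (1, 'hardware'), (2, 'books');
INSERT INTO dim_product  VALUES (1, 'widget', 1), (2, 'gadget', 1), (3, 'manual', 2);
INSERT INTO fact_sales   VALUES (1, 100.0), (2, 40.0), (3, 50.0), (1, 70.0);
""")

# Reaching the category now takes one more join than in a star schema.
rows = conn.execute("""
    SELECT c.name, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p  ON f.product_id = p.product_id
    JOIN dim_category c ON p.category_id = c.category_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)
```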

Multidimensional Database:

Also known as MDB or MDBS. A class of proprietary, non-relational database management tools that store and manage data in a multidimensional manner, as opposed to the two dimensions associated with traditional relational database management systems.

OLAP Tools:

A set of software products that attempt to facilitate multidimensional analysis. They can incorporate data acquisition, data access, data manipulation, or any combination thereof.

Q.3 What are the data modeling techniques used in a data warehousing environment?

Data Warehouse Modeling Techniques
Data warehouse modeling is the process of building a model for the data that is to be stored in the data warehouse. The model produced is an abstract model, and in this sense it is a representation of reality, or at least of the part of reality that the data warehouse is assumed to support. Seen this way, data warehouse modeling resembles traditional database modeling, which most of us know from database development for operational applications (OLTP database development). This resemblance should be treated with great care, however, because there are a number of significant differences between data warehouse modeling and OLTP database modeling. These differences affect not only the modeling process but also the modeling techniques to be used. In Chapter 7, "The Process of Data Warehousing," the basic issues and steps of a data warehouse modeling process were described. This chapter focuses entirely on the techniques involved in a data warehouse modeling process. It extends and complements Chapter 7 in several ways:
• Whereas Chapter 7 focuses on the modeling process, this chapter focuses on data warehouse modeling techniques.
• Whereas Chapter 7 in large part deals with the basic issues of dimensional modeling and illustrates how end-user requirements can be captured and formalized in an initial dimensional model, this chapter investigates data warehouse modeling beyond these aspects.
• Whereas Chapter 7 considers data warehouse modeling primarily from the point of view of rapid development of an independent data mart, this chapter is concerned with making data warehouse models that are suitable for integration with other data marts or that can be deployed within a corporate data warehouse environment.
Although we focus on modeling techniques, we try to present them as part of a structured approach to data warehouse modeling. However, more work is required in this area to develop a methodological approach to data warehouse modeling.
Data Warehouse Modeling and OLTP Database Modeling
Before studying data warehouse modeling techniques, it is worthwhile investigating the differences between data warehouse modeling and OLTP database modeling. This will give you a better idea of why new or adapted techniques are required for data warehouse modeling, and will help you understand how to set up a data warehouse modeling approach or methodology.
Origin of the Modeling Differences
There are three main reasons why data warehouse modeling requires techniques other than those of OLTP database modeling, or why traditional modeling techniques require a significantly different focus when used in data warehouse development projects:
• A data warehouse has base properties that make it fundamentally different from OLTP databases. In the next section, these properties and their impact on data warehouse modeling are investigated further.
• The computing context in which a data warehouse resides differs from the context in which OLTP databases reside. Users of OLTP applications are 'shielded' from the database structure because they interact through user interfaces and use application services to work with the databases. Users of a data warehouse, however, are much more directly involved with the data warehouse model and the way data is organized in the warehouse. Failing to make models that are simple to understand and that directly represent the end user's perception of reality is one of the worst things that can happen to a data warehouse enablement project.
• Inherent to data warehouse enablement are the fuzziness and incompleteness of end-user requirements and the continuous evolution of the data warehouse. These incomplete requirements call for a flexible modeling process and for techniques that are appropriate for evolutionary development. The risks of flexible and evolutionary software development are incoherence and inconsistency of the end result. These issues certainly require attention when performing data warehouse modeling.

Most of the above reasons why data warehouse modeling differs from OLTP database modeling also apply in the context of data mart development. Although developing data marts may appear to be less complicated than developing corporate data warehouses, many of the properties of a data warehouse that make modeling so different from OLTP database modeling also apply to data mart development projects. In addition, the impact of end users and end-user requirements on the modeling process and techniques becomes even more important for data marts than for data warehouses.

Base Properties of a Data Warehouse
Some of the most significant differences between data warehouse modeling and OLTP database modeling are related to the base properties of a data warehouse, which are summarized in the figure below.

A data warehouse is an integrated collection of databases rather than a single database. It should be conceived as the single source of information for all decision support processing and all informational applications throughout the organization. A data warehouse is an organic thing, and it tends to become big, if it is not big from the beginning. In addition to the obvious requirement that a data warehouse should satisfy the needs of end users, there is also a great need to achieve maximum consistency throughout the whole data warehouse environment, at the level of primitive data and derived data, and also within the information derivation processes themselves.
A data warehouse contains data that belongs to different information subject areas, which can be the basis for logically partitioning the data warehouse into several different (conceptual or even physical) databases. A data warehouse also contains different categories of data. It contains primitive data (the 'System of Record'), represented and organized either as an accumulation of captured source data changes, business events and transactions, or as an interpreted and well-structured historical database. In many cases both representations of primitive data are present in the data warehouse and are positioned and mapped to form an integrated collection of data that represents 'the corporate memory.' Another major category of data in the data warehouse is data that is condensed and aggregated in information analysis databases, with a format and layout that is directly suitable for end users to interpret and use. A data warehouse also usually contains 'support databases,' which are not directly of interest to end users for their data analysis activities but are important components in the process of capturing source data and delivering consistent information to end users.
Clearly, data warehouse modeling must combine different kinds of modeling techniques. The System of Record is usually best not modeled with the same techniques as the end-user-oriented information analysis databases. If, in addition, one considers that end users may be dealing with decision support tools (query and reporting, OLAP, data mining, ...) and informational applications that have different usage and development characteristics, it becomes clear that data warehouse modeling is in fact a compilation of different modeling techniques, each with its own area of applicability.
The Data Warehouse Computing Context
Data warehouses have to be developed within a computing context that differs from the context in which OLTP database applications are developed.

There is a fundamental difference between the way end users use OLTP databases and data warehouses. OLTP users are shielded from the databases by an application layer. They perform tasks, usually consisting of a fixed number of predefined database operations, which are part of a fixed transaction workflow.
Data warehouse applications are totally different. They are data-centric rather than process-centric. End users deal almost directly with the data, and there are no fixed workflows (with a few exceptions here and there). End users are not interested in recording data in the warehouse; they want to get information out of it. They raise questions against the warehouse, test and verify hypotheses with information they drag out of the warehouse, reconstruct chains of events, which they then analyze, possibly to detect patterns or seasonal trends, and make extrapolations and projections for the future.
Data warehouses are very much open collections of data, and end-user involvement in the enablement process is known to be a vital element of success. In addition, good data warehouses realize what could be called the information supermarket principle, whereby end users freely access the data warehouse when they need information for their own purposes.
Figure 38 also points to the other side of the coin. Data warehouse developers, including those who do data warehouse modeling, have to take into account that the data warehouse 'input context' consists of a legacy data processing environment residing in a legacy business process environment. Required data may not be available, or perhaps cannot be captured at a sufficient level of detail, unless money and effort are spent changing the legacy input environment. Data warehouse enablement projects therefore often get involved with business process and source application reengineering.
All of this has a fundamental impact on the modeling process as well as on the techniques used for producing the data warehouse model, including the 'bag of tricks' (heuristics, guidelines, metrics, etc.) that the data warehouse modeler uses to make things happen the way they should.

Setting up a Data Warehouse Modeling Approach
One of the real challenges that a data warehouse modeling expert faces is to combine suitable modeling techniques in an approach that is end-user focused, leads to a logically integrated historical data organization, supports the delivery of consistent information to end users, and is flexible and scalable.
The main requirements for a solid data warehouse modeling approach can be summarized as follows:
• It has to incorporate several different modeling techniques in a well-balanced and integrated approach. In this chapter, we investigate a number of modeling techniques that we believe are core techniques for data warehouse modeling.
• Each modeling technique should have its own area of applicability. Obviously, the fewer techniques that are blended and integrated into a modeling process, the easier the process becomes. Modeling techniques with a broad scope of applicability are therefore highly recommended. This chapter should help you select suitable modeling techniques and recognize their scope of applicability.
• It is of the utmost importance that end users see a single, well-integrated, and simple-to-interpret logical model of the data warehouse. This logical model is one of the centerpieces of the data warehouse metadata. Simplicity for the end user, and integration and consolidation of the historical data, are key principles that the modeling approach should help provide.
• Recalling that a data warehouse environment is an organic thing and that fuzziness and incompleteness are inherent characteristics of data warehouse enablement, any approach to data warehouse modeling should be flexible and support an evolutionary data warehouse development process. End users must be involved maximally in the modeling process itself. Therefore, the modeling techniques and the results they produce must be understandable for information analysts who have, by definition, no technical IT background. In addition, flexibility and support for evolutionary development mean that the data warehouse model must accommodate constant changes and extensions. Providing this flexibility when setting up a data warehouse modeling approach is very much a challenge.
Tools can have a significant impact on the establishment of a data warehouse modeling approach for an organization. Data modeling tools and metadata catalogs are important for the data warehouse modeling approach. They usually have a significant impact on the choice of modeling techniques.
Although it is not the intention of this chapter to fully describe a data warehouse modeling approach, we do want to contribute to the establishment of a realistic and well-structured data warehouse modeling approach. In the next section, we present a survey of the most important techniques that should somehow be incorporated in an overall modeling approach. At the end of this chapter, we bring the different elements of the 'modeling puzzle' together and consider how to set up a data warehouse modeling approach.
Principal Data Warehouse Modeling Techniques
Listed below are the principal modeling techniques (beyond what can be considered 'traditional' database modeling techniques, such as ER modeling and normalization techniques) that should be arranged into an overall data warehouse modeling approach.
1. Dimensional data modeling
2. Temporal data modeling
3. Techniques for building generic and reusable data models (sometimes referred to as pattern-oriented data modeling). These techniques are much more extensively and frequently considered in the context of software development. Data warehouse modelers should learn to apply some of these techniques, although a transposition from a software development context to a data warehouse development context may not always be obvious.
4. Data architecture modeling consists of a combination of top-down enterprise data modeling techniques and bottom-up (detailed) model integration. Data architecture modeling also should provide the techniques for logical data partitioning, granularity modeling, and building multitiered data architectures.
Other modeling techniques may have to be added to the overall approach, for example if the data warehouse also incorporates multimedia data types such as documents and images, or if end users are involved in types of informational applications other than dimensional data analysis.
To keep the scope and complexity of this chapter within realistic boundaries, we concentrate on dimensional data modeling and temporal data modeling. We present the base techniques for dimensional and temporal data modeling, and, in the course of the discussion, we comment on techniques for building generic and reusable data models. Be aware that much more can be said about dimensional and temporal data modeling than what we say in this chapter, and that we only scratch the surface of the techniques and approaches for building generic and reusable data models.

Q.4 Discuss the categories into which data is divided before structuring it into a data warehouse.
A data warehouse is, by definition, a subject-oriented, integrated, time-variant collection of data to enable decision making across a disparate group of users. One of the most basic concepts of data warehousing is to clean, filter, transform, summarize, and aggregate the data, and then put it in a structure for easy access and analysis by those users. But that structure must first be defined, and that is the task of the data warehouse model. In modeling a data warehouse, we begin by architecting the data. By architecting the data, we structure and locate it according to its characteristics.
In this chapter, we review the types of data used in data warehousing and provide some basic hints and tips for architecting that data. We then discuss approaches to developing a data warehouse data model along with some of the considerations.
Having an enterprise data model (EDM) available would be very helpful, but not required, in developing the data warehouse data model. For example, from the EDM you can derive the general scope and understanding of the business requirements. The EDM would also let you relate the data elements and the physical design to a specific area of interest.
aLa granularlLy ls one of Lhe mosL lmporLanL crlLerla ln archlLecLlng Lhe daLaŦ Cn one handţ
havlng daLa of a hlgh granularlLy can supporL any queryŦ Poweverţ havlng a large volume of
daLa LhaL musL be manlpulaLed and managed could be an lssue as lL would lmpacL response
LlmesŦ Cn Lhe oLher handţ havlng daLa of a low granularlLy would supporL only speclflc querlesŦ
8uLţ wlLh Lhe reduced volume of daLaţ you would reallze slgnlflcanL lmprovemenLs ln
Data warehouses vary in size, but they are typically quite large. This is especially true as
you consider the impact of storing volumes of historical data. To deal with this issue, you have
to consider data partitioning in the data architecture. We consider both logical and physical
partitioning to better understand and maintain the data. In logical partitioning of data, you
should consider the concept of subject areas. This concept is typically used in most information
engineering (IE) methodologies. We discuss subject areas and their different definitions in more
detail later in this chapter.
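Physical partitioning by time can be illustrated with a minimal Python sketch. It assumes a hypothetical sales table whose rows carry a `sale_date`, splits the rows into one partition per month, and shows how a date-range query only has to scan the partitions that can contain matching rows (partition pruning). Real databases do this inside the storage engine; the dictionary of lists here only stands in for that.

```python
from datetime import date

def months_between(start, end):
    """Yield (year, month) pairs from start's month to end's month inclusive."""
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        yield y, m
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)

def partition_by_month(rows):
    """Group rows into partitions keyed by 'YYYY-MM' of their sale_date."""
    partitions = {}
    for row in rows:
        key = row["sale_date"].strftime("%Y-%m")
        partitions.setdefault(key, []).append(row)
    return partitions

def query_range(partitions, start, end):
    """Return matching rows and the partitions scanned (pruning skips the rest)."""
    wanted = {f"{y:04d}-{m:02d}" for y, m in months_between(start, end)}
    scanned = sorted(wanted & set(partitions))
    hits = [r for key in scanned for r in partitions[key]
            if start <= r["sale_date"] <= end]
    return hits, scanned

# Hypothetical sample rows.
rows = [
    {"sale_date": date(2023, 1, 15), "amount": 100},
    {"sale_date": date(2023, 2, 3), "amount": 250},
    {"sale_date": date(2023, 2, 20), "amount": 75},
    {"sale_date": date(2023, 3, 9), "amount": 40},
]
parts = partition_by_month(rows)
hits, scanned = query_range(parts, date(2023, 2, 1), date(2023, 2, 28))
print(scanned)                          # ['2023-02']
print(sum(r["amount"] for r in hits))   # 325
```

Only one of the three monthly partitions is read for the February query, which is the benefit physical partitioning aims for as history accumulates.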

In structuring the data for data warehousing, we can distinguish three basic types of data that
can be used to satisfy the requirements of an organization:
- Real-time data
- Derived data
- Reconciled data
In this section, we describe these three types of data according to usage, scope, and currency.
You can configure an appropriate data warehouse based on these three data types, with
consideration for the requirements of any particular implementation effort. Depending on the
nature of the operational systems, the type of business, and the number of users that access
the data warehouse, you can combine the three types of data to create the most appropriate
architecture for the data warehouse.
Real-Time Data
Real-time data represents the current status of the business. It is typically used by operational
applications to run the business and is constantly changing as operational transactions are
processed. Real-time data is at a detailed level, meaning high granularity, and is usually
accessed in read/write mode by the operational transactions.
Not confined to operational systems, real-time data is extracted and distributed to
informational systems throughout the organization. For example, in the banking industry,
where real-time data is critical for operational management and tactical decision making, an
independent system, the so-called deferred or delayed system, delivers the data from the
operational systems to the informational systems (data warehouses) for data analysis and more
strategic decision making.
To use real-time data in a data warehouse, typically it first must be cleansed to ensure
appropriate data quality, perhaps summarized, and transformed into a format more easily
understood and manipulated by business analysts. This is because the real-time data contains
all the individual, transactional, and detailed data values as well as other data valuable only to
the operational systems that must be filtered out. In addition, because it may come from
multiple different systems, real-time data may not be consistent in representation and
meaning. As an example, the units of measure, currency, and exchange rates may differ among
systems. These anomalies must be reconciled before loading into the data warehouse.
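The reconciliation step can be sketched in Python as follows. The source records, the field names and the fixed `RATES_TO_USD` table are hypothetical; a production load would use dated exchange rates from a reference source. The sketch converts every record to one currency (USD) and one unit of measure (kilograms) before it would be loaded, using `Decimal` so monetary values are not distorted by binary floating point.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical reference data; a real load would use dated exchange rates.
RATES_TO_USD = {"USD": Decimal("1"), "EUR": Decimal("1.10"), "GBP": Decimal("1.25")}
LB_PER_KG = Decimal("2.20462")

def reconcile(record):
    """Express a source record in one currency (USD) and one unit (kg)."""
    if record["currency"] not in RATES_TO_USD:
        raise ValueError(f"no exchange rate for {record['currency']!r}")
    amount = Decimal(record["amount"]) * RATES_TO_USD[record["currency"]]
    weight = Decimal(record["weight"])
    if record["weight_unit"] == "lb":
        weight = weight / LB_PER_KG
    elif record["weight_unit"] != "kg":
        raise ValueError(f"unknown unit {record['weight_unit']!r}")
    return {
        "order_id": record["order_id"],
        "amount_usd": amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP),
        "weight_kg": weight.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP),
    }

# Two hypothetical source systems recording the same facts differently.
us_order = {"order_id": 1, "amount": "20.00", "currency": "USD",
            "weight": "10", "weight_unit": "lb"}
eu_order = {"order_id": 2, "amount": "20.00", "currency": "EUR",
            "weight": "4.5", "weight_unit": "kg"}
print(reconcile(us_order))
print(reconcile(eu_order))
```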
Derived Data
Derived data is data that has been created, perhaps by summarizing, averaging, or aggregating
the real-time data through some process. Derived data can be either detailed or summarized,
based on requirements. It can represent a view of the business at a specific point in time or be
a historical record of the business over some period of time.
Derived data is traditionally used for data analysis and decision making. Data analysts seldom
need large volumes of detailed data; rather, they need summaries that are much easier to
manipulate and use. Manipulating large volumes of atomic data can also require tremendous
processing resources. Considering the requirements for improved query processing capability,
an efficient approach is to precalculate derived data elements and summarize the detailed data
to better meet user requirements. Efficiently processing large volumes of data in an
appropriate amount of time is one of the most important issues to resolve.
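A minimal Python sketch of precalculating derived data, assuming hypothetical transaction rows of `(store, month, amount)`: the detail is summarized once into totals, averages and counts per store and month, so later queries read a single summary entry instead of re-scanning every transaction.

```python
from collections import defaultdict

# Hypothetical transaction-level (atomic) rows: (store, month, amount).
transactions = [
    ("north", "2024-01", 120.0),
    ("north", "2024-01", 80.0),
    ("north", "2024-02", 50.0),
    ("south", "2024-01", 200.0),
    ("south", "2024-02", 30.0),
    ("south", "2024-02", 70.0),
]

def build_monthly_summary(rows):
    """Precalculate derived data: total, average and count per (store, month)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for store, month, amount in rows:
        totals[(store, month)] += amount
        counts[(store, month)] += 1
    return {
        key: {"total": totals[key], "avg": totals[key] / counts[key], "n": counts[key]}
        for key in totals
    }

summary = build_monthly_summary(transactions)
# A query now reads one precomputed entry instead of re-scanning the detail.
print(summary[("south", "2024-02")])   # {'total': 100.0, 'avg': 50.0, 'n': 2}
```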

Reconciled Data
Reconciled data is real-time data that has been cleansed, adjusted, or enhanced to provide an
integrated source of quality data that can be used by data analysts. The basic requirement for
data quality is consistency. In addition, we can create and maintain historical data while
reconciling the data. Thus, we can say reconciled data is a special type of derived data.
Reconciled data is seldom explicitly defined. It is usually a logical result of derivation operations.
Sometimes reconciled data is stored only as temporary files that are required to transform
operational data for consistency.
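The text does not say how history is kept while reconciling. One common technique, swapped in here rather than taken from the source, is a type 2 slowly changing dimension: when an attribute changes, the current row is closed with an end date and a new row is appended. The Python sketch below assumes a hypothetical customer dimension with a single tracked attribute, `city`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version

def apply_change(history, customer_id, new_city, as_of):
    """Type 2 change: close the current version and append a new one."""
    current = next((v for v in history
                    if v.customer_id == customer_id and v.valid_to is None), None)
    if current is not None and current.city == new_city:
        return  # value unchanged, nothing to record
    if current is not None:
        current.valid_to = as_of
    history.append(CustomerVersion(customer_id, new_city, as_of))

def city_on(history, customer_id, day):
    """Return the city that was valid for the customer on the given day."""
    for v in history:
        if (v.customer_id == customer_id and v.valid_from <= day
                and (v.valid_to is None or day < v.valid_to)):
            return v.city
    return None

history = []
apply_change(history, 7, "Pune", date(2022, 1, 1))
apply_change(history, 7, "Mumbai", date(2023, 6, 1))
apply_change(history, 7, "Mumbai", date(2023, 9, 1))  # no change, no new row
print(city_on(history, 7, date(2022, 12, 31)))  # Pune
print(city_on(history, 7, date(2024, 1, 1)))    # Mumbai
```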

Q.5 Discuss the purpose of an executive information system in an organization?

An EIS is a tool that provides direct on-line access to relevant information about aspects of a
business that are of particular interest to the senior manager. Many senior managers find that
direct on-line access to organizational data is helpful. For example, Paul Frech, president of
Lockheed-Georgia, monitored employee contributions to company-sponsored programs
(United Way, blood drives) as a surrogate measure of employee morale (Houdeshel and
Watson, 87). C. Robert Kidder, CEO of Duracell, found that productivity problems were due
to salespeople in Germany wasting time calling on small stores, and took corrective action
(Main, 8).
Information systems have long been used to gather and store information, to produce specific
reports for workers, and to produce aggregate reports for managers. However, senior
managers rarely use these systems directly, and often find the aggregate information to be of
little use without the ability to explore underlying details (Watson & Rainer; Crockett).
An Executive Information System (EIS) is a tool that provides direct on-line access to relevant
information in a useful and navigable format. Relevant information is timely, accurate, and
actionable information about aspects of a business that are of particular interest to the senior
manager. The useful and navigable format of the system means that it is specifically designed to
be used by individuals with limited time, limited keyboarding skills, and little direct experience
with computers. An EIS is easy to navigate so that managers can identify broad strategic issues,
and then explore the information to find the root causes of those issues.

Executive Information Systems differ from traditional information systems in the following ways. They:

- are specifically tailored to executives' information needs
- are able to access data about specific issues and problems as well as aggregate reports
- provide extensive on-line analysis tools, including trend analysis, exception reporting and
"drill-down" capability
- access a broad range of internal and external data
- are particularly easy to use (typically mouse or touch screen driven)
- are used directly by executives without assistance
- present information in a graphical form
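The drill-down and exception-reporting capabilities listed above can be sketched in a few lines of Python. The sales facts, region and store names, and the 90% `tolerance` are all hypothetical: the top-level view rolls actuals and targets up by region, the exception report flags regions below tolerance, and drill-down returns the store-level rows behind a flagged region.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, store, actual, target).
facts = [
    ("East", "E1", 90, 100),
    ("East", "E2", 130, 100),
    ("West", "W1", 60, 100),
    ("West", "W2", 70, 100),
]

def rollup_by_region(rows):
    """Top-level EIS view: total actual and target per region."""
    view = defaultdict(lambda: [0, 0])
    for region, _store, actual, target in rows:
        view[region][0] += actual
        view[region][1] += target
    return {r: tuple(v) for r, v in view.items()}

def exceptions(view, tolerance=0.9):
    """Exception reporting: entries whose actual is below tolerance * target."""
    return sorted(k for k, (actual, target) in view.items() if actual < tolerance * target)

def drill_down(rows, region):
    """Drill-down: the store-level detail behind one region."""
    return {store: (actual, target) for reg, store, actual, target in rows if reg == region}

top = rollup_by_region(facts)
flagged = exceptions(top)               # regions needing attention
detail = drill_down(facts, flagged[0])  # store-level rows for that region
print(flagged, exceptions(detail))      # ['West'] ['W1', 'W2']
```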

Purpose of Executive Information System

The primary purpose of an Executive Information System is to support managerial learning
about an organization, its work processes, and its interaction with the external environment.
Informed managers can ask better questions and make better decisions. Vandenbosch and Huff
(2) from the University of Western Ontario found that Canadian firms using an EIS achieved
better business results if their EIS promoted managerial learning. Firms with an EIS designed to
maintain managers' "mental models" were less effective than firms with an EIS designed to
build or enhance managers' knowledge.
This distinction is supported by Peter Senge in The Fifth Discipline. He illustrates the benefits
of learning about the behaviour of systems versus simply learning more about their states.
Learning more about the state of a system leads to reactive management fixes. Typically these
reactions feed into the underlying system behaviour and contribute to a downward spiral.
Learning more about system behaviour and how various system inputs and actions interrelate
will allow managers to make more proactive changes to create long-term improvement.
A secondary purpose for an EIS is to allow timely access to information. All of the information
contained in an EIS can typically be obtained by a manager through traditional methods.
However, the resources and time required to manually compile information in a wide variety of
formats, and in response to ever-changing and ever more specific questions, usually inhibit
managers from obtaining this information. Often, by the time a useful report can be compiled,
the strategic issues facing the manager have changed, and the report is never fully utilized.
Timely access also influences learning. When a manager obtains the answer to a question, that
answer typically sparks other related questions in the manager's mind. If those questions can
be posed immediately, and the next answer retrieved, the learning cycle continues unbroken.
Using traditional methods, by the time the answer is produced, the context of the question may
be lost, and the learning cycle will not continue.
A third purpose of an EIS is commonly misperceived. An EIS has a powerful ability to direct
management attention to specific areas of the organization or specific business problems.
Some managers see this as an opportunity to discipline subordinates. Some subordinates fear
the directive nature of the system and spend a great deal of time trying to outwit or discredit it.
Neither of these behaviors is appropriate or productive. Rather, managers and subordinates
can work together to determine the root causes of issues highlighted by the EIS.
The powerful focus of an EIS is due to the maxim "what gets measured gets done." Managers
are particularly attentive to concrete information about their performance when it is available
to their superiors. This focus is very valuable to an organization if the information reported is
actually important and represents a balanced view of the organization's objectives.
Misaligned reporting systems can result in inordinate management attention to things that are
not important, or to things which are important but to the exclusion of other equally important
things. For example, a production reporting system might lead managers to emphasize volume
of work done rather than quality of work. Worse yet, productivity might have little to do with
the organization's overriding customer service objectives.

Q.6 Discuss the challenges involved in the data integration and coordination process?
One of the most fundamental challenges in the process of data integration is setting realistic
expectations. The term data integration conjures a perfect coordination of diversified
databases, software, equipment, and personnel into a smoothly functioning alliance, free of the
persistent headaches that mark less comprehensive systems of information management. Think
again.
The requirements analysis stage offers one of the best opportunities in the process to recognize
and digest the full scope of complexity of the data integration task. Thorough attention to this
analysis is possibly the most important ingredient in creating a system that will live to see
adoption and maximum use.
As the field of data integration progresses, however, other common impediments and
compensatory solutions will be easily identified. Current integration practices have already
highlighted a few familiar challenges as well as strategies to address them, as outlined below.

Heterogeneous Data
For most transportation agencies, data integration involves synchronizing huge quantities of
variable, heterogeneous data resulting from internal legacy systems that vary in data format.
Legacy systems may have been created around flat file, network, or hierarchical databases,
unlike newer generations of databases which use relational data. Data in different formats from
external sources continue to be added to the legacy databases to improve the value of the
information. Each generation, product, and home-grown system has unique demands to fulfill
in order to store or extract data. So data integration can involve various strategies for coping
with heterogeneity. In some cases, the effort becomes a major exercise in data
homogenization, which may not enhance the quality of the data offered.
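As an illustration of coping with heterogeneity, the Python sketch below converts one record from a fixed-width flat file into a row of a relational table. The column layout, the field names (`route_id`, `aadt` and so on) and the sample record are hypothetical; real legacy layouts are usually documented in record specifications that must be obtained from the source system.

```python
import sqlite3

# Hypothetical fixed-width layout: (field, start, end) character positions.
LAYOUT = [("route_id", 0, 6), ("route_name", 6, 26),
          ("count_date", 26, 34), ("aadt", 34, 42)]

def parse_fixed_width(line):
    """Slice a legacy flat-file record into typed, relational-friendly fields."""
    rec = {name: line[start:end].strip() for name, start, end in LAYOUT}
    d = rec["count_date"]                      # YYYYMMDD -> ISO YYYY-MM-DD
    rec["count_date"] = f"{d[0:4]}-{d[4:6]}-{d[6:8]}"
    rec["aadt"] = int(rec["aadt"])             # zero-padded text -> integer
    return rec

line = "SR0101" + "MAIN STREET".ljust(20) + "20230415" + "12500".rjust(8, "0")
rec = parse_fixed_width(line)

# Load the parsed record into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic_count "
             "(route_id TEXT, route_name TEXT, count_date TEXT, aadt INTEGER)")
conn.execute("INSERT INTO traffic_count VALUES "
             "(:route_id, :route_name, :count_date, :aadt)", rec)
print(conn.execute("SELECT * FROM traffic_count").fetchall())
```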
Bad Data
Data quality is a top concern in any data integration strategy. Legacy data must be cleaned up
prior to conversion and integration, or an agency will almost certainly face serious data
problems later. Legacy data impurities have a compounding effect; by nature, they tend to
concentrate around high-volume data users.
If this information is corrupt, so, too, will be the decisions made from it. It is not unusual for
undiscovered data quality problems to emerge in the process of cleaning information for use by
the integrated system. The issue of bad data leads to procedures for regularly auditing the
quality of information used. But who holds the ultimate responsibility for this job is not always
clear.
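Regular auditing can start as simply as counting rule violations on each load. The Python sketch below assumes a hypothetical customer extract and three illustrative rules; the resulting counts can be stored and compared over time so that whoever owns data quality can see whether it is improving.

```python
import re

# Hypothetical audit rules: each returns True when a row violates it.
RULES = {
    "missing_id": lambda r: not r.get("id"),
    "bad_zip": lambda r: not re.fullmatch(r"\d{5}", r.get("zip") or ""),
    "negative_balance": lambda r: (r.get("balance") or 0) < 0,
}

def audit(rows):
    """Count violations of each rule so quality can be tracked run after run."""
    report = {name: 0 for name in RULES}
    for row in rows:
        for name, broken in RULES.items():
            if broken(row):
                report[name] += 1
    return report

rows = [
    {"id": "C1", "zip": "30301", "balance": 10.0},
    {"id": "", "zip": "3030", "balance": 5.0},
    {"id": "C3", "zip": None, "balance": -2.0},
]
print(audit(rows))   # {'missing_id': 1, 'bad_zip': 2, 'negative_balance': 1}
```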
Lack of Storage Capacity
The unanticipated need for additional performance and capacity is one of the most common
challenges to data integration, particularly in data warehousing. Two storage-related
requirements generally come into play: extensibility and scalability. Anticipating the extent of
growth in an environment in which the need for storage can increase exponentially once a
system is initiated drives fears that the storage cost will exceed the benefit of data integration.
Introducing such massive quantities of data can push the limits of hardware and software. This
may force developers to instigate costly fixes if architecture for processing much larger
amounts of data must be retrofitted into the planned system.
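The worry about exponential growth can be made concrete with compound-growth arithmetic: size after n years = initial size * (1 + g)^n. The starting size (2 TB), growth rate (40% per year) and capacity (8 TB) in the Python sketch below are hypothetical figures chosen only to show how quickly a fixed capacity is overrun.

```python
def project_storage(initial_tb, annual_growth, years):
    """Compound growth: size_n = initial * (1 + growth) ** n, for n = 0..years."""
    return [initial_tb * (1 + annual_growth) ** n for n in range(years + 1)]

def first_year_over(projection, capacity_tb):
    """Index of the first year whose projected size exceeds capacity, else None."""
    for year, size in enumerate(projection):
        if size > capacity_tb:
            return year
    return None

proj = project_storage(2.0, 0.40, 5)
print([round(s, 2) for s in proj])   # [2.0, 2.8, 3.92, 5.49, 7.68, 10.76]
print(first_year_over(proj, 8.0))    # 5
```

Capacity that looks generous at the start (four times the initial size) is exceeded within five years under these assumptions, which is why extensibility and scalability need to be planned up front.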
Unanticipated Costs
Data integration costs are fueled largely by items that are difficult for the uninitiated to
quantify, and thus predict. These might include:
- Labor costs for initial planning, evaluation, programming and additional data acquisition
- Software and hardware purchases
- Unanticipated technology changes/advances
- Both labor and the direct costs of data storage and maintenance
It is important to note that, regardless of efforts to streamline maintenance, the realities of a
fully functioning data integration system may demand a great deal more maintenance than
could be anticipated.
Unrealistic estimating can be driven by an overly optimistic budget, particularly in these times
of budget shortfall and doing more with less. More users, more analysis needs and more
complex requirements may drive performance and capacity problems. Limited resources may
cause project timelines to be extended without commensurate funding. Unanticipated issues,
or new issues, may call for expensive consulting help. And the dynamic atmosphere of today's
transportation agency must be taken into account, in which lack of staff, changes in business
processes, problems with hardware and software, and shifting leadership can drive additional
costs.
The investment in time and labor required to extract, clean, load, and maintain data can creep
if the quality of the data presented is weak. It is not unusual for this to produce unanticipated
labor costs that are rather alarmingly out of proportion to the total project budget.
Lack of Cooperation from Staff
User groups within an agency may have developed databases on their own, sometimes
independently from information systems staff, that are highly responsive to the users'
particular needs. It is natural that owners of these functioning standalone units might be
skeptical that the new system would support their needs as effectively.
Other proprietary interests may come into play. For example, division staff may not want the
data they collect and track to be at all times transparently visible to headquarters staff without
the opportunity to address the nuances of what the data appear to show. Owners or users may
fear that higher-ups without appreciation of the peculiarities of a given method of operation
will gain more control over how data is collected and accessed organization-wide.
In some agencies, the level of personnel, consultants, and financial support emanating from the
highest echelons of management may be insufficient to dispel these fears and gain
cooperation. Top management must be fully invested in the project. Otherwise, it is less likely
that the strategic data integration plan and the resources associated with it will be approved.
The additional support required to engage and convey to everyone in the agency the need for
and benefits of data integration is unlikely to flow from leaders who lack awareness of, or
commitment to, the benefits of data integration.
Lack of Data Management Expertise
As more transportation agencies nationwide undertake the integration of data, the availability
of experienced personnel increases. However, since data integration is a multi-year, highly
complex proposition, even these leaders may not have the kind of expertise that evolves over a
full project life-cycle. Common problems develop at different stages of the process, and these
can better be anticipated and addressed when key personnel have managed the typical
variables of each project phase.
Also, the process of transferring historical data from its independent source to the integrated
system may benefit from the knowledge of the manager who originally captured and stored the
information. High turnover in such positions, along with early retirements and other personnel
shifts driven by a historically tight budget environment, may complicate the mining and
preparation of this data for convergence with the new system.
Perception of Data Integration as an Overwhelming Effort
When transportation agencies consider data integration, one pervasive notion is that the
analysis of existing information needs and infrastructure, much less the organization of data
into viable channels for integration, requires a monumental initial commitment of resources
and staff. Resource-scarce agencies identify this perceived major upfront overhaul as
"unachievable" and "disruptive." In addition, uncertainties about funding priorities and
potential shortfalls can further hinder efforts to move forward.
Q. Discuss the various components of a data warehouse?

The data warehouse framework starts with extracting data from source systems, transforming
and cleansing it, and loading it into the repository. It ends with the data being accessed,
analyzed, mined and presented in dashboards using end-user tools.
The data warehouse architecture is based on a relational database management system server
that functions as the central repository for informational data. Operational data and processing
is completely separated from data warehouse processing. This central information repository is
surrounded by a number of key components designed to make the entire environment
functional, manageable and accessible by both the operational systems that source data into
the warehouse and by end-user query and analysis tools.
Typically, the source data for the warehouse comes from the operational applications. As the
data enters the warehouse, it is cleaned up and transformed into an integrated structure and
format. The transformation process may involve conversion, summarization, filtering and
condensation of data. Because the data contains a historical component, the warehouse must
be capable of holding and managing large volumes of data as well as different data structures
for the same database over time.
The next sections look at the seven major components of data warehousing:
Data Warehouse Database

The central data warehouse database is the cornerstone of the data warehousing environment.
This database is almost always implemented on relational database management system
(RDBMS) technology. However, this kind of implementation is often constrained by the fact that
traditional RDBMS products are optimized for transactional database processing. Certain data
warehouse attributes, such as very large database size, ad hoc query processing and the need
for flexible user view creation including aggregates, multi-table joins and drill-downs, have
become drivers for different technological approaches to the data warehouse database. These
approaches include:
- Parallel relational database designs for scalability that include shared-memory, shared-disk,
or shared-nothing models implemented on various multiprocessor configurations
(symmetric multiprocessors or SMP, massively parallel processors or MPP, and/or clusters of
uni- or multiprocessors).
- An innovative approach to speed up a traditional RDBMS by using new index structures
to bypass relational table scans.
- Multidimensional databases (MDDBs) that are based on proprietary database
technology; conversely, a dimensional data model can be implemented using a familiar RDBMS.
Multidimensional databases are designed to overcome any limitations placed on the
warehouse by the nature of the relational data model. MDDBs enable on-line analytical
processing (OLAP) tools that architecturally belong to a group of data warehousing components
jointly categorized as the data query, reporting, analysis and mining tools.
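To make the index-structure approach concrete, the Python sketch below builds a bitmap index, one well-known index structure in data warehousing; the source does not name which structures it has in mind. Each distinct column value gets an integer whose bits mark the rows holding that value, so a multi-condition filter becomes a bitwise AND instead of a table scan. The columns and values are hypothetical.

```python
from collections import defaultdict

def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i has it."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)

def rows_from_bitmap(bitmap):
    """Decode a bitmap back into the list of matching row ids."""
    rows, row_id = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows

# Hypothetical low-cardinality columns of a fact table.
region = ["east", "west", "east", "north", "west", "east"]
product = ["tv", "tv", "radio", "tv", "radio", "tv"]
region_idx = build_bitmap_index(region)
product_idx = build_bitmap_index(product)

# WHERE region = 'east' AND product = 'tv' becomes one bitwise AND.
match = region_idx["east"] & product_idx["tv"]
print(rows_from_bitmap(match))   # [0, 5]
```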
Sourcing, Acquisition, Cleanup and Transformation Tools
A significant portion of the implementation effort is spent extracting data from operational
systems and putting it in a format suitable for informational applications that run off the data
warehouse.
The data sourcing, cleanup, transformation and migration tools perform all of the conversions,
summarizations, key changes, structural changes and condensations needed to transform
disparate data into information that can be used by the decision support tool. They produce the
programs and control statements, including the COBOL programs, MVS job-control language
(JCL), UNIX scripts, and SQL data definition language (DDL) needed to move data into the data
warehouse from multiple operational systems. These tools also maintain the meta data. The
functionality includes:
- Removing unwanted data from operational databases
- Converting to common data names and definitions
- Establishing defaults for missing data
- Accommodating source data definition changes
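The four functions listed above can be sketched as a single Python transformation step. The column names in `COLUMN_MAP`, the default in `DEFAULTS` and the operational-only fields are hypothetical; a real tool would read such rules from its meta data rather than from constants.

```python
# Hypothetical rules; a real tool would read these from its meta data.
COLUMN_MAP = {"cust_no": "customer_id", "custnum": "customer_id",
              "nm": "customer_name", "st": "state"}
DEFAULTS = {"state": "UNKNOWN"}
OPERATIONAL_ONLY = {"lock_flag", "last_screen"}   # meaningful only to the source
WAREHOUSE_COLUMNS = ["customer_id", "customer_name", "state"]

def transform(source_row):
    out = {}
    for col, value in source_row.items():
        if col in OPERATIONAL_ONLY:
            continue                               # remove unwanted data
        out[COLUMN_MAP.get(col, col)] = value      # convert to common names
    for col in WAREHOUSE_COLUMNS:
        if out.get(col) in (None, ""):
            out[col] = DEFAULTS.get(col)           # defaults for missing data
    # Project onto the warehouse layout so new or renamed source columns
    # (e.g. "custnum" instead of "cust_no") do not break the load.
    return {col: out[col] for col in WAREHOUSE_COLUMNS}

print(transform({"cust_no": 1, "nm": "Acme", "st": "", "lock_flag": "Y"}))
print(transform({"custnum": 2, "nm": "Bolt", "st": "GA", "new_col": "x"}))
```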
The data sourcing, cleanup, extract, transformation and migration tools have to deal with some
significant issues, including:
- Database heterogeneity. DBMSs are very different in data models, data access
language, data navigation, operations, concurrency, integrity, recovery, etc.
- Data heterogeneity. This is the difference in the way data is defined and used in
different models: homonyms, synonyms, unit compatibility (U.S. vs. metric), different attributes
for the same entity and different ways of modeling the same fact.
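Data heterogeneity is usually resolved with per-source mapping rules. The Python sketch below assumes two hypothetical sources: it maps the synonyms `qty` and `quantity` to one name, converts miles to kilometres, and separates the homonym `rate`, which means an interest rate in one source and an exchange rate in the other.

```python
# Hypothetical per-source rules: source field -> (canonical name, unit factor).
SOURCE_SEMANTICS = {
    "us_ops": {
        "qty": ("quantity", 1.0),
        "dist_mi": ("distance_km", 1.609344),   # miles -> kilometres
        "rate": ("interest_rate_pct", 1.0),     # here "rate" is an interest rate
    },
    "eu_ops": {
        "quantity": ("quantity", 1.0),
        "dist_km": ("distance_km", 1.0),
        "rate": ("fx_rate", 1.0),               # same name, different meaning
    },
}

def harmonize(source, row):
    """Rename each field to its canonical name and scale it into canonical units.

    An unmapped field raises KeyError, so unknown semantics are never loaded silently.
    """
    mapping = SOURCE_SEMANTICS[source]
    return {mapping[k][0]: v * mapping[k][1] for k, v in row.items()}

us = harmonize("us_ops", {"qty": 3, "dist_mi": 10, "rate": 4.5})
eu = harmonize("eu_ops", {"quantity": 3, "dist_km": 16.09344, "rate": 1.1})
print(us)
print(eu)
```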
These tools can save a considerable amount of time and effort. However, significant
shortcomings do exist. For example, many available tools are generally useful only for simpler
data extracts. Frequently, customized extract routines need to be developed for the more
complicated data extraction procedures.
Meta data
Meta data is data about data that describes the data warehouse. It is used for building,
maintaining, managing and using the data warehouse. Meta data can be classified into:
- Technical meta data, which contains information about warehouse data for use by
warehouse designers and administrators when carrying out warehouse development and
management tasks.
- Business meta data, which contains information that gives users an easy-to-understand
perspective of the information stored in the data warehouse.
Equally important, meta data provides interactive access to users to help them understand content
and find data. One of the issues in dealing with meta data relates to the fact that the capabilities
of many data extraction tools to gather meta data remain fairly immature. Therefore, there is
often the need to create a meta data interface for users, which may involve some duplication of
effort.
Meta data management is provided via a meta data repository and accompanying software.
Meta data repository management software, which typically runs on a workstation, can be used
to map the source data to the target database; generate code for data transformations;
integrate and transform the data; and control moving data to the warehouse.
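To illustrate how repository software can generate code for data transformations, the Python sketch below turns a hypothetical source-to-target mapping into an `INSERT ... SELECT` statement and runs it against an in-memory SQLite database. The table names, columns and transformation rules are assumptions for the example; the mapping is treated as trusted configuration, since it is pasted directly into the SQL text.

```python
import sqlite3

# Hypothetical source-to-target mapping as a meta data repository might hold it.
MAPPING = {
    "source_table": "ops_orders",
    "target_table": "dw_sales",
    "columns": [                                   # (source, target, rule)
        ("order_no", "order_id", None),
        ("amt_cents", "amount", "{col} / 100.0"),
        ("cust", "customer_code", "UPPER(TRIM({col}))"),
    ],
}

def generate_load_sql(m):
    """Generate an INSERT ... SELECT statement from the mapping meta data."""
    targets = ", ".join(t for _, t, _ in m["columns"])
    exprs = ", ".join(rule.format(col=s) if rule else s for s, _, rule in m["columns"])
    return (f"INSERT INTO {m['target_table']} ({targets}) "
            f"SELECT {exprs} FROM {m['source_table']}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ops_orders (order_no INTEGER, amt_cents INTEGER, cust TEXT)")
conn.execute("CREATE TABLE dw_sales (order_id INTEGER, amount REAL, customer_code TEXT)")
conn.executemany("INSERT INTO ops_orders VALUES (?, ?, ?)",
                 [(1, 1999, " ab1 "), (2, 500, "cd2")])
sql = generate_load_sql(MAPPING)
print(sql)
conn.execute(sql)
print(conn.execute("SELECT * FROM dw_sales ORDER BY order_id").fetchall())
```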
As users' interactions with the data warehouse increase, their approaches to reviewing the
results of their requests for information can be expected to evolve from relatively simple
manual analysis for trends and exceptions to agent-driven initiation of the analysis based on
user-defined thresholds. The definition of these thresholds, configuration parameters for the
software agents using them, and the information directory indicating where the appropriate
sources for the information can be found are all stored in the meta data repository as well.
Access Tools

The principal purpose of data warehousing is to provide information to business users for
strategic decision-making. These users interact with the data warehouse using front-end tools.
Many of these tools require an information specialist, although many end users develop
expertise in the tools. Tools fall into four main categories: query and reporting tools, application
development tools, online analytical processing tools, and data mining tools.
Query and reporting tools can be divided into two groups: reporting tools and managed query
tools. Reporting tools can be further divided into production reporting tools and report writers.
Production reporting tools let companies generate regular operational reports or support
high-volume batch jobs such as calculating and printing paychecks. Report writers, on the other
hand, are inexpensive desktop tools designed for end-users.
Managed query tools shield end users from the complexities of SQL and database structures by
inserting a metalayer between users and the database. These tools are designed for easy-to-use,
point-and-click operations that either accept SQL or generate SQL database queries.
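A managed query tool's metalayer can be sketched as a dictionary from business terms to SQL expressions. In the Python sketch below, the `sales` table, its cryptic column names and the `SEMANTIC_LAYER` entries are hypothetical; the user picks "Region" and "Revenue" and the function generates the `SELECT ... GROUP BY` statement.

```python
import sqlite3

# Hypothetical metalayer: business terms -> physical SQL expressions.
SEMANTIC_LAYER = {
    "Region": "rgn",
    "Product": "prod_cd",
    "Revenue": "SUM(amt)",
    "Orders": "COUNT(*)",
}

def build_query(dimensions, measures, table="sales"):
    """Generate SQL from the business terms a user clicked on."""
    cols = [f'{SEMANTIC_LAYER[t]} AS "{t}"' for t in dimensions + measures]
    sql = f"SELECT {', '.join(cols)} FROM {table}"
    if dimensions:
        keys = ", ".join(SEMANTIC_LAYER[d] for d in dimensions)
        sql += f" GROUP BY {keys} ORDER BY {keys}"
    return sql

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (rgn TEXT, prod_cd TEXT, amt REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("E", "TV", 100.0), ("E", "TV", 50.0),
                  ("W", "TV", 70.0), ("W", "RADIO", 30.0)])
sql = build_query(["Region"], ["Revenue"])
print(sql)
print(conn.execute(sql).fetchall())   # [('E', 150.0), ('W', 100.0)]
```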
Often, the analytical needs of the data warehouse user community exceed the built-in
capabilities of query and reporting tools. In these cases, organizations will often rely on the
tried-and-true approach of in-house application development using graphical development
environments such as PowerBuilder, Visual Basic and Forte. These application development
platforms integrate well with popular OLAP tools and access all major database systems,
including Oracle, Sybase, and Informix.
OLAP tools are based on the concepts of dimensional data models and corresponding
databases, and allow users to analyze the data using elaborate, multidimensional views. Typical
business applications include product performance and profitability, effectiveness of a sales
program or marketing campaign, sales forecasting and capacity planning. These tools assume
that the data is organized in a multidimensional model.
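The multidimensional view that OLAP tools assume can be approximated by pre-aggregating a measure over every combination of dimensions, much like SQL's `GROUP BY CUBE` in databases that support it. The Python sketch below uses hypothetical region, product and quarter dimensions; `"ALL"` marks a dimension that has been rolled up.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("region", "product", "quarter")

def build_cube(rows, measure="sales"):
    """Aggregate the measure for every subset of dimensions ('ALL' = rolled up)."""
    cube = defaultdict(float)
    for row in rows:
        for k in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, k):
                key = tuple(row[d] if d in dims else "ALL" for d in DIMENSIONS)
                cube[key] += row[measure]
    return dict(cube)

# Hypothetical fact rows.
rows = [
    {"region": "East", "product": "TV", "quarter": "Q1", "sales": 100.0},
    {"region": "East", "product": "Radio", "quarter": "Q1", "sales": 40.0},
    {"region": "West", "product": "TV", "quarter": "Q2", "sales": 60.0},
]
cube = build_cube(rows)
print(cube[("East", "ALL", "ALL")])   # 140.0, slice by region
print(cube[("ALL", "TV", "ALL")])     # 160.0, slice by product
print(cube[("ALL", "ALL", "ALL")])    # 200.0, grand total
```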
A critical success factor for any business today is the ability to use information effectively. Data
mining is the process of discovering meaningful new correlations, patterns and trends by
digging into large amounts of data stored in the warehouse using artificial intelligence,
statistical and mathematical techniques.
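As a small example of a statistical technique used in data mining, the Python sketch below computes the Pearson correlation coefficient for every pair of numeric columns and reports the strongly correlated pairs. The column names, values and 0.8 threshold are hypothetical, and a correlation found this way still needs business interpretation before it is acted upon.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def strong_correlations(columns, threshold=0.8):
    """Scan every pair of columns and keep pairs with |r| >= threshold."""
    found = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found

columns = {
    "ad_spend": [1, 2, 3, 4, 5],
    "sales": [12, 14, 16, 18, 20],
    "returns": [5, 3, 4, 5, 3],
}
print(strong_correlations(columns))   # [('ad_spend', 'sales', 1.0)]
```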
Data Marts
The concept of a data mart is causing a lot of excitement and attracts much attention in the
data warehouse industry. Mostly, data marts are presented as an alternative to a data
warehouse that takes significantly less time and money to build. However, the term data mart
means different things to different people. A rigorous definition of this term is a data store that
is subsidiary to a data warehouse of integrated data. The data mart is directed at a partition of
data (often called a subject area) that is created for the use of a dedicated group of users. A
data mart might, in fact, be a set of denormalized, summarized, or aggregated data.
Sometimes, such a set could be placed on the data warehouse rather than in a physically separate
store of data. In most instances, however, the data mart is a physically separate store of data
and is resident on a separate database server, often on a local area network serving a dedicated user
group. Sometimes the data mart simply comprises relational OLAP technology, which creates a
highly denormalized dimensional model (e.g., star schema) implemented on a relational
database. The resulting hypercubes of data are used for analysis by groups of users with a
common interest in a limited portion of the database.
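A star schema on a relational database, as described above, can be sketched with Python's built-in `sqlite3` module. The dimension and fact tables below are hypothetical and deliberately tiny; the query shows the typical data mart pattern of joining the fact table to its dimensions and aggregating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales_fact  (date_key INTEGER REFERENCES date_dim,
                          product_key INTEGER REFERENCES product_dim,
                          units INTEGER, revenue REAL);
INSERT INTO date_dim VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO product_dim VALUES (10, 'Audio'), (20, 'Video');
INSERT INTO sales_fact VALUES (1, 10, 3, 90.0), (1, 20, 1, 400.0),
                              (2, 10, 5, 150.0), (2, 20, 2, 800.0);
""")

# Typical data mart query: join the fact table to its dimensions and aggregate.
rows = conn.execute("""
    SELECT d.month, p.category, SUM(f.units), SUM(f.revenue)
    FROM sales_fact f
    JOIN date_dim d ON f.date_key = d.date_key
    JOIN product_dim p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
for row in rows:
    print(row)
```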
These types of data marts, called dependent data marts because their data is sourced from the
data warehouse, have a high value because no matter how they are deployed and how many
different enabling technologies are used, different users are all accessing the information views
derived from the single integrated version of the data.
Unfortunately, the misleading statements about the simplicity and low cost of data marts
sometimes result in organizations or vendors incorrectly positioning them as an alternative to
the data warehouse. This viewpoint defines independent data marts that in fact represent
fragmented point solutions to a range of business problems in the enterprise. This type of
implementation should be rarely deployed in the context of an overall technology or
applications architecture. Indeed, it is missing the ingredient that is at the heart of the data
warehousing concept: that of data integration. Each independent data mart makes its own
assumptions about how to consolidate the data, and the data across several data marts may
not be consistent.
Moreover, the concept of an independent data mart is dangerous: as soon as the first data
mart is created, other organizations, groups, and subject areas within the enterprise embark on
the task of building their own data marts. As a result, you create an environment where
multiple operational systems feed multiple non-integrated data marts that are often
overlapping in data content, job scheduling, connectivity and management. In other words, you
have transformed a complex many-to-one problem of building a data warehouse from
operational and external data sources into a many-to-many sourcing and management problem.
Data Warehouse Administration and Management
Data warehouses tend to be as much as 4 times as large as related operational databases,
reaching terabytes in size depending on how much history needs to be saved. They are not
synchronized in real time to the associated operational data but are updated as often as once a
day if the application requires it.
In addition, almost all data warehouse products include gateways to transparently access
multiple enterprise data sources without having to rewrite applications to interpret and utilize
the data. Furthermore, in a heterogeneous data warehouse environment, the various
databases reside on disparate systems, thus requiring inter-networking tools. The need to
manage this environment is obvious.
Managing data warehouses includes security and priority management; monitoring updates
from the multiple sources; data quality checks; managing and updating meta data; auditing and
reporting data warehouse usage and status; purging data; replicating, subsetting and
distributing data; backup and recovery; and data warehouse storage management.
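Purging is often driven by a retention policy. The Python sketch below deletes rows older than a hypothetical retention window from an in-memory SQLite table; the table and column names are assumptions and are interpolated into the SQL, so they must come from trusted configuration. A real warehouse would normally archive the rows before deleting them.

```python
import sqlite3
from datetime import date, timedelta

def purge_older_than(conn, table, date_column, keep_days, today):
    """Delete rows whose ISO date is older than the retention window."""
    cutoff = (today - timedelta(days=keep_days)).isoformat()
    # Table/column names are interpolated: they must come from trusted config.
    cur = conn.execute(f"DELETE FROM {table} WHERE {date_column} < ?", (cutoff,))
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (sale_date TEXT, amount REAL)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                 [("2020-01-01", 10.0), ("2023-05-01", 20.0), ("2024-03-01", 30.0)])
removed = purge_older_than(conn, "sales_fact", "sale_date", 365, date(2024, 6, 30))
print(removed)                                                        # 2
print(conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0])  # 1
```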
Information Delivery System
The information delivery component is used to enable the process of subscribing for data
warehouse information and having it delivered to one or more destinations according to some
user-specified scheduling algorithm. In other words, the information delivery system distributes
warehouse-stored data and other information objects to other data warehouses and end-user
products such as spreadsheets and local databases. Delivery of information may be based on
time of day or on the completion of an external event. The rationale for the delivery systems
component is based on the fact that once the data warehouse is installed and operational, its
users don't have to be aware of its location and maintenance. All they need is the report or an
analytical view of data at a specific point in time. With the proliferation of the Internet and the
World Wide Web, such a delivery system may leverage the convenience of the Internet by
delivering warehouse-enabled information to thousands of end-users via the ubiquitous
worldwide network.
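The subscription logic can be sketched in Python as below. The `Subscription` fields, the user names and the `nightly_load_done` event are hypothetical; the function returns the subscriptions that are due either because their daily delivery time has passed or because the event they wait for has completed.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional

@dataclass
class Subscription:
    user: str
    report: str
    at_time: Optional[time] = None    # time-of-day delivery, or ...
    on_event: Optional[str] = None    # ... delivery after an external event
    last_sent: Optional[datetime] = None

def due_subscriptions(subs, now, completed_events):
    """Return subscriptions whose time has come or whose event has completed."""
    due = []
    for s in subs:
        if s.at_time is not None:
            scheduled = datetime.combine(now.date(), s.at_time)
            if now >= scheduled and (s.last_sent is None or s.last_sent < scheduled):
                due.append(s)
        elif s.on_event is not None and s.on_event in completed_events:
            due.append(s)
    return due

subs = [
    Subscription("ana", "daily_sales", at_time=time(7, 0)),
    Subscription("raj", "inventory", on_event="nightly_load_done"),
    Subscription("li", "weekly_kpi", at_time=time(18, 0)),
]
now = datetime(2024, 5, 6, 8, 30)
for s in due_subscriptions(subs, now, {"nightly_load_done"}):
    print(f"deliver {s.report} to {s.user}")   # stand-in for e-mail, web or file drop
    s.last_sent = now
```

In this sketch an event-driven subscription stays due for as long as its event remains in `completed_events`, so the caller is expected to remove events once they have been processed.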
In fact, the Web is changing the data warehousing landscape, since at a very high level the
goals of both the Web and data warehousing are the same: easy access to information. The
value of data warehousing is maximized when the right information gets into the hands of
those individuals who need it, where they need it and when they need it most. However, many
corporations have struggled with complex client/server systems to give end users the access
they need. The issues become even more difficult to resolve when the users are physically
remote from the data warehouse location. The Web removes a lot of these issues by giving
users universal and relatively inexpensive access to data. Couple this access with the ability to
deliver required information on demand, and the result is a web-enabled information delivery
system that allows users dispersed across continents to perform sophisticated business-critical
analysis and to engage in collective decision-making.

Q.3 Discuss the data extraction process. What are the various methods used for data extraction?

Data extraction
Data extraction is the act or process of extracting data out of data sources, which are usually unstructured or badly structured, for further data processing, data storage or data migration. This data can be extracted from the web: Internet pages in HTML, XML, etc. can be considered unstructured data sources because of the wide variety of coding styles, including exceptions to and violations of standard coding practices. The import into the intermediate extracting system is usually followed by data transformation and possibly the addition of metadata prior to export to another stage in the data workflow.
Typical unstructured data sources include web pages, emails, documents, PDFs, scanned text, mainframe reports, spool files, etc. Extracting data from these unstructured sources has become a considerable technical challenge, whereas historically data extraction had to deal with changes in physical hardware formats. Most current data extraction deals with extracting data from unstructured sources and from different software formats. The growing practice of extracting data from the web is also known as web scraping.
The process of adding structure to unstructured data can take a number of forms:

• Using text pattern matching, such as regular expressions, to recognise small- or large-scale structure. For example, identifying the records in a report and their related data from headers and footers.
• Using a table-based approach to recognise common sections within a limited domain. For example, identifying skills, previous work experience, qualifications, etc. in resumes by using a standard set of commonly used headings: education might be found under "Education", "Qualification" or "Courses".
• Using text analytics to try to understand the text and then link it to other information.
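The first two approaches can be sketched in Python. This is a minimal illustration, not a production parser: the report layout, the `ORD-` record format and the heading synonyms are hypothetical, invented for the example. A regular expression pulls detail records out of a report and attaches the region and date found in its header, and a lookup table maps the many headings seen in resumes onto one standard section name.

```python
import re

# Hypothetical report: a header line, detail records, and a footer line.
REPORT = """\
SALES REPORT  REGION: NORTH  DATE: 2024-01-31
ORD-1001  Widget      3   29.97
ORD-1002  Gadget     10  150.00
TOTAL RECORDS: 2
"""

# Pattern matching: a detail line is order id, product, quantity, amount.
RECORD = re.compile(r"^(ORD-\d+)\s+(\w+)\s+(\d+)\s+([\d.]+)$", re.MULTILINE)
HEADER = re.compile(r"REGION:\s*(\w+)\s+DATE:\s*([\d-]+)")


def parse_report(text):
    """Return one dict per detail record, enriched with header context."""
    region, report_date = HEADER.search(text).groups()
    return [
        {"order_id": oid, "product": product, "qty": int(qty),
         "amount": float(amount), "region": region, "date": report_date}
        for oid, product, qty, amount in RECORD.findall(text)
    ]


# Table-based approach: many resume headings map to one standard section.
HEADING_SYNONYMS = {
    "education": "Education",
    "qualification": "Education",
    "qualifications": "Education",
    "courses": "Education",
    "skills": "Skills",
    "work experience": "Experience",
    "previous work experience": "Experience",
}


def standard_section(heading):
    """Map a raw heading to a standard section name, or None if unknown."""
    return HEADING_SYNONYMS.get(heading.strip().lower().rstrip(":"))
```

The footer line is ignored simply because it does not match the record pattern; a real extractor would usually also check the footer's record count against the records it found.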
Role of ETL process
Overview of Extraction, Transformation, and Loading (ETL)
The data warehouse must be loaded regularly so that it can serve its purpose of facilitating business analysis. To do this, data from one or more operational systems must be extracted and copied into the warehouse. The process of extracting data from the source systems and bringing it into the data warehouse is commonly called Extraction, Transformation, and Loading (ETL).
The acronym ETL is perhaps too simplistic, because it omits the transportation phase and implies that each of the other phases of the process is distinct. Nevertheless, the entire process, including data loading, is referred to as ETL. ETL should be understood as a broad process, not as three well-defined steps.
The methodology and tasks of ETL have been well known for many years, and are not necessarily unique to data warehouse environments: a wide variety of proprietary applications and database systems form the IT backbone of any enterprise. Data has to be shared between these applications or systems, integrating them and giving at least two applications the same picture of the world. This kind of data sharing was regularly addressed by mechanisms similar to what is now called ETL.
In data warehouse environments, apart from exchange there is the additional burden of integrating, rearranging and consolidating data over many systems, thus providing a new unified information base for business intelligence. Furthermore, the data volume in data warehouse environments tends to be very large.
In the ETL process, during extraction the desired data is identified and extracted from many different sources, including database systems and applications. Often it is not possible to identify the specific subset of interest, so more data than necessary has to be extracted and the relevant data is identified at a later point in time. Depending on the source system's capabilities (for example, its operating system resources), some transformations can take place during the extraction process. The size of the extracted data can range from hundreds of kilobytes up to gigabytes, depending on the source system and the business situation. The same is true for the time delta between two (logically) identical extractions: the time span may vary between days or hours and minutes, down to near real-time. Web server log files, for example, can easily grow to hundreds of megabytes in a very short period of time.
After extraction, the data has to be physically moved to the target system or to an intermediate system for further processing. Depending on the chosen means of transportation, some transformations can be done during this process. For example, a SQL statement that directly accesses a remote target through a gateway can concatenate two columns as part of the SELECT statement.
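A small sketch of pushing a transformation into the extraction query, using Python's built-in `sqlite3` module with an in-memory database standing in for the remote source (an assumption; a real gateway connection would differ). The `customers` table and its columns are hypothetical. The `||` operator concatenates strings in SQLite, as it does in Oracle.

```python
import sqlite3


def make_source():
    """Build a stand-in source database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers "
                 "(customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                     [(1, "Ada", "Lovelace"), (2, "Alan", "Turing")])
    return conn


def extract_with_concat(conn):
    """Extract customers, concatenating two columns inside the SELECT itself."""
    sql = ("SELECT customer_id, first_name || ' ' || last_name AS full_name "
           "FROM customers ORDER BY customer_id")
    return conn.execute(sql).fetchall()
```

Doing the concatenation in the query means the rows arrive already transformed, so no separate pass is needed after transport.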
Offline Extract, Transform, and Load (ETL)
Previously, the one common interface between the dissimilar systems in an organisation was magnetic tape. Tapes were standardized, and any system could write tapes that could be read by other systems. So the first data warehouses were fed by magnetic tapes prepared by different systems within the organisation, which left the problem of data disparity: there is often little relation between the data written to tape by one system and the data written by another. The data warehouse's database was designed to support the analytical functions necessary for the business intelligence function. It was a well-structured database with complex indices to support Online Analytical Processing (OLAP). Databases configured for OLAP allow complex analytical and ad hoc queries with quick execution times. The data given to the data warehouse from the enterprise systems is transformed into a format understandable to the data warehouse. Extract, Transform and Load (ETL) utilities were developed to overcome the problems of loading the data into the data warehouse initially, keeping it updated and resolving discrepancies. The following figure 0. shows how the data can be extracted from the source databases, transformed into the common data warehouse format, and loaded into the data warehouse:

The transform function is the key to the success of this approach. It applies a series of rules to the extracted data so that the data is correctly formatted for loading into the data warehouse.
Examples of transformation rules are:
• Selecting the data to load.
• Translating encoded items.
• Encoding and standardizing free-form values.
• Deriving new calculated values, such as sale price = price - discount.
• Merging data from multiple sources.
• Summarizing or aggregating certain rows and columns.
• Splitting a column into multiple columns, for example a comma-separated list.
• Resolving discrepancies between similar data items.
• Validating the data.
• Ensuring data consistency.
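Several of the rules above can be combined into one transformation step. The sketch below is illustrative only: the field names, the `STATUS_CODES` table and the validation rules are hypothetical. It selects rows, validates them, translates a code, standardizes a free-form value, derives sale price = price - discount, and splits a comma-separated column.

```python
# Hypothetical code table used to translate encoded items.
STATUS_CODES = {"A": "Active", "C": "Closed"}


def transform(rows):
    """Apply a few example transformation rules to extracted rows (dicts)."""
    out = []
    for row in rows:
        # Selecting the data to load: skip rows without a product id.
        if not row.get("product_id"):
            continue
        price = float(row["price"])
        discount = float(row.get("discount") or 0)
        # Validating the data: reject negative prices or oversized discounts.
        if price < 0 or discount > price:
            continue
        out.append({
            "product_id": row["product_id"],
            # Translating an encoded item.
            "status": STATUS_CODES.get(row["status"], "Unknown"),
            # Standardizing a free-form value.
            "category": row["category"].strip().title(),
            # Deriving a new calculated value.
            "sale_price": round(price - discount, 2),
            # Splitting a comma-separated column into a list.
            "tags": [t.strip() for t in row.get("tags", "").split(",") if t.strip()],
        })
    return out
```

Rejected rows are simply dropped here; in practice they would more often be written to an error table for review.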
The ETL function allows multiple data sources to be integrated into a well-structured database for use in complex analyses. The ETL process has to be executed periodically, such as daily, weekly, or monthly, depending on business needs. This process is called offline ETL because the target database is not continuously updated; it is updated periodically, in batches. Although offline ETL serves its purpose well, it has some serious drawbacks:
• The data in the data warehouse could be weeks old. It is therefore helpful for strategic planning functions but is not particularly suited to tactical uses.
• The source database typically must be inactive during the extract process; otherwise, the target database is not in a consistent state after the load. As a result, the applications must be shut down, often for hours.
In online ETL, the function of ETL is to support real-time business intelligence, which must be continuous and non-invasive. In contrast to offline ETL, which gives old but consistent responses to queries, online ETL gives current but varying responses to successive queries. This is because the data it uses is continuously updated to reflect the current state of the business.
Offline ETL technology has served businesses for decades. The intelligence obtained from this data informs long-term, reactive, strategic decision making. On the other hand, short-term operational and proactive tactical decision making will continue to rely on
Various data extraction techniques
The extraction method has to be chosen based on the source system and on the business needs in the target data warehouse environment. Often there is no option to add logic to the source systems to support incremental extraction of data, because of the performance impact or the increased workload on these systems. Sometimes the customer is not even allowed to add anything to an out-of-the-box application system.
The estimated amount of data to be extracted and the stage in the ETL process (initial load or maintenance of data) may also affect the decision of how to extract. Basically, a decision has to be made on how to extract the data both logically and physically.

The Logical Extraction Methods
There are two kinds of logical extraction methods:
• Full Extraction
• Incremental Extraction
Full Extraction
In full extraction, the data is extracted completely from the source system. Because this extraction reflects all the data currently available on the source system, there is no need to keep track of changes to the data source since the last successful extraction. The source data is provided as-is, and no additional logical information (for example, timestamps) is required on the source site. An example of a full extraction is an export file of a distinct table or a remote SQL statement scanning the complete source table.
Incremental Extraction
At a specific point in time, only the data that has changed since a well-defined event back in history is extracted. This event may be the last time of extraction or a more complex business event, such as the last booking day of a fiscal period. To identify this delta change, it must be possible to identify all the information that has changed since this specific time event. This information can be provided either by the source data itself, such as an application column reflecting the last-changed timestamp, or by a change table in which an appropriate additional mechanism keeps track of the changes alongside the originating transactions. In most cases, using the latter method means adding extraction logic to the source system.
Many data warehouses do not use any change-capture techniques as part of the extraction process. Instead, entire tables from the source systems are extracted to the data warehouse or staging area, and these tables are compared with a previous extract from the source system to identify the changed data. This approach may not have a significant impact on the source systems, but it clearly can place a considerable burden on the data warehouse.
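The snapshot-comparison approach can be sketched as a keyed diff of two full extracts. The row format (dictionaries with an `id` key) is an assumption for illustration. Both complete extracts must be held and compared on the warehouse side, which is exactly the burden described above.

```python
def diff_snapshots(previous, current, key="id"):
    """Compare two full extracts and classify rows as inserted, updated or deleted."""
    prev = {row[key]: row for row in previous}
    curr = {row[key]: row for row in current}
    inserted = [curr[k] for k in sorted(curr.keys() - prev.keys())]
    deleted = [prev[k] for k in sorted(prev.keys() - curr.keys())]
    # Rows present in both extracts count as updated only if some value differs.
    updated = [curr[k] for k in sorted(curr.keys() & prev.keys())
               if curr[k] != prev[k]]
    return inserted, updated, deleted
```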
Physical Extraction Methods
Depending on the chosen logical extraction method and the capabilities and restrictions on the source side, the data can be physically extracted by two mechanisms: it can be extracted either online from the source system or from an offline structure. Such an offline structure might already exist or it might be generated by an extraction routine.
The following are the methods of physical extraction:
• Online Extraction
• Offline Extraction
Online Extraction
In online extraction, the data is extracted directly from the source system itself. The extraction process can connect directly to the source system to access the source tables themselves, or to an intermediate system that stores the data in a preconfigured manner (for example, snapshot logs or change tables). Note that the intermediate system is not necessarily physically different from the source system.
With online extractions, you need to consider whether the distributed transactions are using the original source objects or prepared source objects.
Offline Extraction
The data is not extracted directly from the source system but is staged explicitly outside the original source system. The data either already has an existing structure (for example, redo logs, archive logs or transportable tablespaces) or was created by an extraction routine.
The following structures can be considered:
• Flat files: In flat files, the data is in a defined, generic format. Additional information about the source object is needed for further processing.
• Dump files: In dump files, information about the containing objects is included.
• Redo and archive logs: In redo and archive logs, the information is in a special, additional dump file.
• Transportable tablespaces: Transportable tablespaces are a powerful way to extract and move large volumes of data between Oracle databases. Oracle Corporation suggests using transportable tablespaces whenever possible, because they can provide considerable advantages in performance and manageability over other extraction techniques.
Change Data Capture
An important consideration in extraction is incremental extraction, also known as Change Data Capture. If a data warehouse extracts data from an operational system on a nightly basis, then the data warehouse requires only the data that has changed since the last extraction (that is, the data that has been modified in the past 24 hours).
When it is possible to efficiently identify and extract only the most recently changed data, the extraction process (as well as all downstream operations in the ETL process) can be much more efficient, because it must extract a much smaller volume of data. Unfortunately, for many source systems, identifying the recently modified data may be difficult or intrusive to the operation of the system. Change Data Capture is typically the most challenging technical issue in data extraction.
Because change data capture is often desirable as part of the extraction process, and it might not be possible to use Oracle's Change Data Capture mechanism, several techniques can be used to implement a self-developed change capture on Oracle source systems:
• Timestamps
• Partitioning
• Triggers
These techniques are based on the characteristics of the source systems, or may require modifications to the source systems. Thus, each of these techniques must be carefully evaluated by the owners of the source system prior to implementation.
Each of these techniques can work in conjunction with the data extraction techniques discussed previously. For example, timestamps can be used whether the data is being unloaded to a file or accessed through a distributed query.
The tables in some operational systems have timestamp columns. The timestamp specifies the time and date that a given row was last modified. If the tables in an operational system have timestamp columns, then the latest data can easily be identified using these columns. For example, the following query might be useful for extracting today's data from an orders table:
SELECT * FROM orders WHERE TRUNC(CAST(order_date AS DATE), 'DD') = TRUNC(SYSDATE);
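A related pattern is to remember the highest timestamp seen in the previous run (a "high-water mark") and extract only rows modified after it. The sketch uses Python's `sqlite3` with a hypothetical `orders` table whose `last_modified` column holds ISO-8601 text, so string comparison orders the timestamps correctly; it is an illustration of the technique, not Oracle-specific code.

```python
import sqlite3


def make_orders_db(rows):
    """Stand-in source table; the schema is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders "
                 "(order_id INTEGER PRIMARY KEY, amount REAL, last_modified TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def extract_changed_since(conn, high_water_mark):
    """Return rows modified after the mark, plus the mark for the next run."""
    rows = conn.execute(
        "SELECT order_id, amount, last_modified FROM orders "
        "WHERE last_modified > ? ORDER BY last_modified",
        (high_water_mark,),
    ).fetchall()
    # The newest timestamp extracted becomes the next run's starting point.
    new_mark = rows[-1][2] if rows else high_water_mark
    return rows, new_mark
```

One caveat of a strict `>` comparison: a row committed later but stamped with exactly the mark's timestamp would be skipped, so real implementations often re-read a small overlap window and de-duplicate.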
If the timestamp information is not available in an operational source system, you will not always be able to modify the system to include timestamps. Such a modification would require, first, modifying the operational system's tables to include a new timestamp column, and then creating a trigger to update the timestamp column after every operation that modifies a given row.
Some source systems might use Oracle range partitioning, such that the source tables are partitioned along a date key, which allows easy identification of new data. For example, if you are extracting from an orders table, and the orders table is partitioned by week, then it is easy to identify the current week's data.
Triggers can be created in operational systems to keep track of recently updated records. They can then be used in conjunction with timestamp columns to identify the exact time and date when a given row was last modified. You do this by creating a trigger on each source table that requires change data capture. Following each DML statement that is executed on the source table, this trigger updates the timestamp column with the current time. Thus, the timestamp column provides the exact time and date when a given row was last modified.
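The trigger technique can be tried out in SQLite through Python's `sqlite3` module. This is a sketch under assumptions: the `orders` table and the trigger name are hypothetical, and SQLite trigger syntax differs from Oracle's PL/SQL triggers. The trigger fires after an update to the business columns and stamps the row with the current time; because it lists only `status` and `amount`, its own update of `last_modified` does not fire it again.

```python
import sqlite3


def make_source_with_trigger():
    """Create a hypothetical orders table whose trigger maintains last_modified."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (
            order_id      INTEGER PRIMARY KEY,
            status        TEXT,
            amount        REAL,
            last_modified TEXT DEFAULT CURRENT_TIMESTAMP
        );
        -- After a change to a business column, stamp the row with the current time.
        CREATE TRIGGER orders_touch
        AFTER UPDATE OF status, amount ON orders
        FOR EACH ROW
        BEGIN
            UPDATE orders SET last_modified = CURRENT_TIMESTAMP
            WHERE order_id = NEW.order_id;
        END;
    """)
    return conn
```

Newly inserted rows get their timestamp from the column default, so only updates need the trigger in this sketch.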
A similar internalized trigger-based technique is used for Oracle materialized view logs. These logs are used by materialized views to identify changed data, and they are accessible to end users. A materialized view log can be created on each source table requiring change data capture. Then, whenever any modification is made to the source table, a record is inserted into the materialized view log indicating which rows were modified. Materialized view logs rely on triggers, but they provide an advantage in that the creation and maintenance of this change-data system is largely managed by Oracle.
However, Oracle recommends the use of synchronous Change Data Capture for trigger-based change capture, since Change Data Capture provides an externalized interface for accessing the change information and a framework for maintaining the distribution of this information to various clients.
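The idea behind a change log (what a materialized view log records) can be imitated with ordinary triggers. This SQLite sketch is not how Oracle implements materialized view logs or Change Data Capture; the table and trigger names are hypothetical. Each insert, update or delete on `customers` appends a row to `customers_log`, and an extraction job reads only the entries after the last sequence number it consumed.

```python
import sqlite3


def make_source_with_change_log():
    """Hypothetical source table plus a trigger-maintained change table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        -- One row per modified source row: which row, and what kind of change.
        CREATE TABLE customers_log (
            seq         INTEGER PRIMARY KEY AUTOINCREMENT,
            customer_id INTEGER,
            operation   TEXT
        );
        CREATE TRIGGER customers_ins AFTER INSERT ON customers
        BEGIN INSERT INTO customers_log (customer_id, operation) VALUES (NEW.customer_id, 'I'); END;
        CREATE TRIGGER customers_upd AFTER UPDATE ON customers
        BEGIN INSERT INTO customers_log (customer_id, operation) VALUES (NEW.customer_id, 'U'); END;
        CREATE TRIGGER customers_del AFTER DELETE ON customers
        BEGIN INSERT INTO customers_log (customer_id, operation) VALUES (OLD.customer_id, 'D'); END;
    """)
    return conn


def consume_changes(conn, after_seq=0):
    """Read change records newer than the last consumed sequence number."""
    return conn.execute(
        "SELECT seq, customer_id, operation FROM customers_log "
        "WHERE seq > ? ORDER BY seq",
        (after_seq,),
    ).fetchall()
```

Every source write now costs an extra insert, which is the performance impact on the source system that the next paragraph warns about.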
Trigger-based techniques affect performance on the source systems, and this impact should be carefully considered before implementing them on a production source system.
Extracting data from the operational systems
Once the data for analysis has been identified, it has to be located in the operational systems within the company. The data required for the warehouse is extracted from the source operational systems and written to the staging area, where it will later be transformed. To minimise the performance impact on the source database, the data should be unloaded without applying any transformations to it.
More often than not, the owners of the operational systems will not let the warehouse developers access the systems directly, but will provide periodic extracts. These extracts are usually in the form of flat, sequential operating system files, which make up the staging area.
Application programs have to be developed to select the fields and records necessary for the warehouse. If the data is stored in a legacy system, these programs may be written in COBOL, which requires special logic to handle things such as repeating fields in the COBOL "occurs" clause. The data warehouse designers have to work with the application developers for the OLTP systems to build extract scripts that provide the required columns and formats of the data.
As part of designing the ETL process, you should determine how frequently data should be extracted from the operational systems. It can be at the end of some time period or business event, such as the end of the day or week or the closing of the fiscal quarter. The meaning of "end of the day" or "last day of the week" has to be clearly defined if the system is used across different time zones. This kind of extraction can be done at different times for different systems and staged to be loaded into the warehouse during an upcoming batch window. Another aspect of the warehouse design process involves deciding what level of aggregation is needed to answer the business queries. This also affects what and how much data is extracted and transported across the network.
Some of the operational systems may be in relational databases, such as Oracle 8i or 9i, Oracle Rdb, DB2/MVS, Microsoft SQL Server, Sybase, or Informix. Others may be in a legacy database format, such as IMS or Oracle DBMS. Others may be in VSAM, RMS indexed files, or some other structured file system.
If it is possible to access the source systems directly, then the data can be obtained by a variety of techniques, depending on the type of system the data is in. For small amounts of data, you can use a gateway or Open Database Connectivity (ODBC). For large amounts of data, a custom program that connects directly to the source database through the database's native interface is used. Many ETL tools simplify the extraction process by providing connectivity to the source.
The data to be used in the data warehouse must be extracted from the operational systems that contain the source data. Data is initially extracted when the data warehouse is created, and ongoing periodic extractions occur during updates of the data warehouse. Data extraction can be a simple operation if the source data resides in a single relational database, or a very complex operation if the source data resides in multiple heterogeneous operational systems. The goal of the data extraction process is to bring all the source data into a common, consistent format so that it is available for loading into the data warehouse.
Ideally, the data in the source operational systems should not contain validation errors. For example, purchase records with no corresponding customer records to identify the purchasers are clearly errors in the source data, and should be corrected in the source operational system before the data is extracted for loading into the data warehouse. It might be possible to implement error checking in the source operational system so that such errors are detected before the data is extracted for the data warehouse. If such errors occur frequently, then the operational system will have to be examined and modified to reduce them, because such errors might affect the organization's business as well as the data warehouse.
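Checking for the orphan purchase records mentioned above is a simple outer join. The sketch assumes hypothetical `purchases` and `customers` tables sharing a `customer_id` column, and uses SQLite via Python's `sqlite3`; the same query shape works in most relational databases.

```python
import sqlite3


def find_orphan_purchases(conn):
    """Return purchases whose customer_id has no matching customer record."""
    # A LEFT JOIN keeps every purchase; a NULL customer side means no match.
    return conn.execute("""
        SELECT p.purchase_id, p.customer_id
        FROM purchases AS p
        LEFT JOIN customers AS c ON c.customer_id = p.customer_id
        WHERE c.customer_id IS NULL
        ORDER BY p.purchase_id
    """).fetchall()
```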

It may not be possible to identify validation errors until the data has been extracted from the operational systems. This situation can occur when data is extracted from multiple data sources. For example, integrating data extracted from separate sales tracking, shipping, and billing systems might uncover discrepancies that should be addressed in one or more of the source systems.
You may also recognise inconsistencies other than errors in the data after it has been extracted. For example, different data sources may use different coding systems for the same kind of data. You can use translation tables to reconcile these differences during the extraction operation or later during the transformation operations. For example, a legacy system might code state or province names using a three-character code, whereas another system might use a two-character code. The data obtained from one or both of these systems should be translated into a single set of codes before the data is loaded into the data warehouse.
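A translation table can be as simple as a dictionary. The three-character codes below are hypothetical examples of a legacy coding system, not a real standard. Codes that are already two characters long are assumed to be in the standard form and pass through, and unknown codes raise an error rather than being loaded silently.

```python
# Hypothetical translation table: legacy three-character codes -> warehouse codes.
LEGACY_TO_STANDARD = {"FLA": "FL", "CAL": "CA", "TEX": "TX"}


def translate_state(code, table=LEGACY_TO_STANDARD):
    """Translate a state code into the warehouse's two-character coding system."""
    code = code.strip().upper()
    if len(code) == 2:
        return code  # assumed to be in the standard coding system already
    try:
        return table[code]
    except KeyError:
        raise ValueError(f"no translation for state code {code!r}") from None
```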
In other cases, inconsistencies can be discovered if the source systems allow free-form entry of text information. Such data is often internally inconsistent, because different data-entry personnel can enter the same data in different ways. The inconsistent representations of the same data will have to be reconciled if such data is used for analysis. For example, in a data source that allows free-form text entry for the state or province portion of an address, the state of Florida can be entered as FL, Fla, Florida, or even Flor. It may be difficult to modify legacy source systems to implement a standard coding validation. Manual transformation adjustments might be necessary to reconcile such differences if the contributing source systems cannot be modified.
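One way to reconcile free-form entries is to normalize the text and look it up in an alias list collected from the data. This is a minimal sketch: the alias lists are hypothetical, and anything that is not matched returns `None` so it can be routed to the manual adjustment step described above.

```python
import re

# Hypothetical alias lists, built from variants actually observed in the source.
STATE_ALIASES = {
    "FL": ["fl", "fla", "flor", "florida"],
    "CA": ["ca", "cal", "calif", "california"],
}
_LOOKUP = {alias: code for code, aliases in STATE_ALIASES.items() for alias in aliases}


def normalize_state(raw):
    """Map a free-form state entry to a standard code, or None for manual review."""
    key = re.sub(r"[^a-z]", "", raw.lower())  # drop dots, spaces and punctuation
    return _LOOKUP.get(key)
```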
Data in Operational Systems
The operational data in the source systems can be classified into two broad categories, and the choice of data extraction technique may depend on the nature of each:
• Current value
• Periodic status
Current value: Most of the attributes in the source systems fall into this category. The stored value of an attribute represents its value at the present moment. Such values are transient or transitory: as business transactions occur, the values change, and there is no way to predict how long the present value will stay or when it will change. The current value is therefore simply the value the attribute holds at that moment.
Periodic status: In this category, the value of an attribute is stored as a status every time a change occurs, so each status value is stored with a reference to time. For example, the data about an insurance policy is kept as the status of the policy at each point in time. Because the history of the changes is kept in the source systems themselves, data extraction becomes relatively simple.

Q.4 Discuss in detail the need for developing OLAP tools.

Online Analytical Processing
Online Analytical Processing (OLAP) is a data warehousing tool used to organise, partition and summarize data in the data warehouse and data marts.
OLAP is an approach to answering multi-dimensional analytical queries quickly. OLAP belongs to the category of business intelligence. It finds applications in business reporting for sales, marketing, budgeting and forecasting, management reporting, business process management, financial reporting and similar areas.
The output of an OLAP query is displayed as a matrix. The dimensions form the rows and columns of the matrix, and the measures form the values. OLAP creates a hypercube of information. The cube metadata is usually created from a star schema or snowflake schema (to be discussed later) of tables in a relational database.
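The matrix layout can be sketched in a few lines: one dimension supplies the rows, another the columns, and the measure is summed into each cell. The fact rows and field names are hypothetical. Cells with no data are left as `None`, which corresponds to the sparsity discussed under multidimensionality.

```python
from collections import defaultdict


def pivot(facts, row_dim, col_dim, measure):
    """Lay out fact rows as a matrix: row_dim values down, col_dim values across."""
    cells = defaultdict(float)
    rows, cols = set(), set()
    for fact in facts:
        r, c = fact[row_dim], fact[col_dim]
        cells[(r, c)] += fact[measure]
        rows.add(r)
        cols.add(c)
    row_keys, col_keys = sorted(rows), sorted(cols)
    # dict.get does not invoke the default factory, so empty cells stay None.
    matrix = [[cells.get((r, c)) for c in col_keys] for r in row_keys]
    return row_keys, col_keys, matrix
```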
The figure below depicts the schematic representation of the multi-tiered architecture of data warehousing.

As shown in the figure above, the OLAP servers are in the third tier of the multi-tiered data warehousing architecture. OLAP servers handle the data that is crucial for management, which is accessed through iterative analytical investigation.
Characteristics of OLAP
Data warehouse vendors and a few researchers have derived a general definition of OLAP known as FASMI, which stands for Fast Analysis of Shared Multidimensional Information. Let us discuss these characteristics in detail.
• Fast: This refers to the ability of OLAP to deliver most responses to users within about 5 seconds; complex requests may take around 20 seconds. The speed is achieved by using various techniques such as specialised data storage, certain hardware components, pre-calculation and so on.
• Analysis: OLAP can handle any business or statistical analysis relevant to users. The most commonly used analysis techniques are slice and dice, and drill down.
• Shared: When multiple users are granted write access, the system is able to maintain confidentiality and lock simultaneous updates. Recent OLAP products recognise the need for write access and can handle updates from multiple users in a timely manner. Shared also refers to the ability of the system to provide multi-user access without duplicating files.
• Multidimensional: This is the main feature of OLAP products. Multidimensionality requires organizing data according to the organization's actual business dimensions. For example, in a marketing company the data may be organized along dimensions such as clients, products, sales, time and so on. The cells contain the relevant data at the intersection of dimensions. Sometimes cells are left blank; for example, a client may not buy products in every time period. This is called sparsity.
• Information: This refers to all the data and information required by users. The data capacity varies with factors such as data access methods and the level of data duplication. OLAP must contain the data the user requires and must provide efficient data analysis.
Different techniques can be used to attain the FASMI objectives, including client-server architecture, time series analysis, object orientation, parallel processing, optimized data storage, and multi-threading.
OLAP Tools
Online Analytical Processing tools belong to the group of software tools that enable the analysis of data stored in a database. OLAP is often used in data mining. Using these tools, a user can analyse the various dimensions of multidimensional data; for example, they provide both time analysis and trend analysis views. There are two types of OLAP tools:
Multidimensional OLAP (MOLAP): In this type of OLAP, a cube is extracted from the relational data warehouse. When the user generates a report request, the MOLAP tool responds quickly because the data has already been extracted.
Relational OLAP (ROLAP): In this type of OLAP, the data is not extracted. The ROLAP engine behaves like a smart SQL generator. The ROLAP tool comes with a "Designer" piece, where the data warehouse administrator can specify not only the relationships between the relational tables, but also how dimensions, attributes, and hierarchies map to the database tables.
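To illustrate what "smart SQL generator" means, the sketch below turns a request for some attributes and a measure into a star-schema query using designer-style mapping metadata. The `MAPPING` structure and all table and column names are hypothetical; real ROLAP tools store far richer metadata (hierarchies, aggregate tables, join paths).

```python
# Hypothetical designer metadata: logical attributes/measures -> star-schema tables.
MAPPING = {
    "fact_table": "sales_fact",
    "measures": {"revenue": "SUM(sales_fact.amount)"},
    "attributes": {
        # attribute: (dimension table, column, join key shared with the fact table)
        "year": ("time_dim", "year", "time_id"),
        "region": ("store_dim", "region", "store_id"),
    },
}


def generate_sql(attributes, measure, mapping=MAPPING):
    """Build a GROUP BY query joining the fact table to each needed dimension."""
    fact = mapping["fact_table"]
    select, joins, group = [], [], []
    for name in attributes:
        table, column, key = mapping["attributes"][name]
        select.append(f"{table}.{column} AS {name}")
        join = f"JOIN {table} ON {table}.{key} = {fact}.{key}"
        if join not in joins:  # join each dimension table only once
            joins.append(join)
        group.append(f"{table}.{column}")
    select.append(f"{mapping['measures'][measure]} AS {measure}")
    return (f"SELECT {', '.join(select)} FROM {fact} "
            + " ".join(joins)
            + f" GROUP BY {', '.join(group)}")
```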
Currently, ROLAP and MOLAP vendors are moving toward a combination of both. MOLAP vendors find it necessary to reach low levels of detail at times, and ROLAP vendors find it important to deliver results to users rapidly. Hence vendors find it essential to merge both approaches.
OLAP Data Modelling
In the OLAP data model, the data is viewed as a data cube that consists of measures and dimensions, as mentioned earlier.
The data in each dimension can be arranged as a hierarchy with levels of detail. For example, the time dimension can have levels such as days, months and years. Each level in the hierarchy has specific values at a given instance. While viewing a database, a user can move up or down between levels to view less or more detailed information.
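Moving up a hierarchy (roll-up) amounts to re-aggregating the measure at a coarser level. This sketch assumes a day/month/year time hierarchy and `(date, amount)` pairs as input; both are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date

# Each level of the assumed time hierarchy maps a date to that level's label.
LEVELS = {
    "day": lambda d: d.isoformat(),
    "month": lambda d: f"{d.year}-{d.month:02d}",
    "year": lambda d: str(d.year),
}


def roll_up(daily_sales, level):
    """Aggregate (date, amount) pairs to the requested level of the hierarchy."""
    key = LEVELS[level]
    totals = defaultdict(float)
    for day, amount in daily_sales:
        totals[key(day)] += amount
    return dict(sorted(totals.items()))
```

Drilling down is the reverse: re-running the same aggregation at a finer level such as `"day"`.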
Aggregation and Storage Models
The fundamentals of multidimensional navigation in OLAP are cubes, dimensions, hierarchies, and measures. When data is presented in this manner, users can work with a complex set of data.
The idea behind OLAP is to provide consistent response times for all the operations users request. Because data is always collected at the detail level only, summary information is computed beforehand. These pre-computed values are the basis of OLAP's performance gains.
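Pre-computation can be sketched by aggregating the measure for every combination of dimensions once, so that a summary query becomes a dictionary lookup instead of a scan of detail rows. The function names and data are hypothetical. With n dimensions there are 2^n group-by combinations, which is why real systems usually pre-compute only a chosen subset.

```python
from collections import defaultdict
from itertools import combinations


def precompute_cube(facts, dimensions, measure):
    """Pre-aggregate the measure for every subset of dimensions (every group-by)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for fact in facts:
                totals[tuple(fact[d] for d in dims)] += fact[measure]
            cube[dims] = dict(totals)
    return cube


def query(cube, dimensions, **filters):
    """Answer a summary query from the pre-computed cube by lookup."""
    # Keep the same dimension order that combinations() used when building the cube.
    dims = tuple(d for d in dimensions if d in filters)
    key = tuple(filters[d] for d in dims)
    return cube[dims].get(key, 0.0)
```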
When OLAP technology began, some data warehousing vendors believed that the only solution for OLAP applications was a specialised, non-relational storage model. Later, however, vendors found that a relational database management system (RDBMS) could be used for OLAP through database structures (star and snowflake schemas), indexing and stored aggregates. They called their technology Relational OLAP (ROLAP). The earlier vendors then adopted the name Multidimensional OLAP (MOLAP).
MOLAP can perform better than ROLAP technology but has issues with scalability. ROLAP is scalable, and customers prefer it because it builds on existing relational database technology. Hybrid OLAP (HOLAP) was developed to combine the best features of the ROLAP and MOLAP architectures: better performance and high scalability.
Building the OLAP Data Model
The challenge in building an OLAP data model is mapping the initial database model to the multidimensional model. This requires considerable programming skill. OLAP database design has become an indispensable process in the development of OLAP products because it is closely linked to the OLAP technology being set up. As a result, OLAP database developers are specialists, which has led to high application development costs, concentrated at the data design stage.
OLAP Tools and the Internet
The two most persistent topics in computing have been the Internet and data warehousing, and the combination of the two is of high importance. The reason is simple: the advantages of using the web for access increase when it is combined with a data warehouse.
The reasons why the Internet is considered one of the best decision support media are as follows:
• The Internet provides connectivity within and between companies.
• It makes managing complex administrative tasks easy, even in a distributed environment.
• Data can be stored and managed on a server, which makes updating and manipulating data centrally an easy task. Hence problems with keeping software and data current can be solved.
The general features of data access using the Internet are as follows:
• In the first generation, clients accessed static HTML pages through web browsers. This was the static distribution model, in which decision support reports were stored as HTML pages and displayed on request from the user. This is inefficient because it does not give web clients interactive analytical capabilities such as drill down.
• The second generation uses a multitier architecture to process database queries. Web clients submit an HTML-encoded request to a web server. The web server then passes the query, as structured data, to a CGI (Common Gateway Interface) script. This gateway submits SQL queries to the database, receives the answer, transforms it into an HTML page and sends the page to the requester. Figure 7.2 depicts the multitier architecture used in the second generation to process database queries.

• The third generation is the upcoming technology, in which web-based application servers replace HTML gateways. These servers download Java applets or ActiveX applications that execute on the client's end, or communicate with applets running on the server side (servlets). This model gives users the advantage of existing decision support applications without the need to load any additional software other than a web browser.
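The gateway step of the second-generation model can be sketched in Python as a single function. It decodes a form-encoded request, runs a parameterised SQL query and returns an HTML page. This is a sketch under stated assumptions rather than a deployable CGI program. A real CGI script would read the request from the `QUERY_STRING` environment variable and print HTTP headers before the page. The `sales` table and the `gateway` function are hypothetical.

```python
import html
import sqlite3
from urllib.parse import parse_qs

# Hypothetical warehouse table standing in for the database behind the gateway.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('North', 10), ('North', 5), ('South', 8);
""")

def gateway(query_string: str) -> str:
    """Play the role of the CGI script: decode the browser's form-encoded
    request, submit SQL to the database and turn the answer into HTML."""
    params = parse_qs(query_string)
    region = params.get("region", [""])[0]
    # A parameterised query keeps raw user input out of the SQL text.
    total = db.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?", (region,)
    ).fetchone()[0]
    return (
        "<html><body><table>"
        "<tr><th>Region</th><th>Total sales</th></tr>"
        f"<tr><td>{html.escape(region)}</td><td>{total}</td></tr>"
        "</table></body></html>"
    )

print(gateway("region=North"))
```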

Q.5 What do you understand by the term statistical analysis? Discuss the most important statistical techniques.

Statistical Perspective on Data Mining
The aims of data mining and its tools share common characteristics with those of classical statistics. Data mining is regarded as a collection of methods for drawing inferences from data. These inferences include understanding patterns of correlation and links among the data, or predicting future data values.
Data mining is the business of answering questions that have not yet been asked; it reaches deep into databases. Data mining can be classified into two categories, namely descriptive and predictive data mining.
Descriptive data mining provides information that helps the user become acquainted with what is happening inside the data, without a predetermined idea. Predictive data mining allows the user to submit records with unknown field values, and the system guesses the unknown values based on patterns previously discovered from the database.
Data mining models can be categorised based on the tasks they perform: classification and prediction, clustering, and association rules. Classification and prediction fall under the predictive model of data mining, while clustering and association fall under the descriptive models. The most commonly performed task in data mining is classification. It recognises patterns that identify the group to which an item belongs by examining existing items that have already been classified and inferring a set of rules. Clustering is very similar to classification; the only difference is that no groups have been predefined. Prediction is the construction and use of a model to assess the class, value or value ranges of a given unlabelled object. Forecasting is different from prediction because it estimates the future value of continuous variables based on patterns within the data.
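Classification as described above assigns a new item to a group by comparing it with items that are already classified. A 1-nearest-neighbour classifier can illustrate this. It is only one simple technique, chosen for brevity. Unlike the rule-inferring methods the text mentions, it produces no explicit rules. The customer data and segment names are hypothetical.

```python
import math

def classify(item, labelled):
    """Assign `item` to the class of its nearest already-classified example
    (1-nearest-neighbour, using Euclidean distance)."""
    nearest = min(labelled, key=lambda pair: math.dist(item, pair[0]))
    return nearest[1]

# Hypothetical customers: (annual spend in thousands, visits per month) -> segment.
labelled = [
    ((1.0, 1.0), "occasional"),
    ((1.5, 2.0), "occasional"),
    ((8.0, 9.0), "loyal"),
    ((9.0, 7.5), "loyal"),
]
print(classify((7.0, 8.0), labelled))  # loyal
```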
Statisticians have established techniques for attacking problems, identifying patterns and summarising large data sets. Several statistical models are available for determining relationships in a data set or for making predictions from it. Statistical models such as cluster analysis, discriminant analysis and nonparametric regression can be used to solve problems involving huge data sets. Since data mining techniques can tackle these problems effectively, statisticians are being urged to consider data mining as a branch or part of statistics.
Statistics follows an approach that involves specifying a model for the probability distribution of the data and drawing inferences as probability statements. Data mining follows a different approach from classical statistics. When it is applied to familiar statistical problems such as classification and regression, it retains some distinct features. It is common to have both continuous and discrete-valued variables in a data set. Multivariate analysis methods in statistics are largely designed for continuous variables, whereas many data mining methods are designed for discrete variables. In applications where the data is a combination of continuous and discrete values, it is useful to combine data mining and statistical methods to solve the problem efficiently.
Data mining methods often minimise a loss function expressed in terms of prediction error. Cross-validation estimates this prediction error; it is a technique known in statistics but widely used in the data mining process. Minimising prediction error using cross-validation is a powerful technique, and it can be used in a nested fashion to optimise several aspects of the application.
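A minimal k-fold cross-validation sketch follows. Each fold is held out once, the model is fitted on the remaining data, and the squared prediction errors on the held-out points are averaged. The two candidate models (`fit_mean` and `fit_line`), the synthetic data and k = 5 are assumptions for illustration. Comparing their estimated errors is the kind of model choice cross-validation supports. Repeating such a comparison inside each training fold is what the nested use mentioned above would look like.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_error(xs, ys, fit, k=5):
    """Estimate mean squared prediction error by k-fold cross-validation."""
    total, count = 0.0, 0
    for fold in kfold_indices(len(xs), k):
        held = set(fold)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:  # score only on points the model did not see
            total += (ys[i] - model(xs[i])) ** 2
            count += 1
    return total / count

def fit_mean(xs, ys):
    """Baseline model: always predict the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Least-squares straight line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

# Synthetic data that follows a straight line exactly.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
print(cv_error(xs, ys, fit_mean), cv_error(xs, ys, fit_line))
```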
The greater complexity of data mining methods is not always acceptable. Statistical methods are preferable for fairly simple application models. In some situations, data mining techniques make no progress on the given task; in such cases it is more suitable to use statistical methods.

To extract the necessary information from the data, it is very important for the Information Technology (IT) team and the Business Intelligence (BI) team to work together. Business intelligence systems have information technology built into them: IT is the core part of every business intelligence application that deals with gathering, storing, sorting and analysing the data of an organisation.
The most obvious and important business intelligence application that deals directly with information technology is statistics. Statistics helps the managers and executives of an organisation to get an idea of what is happening, what has happened and what may happen within their organisation.
Statistics is the scientific application of mathematical principles to the collection, analysis and presentation of numerical data. It is a discipline that involves the development and application of methods to collect, analyse and interpret data. Statistics can also be referred to as the science of learning from data.
A statistician is someone who is well versed in the successful application of statistical analysis. Statisticians contribute to scientific enquiry by applying mathematical and statistical knowledge to the design of surveys and experiments. This includes collecting, processing and analysing the data and interpreting the results.
Statistical knowledge can be applied to various subject areas, such as biology, economics, engineering, medicine, public health, psychology, marketing, education and sports. Modern statistical methods involve:
• The design and analysis of experiments and surveys.
• The quantification of biological, social and scientific phenomena.
• The practical application of statistical principles to understand more about the world around us.
Modern statistical methods have become a key factor in decision making in areas such as the medical, biological and social sciences, economics, finance, marketing research, manufacturing and management, government, research institutes and so on.
Need for Statistics
Statistics is essential for decision making under uncertainty. It is concerned with the most basic of human needs: finding out more about the world and how it operates in the face of variation and uncertainty. According to H.G. Wells, "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write."
Statistics is required to place knowledge on a systematic evidence base, and it provides the means to communicate the knowledge we have. Data are crude information and an integral part of most areas of human enterprise. Data by themselves do not form knowledge; knowledge is formed from data through the following sequence:
1. Data to information: Data can be considered information when it becomes relevant to a decision problem.
2. Information to facts: Information can be considered a fact when the data can support it.
3. Facts to knowledge: A fact can be considered knowledge when it becomes useful in the successful completion of the decision process.
Figure 6.1 below shows the statistical thinking process, which uses data to construct statistical models for making decisions under uncertainty.

Similarity Measures
Similarity measures determine the similarity between two objects. A similarity measure is an important tool of a business intelligence system, useful for determining the similarities between two factors in a business application. This helps the user adopt suitable steps to improve the business and put business intelligence concepts into practice. In pure verification and identification applications in a business, it is very important to determine whether a new template matches the stored one. Similarity measures determine the match between two essential components of a business application, which helps in taking critical decisions.
A similarity measure thus gives the degree of likeness between objects. On the Internet, for example, all the web pages together represent the whole database. These pages can be classified into two categories, i.e., pages that answer a given query and those that do not. The pages that answer the query are more similar to each other than to the pages that do not answer it, so in this case the stated query determines the similarity between the pages.
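Cosine similarity between term-count vectors is one common similarity measure for text, and it can illustrate the web-page example. The text does not name a specific measure. The choice of cosine similarity, the whitespace tokenisation and the sample pages are therefore assumptions. The pages sharing the query's terms both score higher against the query and are more similar to each other than to the unrelated page.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (lower-cased, whitespace tokens)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two term-count vectors: 1 = same direction, 0 = no shared terms."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical web pages and query.
pages = {
    "p1": "quarterly sales report for the north region",
    "p2": "sales forecast and sales report",
    "p3": "company picnic photos",
}
query = vectorize("sales report")
ranked = sorted(pages, key=lambda p: cosine(query, vectorize(pages[p])), reverse=True)
print(ranked)  # ['p2', 'p1', 'p3']
```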
Decision Tree
A decision tree is a style of representation that depicts learning from a set of independent instances. It adopts a 'divide and conquer' approach to the problem of learning. A decision tree is a simple yet powerful form of multiple variable analysis: the algorithm identifies various ways of splitting a data set into branch-like segments. These segments form an inverted decision tree with the root node at the top.
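The splitting step can be sketched with the information-gain criterion used by ID3-style tree learners. The text does not specify a criterion, so this choice, the attribute names and the loan data are illustrative assumptions. The attribute with the highest gain would become the root node.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on attribute `attr`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical loan applicants and outcomes.
rows = [
    {"income": "high", "employed": "yes"},
    {"income": "high", "employed": "no"},
    {"income": "low", "employed": "yes"},
    {"income": "low", "employed": "no"},
]
labels = ["repay", "repay", "default", "default"]

# Choose the split attribute with the largest information gain.
best = max(rows[0], key=lambda a: information_gain(rows, labels, a))
print(best)  # income
```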
A decision tree is one possible representation of a decision function. It is useful when complete knowledge of the data is not necessary for making appropriate decisions and when the process of gaining data is expensive. A decision tree determines which data should be collected, and in which order, to achieve an effective decision with minimal average cost. A decision tree thus represents knowledge and can be used for effective decision-making. Algorithms exist for the automatic construction of decision trees, which is a traditional part of artificial intelligence.
The decision tree is an analytical tool of a business intelligence system that helps business managers resolve uncertainties in making investment decisions. It clarifies the choices, risks, objectives and monetary gains involved, and it also helps to identify the information required in a business application. The decision tree helps management determine the alternative that yields the greatest monetary gain.
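For investment decisions, a decision tree is commonly evaluated by expected monetary value (EMV). Chance nodes take the probability-weighted average of their outcomes, and decision nodes take the best alternative. The sketch below assumes this EMV approach. The alternatives, probabilities and payoffs are invented for illustration.

```python
# Node forms (an assumed representation):
#   ("payoff", amount)
#   ("chance", [(probability, child), ...])
#   ("decision", [(alternative_name, child), ...])

def emv(node):
    """Expected monetary value of a decision-tree node."""
    kind = node[0]
    if kind == "payoff":
        return node[1]
    if kind == "chance":
        return sum(p * emv(child) for p, child in node[1])
    if kind == "decision":
        return max(emv(child) for _, child in node[1])
    raise ValueError(f"unknown node type: {kind}")

def best_alternative(decision):
    """Name of the alternative with the greatest EMV at a decision node."""
    return max(decision[1], key=lambda alt: emv(alt[1]))[0]

# Hypothetical investment decision.
tree = ("decision", [
    ("expand plant", ("chance", [(0.6, ("payoff", 500_000)),
                                 (0.4, ("payoff", -200_000))])),
    ("keep current plant", ("chance", [(0.6, ("payoff", 150_000)),
                                       (0.4, ("payoff", 50_000))])),
    ("sell", ("payoff", 100_000)),
])

print(best_alternative(tree), emv(tree))  # expand plant 220000.0
```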
The decision tree is a widely used learning method. It is easy to interpret and can be represented as 'if-then-else' rules. The decision tree algorithm does not require any prior knowledge of the data distribution, and it works well on noisy data. This algorithm is useful in diverse fields, for example classifying patients by disease in a hospital or ranking loan applicants by likelihood of repayment in a bank.
A decision tree has the following advantages:
• It has reasonable training time.
• It allows fast application and is easy to interpret and deduce.
• It is easy to implement.
• It can handle a large number of features and data.
Apart from these advantages, the decision tree algorithm also has some drawbacks, which are mentioned below:
• It cannot handle complicated relationships between features.
• It has simple decision boundaries.
• It can run into problems when a lot of data is missing.

Neural Networks
Neural network simulations seem to be a recent development, and neural networks have had a significant impact on business intelligence. However, the field was established before computers came into existence: Warren McCulloch, a neurophysiologist, and Walter Pitts, a logician, invented the first artificial neuron in 1943. The technology and funding of those days did not allow them to progress much with their invention. Today the neural network field benefits from a resurgence of interest and a corresponding increase in funding.
Neural networks operate differently from conventional analysis methods and offer much more powerful and deeper data analysis, which makes them a very powerful tool of business intelligence. When implementing a neural network, the user does not have to locate interesting patterns and relations manually. The network will "mine" the multidimensional data intelligently, in a semi- or fully automatic process, and extract useful findings for business applications. A neural network is meant to learn and to apply that knowledge to improve the business process.
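The learning idea can be sketched with a single threshold neuron, in the spirit of the McCulloch-Pitts unit. Here it is trained with the perceptron learning rule, a later technique not described in the text. The network is given only examples of logical AND, never the rule itself, and adjusts its weights until it reproduces them. The data and parameter values are assumptions for illustration.

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge the weights on each misclassified example."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias

# Hypothetical training examples of logical AND; the rule is learned, not coded.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
print([predict(w, b, x) for x, _ in data])  # [0, 0, 0, 1]
```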

Kohonen's Self-Organising Maps
Kohonen's Self-Organising Maps (SOMs) have become a promising technique in data mining for cluster analysis. They are based on unsupervised learning. At the beginning, the connection weights are assigned small random numbers. The incoming input vectors, represented by the sample data, are received by the input neurons and transmitted to the output neurons through the connections. The output neuron whose weights are most similar to the input vector becomes active.
In the learning stage, weights are updated following Kohonen's rule. Only the weights of the active output neuron and its topological neighbours are updated. The neighbourhood is large at the start and slowly decreases in size over time, and the learning rate is reduced towards zero as the learning process converges.
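The competitive learning loop described above can be sketched as a one-dimensional Kohonen map in plain Python. The update moves each updated neuron's weights towards the input by a fraction of the difference. That fraction is scaled by the learning rate and a neighbourhood function. The Gaussian neighbourhood, the linear decay schedules, the map size and the two-cluster sample data are assumptions; other schedules are equally common.

```python
import math
import random

def squared_distance(w, x):
    """Squared Euclidean distance between a weight vector and an input."""
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

def best_node(weights, x):
    """Index of the output neuron whose weights are most similar to x."""
    return min(range(len(weights)), key=lambda j: squared_distance(weights[j], x))

def train_som(data, n_nodes=10, epochs=100, lr0=0.5, radius0=3.0, seed=0):
    """Train a one-dimensional Kohonen map on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Small random initial weights.
    weights = [[rng.uniform(0.0, 0.1) for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # assumed linear decay towards zero
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighbourhood shrinks over time
        for x in rng.sample(data, len(data)):      # present inputs in random order
            winner = best_node(weights, x)
            for j in range(n_nodes):
                d = abs(j - winner)  # distance between neurons on the map
                if d <= radius:
                    h = math.exp(-(d * d) / (2.0 * radius * radius))  # assumed Gaussian
                    weights[j] = [wi + lr * h * (xi - wi) for wi, xi in zip(weights[j], x)]
    return weights

# Hypothetical two-cluster data set in two dimensions.
group_a = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2)]
group_b = [(5.0, 5.1), (5.2, 4.9), (4.8, 5.0), (5.1, 5.3)]
som = train_som(group_a + group_b)
print([best_node(som, x) for x in group_a], [best_node(som, x) for x in group_b])
```

After training, records from the same cluster activate the same or neighbouring neurons, which is the behaviour described in the text.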
Once the learning process ends, similar sets of items activate the same neuron, so Self-Organising Maps divide the input set into groups of similar records. Hence SOM is referred to as a method of cluster analysis, and it is often used in vector quantisation. In data mining, clustering techniques based on Kohonen's self-organising maps have the following advantages over standard statistical methods:
• Data mining usually deals with high-dimensional data: a record in a database normally consists of a large number of items, and the data often do not follow a regular multivariate distribution. Traditional statistical methods therefore have limitations and may not be effective, whereas Self-Organising Maps work efficiently with high-dimensional data.
• Kohonen's self-organising maps provide a means of visualising multivariate data. This is possible because clusters of similar members activate output neurons that are a small distance apart in the output layer: neurons that are close together in the map topology are sensitive to similar inputs.
Data mining is human-centred. It is implemented through knowledge discovery loops coupled with human-computer interaction and visual representation. The main aim is to extract novel, plausible, relevant and interesting knowledge from the database, and SOM can be used efficiently to serve this aim. SOM is a dynamic system that learns to abstract structures in a high-dimensional input space using a low-dimensional space for representation. A well-designed SOM can be used to organise high-dimensional clusters in a low-dimensional map. These low-dimensional cluster maps can assist humans in discovering knowledge because they are easily visualised. Figure 6.4 shows self-organising data mining.

Genetic Algorithm
A genetic algorithm is a global search algorithm based on the principle of evolution; it incorporates the ideas of natural evolution. The term refers to evolutionary systems in general, but in particular to the algorithm that states how a population of organisms should be formed, evaluated and modified. Genetic algorithms are easily parallelisable. They are used for classification and optimisation problems, and also to evaluate the fitness of other algorithms in data mining.
A genetic algorithm is a predictive tool of business intelligence that gives a competitive edge in solving business problems. It can learn to solve the prediction, classification and optimisation problems common to business needs. Genetic algorithms are well suited to optimisation problems such as finding the best schedules, financial indicators, mixes, model variables, locations, parameter settings and portfolios in business applications. A genetic algorithm can be used alone to optimise a trading system, or it can complement a system built with neural networks.
A genetic algorithm is an optimisation technique. It takes a very complex problem and comes up with good solutions without needing a detailed understanding of the problem. It can be applied to a diverse set of problems and produces solutions that are much better than those found by random guessing.
In business, problems may arise that are quite complex and completely new, with no previous well-thought-out solution; in such cases a genetic algorithm can provide a good, often near-optimal, solution. Four key attributes indicate whether a business problem can benefit from a genetic algorithm. If these attributes are present in the problem, a genetic algorithm is a suitable choice; otherwise, other techniques exist and are usually preferred. The attributes a problem can possess are:
• The problem is so complex that direct solutions cannot be obtained.
• The problem is comparatively new, and no optimisation technique has been established to find a solution.
• The problem has a large number of variables, which can produce large-scale effects.
• The value of each proposed solution can be well defined.
An initial population is formed, consisting of randomly generated rules. Each rule is represented by a string of bits. For example, suppose the samples in a given training set are described by two Boolean attributes, A1 and A2, and two classes, C1 and C2. The rule "IF A1 AND NOT A2 THEN C2" can be encoded as the bit string "100", where the two leftmost bits represent attributes A1 and A2, respectively, and the rightmost bit represents the class. Likewise, the rule "IF NOT A1 AND NOT A2 THEN C1" can be encoded as "001". If an attribute possesses k values, where k > 2, then k bits may be used to encode the attribute's values. Classes can be encoded in a similar fashion.
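The encoding in the example above can be written out directly. The mapping of C1 to bit 1 and C2 to bit 0 is inferred from the two example strings. The helper names and the `low`/`medium`/`high` attribute are hypothetical.

```python
# Bit-string encoding of rules over two Boolean attributes A1, A2 and two classes.
# C1 -> "1" and C2 -> "0", as implied by the example strings "100" and "001".
CLASS_BIT = {"C1": "1", "C2": "0"}

def encode_rule(a1: bool, a2: bool, cls: str) -> str:
    """Encode 'IF [NOT] A1 AND [NOT] A2 THEN cls' as a 3-bit string."""
    return ("1" if a1 else "0") + ("1" if a2 else "0") + CLASS_BIT[cls]

def encode_value(value, domain):
    """Encode one value of an attribute with k > 2 values using k bits
    (hypothetical helper: one bit per possible value)."""
    return "".join("1" if v == value else "0" for v in domain)

print(encode_rule(True, False, "C2"))   # 100  (IF A1 AND NOT A2 THEN C2)
print(encode_rule(False, False, "C1"))  # 001  (IF NOT A1 AND NOT A2 THEN C1)
print(encode_value("medium", ["low", "medium", "high"]))  # 010
```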
Based on the notion of survival of the fittest, a new population is formed that consists of the fittest rules in the current population, along with offspring of these rules. Genetic operators such as crossover and mutation create the offspring. In crossover, substrings from pairs of rules are swapped to form new pairs of rules. In mutation, randomly selected bits in a rule string are inverted.
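A compact sketch of one generation step follows. The fittest rules survive unchanged, and single-point crossover plus bit-flip mutation produce the offspring. The following are all assumptions for illustration, not a method given in the text:

* the fitness function, which rewards each covered training sample classified correctly and penalises each one classified wrongly;
* the sample strings;
* the mutation rate;
* the elitist selection.

```python
import random

def crossover(p1: str, p2: str, point: int):
    """Single-point crossover: swap the substrings after position `point`."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(rule: str, rate: float, rng: random.Random) -> str:
    """Invert each bit independently with probability `rate`."""
    return "".join(("1" if b == "0" else "0") if rng.random() < rate else b for b in rule)

def next_generation(population, fitness, rng, keep=2, rate=0.05):
    """Keep the `keep` fittest rules unchanged (survival of the fittest) and
    fill the rest of the population with mutated offspring of those rules."""
    parents = sorted(population, key=fitness, reverse=True)[:keep]
    new_pop = list(parents)
    while len(new_pop) < len(population):
        a, b = rng.sample(parents, 2)
        point = rng.randrange(1, len(a))
        for child in crossover(a, b, point):
            if len(new_pop) < len(population):
                new_pop.append(mutate(child, rate, rng))
    return new_pop

# Hypothetical training samples, encoded as (A1, A2, class) bits.
samples = ["100", "100", "001", "111"]

def fitness(rule: str) -> int:
    """Assumed fitness: +1 per covered sample classified correctly, -1 otherwise."""
    covered = [s for s in samples if s[:2] == rule[:2]]
    return sum(1 if s[2] == rule[2] else -1 for s in covered)

rng = random.Random(42)
population = ["000", "011", "101", "110"]
start_best = max(map(fitness, population))
for _ in range(10):
    population = next_generation(population, fitness, rng)
print(max(population, key=fitness), max(map(fitness, population)))
```

Because the fittest rules are carried over unchanged, the best fitness in the population can never decrease from one generation to the next.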

Q.6 What are the methods for determining executive needs? (10 Marks)


