
Chapter 1: Advanced Database Management
G2 CS&IT

Query Processing & Optimization

QUERY PROCESSING
A database query is the vehicle for instructing a DBMS to update or retrieve specific data to/from the physically stored medium. Query processing refers to the range of activities involved in extracting data from a database. The activities include translation of queries in high-level database languages into expressions that can be used at the physical level of the file system, a variety of query-optimizing transformations, and actual evaluation of queries. The actual updating and retrieval of data is performed through various "low-level" operations. Examples of such operations for a relational DBMS can be relational algebra operations such as project, join, select, Cartesian product, etc. While the DBMS is designed to process these low-level operations efficiently, it can be quite the burden to a user to submit requests to the DBMS in these formats.

There are three phases that a query passes through during the DBMS' processing of that query:
1. Parsing and translation
2. Optimization
3. Evaluation
1. Parsing and Translation
• The first step in processing a query submitted to a DBMS is to convert the query into a form usable by the query processing engine. A query expressed in a high-level query language such as SQL must first be scanned, parsed, and validated.
   Scanner: Identifies the language tokens (such as SQL keywords, attribute names, and relation names) that appear in the text of the query.
   Parser: Checks the query syntax to determine whether it is formulated according to the syntax rules (rules of grammar) of the query language.
   Validator: Validates the query by checking that all attribute and relation names are valid and semantically meaningful names in the schema of the particular database being queried.
• The parser extracts the tokens from the raw string of characters and translates them into the corresponding internal data elements (i.e. relational algebra operations and operands) and structures. An internal representation of the query is then created, usually as a tree data structure called a query tree or a graph data structure called a query graph.
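The scanning step described above can be illustrated with a minimal sketch. The keyword set, the token categories, and the `scan` function are assumptions made for this illustration only; a real DBMS scanner recognizes the full SQL grammar.

```python
import re

# Hypothetical, deliberately tiny keyword set for illustration only.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}

# One alternative per token category: numbers, names, then operator symbols.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<number>\d+)"
    r"|(?P<name>[A-Za-z_][A-Za-z_0-9]*)"
    r"|(?P<symbol><=|>=|<>|[<>=,();*]))"
)

def scan(query):
    """Split a query string into (category, text) tokens."""
    tokens = []
    pos = 0
    query = query.rstrip()
    while pos < len(query):
        match = TOKEN_PATTERN.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        pos = match.end()
        if match.group("number"):
            tokens.append(("number", match.group("number")))
        elif match.group("name"):
            text = match.group("name")
            category = "keyword" if text.upper() in KEYWORDS else "identifier"
            tokens.append((category, text))
        else:
            tokens.append(("symbol", match.group("symbol")))
    return tokens

tokens = scan("SELECT balance FROM account WHERE balance < 2500")
```

The parser would then check that this token sequence follows the grammar, and the validator would check that `balance` and `account` are names in the schema.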
2. Optimization
• In the second stage, the query processor applies rules to the internal data structures of the query to transform these structures into equivalent, but more efficient representations. The rules can be based upon mathematical models of the relational algebra expression and tree (heuristics), upon cost estimates of different algorithms applied to operations, or upon the semantics within the query and the relations it involves. Selecting the proper rules to apply, when to apply them and how they are applied is the function of the query optimization engine.
• A query typically has many possible execution strategies, and the process of choosing a suitable one for processing a query is known as query optimization.

3. Evaluation
• The final step in processing a query is the evaluation phase. The best evaluation plan among the candidates generated by the optimization engine is selected and then executed. Note that there can exist multiple methods of executing a query.
• The code generator generates the code to execute that plan, either in compiled or interpreted mode.
• The runtime database processor has the task of running (executing) the query code, whether in compiled or interpreted mode, to produce the query result.
• Besides processing a query in a simple sequential manner, some of a query's individual operations can be processed in parallel, either as independent processes or as interdependent pipelines of processes or threads. Regardless of the method chosen, the actual results should be the same.

Translating SQL Queries into Relational Algebra
In practice, SQL is the query language that is used in most commercial RDBMSs. An SQL query is first translated into an equivalent extended relational algebra expression (represented as a query tree data structure) that is then optimized.

1. SQL queries are decomposed into query blocks. A query block is the basic unit that can be translated into the algebraic operators and optimized. It contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses, aggregate functions, etc.

2. Nested (inner) queries within a query are identified as separate query blocks.

Generating Execution Plan – Example 1
SELECT balance FROM account WHERE balance < 2500

This can be translated into either of the following relational algebra expressions:
• $\sigma_{balance<2500}(\pi_{balance}(account))$
• $\pi_{balance}(\sigma_{balance<2500}(account))$
This can also be represented as either of two corresponding query trees:
[Figure: the two query trees, one per expression above]
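A small sketch can show that the two expressions of Example 1 return the same result. The rows of `account` and the helper names `select` and `project` are made up for this illustration; only the relation name and the `balance` attribute come from the example.

```python
# Toy contents for the account relation; rows and account_no are made up.
account = [
    {"account_no": "A-1", "balance": 1200},
    {"account_no": "A-2", "balance": 3100},
    {"account_no": "A-3", "balance": 2499},
]

def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, attributes):
    """Relational projection: keep the named attributes, dropping duplicates."""
    seen = []
    for row in rows:
        reduced = {name: row[name] for name in attributes}
        if reduced not in seen:
            seen.append(reduced)
    return seen

def low_balance(row):
    return row["balance"] < 2500

plan_a = select(project(account, ["balance"]), low_balance)  # sigma over pi
plan_b = project(select(account, low_balance), ["balance"])  # pi over sigma
```

Both plans produce the same relation; which one is cheaper depends on how many rows and columns each intermediate result carries.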

Generating Execution Plan – Example 2
SELECT Lname, Fname FROM EMPLOYEE
WHERE Salary > (SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno=5);
This query retrieves the names of employees (from any department in the company) who earn a salary that is greater than the highest salary in department 5. The query includes a nested subquery and hence would be decomposed into two blocks.

The inner block is:
(SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno=5)
This retrieves the highest salary in department 5.
The outer query block is:
SELECT Lname, Fname FROM EMPLOYEE WHERE Salary > c
where c represents the result returned from the inner block.

The inner block could be translated into the following extended relational algebra expression:
$\mathcal{F}_{MAX\ Salary}(\sigma_{Dno=5}(EMPLOYEE))$
and the outer block into the expression:
$\pi_{Lname,Fname}(\sigma_{Salary>c}(EMPLOYEE))$

The query optimizer would then choose an execution plan for each query block. Notice that in the above example, the inner block needs to be evaluated only once to produce the maximum salary of employees in department 5, which is then used as the constant c by the outer block.
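The two-block evaluation of Example 2 can be sketched as follows. The EMPLOYEE rows are invented for illustration, and the function names `inner_block` and `outer_block` are assumptions; only the attribute names come from the query.

```python
# Made-up EMPLOYEE rows; only the attribute names come from the example query.
EMPLOYEE = [
    {"Fname": "Ana", "Lname": "Diaz", "Salary": 30000, "Dno": 5},
    {"Fname": "Bo", "Lname": "Chen", "Salary": 40000, "Dno": 5},
    {"Fname": "Cy", "Lname": "Eze", "Salary": 43000, "Dno": 4},
    {"Fname": "Di", "Lname": "Fox", "Salary": 25000, "Dno": 1},
]

def inner_block(employees):
    """MAX(Salary) over the employees of department 5."""
    return max(row["Salary"] for row in employees if row["Dno"] == 5)

def outer_block(employees, c):
    """Names of employees whose salary exceeds the constant c."""
    return [(row["Lname"], row["Fname"]) for row in employees if row["Salary"] > c]

c = inner_block(EMPLOYEE)          # evaluated once
result = outer_block(EMPLOYEE, c)  # reuses c for every outer row
```

Because the inner block does not refer to any attribute of the outer block, its result can be computed once and reused, instead of being recomputed for every outer row.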

Algorithms for External Sorting
• One of the primary algorithms used in query processing.
• Required when a query specifies
   o An ORDER BY clause
   o Join, union and intersection
   o Duplicate elimination algorithms for the project operation (when an SQL query specifies DISTINCT).
• External sorting refers to sorting algorithms that are suitable for large files of records stored on disk that do not fit entirely in main memory.
   o Uses a sort-merge strategy
   o Here the main database file is divided into subfiles, where each subfile is called a "run".
   o Requires buffer space in main memory, where the actual sorting and merging of the "runs" is performed. The buffer space in main memory is part of the DBMS cache, an area in the computer's main memory that is controlled by the DBMS.
• Consists of 2 phases:
   1. Sorting Phase  2. Merging Phase

• Sorting Phase
   1. "Runs" of the file that can fit in the available buffer space are read into main memory.
   2. Each run is sorted using an internal sorting algorithm.
   3. The sorted runs are written back to disk as temporary sorted runs.
• Merging Phase
   1. Sorted runs are merged during one or more merge passes.
Sorting Phase
1. The buffer space is divided into individual buffers, where each buffer is the same size in bytes as the size of one disk block.
2. Thus, one buffer can hold the contents of exactly one disk block.
3. The size of each run and the number of initial runs (nR) are dictated by the size of the file in blocks (b) and the available buffer space (nB).
Consider the following data, for illustrating the algorithm:
Number of available main memory buffers, nB      5 disk blocks
Size of the file, b                              1024 disk blocks
No. of initial runs, nR = ⌈b/nB⌉                 ⌈1024/5⌉ = 204 + 1 = 205
Every run except the last contains the same number of disk blocks: each run = 5 disk blocks and the last run = 4 disk blocks (204 × 5 + 4 = 1024).
After the sorting phase, 205 sorted runs are stored as temporary sub-files on disk.
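The run arithmetic above, together with the sorting phase itself, can be sketched in a few lines. The function names are illustrative, and the "file" is modelled as an in-memory list of blocks, which is an assumption made to keep the sketch runnable without disk I/O.

```python
import math

def number_of_runs(b, nB):
    """Initial runs nR = ceil(b / nB) for a file of b blocks and nB buffers."""
    return math.ceil(b / nB)

def run_sizes(b, nB):
    """Size in blocks of every initial run; only the last one may be shorter."""
    full_runs, remainder = divmod(b, nB)
    return [nB] * full_runs + ([remainder] if remainder else [])

def make_sorted_runs(blocks, nB):
    """Sorting phase on an in-memory stand-in for a file (a list of blocks,
    each block a list of keys): read nB blocks, sort them, emit one run."""
    runs = []
    for start in range(0, len(blocks), nB):
        chunk = blocks[start:start + nB]
        runs.append(sorted(key for block in chunk for key in block))
    return runs

sizes = run_sizes(1024, 5)
small_runs = make_sorted_runs([[5, 1], [4, 2], [3, 9]], nB=2)
```

With b = 1024 and nB = 5 this reproduces the figures in the table: 205 runs, 204 of them with 5 blocks and the last with 4.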
Merging Phase
4. During each merge step, one buffer block is needed to hold one disk block from each of the sorted subfiles being merged, and one additional buffer is needed for containing one disk block of the merge result.
5. Degree of merging (dM): the number of sorted subfiles (runs) that can be merged in each merge step.
6. dM is the smaller of (nB − 1) and nR, and the number of merge passes is $\lceil \log_{dM}(nR) \rceil$.

7. In our example where nB = 5, dM = 4 (four-way merging), so the 205 initial sorted runs would be merged 4 at a time in each step into 52 larger sorted subfiles at the end of the first merge pass. These 52 sorted files are then merged 4 at a time into 13 sorted files, which are then merged into 4 sorted files, and then finally into 1 fully sorted file, which means that four passes are needed.

8. The performance of the sort-merge algorithm (Total Cost) can be measured in the number of disk block reads and writes (between the disk and main memory) before the sorting of the whole file is completed.
   Total Cost = $(2 \cdot b) + (2 \cdot b \cdot \log_{dM}(nR))$
9. $(2 \cdot b)$ = number of block accesses for the sorting phase, since each file block is accessed twice: once for reading into a main memory buffer and once for writing the sorted records back to disk into one of the sorted subfiles.

10. $(2 \cdot b \cdot \log_{dM}(nR))$ = number of block accesses for the merging phase. During each merge pass, a number of disk blocks approximately equal to the original file blocks b is read and written, over a total of $\log_{dM}(nR)$ merge passes.

11. The worst-case performance of the algorithm is when dM = 2:
   Total Cost = $(2 \cdot b) + (2 \cdot b \cdot \log_{2}(nR))$
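The merge-pass count and the cost formula can be checked with a short sketch. The function names are illustrative, the number of passes is taken as a whole number (the ceiling of the logarithm, matching the four passes of the worked example), and `merge_runs` merges in-memory lists rather than disk files.

```python
import heapq
import math

def merge_passes(nR, nB):
    """Return (dM, number of runs remaining after each merge pass)."""
    if nB < 3:
        raise ValueError("at least 3 buffers are needed for a 2-way merge")
    dM = min(nB - 1, nR)
    counts = []
    while nR > 1:
        nR = math.ceil(nR / dM)
        counts.append(nR)
    return dM, counts

def total_cost(b, nB):
    """Block accesses: 2*b for the sorting phase plus 2*b per merge pass."""
    nR = math.ceil(b / nB)
    _, counts = merge_passes(nR, nB)
    return 2 * b + 2 * b * len(counts)

def merge_runs(runs, dM):
    """Merge sorted runs dM at a time until a single sorted run remains."""
    if dM < 2:
        raise ValueError("degree of merging must be at least 2")
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + dM])) for i in range(0, len(runs), dM)]
    return runs[0] if runs else []

dM, counts = merge_passes(205, 5)
merged = merge_runs([[1, 4], [2, 3], [0, 9]], dM=2)
```

For the example file, the sequence of run counts is 52, 13, 4, 1, so the estimated cost is 2 × 1024 block accesses for sorting plus 2 × 1024 for each of the four merge passes.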

Algorithms for SELECT and JOIN Operations
Implementing the SELECT Operation
There are many algorithms for executing a SELECT operation, which is basically a search operation to locate the records in a disk file that satisfy a certain condition. Some of the search algorithms depend on the file having specific access paths, and they may apply only to certain types of selection conditions.
Search Methods for Simple Selection: A number of search algorithms are possible for selecting records from a file. These are also known as file scans, because they scan the records of a file to search for and retrieve records that satisfy a selection condition. If the search algorithm involves the use of an index, the index search is called an index scan. The following search methods are examples of some of the search algorithms that can be used to implement a select operation:
• Linear search (brute force algorithm): Retrieve every record in the file, and test whether its attribute values satisfy the selection condition. Since the records are grouped into disk blocks, each disk block is read into a main memory buffer, and then a search through the records within the disk block is conducted in main memory.
• Binary search: If the selection condition involves an equality comparison on a key attribute on which the file is ordered, binary search, which is more efficient than linear search, can be used.
• Using a primary index: If the selection condition involves an equality comparison on a key attribute with a primary index, use the index to retrieve the record. Note that this condition retrieves a single record (at most).
• Using a primary index to retrieve multiple records: If the comparison condition is >, >=, <, or <= on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all subsequent records in the (ordered) file (or all preceding records, for < and <=).
• Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a nonkey attribute with a clustering index, use the index to retrieve all the records satisfying the condition.
• Using a secondary (B+-tree) index on an equality comparison: This search method can be used to retrieve a single record if the indexing field is a key (has unique values) or to retrieve multiple records if the indexing field is not a key. This can also be used for comparisons involving >, >=, <, or <=.
Whenever a single condition specifies the selection, the DBMS can only check whether or not an access path exists on the attribute involved in that condition. If an access path (such as an index, a hash key, or a sorted file) exists, the method corresponding to that access path is used; otherwise, the brute force, linear search can be used.
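The difference between a linear scan and a search that exploits file ordering can be sketched as follows. The ordered file is modelled as a sorted Python list of (key, payload) pairs, which is an assumption of this illustration; the function names are also illustrative.

```python
from bisect import bisect_left, bisect_right

# A file ordered on the key attribute, modelled as a sorted list of (key, payload).
ordered_file = [(3, "c"), (8, "h"), (15, "o"), (21, "u"), (34, "x")]
keys = [key for key, _ in ordered_file]

def linear_search(records, predicate):
    """Brute force: test every record against the selection condition."""
    return [record for record in records if predicate(record)]

def binary_search_equal(key):
    """Equality on the ordering key: at most one record is returned."""
    position = bisect_left(keys, key)
    if position < len(keys) and keys[position] == key:
        return ordered_file[position]
    return None

def range_greater_equal(key):
    """key >= value: locate the start point, then take all subsequent records."""
    return ordered_file[bisect_left(keys, key):]

def range_less_equal(key):
    """key <= value: take all records up to and including the match."""
    return ordered_file[:bisect_right(keys, key)]
```

The linear search works for any condition but touches every record, whereas the ordered searches only apply when the condition is on the ordering key.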
Conjunctive Selection Conditions: whenever more than one of the attributes involved in the conditions have an access path, query optimization should be done to choose the access path that retrieves the fewest records in the most efficient way.
Disjunctive Selection Conditions (where simple conditions are connected by the OR logical connective rather than by AND) are much harder to process and optimize, because the records satisfying the disjunctive condition are the union of the records satisfying the individual conditions. Hence, if any one of the conditions does not have an access path, we are compelled to use the brute force, linear search approach. Only if an access path exists on every simple condition in the disjunction can we optimize the selection by retrieving the records satisfying each condition (or their record ids) and then applying the union operation to eliminate duplicates.
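The two strategies can be sketched with record ids. The records, the dictionary-based indexes, and the function names are hypothetical stand-ins for real access paths.

```python
# Made-up records addressed by record id.
records = {
    1: {"Dno": 5, "Salary": 30000},
    2: {"Dno": 4, "Salary": 52000},
    3: {"Dno": 5, "Salary": 61000},
    4: {"Dno": 1, "Salary": 45000},
}

def build_index(attribute):
    """Hypothetical access path: attribute value -> set of record ids."""
    index = {}
    for rid, record in records.items():
        index.setdefault(record[attribute], set()).add(rid)
    return index

dno_index = build_index("Dno")
salary_index = build_index("Salary")

def disjunctive_select(dno, min_salary):
    """Dno = dno OR Salary > min_salary, using an access path for each condition."""
    by_dno = dno_index.get(dno, set())
    by_salary = {
        rid
        for value, rids in salary_index.items()
        if value > min_salary
        for rid in rids
    }
    return sorted(by_dno | by_salary)  # set union removes duplicate record ids

def conjunctive_select(dno, min_salary):
    """Dno = dno AND Salary > min_salary: use one access path, test the rest."""
    candidates = dno_index.get(dno, set())
    return sorted(rid for rid in candidates if records[rid]["Salary"] > min_salary)
```

Record 3 satisfies both simple conditions of the disjunction, and the union step is what keeps it from appearing twice.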
Implementing the JOIN Operation
The JOIN operation is one of the most time-consuming operations in query processing. Many of the join operations encountered in queries are of the EQUIJOIN and NATURAL JOIN varieties. There are many possible ways to implement a two-way join, which is a join on two files. Joins involving more than two files are called multiway joins. In this section we discuss techniques for implementing only two-way joins. The algorithms we discuss next are for a join operation of the form:

$R \bowtie_{A=B} S$

where A and B are the join attributes, which should be domain-compatible attributes of R and S respectively.

Nested-loop join (or nested-block join): This is the default (brute force) algorithm, as it does not require any special access paths on either file in the join. For each record t in R (outer loop), retrieve every record s from S (inner loop) and test whether the two records satisfy the join condition t[A] = s[B].
Single-loop join (using an access structure to retrieve the matching records): If an index exists for one of the two join attributes, say attribute B of file S, retrieve each record t in R, and then use the access structure (such as an index) to retrieve directly all matching records s from S that satisfy s[B] = t[A].
Sort-merge join: If the records of R and S are physically sorted by value of the join attributes A and B, respectively, we can implement the join in the most efficient way possible. Both files are scanned concurrently in order of the join attributes, matching the records that have the same values for A and B. If the files are not sorted, they may be sorted first by using external sorting. In this method, pairs of file blocks are copied into memory buffers in order and the records of each file are scanned only once each for matching with the other file, unless both A and B are nonkey attributes, in which case the method needs to be modified slightly. A variation of the sort-merge join can be used when secondary indexes exist on both join attributes.
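The three join methods can be compared on toy data. This is a sketch under simplifying assumptions: relations are in-memory lists of dictionaries, the "index" of the single-loop join is a dictionary built on the fly, and the sample rows are invented.

```python
def nested_loop_join(R, S, A, B):
    """For each t in R, scan all of S and keep pairs with t[A] == s[B]."""
    return [(t, s) for t in R for s in S if t[A] == s[B]]

def single_loop_join(R, S, A, B):
    """Probe an access structure on S.B (here a dict built on the fly)."""
    index = {}
    for s in S:
        index.setdefault(s[B], []).append(s)
    return [(t, s) for t in R for s in index.get(t[A], [])]

def sort_merge_join(R, S, A, B):
    """Scan both inputs in join-attribute order; handles duplicate values."""
    R = sorted(R, key=lambda t: t[A])
    S = sorted(S, key=lambda s: s[B])
    result, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][A] < S[j][B]:
            i += 1
        elif R[i][A] > S[j][B]:
            j += 1
        else:
            value = R[i][A]
            j_end = j
            while j_end < len(S) and S[j_end][B] == value:
                j_end += 1
            while i < len(R) and R[i][A] == value:
                result.extend((R[i], s) for s in S[j:j_end])
                i += 1
            j = j_end
    return result

# Made-up relations: Student.Cid joins Course.Cno.
Student = [{"Name": "Abel", "Cid": 2}, {"Name": "Sara", "Cid": 1}, {"Name": "Noah", "Cid": 2}]
Course = [{"Cno": 1, "Cname": "DB"}, {"Cno": 2, "Cname": "OS"}, {"Cno": 3, "Cname": "AI"}]
```

All three functions return the same set of matching pairs; they differ in how many comparisons (and, on disk, how many block accesses) are needed to find them.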

Algorithms for PROJECT and Set Operations

A PROJECT operation $\pi_{<attribute\ list>}(R)$ is straightforward to implement if <attribute list> includes a key of relation R, because in this case the result of the operation will have the same number of tuples as R, but with only the values for the attributes in <attribute list> in each tuple. If <attribute list> does not include a key of R, duplicate tuples must be eliminated. This can be done by sorting the result of the operation and then eliminating duplicate tuples, which appear consecutively after sorting.
Set operations: UNION, INTERSECTION, SET DIFFERENCE, and CARTESIAN PRODUCT are sometimes expensive to implement. In particular, the CARTESIAN PRODUCT operation R × S is quite expensive because its result includes a record for each combination of records from R and S. Also, each record in the result includes all attributes of R and S. If R has n records and j attributes, and S has m records and k attributes, the result relation for R × S will have n * m records and each record will have j + k attributes. Hence, it is important to avoid the CARTESIAN PRODUCT operation and to substitute other operations such as join during query optimization.
The other three set operations, UNION, INTERSECTION, and SET DIFFERENCE, apply only to type-compatible (or union-compatible) relations, which have the same number of attributes and the same attribute domains. The customary way to implement these operations is to use variations of the sort-merge technique: the two relations are sorted on the same attributes, and, after sorting, a single scan through each relation is sufficient to produce the result. For example, we can implement the UNION operation, R ∪ S, by scanning and merging both sorted files concurrently, and whenever the same tuple exists in both relations, only one is kept in the merged result. For the INTERSECTION operation, R ∩ S, we keep in the merged result only those tuples that appear in both sorted relations.
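The sort-based techniques for PROJECT, UNION, and INTERSECTION can be sketched as below. Relations are modelled as lists of tuples, and the two set-operation functions assume their inputs are already sorted and duplicate-free; all names are illustrative.

```python
def project_distinct(rows, positions):
    """PROJECT without a key: sort, then drop duplicates that are now adjacent."""
    projected = sorted(tuple(row[p] for p in positions) for row in rows)
    result = []
    for row in projected:
        if not result or result[-1] != row:
            result.append(row)
    return result

def merge_union(R, S):
    """UNION of two sorted, duplicate-free lists in one concurrent scan."""
    result, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i] < S[j]:
            result.append(R[i])
            i += 1
        elif R[i] > S[j]:
            result.append(S[j])
            j += 1
        else:  # same tuple in both relations: keep one copy
            result.append(R[i])
            i += 1
            j += 1
    return result + R[i:] + S[j:]

def merge_intersection(R, S):
    """INTERSECTION: keep only tuples found in both sorted inputs."""
    result, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i] < S[j]:
            i += 1
        elif R[i] > S[j]:
            j += 1
        else:
            result.append(R[i])
            i += 1
            j += 1
    return result

R = [(1,), (3,), (5,)]
S = [(3,), (4,), (5,), (7,)]
```

Each function reads every input tuple at most once after sorting, which is the single-scan property the text relies on.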

Using Heuristics in Query Optimization
In this section we discuss optimization techniques that apply heuristic rules to modify the internal representation of a query, which is usually in the form of a query tree or a query graph data structure, to improve its expected performance. The scanner and parser of an SQL query first generate a data structure that corresponds to an initial query representation, which is then optimized according to heuristic rules. This leads to an optimized query representation, which corresponds to the query execution strategy. Following that, a query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.

One of the main heuristic rules is to apply SELECT and PROJECT operations before applying the JOIN or other binary operations, because the size of the file resulting from a binary operation such as JOIN is usually a multiplicative function of the sizes of the input files. The SELECT and PROJECT operations reduce the size of a file and hence should be applied before a join or other binary operation.

Notation for Query Trees and Query Graphs
• Query tree: a tree data structure that corresponds to a relational algebra expression.
   o Input relations of the query are represented as leaf nodes of the tree.
   o Relational algebra operations are represented as internal nodes.
   o An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation.
   o The order of execution of operations starts at the leaf nodes, and ends at the root node.
   o The execution terminates when the root node operation is executed and produces the result relation for the query.
• Query graph: used to represent a relational calculus expression.
   o Relations in the query are represented by relation nodes, which are displayed as single circles.
   o Constant values are represented by constant nodes, which are displayed as double circles or ovals.
   o Selection and join conditions are represented by the graph edges.
   o Attributes to be retrieved from each relation are displayed in square brackets above each relation.
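The execution of a query tree, as described in the query tree notes above, can be sketched with a few node classes. The class names, the `execute` method, and the sample rows are assumptions of this illustration, not the structure used by any particular DBMS.

```python
class Relation:
    """Leaf node: an input relation (a list of dict rows)."""
    def __init__(self, rows):
        self.rows = rows

    def execute(self):
        return self.rows

class Select:
    """Internal node: keep the child rows that satisfy the predicate."""
    def __init__(self, predicate, child):
        self.predicate = predicate
        self.child = child

    def execute(self):
        return [row for row in self.child.execute() if self.predicate(row)]

class Project:
    """Internal node: keep the named attributes and drop duplicate rows."""
    def __init__(self, attributes, child):
        self.attributes = attributes
        self.child = child

    def execute(self):
        unique = []
        for row in self.child.execute():
            reduced = {name: row[name] for name in self.attributes}
            if reduced not in unique:
                unique.append(reduced)
        return unique

class Join:
    """Internal node: combine the rows of both children that satisfy the condition."""
    def __init__(self, condition, left, right):
        self.condition = condition
        self.left = left
        self.right = right

    def execute(self):
        left_rows = self.left.execute()
        right_rows = self.right.execute()
        return [{**l, **r} for l in left_rows for r in right_rows if self.condition(l, r)]

# Made-up rows for two relations.
student = [{"Name": "Abel", "Cid": 1, "Cgpa": 3.5}, {"Name": "Sara", "Cid": 2, "Cgpa": 2.8}]
course = [{"Cno": 1, "Cname": "DB"}, {"Cno": 2, "Cname": "OS"}]

tree = Project(
    ["Name", "Cname"],
    Join(
        lambda l, r: l["Cid"] == r["Cno"],
        Select(lambda row: row["Cgpa"] > 3, Relation(student)),
        Relation(course),
    ),
)
result = tree.execute()
```

Calling `execute` on the root recursively executes the children first, so the leaf relations are read before any internal node runs and the root produces the final result relation.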

Steps in converting a query tree during heuristic optimization.
1. Initial (canonical) query tree for SQL query Q.
2. Moving SELECT operations down the query tree.
3. Applying the more restrictive SELECT operation first.
4. Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.
5. Moving PROJECT operations down the query tree.
Consider the following relations for the examples provided hereafter:
[Figure: schemas of the Student, Course, and Department relations]

List the name of students, the course taken by them and the name of the department giving the course, for all students whose cgpa > 3.

SELECT Name, Cname, Dname FROM Student, Course, Department
WHERE Cid=Cno and Did=Dno and Cgpa>3;

Step 1: Relational Algebraic Expression
$\pi_{Name,Cname,Dname}(((\sigma_{Cgpa>3}(Student)) \bowtie_{Cid=Cno} (Course)) \bowtie_{Did=Dno} (Department))$

[Figures: query trees for Steps 2 to 5. Each tree is given here as the equivalent relational algebra expression.]

Step 2: Draw the canonical form
$\pi_{Name,Cname,Dname}(\sigma_{Cid=Cno\ AND\ Did=Dno\ AND\ Cgpa>3}((Student \times Course) \times Department))$

Step 3: Moving SELECT operations down the query tree
$\pi_{Name,Cname,Dname}(\sigma_{Did=Dno}(\sigma_{Cid=Cno}(\sigma_{Cgpa>3}(Student) \times Course) \times Department))$

Step 4: Replacing CARTESIAN PRODUCT and SELECT with JOIN operations
$\pi_{Name,Cname,Dname}((\sigma_{Cgpa>3}(Student) \bowtie_{Cid=Cno} Course) \bowtie_{Dno=Did} Department)$

Step 5: Moving PROJECT operations down the query tree
$\pi_{Name,Cname,Dname}(\pi_{Name,Cname,Dno}(\pi_{Name,Cid}(\sigma_{Cgpa>3}(Student)) \bowtie_{Cid=Cno} \pi_{Cname,Cno,Dno}(Course)) \bowtie_{Dno=Did} \pi_{Dname,Did}(Department))$

This particular query needs only those records from the STUDENT relation whose cgpa > 3. The above query tree shows an improved query tree that first applies the SELECT operations to reduce the number of tuples that appear in the CARTESIAN PRODUCT. We can further improve the query tree by replacing any CARTESIAN PRODUCT operation that is followed by a join condition with a JOIN operation. Another improvement is to keep only the attributes needed by subsequent operations in the intermediate relations, by including PROJECT (π) operations as early as possible in the query tree. This reduces the attributes (columns) of the intermediate relations, whereas the SELECT operations reduce the number of tuples (records).
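The effect of applying SELECT before the CARTESIAN PRODUCT can be made concrete by counting intermediate tuples. The data below is generated purely for illustration; only the attribute names follow the example, and the two plan functions are hypothetical.

```python
from itertools import product

# Made-up data; only the attribute names follow the example.
students = [
    {"Name": f"S{i}", "Cid": i % 4, "Cgpa": 2.0 + (i % 5) * 0.5}
    for i in range(20)
]
courses = [{"Cno": n, "Cname": f"C{n}"} for n in range(4)]

def canonical_plan():
    """Cartesian product first, then the whole selection condition."""
    pairs = list(product(students, courses))
    result = [(s, c) for s, c in pairs if s["Cid"] == c["Cno"] and s["Cgpa"] > 3]
    return len(pairs), result

def pushed_down_plan():
    """Apply Cgpa > 3 to Student before combining it with Course."""
    good = [s for s in students if s["Cgpa"] > 3]
    pairs = list(product(good, courses))
    result = [(s, c) for s, c in pairs if s["Cid"] == c["Cno"]]
    return len(pairs), result

size_canonical, result_canonical = canonical_plan()
size_pushed, result_pushed = pushed_down_plan()
```

Both plans return the same tuples, but the intermediate product shrinks because only 8 of the 20 generated students have a Cgpa above 3.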

As the preceding example demonstrates, a query tree can be transformed step by step into an equivalent query tree that is more efficient to execute. However, we must make sure that the transformation steps always lead to an equivalent query tree. To do this, the query optimizer must know which transformation rules preserve this equivalence. We discuss some of these transformation rules next.

General Transformation Rules for Relational Algebra Operations. There are many rules for transforming relational algebra operations into equivalent ones. For query optimization purposes, we are interested in the meaning of the operations and the resulting relations. Hence, if two relations have the same set of attributes in a different order but the two relations represent the same information, we consider the relations to be equivalent. In Section 3.1.2 we gave an alternative definition of relation that makes the order of attributes unimportant; we will use this definition here. We will state some transformation rules that are useful in query optimization, without proving them:


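Cascade of σ: A conjunctive selection condition can be broken up into a cascade (that is, a sequence) of individual σ operations:
$\sigma_{c_1\ AND\ c_2\ AND\ \ldots\ AND\ c_n}(R) \equiv \sigma_{c_1}(\sigma_{c_2}(\ldots(\sigma_{c_n}(R))\ldots))$

Commutativity of σ: The σ operation is commutative:
$\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$

Cascade of π: In a cascade (sequence) of π operations, all but the last one can be ignored:
$\pi_{List_1}(\pi_{List_2}(\ldots(\pi_{List_n}(R))\ldots)) \equiv \pi_{List_1}(R)$

Commuting σ with π: If the selection condition c involves only those attributes A1, ..., An in the projection list, the two operations can be commuted:
$\pi_{A_1,\ldots,A_n}(\sigma_c(R)) \equiv \sigma_c(\pi_{A_1,\ldots,A_n}(R))$

Commutativity of ⋈ (and ×): The join operation is commutative, as is the × operation:
$R \bowtie_c S \equiv S \bowtie_c R$
$R \times S \equiv S \times R$
Notice that although the order of attributes may not be the same in the relations resulting from the two joins (or two Cartesian products), the meaning is the same because the order of attributes is not important in the alternative definition of relation.

Commuting σ with ⋈ (or ×): If all the attributes in the selection condition c involve only the attributes of one of the relations being joined, say R, the two operations can be commuted as follows:
$\sigma_c(R \bowtie S) \equiv (\sigma_c(R)) \bowtie S$
Alternatively, if the selection condition c can be written as (c1 AND c2), where condition c1 involves only the attributes of R and condition c2 involves only the attributes of S, the operations commute as follows:
$\sigma_c(R \bowtie S) \equiv (\sigma_{c_1}(R)) \bowtie (\sigma_{c_2}(S))$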

Commuting π with ⋈ (or ×): Suppose that the projection list is L = {A1, ..., An, B1, ..., Bm}, where A1, ..., An are attributes of R and B1, ..., Bm are attributes of S. If the join condition c involves only attributes in L, the two operations can be commuted as follows:
$\pi_L(R \bowtie_c S) \equiv (\pi_{A_1,\ldots,A_n}(R)) \bowtie_c (\pi_{B_1,\ldots,B_m}(S))$
If the join condition c contains additional attributes not in L, these must be added to the projection list, and a final π operation is needed. For example, if attributes An+1, ..., An+k of R and Bm+1, ..., Bm+p of S are involved in the join condition c but are not in the projection list L, the operations commute as follows:
$\pi_L(R \bowtie_c S) \equiv \pi_L((\pi_{A_1,\ldots,A_n,A_{n+1},\ldots,A_{n+k}}(R)) \bowtie_c (\pi_{B_1,\ldots,B_m,B_{m+1},\ldots,B_{m+p}}(S)))$
For ×, there is no condition c, so the first transformation rule always applies by replacing $\bowtie_c$ with ×.

Commutativity of set operations: The set operations ∪ and ∩ are commutative but − is not.

Associativity of ⋈, ×, ∪, and ∩: These four operations are individually associative; that is, if θ stands for any one of these four operations (throughout the expression), we have:
$(R\ \theta\ S)\ \theta\ T \equiv R\ \theta\ (S\ \theta\ T)$

Commuting σ with set operations: The σ operation commutes with ∪, ∩ and −. If θ stands for any one of these three operations (throughout the expression), we have:
$\sigma_c(R\ \theta\ S) \equiv (\sigma_c(R))\ \theta\ (\sigma_c(S))$

Using Selectivity and Cost Estimates in Query Optimization
A query optimizer does not depend solely on heuristic rules; it also estimates and compares the costs of executing a query using different execution strategies and algorithms, and it then chooses the strategy with the lowest cost estimate. For this approach to work, accurate cost estimates are required so that different strategies can be compared fairly and realistically. In addition, the optimizer must limit the number of execution strategies to be considered; otherwise, too much time will be spent making cost estimates for the many possible execution strategies. Hence, this approach is more suitable for compiled queries, where the optimization is done at compile time and the resulting execution strategy code is stored and executed directly at runtime. For interpreted queries, a full-scale optimization may slow down the response time. A more elaborate optimization is indicated for compiled queries, whereas a partial, less time-consuming optimization works best for interpreted queries.
Cost Components for Query Execution


The cost of executing a query includes the following components:
1. Access cost to secondary storage: This is the cost of transferring (reading and writing) data blocks between secondary disk storage and main memory buffers. This is also known as disk I/O (input/output) cost. The cost of searching for records in a disk file depends on the type of access structures on that file, such as ordering, hashing, and primary or secondary indexes. In addition, factors such as whether the file blocks are allocated contiguously on the same disk cylinder or scattered on the disk affect the access cost.
2. Disk storage cost: This is the cost of storing on disk any intermediate files that are generated by an execution strategy for the query.
3. Computation cost: This is the cost of performing in-memory operations on the records within the data buffers during query execution. Such operations include searching for and sorting records, merging records for a join or a sort operation, and performing computations on field values. This is also known as CPU (central processing unit) cost.
4. Memory usage cost: This is the cost pertaining to the number of main memory buffers needed during query execution.
5. Communication cost: This is the cost of shipping the query and its results from the database site to the site or terminal where the query originated. In distributed databases, it would also include the cost of transferring tables and results among various computers during query evaluation.
Measures of Query Cost
The cost of query evaluation can be measured in terms of a number of different resources, including disk accesses, CPU time to execute a query, and, in a distributed or parallel database system, the cost of communication. The response time for a query evaluation plan, assuming no other activity is going on in the computer, would account for all these costs, and could be used as a good measure of the cost of the plan. In large database systems, however, disk accesses are usually the most important cost, since disk accesses are slow compared to in-memory operations. Also, CPU speeds have been improving much faster than disk speeds. Thus, it is likely that the time spent in disk activity will continue to dominate the total time to execute a query. Finally, estimating the CPU time is relatively hard compared to estimating the disk-access cost. Therefore, most people consider the disk-access cost a reasonable measure of the cost of a query-evaluation plan.
Semantic Query Optimization
Semantic query optimization uses constraints specified on the database schema (such as unique attributes and other more complex constraints) in order to modify one query into another query that is more efficient to execute. This method works in combination with the other methods, such as cost-based query optimization.

Consider a query that retrieves the names of employees who earn more than their supervisors. Suppose that we had a constraint on the database schema which stated that no employee can earn more than his or her direct supervisor. A semantic query optimizer can check for the existence of this constraint; if the constraint exists, the optimizer does not need to execute the query at all because it knows that the result of the query will be empty. This may save considerable time if the constraint checking can be done efficiently.
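The supervisor example can be sketched as a constraint check placed in front of query execution. The constraint name, the catalog represented as a set, the `stats` counter, and the EMPLOYEE rows are all hypothetical and exist only to make the idea concrete.

```python
# Made-up rows; Super_id refers to the supervisor's Id.
EMPLOYEE = [
    {"Id": 1, "Name": "Hana", "Salary": 90000, "Super_id": None},
    {"Id": 2, "Name": "Omar", "Salary": 60000, "Super_id": 1},
    {"Id": 3, "Name": "Liya", "Salary": 55000, "Super_id": 2},
]

# Hypothetical catalog of declared constraints, identified by name.
SCHEMA_CONSTRAINTS = {"salary_not_above_supervisor"}

def earn_more_than_supervisor(employees, constraints, stats):
    """Names of employees earning more than their direct supervisor."""
    if "salary_not_above_supervisor" in constraints:
        return []  # result is known to be empty, so execution is skipped
    stats["rows_scanned"] += len(employees)
    by_id = {row["Id"]: row for row in employees}
    return [
        row["Name"]
        for row in employees
        if row["Super_id"] is not None
        and row["Salary"] > by_id[row["Super_id"]]["Salary"]
    ]

stats_with = {"rows_scanned": 0}
with_constraint = earn_more_than_supervisor(EMPLOYEE, SCHEMA_CONSTRAINTS, stats_with)

stats_without = {"rows_scanned": 0}
without_constraint = earn_more_than_supervisor(EMPLOYEE, set(), stats_without)
```

When the constraint is declared, no rows are scanned at all; without it, the query has to read the whole relation to reach the same empty answer on this data.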

Review Questions
1. Discuss the reasons for converting SQL queries into relational algebra queries before optimization is done.
2. Discuss the different algorithms for implementing each of the following relational operators and the circumstances under which each algorithm can be used: SELECT, JOIN, PROJECT, UNION, INTERSECT, SET DIFFERENCE, CARTESIAN PRODUCT.
3. What is a query execution plan?
4. What is meant by the term heuristic optimization? Discuss the main heuristics that are applied during query optimization.
5. How does a query tree represent a relational algebra expression? What is meant by an execution of a query tree? Discuss the rules for transformation of query trees and identify when each rule should be applied during optimization.
6. What is meant by cost-based query optimization?
7. What is the difference between pipelining and materialization?
8. Discuss the cost components for a cost function that is used to estimate query execution cost. Which cost components are used most often as the basis for cost functions?
9. A file of 4096 blocks is to be sorted with an available buffer space of 64 blocks. How many passes will be needed in the merge phase of the external sort-merge algorithm?
10. Draw the initial query tree for each of the following queries, and then show how the query tree is optimized. (Consider the database discussed above.)
   a. SELECT NAME, CNAME FROM STUDENT, COURSE
      WHERE CID=CNO and DNO=1

   b. List the names of students who are taking courses provided by CS&IT.

   c. SELECT SNAME FROM STUDENT, COURSE, DEPARTMENT WHERE
      DNAME='CS&IT' and DNO=DID and CID=CNO and CGPA>3;
