Algorithms and Data Structures
A Global Text
± Plus-or-minus, used to define an interval [of uncertainty]
√ Square root
log Logarithm to the base 2
ln Natural logarithm, to the base e
iff If and only if
Although we may take a cavalier attitude toward notational differences, and readily use concise notations such
as ∧ ∨ for the more verbose 'and', 'or', we will try to remind readers explicitly about our assumptions when there is a
question about semantics. As an example, we assume that the boolean operators ∧ and ∨ are conditional, also called
'cand' and 'cor': An expression containing these operators is evaluated from left to right, and the evaluation stops as
soon as the result is known. In the expression x ∧ y, for example, x is evaluated first. If x evaluates to 'false', the
entire expression is 'false' without y ever being evaluated. This convention makes it possible to leave y undefined
when x is 'false'. Only if x evaluates to 'true' do we proceed to evaluate y. An analogous convention applies to x ∨ y.
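The conditional evaluation described above can be tried out in Python, whose 'and' and 'or' operators happen to follow the same left-to-right, stop-early convention. This is a minimal sketch; the function names and the table layout are assumptions made for illustration, not part of the text.

```python
def in_table(table, i, key):
    """Return True iff index i is in range and table[i] equals key."""
    # 'cand': table[i] is evaluated only when the range test is true,
    # so an out-of-range index never causes an IndexError.
    return 0 <= i < len(table) and table[i] == key


def is_missing(table, i):
    """Return True iff index i is out of range or the cell holds None."""
    # 'cor': evaluation stops as soon as the left operand is true,
    # which leaves table[i] unevaluated for out-of-range indices.
    return not (0 <= i < len(table)) or table[i] is None
```

In both functions the right operand may be undefined exactly when the left operand already decides the result, which is the situation the convention is meant to cover.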
Program structure
Whereas the concise notations introduced above to denote operators can be translated almost one-to-one into a
single line of standard Pascal, we also introduce a few extensions that may affect the program structure. In our view
these changes make programs more elegant and easier to understand. Borrowing from many modern languages, we
introduce a 'return()' statement to exit from procedures and functions and to return the value computed by a
function.
Example
function gcd(u, v: integer): integer;
{ computes the greatest common divisor (gcd) of u and v }
begin if v = 0 then return(u) else return(gcd(v, u mod v))
end;
In this example, 'return()' merely replaces the Pascal assignments 'gcd := u' and 'gcd := gcd(v, u mod v)'. The
latter in particular illustrates how 'return()' avoids a notational blemish in Pascal: On the left of the second
assignment, 'gcd' denotes a variable, on the right a function. 'Return()' also has the more drastic consequence that it
causes control to exit from the surrounding procedure or function as soon as it is executed. Without entering into a
controversy over the general advantages and disadvantages of this "flow of control" mechanism, let us present one
example, typical of many search procedures, where 'return()' greatly simplifies coding. The point is that a search
routine terminates in one of (at least) two different ways: successfully, by having found the item in question, or
unsuccessfully, because of a number of reasons (the item is not present, and some index is about to fall outside the
range of a table; we cannot insert an item because the table is full, or we cannot pop a stack because it is empty,
etc.). For the sake of efficiency as well as readability we prefer to exit from the routine as soon as a case has been
identified and dealt with, as the following example from the chapter "Address computation" illustrates:
function insert-into-hash-table(x: key): addr;
var a: addr;
begin
  a := h(x); { locate the home address of the item x to be inserted }
  while T[a] ≠ empty do begin
    { skipping over cells that are already occupied }
    if T[a] = x then return(a); { x is already present; return its address }
    a := (a + 1) mod m { keep searching at the next address }
  end;
  { we've found an empty cell; see if there is room for x to be inserted }
  if n < m - 1 then { n := n + 1; T[a] := x } else err-msg('table is full');
  return(a) { return the address where x was inserted }
end;
This code can only be appreciated by comparing it with alternatives that avoid the use of 'return()'. We
encourage readers to try their hands at this challenge. Notice the three different ways this procedure can terminate:
(1) no need to insert x because x is already in the table, (2) impossible to insert x because the table is full, and (3)
the normal case when x is inserted. Standard Pascal incorporates no facilities for "exception handling" (e.g. to
cover the first two cases that should occur only rarely) and forces all three outcomes to exit the procedure at its
textual end.
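For readers who want to run the example, here is a rough Python sketch of the same insertion routine with its three exits. The class name, the use of Python's built-in hash() in place of h, the None marker for empty cells, and raising OverflowError instead of calling err-msg are all assumptions of this sketch rather than the book's design.

```python
EMPTY = None  # marker for an unoccupied cell (assumption of this sketch)


class HashTable:
    def __init__(self, m):
        self.m = m               # number of cells in the table
        self.n = 0               # number of items currently stored
        self.T = [EMPTY] * m

    def h(self, x):
        """Home address of x; Python's hash() stands in for the book's h."""
        return hash(x) % self.m

    def insert(self, x):
        a = self.h(x)
        while self.T[a] is not EMPTY:
            # skipping over cells that are already occupied
            if self.T[a] == x:
                return a         # exit 1: x is already present
            a = (a + 1) % self.m
        # an empty cell was found; one cell is always kept free so that
        # the search loop above is guaranteed to terminate
        if self.n >= self.m - 1:
            raise OverflowError("table is full")   # exit 2
        self.n += 1
        self.T[a] = x
        return a                 # exit 3: x was inserted at address a
```

A table created with HashTable(4) accepts three distinct keys and rejects the fourth, because the last free cell is reserved as a sentinel.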
Let us just mention a few other liberties that we may take. Whereas Pascal limits results of functions to certain
simple types, we will let them be of any type: in particular, structured types, such as records and arrays. Rather
than nesting if-then-else statements in order to discriminate among more than two mutually exclusive cases, we use
the "flat" and more legible control structure:
if B₁ then S₁ elsif B₂ then S₂ elsif … else Sₙ;
Our sample programs do not return dynamically allocated storage explicitly. They rely on a memory
management system that retrieves free storage through "garbage collection". Many implementations of Pascal avoid
garbage collection and instead provide a procedure 'dispose(…)' for the programmer to explicitly return unneeded
cells. If you work with such a version of Pascal and write list-processing programs that use significant amounts of
memory, you must insert calls to 'dispose(…)' in appropriate places in your programs.
The list above is not intended to be exhaustive, and neither do we argue that the constructs we use are
necessarily superior to others commonly available. Our reason for extending the notation of Pascal (or any other
programming language we might have chosen as a starting point) is the following: in addressing human readers, we
believe an open-ended, somewhat informal notation is preferable to the straitjacket of any one programming
language. The latter becomes necessary if and when we execute a program, but during the incubation period when
our understanding slowly grows toward a firm grasp of an idea, supporting intuition is much more important than
formality. Thus we describe data structures and algorithms with the help of figures, words, and programs as we see
fit in any particular instance.
This book is licensed under a Creative Commons Attribution 3.0 License
Programming project
1. Use your graphics frame program of "Graphics primitives and environments" to implement an editor for
simple graphics productions such as those used to define snowflakes (e.g. 'any line segment gets replaced
by a specified sequence of line segments'), and an interpreter that draws successive generations of the
fractals defined by these productions.
5. Divide-and-conquer and recursion
Learning objectives:
The algorithmic principle of divide-and-conquer leads directly to recursive procedures.
Examples: Merge sort, tree traversal. Recursion and iteration.
My friend liked to claim "I'm 2/3 Cherokee." Until someone would challenge him "Two-thirds? You mean
1/2, or 1/4, or maybe 3/8, how on earth can you be 2/3 of anything?" "It's easy," said Jim, "both my parents are
2/3."
An algorithmic principle
Let A(D) denote the application of an algorithm A to a set of data D, producing a result R. An important class of
algorithms, of a type called divide-and-conquer, processes data in two distinct ways, according to whether the data
is small or large:
If the set D is small, and/or of simple structure, we invoke a simple algorithm A₀ whose application A₀(D)
yields R.
If the set D is large, and/or of complex structure, we partition it into smaller subsets D₁, … , Dₖ. For each i,
apply A(Dᵢ) to yield a result Rᵢ. Combine the results R₁, … , Rₖ to yield R.
This algorithmic principle of divide-and-conquer leads naturally to the notion of recursive procedures. The
following example outlines the concept in a high-level notation, highlighting the role of parameters and local
variables.
procedure A(D: data; var R: result);
var D₁, … , Dₖ: data; R₁, … , Rₖ: result;
begin
  if simple(D) then R := A₀(D)
  else { D₁, … , Dₖ := partition(D);
    R₁ := A(D₁); … ; Rₖ := A(Dₖ);
    R := combine(R₁, … , Rₖ) }
end;
Notice how an initial data set D spawns sets D₁, … , Dₖ which, in turn, spawn children of their own. Thus the
collection of all data sets generated by the partitioning scheme is a tree with root D. In order for the recursive
procedure A(D) to terminate in all cases, the partitioning function must meet the following condition: Each branch
of the partitioning tree, starting from the root D, eventually terminates with a data set D₀ that satisfies the predicate
'simple(D₀)', to which we can apply the simple algorithm A₀.
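The scheme can be written once as a higher-order function, with the four ingredients passed in as parameters. The sketch below is only one possible rendering; the parameter names and the small example (finding the largest element of a non-empty list) are assumptions chosen for illustration.

```python
def divide_and_conquer(data, simple, solve_simple, partition, combine):
    """Generic form of A(D): apply A0 to simple data, otherwise split and recurse."""
    if simple(data):
        return solve_simple(data)                      # R := A0(D)
    parts = partition(data)                            # D1, ..., Dk
    results = [divide_and_conquer(p, simple, solve_simple, partition, combine)
               for p in parts]                         # Ri := A(Di)
    return combine(results)                            # R := combine(R1, ..., Rk)


def halves(d):
    """Partition a list of length >= 2 into two non-empty halves."""
    mid = len(d) // 2
    return [d[:mid], d[mid:]]


# Largest element of a non-empty list: a one-element list is 'simple'.
largest = divide_and_conquer([3, 9, 2, 7],
                             simple=lambda d: len(d) == 1,
                             solve_simple=lambda d: d[0],
                             partition=halves,
                             combine=max)
```

The termination condition shows up concretely here: every branch of the partitioning tree ends in a one-element list, but an empty input would never satisfy 'simple' and the recursion would not stop.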
Divide-and-conquer reduces a problem on data set D to k instances of the same problem on new sets D₁, … , Dₖ
that are "simpler" than the original set D. Simpler often means "has fewer elements", but any measure of
"simplicity" that monotonically heads for the predicate 'simple' will do, when algorithm A₀ will finish the job. "D is
simple" may mean "D has no elements", in which case A₀ may have to do nothing at all; or it may mean "D has
exactly one element", and A₀ may just mark this element as having been visited.
The following sections show examples of divide-and-conquer algorithms. As we will see, the actual workload is
sometimes distributed unequally among different parts of the algorithm. In the sorting example, the step
'R := combine(R₁, … , Rₖ)' requires most of the work; in the "Tower of Hanoi" problem, the application of algorithm
A₀ takes the most effort.
Divide-and-conquer expressed as a diagram: merge sort
Suppose that we wish to sort a sequence of names alphabetically, as shown in Exhibit 5.1. We make use of the
divide-and-conquer strategy by partitioning a "large" sequence D into two subsequences D₁ and D₂, sorting each
subsequence, and then merging them back together into sorted order. This is our algorithm A(D). If D contains at
most one element, we do nothing at all. A₀ is the identity algorithm, A₀(D) = D.
Exhibit 5.1: Sorting the sequence {Z, A, S, D} by using a divide-and-conquer scheme
procedure sort(var D: sequence);
var D₁, D₂: sequence;
  function combine(D₁, D₂: sequence): sequence;
  begin { combine }
    merge the two sorted sequences D₁ and D₂ into a single sorted sequence D';
    return(D')
  end; { combine }
begin { sort }
  if |D| > 1 then { split D into two sequences D₁ and D₂ of equal size;
    sort(D₁); sort(D₂); D := combine(D₁, D₂) }
  { if |D| ≤ 1, D is trivially sorted, do nothing }
end; { sort }
In the chapter on "sorting and its complexity", under the section "merging and merge sorts" we turn this divide-
and-conquer scheme into a program.
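In the meantime, a compact Python version gives a feel for how the scheme becomes executable. It is a sketch that returns a new list rather than sorting in place, which differs from the 'var' parameter of the procedure above; the function names are assumptions.

```python
def merge(d1, d2):
    """Merge two sorted lists into a single sorted list (the 'combine' step)."""
    merged, i, j = [], 0, 0
    while i < len(d1) and j < len(d2):
        if d1[i] <= d2[j]:
            merged.append(d1[i])
            i += 1
        else:
            merged.append(d2[j])
            j += 1
    merged.extend(d1[i:])   # at most one of these two tails is non-empty
    merged.extend(d2[j:])
    return merged


def merge_sort(d):
    """Return a sorted copy of d using divide-and-conquer."""
    if len(d) <= 1:
        return list(d)      # A0 is the identity: nothing to do
    mid = len(d) // 2
    return merge(merge_sort(d[:mid]), merge_sort(d[mid:]))
```

Almost all of the work happens in merge, matching the remark that the combine step carries the load in the sorting example.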
Recursively defined trees
A tree, more precisely, a rooted, ordered tree, is a data type used primarily to model any type of hierarchical
organization. Its primitive parts are nodes and leaves. It has a distinguished node called the root, which, in
violation of nature, is typically drawn at the top of the page, with the tree growing downward. Each node has a
certain number of children, either leaves or nodes; leaves have no children. The exact definition of such trees can
differ slightly with respect to details and terminology. We may define a binary tree, for example, by the condition
that each node has either exactly, or at most, two children.
The pictorial grammar shown in Exhibit 5.2 captures this recursive definition of 'binary tree' and fixes the
details left unspecified by the verbal description above. It uses an alphabet of three symbols: the nonterminal 'tree
symbol', which is also the start symbol; and two terminal symbols, for 'node' and for 'leaf'.
Exhibit 5.2: The three symbols of the alphabet of a tree grammar
There are two production or rewriting rules, p₁ and p₂ (Exhibit 5.3). The derivation shown in Exhibit 5.4
illustrates the application of the production rules to generate a tree from the nonterminal start symbol.
Exhibit 5.3: Rule p₁ generates a leaf, rule p₂ generates a node and two new trees
Exhibit 5.4: One way to derive the tree at right
We may make the production rules more detailed by explicitly naming the coordinates associated with each
symbol. On a display device such as a computer screen, the x- and y-values of a point are typically Cartesian
coordinates with the origin in the upper-left corner. The x-values increase toward the bottom and the y-values
increase toward the right of the display. Let (x, y) denote the screen position associated with a particular symbol,
and let d denote the depth of a node in the tree. The root has depth 0, and the children of a node with depth d have
depth d+1. The different levels of the tree are separated by some constant distance s. The separation between
siblings is determined by a (rapidly decreasing) function t(d) which takes as argument the depth of the siblings and
depends on the drawing size of the symbols and the resolution of the screen. These more detailed productions are
shown in Exhibit 5.5.
Exhibit 5.5: Adding coordinate information to productions in order to control graphic layout
The translation of these two rules into high-level code is now plain:
procedure p₁(x, y: coordinate);
begin
  eraseTreeSymbol(x, y);
  drawLeafSymbol(x, y)
end;
procedure p₂(x, y: coordinate; d: level);
begin
  eraseTreeSymbol(x, y);
  drawNodeSymbol(x, y);
  drawTreeSymbol(x + s, y - t(d + 1));
  drawTreeSymbol(x + s, y + t(d + 1))
end;
If we choose t(d) = c · 2⁻ᵈ, these two procedures produce the display shown in Exhibit 5.6 of the tree generated
in Exhibit 5.4.
Exhibit 5.6: Sample layout obtained by halving horizontal displacement at each successive level
Technical remark about the details of defining binary trees: Our grammar forces every node to have exactly two
children: A child may be a node or a leaf. This lets us subsume two frequently occurring classes of binary trees
under one common definition.
1. 0-2 (binary) trees. We may identify leaves and nodes, making no distinction between them (replace the
squares by circles in Exhibit 5.3 and Exhibit 5.4). Every node in the new tree now has either zero or two
children, but not one. The smallest tree has a single node, the root.
2. (Arbitrary) Binary trees. Ignore the leaves (drop the squares in Exhibit 5.3 and Exhibit 5.4 and the
branches leading into a square). Every node in the new tree now has either zero, one, or two children. The
smallest tree (which consisted of a single leaf) now has no node at all; it is empty.
For clarity's sake, the following examples use the terminology of nodes and leaves introduced in the defining
grammar. In some instances we point out what happens under the interpretation that leaves are dropped.
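To see what the coordinate-carrying productions p₁ and p₂ compute, the following Python sketch returns the positions of all symbols instead of drawing them. The tree encoding (None for a leaf, a pair for a node), the default values of s and c, and the sample tree are assumptions made for illustration; the tree of Exhibit 5.4 is not reproduced here.

```python
def layout(tree, x=0.0, y=0.0, d=0, s=1.0, c=8.0):
    """List (kind, x, y) for every symbol of the tree.

    tree is None for a leaf, or a pair (left, right) for a node.
    Sibling separation follows t(d) = c * 2**(-d).
    """
    def t(depth):
        return c * 2.0 ** (-depth)

    if tree is None:
        return [("leaf", x, y)]                              # rule p1
    left, right = tree
    return ([("node", x, y)]                                 # rule p2
            + layout(left, x + s, y - t(d + 1), d + 1, s, c)
            + layout(right, x + s, y + t(d + 1), d + 1, s, c))


# A hypothetical tree: the root has a node as left child and a leaf as right child.
positions = layout(((None, None), None))
```

With c = 8 the children of the root sit 4 units to either side of it and the grandchildren 2 units to either side of their parent, so the horizontal displacement is halved at each level.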
Recursive tree traversal
Recursion is a powerful tool for programming divide-and-conquer algorithms in a straightforward manner. In
particular, when the data to be processed is defined recursively, a recursive processing algorithm that mirrors the
structure of the data is most natural. The recursive tree traversal procedure below illustrates this point.
Traversing a tree (in general: a graph, a data structure) means visiting every node and every leaf in an orderly
sequence, beginning and ending at the root. What needs to be done at each node and each leaf is of no concern to
the traversal algorithm, so we merely designate that by a call to a 'procedure visit( )'. You may think of inspecting
the contents of all nodes and/or leaves, and writing them to a file.
Recursive tree traversals use divide-and-conquer to decompose a tree into its subtrees: At each node visited
along the way, the two subtrees L and R to the left and right of this node must be traversed. There are three natural
ways to sequence the node visit and the subtree traversals:
1. node; L; R { preorder, or prefix }
2. L; node; R { inorder or infix }
3. L; R; node { postorder or suffix }
The following example translates this traversal algorithm into a recursive procedure:
procedure traverse(T: tree);
{ preorder, inorder, or postorder traversal of tree T with leaves }
begin
  if leaf(T) then visitleaf(T)
  else { T is composite }
  { visit₁(root(T));
    traverse(leftsubtree(T));
    visit₂(root(T));
    traverse(rightsubtree(T));
    visit₃(root(T)) }
end;
When leaves are ignored (i.e. a tree consisting of a single leaf is considered to be empty), the procedure body
becomes slightly simpler:
if not empty(T) then { … }
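The traversal procedure can be sketched in Python under this simpler interpretation, with None playing the role of the empty tree. The Node class, the parameter k that selects which of the three visits is active (1 for preorder, 2 for inorder, 3 for postorder), and the output list are assumptions of this sketch.

```python
class Node:
    def __init__(self, name, left=None, right=None):
        self.name = name
        self.left = left      # left subtree, or None if empty
        self.right = right    # right subtree, or None if empty


def traverse(t, k, out):
    """Append node names to out in preorder (k=1), inorder (k=2) or postorder (k=3)."""
    if t is None:             # empty tree: nothing to visit
        return
    if k == 1:
        out.append(t.name)    # visit before both subtrees
    traverse(t.left, k, out)
    if k == 2:
        out.append(t.name)    # visit between the subtrees
    traverse(t.right, k, out)
    if k == 3:
        out.append(t.name)    # visit after both subtrees


def order(t, k):
    """Convenience wrapper returning the list of visited names."""
    out = []
    traverse(t, k, out)
    return out
```

For a root 'b' with children 'a' and 'c', order(tree, 1) gives b, a, c; order(tree, 2) gives a, b, c; and order(tree, 3) gives a, c, b.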
To accomplish the k-th traversal scheme (k = 1, 2, 3), 'visitₖ' performs the desired operation on the node, while
the other two visits do nothing. If all three visits print out the name of the node, we obtain a sequence of node
names called 'triple tree traversal', shown in Exhibit 5.7 along with the three traversal orders of which it is
composed. During the traversal the nodes are visited in the following sequence:
Exhibit 5.7: Three standard orders merged into a triple tree traversal
Recursion versus iteration: the Tower of Hanoi
The "Tower of Hanoi" is a stack of n disks of different sizes, held in place by a tall peg (Exhibit 5.8). The task is to
transfer the tower from source peg S to a target peg T via an intermediate peg I, one disk at a time, without ever
placing a larger disk on a smaller one. In this case the data set D is a tower of n disks, and the divide-and-conquer
algorithm A partitions D asymmetrically into a small "tower" consisting of a single disk (the largest, at the bottom
of the pile) and another tower D' (usually larger, but conceivably empty) consisting of the n − 1 topmost disks. The
puzzle is solved recursively in three steps:
1. Transfer D' to the intermediate peg I.
2. Move the largest disk to the target peg T.
3. Transfer D' on top of the largest disk at the target peg T.
Exhibit 5.8: Initial configuration of the Tower of Hanoi.
Step 1 deserves more explanation. How do we transfer the n − 1 topmost disks from one peg to another? Notice
that they themselves constitute a tower, to which we may apply the same three-step algorithm. Thus we are
presented with successively simpler problems to solve, namely, transferring the n − 1 topmost disks from one peg to
another, for decreasing n, until finally, for n = 0, we do nothing.
procedure Hanoi(n: integer; x, y, z: peg);
{ transfer a tower with n disks from peg x, via y, to z }
begin
  if n > 0 then { Hanoi(n - 1, x, z, y); move(x, z); Hanoi(n - 1, y, x, z) }
end;
Recursion has the advantage of intuitive clarity. Elegant and efficient as this solution may be, there is some
complexity hidden in the bookkeeping implied by recursion.
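The recursive procedure carries over to Python almost word for word. In this sketch the primitive move(x, z) is replaced by recording the pair (x, z) in a list, which is an assumption made so that the sequence of moves can be inspected.

```python
def hanoi(n, x, y, z, moves):
    """Transfer a tower with n disks from peg x, via y, to z.

    Each move of a single disk is recorded in moves as a pair (from_peg, to_peg).
    """
    if n > 0:
        hanoi(n - 1, x, z, y, moves)   # step 1: n - 1 disks to the intermediate peg
        moves.append((x, z))           # step 2: largest disk to the target peg
        hanoi(n - 1, y, x, z, moves)   # step 3: n - 1 disks onto the largest disk
    return moves
```

Calling hanoi(2, "S", "I", "T", []) yields the three moves S to I, S to T, I to T; the bookkeeping mentioned above is the stack of pending calls that Python maintains behind the scenes.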
The following procedure is an equally elegant and more efficient iterative solution to this problem. It assumes
that the pegs are cyclically ordered, and the target peg where the disks will first come to rest depends on this order
and on the parity of n (Exhibit 5.9). For odd values of n, 'IterativeHanoi' moves the tower to peg I, for even values of
n, to peg T.
Exhibit 5.9: Cyclic order of the pegs.
procedure IterativeHanoi(n: integer);
var odd: boolean; { odd represents the parity of the move }
begin
  odd := true;
  repeat
    case odd of
      true: transfer smallest disk cyclically to next peg;
      false: make the only legal move leaving the smallest in place
    end;
    odd := not odd
  until entire tower is on target peg
end;
Exercise: recursive or iterative pictures?
Chapter 4 presented some beautiful examples of recursive pictures, which would be hard to program without
recursion. But for simple recursive pictures iteration is just as natural. Specify a convenient set of graphics
primitives and use them to write an iterative procedure to draw Exhibit 5.10 to a nesting depth given by a
parameter d.
Exhibit 5.10: Interleaved circles and equilateral triangles cause the radius to be exactly halved at each step.
Solution
There are many choices of suitable primitives and many ways to program these pictures. Specifying an
equilateral triangle by its center and the radius of its circumscribed circle simplifies the notation. Assume that we
may use the procedures:
procedure circle(x, y, r: real); { coordinates of center and radius }
procedure equitr(x, y, r: real); { center and radius of circumscribed circle }
procedure citr(x, y, r: real; d: integer);
var vr: real; { variable radius }
    i: integer;
begin
  vr := r;
  for i := 1 to d do { equitr(x, y, vr); vr := vr / 2; circle(x, y, vr) }
  { show that the radius of consecutively nested circles gets exactly halved at each step }
end;
The flag of Alfanumerica: an algorithmic novel on iteration and recursion
In the process of automating its flag industry, the United States of Alfanumerica announced a competition for
the most elegant program to print its flag:
All solutions submitted to the prize committee fell into one of two classes, the iterative and recursive programs.
The proponents of these two algorithm design principles could not agree on a winner, and the selection process
sparked a civil war that split the nation into two: the Iterative States of Alfanumerica (ISA) and the Recursive States
of Alfanumerica (RSA). Both nations fly the same flag but use entirely different production algorithms.
1. Write a
procedure ISA(k: integer);
to print the ISA flag, using an iterative algorithm, of course. Assume that k is a power of 2 and k ≤ (half the
line length of the printer).
2. Explain why the printer industry in RSA is much more innovative than the one in ISA. All modern RSA
printers include operations for positioning the writing head anywhere within a line, and line feed works
both forward and backward.
3. Specify the precise operations for some RSA printer of your design. Using these operations, write a
recursive
procedure RSA(k: integer);
to print the RSA flag.
4. Explain an unforeseen consequence of this drive to automate the flag industry of Alfanumerica: In both ISA
and RSA, a growing number of flags can be seen fluttering in the breeze turned around by 90°.
Exercises
1. Whereas divide-and-conquer algorithms usually attempt to divide the data in equal halves, the recursive
Tower of Hanoi procedure presented in the section "Recursion versus iteration: the Tower of Hanoi"
divides the data in a very asymmetric manner: a single disk versus n − 1 disks. Why?
2. Prove by induction on n that the iterative program 'IterativeHanoi' solves the problem in 2ⁿ − 1 iterations.
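Before attempting the proof asked for in exercise 2, the claim can be checked empirically for small n. The sketch below is one possible executable reading of 'IterativeHanoi'; it assumes the cyclic order S, I, T of the pegs, represents each peg as a list with the top disk last, and stops as soon as all n disks rest on peg I or peg T. A run of the program is no substitute for the induction argument.

```python
def iterative_hanoi(n):
    """Move a tower of n disks off peg S; return (pegs, number_of_moves)."""
    order = ["S", "I", "T"]                     # assumed cyclic order of the pegs
    pegs = {"S": list(range(n, 0, -1)), "I": [], "T": []}
    small = 0                                   # position in order of the smallest disk
    count, odd = 0, True
    while not any(len(pegs[p]) == n for p in ("I", "T")):
        if odd:
            # transfer the smallest disk cyclically to the next peg
            nxt = (small + 1) % 3
            pegs[order[nxt]].append(pegs[order[small]].pop())
            small = nxt
        else:
            # the only legal move that leaves the smallest disk in place:
            # between the two other pegs, the smaller top disk moves
            a, b = [p for i, p in enumerate(order) if i != small]
            if not pegs[a] or (pegs[b] and pegs[b][-1] < pegs[a][-1]):
                a, b = b, a
            pegs[b].append(pegs[a].pop())
        count += 1
        odd = not odd
    return pegs, count
```

For n = 3 the function reports 7 moves with the tower on peg I, and for n = 4 it reports 15 moves with the tower on peg T, in line with the parity rule stated in the section on the Tower of Hanoi.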
****************
******** ********
**** **** **** ****
** ** ** ** ** ** ** **
* * * * * * * * * * * * * * * *
k blanks followed by k stars
twice (k/2 blanks followed by k/2 stars)
11. Matrices and graphs: transitive closure
= A'ⁿ⁻¹
The efficiency of an algorithm is often measured by the number of "elementary" operations that are executed on
a given data set. The execution time of an elementary operation [e.g. the binary boolean operators (and, or) used
above] does not depend on the operands. To estimate the number of elementary operations performed in boolean
matrix multiplication as a function of the matrix size n, we concentrate on the leading terms and neglect the lesser
terms. Let us use asymptotic notation in an intuitive way; it is defined formally in Part IV.
The number of operations (and, or), executed by procedure 'mmb' when multiplying two boolean n × n matrices
is Θ(n³) since each of the nested loops is iterated n times. Hence the cost for computing A'ⁿ⁻¹ by repeatedly
multiplying with A' is Θ(n⁴). This algorithm can be improved to Θ(n³ · log n) by repeatedly squaring: A'², A'⁴, A'⁸, … ,
A'ᵏ where k is the smallest power of 2 with k ≥ n − 1. It is not necessary to compute exactly A'ⁿ⁻¹. Instead of A'¹³, for
example, it suffices to compute A'¹⁶, the next higher power of 2, which contains all paths of length at most 16. In a
graph with 14 nodes, this set is equal to the set of all paths of length at most 13.
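The repeated-squaring idea can be sketched in Python as follows. The function mmb here is a stand-in written for this example and is not the book's procedure of the same name; matrices are assumed to be lists of lists of booleans indexed from 0.

```python
def mmb(a, b):
    """Boolean matrix product: c[i][j] is the OR over k of (a[i][k] AND b[k][j])."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def closure_by_squaring(adj):
    """Connectivity matrix of a directed graph given by its boolean adjacency matrix."""
    n = len(adj)
    # A' is A with 'true' on the diagonal, so that the k-th power of A'
    # describes all paths of length at most k.
    power = [[adj[i][j] or i == j for j in range(n)] for i in range(n)]
    k = 1
    while k < n - 1:          # square until the exponent k reaches at least n - 1
        power = mmb(power, power)
        k *= 2
    return power
```

Each squaring costs on the order of n³ boolean operations and about log n squarings are performed, which is where the n³ · log n estimate comes from.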
Warshall's algorithm
In search of a faster algorithm we consider other ways of iterating over the set of all paths. Instead of iterating
over paths of growing length, we iterate over an increasing number of nodes that may be used along a path from
node i to node j. This idea leads to an elegant algorithm due to Warshall [War 62]:
Compute a sequence of matrices B₀, B₁, B₂, … , Bₙ:
B₀[i, j] = A'[i, j] = true iff i = j or i → j.
B₁[i, j] = true iff i ⇒ j using at most node 1 along the way.
B₂[i, j] = true iff i ⇒ j using at most nodes 1 and 2 along the way.
…
Bₖ[i, j] = true iff i ⇒ j using at most nodes 1, 2, … , k along the way.
The matrices B₀, B₁, … express the existence of paths that may touch an increasing number of nodes along the
way from node i to node j; thus Bₙ talks about unrestricted paths and is the connectivity matrix C = Bₙ.
An iteration step Bₖ₋₁ → Bₖ is computed by the formula
Bₖ[i, j] = Bₖ₋₁[i, j] or (Bₖ₋₁[i, k] and Bₖ₋₁[k, j]).
The cost for performing one step is Θ(n²), the cost for computing the connectivity matrix is therefore Θ(n³). A
comparison of the formula for Warshall's algorithm with the formula for matrix multiplication shows that the n-ary
'OR' has been replaced by a binary 'or'.
At first sight, the following procedure appears to execute the algorithm specified above, but a closer look reveals
that it executes something else: the assignment in the innermost loop computes new values that are used
immediately, instead of the old ones.
procedure warshall(var a: nnboolean);
var i, j, k: integer;
begin
  for k := 1 to n do
    for i := 1 to n do
      for j := 1 to n do
        a[i, j] := a[i, j] or (a[i, k] and a[k, j])
        { this assignment mixes values of the old and new matrix }
end;
A more thorough examination, however, shows that this "naively" programmed procedure computes the correct
result in-place more efficiently than would direct application of the formulas for the matrices Bₖ. We encourage you
to verify that the replacement of old values by new ones leaves intact all values needed for later steps; that is, show
that the following equalities hold:
Bₖ[i, k] = Bₖ₋₁[i, k] and Bₖ[k, j] = Bₖ₋₁[k, j].
Exercise: distances in a directed graph, Floyd's algorithm
Modify Warshall's algorithm so that it computes the shortest distance between any pair of nodes in a directed
graph where each arc is assigned a length ≥ 0. We assume that the data is given in an n × n array of reals, where d[i,
j] is the length of the arc between node i and node j. If no arc exists, then d[i, j] is set to ∞, a constant that is the
largest real number that can be represented on the given computer. Write a procedure 'dist' that works on an array
d of type
type nnreal = array[1 .. n, 1 .. n] of real;
Think of the meaning of the boolean operations 'and' and 'or' in Warshall's algorithm, and find arithmetic
operations that play an analogous role for the problem of computing distances. Explain your reasoning in words
and pictures.
Solution
The following procedure 'dist' implements Floyd's algorithm [Flo 62]. We assume that the length of a
nonexistent arc is ∞, that x + ∞ = ∞, and that min(x, ∞) = x for all x.
procedure dist(var d: nnreal);
var i, j, k: integer;
begin
  for k := 1 to n do
    for i := 1 to n do
      for j := 1 to n do
        d[i, j] := min(d[i, j], d[i, k] + d[k, j])
end;
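Placing the two procedures side by side in Python makes the analogy visible: 'or' corresponds to min and 'and' corresponds to +. The sketch assumes 0-based indices, uses math.inf for ∞ (it satisfies x + ∞ = ∞ and min(x, ∞) = x), and in the small example sets the diagonal of d to 0, which the text does not prescribe.

```python
import math


def warshall(a):
    """In-place connectivity matrix; a must already have True on its diagonal."""
    n = len(a)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                a[i][j] = a[i][j] or (a[i][k] and a[k][j])
    return a


def dist(d):
    """In-place shortest distances between all pairs; math.inf marks a missing arc."""
    n = len(d)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d


INF = math.inf
reach = warshall([[True, True, False],
                  [False, True, True],
                  [False, False, True]])
lengths = dist([[0, 3, INF],
                [INF, 0, 4],
                [INF, INF, 0]])
```

In the example node 2 becomes reachable from node 0 through node 1, and the corresponding distance is 3 + 4 = 7, while no path leads back from node 2 to node 0.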
Exercise: shortest paths
In addition to the distance d[i, j] of the preceding exercise, we wish to compute a shortest path from i to j (i.e.
one that realizes this distance). Extend the solution above and write a procedure 'shortestpath' that returns its
result in an array 'next' of type:
type nnn = array[1 .. n, 1 .. n] of 0 .. n;
next[i, j] contains the next node after i on a shortest path from i to j, or 0 if no such path exists.
Solution
procedure shortestpath(var d: nnreal; var next: nnn);
var i, j, k: integer;
begin
  for i := 1 to n do
    for j := 1 to n do
      if d[i, j] ≠ ∞ then next[i, j] := j else next[i, j] := 0;
  for k := 1 to n do
    for i := 1 to n do
      for j := 1 to n do
        if d[i, k] + d[k, j] < d[i, j] then
          { d[i, j] := d[i, k] + d[k, j]; next[i, j] := next[i, k] }
end;
It is easy to prove that next[i, j] = 0 at the end of the algorithm iff d[i, j] = ∞ (i.e. there is no path from i to j).
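A Python sketch of 'shortestpath' shows how the array 'next' is used afterwards to read off an actual path. Because Python indices start at 0, the value None (rather than 0) marks "no path" here; this change, the helper function path, and the zero diagonal in the example are assumptions of the sketch.

```python
import math


def shortest_path(d):
    """Update d in place to shortest distances and return the 'next' matrix."""
    n = len(d)
    nxt = [[j if d[i][j] != math.inf else None for j in range(n)]
           for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
                    nxt[i][j] = nxt[i][k]   # first step toward k is first step toward j
    return nxt


def path(nxt, i, j):
    """List of nodes on a shortest path from i to j, or [] if none exists."""
    if nxt[i][j] is None:
        return []
    nodes = [i]
    while i != j:
        i = nxt[i][j]
        nodes.append(i)
    return nodes
```

With arcs 0 → 1 of length 3 and 1 → 2 of length 4, path(nxt, 0, 2) returns [0, 1, 2], and path(nxt, 2, 0) returns the empty list.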
Minimum spanning tree in a graph
Consider a weighted graph G = (V, E, w), where V = {v₁, …, vₙ} is the set of vertices, E = {e₁, … , eₘ} is the set of
edges, each edge eᵢ is an unordered pair (vⱼ, vₖ) of vertices, and w: E → R assigns a real number to each edge, which
we call its weight. We consider only connected graphs G, in the sense that any pair (vⱼ, vₖ) of vertices is connected by
a sequence of edges. In the following example, the edges are labeled with their weight (Exhibit 11.2).
Exhibit 11.2: Example of a minimum spanning tree.
A tree T is a connected graph that contains no circuits: any pair (vⱼ, vₖ) of vertices in T is connected by a unique
sequence of edges. A spanning tree of a graph G is a subgraph T of G, given by its set of edges E_T ⊆ E, that is a tree
and satisfies the additional condition of being maximal, in the sense that no edge in E \ E_T can be added to T
without destroying the tree property. Observation: a connected graph G has at least one spanning tree. The weight
of a spanning tree is the sum of the weights of all its edges. A minimum spanning tree is a spanning tree of minimal
weight. In Exhibit 11.2, the bold edges form the minimal spanning tree.
Consider the following two algorithms:
Grow:
E_T := ∅; { initialize to empty set }
while T is not a spanning tree do
  E_T := E_T ∪ {a min cost edge that does not form a circuit when added to E_T}
Shrink:
E_T := E; { initialize to set of all edges }
while T is not a spanning tree do
  E_T := E_T \ {a max cost edge that leaves T connected after its removal}
Claim: The "growing algorithm" and "shrinking algorithm" determine a minimum spanning tree.
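The 'Grow' algorithm can be made concrete with a short Python sketch. Its greedy rule is commonly known as Kruskal's algorithm. The circuit test below uses a union-find structure, which is a technique brought in for this example and not something the text specifies; vertices are assumed to be numbered 0 .. n − 1 and edges are given as triples (weight, u, v).

```python
def grow(n, edges):
    """Return the edges of a minimum spanning tree of a connected graph."""
    parent = list(range(n))          # union-find forest over the vertices

    def find(v):
        """Representative of the component containing v (with path halving)."""
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(edges):    # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                 # (u, v) joins two components: no circuit
            parent[ru] = rv
            tree.append((w, u, v))
    return tree
```

Sorting the edges once replaces the repeated search for "a min cost edge that does not form a circuit": an edge rejected at some point would form a circuit at every later point as well, so it never needs to be reconsidered.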
If T is a spanning tree of G and e = (vⱼ, vₖ) ∉ E_T, we define Ckt(e, T), "the circuit formed by adding e to T" as the
set of edges in E_T that form a path from vⱼ to vₖ. In the example of Exhibit 11.2 with the spanning tree shown in bold
edges we obtain Ckt((v₄, v₅), T) = {(v₄, v₁), (v₁, v₂), (v₂, v₅)}.
Exercise
Show that for each edge e ∉ E_T there exists exactly one such circuit. Show that for any e ∉ E_T and any t ∈ Ckt(e,
T) the graph formed by (E_T \ {t}) ∪ {e} is still a spanning tree.
A local minimum spanning tree of G is a spanning tree T with the property that there exist no two edges e ∉ E_T,
t ∈ Ckt(e, T) with w(e) < w(t).
Consider the following 'exchange algorithm', which computes a local minimum spanning tree:
Exchange:
T := any spanning tree;
while there exists e ∉ E_T, t ∈ Ckt(e, T) with w(e) < w(t) do
  E_T := (E_T \ {t}) ∪ {e}; { exchange }
Theorem: A local minimum spanning tree for a graph G is a minimum spanning tree.
For the proof of this theorem we need:
Lemma: If T' and T'' are arbitrary spanning trees for G, T' ≠ T'', then there exist e'' ∉ E_T', e' ∉ E_T'', such that
e'' ∈ Ckt(e', T'') and e' ∈ Ckt(e'', T').
Proof: Since T' and T'' are spanning trees for G and T' ≠ T'', there exists e'' ∈ E_T'' \ E_T'. Assume that Ckt(e'', T') ⊆
E_T''. Then e'' and the edges in Ckt(e'', T') form a circuit in T'' that contradicts the assumption that T'' is a tree. Hence
there must be at least one e' ∈ Ckt(e'', T') \ E_T''.
Assume that for all e' ∈ Ckt(e'', T') \ E_T'' we have e'' ∉ Ckt(e', T''). Then […]
forms a circuit in T'' that contradicts the proposition that T'' is a tree. Hence there must be at least one e' ∈ Ckt(e'',
T') \ E_T'' with e'' ∈ Ckt(e', T'').
Proof of the Theorem: Assume that T' is a local minimum spanning tree. Let T'' be a minimum spanning tree.
If T' ≠ T'' the lemma implies the existence of e' ∈ Ckt(e'', T') \ E_T'' and e'' ∈ Ckt(e', T'') \ E_T'.
If w(e') < w(e''), the graph defined by the edges (E_T'' \ {e''}) ∪ {e'} is a spanning tree with lower weight than T''.
Since T'' is a minimum spanning tree, this is impossible and it follows that
w(e') ≥ w(e''). (*)
If w(e') > w(e''), the graph defined by the edges (E_T' \ {e'}) ∪ {e''} is a spanning tree with lower weight than T'.
Since T' is a local minimum spanning tree, this is impossible and it follows that
w(e') ≤ w(e''). (**)
From (*) and (**) it follows that w(e') = w(e'') must hold. The graph defined by the edges (E_T'' \ {e''}) ∪ {e'} is
still a spanning tree that has the same weight as T''. We replace T'' by this new minimum spanning tree and
continue the replacement process. Since T' and T'' have only finitely many edges the process will terminate and T''
will become equal to T'. This proves that T' is a minimum spanning tree.
The theorem implies that the tree computed by ;")change; is a minimum spanning tree.
")ercises
1. Consider ho5 to e)tend the transitive closure algorithm based on boolean matri) multiplication so that it
computes HaI distances and HbI a shortest path.
2. 2rove that the algorithms ;,ro5; and ;.hrink; compute local minimum spanning trees. Thus they are
minimum spanning trees by the theorem o% the section entitled R'inimum spanning tree in a graphS.
12. Integers
Learning objectives:
integers and their operations
Euclidean algorithm
Sieve of Eratosthenes
large integers
modular arithmetic
Chinese remainder theorem
random numbers and their generators
Operations on integers
Five basic operations account for the lion's share of integer arithmetic:
+  −  ·  div  mod
The product 'x · y', the quotient 'x div y', and the remainder 'x mod y' are related through the following div-mod identity:
(1) $(x \operatorname{div} y) \cdot y + (x \bmod y) = x$ for $y \ne 0$.
Many programming languages provide these five operations, but unfortunately, 'mod' tends to behave differently not only between different languages but also between different implementations of the same language. How come we have not learned in school what the remainder of a division is?
The div-mod identity, a cornerstone of number theory, defines 'mod' assuming that all the other operations are defined. It is mostly used in the context of nonnegative integers $x \ge 0$, $y > 0$, where everything is clear, in particular the convention $0 \le x \bmod y < y$. One half of the domain of integers consists of negative numbers, and there are good reasons for extending all five basic operations to the domain of all integers (with the possible exception of y = 0), such as:
Any operation with an undefined result hinders the portability and testing of programs: if the "forbidden" operation does get executed by mistake, the computation may get into nonrepeatable states. Example: from a practical point of view it is better not to leave 'x div 0' undefined, as is customary in mathematics, but to define the result as '= overflow', a feature typically supported in hardware.
Some algorithms that we usually consider in the context of nonnegative integers have natural extensions into the domain of all integers (see the following sections on 'gcd' and modular number representations).
Unfortunately, the attempt to extend 'mod' to the domain of integers runs into the problem mentioned above: How should we define 'div' and 'mod'? Let's follow the standard mathematical approach of listing desirable properties these operations might possess. In addition to the "sacred" div-mod identity (1) we consider:
(2) Symmetry of div: $(-x) \operatorname{div} y = x \operatorname{div} (-y) = -(x \operatorname{div} y)$.
The most plausible way to extend 'div' to negative numbers.
(3) A constraint on the possible values assumed by 'x mod y', which, for y > 0, reduces to the convention of nonnegative remainders:
$0 \le x \bmod y < y$.
This is important because a standard use of 'mod' is to partition the set of integers into y residue classes. We consider a weak and a strict requirement:
(3') Number of residue classes = |y|: for given y and varying x, 'x mod y' assumes exactly |y| distinct values.
(3'') In addition, we ask for nonnegative remainders: $0 \le x \bmod y < |y|$.
Pondering the consequences of these desiderata, we soon realize that 'div' cannot be extended to negative arguments by means of symmetry. Even the relatively innocuous case of positive denominator y > 0 makes it impossible to preserve both (2) and (3''), as the following failed attempt shows:
$((-3) \operatorname{div} 2) \cdot 2 + ((-3) \bmod 2) \stackrel{?}{=} -3$    Preserving (1)
$(-(3 \operatorname{div} 2)) \cdot 2 + 1 \stackrel{?}{=} -3$    and using (2) and (3'')
$(-1) \cdot 2 + 1 \ne -3$    … fails!
Even the weak condition (3'), which we consider essential, is incompatible with (2). For y = −2, it follows from (1) and (2) that there are three residue classes modulo (−2): x mod (−2) yields the values 1, 0, −1; for example,
1 mod (−2) = 1, 0 mod (−2) = 0, (−1) mod (−2) = −1.
This does not go with the fact that 'x mod 2' assumes only the two values 0, 1. Since a reasonable partition into residue classes is more important than the superficially appealing symmetry of 'div', we have to admit that (2) was just wishful thinking.
Without giving any reasons, [Knu 73a] (see the chapter "Reducing a task to given primitives; programming motion") defines 'mod' by means of the div-mod identity (1) as follows:
$x \bmod y = x - y \cdot \lfloor x / y \rfloor$, if $y \ne 0$; $x \bmod 0 = x$.
Thus he implicitly defines $x \operatorname{div} y = \lfloor x / y \rfloor$, where $\lfloor z \rfloor$, the "floor" of z, denotes the largest integer $\le z$; the "ceiling" $\lceil z \rceil$ denotes the smallest integer $\ge z$. Knuth extends the domain of 'mod' even further by defining "x mod 0 = x". With the exception of this special case y = 0, Knuth's definition satisfies (3'): Number of residue classes = |y|. The definition does not satisfy (3''), but a slightly more complicated condition. For given $y \ne 0$, we have $0 \le x \bmod y < y$, if y > 0; and $0 \ge x \bmod y > y$, if y < 0. Knuth's definition of 'div' and 'mod' has the added advantage that it holds for real numbers as well, where 'mod' is a useful operation for expressing the periodic behavior of functions [e.g. $\tan x = \tan(x \bmod \pi)$].
Exercise: another definition of 'div' and 'mod'
1. Show that the definition
[formula missing in source]
in conjunction with the div-mod identity (1) meets the strict requirement (3'').
Solution
[table missing in source]
Exercise
Fill out comparable tables of values for Knuth's definition of 'div' and 'mod'.
Solution
[table missing in source]
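The competing definitions of 'div' and 'mod' discussed above can be compared in a short Python sketch. It relies on the fact that Python's built-in `//` and `%` follow the floor-based definition attributed to Knuth (for y ≠ 0); the function names `floor_divmod`, `trunc_divmod`, and `euclid_divmod` are made up for this illustration and are not taken from the text.

```python
def floor_divmod(x, y):
    """Floor-based definition: x div y = floor(x / y)."""
    return x // y, x % y


def trunc_divmod(x, y):
    """Symmetric 'div' as in property (2): the quotient rounds toward zero."""
    q = abs(x) // abs(y)
    if (x < 0) != (y < 0):
        q = -q
    return q, x - q * y        # remainder forced by the div-mod identity (1)


def euclid_divmod(x, y):
    """Nonnegative remainder 0 <= r < |y|, as in the strict requirement (3'')."""
    r = x % abs(y)
    return (x - r) // y, r     # x - r is an exact multiple of y


for x in (-3, 3):
    for y in (-2, 2):
        print(x, y, floor_divmod(x, y), trunc_divmod(x, y), euclid_divmod(x, y))
```

All three satisfy identity (1); they differ only in which remainders they allow. With the symmetric 'div', x mod (−2) takes the three values 1, 0, −1, which is the incompatibility with (3') described in the text.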
The Euclidean algorithm
A famous algorithm for computing the greatest common divisor (gcd) of two natural numbers appears in Book 7 of Euclid's Elements (ca. 300 BC). It is based on the identity $\gcd(u, v) = \gcd(u - v, v)$, which can be used for u > v to reduce the size of the arguments, until the smaller one becomes 0.
We use these properties of the greatest common divisor of two integers u and v > 0:
$\gcd(u, 0) = u$    By convention this also holds for u = 0.
$\gcd(u, v) = \gcd(v, u)$    Permutation of arguments, important for the termination of the following procedure.
$\gcd(u, v) = \gcd(v, u - q \cdot v)$    For any integer q.
The formulas above translate directly into a recursive procedure:
function gcd(u, v: integer): integer;
begin
  if v = 0 then return(u) else return(gcd(v, u mod v))
end;
A test for the relative size of u and v is unnecessary. If initially u < v, the first recursive call permutes the two arguments, and thereafter the first argument is always larger than the second.
This simple and concise solution has a relatively high implementation cost. A stack, introduced to manage the recursive procedure calls, consumes space and time. In addition to the operations visible in the code (test for equality, assignment, and 'mod'), hidden stack maintenance operations are executed. There is an equally concise iterative version that requires a bit more thinking and writing, but is significantly more efficient:
function gcd(u, v: integer): integer;
var r: integer;
begin
  while v ≠ 0 do { r := u mod v; u := v; v := r };
  return(u)
end;
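For readers who want to run the two versions, here is a direct Python transcription. It is a sketch for nonnegative arguments; the names `gcd_recursive` and `gcd_iterative` are chosen here only to keep the two variants apart.

```python
def gcd_recursive(u, v):
    """gcd(u, 0) = u; otherwise gcd(u, v) = gcd(v, u mod v)."""
    return u if v == 0 else gcd_recursive(v, u % v)


def gcd_iterative(u, v):
    """Same computation without the recursion stack."""
    while v != 0:
        u, v = v, u % v
    return u


print(gcd_recursive(12, 18), gcd_iterative(18, 12))
```

As in the text, no test for the relative size of the arguments is needed: if u < v, the first step simply swaps them.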
The prime number sieve of Eratosthenes
The oldest and best-known algorithm of type sieve is named after Eratosthenes (ca. 200 BC). A set of elements is to be separated into two classes, the "good" ones and the "bad" ones. As is often the case in life, bad elements are easier to find than good ones. A sieve process successively eliminates elements that have been recognized as bad; each element eliminated helps in identifying further bad elements. Those elements that survive the epidemic must be good.
Sieve algorithms are often applicable when there is a striking asymmetry in the complexity or length of the proofs of the two assertions "p is a good element" and "p is a bad element". This theme occurs prominently in the complexity theory of problems that appear to admit only algorithms whose time requirement grows faster than polynomially in the size of the input (NP completeness). Let us illustrate this asymmetry in the case of prime numbers, for which Eratosthenes' sieve is designed. In this analogy, "prime" is "good" and "nonprime" is "bad".
A prime is a positive integer greater than 1 that is divisible only by 1 and itself. Thus primes are defined in terms of their lack of an easily verified property: a prime has no factors other than the two trivial ones. To prove that 1 675 307 419 is not prime, it suffices to exhibit a pair of factors:
1 675 307 419 = 1 234 567 · 1 357.
This verification can be done by hand. The proof that $2^{17} - 1$ is prime, on the other hand, is much more elaborate. In general (without knowledge of any special property this particular number might have) one has to verify, for each and every number that qualifies as a candidate factor, that it is not a factor. This is obviously more time consuming than a mere multiplication.
Exhibiting factors through multiplication is an example of what is sometimes called a "one-way" or "trapdoor" function: the function is easy to evaluate (just one multiplication), but its inverse is hard. In this context, the inverse of multiplication is not division, but rather factorization. Much of modern cryptography relies on the difficulty of factorization.
The prime number sieve of Eratosthenes works as follows. We mark the smallest prime, 2, and erase all of its multiples within the desired range 1 .. n. The smallest remaining number must be prime; we mark it and erase its multiples. We repeat this process for all numbers up to $\sqrt{n}$: If an integer c < n can be factored, c = a · b, then at least one of the factors is $< \sqrt{n}$.
{ sieve of Eratosthenes marks all the primes in 2 .. n }
const n = … ;
var sieve: packed array [2 .. n] of boolean;
  p, sqrtn, i: integer;
…
begin
  for i := 2 to n do sieve[i] := true; { initialize the sieve }
  sqrtn := trunc(sqrt(n));
  { it suffices to consider as divisors the numbers up to √n }
  p := 2;
  while p ≤ sqrtn do begin
    i := p · p;
    while i ≤ n do { sieve[i] := false; i := i + p };
    repeat p := p + 1 until sieve[p];
  end;
end;
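A runnable counterpart of the sieve, written as a Python sketch. The function name `primes_up_to` is an assumption of this example; `math.isqrt` needs Python 3.8 or later.

```python
import math


def primes_up_to(n):
    """Return all primes in 2 .. n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    # Divisors larger than sqrt(n) need not be considered.
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            # Multiples below p * p were already erased by smaller primes.
            for i in range(p * p, n + 1, p):
                sieve[i] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


print(primes_up_to(30))
```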
Large integers
The range of numbers that can be represented directly in hardware is typically limited by the word length of the computer. For example, many small computers have a word length of 16 bits and thus limit integers to the range $-2^{15} \le a < +2^{15} = 32768$. When the built-in number system is insufficient, a variety of software techniques are used to extend its range. They differ greatly with respect to their properties and intended applications, but all of them come at an additional cost in memory and, above all, in the time required for performing arithmetic operations. Let us mention the most common techniques.
Double-length or double-precision integers. Two words are used to hold an integer that squares the available range as compared to integers stored in one word. For a 16-bit computer we get 32-bit integers, for a 32-bit computer we get 64-bit integers. Operations on double-precision integers are typically slower by a factor of 2 to 4.
Variable precision integers. The idea above is extended to allocate as many words as necessary to hold a given integer. This technique is used when the size of intermediate results that arise during the course of a computation is unpredictable. It calls for list processing techniques to manage memory. The time of an operation depends on the size of its arguments: linearly for addition, mostly quadratically for multiplication.
Packed BCD integers. This is a compromise between double precision and variable precision that comes from commercial data processing. The programmer defines the maximal size of every integer variable used, typically by giving the maximal number of decimal digits that may be needed to express it. The compiler allocates an array of bytes to this variable that contains the following information: maximal length, current length, sign, and the digits. The latter are stored in BCD (binary-coded decimal) representation: a decimal digit is coded in 4 bits, two of them are packed into a byte. Packed BCD integers are expensive in space because most of the time there is unused allocated space; and even more so in time, due to digit-by-digit arithmetic. They are unsuitable for lengthy scientific/technical computations, but OK for I/O-intensive data processing applications.
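The cost claims for variable precision integers (linear for addition, quadratic for multiplication) can be seen in a small Python sketch. It stores a nonnegative integer as a little-endian list of 16-bit words; the base, the list representation, and all function names are assumptions made for this illustration only, and Python's own unbounded integers are used merely to convert to and from this form.

```python
BASE = 1 << 16   # one 16-bit word per list element (assumed for this sketch)


def to_words(n):
    """Split a nonnegative integer into little-endian base-2**16 words."""
    words = []
    while n:
        words.append(n % BASE)
        n //= BASE
    return words or [0]


def from_words(words):
    return sum(d * BASE**i for i, d in enumerate(words))


def add(a, b):
    """One pass over the words: time linear in the length."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        result.append(s % BASE)
        carry = s // BASE
    if carry:
        result.append(carry)
    return result


def multiply(a, b):
    """Schoolbook method: len(a) * len(b) word products, hence quadratic."""
    result = [0] * (len(a) + len(b))
    for i, x in enumerate(a):
        carry = 0
        for j, y in enumerate(b):
            t = result[i + j] + x * y + carry
            result[i + j] = t % BASE
            carry = t // BASE
        result[i + len(b)] += carry
    while len(result) > 1 and result[-1] == 0:
        result.pop()              # drop leading zero words
    return result
```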
Modular number systems: the poor man's large integers
Modular arithmetic is a special-purpose technique with a narrow range of applications, but is extremely efficient where it applies, typically in combinatorial and number-theoretic problems. It handles addition, and particularly multiplication, with unequaled efficiency, but lacks equally efficient algorithms for division and comparison. Certain combinatorial problems that require high precision can be solved without divisions and with few comparisons; for these, modular numbers are unbeatable.
Chinese Remainder Theorem: Let $m_1, m_2, \ldots, m_k$ be pairwise relatively prime positive integers, called moduli. Let $m = m_1 \cdot m_2 \cdot \ldots \cdot m_k$ be their product. Given k nonnegative integers $r_1, r_2, \ldots, r_k$, called residues, with $0 \le r_i < m_i$ for $1 \le i \le k$, there exists exactly one integer r, $0 \le r < m$, such that $r \bmod m_i = r_i$ for $1 \le i \le k$.
The Chinese remainder theorem is used to represent integers in the range $0 \le r < m$ uniquely as k-tuples of their residues modulo $m_i$. We denote this number representation by
$r \sim [r_1, r_2, \ldots, r_k]$.
The practicality of modular number systems is based on the following fact: The arithmetic operations (+, −, ·) on integers r in the range $0 \le r < m$ are represented by the same operations, applied componentwise to k-tuples $[r_1, r_2, \ldots, r_k]$. A modular number system replaces a single +, −, or · in a large range by k operations of the same type in small ranges.
If $r \sim [r_1, r_2, \ldots, r_k]$, $s \sim [s_1, s_2, \ldots, s_k]$, $t \sim [t_1, t_2, \ldots, t_k]$, then:
$(r + s) \bmod m = t \iff (r_i + s_i) \bmod m_i = t_i$ for $1 \le i \le k$,
$(r - s) \bmod m = t \iff (r_i - s_i) \bmod m_i = t_i$ for $1 \le i \le k$,
$(r \cdot s) \bmod m = t \iff (r_i \cdot s_i) \bmod m_i = t_i$ for $1 \le i \le k$.
Example
$m_1 = 2$ and $m_2 = 5$, hence $m = m_1 \cdot m_2 = 2 \cdot 5 = 10$. In the following table the numbers r in the range 0 .. 9 are represented as pairs modulo 2 and modulo 5.
r        0  1  2  3  4  5  6  7  8  9
r mod 2  0  1  0  1  0  1  0  1  0  1
r mod 5  0  1  2  3  4  0  1  2  3  4
Let r = 2 and s = 3, hence $r \cdot s = 6$. In modular representation: $r \sim [0, 2]$, $s \sim [1, 3]$, hence $r \cdot s \sim [0, 1]$.
A useful modular number system is formed by the moduli
$m_1 = 99$, $m_2 = 100$, $m_3 = 101$, hence $m = m_1 \cdot m_2 \cdot m_3 = 999900$.
Nearly a million integers in the range $0 \le r < 999900$ can be represented. The conversion of a decimal number to its modular form is easily computed by hand by adding and subtracting pairs of digits as follows:
r mod 99: Add pairs of digits, and take the resulting sum mod 99.
r mod 100: Take the least significant pair of digits.
r mod 101: Alternatingly add and subtract pairs of digits, and take the result mod 101.
The largest integer produced by operations on components is $100^2 \approx 2^{13}$; it is smaller than $2^{15} = 32768 \approx 32\mathrm{k}$ and thus causes no overflow on a computer with 16-bit arithmetic.
Example
r = 123456
r mod 99 = (56 + 34 + 12) mod 99 = 3
r mod 100 = 56
r mod 101 = (56 − 34 + 12) mod 101 = 34
$r \sim [3, 56, 34]$
s = 654321
s mod 99 = (21 + 43 + 65) mod 99 = 30
s mod 100 = 21
s mod 101 = (21 − 43 + 65) mod 101 = 43
$s \sim [30, 21, 43]$
$r + s \sim [3, 56, 34] + [30, 21, 43] = [33, 77, 77]$
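The example can be replayed in Python. The sketch below uses the moduli 99, 100, 101 from the text; the function names and the reconstruction routine `from_residues` (a standard constructive form of the Chinese remainder theorem) are additions for illustration, not part of the original. The three-argument `pow` with exponent −1 computes a modular inverse and needs Python 3.8 or later.

```python
MODULI = (99, 100, 101)   # pairwise relatively prime, product 999900


def to_residues(r):
    """Represent r by its residues modulo 99, 100, 101."""
    return tuple(r % m for m in MODULI)


def add(a, b):
    """Componentwise addition in the small ranges."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def mul(a, b):
    """Componentwise multiplication in the small ranges."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_residues(residues):
    """Recover the unique r in 0 .. m-1 with the given residues."""
    m = 1
    for mi in MODULI:
        m *= mi
    r = 0
    for ri, mi in zip(residues, MODULI):
        others = m // mi                       # product of the other moduli
        r += ri * others * pow(others, -1, mi)
    return r % m


r, s = to_residues(123456), to_residues(654321)
print(r, s, add(r, s), from_residues(add(r, s)))
```

Converting back with `from_residues` is the expensive direction, which matches the shortcomings listed in the text.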
Modular arithmetic has some shortcomings: division, comparison, overflow detection, and conversion to decimal notation trigger intricate computations.
Exercise: Fibonacci numbers and modular arithmetic
The sequence of Fibonacci numbers
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, …
is defined by
$x_0 = 0$, $x_1 = 1$, $x_n = x_{n-1} + x_{n-2}$ for $n \ge 2$.
Write (a) a recursive function (b) an iterative function that computes the n-th element of this sequence. Using modular arithmetic, compute Fibonacci numbers up to $10^8$ on a computer with 16-bit integer arithmetic, where the largest integer is $2^{15} - 1 = 32767$.
(c) Using moduli $m_1 = 999$, $m_2 = 1000$, $m_3 = 1001$, what is the range of the integers that can be represented uniquely by their residues $[r_1, r_2, r_3]$ with respect to these moduli?
(d) Describe in words and formulas how to compute the triple $[r_1, r_2, r_3]$ that uniquely represents a number r in this range.
(e) Modify the function in (b) to compute Fibonacci numbers in modular arithmetic with the moduli 999, 1000, and 1001. Use the declaration
type triple = array [1 .. 3] of integer;
and write the procedure
procedure modfib(n: integer; var r: triple);
Solution
(a) function fib(n: integer): integer;
begin
  if n ≤ 1 then return(n) else return(fib(n − 1) + fib(n − 2))
end;
(b) function fib(n: integer): integer;
var p, q, r, i: integer;
begin
  if n ≤ 1 then return(n)
  else begin
    p := 0; q := 1;
    for i := 2 to n do { r := p + q; p := q; q := r };
    return(r)
  end
end;
(c) The range is 0 .. m − 1 with $m = m_1 \cdot m_2 \cdot m_3 = 999\,999\,000$.
(d) $r = d_1 \cdot 1\,000\,000 + d_2 \cdot 1000 + d_3$ with $0 \le d_1, d_2, d_3 \le 999$
$1\,000\,000 = 999\,999 + 1 = 1001 \cdot 999 + 1$
$1000 = 999 + 1 = 1001 - 1$
$r_1 = r \bmod 999 = (d_1 + d_2 + d_3) \bmod 999$
$r_2 = r \bmod 1000 = d_3$
$r_3 = r \bmod 1001 = (d_1 - d_2 + d_3) \bmod 1001$
(e) procedure modfib(n: integer; var r: triple);
var p, q: triple;
  i, j: integer;
begin
  if n ≤ 1 then
    for j := 1 to 3 do r[j] := n
  else begin
    for j := 1 to 3 do { p[j] := 0; q[j] := 1 };
    for i := 2 to n do begin
      for j := 1 to 3 do r[j] := (p[j] + q[j]) mod (998 + j);
      p := q; q := r
    end
  end
end;
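The solution to (e) can be checked with a small Python sketch that mirrors 'modfib' and compares it against ordinary Fibonacci numbers. Returning a tuple instead of filling a var parameter, and the helper `fib` used for comparison, are choices made for this illustration.

```python
MODULI = (999, 1000, 1001)


def modfib(n):
    """Residues of the n-th Fibonacci number modulo 999, 1000, 1001."""
    if n <= 1:
        return (n, n, n)
    p, q = (0, 0, 0), (1, 1, 1)
    for _ in range(2, n + 1):
        # Every intermediate sum stays below 2002, far below 2**15 - 1.
        p, q = q, tuple((a + b) % m for a, b, m in zip(p, q, MODULI))
    return q


def fib(n):
    """Plain iterative Fibonacci, used only to check modfib."""
    p, q = 0, 1
    for _ in range(n):
        p, q = q, p + q
    return p


print(modfib(39), fib(39))
```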
Random numbers
The colloquial meaning of the term at random often implies "unpredictable". But random numbers are used in scientific/technical computing in situations where unpredictability is neither required nor desirable. What is needed in simulation, in sampling, and in the generation of test data is not unpredictability but certain statistical properties. A random number generator is a program that generates a sequence of numbers that passes a number of specified statistical tests. Additional requirements include: it runs fast and uses little memory; it is portable to computers that use a different arithmetic; the sequence of random numbers generated can be reproduced (so that a test run can be repeated under the same conditions).
In practice, random numbers are generated by simple formulas. The most widely used class, linear congruential generators, given by the formula
$r_{i+1} = (a \cdot r_i + c) \bmod m$
are characterized by three integer constants: the multiplier a, the increment c, and the modulus m. The sequence is initialized with a seed $r_0$.
All these constants must be chosen carefully. Consider, as a bad example, a formula designed to generate random days in the month of February:
$r_0 = 0$, $r_{i+1} = (2 \cdot r_i + 1) \bmod 28$.
It generates the sequence 0, 1, 3, 7, 15, 3, 7, 15, 3, … . Since $0 \le r_i < m$, each generator of the form above generates a sequence with a prefix of length < m which is followed by a period of length ≤ m. In the example, the prefix 0, 1 of length 2 is followed by a period 3, 7, 15 of length 3. Usually we want a long period. Results from number theory assert that a period of length m is obtained if the following conditions are met:
m is chosen as a prime number.
(a − 1) is a multiple of m.
m does not divide c.
Example
$r_0 = 0$, $r_{i+1} = (8 \cdot r_i + 1) \bmod 7$
generates a sequence: 0, 1, 2, 3, 4, 5, 6, 0, … with a period of length 7.
Shall we accept this as a sequence of random integers, and if not, why not? Should we prefer the sequence 4, 1, 6, 2, 3, 0, 5, 4, … ?
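Both example generators are easy to reproduce. The Python sketch below implements the linear congruential formula and measures the prefix and period of a sequence; the names `lcg` and `prefix_and_period` are assumptions of this illustration.

```python
from itertools import islice


def lcg(a, c, m, seed):
    """Yield r0, r1, r2, ... with r_{i+1} = (a * r_i + c) mod m."""
    r = seed
    while True:
        yield r
        r = (a * r + c) % m


def prefix_and_period(a, c, m, seed):
    """Return (prefix length, period length) of the generated sequence."""
    first_seen = {}
    for i, r in enumerate(lcg(a, c, m, seed)):
        if r in first_seen:               # first repeated value closes the period
            return first_seen[r], i - first_seen[r]
        first_seen[r] = i


print(list(islice(lcg(2, 1, 28, 0), 9)))   # the "February" generator
print(prefix_and_period(2, 1, 28, 0))
print(prefix_and_period(8, 1, 7, 0))
```

A full period alone says nothing about quality: the second generator simply counts 0, 1, 2, … modulo 7.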
For each application of random numbers, the programmer/analyst has to identify the important statistical properties required. Under normal circumstances these include:
No periodicity over the length of the sequence actually used. Example: to generate a sequence of 100 random weekdays {Su, Mo, … , Sat}, do not pick a generator with modulus 7, which can generate a period of length at most 7; pick one with a period much longer than 100.
A desired distribution, most often the uniform distribution. If the range 0 .. m − 1 is partitioned into k equally sized intervals $I_1, I_2, \ldots, I_k$, the numbers generated should be uniformly distributed among these intervals; this must be the case not only at the end of the period (this is trivially so for a generator with maximal period m), but for any initial part of the sequence.
Many well-known statistical tests are used to check the quality of random number generators. The run test (the lengths of monotonically increasing and monotonically decreasing subsequences must occur with the right frequencies); the gap test (given a test interval called the "gap", how many consecutively generated numbers fall outside?); the permutation test (partition the sequence into subsequences of t elements; there are t! possible relative orderings of elements within a subsequence; each of these orderings should occur about equally often).
Exercise: visualization of random numbers
Write a program that lets its user enter the constants a, c, m, and the seed $r_0$ for a linear congruential generator, then displays the numbers generated as dots on the screen: A pair of consecutive random numbers is interpreted as the (x, y)-coordinates of the dot. You will observe that most generators you enter have obvious flaws: our visual system is an excellent detector of regular patterns, and most regularities correspond to undesirable statistical properties.
The point made above is substantiated in [PM 88].
The following simple random number generator and some of its properties are easily memorized:
$r_0 = 1$, $r_{i+1} = 125 \cdot r_i \bmod 8192$.
1. $8192 = 2^{13}$, hence the remainder mod 8192 is represented by the 13 least significant bits.
2. 125 = 127 − 2 = (1111101) in binary representation.
3. Arithmetic can be done with 16-bit integers without overflow and without regard to the representation of negative numbers.
4. The numbers $r_k$ generated are exactly those in the range $0 \le r_k < 8192$ with $r_k \bmod 4 = 1$ (i.e. the period has length $2^{11} = 2048$).
5. Its statistical properties are described in [Kru 69], [Knu 81] contains the most comprehensive treatment of the theory of random number generators.
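Properties 1, 2, and 4 can be verified mechanically. The Python sketch below collects one full period of the generator; the function name `orbit` is made up for this illustration. The last check reads property 3 as "keeping only the low 16 bits of each product does not change the result", which is an interpretation, not a statement from the text.

```python
def orbit(a=125, m=8192, seed=1):
    """Collect r0, r1, ... for r_{i+1} = a * r_i mod m until the seed recurs."""
    values, r = [], seed
    while True:
        values.append(r)
        r = (a * r) % m
        if r == seed:
            return values


values = orbit()
print(len(values))                                   # period length
print(set(values) == {r for r in range(8192) if r % 4 == 1})
print(bin(125))
# Low 16 bits of the product suffice, because 8192 divides 65536.
print(all(((125 * r) & 0xFFFF) % 8192 == (125 * r) % 8192 for r in values))
```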
As a conclusion of this brief introduction, remember an important rule of thumb:
Never choose a random number generator at random.
Exercises
1. Work out the details of implementing double-precision, variable-precision, and BCD integer arithmetic, and estimate the time required for each operation as compared to the time of the same operation in single precision. For variable precision and BCD, introduce the length L of the representation as a parameter.
2. The least common multiple (lcm) of two integers u and v is the smallest integer that is a multiple of u and v. Design an algorithm to compute lcm(u, v).
3. The prime decomposition of a natural number n > 0 is the (unique) multiset $PD(n) = [p_1, p_2, \ldots, p_k]$ of primes $p_i$ whose product is n. A multiset differs from a set in that elements may occur repeatedly (e.g. PD(12) = [2, 2, 3]). Design an algorithm to compute PD(n) for a given n > 0.
4. Work out the details of modular arithmetic with moduli 9, 10, 11.
5. Among the 95 linear congruential random number generators given by the formula $r_{i+1} = a \cdot r_i \bmod m$, with prime modulus m = 97 and 1 < a < 97, find out how many get disqualified "at first sight" by a simple visual test. Consider that the period of these RNGs is at most 97.
13. Reals
Learning objectives:
floating-point numbers and their properties
pitfalls of numeric computation
Horner's method
bisection
Newton's method
Floating-point numbers
Real numbers, those declared to be of type REAL in a programming language, are represented as floating-point numbers on most computers. A floating-point number z is represented by a (signed) mantissa m and a (signed) exponent e with respect to a base b: $z = \pm m \cdot b^{\pm e}$ (e.g. $z = +0.11 \cdot 2^{-1}$). This section presents a very brief introduction to floating-point arithmetic. We recommend [Gol91] as a comprehensive survey.
Floating-point numbers can only approximate real numbers, and in important ways, they behave differently. The major difference is due to the fact that any floating-point number system is a finite number system, as the mantissa m and the exponent e lie in a bounded range. Consider, as a simple example, the following number system:
$z = \pm 0.b_1b_2 \cdot 2^{\pm e}$, where $b_1$, $b_2$, and e may take the values 0 and 1.
The number representation is not unique: The same real number may have many different representations, arranged in the following table by numerical value (lines) and constant exponent (columns).
value    exponent +1       exponent ±0       exponent −1
1.5      +0.11 · 2^{+1}
1.0      +0.10 · 2^{+1}
0.75                       +0.11 · 2^{±0}
0.5      +0.01 · 2^{+1}    +0.10 · 2^{±0}
0.375                                        +0.11 · 2^{−1}
0.25                       +0.01 · 2^{±0}    +0.10 · 2^{−1}
0.125                                        +0.01 · 2^{−1}
0.       +0.00 · 2^{+1}    +0.00 · 2^{±0}    +0.00 · 2^{−1}
The table is symmetric for negative numbers. Notice the cluster of representable numbers around zero. There are only 15 different numbers, but $2^5 = 32$ different representations.
Exercise: a floating-point number system
Consider floating-point numbers represented in a 6-bit "word" as follows: The four bits $b\, b_2 b_1 b_0$ represent a signed mantissa, the two bits $e\, e_0$ a signed exponent to the base 2. Every number has the form $x = b\, b_2 b_1 b_0 \cdot 2^{e\, e_0}$. Both the exponent and the mantissa are integers represented in 2's complement form. This means that the integer values −2 .. 1 are assigned to the four different representations $e\, e_0$ as shown:
 v    e  e0
 0    0  0
 1    0  1
−2    1  0
−1    1  1
1. Complete the following table of the values of the mantissa and their representation, and write down a formula to compute v from $b\, b_2 b_1 b_0$.
 v    b  b2 b1 b0
 0    0  0  0  0
 1    0  0  0  1
 …
 7    0  1  1  1
−8    1  0  0  0
 …
−1    1  1  1  1
2. How many different number representations are there in this floating-point system?
3. How many different numbers are there in this system? Draw all of them on an axis, each number with all its representations.
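The counts quoted for the two-bit toy system $z = \pm 0.b_1b_2 \cdot 2^{\pm e}$ introduced at the start of this section (15 different numbers, 32 representations) can be confirmed by brute-force enumeration. The following Python sketch uses exact fractions; the function name `toy_floats` and the tuple layout are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product


def toy_floats():
    """Map each value of ±0.b1b2 · 2^(±e) to the list of its representations."""
    table = {}
    for sign, b1, b2, esign, e in product((+1, -1), (0, 1), (0, 1), (+1, -1), (0, 1)):
        mantissa = Fraction(2 * b1 + b2, 4)              # 0.b1b2 read in binary
        value = sign * mantissa * Fraction(2) ** (esign * e)
        table.setdefault(value, []).append((sign, b1, b2, esign, e))
    return table


table = toy_floats()
print(len(table))                                # different numbers
print(sum(len(reps) for reps in table.values())) # different representations
print(sorted(float(v) for v in table if v >= 0))
```

The same enumeration idea carries over to the 6-bit system of the exercise.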
On a byte-oriented machine, floating-point numbers are often represented by 4 bytes = 32 bits: 24 bits for the signed mantissa, 8 bits for the signed exponent. The mantissa m is often interpreted as a fraction $0 \le m < 1$, whose precision is bounded by 23 bits; the 8-bit exponent permits scaling within the range $2^{-128} \le 2^e \le 2^{127}$. Because 32- and 64-bit floating-point number systems are so common, often coexisting on the same hardware, these number systems are often identified with "single precision" and "double precision", respectively. In recent years an IEEE standard format for single-precision floating-point numbers has emerged, along with standards for higher precisions: double, single extended, and double extended.
The following example shows the representation of the number
$+1.011110 \ldots 0 \cdot 2^{-54}$
in the IEEE format:
[figure missing in source]
Some dangers
Floating-point computation is fraught with problems that are hard to analyze and control. Unexpected results abound, as the following examples show. The first two use a binary floating-point number system with a signed 2-bit mantissa and a signed 1-bit exponent. Representable numbers lie in the range
$-0.11 \cdot 2^{+1} \le z \le +0.11 \cdot 2^{+1}$.
Example: y + x = y and x ≠ 0
It suffices to choose |x| small as compared to |y|; for example,
$x = 0.01 \cdot 2^{-1}$, $y = 0.10 \cdot 2^{+1}$.
The addition forces the mantissa of x to be shifted to the right until the exponents are equal (i.e. x is represented as $0.0001 \cdot 2^{+1}$). Even if the sum is computed correctly as $0.1001 \cdot 2^{+1}$ in an accumulator of double length, storing the result in memory will force rounding: $x + y = 0.10 \cdot 2^{+1} = y$.
Example: Addition is not associative: (x + y) + z ≠ x + (y + z)
The following values for x, y, and z assign different values to the left and right sides.
Left side: $(0.10 \cdot 2^{+1} + 0.10 \cdot 2^{-1}) + 0.10 \cdot 2^{-1} = 0.10 \cdot 2^{+1}$
Right side: $0.10 \cdot 2^{+1} + (0.10 \cdot 2^{-1} + 0.10 \cdot 2^{-1}) = 0.11 \cdot 2^{+1}$
A useful rule of thumb helps prevent the loss of significant digits: Add the small numbers before adding the large ones.
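Both effects also show up in ordinary hardware arithmetic. The Python sketch below assumes that Python's `float` is an IEEE 754 double with a 53-bit significand, which holds on mainstream platforms but is an assumption of this example; the values are chosen for that format and are not the toy values from the text.

```python
big = 2.0 ** 53      # above this magnitude, doubles are spaced 2 apart
small = 1.0

# y + x = y although x is not zero: the sum rounds back to big.
print(big + small == big)

# Addition is not associative.
left = (big + small) + small     # each small addend is lost separately
right = big + (small + small)    # small numbers are added first
print(left, right, left == right)
```

The second grouping follows the rule of thumb: adding the small numbers first preserves their contribution.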
Example: $((x + y)^2 - x^2 - 2xy) / y^2 = 1$?
Let's evaluate this expression for large |x| and small |y| in a floating-point number system with five decimal digits.
x = 100.00, y = .01000
x + y = 100.01
$(x + y)^2 = 10002.0001$, rounded to five digits yields 10002.
$x^2 = 10000.$
$(x + y)^2 - x^2 = 2.????$ (four digits have been lost!)
$2xy = 2.0000$
$(x + y)^2 - x^2 - 2xy = 2.???? - 2.0000 = 0.?????$
Now five digits have been lost, and the result is meaningless.
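This computation can be replayed with Python's `decimal` module, using a context limited to five significant digits. Treating "five decimal digits" as five significant digits with round-half-even is an assumption of this sketch.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

ctx = Context(prec=5, rounding=ROUND_HALF_EVEN)   # five significant digits

x = Decimal("100.00")
y = Decimal("0.01000")

s = ctx.add(x, y)                         # 100.01
s2 = ctx.multiply(s, s)                   # 10002.0001 is rounded to 10002
x2 = ctx.multiply(x, x)                   # 10000
two_xy = ctx.multiply(ctx.multiply(Decimal(2), x), y)    # 2.0000
numerator = ctx.subtract(ctx.subtract(s2, x2), two_xy)   # all digits cancel
y2 = ctx.multiply(y, y)                   # 0.0001
quotient = ctx.divide(numerator, y2)

print(s2, numerator, quotient)
```

Mathematically the quotient is 1; in five-digit arithmetic the information about $y^2$ is discarded when $(x + y)^2$ is rounded, so nothing is left after the subtractions.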
Example: numerical instability
Recurrence relations for sequences of numbers are prone to the phenomenon of numerical instability. Consider the sequence
$x_0 = 1.0$, $x_1 = 0.5$, $x_{n+1} = 2.5 \cdot x_n - x_{n-1}$.
We first solve this linear recurrence relation in closed form by trying $x_i = r^i$ for $r \ne 0$. This leads to $r^{n+1} = 2.5 \cdot r^n - r^{n-1}$, and to the quadratic equation $0 = r^2 - 2.5 \cdot r + 1$, with the two solutions r = 2 and r = 0.5.
The general solution of the recurrence relation is a linear combination:
$x_i = a \cdot 2^i + b \cdot 2^{-i}$.
The starting values $x_0 = 1.0$ and $x_1 = 0.5$ determine the coefficients a = 0 and b = 1, and thus the sequence is given exactly as $x_i = 2^{-i}$. If the sequence $x_i = 2^{-i}$ is computed by the recurrence relation above in a floating-point number system with one decimal digit, the following may happen:
$x_2 = 2.5 \cdot 0.5 - 1 = 0.2$ (rounding the exact value 0.25),
$x_3 = 2.5 \cdot 0.2 - 0.5 = 0$ (represented exactly with one decimal digit),
$x_4 = 2.5 \cdot 0 - 0.2 = -0.2$ (represented exactly with one decimal digit),
$x_5 = 2.5 \cdot (-0.2) - 0 = -0.5$ (represented exactly with one decimal digit),
$x_6 = 2.5 \cdot (-0.5) - (-0.2) = -1.05$ (exact) $= -1.0$ (rounded),
$x_7 = 2.5 \cdot (-1) - (-0.5) = -2.0$ (represented exactly with one decimal digit),
$x_8 = 2.5 \cdot (-2) - (-1) = -4.0$ (represented exactly with one decimal digit).
As soon as the first rounding error has occurred, the computed sequence changes to the alternative solution $x_i = a \cdot 2^i$, as can be seen from the doubling of consecutive computed values.
Exercise: floating-point number systems and calculations
(a) Consider a floating-point number system with two ternary digits $t_1$, $t_2$ in the mantissa, and a ternary digit e in the exponent to the base 3. Every number in this system has the form $x = .t_1t_2 \cdot 3^e$, where $t_1$, $t_2$, and e assume a value chosen among {0, 1, 2}. Draw a diagram that shows all the different numbers in this system, and for each number, all of its representations. How many representations are there? How many different numbers?
(b) Recall the series
$\frac{1}{1 - x} = 1 + x + x^2 + x^3 + \cdots$
which holds for |x| < 1, for example,
[formula missing in source]
Use this formula to express 1/0.7 as a series of powers.
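The numerical-instability example above, the recurrence $x_{n+1} = 2.5 \cdot x_n - x_{n-1}$ computed with one decimal digit, can be reproduced with Python's `decimal` module. The sketch assumes that "one decimal digit" means one digit after the decimal point and that ties are rounded to the even digit; under these assumptions it yields the values listed in the text. The function names are made up for this illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def one_digit(value):
    """Round to one digit after the decimal point (ties to even)."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_EVEN)


def rounded_sequence(count):
    """First `count` terms of x_{n+1} = 2.5 * x_n - x_{n-1}, rounded at each step."""
    xs = [Decimal("1.0"), Decimal("0.5")]
    while len(xs) < count:
        exact = Decimal("2.5") * xs[-1] - xs[-2]
        xs.append(one_digit(exact))
    return xs


print([str(x) for x in rounded_sequence(9)])
print([2.0 ** -i for i in range(9)])      # the exact solution, for comparison
```

After the single rounding of 0.25 to 0.2, the growing solution $a \cdot 2^i$ takes over and the computed values double in magnitude from step to step.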
Horner's method
A polynomial of n-th degree (e.g. n = 3) is usually represented in the form
$a_3 \cdot x^3 + a_2 \cdot x^2 + a_1 \cdot x + a_0$
but is better evaluated in nested form,
$((a_3 \cdot x + a_2) \cdot x + a_1) \cdot x + a_0$.
This book is licensed under a Creative Commons Attribution 3.0 License
The first formula needs n multiplications of the form $a_i \cdot x^i$ and, in addition, n - 1 multiplications to compute the powers of x. The second formula needs only n multiplications in total: The powers of x are obtained for free as a side effect of the coefficient multiplications.
The following procedure assumes that the (n + 1) coefficients $a_i$ are stored in a sufficiently large array a of type 'coeff':
type coeff = array[0 .. m] of real;
function horner(var a: coeff; n: integer; x: real): real;
var i: integer; h: real;
begin
  h := a[n];
  for i := n - 1 downto 0 do h := h · x + a[i];
  return(h)
end;
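For readers who want to run the nested evaluation, here is a minimal Python sketch of the same idea. The list layout (`a[i]` holds the coefficient of x to the power i) follows the procedure above; the function name and the use of a Python list are assumptions made for illustration.

```python
def horner(a, x):
    """Evaluate a[0] + a[1]*x + ... + a[n]*x**n using n multiplications."""
    h = a[-1]                             # start with the leading coefficient a[n]
    for coefficient in reversed(a[:-1]):  # a[n-1] down to a[0]
        h = h * x + coefficient
    return h


if __name__ == "__main__":
    # 3*x^2 + 2*x + 1 evaluated at x = 2
    print(horner([1, 2, 3], 2))
```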
Bisection
Bisection is an iterative method for solving equations of the form f(x) = 0. Assuming that the function f: R → R is continuous in the interval [a, b] and that f(a) · f(b) < 0, a root of the equation f(x) = 0 (a zero of f) must lie in the interval [a, b] (Exhibit 13.1). Let m be the midpoint of this interval. If f(m) = 0, m is a root. If f(m) · f(a) < 0, a root must be contained in [a, m], and we proceed with this subinterval; if f(m) · f(b) < 0, we proceed with [m, b]. Thus at each iteration the interval of uncertainty that must contain a root is half the size of the interval produced in the previous iteration. We iterate until the interval is smaller than the tolerance within which the root must be determined.
Exhibit 13.1: As in binary search, bisection excludes half of the interval under consideration at every step.
function bisect(function f: real; a, b: real): real;
const epsilon = 1.0E-6;
var m: real; faneg: boolean;
begin
  faneg := f(a) < 0.0;
  repeat
    m := (a + b) / 2.0;
    if (f(m) < 0.0) = faneg then a := m else b := m
  until |a - b| < epsilon;
  return(m)
end;
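The procedure can be tried out with the following Python sketch. It mirrors the sign test of the procedure above. The explicit check of the sign condition, the `epsilon` default and the returned midpoint are choices made here for illustration, not part of the original.

```python
def bisect(f, a, b, epsilon=1e-6):
    """Approximate a root of f in [a, b], assuming f is continuous and f(a)*f(b) < 0."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    fa_negative = f(a) < 0.0
    while abs(a - b) >= epsilon:
        m = (a + b) / 2.0
        if (f(m) < 0.0) == fa_negative:
            a = m          # f(m) has the same sign as f(a): the root lies in [m, b]
        else:
            b = m          # otherwise the root lies in [a, m]
    return (a + b) / 2.0


if __name__ == "__main__":
    print(bisect(lambda x: x * x - 2.0, 0.0, 2.0))   # close to the square root of 2
```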
A sequence $x_1, x_2, x_3, \ldots$ converging to x converges linearly if there exist a constant c < 1 and an index $i_0$ such that for all $i > i_0$: $|x_{i+1} - x| \le c \cdot |x_i - x|$. An algorithm is said to converge linearly if the sequence of approximations constructed by this algorithm converges linearly. In a linearly convergent algorithm each iteration adds a constant number of significant bits. For example, each iteration of bisection halves the interval of uncertainty (i.e. adds one bit of precision to the result). Thus bisection converges linearly with c = 0.5. A sequence $x_1, x_2, x_3, \ldots$ converges quadratically if there exist a constant c and an index $i_0$ such that for all $i > i_0$: $|x_{i+1} - x| \le c \cdot |x_i - x|^2$.
Newton's method for computing the square root
Newton's method for solving equations of the form f(x) = 0 is an example of an algorithm with quadratic convergence. Let f: R → R be a continuous and differentiable function. An approximation $x_{i+1}$ is obtained from $x_i$ by approximating f(x) in the neighborhood of $x_i$ by its tangent at the point $(x_i, f(x_i))$, and computing the intersection of this tangent with the x-axis (Exhibit 13.2). Hence
$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$.
Exhibit 13.2: Newton's iteration approximates a curve locally by a tangent.
Newton's method is not guaranteed to converge (Exercise: construct counterexamples), but when it converges, it does so quadratically and therefore very fast, since each iteration doubles the number of significant bits.
To compute the square root $x = \sqrt{a}$ of a real number a > 0 we consider the function $f(x) = x^2 - a$ and solve the equation $x^2 - a = 0$. With $f'(x) = 2 \cdot x$ we obtain the iteration formula:
$x_{i+1} = x_i - \frac{x_i^2 - a}{2 \cdot x_i} = \frac{1}{2} \left( x_i + \frac{a}{x_i} \right)$.
The formula that relates $x_i$ and $x_{i+1}$ can be transformed into an analogous formula that determines the propagation of the relative error:
$R_i = \frac{x_i - x}{x}$.
Since
$x_{i+1} = \frac{1}{2} \left( x_i + \frac{x^2}{x_i} \right)$
we obtain for the relative error:
$1 + R_{i+1} = \frac{x_{i+1}}{x} = \frac{1}{2} \left( \frac{x_i}{x} + \frac{x}{x_i} \right)$.
Using
$\frac{x_i}{x} = 1 + R_i$ and $\frac{x}{x_i} = \frac{1}{1 + R_i}$
we get a recurrence relation for the relative error:
$R_{i+1} = \frac{R_i^2}{2 \cdot (1 + R_i)}$.
If we start with $x_0 > 0$, it follows that $1 + R_0 > 0$. Hence we obtain
$R_1 > R_2 > R_3 > \cdots > 0$.
As soon as $R_i$ becomes small (i.e. $R_i \ll 1$), we have $1 + R_i \approx 1$, and we obtain
$R_{i+1} \approx 0.5 \cdot R_i^2$.
Newton's method converges quadratically as soon as $x_i$ is close enough to the true solution. With a bad initial guess $R_i \gg 1$ we have, on the other hand, $1 + R_i \approx R_i$, and we obtain $R_{i+1} \approx 0.5 \cdot R_i$ (i.e. the computation appears to converge linearly until $R_i \ll 1$ and proper quadratic convergence starts).
Thus it is highly desirable to start with a good initial approximation $x_0$ and get quadratic convergence right from the beginning. We assume normalized binary floating-point numbers (i.e. $a = m \cdot 2^e$ with $0.5 \le m < 1$). A good approximation of $\sqrt{a}$ is obtained by choosing any mantissa c with $0.5 \le c < 1$ and halving the exponent:
$x_0 = c \cdot 2^{e/2}$ if e is even, $x_0 = c \cdot 2^{(e+1)/2}$ if e is odd.
In order to construct this initial approximation $x_0$, the programmer needs read and write access not only to a "real number" but also to its components, the mantissa and exponent, for example, by procedures such as
procedure mantissa(z: real): integer;
procedure exponent(z: real): integer;
procedure buildreal(mant, exp: integer): real;
Today's programming languages often lack such facilities, and the programmer is forced to use backdoor tricks to construct a good initial approximation. If $x_0$ can be constructed by halving the exponent, we obtain the following upper bounds for the relative error:
$R_1 < 2^{-2}$, $R_2 < 2^{-5}$, $R_3 < 2^{-11}$, $R_4 < 2^{-23}$, $R_5 < 2^{-47}$, $R_6 < 2^{-95}$.
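Some languages do expose the mantissa and exponent. As a hedged illustration, the Python sketch below uses the standard-library functions `math.frexp` and `math.ldexp` in the role of the access procedures described above. The function name `newton_sqrt`, the default mantissa c = 0.75 and the fixed iteration count are assumptions made for this example; for odd exponents the halved exponent is rounded up.

```python
import math


def newton_sqrt(a, iterations=6, c=0.75):
    """Approximate sqrt(a) for a > 0 by Newton's iteration x <- (x + a/x) / 2."""
    if a <= 0.0:
        raise ValueError("a must be positive")
    _, e = math.frexp(a)               # a = m * 2**e with 0.5 <= m < 1
    x = math.ldexp(c, (e + 1) // 2)    # initial guess c * 2**(e/2), exponent rounded up if e is odd
    for _ in range(iterations):
        x = 0.5 * (x + a / x)
    return x


if __name__ == "__main__":
    for value in (2.0, 0.5, 1e-10, 12345.678):
        print(value, newton_sqrt(value), math.sqrt(value))
```

With this starting value the first iterate already has a relative error below one quarter, so a handful of iterations reaches the precision of a 64-bit float.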
It is remarkable that four iterations suffice to compute an exact square root for 32-bit floating-point numbers, where 23 bits are used for the mantissa, one bit for the sign and eight bits for the exponent, and that six iterations will do for a "number cruncher" with a word length of 64 bits. The starting value $x_0$ can be further optimized by choosing c carefully. It can be shown that the optimal value of c for computing the square root of a real number is $c = 1/\sqrt{2} \approx 0.707$.
Exercise: square root
Consider a floating-point number system with two decimal digits in the mantissa: Every number has the form $x = \pm .d_1 d_2 \cdot 10^{\pm e}$.
(a) How many different number representations are there in this system?
(b) How many different numbers are there in this system? Show your reasoning.
(c) Compute $\sqrt{.50 \cdot 10^2}$ in this number system using Newton's method with a starting value $x_0 = 10$. Show every step of the calculation. Round the result of any operation to two digits immediately.
Solution
(a) A number representation contains two sign bits and three decimal digits, hence there are $2^2 \cdot 10^3 = 4000$ distinct number representations in this system.
(b) There are three sources of redundancy:
1. Multiple representations of zero
2. Exponent +0 equals exponent -0
3. Shifted mantissa: $\pm .d0 \cdot 10^{\pm e} = \pm .0d \cdot 10^{\pm e + 1}$
A detailed count reveals that there are 3439 different numbers.
Zero has $2^2 \cdot 10 = 40$ representations, all of the form $\pm .00 \cdot 10^{\pm e}$, with two sign bits and one decimal digit e to be freely chosen. Therefore, $r_1 = 39$ must be subtracted from 4000.
If e = 0, then $\pm .d_1 d_2 \cdot 10^{+0} = \pm .d_1 d_2 \cdot 10^{-0}$. We assume furthermore that $d_1 d_2 \ne 00$. The case $d_1 d_2 = 00$ has been covered above. Then there are 2 · 99 such pairs. Therefore, $r_2 = 198$ must be subtracted from 4000.
If $d_2 = 0$, then $\pm .d_1 0 \cdot 10^{\pm e} = \pm .0 d_1 \cdot 10^{\pm e + 1}$. The case $d_1 = 0$ has been treated above. Therefore, we assume that $d_1 \ne 0$. Since $\pm e$ can assume the 18 different values -9, -8, ..., -1, +0, +1, ..., +8, there are 2 · 9 · 18 such pairs. Therefore, $r_3 = 324$ must be subtracted from 4000.
There are $4000 - r_1 - r_2 - r_3 = 3439$ different numbers in this system.
(c) Computing Newton's square root algorithm:
$x_0 = 10$
$x_1 = .50 \cdot (10 + 50/10) = .50 \cdot (10 + 5) = .50 \cdot 15 = 7.5$
$x_2 = .50 \cdot (7.5 + 50/7.5) = .50 \cdot (7.5 + 6.7) = .50 \cdot 14 = 7$
$x_3 = .50 \cdot (7 + 50/7) = .50 \cdot (7 + 7.1) = .50 \cdot 14 = 7$
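The counts in parts (a) and (b) can be cross-checked by brute force. The following Python sketch enumerates every representation, with a sign, two mantissa digits, an exponent sign and one exponent digit, and collects the exact values as fractions. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def count_two_digit_system():
    """Return (number of representations, number of distinct values)."""
    representations = 0
    values = set()
    for sign, d1, d2, exp_sign, e in product((1, -1), range(10), range(10), (1, -1), range(10)):
        representations += 1
        mantissa = Fraction(10 * d1 + d2, 100)                         # the mantissa .d1d2
        values.add(sign * mantissa * Fraction(10) ** (exp_sign * e))   # exact rational value
    return representations, len(values)


if __name__ == "__main__":
    print(count_two_digit_system())
```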
Exercises
1. Write up all the distinct numbers in the floating-point system with number representations of the form $z = 0.b_1 b_2 \cdot 2^{e_1 e_2}$, where $b_1$, $b_2$ and $e_1$, $e_2$ may take the values 0 and 1, and mantissa and exponent are represented in 2's complement notation.
2. Provide simple numerical examples to illustrate floating-point arithmetic violations of mathematical identities.
14. Straight lines and circles
Learning objectives:
intersection of two line segments
degenerate configurations
clipping
digitized lines and circles
Bresenham's algorithms
braiding straight lines
Points are the simplest geometric objects; straight lines and line segments come next. Together, they make up the lion's share of all primitive objects used in two-dimensional geometric computation (e.g. in computer graphics). Using these two primitives only, we can approximate any curve and draw any picture that can be mapped onto a discrete raster. If we do so, most queries about complex figures get reduced to basic queries about points and line segments, such as: is a given point to the left, to the right, or on a given line? Do two given line segments intersect? As simple as these questions appear to be, they must be handled efficiently and carefully. Efficiently because these basic primitives of geometric computations are likely to be executed millions of times in a single program run. Carefully because the ubiquitous phenomenon of degenerate configurations easily traps the unwary programmer into overflow or meaningless results.
Intersection
The problem of deciding whether two line segments intersect is unexpectedly tricky, as it requires a consideration of three distinct nondegenerate cases, as well as half a dozen degenerate ones. Starting with degenerate objects, we have cases where one or both of the line segments degenerate into points. The code below assumes that line segments of length zero have been eliminated. We must also consider nondegenerate objects in degenerate configurations, as illustrated in Exhibit 14.1. Line segments A and B intersect (strictly). C and D, and E and F, do not intersect; the intersection point of the infinitely extended lines lies on C in the first case, but lies neither on E nor on F in the second case. The next three cases are degenerate: G and H intersect barely (i.e. in an endpoint); I and J overlap (i.e. they intersect in infinitely many points); K and L do not intersect. Careless evaluation of these last two cases is likely to generate overflow.
Exhibit 14.1: Cases to be distinguished for the segment intersection problem.
Computing the intersection point of the infinitely extended lines is a naive approach to this decision problem that leads to a three-step process:
1. Check whether the two line segments are parallel (a necessary precaution before attempting to compute the intersection point). If so, we have a degenerate configuration that leads to one of three special cases: not collinear, collinear nonoverlapping, collinear overlapping.
2. Compute the intersection point of the extended lines (this step is still subject to numerical problems for lines that are almost parallel).
3. Check whether this intersection point lies on both line segments.
If all we want is a yes/no answer to the intersection question, we can save the effort of computing the intersection point and obtain a simpler and more robust procedure based on the following idea: two line segments intersect strictly iff the two endpoints of each line segment lie on opposite sides of the infinitely extended line of the other segment.
Let L be a line given by the equation h(x, y) = a · x + b · y + c = 0, where the coefficients have been normalized such that $a^2 + b^2 = 1$. For a line L given in this Hessian normal form, and for any point p = (x, y), the function h evaluated at p yields the signed distance between p and L: h(p) > 0 if p lies on one side of L, h(p) < 0 if p lies on the other side, and h(p) = 0 if p lies on L. A line segment is usually given by its endpoints (x1, y1) and (x2, y2), and the Hessian normal form of the infinitely extended line L that passes through (x1, y1) and (x2, y2) is
$h(x, y) = \frac{(y_2 - y_1) \cdot (x - x_1) - (x_2 - x_1) \cdot (y - y_1)}{L_{12}} = 0$
where
$L_{12} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} > 0$
is the length of the line segment, and h(x, y) is the distance of p = (x, y) from L. Two points p and q lie on opposite sides of L iff h(p) · h(q) < 0 (Exhibit 14.2). h(p) = 0 or h(q) = 0 signals a degenerate configuration. Among these, h(p) = 0 and h(q) = 0 iff the segment (p, q) is collinear with L.
Exhibit 14.2: Segment s, its extended line L, and distance to points p, q as computed by function h.
type point = record x, y: real end;
     segment = record p1, p2: point end;
function d(s: segment; p: point): real;
{ computes h(p) for the line L determined by s }
var dx, dy, L12: real;
begin
  dx := s.p2.x - s.p1.x; dy := s.p2.y - s.p1.y;
  L12 := sqrt(dx · dx + dy · dy);
  return((dy · (p.x - s.p1.x) - dx · (p.y - s.p1.y)) / L12)
end;
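A small Python sketch of the same signed-distance computation may help to see the sign convention at work. Points are plain (x, y) tuples here, and the function name is an assumption made for this example.

```python
import math


def signed_distance(p1, p2, p):
    """Signed distance of point p from the line through p1 and p2 (p1 != p2)."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)          # the segment length L12
    if length == 0.0:
        raise ValueError("segment of length zero")
    return (dy * (p[0] - p1[0]) - dx * (p[1] - p1[1])) / length


if __name__ == "__main__":
    # The line is the x-axis, directed from (0, 0) to (4, 0).
    print(signed_distance((0, 0), (4, 0), (1, 3)))    # point above the axis
    print(signed_distance((0, 0), (4, 0), (1, -2)))   # point below the axis
```

The two sample points lie on opposite sides of the line, so the two results have opposite signs.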
To optimize the intersection function, we recall the assumption L12 > 0 and notice that we do not need the actual distance, only its sign. Thus the function d used below avoids computing L12. The function 'intersect' begins by checking whether the two line segments are collinear, and if so, tests them for overlap by intersecting the intervals obtained by projecting the line segments onto the x-axis (or onto the y-axis, if the segments are vertical). Two intervals [a, b] and [c, d] intersect iff min(a, b) ≤ max(c, d) and min(c, d) ≤ max(a, b). This condition could be simplified under the assumption that the representation of segments and intervals is ordered "from left to right" (i.e. for interval [a, b] we have a ≤ b). We do not assume this, as line segments often have a natural direction and cannot be "turned around".
function d(s: segment; p: point): real;
begin
  return((s.p2.y - s.p1.y) · (p.x - s.p1.x) - (s.p2.x - s.p1.x) · (p.y - s.p1.y))
end;
function overlap(a, b, c, d: real): boolean;
begin return((min(a, b) ≤ max(c, d)) and (min(c, d) ≤ max(a, b)))
end;
function intersect(s1, s2: segment): boolean;
var d11, d12, d21, d22: real;
begin
  d11 := d(s1, s2.p1); d12 := d(s1, s2.p2);
  if (d11 = 0) and (d12 = 0) then { s1 and s2 are collinear }
    if s1.p1.x = s1.p2.x then { vertical }
      return(overlap(s1.p1.y, s1.p2.y, s2.p1.y, s2.p2.y))
    else { not vertical }
      return(overlap(s1.p1.x, s1.p2.x, s2.p1.x, s2.p2.x))
  else begin { s1 and s2 are not collinear }
    d21 := d(s2, s1.p1); d22 := d(s2, s1.p2);
    return((d11 · d12 ≤ 0) and (d21 · d22 ≤ 0))
  end
end;
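The intersection test translates almost line by line into Python. The sketch below represents a segment as a pair of (x, y) tuples, which is an assumption of this example. With integer coordinates all computations are exact, so the degenerate cases can be exercised reliably.

```python
def side(s, p):
    """Sign-carrying value of h(p) for the line through segment s, without normalization."""
    (x1, y1), (x2, y2) = s
    return (y2 - y1) * (p[0] - x1) - (x2 - x1) * (p[1] - y1)


def overlap(a, b, c, d):
    """True iff the intervals with endpoints a, b and c, d share a point."""
    return min(a, b) <= max(c, d) and min(c, d) <= max(a, b)


def intersect(s1, s2):
    """True iff the segments s1 and s2 (both of nonzero length) share a point."""
    d11, d12 = side(s1, s2[0]), side(s1, s2[1])
    if d11 == 0 and d12 == 0:                      # s1 and s2 are collinear
        (x1, y1), (x2, y2) = s1
        (x3, y3), (x4, y4) = s2
        if x1 == x2:                               # vertical: project onto the y-axis
            return overlap(y1, y2, y3, y4)
        return overlap(x1, x2, x3, x4)             # otherwise project onto the x-axis
    d21, d22 = side(s2, s1[0]), side(s2, s1[1])
    return d11 * d12 <= 0 and d21 * d22 <= 0


if __name__ == "__main__":
    print(intersect(((0, 0), (4, 4)), ((0, 4), (4, 0))))   # strict crossing
    print(intersect(((0, 0), (1, 1)), ((2, 2), (3, 3))))   # collinear, nonoverlapping
```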
In addition to the degeneracy issues we have addressed, there are numerical issues of near-degeneracy that we only mention. The length L12 is a condition number (i.e. an indicator of the computation's accuracy). As Exhibit 14.3 suggests, it may be numerically impossible to tell on which side of a short line segment L a distant point p lies.
Exhibit 14.3: A point's distance from a segment amplifies the error of the "which side" computation.
Conclusion: A geometric algorithm must check for degenerate configurations explicitly; the code that handles configurations "in general position" will not handle degeneracies.
Clipping
The widespread use of windows on graphic screens makes clipping one of the most frequently executed operations: Given a rectangular window and a configuration in the plane, draw that part of the configuration which lies within the window. Most configurations consist of line segments, so we show how to clip a line segment given by its endpoints (x1, y1) and (x2, y2) into a window given by its four corners with coordinates {left, right} × {top, bottom}.
The position of a point in relation to the window is described by four boolean variables: ll (to the left of the left border), rr (to the right of the right border), bb (below the lower border), tt (above the upper border):
type wcode = set of (ll, rr, bb, tt);
A point inside the window has the code ll = rr = bb = tt = false, abbreviated 0000 (Exhibit 14.4).
Exhibit 14.4: The clipping window partitions the plane into nine regions.
The procedure 'classify' determines the position of a point in relation to the window:
procedure classify(x, y: real; var c: wcode);
begin
  c := ∅; { empty set }
  if x < left then c := [ll] elsif x > right then c := [rr];
  if y < bottom then c := c ∪ [bb] elsif y > top then c := c ∪ [tt]
end;
The procedure 'clip' computes the endpoints of the clipped line segment and calls the procedure 'showline' to draw it:
procedure clip(x1, y1, x2, y2: real);
var c, c1, c2: wcode; x, y: real; outside: boolean;
begin { clip }
  classify(x1, y1, c1); classify(x2, y2, c2); outside := false;
  while (c1 ≠ ∅) or (c2 ≠ ∅) do
    if c1 ∩ c2 ≠ ∅ then
      { line segment lies completely outside the window }
      begin c1 := ∅; c2 := ∅; outside := true end
    else begin
      c := c1;
      if c = ∅ then c := c2;
      if ll ∈ c then { segment intersects left }
        begin y := y1 + (y2 - y1) · (left - x1) / (x2 - x1); x := left end
      elsif rr ∈ c then { segment intersects right }
        begin y := y1 + (y2 - y1) · (right - x1) / (x2 - x1); x := right end
      elsif bb ∈ c then { segment intersects bottom }
        begin x := x1 + (x2 - x1) · (bottom - y1) / (y2 - y1); y := bottom end
      elsif tt ∈ c then { segment intersects top }
        begin x := x1 + (x2 - x1) · (top - y1) / (y2 - y1); y := top end;
      if c = c1 then begin x1 := x; y1 := y; classify(x, y, c1) end
      else begin x2 := x; y2 := y; classify(x, y, c2) end
    end;
  if not outside then showline(x1, y1, x2, y2)
end; { clip }
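The region-code approach used by 'clip' is commonly known as Cohen-Sutherland clipping, a name the text itself does not use. The following Python sketch follows the same steps under two assumptions made for illustration: the window is passed as a tuple (left, right, bottom, top), and the clipped endpoints are returned (or None if the segment lies outside) instead of calling 'showline'.

```python
LL, RR, BB, TT = 1, 2, 4, 8      # one bit per region flag


def classify(x, y, window):
    """Return the region code of point (x, y) relative to the window."""
    left, right, bottom, top = window
    code = 0
    if x < left:
        code |= LL
    elif x > right:
        code |= RR
    if y < bottom:
        code |= BB
    elif y > top:
        code |= TT
    return code


def clip(x1, y1, x2, y2, window):
    """Return the clipped segment (x1, y1, x2, y2), or None if nothing is visible."""
    left, right, bottom, top = window
    c1, c2 = classify(x1, y1, window), classify(x2, y2, window)
    while c1 or c2:
        if c1 & c2:                  # both endpoints beyond the same border
            return None
        c = c1 or c2                 # pick an endpoint that lies outside
        if c & LL:
            x, y = left, y1 + (y2 - y1) * (left - x1) / (x2 - x1)
        elif c & RR:
            x, y = right, y1 + (y2 - y1) * (right - x1) / (x2 - x1)
        elif c & BB:
            x, y = x1 + (x2 - x1) * (bottom - y1) / (y2 - y1), bottom
        else:                        # TT
            x, y = x1 + (x2 - x1) * (top - y1) / (y2 - y1), top
        if c == c1:
            x1, y1, c1 = x, y, classify(x, y, window)
        else:
            x2, y2, c2 = x, y, classify(x, y, window)
    return (x1, y1, x2, y2)


if __name__ == "__main__":
    print(clip(-5, 5, 15, 5, (0, 10, 0, 10)))
```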
Drawing digitized lines
A raster graphics screen is an integer grid of pixels, each of which can be turned on or off. Euclidean geometry does not apply directly to such a discretized plane. Any designer using a CAD system will prefer Euclidean geometry to a discrete geometry as a model of the world. The problem of how to approximate the Euclidean plane by an integer grid turns out to be a hard question: How do we map Euclidean geometry onto a digitized space in such a way as to preserve the rich structure of geometry as much as possible? Let's begin with simple instances: How do you map a straight line onto an integer grid, and how do you draw it efficiently? Exhibit 14.5 shows reasonable examples.
Exhibit 14.5: Digitized lines look like staircases.
Consider the slope m = (y2 - y1) / (x2 - x1) of a segment with endpoints p1 = (x1, y1) and p2 = (x2, y2). If |m| ≤ 1 we want one pixel blackened on each x coordinate; if |m| ≥ 1, one pixel on each y coordinate; these two requirements are consistent for diagonals with |m| = 1. Consider the case |m| ≤ 1. A unit step in x takes us from point (x, y) on the line to (x + 1, y + m). So for each x between x1 and x2 we paint the pixel (x, y) closest to the mathematical line according to the formula y = round(y1 + m · (x - x1)). For the case |m| > 1, we reverse the roles of x and y, taking a unit step in y and incrementing x by 1/m. The following procedure draws line segments with |m| ≤ 1 using unit steps in x.
procedure line(x1, y1, x2, y2: integer);
var x, sx: integer; m: real;
begin
  PaintPixel(x1, y1);
  if x1 ≠ x2 then begin
    x := x1; sx := sgn(x2 - x1); m := (y2 - y1) / (x2 - x1);
    while x ≠ x2 do
      begin x := x + sx; PaintPixel(x, round(y1 + m · (x - x1))) end
  end
end;
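As a runnable illustration, the Python sketch below follows the procedure above but collects the pixels in a list instead of painting them. The list result and the explicit round-half-up rule are assumptions of this example, since Python's built-in `round` rounds ties to even.

```python
import math


def naive_line(x1, y1, x2, y2):
    """Pixels of a segment with |slope| <= 1, one pixel per x coordinate."""
    pixels = [(x1, y1)]
    if x1 != x2:
        sx = 1 if x2 > x1 else -1
        m = (y2 - y1) / (x2 - x1)                   # floating-point slope
        x = x1
        while x != x2:
            x += sx
            y = math.floor(y1 + m * (x - x1) + 0.5)  # round half up
            pixels.append((x, y))
    return pixels


if __name__ == "__main__":
    print(naive_line(0, 0, 3, 1))
```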
This straightforward implementation has a number of disadvantages. First, it uses floating-point arithmetic to compute integer coordinates of pixels, a costly process. In addition, rounding errors may prevent the line from being reversible: reversibility means that we paint the same pixels, in reverse order, if we call the procedure with the two endpoints interchanged. Reversibility is desirable to avoid the following blemishes: that a line painted twice, from both ends, looks thicker than other lines; worse yet, that painting a line from one end and erasing it from the other leaves spots on the screen. A weaker constraint, which is only concerned with the result and not the process of painting, is easy to achieve but is less useful.
Weak reversibility is most easily achieved by ordering the points p1 and p2 lexicographically by x and y coordinates, drawing every line from left to right, and vertical lines from bottom to top. This solution is inadequate for animation, where the direction of drawing is important, and the sequence in which the pixels are painted is determined by the application: drawing the trajectory of a falling apple from the bottom up will not do. Thus interactive graphics needs the stronger constraint.
Efficient algorithms, such as Bresenham's [Bre 65], avoid floating-point arithmetic and expensive multiplications through incremental computation: Starting with the current point p1, a next point is computed as a function of the current point and of the line segment parameters. It turns out that only a few additions, shifts, and comparisons are required. In the following we assume that the slope m of the line satisfies |m| ≤ 1. Let
$\Delta x = x_2 - x_1$, $sx = \mathrm{sign}(\Delta x)$, $\Delta y = y_2 - y_1$, $sy = \mathrm{sign}(\Delta y)$.
Assume that the pixel (x, y) is the last that has been determined to be the closest to the actual line, and we now want to decide whether the next pixel to be set is (x + sx, y) or (x + sx, y + sy). Exhibit 14.6 depicts the case sx = 1 and sy = 1.
Exhibit 14.6: At the next coordinate x + sx, we identify and paint the pixel closest to the line.
Let t denote the absolute value of the difference between y and the point with abscissa x + sx on the actual line. Then t is given by
$t = sy \cdot \left( y_1 + \frac{\Delta y}{\Delta x} \cdot (x + sx - x_1) - y \right)$.
The value of t determines the pixel to be drawn:
if $t < \frac{1}{2}$, the next pixel is (x + sx, y); if $t > \frac{1}{2}$, it is (x + sx, y + sy); if $t = \frac{1}{2}$, both candidates are equally close.
As the following example shows, reversibility is not an automatic consequence of the geometric fact that two points determine a unique line, regardless of correct rounding or the order in which the two endpoints are presented. A problem arises when two grid points are equally close to the straight line (Exhibit 14.7).
Exhibit 14.7: Breaking the tie among equidistant grid points.
If the tie is not broken in a consistent manner (e.g. by always taking the upper grid point), the resulting algorithm fails to be reversible:
the next pixel is (x + sx, y) if $t < \frac{1}{2}$, or $t = \frac{1}{2}$ and sy = -1; it is (x + sx, y + sy) if $t > \frac{1}{2}$, or $t = \frac{1}{2}$ and sy = 1.
All the variables introduced in this problem range over the integers, but the ratio $\frac{\Delta y}{\Delta x}$ appears to introduce rational expressions. This is easily remedied by multiplying everything with $\Delta x$. We define the decision variable d as
$d = |\Delta x| \cdot (2 \cdot t - 1) = sx \cdot \Delta x \cdot (2 \cdot t - 1)$. (*)
Let $d_i$ denote the decision variable which determines the pixel $(x^{(i)}, y^{(i)})$ to be drawn in the i-th step. Substituting t and inserting $x = x^{(i-1)}$ and $y = y^{(i-1)}$ in (*) we obtain
$d_i = sx \cdot sy \cdot (2 \cdot \Delta x \cdot y_1 + 2 \cdot (x^{(i-1)} + sx - x_1) \cdot \Delta y - 2 \cdot \Delta x \cdot y^{(i-1)} - \Delta x \cdot sy)$
and
$d_{i+1} = sx \cdot sy \cdot (2 \cdot \Delta x \cdot y_1 + 2 \cdot (x^{(i)} + sx - x_1) \cdot \Delta y - 2 \cdot \Delta x \cdot y^{(i)} - \Delta x \cdot sy)$.
Subtracting $d_i$ from $d_{i+1}$, we get
$d_{i+1} - d_i = sx \cdot sy \cdot (2 \cdot (x^{(i)} - x^{(i-1)}) \cdot \Delta y - 2 \cdot \Delta x \cdot (y^{(i)} - y^{(i-1)}))$.
Since $x^{(i)} - x^{(i-1)} = sx$, we obtain
$d_{i+1} = d_i + 2 \cdot sy \cdot \Delta y - 2 \cdot sx \cdot \Delta x \cdot sy \cdot (y^{(i)} - y^{(i-1)})$.
If $d_i < 0$, or $d_i = 0$ and sy = -1, then $y^{(i)} = y^{(i-1)}$, and therefore
$d_{i+1} = d_i + 2 \cdot |\Delta y|$.
If $d_i > 0$, or $d_i = 0$ and sy = 1, then $y^{(i)} = y^{(i-1)} + sy$, and therefore
$d_{i+1} = d_i + 2 \cdot |\Delta y| - 2 \cdot |\Delta x|$.
This iterative computation of $d_{i+1}$ from the previous $d_i$ lets us select the pixel to be drawn. The initial starting value for $d_1$ is found by evaluating the formula for $d_i$, knowing that $(x^{(0)}, y^{(0)}) = (x_1, y_1)$. Then we obtain
$d_1 = 2 \cdot |\Delta y| - |\Delta x|$.
The arithmetic needed to evaluate these formulas is minimal: addition, subtraction and left shift (multiplication by 2). The following procedure implements this algorithm; it assumes that the slope of the line is between -1 and 1.
procedure BresenhamLine(x1, y1, x2, y2: integer);
var dx, dy, sx, sy, d, x, y: integer;
begin
  dx := |x2 - x1|; sx := sgn(x2 - x1);
  dy := |y2 - y1|; sy := sgn(y2 - y1);
  d := 2 · dy - dx; x := x1; y := y1;
  PaintPixel(x, y);
  while x ≠ x2 do begin
    if (d > 0) or ((d = 0) and (sy = 1)) then
      begin y := y + sy; d := d - 2 · dx end;
    x := x + sx; d := d + 2 · dy;
    PaintPixel(x, y)
  end
end;
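To check reversibility experimentally, the procedure can be transcribed into Python as sketched below. Returning the pixel list instead of painting is an assumption of this example. The sample segment has exact ties between two grid points, so it exercises the tie-breaking rule.

```python
def sgn(v):
    """Sign of an integer: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def bresenham_line(x1, y1, x2, y2):
    """Pixels of a segment with slope between -1 and 1, using integer arithmetic only."""
    dx, sx = abs(x2 - x1), sgn(x2 - x1)
    dy, sy = abs(y2 - y1), sgn(y2 - y1)
    d = 2 * dy - dx                     # initial decision variable d1
    x, y = x1, y1
    pixels = [(x, y)]
    while x != x2:
        if d > 0 or (d == 0 and sy == 1):
            y += sy                     # step in y as well
            d -= 2 * dx
        x += sx
        d += 2 * dy
        pixels.append((x, y))
    return pixels


if __name__ == "__main__":
    forward = bresenham_line(0, 0, 4, 2)
    backward = bresenham_line(4, 2, 0, 0)
    print(forward)
    print(backward == forward[::-1])    # same pixels in reverse order
```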
The riddle of the braiding straight lines
Two straight lines in a plane intersect in at most one point, right? Important geometric algorithms rest on this well-known theorem of Euclidean geometry and would have to be reexamined if it were untrue. Is this theorem true for computer lines, that is, for data objects that represent and approximate straight lines to be processed by a program? Perhaps yes, but mostly no.
Yes. It is possible, of course, to program geometric problems in such a way that every pair of straight lines has at most, or exactly, one intersection point. This is most readily achieved through symbolic computation. For example, if the intersection of L1 and L2 is denoted by an expression 'Intersect(L1, L2)' that is never evaluated but simply combined with other expressions to represent a geometric construction, we are free to postulate that 'Intersect(L1, L2)' is a point.
No. For reasons of efficiency, most computer applications of geometry require the immediate numerical evaluation of every geometric operation. This calculation is done in a discrete, finite number system in which case the theorem need not be true. This fact is most easily seen if we work with a discrete plane of pixels, and we represent a straight line by the set of all pixels touched by an ideal mathematical line. Exhibit 14.8 shows three digitized straight lines in such a square grid model of plane geometry. Two of the lines intersect in a common interval of three pixels, whereas two others have no pixel in common, even though they obviously intersect.
Exhibit 14.8: Two intersecting lines may share none, one, or more pixels.
With floating-point arithmetic the situation is more complicated; but the fact remains that the Euclidean plane is replaced by a discrete set of points embedded in the plane, namely all those points whose coordinates are representable in the particular number system being used. Experience with numerical computation, and the hazards of rounding errors, suggests that the question "In how many points can two straight lines intersect?" admits the following answers:
There is no intersection: the mathematically correct intersection cannot be represented in the number system.
A set of points that lie close to each other: for example, an interval.
Overflow aborts the calculation before a result is computed, even if the correct result is representable in the number system being used.
Exercise: two lines intersect in how many points?
Construct examples to illustrate these phenomena when using floating-point arithmetic. Choose a suitable system G of floating-point numbers and two distinct straight lines
ai · x + bi · y + ci = 0 with ai, bi, ci ∈ G, i = 1, 2,
such that, when all operations are performed in G:
(a) There is no point whose coordinates x, y ∈ G satisfy both linear equations.
(b) There are many points whose coordinates x, y ∈ G satisfy both linear equations.
(c) There is exactly one point whose coordinates x, y ∈ G satisfy both linear equations, but the straightforward computation of x and y leads to overflow.
(d) As a consequence of (a) it follows that the definition "two lines intersect ⇔ they share a common point" is inappropriate for numerical computation. Formulate a numerically meaningful definition of the statement "two line segments intersect".
Exercise (b) may suggest that the points shared by two lines are neighbors. Pictorially, if the slopes of the two lines are almost identical, we expect to see a blurred, elongated intersection. We will show that worse things may happen: two straight lines may intersect in arbitrarily many points, and these points are separated by intervals in which the two lines alternate in lying on top of each other. Computer lines may be braided! To understand this phenomenon, we need to clarify some concepts: What exactly is a straight line represented on a computer? What is an intersection?
There is no one answer, there are many! Consider the analogy of the mathematical concept of real numbers, defined by axioms. When we approximate real numbers on a computer, we have a choice of many different number systems (e.g. various floating-point number systems, rational arithmetic with variable precision, interval arithmetic). These systems are typically not defined by means of axioms, but rather in terms of concrete representations of the numbers and algorithms for executing the operations on these numbers. Similarly, a computer line will be defined in terms of a concrete representation (e.g. two points, a point and a slope, or a linear expression). All we obtain depends on the formulas we use and on the basic arithmetic to operate on these representations. The notion of a straight line can be formalized in many different ways, and although these are likely to be mathematically equivalent, they will lead to data objects with different behavior when evaluated numerically. Performing an operation consists of evaluating a formula. Substituting a formula by a mathematically equivalent one may lead to results that are topologically different, because equivalent formulas may exhibit different sensitivities toward rounding errors.
Consider a computer that has only integer arithmetic, i.e. we use only the operations +, -, ·, div. Let Z be the set of integers. Two straight lines gi (i = 1, 2) are given by the following equations:
ai · x + bi · y + ci = 0 with ai, bi, ci ∈ Z; bi ≠ 0.
We consider the problem of whether two given straight lines intersect in a given point x0. We use the following method: Solve the equations for y [i.e. y = E1(x) and y = E2(x)] and test whether E1(x0) is equal to E2(x0).
Is this method suitable? First, we need the following definitions:
x ∈ Z is a turn for the pair (E1, E2) iff
sign(E1(x) - E2(x)) ≠ sign(E1(x + 1) - E2(x + 1)).
An algorithm for the intersection problem is correct iff there are at most two turns.
The intuitive idea behind this definition is the recognition that rounding errors may force us to deal with an intersection interval rather than a single intersection point; but we wish to avoid separate intervals. The definition above partitions the x-axis into at most three disjoint intervals such that in the left interval the first line lies above or below the second line, in the middle interval the lines "intersect", and in the right interval we have the complementary relation of the left one (Exhibit 14.9).
12D
This book is licensed under a Creative Commons Attribution 3.0 License
")hibit 1<.9& (esirable consistency condition %or intersection o% nearly parallel lines.
Consider the straight lines&
3 K ) U @ K y ] <0 V 0 and 2 K ) U 3 K y ] 20 V 0
5hich lead to the evaluation %ormulas
Cur naive approach compares the e)pressions
H3 K ) ] <0I div @ and H2 K ) ] 20I div 3.
*sing the de%initions it is easy to calculate that the turns are
7+ D+ 10+ 11+ 12+ 1<+ 1@+ 22+ 23+ 2@+ 2>+ 27+ 29+ 30.
The straight lines have become step functions that intersect many times. They are braided (Exhibit 14.10).
[Figure for Exhibit 14.9: along the x-axis, a left interval with g1 > g2, a middle interval with g1 = g2, and a right interval with g1 < g2.]
Exhibit 14.10: Braiding straight lines violate the consistency condition of Exhibit 14.9.
[Figure for Exhibit 14.10: the step functions (3x + 40) div 5 and (2x + 20) div 3 plotted against x.]
Exercise: show that the straight lines
x − 2 · y = 0
k · x − (2 · k + 1) · y = 0 for any integer k > 0
have 2 · k − 1 turns in the first quadrant.
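The list of turns quoted for the braided example can be reproduced mechanically. The following Python sketch is only an illustration: the helper names `sign` and `turns` are ours, and Python's `//` stands in for `div`, which agrees with it for the non-negative operands used here. It applies the definition of a turn to the two step functions, and then to a division-free comparison of the same two lines.

```python
def sign(v: int) -> int:
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def turns(e1, e2, xs):
    """All x in xs where sign(e1 - e2) differs between x and x + 1."""
    return [x for x in xs
            if sign(e1(x) - e2(x)) != sign(e1(x + 1) - e2(x + 1))]


# Naive evaluation with integer division: the lines become step functions.
braided = turns(lambda x: (3 * x + 40) // 5,
                lambda x: (2 * x + 20) // 3,
                range(40))

# Comparing 3 * (3x + 40) with 5 * (2x + 20) avoids the division entirely.
exact = turns(lambda x: 3 * (3 * x + 40),
              lambda x: 5 * (2 * x + 20),
              range(40))

print(braided)
print(exact)
```

With the division removed, only the two turns around the true intersection at x = 20 remain, which satisfies the consistency condition.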
Is braiding due merely to integer arithmetic? Certainly not: rounding errors also occur in floating-point
arithmetic, and we can construct even more pathological behavior. As an example, consider a floating-point
arithmetic with a two-decimal-digit mantissa. We solve each line a · x − b · y = 0 for y, that is, we evaluate
y = (a · x) / b
and truncate intermediate results immediately to two decimal places. Consider the straight lines (Exhibit 14.11)
4.3 · x − 8.3 · y = 0,
1.4 · x − 2.7 · y = 0.
Exhibit 14.11: Example to be verified by manual computation.
These examples were constructed by intersecting straight lines with almost the same slope, a numerically
ill-conditioned problem. While working with integer arithmetic, we made the mistake of using the error-prone 'div'
operator. The comparison of rational expressions does not require division.
Let a1 · x + b1 · y + c1 = 0 and a2 · x + b2 · y + c2 = 0 be two straight lines. To find out whether they intersect at x0,
we have to check whether the equality
(−c1 − a1 · x0) / b1 = (−c2 − a2 · x0) / b2
holds. This is equivalent to b2 · c1 − b1 · c2 = x0 · (a2 · b1 − a1 · b2).
The last formula can be evaluated without error if sufficiently large integer arguments are allowed. Another way
to evaluate this formula without error is to limit the size of the operands. For example, if ai, bi, ci, and x0 are n-digit
binary numbers, it suffices to be able to represent 3n-digit binary numbers and to compute with n-digit and
2n-digit binary numbers.
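As a minimal sketch of the division-free test, the function below evaluates b2 · c1 − b1 · c2 = x0 · (a2 · b1 − a1 · b2) directly. The function name `intersect_at` is our own choice, and we rely on Python integers being of unbounded size, which corresponds to the "sufficiently large integer arguments" assumption.

```python
def intersect_at(a1: int, b1: int, c1: int,
                 a2: int, b2: int, c2: int, x0: int) -> bool:
    """True iff the lines ai*x + bi*y + ci = 0 (bi != 0) meet at abscissa x0.

    Only +, - and * are used, so no rounding error can occur.
    """
    return b2 * c1 - b1 * c2 == x0 * (a2 * b1 - a1 * b2)


# The two lines of the braiding example: 3x - 5y + 40 = 0 and 2x - 3y + 20 = 0.
hits = [x0 for x0 in range(40) if intersect_at(3, -5, 40, 2, -3, 20, x0)]
print(hits)
```

For the braiding example the test reports a single intersection abscissa instead of a scattered set of candidates.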
These examples demonstrate that programming even a simple geometric problem can cause unexpected
difficulties. Numerical computation forces us to rethink and redefine elementary geometric concepts.
Digitized circles
The concepts, problems and techniques we have discussed in this chapter are not at all restricted to dealing with
straight lines: they have their counterparts for any kind of digitized spatial object. Straight lines, defined by linear
formulas, are the simplest nontrivial spatial objects and thus best suited to illustrate problems and solutions. In this
section we show that the incremental drawing technique generalizes in a straightforward manner to more complex
objects such as circles.
The basic parameters that define a circle are the center coordinates (xc, yc) and the radius r. To simplify the
presentation we first consider a circle with radius r centered around the origin. Such a circle is given by the
equation
x² + y² = r².
Efficient algorithms for drawing circles, such as Bresenham's [Bre 77], avoid floating-point arithmetic and
expensive multiplications through incremental computation: A new point is computed depending on the current
point and on the circle parameters. Bresenham's circle algorithm was conceived for use with pen plotters and
therefore generates all points on a circle centered at the origin by incrementing all the way around the circle. We
present a modified version of his algorithm which takes advantage of the eight-way symmetry of a circle. If (x, y) is
a point on the circle, we can easily determine seven other points lying on the circle (Exhibit 14.12). We consider only
the 45° segment of the circle shown in the figure by incrementing from x = 0 to x = y = r / √2, and use eight-way
symmetry to display points on the entire circle.
Exhibit 14.12: Eightfold symmetry of the circle.
Assume that the pixel p = (x, y) is the last that has been determined to be closest to the actual circle, and we now
want to decide whether the next pixel to be set is p1 = (x + 1, y) or p2 = (x + 1, y − 1). Since we restrict ourselves to
the 45° circle segment shown above, these pixels are the only candidates. Now define
d' = (x + 1)² + y² − r²
d" = (x + 1)² + (y − 1)² − r²
which are the differences between the squared distances from the center of the circle to p1 (or p2) and to the actual
circle. If |d'| ≤ |d"|, then p1 is closer (or equidistant) to the actual circle; if |d'| > |d"|, then p2 is closer. We define the
decision variable d as
d = d' + d". (*)
We will show that the rule
If d ≤ 0 then select p1 else select p2.
correctly selects the pixel that is closest to the actual circle. Exhibit 14.13 shows a small part of the pixel grid and
illustrates the various possible ways [(1) to (5)] how the actual circle may intersect the vertical line at x + 1 in
relation to the pixels p1 and p2.
Exhibit 14.13: For a given octant of the circle, if pixel p is lit, only two other pixels p1 and p2 need be examined.
In cases (1) and (2) p2 lies inside, p1 inside or on the circle, and we therefore obtain d' ≤ 0 and d" < 0. Now d < 0,
and applying the rule above will lead to the selection of p1. Since |d'| ≤ |d"| this selection is correct. In case (3) p1
lies outside and p2 inside the circle and we therefore obtain d' > 0 and d" < 0. Applying the rule above will lead to
the selection of p1 if d ≤ 0, and p2 if d > 0. This selection is correct since in this case d ≤ 0 is equivalent to |d'| ≤ |d"|.
In cases (4) and (5) p1 lies outside, p2 outside or on the circle and we therefore obtain d' > 0 and d" ≥ 0. Now d > 0,
and applying the rule above will lead to the selection of p2. Since |d'| > |d"| this selection is correct.
Let d_i denote the decision variable that determines the pixel (x(i), y(i)) to be drawn in the i-th step. Starting with
(x(0), y(0)) = (0, r) we obtain
d_1 = 3 − 2 · r.
If d_i ≤ 0, then (x(i), y(i)) = (x(i−1) + 1, y(i−1)), and therefore
d_{i+1} = d_i + 4 · x(i−1) + 6.
If d_i > 0, then (x(i), y(i)) = (x(i−1) + 1, y(i−1) − 1), and therefore
d_{i+1} = d_i + 4 · (x(i−1) − y(i−1)) + 10.
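The two update formulas can be checked against the definition d = d' + d" with a short Python sketch. The function name `octant_points` is ours, and the sketch only collects the pixels of one octant instead of painting them. Because d starts at the odd value 3 − 2 · r and every update adds an even amount, d is never 0, so testing d ≤ 0 and testing d < 0 select the same pixels.

```python
def octant_points(r: int) -> list[tuple[int, int]]:
    """Pixels of the 45-degree segment from x = 0 to x = y, circle at origin."""
    x, y, d = 0, r, 3 - 2 * r
    points = []
    while x <= y:
        points.append((x, y))
        # The incrementally maintained d must equal d' + d" at pixel (x, y).
        assert d == 2 * (x + 1) ** 2 + y ** 2 + (y - 1) ** 2 - 2 * r * r
        if d <= 0:                    # p1 = (x + 1, y) is at least as close
            d += 4 * x + 6
        else:                         # p2 = (x + 1, y - 1) is closer
            d += 4 * (x - y) + 10
            y -= 1
        x += 1
    return points


print(octant_points(5))
```

The internal assertion fails immediately if either recurrence is mistyped, which makes the sketch a convenient cross-check when porting the algorithm.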
This iterative computation of d_{i+1} from the previous d_i lets us select the correct pixel to be drawn in the (i + 1)-th
step. The arithmetic needed to evaluate these formulas is minimal: addition, subtraction, and left shift
(multiplication by 4). The following procedure 'BresenhamCircle' which implements this algorithm draws a circle
with center (xc, yc) and radius r. It uses the procedure 'CirclePoints' to display points on the entire circle. In the
cases x = y or r = 1 'CirclePoints' draws each of four pixels twice. This causes no problem on a raster display.
procedure BresenhamCircle(xc, yc, r: integer);
  procedure CirclePoints(x, y: integer);
  begin
    PaintPixel(xc + x, yc + y); PaintPixel(xc − x, yc + y);
    PaintPixel(xc + x, yc − y); PaintPixel(xc − x, yc − y);
    PaintPixel(xc + y, yc + x); PaintPixel(xc − y, yc + x);
    PaintPixel(xc + y, yc − x); PaintPixel(xc − y, yc − x)
  end;
var x, y, d: integer;
begin
  x := 0; y := r; d := 3 − 2 · r;
  while x < y do begin
    CirclePoints(x, y);
    if d < 0 then d := d + 4 · x + 6
    else { d := d + 4 · (x − y) + 10; y := y − 1 };
    x := x + 1
  end;
  if x = y then CirclePoints(x, y)
end;
Exercises and programming projects
1. Design and implement an efficient geometric primitive which determines whether two aligned rectangles
(i.e. rectangles with sides parallel to the coordinate axes) intersect.
2. Design and implement a geometric primitive
function inTriangle(t: triangle; p: point): …;
which takes a triangle t given by its three vertices and a point p and returns a ternary value: p is inside t, p
is on the boundary of t, p is outside t.
3. Use the function 'intersect' from the section "Intersection" and 'inTriangle' above to program a
function SegmentIntersectsTriangle(s: segment; t: triangle): …;
to check whether segment s and triangle t share common points. 'SegmentIntersectsTriangle' returns a
ternary value: yes, degenerate, no. List all distinct cases of degeneracy that may occur, and show how your
code handles them.
4. Implement Bresenham's incremental algorithms for drawing digitized straight lines and circles.
5. Two circles (x', y', r') and (x", y", r") are given by the coordinates of their center and their radius. Find
effective formulas for deciding the three-way question whether (a) the circles intersect as lines, (b) the
circles intersect as disks, or (c) neither. Avoid the square-root operation whenever possible.
Part IV: Complexity of problems and algorithms
Fundamental issues of computation
A successful search for better and better algorithms naturally leads to the question "Is there a best algorithm?",
whereas an unsuccessful search leads one to ask apprehensively: "Is there an algorithm (of a certain type) to solve
this problem?" These questions turned out to be difficult and fertile. Historically, the question about the existence
of an algorithm came first, and led to the concepts of computability and decidability in the 1930s. The question
about a "best" algorithm led to the development of complexity theory in the 1960s.
The study of these fundamental issues of computation requires a mathematical arsenal that includes
mathematical logic, discrete mathematics, probability theory, and certain parts of analysis, in particular
asymptotics. We introduce a few of these topics, mostly by example, and illustrate the use of mathematical
techniques of algorithm analysis on the important problem of sorting.
15. Computability and complexity
Learning objectives:
algorithm
computability
RISC: Reduced Instruction Set Computer
Almost nothing is computable.
The halting problem is undecidable.
complexity of algorithms and problems
Strassen's matrix multiplication
Models of computation: the ultimate RISC
Algorithm and computability are originally intuitive concepts. They can remain intuitive as long as we only want
to show that some specific result can be computed by following a specific algorithm. Almost always an informal
explanation suffices to convince someone with the requisite background that a given algorithm computes a
specified result. We have illustrated this informal approach throughout Part III. Everything changes if we wish to
show that a desired result is not computable. The question arises immediately: "What tools are we allowed to use?"
Everything is computable with the help of an oracle that knows the answers to all questions. The attempt to prove
negative results about the nonexistence of certain algorithms forces us to agree on a rigorous definition of
algorithm.
The question "What can be computed by an algorithm, and what cannot?" was studied intensively during the
1930s by Emil Post (1897–1954), Alan Turing (1912–1954), Alonzo Church (1903–1995), and other logicians. They
defined various formal models of computation, such as production systems, Turing machines, and recursive
functions, to capture the intuitive concept of "computation by the application of precise rules". All these different
formal models of computation turned out to be equivalent. This fact greatly strengthens Church's thesis that the
intuitive concept of algorithm is formalized correctly by any one of these mathematical systems.
We will not define any of these standard models of computation. They all share the trait that they were designed
to be conceptually simple: their primitive operations are chosen to be as weak as possible, as long as they retain
their property of being universal computing systems in the sense that they can simulate any computation
performed on any other machine. It usually comes as a surprise to novices that the set of primitives of a universal
computing machine can be so simple as long as these machines possess two essential ingredients: unbounded
memory and unbounded time.
Most simulations of a powerful computer on a simple one share three characteristics: they are straightforward in
principle, they involve laborious coding in practice, and they explode the space and time requirements of a
computation. The weakness of the primitives, desirable from a theoretical point of view, has the consequence that
as simple an operation as integer addition becomes an exercise in programming.
The model of computation used most often in algorithm analysis is significantly more powerful than a Turing
machine in two respects: (1) its memory is not a tape, but an array, and (2) in one primitive operation it can deal
with numbers of arbitrary size. This model of computation is called random access machine, abbreviated as RAM.
A RAM is essentially a random access memory, also abbreviated as RAM, of unbounded capacity, as suggested in
Exhibit 15.1. The memory consists of an infinite array of memory cells, addressed 0, 1, 2, … . Each cell can hold a
number, say an integer, of arbitrary size, as the arrow pointing to the right suggests.
Exhibit 15.1: RAM - unbounded address space, unbounded cell size.
A RAM has an arithmetic unit and is driven by a program. The meaning of the word random is that any memory
cell can be accessed in unit time (as opposed to a tape memory, say, where access time depends on distance). A
further crucial assumption in the RAM model is that an arithmetic operation (+, −, ·, /) also takes unit time,
regardless of the size of the numbers involved. This assumption is unrealistic in a computation where numbers may
grow very large, but often is a useful assumption. As is the case with all models, the responsibility for using them
properly lies with the user. To give the reader the flavor of a model of computation, we define a RAM whose
architecture is rather similar to real computers, but is unrealistically simple.
The ultimate RISC
RISC stands for Reduced Instruction Set Computer, a machine that has only a few types of instructions built
into the hardware. What is the minimum number of instructions a computer needs to be universal? In theory, one.
Consider a stored-program computer of the "von Neumann type" where data and program are stored in the
same memory (John von Neumann, 1903–1957). Let the random access memory (RAM) be "doubly infinite": There
is a countable infinity of memory cells addressed 0, 1, … , each of which can hold an integer of arbitrary size, or an
instruction. We assume that the constant 1 is hardwired into memory cell 1; from 1 any other integer can be
constructed. There is a single type of "three-address instruction" which we call "subtract, test and jump",
abbreviated as
STJ x, y, z
where x, y, and z are addresses. Its semantics is equivalent to
STJ x, y, z: x := x − y; if x ≤ 0 then goto z;
x, y, and z refer to cells Cx, Cy, and Cz. The contents of Cx and Cy are treated as data (an integer); the contents of
Cz, as an instruction (Exhibit 15.2).
Exhibit 15.2: Stored program computer: data and instructions share the memory.
Since this RISC has just one type of instruction, we waste no space on an op-code field. An instruction contains
three addresses, each of which is an unbounded integer. In theory, fortunately, three unbounded integers can be
packed into the same space required for a single unbounded integer. In the following exercise, this simple idea
leads to a well-known technique introduced into mathematical logic by Kurt Gödel (1906–1978).
Exercise: Gödel numbering
(a) Motel Infinity has a countable infinity of rooms numbered 0, 1, 2, … . Every room is occupied, so the sign
claims "No Vacancy". Convince the manager that there is room for one more person.
(b) Assume that a memory cell in our RISC stores an integer as a sign bit followed by a sequence d0, d1, d2, … of
decimal digits, least significant first. Devise a scheme for storing three addresses in one cell.
(c) Show how a sequence of positive integers i1, i2, … , in of arbitrary length n can be encoded in a single natural
number j: Given j, the sequence can be uniquely reconstructed. Gödel's solution:
$j = 2^{i_1} \cdot 3^{i_2} \cdot 5^{i_3} \cdots p_n^{i_n}$, where $p_k$ denotes the k-th prime number.
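Part (c) of the exercise can be tried out in a few lines of Python. This is a sketch of the classical prime-power encoding, not a prescribed solution; the names `primes`, `godel_encode` and `godel_decode` are our own. Decoding relies on the assumption stated in the exercise that all sequence elements are positive, so every prime up to the n-th divides j at least once.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short sequences)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def godel_encode(seq):
    """Encode positive integers i1, ..., in as 2**i1 * 3**i2 * 5**i3 * ..."""
    j = 1
    for p, i in zip(primes(), seq):
        j *= p ** i
    return j


def godel_decode(j):
    """Recover the sequence by reading off the exponents of 2, 3, 5, ..."""
    seq = []
    for p in primes():
        if j == 1:
            break
        exponent = 0
        while j % p == 0:
            j //= p
            exponent += 1
        seq.append(exponent)
    return seq


print(godel_encode([3, 1, 2]), godel_decode(600))
```

Uniqueness of the reconstruction is exactly the uniqueness of prime factorization.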
Basic program fragments
This computer is best understood by considering program fragments for simple tasks. These fragments
implement simple operations, such as setting a variable to a given constant, or the assignment operator, that are
given as primitives in most programming languages. Programming these fragments naturally leads us to introduce
basic concepts of assembly language, in particular symbolic and relative addressing.
Set the content of cell 0 to 0:
STJ 0, 0, .+1
Whatever the current content of cell 0, subtract it from itself to obtain the integer 0. This instruction resides at
some address in memory, which we abbreviate as '.', read as "the current value of the program counter". '.+1' is the
next address, so regardless of the outcome of the test, control flows to the next instruction in memory.
a := b, where a and b are symbolic addresses. Use a temporary variable t:
STJ t, t, .+1 { t := 0 }
STJ t, b, .+1 { t := −b }
STJ a, a, .+1 { a := 0 }
STJ a, t, .+1 { a := −t, so now a = b }
Exercise: a program library
(a) Write RISC programs for a := b + c, a := b · c, a := b div c, a := b mod c, a := |b|, a := min(b, c), a := gcd(b,
c).
(b) Show how this RISC can compute with rational numbers represented by a pair [a, b] of integers denoting
numerator and denominator.
(c) (Advanced) Show that this RISC is universal, in the sense that it can simulate any computation done by any
other computer.
[Figure for Exhibit 15.2: memory cells 0, 1, 2, …, 13, 14, …; cell 13 holds the instruction STJ 0, 0, 14 and the program counter points to it. Executing instruction 13 sets cell 0 to 0, and increments the program counter.]
The exercise of building up a RISC program library for elementary functions provides the same experience as the
equivalent exercise for Turing machines, but leads to the goal much faster, since the primitive STJ is much more
powerful than the primitives of a Turing machine.
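The program fragments for this RISC can be tried out on a small simulator. The Python sketch below rests on several assumptions of ours that the text does not specify: memory is a dictionary, an instruction is stored as a tuple (x, y, z), relative addresses such as '.+1' are written out as absolute addresses, and the machine halts when the program counter reaches a cell that holds no instruction. The cell numbers 10, 11, 12 and 100 to 103 are arbitrary.

```python
def run_stj(memory: dict, pc: int, max_steps: int = 10_000) -> dict:
    """Run 'subtract, test and jump' instructions starting at address pc."""
    for _ in range(max_steps):
        instr = memory.get(pc)
        if not isinstance(instr, tuple):
            return memory                    # assumed halting convention
        x, y, z = instr
        memory[x] = memory[x] - memory[y]    # x := x - y
        pc = z if memory[x] <= 0 else pc + 1  # if x <= 0 then goto z
    raise RuntimeError("step limit exceeded")


A, B, T = 10, 11, 12                         # symbolic addresses a, b, t
memory = {
    1: 1,                                    # the hardwired constant 1
    A: 7, B: 42, T: -5,                      # arbitrary initial contents
    100: (T, T, 101),                        # t := 0
    101: (T, B, 102),                        # t := -b
    102: (A, A, 103),                        # a := 0
    103: (A, T, 104),                        # a := -t, so now a = b
}
run_stj(memory, 100)
print(memory[A], memory[B], memory[T])
```

The same simulator can serve as a test bench for the library exercise, for example by checking a candidate program for a := b + c on a few inputs.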
The purpose of this section is to introduce the idea that conceptually simple models of computation are as
powerful, in theory, as much more complex models, such as a high-level programming language. The next two
sections demonstrate results of an opposite nature: Universal computers, in the sense we have just introduced, are
subject to striking limitations, even if we remove any limit on the memory and time they may use. We prove the
existence of noncomputable functions and show that the "halting problem" is undecidable.
The theory of computability was developed in the 1930s, and greatly expanded in the 1950s and 1960s. Its basic
ideas have become part of the foundation that any computer scientist is expected to know. Computability theory is
not directly useful. It is based on the concept "computable in principle" but offers no concept of a "feasible
computation". Feasibility, rather than "possible in principle", is the touchstone of computer science. Since the
1960s, a theory of the complexity of computation is being developed, with the goal of partitioning the range of
computability into complexity classes according to time and space requirements. This theory is still in full
development and breaking new ground, in particular in the area of concurrent computation. We have used some of
its concepts throughout Part III and continue to illustrate these ideas with simple examples and surprising results.
Almost nothing is computable
Consider as a model of computation any programming language, with the fictitious feature that it is
implemented on a machine with infinite memory and no operational time limits. Nevertheless we reach the
conclusion that "almost nothing is computable". This follows simply from the observation that there are fewer
programs than problems to be solved (functions to be computed). Both the number of programs and the number of
functions are infinite, but the latter is an infinity of higher cardinality.
A programming language L is defined over an alphabet A = {a1, a2, … , ak} of k characters. The set of programs in
L is a subset of the set A
$\int \ln x \, dx = x \cdot \ln x - x$
in (*) we obtain
$(n+1) \cdot \ln(n+1) - n - \ln(n+1) \le \sum_{i=1}^{n} \ln i \le (n+1) \cdot \ln(n+1) - n,$
and therefore
$\sum_{i=1}^{n} \log_2 i = (n+1) \cdot \log_2(n+1) - \frac{n}{\ln 2} + g(n)$ with $g(n) \in O(\log n)$.
Example
By substituting
in (*) we obtain
with g(n) ∈ O(n · log n).
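The estimate for the sum of log2 i, with its remainder g(n) ∈ O(log n), lends itself to a quick numerical check. The sketch below (the function name `g` mirrors the text, everything else is our choice) computes the remainder directly and compares it with log2(n + 1).

```python
import math


def g(n: int) -> float:
    """Remainder of the estimate (n+1)*log2(n+1) - n/ln 2 for sum of log2 i."""
    exact = sum(math.log2(i) for i in range(1, n + 1))
    estimate = (n + 1) * math.log2(n + 1) - n / math.log(2)
    return exact - estimate


for n in (1, 10, 1000, 100_000):
    print(n, round(g(n), 4), round(-math.log2(n + 1), 4))
```

In these runs the remainder stays between −log2(n + 1) and 0, in line with the logarithmic bound.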
Recurrence relations
A homogeneous linear recurrence relation with constant coefficients is of the form
xn = a1 · xn−1 + a2 · xn−2 + … + ak · xn−k
where the coefficients ai are independent of n and x1, x2, … , xn−1 are specified. There is a general technique for
solving linear recurrence relations with constant coefficients, that is, for determining xn as a function of n. We will
demonstrate this technique for the Fibonacci sequence which is defined by the recurrence
xn = xn−1 + xn−2, x0 = 0, x1 = 1.
We seek a solution of the form
$x_n = c \cdot r^n$
with constants c and r to be determined. Substituting this into the Fibonacci recurrence relation yields
$c \cdot r^n = c \cdot r^{n-1} + c \cdot r^{n-2}$
or
$c \cdot r^{n-2} \cdot (r^2 - r - 1) = 0.$
This equation is satisfied if either c = 0 or r = 0 or r² − r − 1 = 0. We obtain the trivial solution xn = 0 for all n if c
= 0 or r = 0. More interestingly, r² − r − 1 = 0 for
$r_1 = \frac{1 + \sqrt{5}}{2}$ and $r_2 = \frac{1 - \sqrt{5}}{2}.$
The sum of two solutions of a homogeneous linear recurrence relation is obviously also a solution, and it can be
shown that any linear combination of solutions is again a solution. Therefore, the most general solution of the
Fibonacci recurrence has the form
$x_n = c_1 \cdot r_1^n + c_2 \cdot r_2^n$
where c1 and c2 are determined as solutions of the linear equations derived from the initial conditions:
$x_0 = 0: \; c_1 + c_2 = 0, \qquad x_1 = 1: \; c_1 \cdot r_1 + c_2 \cdot r_2 = 1$
which yield
$c_1 = \frac{1}{\sqrt{5}}, \quad c_2 = -\frac{1}{\sqrt{5}};$
the complete solution for the Fibonacci recurrence relation is therefore
$x_n = \frac{1}{\sqrt{5}} \left( \left( \frac{1 + \sqrt{5}}{2} \right)^n - \left( \frac{1 - \sqrt{5}}{2} \right)^n \right).$
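As a sanity check on the technique, the closed-form solution of the Fibonacci recurrence can be compared with values obtained by iterating the recurrence itself. The sketch below uses floating-point arithmetic and rounds to the nearest integer, which is only reliable for moderate n (we stay below n = 40); the function names are ours.

```python
import math


def fib_recurrence(n: int) -> int:
    """x_n from x_n = x_{n-1} + x_{n-2}, x_0 = 0, x_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_closed_form(n: int) -> int:
    """x_n = (r1**n - r2**n) / sqrt(5), rounded to absorb float error."""
    s5 = math.sqrt(5)
    r1, r2 = (1 + s5) / 2, (1 - s5) / 2
    return round((r1 ** n - r2 ** n) / s5)


print([fib_recurrence(n) for n in range(10)])
print([fib_closed_form(n) for n in range(10)])
```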
Recurrence relations that are not linear with constant coefficients have no general solution techniques
comparable to the one discussed above. General recurrence relations are solved (or their solutions are
approximated or bounded) by trial-and-error techniques. If the trial and error is guided by some general technique,
it will yield at least a good estimate of the asymptotic behavior of the solution of most recurrence relations.
Example
Consider the recurrence relation
$x_n = a \cdot n + b + \frac{2}{n} \sum_{i=0}^{n-1} x_i$ (*)
with a > 0 and b > 0, which appears often in the average-case analysis of algorithms and data structures. When we
know from the interpretation of this recurrence that its solution is monotonically nondecreasing, a systematic
trial-and-error process leads to the asymptotic behavior of the solution. The simplest possible try is a constant,
xn = c. Substituting this into (*) shows that xn = c is not a solution. Since the left-hand side xn is smaller than an
average of previous values on the right-hand side, the solution of this recurrence relation must grow faster than c.
Next, we try a linear function xn = c · n.
At this stage of the analysis it suffices to focus on the leading terms of each side: c · n on the left and (c + a) · n on
the right. The assumption a > 0 makes the right side larger than the left, and we conclude that a linear function also
grows too slowly to be a solution of the recurrence relation. A new attempt with a function that grows yet faster,
xn = c · n², is substituted in the same way.
Comparing the leading terms on both sides, we find that the left side is now larger than the right, and conclude
that a quadratic function grows too fast. Having bounded the growth rate of the solution from below and above, we
try functions whose growth rate lies between that of a linear and a quadratic function, such as xn = c · n^1.5. A more
sophisticated approach considers a family of functions of the form xn = c · n^(1+ε) for any ε > 0: All of them grow too
fast. This suggests xn = c · n · log2 n. Substituting it yields an equation with remainder terms g(n) ∈ O(n · log n) and
h(n) ∈ O(log n). To match the linear terms on each side, we must choose
c = a · ln 4 ≈ 1.386 · a. Hence we now know that the solution to the recurrence relation (*) has the form
xn = (a · ln 4) · n · log2 n + g(n), where g(n) collects the lower-order terms.
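The constant c = a · ln 4 can be checked numerically. The sketch below assumes that the recurrence has the form x_n = a · n + b + (2/n) · (x_0 + … + x_{n−1}) with x_0 = 0; this form is our assumption, chosen because it reproduces the leading terms c · n and (c + a) · n quoted in the text. If x_n behaves like c · n · log2 n + d · n, then (x_{2n} − 2 · x_n) / (2n) approaches c, which avoids the slow convergence of x_n / (n · log2 n).

```python
import math


def solve(a: float, b: float, n_max: int, x0: float = 0.0) -> list[float]:
    """Tabulate x_0 .. x_{n_max} of the assumed recurrence."""
    x = [x0]
    prefix = x0                        # running sum x_0 + ... + x_{n-1}
    for n in range(1, n_max + 1):
        xn = a * n + b + 2.0 * prefix / n
        x.append(xn)
        prefix += xn
    return x


a, b, n = 1.0, 1.0, 2000
x = solve(a, b, 2 * n)
c_estimate = (x[2 * n] - 2 * x[n]) / (2 * n)
print(c_estimate, a * math.log(4))
```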
Asymptotic performance of divide-and-conquer algorithms
We illustrate the power of the techniques developed in previous sections by analyzing the asymptotic
performance not of a specific algorithm, but rather, of an entire class of divide-and-conquer algorithms. In "Divide
and conquer recursion" we presented the following schema for divide-and-conquer algorithms that partition the set
of data into two parts:
A(D): if simple(D) then return(A0(D))
      else 1. divide: partition D into D1 and D2;
           2. conquer: R1 := A(D1); R2 := A(D2);
           3. combine: return(merge(R1, R2));
Assume further that the data set D can always be partitioned into two halves, D1 and D2, at every level of
recursion. Two comments are appropriate:
1. For repeated halving to be possible it is not necessary that the size n of the data set D be a power of 2, n =
2^k. It is not important that D be partitioned into two exact halves: approximate halves will do. Imagine
padding any data set D whose size is not a power of 2 with dummy elements, up to the next power of 2.
Dummies can always be found that do not disturb the real computation: for example, by replicating
elements or by appending sentinels. Padding is usually just a conceptual trick that may help in
understanding the process, but need not necessarily generate any additional data.
2. Whether or not the divide step is guaranteed to partition D into two approximate halves, on the other hand,
depends critically on the problem and on the data structures used. Example: Binary search in an ordered
array partitions D into halves by probing the element at the midpoint; the same idea is impractical in a
linked list because the midpoint is not directly accessible.
Under our assumption of halving, the time complexity T(n) of algorithm A applied to data D of size n satisfies
the recurrence relation
$T(n) = 2 \cdot T(n/2) + f(n)$
where f(n) is the sum of the partitioning or splitting time and the "stitching time" required to merge two solutions
of size n/2 into a solution of size n. Repeated substitution yields, for n = 2^k,
$T(n) = n \cdot T(1) + \sum_{i=0}^{k-1} 2^i \cdot f(n / 2^i).$
The term n · T(1) expresses the fact that every data item gets looked at, the second sums up the splitting and
stitching time. Three typical cases occur:
(a) Constant time splitting and merging f(n) = c yields
T(n) = (T(1) + c) · n.
Example: Find the maximum of n numbers.
(b) Linear time splitting and merging f(n) = a · n + b yields
T(n) = a · n · log2 n + (T(1) + b) · n.
Examples: Mergesort, quicksort.
(c) Expensive splitting and merging: n ∈ o(f(n)) yields
T(n) = n · T(1) + O(f(n) · log n)
and therefore rarely leads to interesting algorithms.
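Cases (a) and (b) can be checked by evaluating a halving recurrence directly. The sketch below assumes T(n) = 2 · T(n/2) + f(n) with n a power of 2; the function name `T` and the sample constants are ours. Under that assumption the exact values differ from the closed forms quoted above only by an additive constant (−c in case (a), −b in case (b)), which does not affect the asymptotic behavior.

```python
def T(n: int, f, t1: int = 1) -> int:
    """Cost of the halving recurrence for n a power of 2, with T(1) = t1."""
    return t1 if n == 1 else 2 * T(n // 2, f, t1) + f(n)


n, k = 1024, 10                       # n = 2**k

# Case (a): f(n) = c, exact value (T(1) + c) * n - c.
c = 5
print(T(n, lambda m: c), (1 + c) * n - c)

# Case (b): f(n) = a*n + b, exact value a*n*log2(n) + (T(1) + b)*n - b.
a, b = 3, 2
print(T(n, lambda m: a * m + b), a * n * k + (1 + b) * n - b)
```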
Permutations
Inversions
Let (ak: 1 ≤ k ≤ n) be a permutation of the integers 1 .. n. A pair (ai, aj), 1 ≤ i < j ≤ n, is called an inversion iff
ai > aj. What is the average number of inversions in a permutation? Consider all permutations in pairs; that is, with
any permutation A:
a1 = x1; a2 = x2; … ; an = xn
consider its inverse A', which contains the elements of A in inverse order:
a1 = xn; a2 = xn−1; … ; an = x1.
In one of these two permutations xi and xj are in the correct order, in the other, they form an inversion. Since
there are n · (n − 1) / 2 pairs of elements (xi, xj) with 1 ≤ i < j ≤ n there are, on average,
n · (n − 1) / 4
inversions.
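For small n the average number of inversions can be confirmed by brute force over all n! permutations. The following sketch uses our own function names and is only practical for small n.

```python
from itertools import permutations


def inversions(a) -> int:
    """Number of pairs i < j with a[i] > a[j]."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


def average_inversions(n: int) -> float:
    """Average over all permutations of 1..n."""
    perms = list(permutations(range(1, n + 1)))
    return sum(inversions(p) for p in perms) / len(perms)


for n in range(1, 7):
    print(n, average_inversions(n), n * (n - 1) / 4)
```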
Average distance
Let (ak: 1 ≤ k ≤ n) be a permutation of the natural numbers from 1 to n. The distance of the element ai from its
correct position is |ai − i|. The total distance of all elements from their correct positions is
$TD((a_k)) = \sum_{i=1}^{n} |a_i - i|.$
Therefore, the average total distance (i.e. the average over all n! permutations) is
$\frac{1}{n!} \sum_{(a_k)} \sum_{i=1}^{n} |a_i - i|.$
Let 1 ≤ i ≤ n and 1 ≤ j ≤ n. Consider all permutations for which ai is equal to j. Since there are (n − 1)! such
permutations, we obtain
$\sum_{(a_k)} \sum_{i=1}^{n} |a_i - i| = (n-1)! \cdot \sum_{i=1}^{n} \sum_{j=1}^{n} |i - j| = (n-1)! \cdot \frac{n \cdot (n^2 - 1)}{3}.$
Therefore, the average total distance is
$\frac{n^2 - 1}{3};$
the average distance of an element ai from its correct position is therefore
$\frac{n^2 - 1}{3 \cdot n} \approx \frac{n}{3}.$
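The average total distance can likewise be checked by enumeration for small n. The sketch below compares it with (n² − 1) / 3, the value obtained by summing |i − j| over all position/value pairs; exact fractions avoid floating-point comparisons. The function names are ours.

```python
from fractions import Fraction
from itertools import permutations


def total_distance(a) -> int:
    """Sum of |a_i - i| over positions i = 1..n."""
    return sum(abs(ai - i) for i, ai in enumerate(a, start=1))


def average_total_distance(n: int) -> Fraction:
    """Average of total_distance over all permutations of 1..n."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(total_distance(p) for p in perms), len(perms))


for n in range(1, 7):
    print(n, average_total_distance(n), Fraction(n * n - 1, 3))
```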
Trees
Trees are ubiquitous in discrete mathematics and computer science, and this section summarizes some of the
basic concepts, terminology, and results. Although trees come in different versions, in the context of algorithms and
data structures, "tree" almost always means an ordered rooted tree. An ordered rooted tree is either empty or it
consists of a node, called a root, and a sequence of k ordered subtrees T1, T2, … , Tk (Exhibit 16.2). The nodes of an
ordered tree that have only empty subtrees are called leaves or external nodes, the other nodes are called internal
nodes (Exhibit 16.3). The roots of the subtrees attached to a node are its children; and this node is their parent.
Exhibit 16.2: Recursive definition of a rooted, ordered tree.
The level of a node is defined recursively. The root of a tree is at level 0. The children of a node at level t are at
level t + 1. The level of a node is the length of the path from the root of the tree to this node. The height of a tree is
defined as the maximum level of all leaves. The path length of a tree is the sum of the levels of all its nodes (Exhibit
16.3).
Exhibit 16.3: A tree of height = 4 and path length = 35.
A binary tree is an ordered tree whose nodes have at most two children. A 0-2 binary tree is a tree in which
every node has zero or two children but not one. A 0-2 tree with n leaves has exactly n − 1 internal nodes. A binary
tree of height h is called complete (completely balanced) if it has 2^(h+1) − 1 nodes (Exhibit 16.4). A binary tree of
height h is called almost complete if all its leaves are on levels h − 1 and h, and all leaves on level h are as far left as
possible (Exhibit 16.4).
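The definitions of level, height and path length translate directly into code. The sketch below is one possible representation, chosen by us: a node stores the list of its children, a leaf is a node without children, and `complete_binary` builds the complete binary tree of a given height.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: list = field(default_factory=list)


def levels(root: Node, level: int = 0):
    """Yield (level, node) for every node, the root being at level 0."""
    yield level, root
    for child in root.children:
        yield from levels(child, level + 1)


def height(root: Node) -> int:
    """Maximum level of all leaves."""
    return max(lv for lv, node in levels(root) if not node.children)


def path_length(root: Node) -> int:
    """Sum of the levels of all nodes."""
    return sum(lv for lv, _ in levels(root))


def complete_binary(h: int) -> Node:
    """Complete binary tree of height h, with 2**(h + 1) - 1 nodes."""
    if h == 0:
        return Node()
    return Node([complete_binary(h - 1), complete_binary(h - 1)])


t = complete_binary(3)
nodes = [node for _, node in levels(t)]
leaves = [node for node in nodes if not node.children]
print(len(nodes), len(leaves), height(t), path_length(t))
```

A complete binary tree is in particular a 0-2 tree, so the relation between leaves and internal nodes can be read off the same example.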
Exhibit 16.4: Examples of well-balanced binary trees.
Exercises
1. Suppose that we are comparing implementations of two algorithms on the same machine. For inputs of size
n, the first algorithm runs in 9 · n² steps, while the second algorithm runs in 81 · n · log2 n steps. Assuming
that the steps in both algorithms take the same time, for which values of n does the first algorithm beat the
second algorithm?
2. What is the smallest value of n such that an algorithm whose running time is 256 · n² runs faster than an
algorithm whose running time is 2^n on the same machine?
3. For each of the following functions fi(n), determine a function g(n) such that fi(n) ∈ Θ(g(n)). The function
g(n) should be as simple as possible.
f1(n) = 0.001 · n^7 + n² + 2 · n
f2(n) = n · log n + log n + 1234 · n
f3(n) = 5 · n · log n + n² · log n + n²
f4(n) = 5 · n · log n + n³ + n² · log n
4. Prove formally that 1024 · n² + 5 · n ∈ Θ(n²).
5. Give an asymptotically tight bound for the following summation:
6. Find the most general solutions to the following recurrence relations.
7. Solve the recurrence T(n) = 2 · T(√n) + log2 n. Hint: Make a change of variables m = log2 n.
8. Compute the number of inversions and the total distance for the permutation (3 1 2 4).
17. Sorting and its complexity
Learning objectives:
What is sorting?
basic ideas and intrinsic complexity
insertion sort
selection sort
merge sort
distribution sort
a lower bound Ω(n · log n)
Quicksort
Sorting in linear time?
sorting networks
What is sorting? How difficult is it?
The problem
Assume that S is a set of n elements x1, x2, … , xn drawn from a domain X, on which a total order ≤ is defined (i.e.
a relation that satisfies the following axioms):
≤ is reflexive (i.e. ∀ x ∈ X: x ≤ x)
≤ is antisymmetric (i.e. ∀ x, y ∈ X: x ≤ y ∧ y ≤ x ⇒ x = y)
≤ is transitive (i.e. ∀ x, y, z ∈ X: x ≤ y ∧ y ≤ z ⇒ x ≤ z)
≤ is total (i.e. ∀ x, y ∈ X: x ≤ y ∨ y ≤ x)
Sorting is the process of generating a sequence
$x_{i_1}, x_{i_2}, \ldots, x_{i_n}$
such that (i1, i2, … , in) is a permutation of the integers from 1 to n and
$x_{i_1} \le x_{i_2} \le \ldots \le x_{i_n}$
holds. Phrased abstractly, sorting is the problem of finding a specific permutation (or one among a few
permutations, when distinct elements may have equal values) out of n! possible permutations of the n given
elements. Usually, the set S of elements to be sorted will be given in a data structure; in this case, the elements of S
are ordered implicitly by this data structure, but not necessarily according to the desired order ≤. Typical sorting
problems assume that S is given in an array or in a sequential file (magnetic tape), and the result is to be generated
in the same structure. We characterize elements by their position in the structure (e.g. A[i] in the array A or by the
value of a pointer in a sequential file). The access operations provided by the underlying data structure determine
what sorting algorithms are possible.
Algorithms
Most sorting algorithms are refinements of the following idea:
while ∃ (i, j): i < j ∧ A[i] > A[j] do A[i] :=: A[j];
where :=: denotes the exchange operator. Even sorting algorithms that do not explicitly exchange pairs of elements, or do not use an array as the underlying data structure, can usually be thought of as conforming to the schema above. An insertion sort, for example, takes one element at a time and inserts it in its proper place among those already sorted. To find the correct place of insertion, we can think of a ripple effect whereby the new element successively displaces (exchanges position with) all those larger than itself.
As the schema above shows, two types of operations are needed in order to sort:
collecting information about the order of the given elements
ordering the elements (e.g. by exchanging a pair)
When designing an efficient algorithm we seek to economize the number of operations of both types: we try to avoid collecting redundant information, and we hope to move an element as few times as possible. The nondeterministic algorithm given above lets us perform any one of a number of exchanges at a given time, regardless of their usefulness. For example, in sorting the sequence
x_1 = 5, x_2 = 2, x_3 = 3, x_4 = 4, x_5 = 1
the nondeterministic algorithm permits any of seven exchanges
x_1 :=: x_i for 2 ≤ i ≤ 5 and x_j :=: x_5 for 2 ≤ j ≤ 4.
We might have reached the state shown above by following an exotic sorting technique that sorts "from the middle toward both ends", and we might know at this time that the single exchange x_1 :=: x_5 will complete the sort. The nondeterministic algorithm gives us no handle to express and use this knowledge.
The attempt to economize work forces us to depart from nondeterminacy and to impose a control structure that carefully sequences the operations to be performed so as to make maximal use of the information gained so far. The resulting algorithms will be more complex and difficult to understand. It is useful to remember, though, that sorting is basically a simple problem with a simple solution and that all the acrobatics in this chapter are due to our quest for efficiency.
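The nondeterministic schema can be made concrete with a small Python sketch. The function name `exchange_sort` and the use of a seeded random generator to stand in for the nondeterministic choice are assumptions made for illustration; they are not part of the original text. The sketch repeatedly picks an arbitrary out-of-order pair and exchanges it, and it terminates because every such exchange reduces the number of inversions.

```python
import random

def exchange_sort(a, rng=None):
    """Sort list a in place by exchanging arbitrary out-of-order pairs.

    Returns the number of exchanges performed.
    """
    rng = rng or random.Random(0)   # assumption: random choice models nondeterminism
    exchanges = 0
    while True:
        # All pairs (i, j) with i < j and a[i] > a[j], i.e. the permitted exchanges.
        candidates = [(i, j)
                      for i in range(len(a))
                      for j in range(i + 1, len(a))
                      if a[i] > a[j]]
        if not candidates:
            return exchanges
        i, j = rng.choice(candidates)
        a[i], a[j] = a[j], a[i]
        exchanges += 1

seq = [5, 2, 3, 4, 1]
permitted = sum(1 for i in range(len(seq))
                for j in range(i + 1, len(seq)) if seq[i] > seq[j])
print(permitted)   # 7 exchanges are permitted in the initial state
exchange_sort(seq)
print(seq)         # [1, 2, 3, 4, 5]
```

The sketch recomputes all candidate pairs in every round, which is far more work than any practical algorithm would spend; it only illustrates that the schema is correct regardless of which permitted exchange is chosen.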
Intrinsic complexity
There are obvious limits to how much we can economize. In the absence of any previously acquired information, it is clear that each element must be inspected and, in general, moved at least once. Thus we cannot hope to get away with fewer than Ω(n) primitive operations. There are less obvious limits; we mention two of them here.
1. If information is collected by asking binary questions only (any question that may receive one of two answers, e.g. a yes/no question, or a comparison of two elements that yields either ≤ or >), then at least n · log_2 n questions are necessary in general, as will be proved in the section "A lower bound Ω(n · log n)". Thus in this model of computation, sorting requires time Θ(n · log n).
2. In addition to collecting information, one must rearrange the elements. In the section "Permutation" in chapter 16, we have shown that in a permutation the average distance of an element from its correct position is approximately n/3. Therefore elements have to move an average distance of approximately n/3 elements to end up at their destination. Depending on the access operations of the underlying storage structure, an element can be moved to its correct position in a single step of average length n/3, or in n/3 steps of average length 1. If elements are rearranged by exchanging adjacent elements only, then on average Θ(n^2) moving operations are required. Therefore, short steps are insufficient to obtain an efficient Θ(n · log n) sorting algorithm.
Practical aspects of sorting
Records instead of elements. We discuss sorting assuming only that the elements to be sorted are drawn from a totally ordered domain. In practice these elements are just the keys of records that contain additional data associated with the key: for example,
type recordtype = record
  key: keytype; { totally ordered by ≤ }
  data: anytype
end;
We use the relational operators =, <, ≤ to compare keys, but in a given programming language, say Pascal, these may be undefined on values of type keytype. In general, they must be replaced by procedures: for example, when comparing strings with respect to the lexicographic order.
If the key field is only a small part of a large record, the exchange operation :=:, interpreted literally, becomes an unnecessarily costly copy operation. This can be avoided by leaving the record (or just its data field) in place, and only moving a small surrogate record consisting of a key and a pointer to its associated record.
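The surrogate idea can be sketched in Python as follows. The class `Record`, the function `sorted_surrogates`, and the sample payloads are hypothetical names chosen for this illustration; a list index plays the role of the pointer to the associated record.

```python
from dataclasses import dataclass

@dataclass
class Record:
    key: int
    data: str   # stands in for a large payload that we do not want to copy

def sorted_surrogates(records):
    """Return (key, index) surrogates in key order; the records stay in place."""
    surrogates = [(rec.key, idx) for idx, rec in enumerate(records)]
    surrogates.sort()               # only the small surrogates are moved
    return surrogates

records = [Record(42, "payload C"), Record(7, "payload A"), Record(19, "payload B")]
for key, idx in sorted_surrogates(records):
    print(key, records[idx].data)   # visits the records in key order
```

Only the short (key, index) pairs are rearranged; the list `records` itself is never modified.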
Sort generators. On many systems, particularly in the world of commercial data processing, you may never need to write a sorting program, even though sorting is a frequently executed operation. Sorting is taken care of by a sort generator, a program akin to a compiler; it selects a suitable sorting algorithm from its repertoire and tailors it to the problem at hand, depending on parameters such as the number of elements to be sorted, the resources available, the key type, or the length of the records.
Partially sorted sequences. The algorithms we discuss ignore any order that may exist in the sequence to be sorted. Many applications call for sorting files that are almost sorted, for example, the case where a sorted master file is updated with an unsorted transaction file. Some algorithms take advantage of any order present in the input data; their time complexity varies from O(n) for almost sorted files to O(n · log n) for randomly ordered files.
Types of sorting algorithms
Two important classes of incremental sorting algorithms create order by processing each element in turn and placing it in its correct position. These classes, insertion sorts and selection sorts, are best understood as maintaining two disjoint, mutually exhaustive structures called 'sorted' and 'unsorted'.
Initialize: 'sorted' := ∅; 'unsorted' := {x_1, x_2, … , x_n};
Loop: for i := 1 to n do
  move an element from 'unsorted' to its correct place in 'sorted';
The following illustrations show 'sorted' and 'unsorted' sharing an array[1 .. n]. In this case the boundary between 'sorted' and 'unsorted' is represented by an index i that increases as more elements become ordered. The important distinction between the two types of sorting algorithms emerges from the question: In which of the two structures is most of the work done? Insertion sorts remove the first or most easily accessible element from 'unsorted' and search through 'sorted' to find its proper place. Selection sorts search through 'unsorted' to find the next element to be appended to 'sorted'.
Insertion sort
The i-th step inserts the i-th element into the sorted sequence of the first (i − 1) elements (Exhibit 17.1).
Exhibit 17.1: Insertion sorts move an easily accessed element to its correct place.
Selection sort
The i-th step selects the smallest among the n − i + 1 elements not yet sorted, and moves it to the i-th position (Exhibit 17.2).
Exhibit 17.2: Selection sorts search for the correct element to move to an easily accessed place.
Insertion and selection sorts repeatedly search through a large part of the entire data to find the proper place of insertion or the proper element to be moved. Efficient search requires random access, hence these sorting techniques are used primarily for internal sorting in central memory.
Merge sort
Merge sorts process (sub)sequences of elements in unidirectional order and thus are well suited for external sorting on secondary storage media that provide sequential access only, such as magnetic tapes, or random access to large blocks of data, such as disks. Merge sorts are also efficient for internal sorting. The basic idea is to merge two sorted sequences of elements, called runs, into one longer sorted sequence. We read each of the input runs, and write the output run, starting with small elements and ending with the large ones. We keep comparing the smallest of the remaining elements on each input run, and append the smaller of the two to the output run, until both input runs are exhausted (Exhibit 17.3).
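The merge of two runs described above can be sketched in Python. The function name `merge_runs` is an assumption for illustration, and Python lists stand in for the sequentially read input runs and the sequentially written output run.

```python
def merge_runs(run_a, run_b):
    """Merge two sorted lists (runs) into one sorted list."""
    out = []
    i = j = 0
    # Compare the smallest remaining element of each run; append the smaller one.
    while i < len(run_a) and j < len(run_b):
        if run_b[j] < run_a[i]:
            out.append(run_b[j])
            j += 1
        else:
            out.append(run_a[i])
            i += 1
    # One run is exhausted; the rest of the other run is copied unchanged.
    out.extend(run_a[i:])
    out.extend(run_b[j:])
    return out

print(merge_runs([1, 4, 9], [2, 3, 10, 11]))   # [1, 2, 3, 4, 9, 10, 11]
```

Each input run is read strictly from front to back and the output is written strictly from front to back, which is why the technique suits sequential storage.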
Exhibit 17.3: Merge sorts exploit order already present.
The processor shown at left in Exhibit 17.4 reads two tapes, A and B. Tape A contains runs 1 and 2; tape B contains runs 3 and 4. The processor merges runs 1 and 3 into the single run 1 & 3 on tape C, and runs 2 and 4 into the single run 2 & 4 on tape D. In a second merge step, the processor shown at the right reads tapes C and D and merges the two runs 1 & 3 and 2 & 4 into one run, 1 & 3 & 2 & 4.
Exhibit 17.4: Two merge steps in sequence.
Distribution sort
Distribution sorts process the representation of an element as a value in a radix number system and use primitive arithmetic operations such as "extract the k-th digit". These sorts do not compare elements directly. They introduce a different model of computation than the sorts based on comparisons, exchanges, insertions, and deletions that we have considered thus far. As an example, consider numbers with at most three digits in radix 4 representation. In a first step these numbers are distributed among four queues according to their least significant digit, and the queues are concatenated in increasing order. The process is repeated for the middle digit, and finally for the leftmost, most significant digit, as shown in Exhibit 17.5.
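The radix 4 example can be sketched in Python. The function name `distribution_sort`, its parameters, and the sample data are assumptions made for this illustration. Keys are nonnegative integers with at most three radix 4 digits, i.e. values from 0 to 63.

```python
from collections import deque

def distribution_sort(numbers, radix=4, digits=3):
    """Sort nonnegative integers that have at most `digits` digits in the given radix."""
    for k in range(digits):                       # k = 0 is the least significant digit
        queues = [deque() for _ in range(radix)]
        for x in numbers:
            digit = (x // radix ** k) % radix     # "extract the k-th digit"
            queues[digit].append(x)
        # Concatenate the queues in increasing digit order.
        numbers = [x for q in queues for x in q]
    return numbers

data = [27, 3, 62, 16, 0, 45, 16, 9]
print(distribution_sort(data))   # [0, 3, 9, 16, 16, 27, 45, 62]
```

No two keys are ever compared with each other. The method relies on each pass being stable: elements that agree in the current digit keep the relative order established by the earlier passes, which the first-in-first-out queues guarantee.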
Exhibit 17.5: Distribution sorts use the radix representation of keys to organize elements in buckets.
We have now seen the basic ideas on which all sorting algorithms are built. It is more important to understand these ideas than to know dozens of algorithms based on them. To appreciate the intricacy of sorting, you must understand some algorithms in detail: we begin with simple ones that turn out to be inefficient.
Simple sorting algorithms that work in time Θ(n^2)
If you invent your own sorting technique without prior study of the literature, you will probably "discover" a well-known inefficient algorithm that works in time O(n^2), requires time Ω(n^2) in the worst case, and thus is of time complexity Θ(n^2). Your algorithm might be similar to one described below.
Consider in-place algorithms that work on an array declared as
var A: array[1 .. n] of elt;
and place the elements in ascending order. Assume that the comparison operators are defined on values of type elt. Let c_best, c_average, and c_worst denote the number of comparisons, and e_best, e_average, and e_worst the number of exchange operations performed in the best, average, and worst case, respectively. Let inv_average denote the average number of inversions in a permutation.
Insertion sort (Exhibit 17.6)
Let −∞ denote a constant ≤ any key value. The smallest value in the domain often serves as a sentinel −∞.
Exhibit 17.6: Straight insertion propagates a ripple-effect across the sorted part of the array.
A[0] := −∞;
for i := 2 to n do begin
  j := i;
  while A[j] < A[j − 1] do { A[j] :=: A[j − 1]; { exchange }
    j := j − 1 }
end;
This straight insertion sort is a Θ(n) algorithm in the best case and a Θ(n^2) algorithm in the average and worst cases. In the program above, the point of insertion is found by a linear search interleaved with exchanges. A binary search is possible but does not improve the time complexity in the average and worst cases, since the actual insertion still requires a linear-time ripple of exchanges.
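A Python sketch of straight insertion with a sentinel may help to see where the best-case and worst-case behavior comes from. The function name `insertion_sort` and the returned counters are assumptions added for illustration; `-math.inf` plays the role of the sentinel in position 0.

```python
import math

def insertion_sort(keys):
    """Return (sorted list, number of comparisons, number of exchanges)."""
    a = [-math.inf] + list(keys)     # a[0] is the sentinel; the keys occupy a[1..n]
    comparisons = exchanges = 0
    for i in range(2, len(a)):
        j = i
        while True:
            comparisons += 1
            if not a[j] < a[j - 1]:  # the sentinel stops the ripple at position 1
                break
            a[j], a[j - 1] = a[j - 1], a[j]
            exchanges += 1
            j -= 1
    return a[1:], comparisons, exchanges

print(insertion_sort([1, 2, 3, 4, 5]))   # ([1, 2, 3, 4, 5], 4, 0)
print(insertion_sort([5, 4, 3, 2, 1]))   # ([1, 2, 3, 4, 5], 14, 10)
```

For the sorted input of length 5 the sketch performs one comparison per inserted element and no exchange; for the reversed input every element ripples all the way to the front, so the number of exchanges equals the number of inversions, 10.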
Selection sort (Exhibit 17.7)
Exhibit 17.7: Straight selection scans the unsorted part of the array.
for i := 1 to n − 1 do begin
  minindex := i; minkey := A[i];
  for j := i + 1 to n do
    if A[j] < minkey then { minkey := A[j]; minindex := j };
  A[i] :=: A[minindex] { exchange }
end;
$$ c_{best} = c_{average} = c_{worst} = \sum_{i=1}^{n-1} (n - i) = \frac{n^2 - n}{2} $$
The sum in the formula for the number of comparisons reflects the structure of the two nested for loops. The body of the inner loop is executed the same number of times for each of the three cases. Thus this straight selection sort is of time complexity Θ(n^2).
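The claim that straight selection performs the same number of comparisons in every case can be checked with a short Python sketch. The function name `selection_sort` and the comparison counter are assumptions added for illustration.

```python
def selection_sort(keys):
    """Return (sorted list, number of key comparisons)."""
    a = list(keys)
    n = len(a)
    comparisons = 0
    for i in range(n - 1):
        min_index = i
        for j in range(i + 1, n):       # scan the unsorted part a[i+1 .. n-1]
            comparisons += 1
            if a[j] < a[min_index]:
                min_index = j
        a[i], a[min_index] = a[min_index], a[i]   # exchange
    return a, comparisons

print(selection_sort([1, 2, 3, 4, 5, 6]))   # ([1, 2, 3, 4, 5, 6], 15)
print(selection_sort([6, 5, 4, 3, 2, 1]))   # ([1, 2, 3, 4, 5, 6], 15)
```

Both the sorted and the reversed input of length 6 cost 15 = 6 · 5 / 2 comparisons, because the loop bounds do not depend on the data.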
A lower bound Ω(n · log n)
A straightforward counting argument yields a lower bound on the time complexity of any sorting algorithm that collects information about the ordering of the elements by asking only binary questions. A binary question has a two-valued answer: yes or no, true or false. A comparison of two elements, x ≤ y, is the most obvious example, but the following theorem holds for binary questions in general.
Theorem: Any sorting algorithm that collects information by asking binary questions only executes at least
$$ n \cdot \log_2 (n + 1) - \frac{n}{\ln 2} $$
binary questions both in the worst case, and averaged over all n! permutations. Thus the average and worst-case time complexity of such an algorithm is Ω(n · log n).
Proof: A sorting algorithm of the type considered here can be represented by a binary decision tree. Each internal node in such a tree represents a binary question, and each leaf corresponds to a result of the decision process. The decision tree must distinguish each of the n! possible permutations of the input data from all the others, and thus must have at least n! leaves, one for each permutation.
Example: The decision tree shown in Exhibit 17.8 collects the information necessary to sort three elements, x, y and z, by comparisons between two elements.
Exhibit 17.8: The decision tree shows the possible n! outcomes when sorting n elements.
The average number of binary questions needed by a sorting algorithm is equal to the average depth of the leaves of this decision tree. The lemma following this theorem will show that in a binary tree with k leaves the average depth of the leaves is at least log_2 k. Therefore, the average depth of the leaves corresponding to the n! permutations is at least log_2 n!. Since
$$ \log_2 n! \ge n \cdot \log_2 (n + 1) - \frac{n}{\ln 2} $$
it follows that on average at least
$$ n \cdot \log_2 (n + 1) - \frac{n}{\ln 2} $$
binary questions are needed, that is, the time complexity of each such sorting algorithm is Ω(n · log n) in the average, and therefore also in the worst case.
Lemma: In a binary tree with k leaves the average depth of the leaves is at least log_2 k.
Proof: Suppose that the lemma is not true, and let T be the counterexample with the smallest number of nodes. T cannot consist of a single node because the lemma is true for such a tree. If the root r of T has only one child, the subtree T' rooted at this child would contain the k leaves of T that have an even smaller average depth in T' than in T. Since T was the counterexample with the smallest number of nodes, such a T' cannot exist. Therefore, the root r of T must have two children, and there must be k_L > 0 leaves in the left subtree and k_R > 0 leaves in the right subtree of r (k_L + k_R = k). Since T was chosen minimal, the k_L leaves in the left subtree must have an average depth of at least log_2 k_L, and the k_R leaves in the right subtree must have an average depth of at least log_2 k_R. Therefore, the average depth of all k leaves in T must be at least
$$ \frac{k_L}{k_L + k_R} \cdot \log_2 k_L + \frac{k_R}{k_L + k_R} \cdot \log_2 k_R + 1 \qquad (*) $$
It is easy to see that (*) assumes its minimum value if k_L = k_R. Since (*) has the value log_2 k if k_L = k_R = k / 2, we have found a contradiction to our assumption that the lemma is false.
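To get a feeling for the size of the decision-tree bound log_2 n!, the following Python sketch tabulates it next to the estimate n · log_2 n − n / ln 2 suggested by Stirling's formula. The function names and the choice of this particular estimate are assumptions made for illustration, not the book's own derivation.

```python
import math

def log2_factorial(n):
    """Exact value of log2(n!), the minimum average depth of a tree with n! leaves."""
    return math.log2(math.factorial(n))

def stirling_estimate(n):
    """n*log2(n) - n/ln(2): a lower estimate of log2(n!) that follows from n! >= (n/e)^n."""
    return n * math.log2(n) - n / math.log(2)

for n in (3, 10, 100, 1000):
    print(n, round(log2_factorial(n), 1), round(stirling_estimate(n), 1))
```

For n = 3 the bound is log_2 6 ≈ 2.58, so no comparison-based method can sort three elements with fewer than three comparisons in the worst case. For larger n the two columns differ only by a term that grows like log n, so both are dominated by n · log_2 n.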
Quicksort
Quicksort (C. A. R. Hoare, 1962) [Hoa 62] combines the powerful algorithmic principle of divide-and-conquer with an efficient way of moving elements using few exchanges. The divide phase partitions the array into two disjoint parts: the "small" elements on the left and the "large" elements on the right. The conquer phase sorts each part separately. Thanks to the work of the divide phase, the merge phase requires no work at all to combine two partial solutions. Quicksort's efficiency depends crucially on the expectation that the divide phase cuts two sizable subarrays rather than merely slicing off an element at either end of the array (Exhibit 17.9).
Exhibit 17.9: Quicksort partitions the array into the "small" elements on the left and the "large" elements on the right.
We choose an arbitrary threshold value m to define "small" as ≤ m, and "large" as ≥ m, thus ensuring that any "small element" ≤ any "large element". We partition an arbitrary subarray A[L .. R] to be sorted by executing a left-to-right scan (incrementing an index i) "concurrently" with a right-to-left scan (decrementing j) (Exhibit 17.10). The left-to-right scan pauses at the first element A[i] ≥ m, and the right-to-left scan pauses at the first element A[j] ≤ m. When both scans have paused, we exchange A[i] and A[j] and resume the scans. The partition is complete when the two scans have crossed over with j < i. Thereafter, quicksort is called recursively for A[L .. j] and A[i .. R], unless one or both of these subarrays consists of a single element and thus is trivially sorted. Example of partitioning (m = 16):
25 23  3 16  4  7 29  6
 i                    j
 6 23  3 16  4  7 29 25
    i           j
 6  7  3 16  4 23 29 25
          i  j
 6  7  3  4 16 23 29 25
          j  i
Exhibit 17.10: Scanning the array concurrently from left to right and from right to left.
Although the threshold value m appeared arbitrary in the description above, it must meet criteria of correctness and efficiency. Correctness: if either the set of elements ≤ m or the set of elements ≥ m is empty, quicksort fails to terminate. Thus we require that min(x_i) ≤ m ≤ max(x_i). Efficiency requires that m be close to the median.
How do we find the median of n elements? The obvious answer is to sort the elements and pick the middle one, but this leads to a chicken-and-egg problem when trying to sort in the first place. There exist sophisticated algorithms that determine the exact median of n elements in time O(n) in the worst case [BFPRT 72]. The multiplicative constant might be large, but from a theoretical point of view this does not matter. The elements are partitioned into two equal-sized halves, and quicksort runs in time O(n · log n) even in the worst case. From a practical point of view, however, it is not worthwhile to spend much effort in finding the exact median when there are much cheaper ways of finding an acceptable approximation. The following techniques have all been used to pick a threshold m as a "guess at the median":
An array element in a fixed position such as A[(L + R) div 2]. Warning: stay away from either end, A[L] or A[R], as these thresholds lead to poor performance if the elements are partially sorted.
An array element in a random position: a simple technique that yields good results.
The median of three or five array elements in fixed or random positions.
The average between the smallest and largest element. This requires a separate scan of the entire array in the beginning; thereafter, the average for each subarray can be calculated during the previous partitioning process.
The recursive procedure 'rqs' is a possible implementation of quicksort. The function 'guessmedian' must yield a threshold that lies on or between the smallest and largest of the elements to be sorted. If an array element is used as the threshold, the procedure 'rqs' should be changed in such a way that after finishing the partitioning process this element is in its final position between the left and right parts of the array.
procedure rqs (L, R: 1 .. n); { sorts A[L], … , A[R] }
var i, j: 0 .. n + 1;
  procedure partition;
  var m: elt;
  begin { partition }
    m := guessmedian (L, R);
    { min(A[L], … , A[R]) ≤ m ≤ max(A[L], … , A[R]) }
    i := L; j := R;
    repeat
      { A[L], … , A[i − 1] ≤ m ≤ A[j + 1], … , A[R] }
      while A[i] < m do i := i + 1;
      { A[L], … , A[i − 1] ≤ m ≤ A[i] }
      while m < A[j] do j := j − 1;
      { A[j] ≤ m ≤ A[j + 1], … , A[R] }
      if i ≤ j then begin
        A[i] :=: A[j]; { exchange }
        { i ≤ j ⇒ A[i] ≤ m ≤ A[j] }
        i := i + 1; j := j − 1
        { A[L], … , A[i − 1] ≤ m ≤ A[j + 1], … , A[R] }
      end
      else
        { i > j ⇒ i = j + 1 ⇒ exit }
    until i > j
  end; { partition }
begin { rqs }
  partition;
  if L < j then rqs(L, j);
  if i < R then rqs(i, R)
end; { rqs }
An initial call 'rqs(1, n)' with n > 1 guarantees that L < R holds for each recursive call.
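The recursive procedure translates almost line by line into Python. In this sketch the middle element of the subarray serves as the guessed median, which is one of the choices listed above but an assumption as far as 'guessmedian' is concerned; the wrapper `quicksort` and the 0-based indexing are also choices made for this illustration.

```python
def rqs(a, left, right):
    """Sort a[left..right] in place (both bounds inclusive); requires left < right."""
    m = a[(left + right) // 2]      # threshold: the middle element as guessed median
    i, j = left, right
    while i <= j:
        while a[i] < m:             # left-to-right scan pauses at a[i] >= m
            i += 1
        while m < a[j]:             # right-to-left scan pauses at a[j] <= m
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    # The scans have crossed: a[left..j] <= m <= a[i..right].
    if left < j:
        rqs(a, left, j)
    if i < right:
        rqs(a, i, right)

def quicksort(a):
    if len(a) > 1:
        rqs(a, 0, len(a) - 1)

values = [25, 23, 3, 16, 4, 7, 29, 6]
quicksort(values)
print(values)   # [3, 4, 6, 7, 16, 23, 25, 29]
```

Because the threshold is taken from the subarray itself, both scans are guaranteed to stop inside the subarray, so no explicit bounds checks are needed.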
An iterative implementation of quicksort is given by the following procedure, 'iqs', which sorts the whole array A[1 .. n]. The boundaries of the subarrays to be sorted are maintained on a stack.
procedure iqs;
const stacklength = … ;
type stackelement = record L, R: 1 .. n end;
var i, j, L, R, s: 0 .. n + 1;
  stack: array[1 .. stacklength] of stackelement;
  procedure partition; { same as in rqs }
  end; { partition }
begin { iqs }
  s := 1; stack[1].L := 1; stack[1].R := n;
  repeat
    L := stack[s].L; R := stack[s].R; s := s − 1;
    repeat
      partition;
      if j − L < R − i then begin
        if i < R then { s := s + 1; stack[s].L := i; stack[s].R := R };
        R := j
      end
      else begin
        if L < j then { s := s + 1; stack[s].L := L; stack[s].R := j };
        L := i
      end
    until L ≥ R
  until s = 0
end; { iqs }
After partitioning, 'iqs' pushes the bounds of the larger part onto the stack, thus making sure that part will be sorted later, and sorts the smaller part first. Thus the length of the stack is bounded by log_2 n.
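The stack discipline can be sketched in Python as follows. The helper `partition`, the middle-element threshold, and the returned maximum stack depth are assumptions added for illustration so that the logarithmic bound on the stack length can be observed.

```python
import math

def partition(a, left, right):
    """Partition a[left..right] around its middle element; return the crossed (i, j)."""
    m = a[(left + right) // 2]
    i, j = left, right
    while i <= j:
        while a[i] < m:
            i += 1
        while m < a[j]:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    return i, j

def iqs(a):
    """Sort a in place without recursion; return the largest stack depth reached."""
    if len(a) < 2:
        return 0
    stack = [(0, len(a) - 1)]
    max_depth = 1
    while stack:
        left, right = stack.pop()
        while left < right:
            i, j = partition(a, left, right)
            if j - left < right - i:          # left part is smaller: postpone the right
                if i < right:
                    stack.append((i, right))
                right = j
            else:                             # right part is smaller: postpone the left
                if left < j:
                    stack.append((left, j))
                left = i
            max_depth = max(max_depth, len(stack))
    return max_depth

data = [(k * 7919) % 1000 for k in range(1000)]   # a scrambled permutation of 0..999
depth = iqs(data)
print(data == list(range(1000)), depth, math.log2(len(data)))
```

Since the part that is processed immediately is never larger than half of the current subarray, the pending entries shrink geometrically, which is what keeps the stack short.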
For very small arrays, the overhead of managing a stack makes quicksort less efficient than simpler O(n^2) algorithms, such as an insertion sort. A practically efficient implementation of quicksort might switch to another sorting technique for subarrays of size up to 10 or 20. [Sed 78] is a comprehensive discussion of how to optimize quicksort.
Analysis for three cases: best, "typical", and worst
Consider a quicksort algorithm that chooses a guessed median that differs from any of the elements to be sorted and thus partitions the array into two parts, one with k elements, the other with n − k elements. The work q(n) required to sort n elements satisfies the recurrence relation
$$ q(n) = q(k) + q(n - k) + a \cdot n + b \qquad (*) $$
The constant b measures the cost of calling quicksort for the array to be sorted. The term a · n covers the cost of partitioning, and the terms q(k) and q(n − k) correspond to the work involved in quicksorting the two subarrays. Most quicksort algorithms partition the array into three parts: the "small" left part, the single array element used to guess the median, and the "large" right part. Their work is expressed by the equation
$$ q(n) = q(k) + q(n - k - 1) + a \cdot n + b $$
We analyze equation (*); it is close enough to the second equation to have the same asymptotic solution. Quicksort's behavior in the best and worst cases is easy to analyze, but the average over all permutations is not. Therefore, we analyze another average which we call the typical case.
Quicksort's best-case behavior is obtained if we guess the correct median that partitions the array into two equal-sized subarrays. For simplicity's sake the following calculation assumes that n is a power of 2, but this assumption does not affect the solution. Then (*) can be rewritten as
$$ q(n) = 2 \cdot q\left(\frac{n}{2}\right) + a \cdot n + b $$
We use this recurrence equation to calculate
$$ q\left(\frac{n}{2}\right) = 2 \cdot q\left(\frac{n}{4}\right) + a \cdot \frac{n}{2} + b $$
and substitute on the right-hand side to obtain
$$ q(n) = 4 \cdot q\left(\frac{n}{4}\right) + 2 \cdot a \cdot n + 3 \cdot b $$
Repeated substitution yields
$$ q(n) = n \cdot q(1) + a \cdot n \cdot \log_2 n + (n - 1) \cdot b = a \cdot n \cdot \log_2 n + (q(1) + b) \cdot n - b $$
The constant q(1), which measures quicksort's work on a trivially sorted array of length 1, and b, the cost of a single procedure call, do not affect the dominant term n · log_2 n. The constant factor a in the dominant term can be estimated by analyzing the code of the procedure 'partition'. When these details do not matter, we summarize: Quicksort's time complexity in the best case is Θ(n · log n).
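The best-case solution can be checked numerically. The sketch below assumes that the best-case recurrence has the form q(n) = 2 · q(n/2) + a · n + b for n a power of 2, which matches the terms described in the text; the function names and the sample constants are hypothetical.

```python
import math

def best_case_work(n, a, b, q1):
    """Evaluate q(n) = 2*q(n/2) + a*n + b with q(1) = q1, for n a power of 2."""
    if n == 1:
        return q1
    return 2 * best_case_work(n // 2, a, b, q1) + a * n + b

def best_case_closed_form(n, a, b, q1):
    """a*n*log2(n) + (q1 + b)*n - b, obtained by repeated substitution."""
    return a * n * math.log2(n) + (q1 + b) * n - b

for exponent in range(0, 11):
    n = 2 ** exponent
    assert abs(best_case_work(n, 3, 5, 2) - best_case_closed_form(n, 3, 5, 2)) < 1e-6
print(best_case_work(1024, 3, 5, 2))   # agrees with the closed form for n = 1024
```

Changing q1 or b alters only the linear and constant terms, which illustrates why they do not affect the dominant term a · n · log_2 n.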
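Quicksort's worst-case behavior occurs when one of the two subarrays consists of a single element after each partitioning. In this case equation (*) becomes
$$ q(n) = q(n - 1) + q(1) + a \cdot n + b $$
We use this recurrence equation to calculate
$$ q(n - 1) = q(n - 2) + q(1) + a \cdot (n - 1) + b $$
and substitute on the right-hand side to obtain
$$ q(n) = q(n - 2) + 2 \cdot q(1) + a \cdot n + a \cdot (n - 1) + 2 \cdot b $$
Repeated substitution yields
$$ q(n) = n \cdot q(1) + a \cdot \sum_{k=2}^{n} k + (n - 1) \cdot b = a \cdot \frac{n^2 + n - 2}{2} + n \cdot q(1) + (n - 1) \cdot b $$
Therefore the time complexity of quicksort in the worst case is Θ(n^2).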
For the analysis of quicksort's typical behavior we make the plausible assumption that the array is equally likely to get partitioned between any two of its elements: For all k, 1 ≤ k < n, the probability that the array A is partitioned into the subarrays A[1 .. k] and A[k + 1 .. n] is 1 / (n − 1). Then the average work to be performed by quicksort is expressed by the recurrence relation
$$ q(n) = \frac{1}{n - 1} \sum_{k=1}^{n-1} \bigl( q(k) + q(n - k) \bigr) + a \cdot n + b $$
This recurrence relation approximates the recurrence relation discussed in chapter 16 well enough to have the same solution
$$ q(n) \approx (\ln 4) \cdot a \cdot n \cdot \log_2 n $$
Since ln 4 ≈ 1.386, quicksort's asymptotic behavior in the typical case is only about 40% worse than in the best case, and remains in Θ(n · log n). [Sed 77] is a thorough analysis of quicksort.
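The typical-case claim can be explored numerically. The sketch below assumes the recurrence q(n) = (1 / (n − 1)) · Σ (q(k) + q(n − k)) + a · n + b, summed over k = 1 .. n − 1, as suggested by the equal-probability assumption in the text; the function name and the normalization a = 1, b = 0, q(1) = 0 are choices made for illustration.

```python
import math

def typical_work(n_max, a=1.0, b=0.0, q1=0.0):
    """Tabulate q[1..n_max] for the averaged recurrence; q[0] is unused."""
    q = [0.0, q1]
    prefix = q1                        # running sum q[1] + ... + q[n-1]
    for n in range(2, n_max + 1):
        # The two halves of the sum are equal, so it is twice the prefix sum.
        q.append(2.0 * prefix / (n - 1) + a * n + b)
        prefix += q[n]
    return q

q = typical_work(1 << 14)
for n in (1 << 6, 1 << 10, 1 << 14):
    print(n, q[n] / (n * math.log2(n)))   # ratios rise slowly toward ln 4
print(math.log(4))
```

With this normalization the best case costs exactly n · log_2 n for powers of 2, so the printed ratios compare the typical case directly with the best case. They stay below ln 4 ≈ 1.386 and approach it only slowly, because the lower-order linear term is still noticeable at these sizes.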
Merging and merge sorts
The internal sorting algorithms presented so far require direct access to each element. This is reflected in our analyses by treating an array access A[i], or each exchange A[i] :=: A[j], as a primitive operation whose cost is constant (independent of n). This assumption is not valid for elements stored on secondary storage devices such as magnetic tapes or disks. A better assumption that mirrors the realities of external sorting is that the elements to be sorted are stored as a sequential file f. The file is accessed through a file pointer which, at any given time, provides direct access to a single element. Accessing other elements requires repositioning of the file pointer. Sequential files may permit the pointer to advance in one direction only, as in the case of Pascal files, or to move backward and forward. In either case, our theoretical model assumes that the time required for repositioning the pointer is proportional to the distance traveled. This assumption obviously favors algorithms that process (compare, exchange) pairs of adjacent elements, and penalizes algorithms such as quicksort that access elements in random positions.
The following external sorting algorithm is based on the merge sort principle. To make optimal use of the available main memory, the algorithm first creates initial runs; a run is a sorted subsequence of elements f_i, f_{i+1}, … , f_j stored consecutively in file f, with f_k ≤ f_{k+1} for all k with i ≤ k ≤ j − 1. Assume that a buffer of capacity m elements is available in main memory to create initial runs of length m (perhaps less for the last run). In processing the r-th run, r = 0, 1, … , we read the m elements f_{r·m+1}, f_{r·m+2}, … , f_{r·m+m} into memory, sort them internally, and write the sorted sequence to a modified file f, which may or may not reside in the same physical storage area as the original file f. This new file f is partially sorted into runs: f_k ≤ f_{k+1} for all k with r · m + 1 ≤ k < r · m + m.
At this point we need two files, g and h, in addition to the file f, which contains the initial runs. In a copy phase we distribute the initial runs by copying half of them to g, the other half to h. In the subsequent merge phase each run of g is merged with exactly one run of h, and the resulting new run of double length is written onto f (Exhibit 17.11). After the first cycle, consisting of a copy phase followed by a merge phase, f contains half as many runs as it did before. After ⌈log_2(n / m)⌉ cycles f contains one single run, which is the sorted sequence of all elements.
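The copy-merge cycles can be simulated in memory. In the sketch below Python lists stand in for the files f, g and h, the runs are distributed alternately to g and h, and the library function `heapq.merge` stands in for the sequential merge of two runs; all of these, as well as the function name `external_merge_sort`, are assumptions made for illustration.

```python
import heapq
import math

def external_merge_sort(f, m):
    """Return (sorted list, number of copy-merge cycles) for buffer capacity m."""
    # Initial runs: read m elements at a time and sort them internally.
    runs = [sorted(f[start:start + m]) for start in range(0, len(f), m)]
    cycles = 0
    while len(runs) > 1:
        g, h = runs[0::2], runs[1::2]                 # copy phase
        merged = [list(heapq.merge(x, y)) for x, y in zip(g, h)]   # merge phase
        if len(g) > len(h):
            merged.append(g[-1])                      # an unpaired run is carried over
        runs = merged
        cycles += 1
    return (runs[0] if runs else []), cycles

f = [(k * 37) % 100 for k in range(100)]              # a scrambled permutation of 0..99
result, cycles = external_merge_sort(f, m=8)
print(cycles, math.ceil(math.log2(len(f) / 8)))       # 4 4
```

With n = 100 and m = 8 there are 13 initial runs, and the number of runs shrinks as 13, 7, 4, 2, 1 over four cycles.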
Exhibit 17.11: Each copy-merge cycle halves the number of runs and doubles their lengths.
Exercise: a merge sort in main memory
Consider the following procedure that sorts the array A:
const n = … ;
var A: array[1 .. n] of integer;
…
procedure sort (L, R: 1 .. n);
var m: 1 .. n;
  procedure combine;
  var B: array[1 .. n] of integer;
    i, j, k: 1 .. n + 1;
  begin { combine }
    i := L; j := m + 1;
    for k := L to R do
      if (i > m) cor ((j ≤ R) cand (A[j] < A[i])) then
        { B[k] := A[j]; j := j + 1 }
      else
        { B[k] := A[i]; i := i + 1 };
    for k := L to R do A[k] := B[k]
  end; { combine }
begin { sort }
  if L < R then
    { m := (L + R) div 2; sort(L, m); sort(m + 1, R); combine }
end; { sort }
The boolean operators 'cand' and 'cor' are conditional! The procedure is initially called by
sort(1, n);
(a) Draw a picture to show how 'sort' works on an array of eight elements.
(b) Write down a recurrence relation to describe the work done in sorting n elements.
(c) Determine the asymptotic time complexity by solving this recurrence relation.
(d) Assume that 'sort' is called for m subarrays of equal size, not just for two. How does the asymptotic time complexity change?
Solution
(a) 'sort' depends on the algorithmic principle of divide and conquer. After dividing an array into a left and a right subarray whose numbers of elements differ by at most one, 'sort' calls itself recursively on these two subarrays. After these two calls are finished, the procedure 'combine' merges the two sorted subarrays A[L .. m] and A[m + 1 .. R] together in B. Finally, B is copied to A. An example is shown in Exhibit 17.12.
Exhibit 17.12: Sorting an array by using a divide-and-conquer scheme.
(b) The work w(n) performed while sorting n elements satisfies
$$ w(n) = 2 \cdot w\left(\frac{n}{2}\right) + a \cdot n + b \qquad (*) $$
The first term describes the cost of the two recursive calls of 'sort', the term a · n is the cost of merging the two sorted subarrays, and the constant b is the cost of calling 'sort' for the array.
(c) If
$$ w\left(\frac{n}{2}\right) = 2 \cdot w\left(\frac{n}{4}\right) + a \cdot \frac{n}{2} + b $$
is substituted in (*), we obtain
$$ w(n) = 4 \cdot w\left(\frac{n}{4}\right) + 2 \cdot a \cdot n + 3 \cdot b $$
Continuing this substitution process results in
$$ w(n) = n \cdot w(1) + a \cdot n \cdot \log_2 n + (n - 1) \cdot b $$
Since w(1) is constant, the time complexity of 'sort' is Θ(n · log n).
(d) If 'sort' is called recursively for m subarrays of equal size, the cost w'(n) is
$$ w'(n) = m \cdot w'\left(\frac{n}{m}\right) + a \cdot m \cdot n + b $$
Solving this recursive equation shows that, for constant m, the time complexity does not change [i.e. it is Θ(n · log n)].
Is it possible to sort in linear time?
The lower bound Ω(n · log n) has been derived for sorting algorithms that gather information about the ordering of the elements by binary questions and nothing else. This lower bound need not apply in other situations.
Example 1: sorting a permutation of the integers from 1 to n
If we know that the elements to be sorted are a permutation of the integers 1 .. n, it is possible to sort in time Θ(n) by storing element i in the array element with index i.
Example 2: sorting elements from a finite domain
Assume that the elements to be sorted are samples from a finite domain W = 1 .. w. Then it is possible to sort in time Θ(n) if gaps between the elements are allowed (Exhibit 17.13). The gaps can be closed in time Θ(w).
Exhibit 17.13: Sorting elements from a finite domain in linear time.
Do these examples contradict the lower bound Ω(n · log n)? No, because in these examples the information about the ordering of elements is obtained by asking questions more powerful than binary questions: namely, n-valued questions in Example 1 and w-valued questions in Example 2.
A k-valued question is equivalent to log_2 k binary questions. When this "exchange rate" is taken into consideration, the theoretical time complexities of the two sorting techniques above are Θ(n · log n) and Θ(n · log w), respectively, thus conforming to the lower bound in the section "A lower bound Ω(n · log n)".
.orting algorithms that sort in linear time He)pected linear time+ but not in the 5orst caseI are described in the
literature under the terms bucket sort+ distribution sort+ and radi* sort.
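The two examples above can be sketched in a few lines. The following is a minimal illustration in Python rather than in the book's Pascal-like notation; the function names are our own, and the handling of duplicates in the second function (by counting occurrences per slot) is an assumption that goes beyond what Example 2 states.

```python
def sort_permutation(a):
    """Example 1: a is assumed to be a permutation of 1..n."""
    result = [None] * len(a)
    for value in a:
        result[value - 1] = value      # element i goes to the slot with 1-based index i
    return result


def sort_finite_domain(a, w):
    """Example 2: every element of a is assumed to lie in the domain 1..w."""
    count = [0] * (w + 1)              # one slot per domain value; index 0 is unused
    for value in a:                    # one pass over the n elements
        count[value] += 1
    result = []
    for value in range(1, w + 1):      # one pass over the w slots closes the gaps
        result.extend([value] * count[value])
    return result


print(sort_permutation([4, 1, 3, 2]))             # [1, 2, 3, 4]
print(sort_finite_domain([5, 2, 9, 2, 7], 10))    # [2, 2, 5, 7, 9]
```

Neither function ever compares two elements with each other: each element is used directly as an index, which is the "more powerful question" discussed above.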
Sorting networks
The sorting algorithms above are designed to run on a sequential machine in which all operations, such as
comparisons and exchanges, are performed one at a time with a single processor. If algorithms are to be efficient,
they need to be rethought when the ground rules for their execution change: when the theoretician uses another
model of computation, or when they are executed on a computer with a different architecture. This is particularly
true of the many different types of multiprocessor architectures that have been built or conceived. When many
processors are available to share the workload, questions of how to distribute the work among them, how to
synchronize their operation, and how to transport data, prevail. It is not our intention to discuss sorting on
general-purpose parallel machines. We wish to illustrate the point that algorithms must be redesigned when the model of
computation changes. For this purpose a discussion of special-purpose sorting networks suffices. The "processors"
in a sorting network are merely comparators: Their only function is to compare the values on two input wires and
switch them onto two output wires such that the smaller is on top, the larger at the bottom (Exhibit 17.14).
Exhibit 17.14: Building block of sorting networks.
Comparators are arranged into a network in which n wires enter at the left and n wires exit at the right, as
Exhibit 17.15 shows, where each vertical connection joining a pair of wires represents a comparator. The illustration
also shows what happens to four input elements, chosen to be 4, 1, 3, 2 in this example, as they travel from left to
right through the network.
Exhibit 17.15: A comparator network that fails to sort. The output of each
comparator performing an exchange is shown in the ovals.
A network of comparators is a sorting network if it sorts every input configuration. We consider an input
configuration to consist of distinct elements, so that without loss of generality we may regard it as one of the n!
permutations of the sequence (1, 2, … , n). A network that sorts a duplicate-free configuration will also sort a
configuration containing duplicates.
The comparator network above correctly sorts many of its 4! = 24 input configurations, but it fails on the
sequence (4, 1, 3, 2). Hence it is not a sorting network. It is evident that a network with a sufficient number of
comparators in the right places will sort correctly, but as the example above shows, it is not immediately evident
what number suffices or how the comparators should be placed. The network in Exhibit 17.16 shows that five
comparators, arranged judiciously, suffice to sort four elements.
Exhibit 17.16: Five comparators suffice to sort four elements.
How can we tell if a given network sorts successfully? Exhaustive testing is feasible for small networks such as
the one above, where we can trace the flow of all 4! = 24 input configurations. Networks with a regular structure
usually admit a simpler correctness proof. For this example, we observe that c1, c2, and c3 place the smallest element
on the top wire. Similarly, c1, c2, and c4 place the largest on the bottom wire. This leaves the middle two elements on
the middle two wires, which c5 then puts into place.
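Exhaustive testing as described above is easy to mechanize. The following Python sketch represents a network as a list of comparators (i, j) with i < j, where wire 0 is the top wire. The wiring chosen for c1 to c5 is an assumption: it is one layout consistent with the correctness argument just given, not necessarily the exact drawing of the exhibit.

```python
from itertools import permutations


def run_network(network, values):
    """Send values through the comparators in order; smaller goes to wire i, larger to wire j."""
    wires = list(values)
    for i, j in network:
        if wires[i] > wires[j]:
            wires[i], wires[j] = wires[j], wires[i]
    return wires


def is_sorting_network(network, n):
    """Trace all n! input configurations, as suggested for small networks."""
    target = list(range(1, n + 1))
    return all(run_network(network, p) == target for p in permutations(target))


# Assumed wiring: c1 = (0, 1), c2 = (2, 3), c3 = (0, 2), c4 = (1, 3), c5 = (1, 2).
FIVE_COMPARATORS = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

print(is_sorting_network(FIVE_COMPARATORS, 4))        # True
print(is_sorting_network(FIVE_COMPARATORS[:4], 4))    # False: without c5 the middle wires may stay unsorted
```

The cost of this test grows as n!, which is why it is only practical for small n.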
What design principles might lead us to create large sorting networks guaranteed to be correct? Sorting
algorithms designed for a sequential machine cannot, in general, be mapped directly into network notation,
because the network is a more restricted model of computation: Whereas most sequential sorting algorithms make
comparisons based on the outcome of previous comparisons, a sorting network makes the same comparisons for all
input configurations. The same fundamental algorithm design principles useful when designing sequential
algorithms also apply to parallel algorithms.
Divide-and-conquer. Place two sorting networks for n wires next to each other, and combine them into a sorting
network for 2 · n wires by appending a merge network to merge their outputs. In sequential computation merging
is simple because we can choose the most useful comparison depending on the outcome of previous comparisons.
The rigid structure of comparator networks makes merging networks harder to design.
Incremental algorithm. We place an n-th wire next to a sorting network with n − 1 wires, and either precede or
follow the network by a "ladder" of comparators that tie the extra wire into the existing network, as shown in the
following figures. This leads to designs that mirror the straight insertion and selection algorithms in the section
"Simple sorting algorithms that work in time Θ(n²)".
Insertion sort. With the top n − 1 elements sorted, the element on the bottom wire trickles into its correct place.
Induction yields the expanded diagram on the right in Exhibit 17.17.
Exhibit 17.17: Insertion sort leads by induction to the sorting network on the right.
Selection sort. The maximum element first trickles down to the bottom, then the remaining elements are sorted.
The expanded diagram is on the right in Exhibit 17.18.
Exhibit 17.18: Selection sort leads by induction to the sorting network on the right.
Comparators can be shifted along their pair of wires so as to reduce the number of stages, provided that the
topology of the network remains unchanged. This compression reduces both insertion and selection sort to the
triangular network shown in Exhibit 17.19. Thus we see that the distinction between insertion and selection was
more a distinction of sequential order of operations than one of data flow.
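The two incremental constructions can be written down recursively. The Python sketch below (function names are ours; wires are numbered from 0 at the top) builds both networks as lists of comparators (i, j) and checks a necessary condition for the observation above: the two networks use exactly the same comparators, only in a different sequential order.

```python
from itertools import permutations


def insertion_network(n):
    """Sort wires 0..n-2 first; a ladder then lets the bottom element trickle up into place."""
    if n <= 1:
        return []
    ladder = [(i - 1, i) for i in range(n - 1, 0, -1)]
    return insertion_network(n - 1) + ladder


def selection_network(n):
    """A ladder first moves the maximum down to the bottom wire; then wires 0..n-2 are sorted."""
    if n <= 1:
        return []
    ladder = [(i, i + 1) for i in range(n - 1)]
    return ladder + selection_network(n - 1)


def sorts_everything(network, n):
    """Exhaustive check over all n! permutations of 1..n."""
    target = list(range(1, n + 1))
    for p in permutations(target):
        wires = list(p)
        for i, j in network:
            if wires[i] > wires[j]:
                wires[i], wires[j] = wires[j], wires[i]
        if wires != target:
            return False
    return True


n = 5
ins, sel = insertion_network(n), selection_network(n)
print(sorts_everything(ins, n), sorts_everything(sel, n))   # True True
print(len(ins) == n * (n - 1) // 2)                          # True
print(sorted(ins) == sorted(sel))                            # True: same comparators, different order
```

Equality of the comparator multisets is weaker than equality of the network topology, so this is a sanity check rather than a proof of the compression argument.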
Exhibit 17.19: Shifting comparators reduces the number of stages.
Any number of comparators that are aligned vertically require only a single unit of time. The compressed
triangular network has O(n²) comparators, but its time complexity is 2 · n − 1 ∈ O(n). There are networks with
better asymptotic behavior, but they are rather exotic [Knu 73b].
Exercises and programming projects
1. Implement insertion sort, selection sort, merge sort, and quicksort and animate the sorting process for each
of these algorithms: for example, as shown in the snapshots in "Algorithm animation". Compare the
number of comparisons and exchange operations needed by the algorithms for different input
configurations.
2. What is the smallest possible depth of a leaf in a decision tree for a sorting algorithm?
3. Show that 2 · n − 1 comparisons are necessary in the worst case to merge two sorted arrays containing n
elements each.
4. The most obvious method of systematically interchanging the out-of-order pairs of elements in an array
var A: array[1 .. n] of elt;
is to scan adjacent pairs of elements from bottom to top (imagine that the array is drawn vertically, with
A[1] at the top and A[n] at the bottom) repeatedly, interchanging those found out of order:
for i := 1 to n - 1 do
  for j := n downto i + 1 do
    if A[j - 1] > A[j] then A[j - 1] :=: A[j];
This technique is known as bubble sort, since smaller elements "bubble up" to the top.
(a) Explain by words, figures, and an example how bubble sort works. Show that this algorithm sorts
correctly.
(b) Determine the exact number of comparisons and exchange operations that are performed by bubble
sort in the best, average, and worst case.
(c) What is the worst-case time complexity of this algorithm?
5. A sorting algorithm is called stable if it preserves the original order of equal elements. Which of the sorting
algorithms discussed in this chapter is stable?
6. Assume that quicksort chooses the threshold m as the first element of the sequence to be sorted. Show that
the running time of such a quicksort algorithm is Θ(n²) when the input array is sorted in nonincreasing or
nondecreasing order.
7. Find a worst-case input configuration for a quicksort algorithm that chooses the threshold m as the median
of the first, middle, and last elements of the sequence to be sorted.
8. Array A contains m and array B contains n different integers which are not necessarily ordered:
const m = … ; { length of array A }
      n = … ; { length of array B }
var A: array[1 .. m] of integer;
    B: array[1 .. n] of integer;
A duplicate is an integer that is contained in both A and B. Problem: How many duplicates are there in A
and B?
(a) Determine the time complexity of the brute-force algorithm that compares each integer contained in
one array to all integers in the other array.
(b) Write a more efficient
function duplicates: integer;
Your solution may rearrange the integers in the arrays.
(c) What is the worst-case time complexity of your improved algorithm?
Part V: Data structures
The tools of bookkeeping
When thinking of algorithms we emphasize a dynamic sequence of actions: "Take this and do that, then that,
then … ." In human experience, "take" is usually a straightforward operation, whereas "do" means work. In
programming, on the other hand, there are lots of interesting examples where "do" is nothing more complex than
incrementing a counter or setting a bit; but "take" triggers a long, sophisticated search. Why do we need fancy data
structures at all? Why can't we just spread out the data on a desk top? Everyday experience does not prepare us to
appreciate the importance of data structure; it takes programming experience to see that algorithms are nothing
without data structures. The algorithms presented so far were carefully chosen to require only the simplest of data
structures: static arrays. The geometric algorithms of Part VI, on the other hand, and lots of other useful
algorithms, depend on sophisticated data structures for their efficiency.
The key insight in understanding data structures is the recognition that an algorithm in execution is, at all times,
in some state, chosen from a potentially huge state space. The state records such vital information as what steps
have already been taken with what results, and what remains to be done. Data structures are the bookkeepers that
record all this state information in a tidy manner so that any part can be accessed and updated efficiently. The
remarkable fact is that there are a relatively small number of standard data structures that turn out to be useful in
the most varied types of algorithms and problems, and constitute essential knowledge for any programmer.
The literature on data structures. Whereas one can present some algorithms without emphasizing data
structures, as we did in Part III, it appears pointless to discuss data structures without some of the typical
algorithms that use them; at the very least, access and update algorithms form a necessary part of any data
structure. Accordingly, a new data structure is typically published in the context of a particular new algorithm. Only
later, as one notices its general applicability, it may find its way into textbooks. The data structures that have
become standard today can be found in many books, such as [AHU 83], [CLR 90], [GB 91], [HS 82], [Knu 73a],
[Knu 73b], [Meh 84a], [Meh 84c], [RND 77], [Sam 90a], [Sam 90b], [Tar 83], and [Wir 86].
18. What is a data structure?
Learning objectives:
• data structures for manual use (e.g. edge-notched cards)
• general-purpose data structures
• abstract data types specify functional properties only
• data structures include access and maintenance algorithms and their implementation
• performance criteria and measures
• asymptotics
Data structures old and new
The discipline of data structures, as a systematic body of knowledge, is truly a creation of computer science. The
question of how best to organize data was a lot simpler to answer in the days before the existence of computers: the
organization had to be simple, because there was no automatic device that could have processed an elaborate data
structure, and there is no human being with enough patience to do it. Consider two examples.
1. Manual files and catalogs, as used in business offices and libraries, exhibit several distinct organizing
principles, such as sequential and hierarchical order and cross-references. From today's point of view,
however, manual files are not well-defined data structures. For good reasons, people did not rigorously
define those aspects that we consider essential when characterizing a data structure: what constraints are
imposed on the data, both on the structure and its content; what operations the data structure must
support; what constraints these operations must satisfy. As a consequence, searching and updating a
manual file is not typically a process that can be automated: It requires common sense, and perhaps even
expert training, as is the case for a library catalog.
2. In manual computing (with pencil and paper or a nonprogrammable calculator) the algorithm is the focus
of attention, not the data structure. Most frequently, the person computing writes data (input, intermediate
results, output) in any convenient place within his field of vision, hoping to find them again when he needs
them. Occasionally, to facilitate highly repetitive computations (such as income tax declarations), someone
designs a form to prompt the user, one operation at a time, to write each data item into a specific field. Such
a form specifies both an algorithm and a data structure with considerable formality. Compared to the
general-purpose data structures we study in this chapter, however, such forms are highly special purpose.
Edge-notched cards are perhaps the most sophisticated data structures ever designed for manual use. Let us
illustrate them with the example of a database of English words organized so as to help in solving crossword
puzzles. We write one word per card and index it according to which vowels it contains and which ones it does not
contain. Across the top row of the card we punch 10 holes labeled A, E, I, O, U, ¬A, ¬E, ¬I, ¬O, ¬U. When a word,
say ABACA, exhibits a given vowel, such as A, we cut a notch above the hole for A; when it does not, such as E, we
cut a notch above the hole for ¬E (pronounced "not E"). Exhibit 18.1 shows the encoding of the words BEAUTIFUL,
EXETER, OMAHA, OMEGA. For example, we search for words that contain at least one E, but no U, by sticking
two needles through the pack of cards at the holes E and ¬U. EXETER and OMEGA will drop out. In principle it is
easy to make this sample database more powerful by including additional attributes, such as "A occurs exactly
once", "A occurs exactly twice", "A occurs as the first letter in the word", and so on. In practice, a few dozen
attributes and thousands of cards will stretch this mechanical implementation of a multikey data structure to its
limits of feasibility.
Exhibit 18.1: Encoding of different words in edge-notched cards.
In contrast to data structures suitable for manual processing, those developed for automatic data processing can
be complex. Complexity is not a goal in itself, of course, but it may be an unavoidable consequence of the search for
efficiency. Efficiency, as measured by processing time and memory space required, is the primary concern of the
discipline of data structures. Other criteria, such as simplicity of the code, play a role, but the first question to be
asked when evaluating a data structure that supports a specified set of operations is typically: How much time and
space does it require?
In contrast to the typical situation of manual computing (consideration of the algorithm comes first, data gets
organized only as needed), programmed computing typically proceeds in the opposite direction: First we define the
organization of the data rigorously, and from this the structure of the algorithm follows. Thus algorithm design is
often driven by data structure design.
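To see how little machinery the edge-notched cards of this section need once the organization of the data is fixed, the needle query can be mimicked in a few lines of Python. This is only a sketch under our own conventions: the function names are hypothetical, and the hole for "not E" is written "~E" so that plain ASCII suffices.

```python
VOWELS = "AEIOU"


def notches(word):
    """Holes that are notched on the card for this word: 'A' if it contains A, otherwise '~A', etc."""
    return {v if v in word else "~" + v for v in VOWELS}


def needle_query(cards, holes):
    """A card drops out of the pack if it is notched at every hole that holds a needle."""
    return [word for word in cards if set(holes) <= notches(word)]


cards = ["BEAUTIFUL", "EXETER", "OMAHA", "OMEGA"]
print(needle_query(cards, ["E", "~U"]))    # ['EXETER', 'OMEGA']
```

Each card carries exactly one notch per vowel, so every query is a subset test against a fixed-size attribute set.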
The range of data structures studied
We present generally useful data structures along with the corresponding query, update, and maintenance
algorithms; and we develop concepts and techniques designed to organize a vast body of knowledge into a coherent
whole. Let us elaborate on both of these goals.
"Generally useful" refers to data structures that occur naturally in many applications. They are relatively simple
from the point of view of the operations they support: tables and queues of various types are typical examples.
These basic data structures are the building blocks from which an applications programmer may construct more
elaborate structures tailored to her particular application. Although our collection of specific data structures is
rather small, it covers the great majority of techniques an applications programmer is likely to need.
We develop a unified scheme for understanding many data structures as special cases of general concepts. This
includes:
• The separation of abstract data types, which specify only functional properties, from data structures, which
also involve aspects of implementation
• The classification of all data structures into three major types: implicit data structures, lists, and address
computation
• A rough assessment of the performance of data structures based on the asymptotic analysis of time and
memory requirements
The simplest and most common assumption about the elements to be stored in a data structure is that they
belong to a domain on which a total order ≤ is defined. Examples: integers ordered by magnitude, a character set
with its alphabetic order, character strings of bounded length ordered lexicographically. We assume that each
element in a domain requires as much storage as any other element in that domain; in other words, that a data
structure manages memory fragments of fixed size. Data objects of greatly variable size or length, such as fragments
of text, are typically not considered to be "elements"; instead, they are broken into constituent pieces of fixed size,
each of which becomes an element of the data structure.
The elements stored in a data structure are often processed according to the order ≤ defined on their domain.
The topic of sorting, which we surveyed in "Sorting and its complexity", is closely related to the study of data
structures: Indeed, several sorting algorithms appear "for free" in "List structures", because every structure that
implements the abstract data type dictionary leads to a sorting algorithm by successive insertion of elements,
followed by a traversal.
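The remark that a dictionary yields a sorting algorithm "for free" can be illustrated with the simplest ordered structure, a sorted array. The Python sketch below uses the standard `bisect` module for the insertion step; the function name is ours, and the choice of a sorted array is only one possible dictionary implementation.

```python
import bisect


def sort_by_successive_insertion(elements):
    """Insert every element into an ordered-array dictionary, then traverse it in order."""
    table = []                          # invariant: table is sorted after every insertion
    for x in elements:
        bisect.insort(table, x)         # binary search for the position, then shift to make room
    return list(table)                  # the traversal visits the elements in the order ≤


print(sort_by_successive_insertion([31, 4, 15, 9, 26]))    # [4, 9, 15, 26, 31]
```

With this particular structure each insertion may shift up to n elements, so the resulting sort takes quadratic time in the worst case; a structure with faster insertion gives a correspondingly faster sort.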
!erformance criteria and measures
The design o% data structures is dominated by considerations o% e%%iciency+ speci%ically 5ith respect to time and
memory. But e%%iciency is a multi%aceted 0uality not easily de%ined and measured. As a scienti%ic discipline+ the
study o% data structures is not directly concerned 5ith the number o% microseconds+ machine cycles+ or bytes
re0uired by a speci%ic program processing a given set o% data on a particular system. $t is concerned 5ith general
statements %rom 5hich an e)pert practitioner can predict concrete outcomes %or a speci%ic processing task. Thus+
measuring run times and memory usage is not the typical 5ay to evaluate data structures. :e need concepts and
notations %or e)pressing the per%ormance o% an algorithm independently o% machine speed+ memory si6e+
programming language+ and operating system+ and a host o% other details that vary %rom run to run.
The solution to this problem emerged over the past two decades as the discipline of computational complexity
was developed. In this theory, algorithms are "executed" on some "mathematical machine", carefully designed to be
as simple as possible to reflect the bare essentials of a problem. The machine makes available certain primitive
operations, and we measure "time" by counting how many of those are executed. For a given algorithm and all the
data sets it accepts as input, we analyze the number of primitive operations executed as a function of the size of the
data. We are often interested in the worst case, that is, a data set of given size that causes the algorithm to run as
long as possible, and the average case, the run time averaged over all data sets of a given size.
Among the many different mathematical machines that have been defined in the theory of computation, data
structures are evaluated almost exclusively with respect to a theoretical random access machine (RAM). A RAM is
essentially a memory with as many locations as needed, each of which can hold a data element, such as an integer,
or a real number; and a processing unit that can read from any one or two locations, operate on their content, and
write the result back into a third location, all in one time unit. This model is rather close to actual sequential
computers, except that it incorporates no bounds on the memory size, either in terms of the number of locations or
the size of the content of each location. It implies, for example, that a multiplication of two very large numbers
requires no more time than 2 · 3 does. This assumption is unrealistic for certain problems, but is an excellent one
for most program runs that fit in central memory and do not require variable-precision arithmetic or
variable-length data elements. The point is that the programmer has to understand the model and its assumptions, and
bears responsibility for applying it judiciously.
In this model, time and memory requirements are expressed as functions of input data size, and thus comparing
the performance of two data structures is reduced to comparing functions. Asymptotics has proven to be just the
right tool for this comparison: sharp enough to distinguish different growth rates, blunt enough to ignore constant
factors that differ from machine to machine.
As an example of the concise descriptions made possible by asymptotic operation counts, the following table
evaluates several implementations for the abstract data type 'dictionary'. The four operations 'find', 'insert', 'delete',
and 'next' (with respect to the order ≤) exhibit different asymptotic time requirements for the different
implementations. The student should be able to explain and derive this table after studying this part of the book.
          Ordered array   Linear list   Balanced tree   Hash table
find      O(log n)        O(n)          O(log n)        O(1) (a)
next      O(1)            O(1)          O(log n)        O(n)
insert    O(n)            O(n)          O(log n)        O(1) (a)
delete    O(n)            O(n)          O(log n)        O(1) (b)
(a) On the average, but not necessarily in the worst case
(b) Deletions are possible but may degrade performance
Exercise
1. Describe the manual data structures that have been developed to organize libraries (e.g. catalogs that allow
users to get access to the literature in their field of interest, or circulation records, which keep track of who
has borrowed what book). Give examples of queries that can be answered by these data structures.
19. Abstract data types
Learning objectives:
• data abstraction
• abstract data types as a tool to describe the functional behavior of data structures
• examples of abstract data types: stack, fifo queue, priority queue, dictionary, string
Concepts: What and why?
A data structure organizes the data to be processed in such a way that the relations among the data elements are
reflected and the operations to be performed on the data are supported. How these goals can be achieved efficiently
is the central issue in data structures and a major concern of this book. In this chapter, however, we ask not how
but what. In particular, we ask: what is the exact functional behavior a data structure must exhibit to be called a
stack, a queue, or a dictionary or table?
There are several reasons for seeking a formal functional specification for common data structures. The primary
motivation is increased generality through abstraction; specifically, to separate input/output behavior from
implementation, so that the implementation can be changed without affecting any program that uses a particular
data type. This goal led to the earlier introduction of the concept of type in programming languages: the type real is
implemented differently on different machines, but usually a program using reals does not require modification
when run on another machine. A secondary motivation is the ability to prove general theorems about all data
structures that exhibit certain properties, thus avoiding the need to verify the theorem in each instance. This goal is
akin to the one that sparked the development of algebra: from the axioms that define a field, we prove theorems
that hold equally true for real or complex numbers as well as quaternions.
The primary motivation can be further explained by calling on an analogy between data and programs. All
programming languages support the concept of procedural abstraction: operations or algorithms are isolated in
procedures, thus making it easy to replace or change them without affecting other parts of the program. Other
program parts do not know how a certain operation is realized; they know only how to call the corresponding
procedure and what effect the procedure call will have. Modern programming languages increasingly support the
analogous concept of data abstraction or data encapsulation: the organization of data is encapsulated (e.g. in a
module or a package) so that it is possible to change the data structure without having to change the whole
program.
The secondary motivation for formal specification of data types remains an unrealized goal: although abstract
data types are an active topic for theoretical research, it is difficult today to make the case that any theorem of use
to programmers has been proved.
An abstract data type consists of a domain from which the data elements are drawn, and a set of operations.
The specification of an abstract data type must identify the domain and define each of the operations. Identifying
and describing the domain is generally straightforward. The definition of each operation consists of a syntactic and
a semantic part. The syntactic part, which corresponds to a procedure heading, specifies the operation's name and
the type of each operand. We present the syntax of operations in mathematical function notation, specifying its
domain and range. The semantic part attaches a meaning to each operation: what values it produces or what effect
it has on its environment. We specify the semantics of abstract data types algebraically by axioms from which other
properties may be deduced. This formal approach has the advantage that the operations are defined rigorously for
any domain with the required properties. A formal description, however, does not always appeal to intuition, and
often forces us to specify details that we might prefer to ignore. When every detail matters, on the other hand, a
formal specification is superior to a precise specification in natural language; the latter tends to become
cumbersome and difficult to understand, as it often takes many words to avoid ambiguity.
In this chapter we consider the abstract data types: stack, first-in-first-out queue, priority queue, and dictionary.
For each of these data types, there is an ideal, unbounded version, and several versions that reflect the realities of
finite machines. From a theoretical point of view we only need the ideal data types, but from a practical point of
view, that doesn't tell the whole story: in order to capture the different properties a programmer intuitively
associates with the vague concept "stack", for example, we are forced into specifying different types of stacks. In
addition to the ideal unbounded stack, we specify a fixed-length stack which mirrors the behavior of an array
implementation, and a variable-length stack which mirrors the behavior of a list implementation. Similar
distinctions apply to the other data types, but we only specify their unbounded versions.
Let X denote the domain from which the data elements are drawn. Stacks and fifo queues make no assumptions
about X; priority queues and dictionaries require that a total order ≤ be defined on X. Let S = X* be the set of
possible states of a stack, let s = x1 x2 … xk ∈ S be an arbitrary stack state with k elements, and let λ denote the empty
state of the stack, corresponding to the null string in X*. Let 'cat' denote string concatenation. Define the functions
create: → S
empty: S → {true, false}
push: S × X → S
top: S − {λ} → X
pop: S − {λ} → S
as follows:
∀ s ∈ S, ∀ x, y ∈ X:
create = λ
empty(λ) = true
s ≠ λ ⇒ empty(s) = false
push(s, y) = s cat y = x1 x2 … xk y
s ≠ λ ⇒ top(s) = xk
s ≠ λ ⇒ pop(s) = x1 x2 … xk−1
This definition refers explicitly to the contents of the stack. If we prefer to hide the contents and refer only to
operations and their results, we are led to another style of formal definition of abstract data types that expresses the
semantics of the operations by relating them to each other rather than to the explicitly listed contents of a data
structure. This is the commonly used approach to define abstract data types, and we follow it for the rest of this
chapter.
Let S be a set and s0 ∈ S a distinguished state. s0 denotes the empty stack, and S is the set of stack states that can
be obtained from the empty stack by performing finite sequences of 'push' and 'pop' operations. The following
functions represent stack operations:
create: → S
empty: S → {true, false}
push: S × X → S
top: S − {s0} → X
pop: S − {s0} → S
The semantics of the stack operations is specified by the following axioms:
∀ s ∈ S, ∀ x ∈ X:
(1) create = s0
(2) empty(s0) = true
(3) empty(push(s, x)) = false
(4) top(push(s, x)) = x
(5) pop(push(s, x)) = s
These axioms can be described in natural language as follows:
(1) 'create' produces a stack in the distinguished state.
(2) The distinguished state is empty.
(3) A stack is not empty after an element has been inserted.
(4) The element most recently inserted is on top of the stack.
(5) 'pop' is the inverse of 'push'.
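The five axioms can be checked mechanically against any candidate implementation. The following Python sketch models stack states as immutable tuples, which is our own modeling choice and not part of the specification; the empty tuple plays the role of the distinguished state s0, and 'top' and 'pop' refuse the empty state because they are defined only on S − {s0}.

```python
S0 = ()                                  # the distinguished (empty) state


def create():
    return S0


def empty(s):
    return s == S0


def push(s, x):
    return s + (x,)                      # returns a new state; s itself is unchanged


def top(s):
    assert not empty(s), "top is undefined on the empty stack"
    return s[-1]


def pop(s):
    assert not empty(s), "pop is undefined on the empty stack"
    return s[:-1]


def check_axioms(s, x):
    """Evaluate axioms (1) to (5) for one stack state s and one element x."""
    assert create() == S0                # (1)
    assert empty(S0)                     # (2)
    assert not empty(push(s, x))         # (3)
    assert top(push(s, x)) == x          # (4)
    assert pop(push(s, x)) == s          # (5)
    return True


print(check_axioms(push(push(create(), "a"), "b"), "c"))    # True
```

Such a check can only sample finitely many states; it illustrates the axioms but does not replace a proof that an implementation satisfies them.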
Notice that 'create' plays a different role from the other stack operations: it is merely a mechanism for causing a
stack to come into existence, and could have been omitted by postulating the existence of a stack in state s0. In any
implementation, however, there is always some code that corresponds to 'create'. Technical note: we could identify
'create' with s0, but we choose to make a distinction between the act of creating a new empty stack and the empty
state that results from this creation; the latter may recur during normal operation of the stack.
Reduced sequences
Any s ∈ S is obtained from the empty stack s0 by performing a finite sequence of 'push' and 'pop' operations. By axiom (5) this sequence can be reduced to a sequence that transforms s0 into s and consists of 'push' operations only.
Example
  s = pop(push(pop(push(push(s0, x), y)), z))
    = pop(push(push(s0, x), z))
    = push(s0, x)
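The reduction to a push-only sequence can be sketched in a few lines of Python (used here for brevity in place of the Pascal dialect of the text). The function name reduce_stack_ops and the tuple encoding of operations are assumptions made for this illustration only; the sketch cancels each 'pop' against the most recent 'push', which is what axiom (5) permits.

```python
def reduce_stack_ops(ops):
    """Reduce operations applied to the empty stack s0 to pushes only.

    ops is a list of ("push", x) or ("pop",) tuples (a hypothetical
    encoding).  The result lists the pushed values, oldest first.
    """
    pushes = []
    for op in ops:
        if op[0] == "push":
            pushes.append(op[1])
        elif op[0] == "pop":
            if not pushes:
                raise ValueError("pop is undefined on the empty stack s0")
            pushes.pop()  # axiom (5): pop(push(s, x)) = s
        else:
            raise ValueError("unknown operation: %r" % (op[0],))
    return pushes


# pop(push(pop(push(push(s0, x), y)), z)) reduces to push(s0, x)
example = [("push", "x"), ("push", "y"), ("pop",), ("push", "z"), ("pop",)]
print(reduce_stack_ops(example))  # ['x']
```

The list returned is itself the reduced sequence: pushing its values in order onto s0 yields the same state s.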
An implementation of a stack may provide the following procedures:
  procedure create(var s: stack);
  function empty(s: stack): boolean;
  procedure push(var s: stack; x: elt);
  function top(s: stack): elt;
  procedure pop(var s: stack);
Any program that uses this data type is restricted to calling these five procedures for creating and operating on stacks; it is not allowed to use information about the underlying implementation. The procedures may only be called within the constraints of the specification; for example, 'top' and 'pop' may be called only if the stack is not empty:
  if not empty(s) then pop(s);
The specification above assumes that a stack can grow without a bound; it defines an abstract data type called unbounded stack. However, any implementation imposes some bound on the size (depth) of a stack, for example the size of the underlying array in an array implementation. Other specifications reflect such limitations. The following fixed-length stack describes an implementation as an array of fixed size m, which limits the maximal stack depth.
Fixed-length stack
  create: → S
  empty: S → {true, false}
  full: S → {true, false}
  push: {s ∈ S: not full(s)} × X → S
  top: S − {s0} → X
  pop: S − {s0} → S
To specify the behavior of the function 'full' we need an internal function
  depth: S → {0, 1, 2, … , m}
that measures the stack depth, that is, the number of elements currently in the stack. The function 'depth' interacts with the other functions in the following axioms, which specify the stack semantics:
  ∀ s ∈ S, ∀ x ∈ X:
  create = s0
  empty(s0) = true
  not full(s) ⇒ empty(push(s, x)) = false
  depth(s0) = 0
  not empty(s) ⇒ depth(pop(s)) = depth(s) − 1
  not full(s) ⇒ depth(push(s, x)) = depth(s) + 1
  full(s) = (depth(s) = m)
  not full(s) ⇒ top(push(s, x)) = x
  not full(s) ⇒ pop(push(s, x)) = s
Variable-length stack
A stack implemented as a list may overflow at unpredictable moments depending on the contents of the entire memory, not just of the stack. We specify this behavior by postulating a function 'space-available'. It has no domain and thus acts as an oracle that chooses its value independently of the state of the stack (if we gave 'space-available' a domain, this would have to be the set of states of the entire memory).
  create: → S
  empty: S → {true, false}
  space-available: → {true, false}
  push: S × X → S
  top: S − {s0} → X
  pop: S − {s0} → S
  ∀ s ∈ S, ∀ x ∈ X:
  create = s0
  empty(s0) = true
  space-available ⇒ empty(push(s, x)) = false
  space-available ⇒ top(push(s, x)) = x
  space-available ⇒ pop(push(s, x)) = s
Implementation
We have seen that abstract data types cannot capture our intuitive, vague concept of a stack in one single model. The rigor enforced by the formal definition makes us aware that there are different types of stacks with different behavior (quite apart from the issue of the domain type X, which specifies what type of elements are to be stored). This clarity is an advantage whenever we attempt to process abstract data types automatically; it may be a disadvantage for human communication, because a rigorous definition may force us to (over)specify details.
The different types of stacks that we have introduced are directly related to different styles of implementation. The fixed-length stack, for example, describes the following implementation:
  const m = … ; { maximum length of a stack }
  type elt = … ;
       stack = record
         a: array[1 .. m] of elt;
         d: 0 .. m; { current depth of stack }
       end;
  procedure create(var s: stack);
  begin s.d := 0 end;
  function empty(s: stack): boolean;
  begin return(s.d = 0) end;
  function full(s: stack): boolean;
  begin return(s.d = m) end;
  procedure push(var s: stack; x: elt); { not to be called if the stack is full }
  begin s.d := s.d + 1; s.a[s.d] := x end;
  function top(s: stack): elt; { not to be called if the stack is empty }
  begin return(s.a[s.d]) end;
  procedure pop(var s: stack); { not to be called if the stack is empty }
  begin s.d := s.d - 1 end;
Since the function 'depth' is not exported (i.e. not made available to the user of this data type), it need not be provided as a procedure. Instead, we have implemented it as a variable d which also serves as a stack pointer.
Our implementation assumes that the user checks that the stack is not full before calling 'push', and that it is not empty before calling 'top' or 'pop'. We could, of course, write the procedures 'push', 'top', and 'pop' so as to "protect themselves" against illegal calls on a full or an empty stack simply by returning an error message to the calling program. This requires adding a further argument to each of these three procedures and leads to yet other types of stacks which are formally different abstract data types from the ones we have discussed.
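One way such self-protecting procedures might look is sketched below in Python rather than in the Pascal dialect of the text. The class name BoundedStack and the convention of reporting success through a boolean result (instead of a further argument) are assumptions made for this illustration; they are not part of the specification above.

```python
class BoundedStack:
    """Fixed-length stack whose operations report illegal calls (sketch)."""

    def __init__(self, m):
        self.m = m                # maximum depth
        self.a = [None] * m       # storage cells
        self.d = 0                # current depth, also the stack pointer

    def empty(self):
        return self.d == 0

    def full(self):
        return self.d == self.m

    def push(self, x):
        """Return True on success, False (stack unchanged) if full."""
        if self.full():
            return False
        self.a[self.d] = x
        self.d += 1
        return True

    def top(self):
        """Return (ok, value); ok is False if the stack is empty."""
        if self.empty():
            return (False, None)
        return (True, self.a[self.d - 1])

    def pop(self):
        """Return True on success, False (stack unchanged) if empty."""
        if self.empty():
            return False
        self.d -= 1
        return True


s = BoundedStack(2)
print(s.push(1), s.push(2), s.push(3))  # True True False
print(s.top())                          # (True, 2)
```

Because every operation is now total, this stack obeys a different set of axioms from the fixed-length stack specified earlier, which leaves 'push' on a full stack undefined.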
First-in-first-out queue
The following operations (Exhibit 19.2) are defined for the abstract data type fifo queue (first-in-first-out queue):
  empty    Return true if the queue is empty.
  enqueue  Insert a new element at the tail end of the queue.
  front    Return the front element of the queue.
  dequeue  Remove the front element.
Exhibit 19.2: Elements are inserted at the tail and removed from the head of the fifo queue.
Let F be the set of queue states that can be obtained from the empty queue by performing finite sequences of 'enqueue' and 'dequeue' operations. f0 ∈ F denotes the empty queue. The following functions represent fifo queue operations:
  create: → F
  empty: F → {true, false}
  enqueue: F × X → F
  front: F − {f0} → X
  dequeue: F − {f0} → F
The semantics of the fifo queue operations is specified by the following axioms:
  ∀ f ∈ F, ∀ x ∈ X:
  (1) create = f0
  (2) empty(f0) = true
  (3) empty(enqueue(f, x)) = false
  (4) front(enqueue(f0, x)) = x
  (5) not empty(f) ⇒ front(enqueue(f, x)) = front(f)
  (6) dequeue(enqueue(f0, x)) = f0
  (7) not empty(f) ⇒ dequeue(enqueue(f, x)) = enqueue(dequeue(f), x)
Any f ∈ F is obtained from the empty fifo queue f0 by performing a finite sequence of 'enqueue' and 'dequeue' operations. By axioms (6) and (7) this sequence can be reduced to a sequence consisting of 'enqueue' operations only which also transforms f0 into f.
Example
  f = dequeue(enqueue(dequeue(enqueue(enqueue(f0, x), y)), z))
    = dequeue(enqueue(enqueue(dequeue(enqueue(f0, x)), y), z))
    = dequeue(enqueue(enqueue(f0, y), z))
    = enqueue(dequeue(enqueue(f0, y)), z)
    = enqueue(f0, z)
An implementation of a fifo queue may provide the following procedures:
  procedure create(var f: fifoqueue);
  function empty(f: fifoqueue): boolean;
  procedure enqueue(var f: fifoqueue; x: elt);
  function front(f: fifoqueue): elt;
  procedure dequeue(var f: fifoqueue);
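The fifo queue axioms (6) and (7) can be read as rewrite rules that push every 'dequeue' inward until it disappears. The Python sketch below (Python is used instead of the Pascal dialect of the text) applies them mechanically to terms encoded as nested tuples. The encoding and the names F0, enqueue, dequeue, and normalize are assumptions made for this illustration.

```python
F0 = ("f0",)  # the empty queue


def enqueue(f, x):
    return ("enqueue", f, x)


def dequeue(f):
    return ("dequeue", f)


def normalize(term):
    """Rewrite a queue term into an equivalent enqueue-only term."""
    if term == F0:
        return F0
    if term[0] == "enqueue":
        return ("enqueue", normalize(term[1]), term[2])
    # term is dequeue(g): first bring g into enqueue-only form
    g = normalize(term[1])
    if g == F0:
        raise ValueError("dequeue is undefined on the empty queue f0")
    _, f, x = g
    if f == F0:
        return F0                                    # axiom (6)
    return ("enqueue", normalize(dequeue(f)), x)     # axiom (7)


term = dequeue(enqueue(dequeue(enqueue(enqueue(F0, "x"), "y")), "z"))
print(normalize(term) == enqueue(F0, "z"))  # True
```

Each use of axiom (7) moves a 'dequeue' one level closer to f0, where axiom (6) removes it, so the rewriting terminates.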
Priority queue
A priority queue orders the elements according to their value rather than their arrival time. Thus we assume that a total order ≤ is defined on the domain X. In the following examples, X is the set of integers; a small integer means high priority. The following operations (Exhibit 19.3) are defined for the abstract data type priority queue:
  empty   Return true if the queue is empty.
  insert  Insert a new element into the queue.
  min     Return the element of highest priority contained in the queue.
  delete  Remove the element of highest priority from the queue.
Exhibit 19.3: An element's priority determines its position in a priority queue.
Let P be the set of priority queue states that can be obtained from the empty queue by performing finite sequences of 'insert' and 'delete' operations. The empty priority queue is denoted by p0 ∈ P. The following functions represent priority queue operations:
  create: → P
  empty: P → {true, false}
  insert: P × X → P
  min: P − {p0} → X
  delete: P − {p0} → P
The semantics of the priority queue operations is specified by the following axioms. For x, y ∈ X, the function MIN(x, y) returns the smaller of the two values.
  ∀ p ∈ P, ∀ x ∈ X:
  (1) create = p0
  (2) empty(p0) = true
  (3) empty(insert(p, x)) = false
  (4) min(insert(p0, x)) = x
  (5) not empty(p) ⇒ min(insert(p, x)) = MIN(x, min(p))
  (6) delete(insert(p0, x)) = p0
  (7) not empty(p) ⇒ delete(insert(p, x)) = p                      if x ≤ min(p),
                                           = insert(delete(p), x)  else
Any p ∈ P is obtained from the empty queue p0 by a finite sequence of 'insert' and 'delete' operations. By axioms (6) and (7) any such sequence can be reduced to a shorter one that also transforms p0 into p and consists of 'insert' operations only.
Example
Assume that x < z, y < z.
  p = delete(insert(delete(insert(insert(p0, x), z)), y))
    = delete(insert(insert(delete(insert(p0, x)), z), y))
    = delete(insert(insert(p0, z), y))
    = insert(p0, z)
An implementation of a priority queue may provide the following procedures:
  procedure create(var p: priorityqueue);
  function empty(p: priorityqueue): boolean;
  procedure insert(var p: priorityqueue; x: elt);
  function min(p: priorityqueue): elt;
  procedure delete(var p: priorityqueue);
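The two cases of the priority queue axiom (7) are easiest to follow when applied mechanically. The Python sketch below (Python rather than the Pascal dialect of the text) rewrites a term into insert-only form; the tuple encoding and the names P0, insert, delete, minimum, and normalize are assumptions made for this illustration.

```python
P0 = ("p0",)  # the empty priority queue


def insert(p, x):
    return ("insert", p, x)


def delete(p):
    return ("delete", p)


def minimum(p):
    """min of a non-empty insert-only term, by axioms (4) and (5)."""
    _, q, x = p
    return x if q == P0 else min(x, minimum(q))


def normalize(term):
    """Rewrite a priority queue term into an insert-only term."""
    if term == P0:
        return P0
    if term[0] == "insert":
        return ("insert", normalize(term[1]), term[2])
    g = normalize(term[1])           # term is delete(g)
    if g == P0:
        raise ValueError("delete is undefined on the empty queue p0")
    _, p, x = g
    if p == P0:
        return P0                                  # axiom (6)
    if x <= minimum(p):
        return p                                   # axiom (7), x is the minimum
    return ("insert", normalize(delete(p)), x)     # axiom (7), else case


# The example of the text with x = 1, y = 2, z = 3 (so x < z and y < z).
term = delete(insert(delete(insert(insert(P0, 1), 3)), 2))
print(normalize(term) == insert(P0, 3))  # True
```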
Dictionary
Whereas stacks and fifo queues are designed to retrieve and process elements depending on their order of arrival, a dictionary (or table) is designed to process elements exclusively by their value (name). A priority queue is a hybrid: insertion is done according to value, as in a dictionary, and deletion according to position, as in a fifo queue.
The simplest type of dictionary supports the following operations:
  member  Return true if a given element is contained in the dictionary.
  insert  Insert a new element into the dictionary.
  delete  Remove a given element from the dictionary.
Let D be the set of dictionary states that can be obtained from the empty dictionary by performing finite sequences of 'insert' and 'delete' operations. d0 ∈ D denotes the empty dictionary. Then the operations can be represented by functions as follows:
  create: → D
  insert: D × X → D
  member: D × X → {true, false}
  delete: D × X → D
The semantics of the dictionary operations is specified by the following axioms:
  ∀ d ∈ D, ∀ x, y ∈ X:
  (1) create = d0
  (2) member(d0, x) = false
  (3) member(insert(d, x), x) = true
  (4) x ≠ y ⇒ member(insert(d, y), x) = member(d, x)
  (5) delete(d0, x) = d0
  (6) delete(insert(d, x), x) = delete(d, x)
  (7) x ≠ y ⇒ delete(insert(d, x), y) = insert(delete(d, y), x)
Any d ∈ D is obtained from the empty dictionary d0 by a finite sequence of 'insert' and 'delete' operations. By axioms (6) and (7) any such sequence can be reduced to a shorter one that also transforms d0 into d and consists of 'insert' operations only.
Example
  d = delete(insert(insert(insert(d0, x), y), z), y)
    = insert(delete(insert(insert(d0, x), y), y), z)
    = insert(delete(insert(d0, x), y), z)
    = insert(insert(delete(d0, y), x), z)
    = insert(insert(d0, x), z)
This specification allows duplicates to be inserted. However, axiom (6) guarantees that all duplicates are removed if a delete operation is performed. To prevent duplicates, the following axiom is added to the specification above:
  (8) member(d, x) ⇒ insert(d, x) = d
In this case axiom (6) can be weakened to
  (6') not member(d, x) ⇒ delete(insert(d, x), x) = d
An implementation of a dictionary may provide the following procedures:
  procedure create(var d: dictionary);
  function member(d: dictionary; x: elt): boolean;
  procedure insert(var d: dictionary; x: elt);
  procedure delete(var d: dictionary; x: elt);
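A small executable model helps to check the dictionary axioms on concrete values. The Python sketch below (Python rather than the Pascal dialect of the text) represents a dictionary state as a tuple of inserted elements, duplicates included, so that it follows axioms (1) to (7) without the optional axiom (8). The representation and the names D0, insert, member, and delete are assumptions made for this illustration.

```python
D0 = ()  # the empty dictionary


def insert(d, x):
    return d + (x,)                  # duplicates are allowed


def member(d, x):
    return x in d


def delete(d, x):
    return tuple(e for e in d if e != x)   # removes every copy of x


d = insert(insert(D0, "x"), "x")     # "x" inserted twice
print(member(d, "x"))                # True
print(delete(d, "x") == D0)          # True: axiom (6) removes all duplicates

# The reduction example of the text, with distinct x, y, z:
lhs = delete(insert(insert(insert(D0, "x"), "y"), "z"), "y")
print(lhs == insert(insert(D0, "x"), "z"))  # True
```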
In actual programming practice, a dictionary usually supports the additional operations 'find', 'predecessor', and 'successor'. 'find' is similar to 'member' but, in addition to a true/false answer, provides a pointer to the element found. Both 'predecessor' and 'successor' take a pointer to an element e as an argument, and return a pointer to the element in the dictionary that immediately precedes or follows e, according to the order ≤. Repeated calls of 'successor' thus process the dictionary in sequential order.
Exercise: extending the abstract data type 'dictionary'
We have defined a dictionary as supporting the three operations 'member', 'insert' and 'delete'. But a dictionary, or table, usually supports additional operations based on a total ordering ≤ defined on its domain X. Let us add two operations that take an argument x ∈ X and deliver its two neighboring elements in the table:
  succ(x)  Return the successor of x in the table.
  pred(x)  Return the predecessor of x in the table.
The successor of x is defined as the smallest of all the elements in the table which are larger than x, or as +∞ if none exists. The predecessor is defined symmetrically: the largest of all the elements in the table that are smaller than x, or −∞. Present a formal specification to describe the behavior of the table.
Solution
Let T be the set of states of the table, and t0 a special state that denotes the empty table. The functions and axioms are as follows:
  member: T × X → {true, false}
  insert: T × X → T
  delete: T × X → T
  succ: T × X → X ∪ {+∞}
  pred: T × X → X ∪ {−∞}
  ∀ t ∈ T, ∀ x, y ∈ X:
  member(t0, x) = false
  member(insert(t, x), x) = true
  x ≠ y ⇒ member(insert(t, y), x) = member(t, x)
  delete(t0, x) = t0
  delete(insert(t, x), x) = delete(t, x)
  x ≠ y ⇒ delete(insert(t, x), y) = insert(delete(t, y), x)
  −∞ < x < +∞
  pred(t, x) < x < succ(t, x)
  succ(t, x) ≠ +∞ ⇒ member(t, succ(t, x)) = true
  pred(t, x) ≠ −∞ ⇒ member(t, pred(t, x)) = true
  x < y, member(t, y), y ≠ succ(t, x) ⇒ succ(t, x) < y
  x > y, member(t, y), y ≠ pred(t, x) ⇒ y < pred(t, x)
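The axioms for 'succ' and 'pred' can be checked against a simple model. The Python sketch below (Python rather than the Pascal dialect of the text) keeps the table as a sorted list and uses binary search from the standard bisect module; the sorted-list representation is an assumption made for this illustration, and math.inf stands in for +∞ and −∞.

```python
import bisect
import math


def succ(table, x):
    """Smallest element of the sorted list 'table' larger than x, or +inf."""
    i = bisect.bisect_right(table, x)    # first position holding a value > x
    return table[i] if i < len(table) else math.inf


def pred(table, x):
    """Largest element of the sorted list 'table' smaller than x, or -inf."""
    i = bisect.bisect_left(table, x)     # first position holding a value >= x
    return table[i - 1] if i > 0 else -math.inf


table = [2, 5, 9]
print(succ(table, 5), succ(table, 3), succ(table, 9))  # 9 5 inf
print(pred(table, 5), pred(table, 3), pred(table, 2))  # 2 2 -inf
```

Note that x itself need not be a member of the table, in agreement with the signatures succ, pred: T × X → X ∪ {±∞}.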
Exercise: the abstract data type 'string'
We define the following operations for the abstract data type string:
  empty   Return true if the string is empty.
  append  Append a new element to the tail of the string.
  head    Return the head element of the string.
  tail    Remove the head element of the given string.
  length  Return the length of the string.
  find    Return the index of the first occurrence of a value within the string.
Let X = {a, b, … , z}, and S be the set of string states that can be obtained from the empty string by performing a finite number of 'append' and 'tail' operations. s0 ∈ S denotes the empty string. The operations can be represented by functions as follows:
  empty: S → {true, false}
  append: S × X → S
  head: S − {s0} → X
  tail: S − {s0} → S
  length: S → {0, 1, 2, … }
  find: S × X → {0, 1, 2, … }
Examples:
  empty('abc') = false; append('abc', 'd') = 'abcd'; head('abcd') = 'a';
  tail('abcd') = 'bcd'; length('abcd') = 4; find('abcd', 'b') = 2.
(a) Give the axioms that specify the semantics of the abstract data type 'string'.
(b) The function hchop: S × X → S returns the substring of a string s beginning with the first occurrence of a given value. Similarly, tchop: S × X → S returns the substring of s beginning with head(s) and ending with the last occurrence of a given value. Specify the behavior of these operations by additional axioms. Examples:
  hchop('abcdabc', 'c') = 'cdabc'
  tchop('abcdabc', 'b') = 'abcdab'
(c) The function cat: S × S → S returns the concatenation of two sequences. Specify the behavior of 'cat' by additional axioms. Example:
  cat('abcd', 'efg') = 'abcdefg'
(d) The function reverse: S → S returns the given sequence in reverse order. Specify the behavior of reverse by additional axioms. Example:
  reverse('abcd') = 'dcba'
Solution
(a) Axioms for the six 'string' operations:
  ∀ s ∈ S, ∀ x, y ∈ X:
  empty(s0) = true
  empty(append(s, x)) = false
  head(append(s0, x)) = x
  not empty(s) ⇒ head(s) = head(append(s, x))
  tail(append(s0, x)) = s0
  not empty(s) ⇒ tail(append(s, x)) = append(tail(s), x)
  length(s0) = 0
  length(append(s, x)) = length(s) + 1
  find(s0, x) = 0
  x ≠ y, find(s, x) = 0 ⇒ find(append(s, y), x) = 0
  find(s, x) = 0 ⇒ find(append(s, x), x) = length(s) + 1
  find(s, x) = d > 0 ⇒ find(append(s, y), x) = d
(b) Axioms for 'hchop' and 'tchop':
  ∀ s ∈ S, ∀ x, y ∈ X:
  hchop(s0, x) = s0
  not empty(s), head(s) = x ⇒ hchop(s, x) = s
  not empty(s), head(s) ≠ x ⇒ hchop(s, x) = hchop(tail(s), x)
  tchop(s0, x) = s0
  tchop(append(s, x), x) = append(s, x)
  x ≠ y ⇒ tchop(append(s, y), x) = tchop(s, x)
(c) Axioms for 'cat':
  ∀ s, s' ∈ S:
  cat(s, s0) = s
  not empty(s') ⇒ cat(s, s') = cat(append(s, head(s')), tail(s'))
(d) Axioms for 'reverse':
  ∀ s ∈ S:
  reverse(s0) = s0
  s ≠ s0 ⇒ reverse(s) = append(reverse(tail(s)), head(s))
Exercises
1. Implement two stacks in one array a[1 .. m] in such a way that neither stack overflows unless the total number of elements in both stacks together is m. The operations 'push', 'top', and 'pop' should run in O(1) time.
2. A double-ended queue (deque) can grow and shrink at both ends, left and right, using the procedures 'enqueue-left', 'dequeue-left', 'enqueue-right', and 'dequeue-right'. Present a formal specification to describe the behavior of the abstract data type deque.
3. Extend the abstract data type priority queue by the operation next(x), which returns the element in the priority queue having the next lower priority than x.
20. Implicit data structures
Learning objectives:
implicit data structures describe relationships among data elements implicitly by formulas and declarations
array storage
band matrices
sparse matrices
buffers eliminate temporary speed differences among interacting producer and consumer processes
fifo queue implemented as a circular buffer
priority queue implemented as a heap
heapsort
What is an implicit data structure?
An important aspect of the art of data structure design is the efficient representation of the structural relationships among the data elements to be stored. Data is usually modeled as a graph, with nodes corresponding to data elements and links (directed arcs, or bidirectional edges) corresponding to relationships. Relationships often serve a double purpose. Primarily, they define the semantics of the data and thus allow programs to interpret the data correctly. This aspect of relationships is highlighted in the database field: for example, in the entity-relationship model. Secondarily, relationships provide a means of accessing data, by starting at some element and following an access path that leads to other elements of interest. In studying data structures we are mainly concerned with the use of relationships for access to data.
When the structure of the data is irregular, or when the structure is highly dynamic (extensively modified at run time), there is no practical alternative to representing the relationships explicitly. This is the domain of list structures, presented in the chapter on "List structures". When the structure of the data is static and obeys a regular pattern, on the other hand, there are alternatives that compress the structural information. We can often replace many explicit links by a few formulas that tell us where to find the "neighboring" elements. When this approach works, it saves memory space and often leads to faster programs.
We use the term implicit to denote data structures in which the relationships among data elements are given implicitly by formulas and declarations in the program; no additional space is needed for these relationships in the data storage. The best known example is the array. If one looks at the area in which an array is stored, it is impossible to derive, from its contents, any relationships among the elements without the information that the elements belong to an array of a given type.
Data structures always go hand in hand with the corresponding procedures for accessing and operating on the data. This is particularly true for implicit data structures: they simply do not exist independent of their accessing procedures. Separated from its code, an implicit data structure represents at best an unordered set of data. With the right code, it exhibits a rich structure, as is beautifully illustrated by the heap at the end of this chapter.
Array storage
A two-dimensional array declared as
  var A: array[1 .. m, 1 .. n] of elt;
is usually written in a rectangular shape:
  A[1, 1]  A[1, 2]  …  A[1, n]
  A[2, 1]  A[2, 2]  …  A[2, n]
  …        …        …  …
  A[m, 1]  A[m, 2]  …  A[m, n]
But it is stored in a linearly addressed memory, typically row by row (as shown below) or column by column (as in Fortran) in consecutive storage cells, starting at base address b. If an element fits into one cell, we have
  element   address
  A[1, 1]   b
  A[1, 2]   b + 1
  …         …
  A[1, n]   b + n − 1
  A[2, 1]   b + n
  A[2, 2]   b + n + 1
  …         …
  A[2, n]   b + 2 · n − 1
  …         …
  A[m, n]   b + m · n − 1
If an element of type 'elt' occupies c storage cells, the address t(i, j) of A[i, j] is
  t(i, j) = b + c · (n · (i − 1) + (j − 1))
This linear formula generalizes to k-dimensional arrays declared as
  var A: array[1 .. m1, 1 .. m2, … , 1 .. mk] of elt;
The address t(i1, i2, … , ik) of element A[i1, i2, … , ik] is
  t(i1, i2, … , ik) = b + c · ((i1 − 1) · m2 · … · mk + (i2 − 1) · m3 · … · mk + … + (i(k−1) − 1) · mk + (ik − 1))
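The row-by-row address computation for a k-dimensional array can be sketched as follows, in Python rather than the Pascal dialect of the text. The function name address and its parameter names are assumptions made for this illustration; indices are 1-based as in the declarations above, b is the base address, and c is the number of cells per element. The loop evaluates the linear formula by Horner's scheme, so no explicit products of dimensions are needed.

```python
def address(indices, dims, b=0, c=1):
    """Row-major address of A[i1, ..., ik] for array[1..m1, ..., 1..mk].

    indices = (i1, ..., ik), dims = (m1, ..., mk), all indices 1-based.
    """
    if len(indices) != len(dims):
        raise ValueError("one index per dimension is required")
    offset = 0
    for i, m in zip(indices, dims):
        if not 1 <= i <= m:
            raise IndexError("index out of range")
        offset = offset * m + (i - 1)    # Horner's scheme
    return b + c * offset


# Two-dimensional case with m = 3, n = 4, base address b = 100, c = 1:
print(address((1, 1), (3, 4), b=100))   # 100  (b)
print(address((2, 1), (3, 4), b=100))   # 104  (b + n)
print(address((3, 4), (3, 4), b=100))   # 111  (b + m * n - 1)
```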
The point is that access to an element A[i, j, …] invokes evaluation of a (linear) formula t(i, j, …) that tells us where to find this element. A high-level programming language hides most of the details of address computation, except when we wish to take advantage of any special structure our matrices may have. The following types of sparse matrices occur frequently in numerical linear algebra.
Band matrices. An n × n matrix M is called a band matrix of width 2 · b + 1 (b = 0, 1, …) if M_{i,j} = 0 for all i and j with |i − j| > b. In other words, all nonzero elements are located on the main diagonal and in b adjacent minor diagonals on both sides of the main diagonal. If n is large and b is small, much space is saved by storing M in a two-dimensional array A with n · (2 · b + 1) cells rather than in an array with n² cells:
  type bandm = array[1 .. n, -b .. b] of elt;
  var A: bandm;
Each row A[i, ·] stores the nonzero elements of the corresponding row of M, namely the diagonal element M_{i,i}, the b elements to the left of the diagonal
  M_{i,i−b}, M_{i,i−b+1}, … , M_{i,i−1}
and the b elements to the right of the diagonal
  M_{i,i+1}, M_{i,i+2}, … , M_{i,i+b}.
The first and the last b rows of A contain empty cells corresponding to the triangles that stick out from M in Exhibit 20.1. The elements of M are stored in array A such that A[i, j] contains M_{i,i+j} (1 ≤ i ≤ n, −b ≤ j ≤ b). A total of b · (b + 1) cells in the upper left and lower right of A remain unused. It is not worth saving these additional b · (b + 1) cells by packing the band matrix M into an array of minimal size, as the mapping becomes irregular and the formula for calculating the indices of M_{i,j} becomes much more complicated.
Exhibit 20.1: Extending the diagonals with dummy elements gives the band matrix the shape of a rectangular array.
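The index mapping between M and its compact array can be sketched as follows, in Python rather than the Pascal dialect of the text. The class name BandMatrix and its method names are assumptions made for this illustration. Because Python lists start at index 0, the column offset j − i in the range −b .. b is shifted by b.

```python
class BandMatrix:
    """n x n band matrix with b minor diagonals on each side (sketch)."""

    def __init__(self, n, b):
        self.n, self.b = n, b
        # Row i-1 holds M[i, i-b .. i+b]; column (j - i) + b stores M[i, j].
        self.a = [[0.0] * (2 * b + 1) for _ in range(n)]

    def get(self, i, j):
        """Return M[i, j] (1-based); elements outside the band are zero."""
        if abs(i - j) > self.b:
            return 0.0
        return self.a[i - 1][j - i + self.b]

    def set(self, i, j, value):
        """Store M[i, j] (1-based); only elements inside the band exist."""
        if abs(i - j) > self.b:
            raise IndexError("element lies outside the band")
        self.a[i - 1][j - i + self.b] = value


m = BandMatrix(5, 1)          # tridiagonal 5 x 5 matrix: 15 cells, not 25
m.set(2, 3, 7.5)
print(m.get(2, 3), m.get(1, 4))   # 7.5 0.0
```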
Exercise: band matrices
(a) Write a procedure add(p, q: bandm; var r: bandm);
which adds two band matrices stored in p and q and stores the result in r.
(b) Write a procedure bmv(p: bandm; v: … ; var w: … );
which multiplies a band matrix stored in p with a vector v of length n and stores the result in w.
Solution
(a) procedure add(p, q: bandm; var r: bandm);
    var i: 1 .. n; j: -b .. b;
    begin
      for i := 1 to n do
        for j := -b to b do
          r[i, j] := p[i, j] + q[i, j]
    end;
(b) type vector = array[1 .. n] of real;
    procedure bmv(p: bandm; v: vector; var w: vector);
    var i: 1 .. n; j: -b .. b;
    begin
      for i := 1 to n do begin
        w[i] := 0.0;
        for j := -b to b do
          if (i + j >= 1) and (i + j <= n) then w[i] := w[i] + p[i, j] * v[i + j]
      end
    end;
Sparse matrices. A matrix is called sparse if it consists mostly of zeros. We have seen that sparse matrices of regular shape can be compressed efficiently using address computation. Irregularly shaped sparse matrices, on the other hand, do not yield gracefully to compression into a smaller array in such a way that access can be based on address computation. Instead, the nonzero elements may be stored in an unstructured set of records, where each record contains the pair ((i, j), A[i, j]) consisting of an index tuple (i, j) and the value A[i, j]. Any element that is absent from this set is assumed to be zero. As the position of a data element is stored explicitly as an index pair (i, j), this representation is not an implicit data structure. As a consequence, access to a random element of an irregularly shaped sparse matrix typically requires searching for it, and thus is likely to be slower than the direct access to an element of a matrix of regular shape stored in an implicit data structure.
Exercise: triangular matrices
Let A and B be lower-triangular n × n-matrices; that is, all elements above the diagonal are zero: A_{i,j} = B_{i,j} = 0 for i < j.
(a) Prove that the inverse (if it exists) and the matrix product of lower-triangular matrices are again lower-triangular.
(b) Devise a scheme for storing two lower-triangular matrices A and B in one array C of minimal size. Write a Pascal declaration for C and draw a picture of its contents.
(c) Write two functions
  function A(i, j: 1 .. n): real;
  function B(i, j: 1 .. n): real;
that access C and return the corresponding matrix elements.
(d) Write a procedure that computes A := A · B in place: the entries of A in C are replaced by the entries of the product A · B. You may use a (small) constant number of additional variables, independent of the size of A and B.
(e) Same as (d), but using A := A⁻¹ · B.
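Solution
(a) The inverse of an n × n-matrix exists iff the determinant of the matrix is nonzero. Let A be a lower-triangular matrix for which the inverse matrix B exists, that is,
  det(A) = A_{1,1} · A_{2,2} · … · A_{n,n} ≠ 0
and
  Σ_{k=1..n} A_{i,k} · B_{k,j} = 1 if i = j, and 0 if i ≠ j.
Let 1 ≤ j ≤ n. Then, for i = 1, 2, … , j − 1 in turn, A_{i,k} = 0 for k > i and B_{k,j} = 0 for k < i give
  0 = Σ_{k=1..i} A_{i,k} · B_{k,j} = A_{i,i} · B_{i,j}, hence B_{i,j} = 0 because A_{i,i} ≠ 0,
and therefore B is a lower-triangular matrix.
Let A and B be lower-triangular, C := A · B:
  C_{i,j} = Σ_{k=1..n} A_{i,k} · B_{k,j} = Σ_{k=j..i} A_{i,k} · B_{k,j}
If i < j, this sum is empty and therefore C_{i,j} = 0 (i.e. C is lower-triangular).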
(b) A and B can be stored in an array C of size n · (n + 1) as follows (Exhibit 20.2):
  const n = … ;
  var C: array [0 .. n, 1 .. n] of real;
Exhibit 20.2: A staircase separates two triangular matrices stored in a rectangular array.
(c) function A(i, j: 1 .. n): real;
    begin if i < j then return(0.0) else return(C[i, j]) end;
    function B(i, j: 1 .. n): real;
    begin if i < j then return(0.0) else return(C[n - i, n + 1 - j]) end;
(d) Because the new elements of the result matrix C overwrite the old elements of A, it is important to compute them in the right order. Specifically, within every row i of C, elements C_{i,j} must be computed from left to right, that is, in increasing order of j.
    procedure mult;
    var i, j, k: integer; x: real;
    begin
      for i := 1 to n do
        for j := 1 to i do begin
          x := 0.0;
          for k := j to i do x := x + A(i, k) * B(k, j);
          C[i, j] := x
        end
    end;
(e) procedure invertA;
    var i, j, k: integer; x: real;
    begin
      for i := 1 to n do begin
        for j := 1 to i - 1 do begin
          x := 0.0;
          for k := j to i - 1 do x := x - C[i, k] * C[k, j];
          C[i, j] := x / C[i, i]
        end;
        C[i, i] := 1.0 / C[i, i]
      end
    end;
    procedure AinvertedmultB;
    begin invertA; mult end;
Implementation of the fixed-length fifo queue as a circular buffer
A fifo queue is needed in situations where two processes interact in the following way. A process called producer generates data for a process called consumer. The processes typically work in bursts: the producer may generate a lot of data while the consumer is busy with something else; thus the data has to be saved temporarily in a buffer, from which the consumer takes it as needed. A keyboard driver and an editor are an example of this producer-consumer interaction. The keyboard driver transfers characters generated by key presses into the buffer, and the editor reads them from the buffer and interprets them (e.g. as control characters or as text to be inserted). It is worth remembering, though, that a buffer helps only if two processes work at about the same speed over the long run. If the producer is always faster, any buffer will overflow; if the consumer is always faster, no buffer is needed. A buffer can equalize only temporary differences in speeds.
With some knowledge about the statistical behavior of producer and consumer one can usually compute a buffer size that is sufficient to absorb producer bursts with high probability, and allocate the buffer statically in an array of fixed size. Among statically allocated buffers, a circular buffer is the natural implementation of a fifo queue.
A circular buffer is an array B, considered as a ring in which the first cell B[0] is the successor of the last cell B[m − 1], as shown in Exhibit 20.3. The elements are stored in the buffer in consecutive cells between the two pointers 'in' and 'out': 'in' points to the empty cell into which the next element is to be inserted; 'out' points to the cell containing the next element to be removed. A new element is inserted by storing it in B[in] and advancing 'in' to the next cell. The element in B[out] is removed by advancing 'out' to the next cell.
Exhibit 20.3: Insertions move the pointer 'in', deletions the pointer 'out' counterclockwise around the array.
Notice that the pointers 'in' and 'out' meet both when the buffer gets full and when it gets empty. Clearly, we must be able to distinguish a full buffer from an empty one, so as to avoid insertion into the former and removal from the latter. At first sight it appears that the pointers 'in' and 'out' are insufficient to determine whether a circular buffer is full or empty. Thus the following implementation uses an additional variable n, which counts how many elements are in the buffer.
  const m = … ; { length of buffer }
  type addr = 0 .. m - 1; { index range }
  var B: array[addr] of elt; { storage }
      in, out: addr; { access to buffer }
      n: 0 .. m; { number of elements currently in buffer }
  procedure create;
  begin in := 0; out := 0; n := 0 end;
  function empty(): boolean;
  begin return(n = 0) end;
  function full(): boolean;
  begin return(n = m) end;
  procedure enqueue(x: elt);
  { not to be called if the queue is full }
  begin B[in] := x; in := (in + 1) mod m; n := n + 1 end;
  function front(): elt;
  { not to be called if the queue is empty }
  begin return(B[out]) end;
  procedure dequeue;
  { not to be called if the queue is empty }
  begin out := (out + 1) mod m; n := n - 1 end;
The producer uses only 'enqueue' and 'full', as it deletes no elements from the circular buffer. The consumer uses only 'front', 'dequeue', and 'empty', as it inserts no elements.
The state of the circular buffer is described by its contents and the values of 'in', 'out', and n. Since 'in' is changed only within 'enqueue', only the producer needs write-access to 'in'. Since 'out' is changed only by 'dequeue', only the consumer needs write-access to 'out'. The variable n, however, is changed by both processes and thus is a shared variable to which both processes have write-access (Exhibit 20.4 (a)).
Exhibit 20.4:
(a) Producer and consumer both have write-access to shared variable n.
(b) The producer has read/write-access to 'in' and read-only-access to 'out', the consumer has read/write-access to 'out' and read-only-access to 'in'.
In a concurrent programming environment where several processes execute independently, access to shared variables must be synchronized. Synchronization is overhead to be avoided if possible. The shared variable n becomes superfluous (Exhibit 20.4 (b)) if we use the time-honored trick of leaving at least one cell free as a sentinel. This ensures that 'empty' and 'full', when expressed in terms of 'in' and 'out', can be distinguished. Specifically, we define 'empty' as in = out, and 'full' as (in + 1) mod m = out. This leads to an elegant and more efficient implementation of the fixed-length fifo queue by a circular buffer:
const m = … ; { length of buffer }
type addr = 0 .. m - 1; { index range }
  fifoqueue = record
    a: array[addr] of elt; { storage }
    in, out: addr { access to buffer }
  end;
procedure create(var f: fifoqueue);
begin f.in := 0; f.out := 0 end;
function empty(f: fifoqueue): boolean;
begin return(f.in = f.out) end;
function full(f: fifoqueue): boolean;
begin return((f.in + 1) mod m = f.out) end;
procedure enqueue(var f: fifoqueue; x: elt);
{ not to be called if the queue is full }
begin f.a[f.in] := x; f.in := (f.in + 1) mod m end;
function front(f: fifoqueue): elt;
{ not to be called if the queue is empty }
begin return(f.a[f.out]) end;
procedure dequeue(var f: fifoqueue);
{ not to be called if the queue is empty }
begin f.out := (f.out + 1) mod m end;
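The same buffer can be sketched in Python, which is used here only for illustration in place of the Pascal-like notation above. The class name RingQueue and the field name inp are assumptions of this sketch (inp stands in for 'in', which is a reserved word in Python). With m cells the queue holds at most m - 1 elements; that one free cell is what lets 'empty' and 'full' be told apart without the shared counter n.

```python
class RingQueue:
    """Fixed-length fifo queue in a circular buffer of m cells (capacity m - 1)."""

    def __init__(self, m):
        if m < 2:
            raise ValueError("need at least two cells: one always stays free")
        self.m = m
        self.a = [None] * m   # storage, indices 0 .. m - 1
        self.inp = 0          # next free cell; written only by the producer
        self.out = 0          # oldest element; written only by the consumer

    def empty(self):
        return self.inp == self.out

    def full(self):
        return (self.inp + 1) % self.m == self.out

    def enqueue(self, x):
        assert not self.full(), "not to be called if the queue is full"
        self.a[self.inp] = x
        self.inp = (self.inp + 1) % self.m

    def front(self):
        assert not self.empty(), "not to be called if the queue is empty"
        return self.a[self.out]

    def dequeue(self):
        assert not self.empty(), "not to be called if the queue is empty"
        self.out = (self.out + 1) % self.m


q = RingQueue(4)          # four cells, so at most three elements
for x in (10, 20, 30):
    q.enqueue(x)
```

After these three insertions q.full() holds although one cell is still unused. Note that 'enqueue' touches only inp and 'dequeue' touches only out, which mirrors the access rights of Exhibit 20.4 (b).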
"mplementation of the fi/ed)length priority 1ueue as a heap
A fixed-length priority queue can be realized by a circular buffer, with elements stored in the cells between 'in'
and 'out', and ordered according to their priority such that 'out' points to the element with highest priority (Exhibit
20.5). In this implementation, the operations 'min' and 'delete' have time complexity O(1), since 'out' points directly
to the element with the highest priority. But insertion requires finding the correct cell corresponding to the priority
of the element to be inserted, and shifting other elements in the buffer to make space. Binary search could achieve
the former task in time O(log n), but the latter requires time O(n).
Exhibit 20.5: Implementing a fixed-length priority queue by a circular buffer.
Shifting elements to make space for a new element costs O(n) time.
Implementing a priority queue as a linear list, with elements ordered according to their priority, does not speed
up insertion: finding the correct position of insertion still requires time O(n) (Exhibit 20.6).
Exhibit 20.6: Implementing a fixed-length priority queue by a linear list. Finding the correct
position for a new element costs O(n) time.
The heap is an elegant and efficient data structure for implementing a priority queue. It allows the operation
'min' to be performed in time O(1) and allows both 'insert' and 'delete' to be performed in worst-case time O(log n).
A heap is a binary tree that:
obeys a structural property
obeys an order property
is embedded in an array in a certain way
Structure: The binary tree is as balanced as possible; all leaves are at two adjacent levels, and the nodes at the
bottom level are located as far to the left as possible (Exhibit 20.7).
Exhibit 20.7: A heap has the structure of an almost complete binary tree.
Order: The element assigned to any node is ≤ the elements assigned to any children this node may have
(Exhibit 20.8).
Exhibit 20.8: The order property implies that the smallest element is stored at the root.
The order property implies that the smallest element (the one with top priority) is stored in the root. The 'min'
operation returns its value in time O(1), but the most obvious way to delete this element leaves a hole, which takes
time to fill. How can the tree be reorganized so as to retain the structural and the order property? The structural
condition requires the removal of the rightmost node on the lowest level. The element stored there, 13 in our
example, is used (temporarily) to fill the vacuum in the root. The root may now violate the order condition, but the
latter can be restored by sifting 13 down the tree according to its weight (Exhibit 20.9). If the order condition is
violated at any node, the element in this node is exchanged with the smaller of the elements stored in its children;
in our example, 13 is exchanged with 2. This sift-down process continues until the element finds its proper level, at
the latest when it lands in a leaf.
Exhibit 20.9: Rebuilding the order property of the tree in Exhibit 20.8 after 1 has been
removed and 13 has been moved to the root.
Insertion is handled analogously. The structural condition requires that a new node is created on the bottom
level at the leftmost empty slot. The new element, 0 in our example, is temporarily stored in this node (Exhibit
20.10). If the parent node now violates the order condition, we restore it by floating the new element upward
according to its weight. If the new element is smaller than the one stored in its parent node, these two elements, in
our example 0 and 6, are exchanged. This sift-up process continues until the element finds its proper level, at the
latest when it surfaces at the root.
Exhibit 20.10: Rebuilding the order property of the tree in Exhibit 20.8 after 0 has
been inserted in a new rightmost node on the lowest level.
The number of steps executed during the sift-up process and the sift-down process is at most equal to the height
of the tree. The structural condition implies that this height is ⌊log₂ n⌋. Thus both 'insert' and 'delete' in a heap work
in time O(log n).
A binary tree can be implemented in many different ways, but the special class of trees that meets the structural
condition stated above has a particularly efficient array implementation. A heap is a binary tree that satisfies the
structural and the order condition and is embedded in a linear array in such a way that the children of a node with
index i have indices 2 · i and 2 · i + 1 (Exhibit 20.11). Thus the parent of a node with index j has index j div 2. Any
subtree of a heap is also a heap, although it may not be stored contiguously. The order property for the heap implies
that the elements stored at indices 2 · i and 2 · i + 1 are ≥ the element stored at index i. This order is called the heap
order.
Exhibit 20.11: Embedding the tree of Exhibit 20.8 in a linear array.
The procedure 'restore' is a useful tool for managing a heap. It creates a heap out of a binary tree embedded in a
linear array h that satisfies the structural condition, provided that the two subtrees of the root node are already
heaps. Procedure 'restore' is applied to subtrees of the entire heap whose nodes are stored between the indices L
and R and whose tree structure is defined by the formulas 2 · i and 2 · i + 1.
const m = … ; { length of heap }
type addr = 1 .. m;
var h: array[addr] of elt;
procedure restore(L, R: addr);
var i, j: addr;
begin
  i := L;
  while i ≤ (R div 2) do begin
    if (2 · i < R) cand (h[2 · i + 1] < h[2 · i]) then j := 2 · i + 1
    else j := 2 · i;
    if h[j] < h[i] then { h[i] :=: h[j]; i := j } else i := R
  end
end;
Since 'restore' operates along a single path from the root to a leaf in a tree with at most R − L nodes, it works in
time O(log (R − L)).
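As a rough illustration, 'restore' might be written in Python as follows. This is a sketch under stated assumptions, not the book's code: the list h is taken to be 1-based with h[0] unused, so that the index formulas 2 · i and 2 · i + 1 carry over unchanged, and the loop is left with break where the Pascal version assigns i := R.

```python
def restore(h, L, R):
    """Sift h[L] down within h[L..R] until the heap order holds.

    Assumes the two subtrees of node L are already heaps and that
    h is 1-based (h[0] is an unused placeholder).
    """
    i = L
    while i <= R // 2:                 # node i has at least one child in h[L..R]
        j = 2 * i                      # left child
        if j < R and h[j + 1] < h[j]:  # a right child exists and is smaller
            j += 1
        if h[j] < h[i]:
            h[i], h[j] = h[j], h[i]    # exchange with the smaller child
            i = j
        else:
            break                      # order restored; stop early


# Both subtrees of the root are heaps, but the root itself is too large.
h = [None, 13, 2, 6, 3, 9, 7, 8]
restore(h, 1, 7)
```

Here 13 is exchanged first with 2 and then with 3, following a single path from the root toward a leaf.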
Creating a heap
An array h can be turned into a heap as follows:
  for i := n div 2 downto 1 do restore(i, n);
This is more efficient than repeated insertion of a single element into an existing heap. Since the for loop is
executed n div 2 times, and n − i ≤ n, the time complexity for creating a heap with n elements is O(n · log n). A more
careful analysis shows that the time complexity for creating a heap is O(n).
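One standard way to obtain the O(n) bound, sketched here as a counting argument that the text does not spell out: in a heap with n elements at most ⌈n / 2^(k+1)⌉ nodes have height k, and 'restore' applied to a node of height k performs O(k) steps. Summing over all heights gives

```latex
\sum_{k=0}^{\lfloor \log_2 n \rfloor}
  \left\lceil \frac{n}{2^{k+1}} \right\rceil \cdot O(k)
  \;=\; O\!\left( n \sum_{k=0}^{\infty} \frac{k}{2^{k}} \right)
  \;=\; O(n),
\qquad \text{since } \sum_{k=0}^{\infty} \frac{k}{2^{k}} = 2 .
```

Most nodes lie near the bottom of the tree, where 'restore' has little to do; only a few nodes near the root incur the full logarithmic cost.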
Heap implementation of the fixed-length priority queue
const m = … ; { maximum length of heap }
type addr = 1 .. m;
  priorityqueue = record
    h: array[addr] of elt; { heap storage }
    n: 0 .. m { current number of elements }
  end;
procedure restore(var h: array[addr] of elt; L, R: addr);
begin … end;
procedure create(var p: priorityqueue);
begin p.n := 0 end;
function empty(p: priorityqueue): boolean;
begin return(p.n = 0) end;
function full(p: priorityqueue): boolean;
begin return(p.n = m) end;
procedure insert(var p: priorityqueue; x: elt);
{ not to be called if the queue is full }
var i: 1 .. m;
begin
  p.n := p.n + 1; p.h[p.n] := x; i := p.n;
  while (i > 1) cand (p.h[i] < p.h[i div 2]) do
    { p.h[i] :=: p.h[i div 2]; i := i div 2 }
end;
function min(p: priorityqueue): elt;
{ not to be called if the queue is empty }
begin return(p.h[1]) end;
procedure delete(var p: priorityqueue);
{ not to be called if the queue is empty }
begin p.h[1] := p.h[p.n]; p.n := p.n - 1; restore(p.h, 1, p.n)
end;
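A self-contained Python sketch of the same priority queue is given below for experimentation. The class layout, the helper name _restore, and the use of assertions for the "not to be called" preconditions are assumptions of this sketch; the heap is kept in h[1..m] with h[0] unused so that the index arithmetic matches the listing above.

```python
class PriorityQueue:
    """Fixed-length priority queue as a min-heap stored in h[1..m]."""

    def __init__(self, m):
        self.m = m
        self.h = [None] * (m + 1)   # h[0] is an unused placeholder
        self.n = 0                  # current number of elements

    def empty(self):
        return self.n == 0

    def full(self):
        return self.n == self.m

    def insert(self, x):
        assert not self.full()
        self.n += 1
        i = self.n
        self.h[i] = x
        # sift-up: float x upward while it is smaller than its parent
        while i > 1 and self.h[i] < self.h[i // 2]:
            self.h[i], self.h[i // 2] = self.h[i // 2], self.h[i]
            i //= 2

    def min(self):
        assert not self.empty()
        return self.h[1]

    def delete(self):
        assert not self.empty()
        self.h[1] = self.h[self.n]  # last element fills the hole at the root
        self.n -= 1
        self._restore(1, self.n)

    def _restore(self, L, R):
        # sift-down along one path, as in procedure 'restore'
        h, i = self.h, L
        while i <= R // 2:
            j = 2 * i
            if j < R and h[j + 1] < h[j]:
                j += 1
            if h[j] < h[i]:
                h[i], h[j] = h[j], h[i]
                i = j
            else:
                break


pq = PriorityQueue(8)
for x in (6, 2, 9, 1, 5):
    pq.insert(x)
extracted = []
while not pq.empty():
    extracted.append(pq.min())
    pq.delete()
```

Repeatedly taking 'min' and calling 'delete' yields the elements in increasing order, which is the idea heapsort builds on.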
Heapsort
The heap is the core of an elegant O(n · log n) sorting algorithm. The following procedure 'heapsort' sorts n
elements stored in the array h into decreasing order.
procedure heapsort(n: addr); { sort elements stored in h[1 .. n] }
var i: addr;
begin { heap creation phase: the heap is built up }
  for i := n div 2 downto 1 do restore(i, n);
  { shift-up phase: elements are extracted from heap in increasing order }
  for i := n downto 2 do { h[i] :=: h[1]; restore(1, i - 1) }
end;
Each of the for loops is executed less than n times, and the time complexity of restore is O(log n). Thus heapsort
always works in time O(n · log n).
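The two phases can be tried out with the following Python sketch. It assumes a 1-based list (h[0] unused) and nests a local 'restore' so that the block is self-contained; neither choice comes from the book.

```python
def heapsort(h):
    """Sort h[1..n] in place into decreasing order; h[0] is unused."""
    n = len(h) - 1

    def restore(L, R):
        i = L
        while i <= R // 2:
            j = 2 * i
            if j < R and h[j + 1] < h[j]:
                j += 1
            if h[j] < h[i]:
                h[i], h[j] = h[j], h[i]
                i = j
            else:
                break

    # heap creation phase: the heap is built up
    for i in range(n // 2, 0, -1):
        restore(i, n)
    # extraction phase: the current minimum moves to the end of the heap
    for i in range(n, 1, -1):
        h[i], h[1] = h[1], h[i]
        restore(1, i - 1)
    return h


result = heapsort([None, 5, 1, 4, 2, 3])
```

Because the smallest remaining element is moved to the end of the shrinking heap in each step, a min-heap produces decreasing order; a max-heap would produce increasing order.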
Exercises and programming projects
1. Block-diagonal matrices are composed of smaller matrices that line up along the diagonal and have 0
elements everywhere else, as shown in Exhibit 20.12. Show how to store an arbitrary block-diagonal matrix
in a minimal storage area, and write down the corresponding address computation formulas.
Exhibit 20.12: Structure of a block-diagonal matrix.
2. Let A be an antisymmetric n × n-matrix (i.e., all elements of the matrix satisfy Aᵢⱼ = −Aⱼᵢ).
(a) What values do the diagonal elements Aᵢᵢ of the matrix have?
(b) How can A be stored in a linear array c of minimal size? What is the size of c?
(c) Write a
function A(i, j: 1 .. n): real;
which returns the value of the corresponding matrix element.
3. Show that the product of two n × n matrices of width 2 · b + 1 (b = 0, 1, …) is again a band matrix. What is
the width of the product matrix? Write a procedure that computes the product of two band matrices both
having the same width and stores the result as a band matrix of minimal width.
4. Implement a double-ended queue (deque) by a circular buffer.
5. What are the minimum and maximum numbers of elements in a heap of height h?
6. Determine the time complexities of the following operations performed on a heap storing n elements. (a)
Searching any element. (b) Searching the largest element (i.e. the element with lowest priority).
7. Implement heapsort and animate the sorting process, for example as shown in the snapshots in "Algorithm
animation". Compare the number of comparisons and exchange operations needed by heapsort and other
sorting algorithms (e.g. quicksort) for different input configurations.
8. What is the running time of heapsort on an array h[1 .. n] that is already sorted in increasing order? What
about decreasing order?
9. In a k-ary heap, nodes have k children instead of 2 children.
(a) How would you represent a k-ary heap in an array?
(b) What is the height of a k-ary heap in terms of the number of elements n and k?
(c) Implement a priority queue by a k-ary heap. What are the time complexities of the operations 'insert'
and 'delete' in terms of n and k?
21. List structures
Learning objectives:
static vs dynamic data structures
linear, circular and two-way lists
fifo queue implemented as a linear list
breadth-first and depth-first tree traversal
traversing a binary tree without any auxiliary memory: triple tree traversal algorithm
dictionary implemented as a binary search tree
balanced trees guarantee that dictionary operations can be performed in logarithmic time
height-balanced trees
multiway trees
Lists, memory management, pointer variables
The spectrum of data structures ranges from static objects, such as a table of constants, to dynamic structures,
such as lists. A list is designed so that not only the data values stored in it, but its size and shape can change at run
time, due to insertions, deletions, or rearrangement of data elements. Most of the data structures discussed so far
can change their size and shape to a limited extent. A circular buffer, for example, supports insertion at one end and
deletion at the other, and can grow to a predeclared maximal size. A heap supports deletion at one end and
insertion anywhere into an array. In a list, any local change can be done with an effort that is independent of the
size of the list, provided that we know the memory locations of the data elements involved. The key to meeting this
requirement is the idea of abandoning memory allocation in large contiguous chunks, and instead allocating it
dynamically in the smallest chunk that will hold a given object. Because data elements are stored randomly in
memory, not contiguously, an insertion or deletion into a list does not propagate a ripple effect that shifts other
elements around. An element inserted is allocated anywhere in memory where there is space and tied to other
elements by pointers (i.e. addresses of the memory locations where these elements happen to be stored at the
moment). An element deleted does not leave a gap that needs to be filled as it would in an array. Instead, it leaves
some free space that can be reclaimed later by a memory management process. The element deleted is likely to
break some chains that tie other elements together; if so, the broken chains are relinked according to rules specific
to the type of list used.
Pointers are the language feature used in modern programming languages to capture the equivalent of a
memory address. A pointer value is essentially an address, and a pointer variable ranges over addresses. A pointer,
however, may contain more information than merely an address. In Pascal and other strongly typed languages, for
example, a pointer also references the type definition of the objects it can point to, a feature that enhances the
compiler's ability to check for consistent use of pointer variables.
Let us illustrate these concepts with a simple example: a one-way linear list is a sequence of cells each of
which (except the last) points to its successor. The first cell is the head of the list, the last cell is the tail. Since the
tail has no successor, its pointer is assigned a predefined value 'nil', which differs from the address of any cell.
Access to the list is provided by an external pointer 'head'. If the list is empty, 'head' has the value 'nil'. A cell stores
an element xᵢ and a pointer to the successor cell (Exhibit 21.1):
type cptr = ^cell;
  cell = record e: elt; next: cptr end;
Exhibit 21.1: A one-way linear list.
Local operations, such as insertion or deletion at a position given by a pointer p, are efficient. For example, the
following statements insert a new cell containing an element y as successor of a cell being pointed at by p (Exhibit
21.2):
new(q); q^.e := y; q^.next := p^.next; p^.next := q;
Exhibit 21.2: Insertion as a local operation.
The successor of the cell pointed at by p is deleted by a single assignment statement (Exhibit 21.3):
p^.next := p^.next^.next;
Exhibit 21.3: Deletion as a local operation.
An insertion or deletion at the head or tail of this list is a special case to be handled separately. To support
insertion at the tail, an additional pointer variable 'tail' may be set to point to the tail element, if it exists.
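The two local operations can be mimicked in Python, where object references play the role of pointers and None plays the role of 'nil'. The names Cell, insert_after, delete_after, and contents are assumptions of this sketch. In line with the remark above, the helpers cover only the general case: p must point to an existing cell, and for deletion p must not be the tail.

```python
class Cell:
    def __init__(self, e, next=None):
        self.e = e          # the stored element
        self.next = next    # successor cell, or None for the tail


def insert_after(p, y):
    """new(q); q^.e := y; q^.next := p^.next; p^.next := q"""
    q = Cell(y, p.next)
    p.next = q


def delete_after(p):
    """p^.next := p^.next^.next  (p must have a successor)"""
    p.next = p.next.next


def contents(head):
    """Collect the elements from head to tail (for inspection only)."""
    out = []
    while head is not None:
        out.append(head.e)
        head = head.next
    return out


head = Cell(1, Cell(2, Cell(4)))
insert_after(head.next, 3)        # list is now 1 2 3 4
after_insert = contents(head)
delete_after(head)                # removes the cell holding 2
after_delete = contents(head)
```

Both operations touch a constant number of pointers regardless of the length of the list.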
A one-way linear list sometimes is handier if the tail points back to the head, making it a circular list. In a
circular list, the head and tail cells are replaced by a single entry cell, and any cell can be reached from any other
without having to start at the external pointer 'entry' (Exhibit 21.4).
Exhibit 21.4: A circular list combines head and tail into a single entry point
In a two-way (or doubly linked) list each cell contains two pointers, one to its successor, the other to its
predecessor. The list can be traversed in both directions. Exhibit 21.5 shows a circular two-way list.
Exhibit 21.5: A circular two-way or doubly-linked list
Exercise: traversal of a singly linked list in both directions
Write a recursive
procedure traverse(p: cptr);
to traverse a singly linked list from the head to the tail and back again. At each visit of a node, call the
procedure visit(p: cptr);
Solve the same problem iteratively without using any additional storage beyond a few local pointers. Your
traversal procedure may modify the structure of the list temporarily.
Solution
(a) procedure traverse(p: cptr);
begin if p ≠ nil then { visit(p); traverse(p^.next); visit(p) } end;
The initial call of this procedure is
traverse(head);
(b) procedure traverse(p: cptr);
var o, q: cptr; i: integer;
begin
  for i := 1 to 2 do { forward and back again } begin
    o := nil;
    while p ≠ nil do begin
      visit(p); q := p^.next; p^.next := o;
      o := p; p := q { the fork advances }
    end;
    p := o
  end
end;
Traversal becomes simpler if we let the 'next' pointer of the tail
cell point to this cell itself:
procedure traverse(p: cptr);
var o, q: cptr;
begin
  o := nil;
  while p ≠ nil do begin
    visit(p); q := p^.next; p^.next := o;
    o := p; p := q { the fork advances }
  end
end;
The fifo queue implemented as a one-way list
It is natural to implement a fifo queue as a one-way linear list, where each element points to the next one "in
line". The operation 'dequeue' occurs at the pointer 'head', and 'enqueue' is made fast by having an external pointer
'tail' point to the last element in the queue. A crafty implementation of this data structure involves an empty cell,
called a sentinel, at the tail of the list. Its purpose is to make the list-handling procedures simpler and faster by
making the empty queue look more like all other states of the queue. More precisely, when the queue is empty, the
external pointers 'head' and 'tail' both point to the sentinel rather than having the value 'nil'. The sentinel allows
insertion into the empty queue, and deletion that results in an empty queue, to be handled by the same code that
handles the general case of 'enqueue' and 'dequeue'. The reader should verify our claim that a sentinel simplifies the
code by programming the plausible, but less efficient, procedures which assume that an empty queue is represented
by head = tail = nil.
The queue is empty if and only if 'head' and 'tail' both point to the sentinel (i.e. if head = tail). An 'enqueue'
operation is performed by inserting the new element into the sentinel cell and then creating a new sentinel.
type cptr = ^cell;
  cell = record e: elt; next: cptr end;
  fifoqueue = record head, tail: cptr end;
procedure create(var f: fifoqueue);
begin new(f.head); f.tail := f.head end;
function empty(f: fifoqueue): boolean;
begin return(f.head = f.tail) end;
procedure enqueue(var f: fifoqueue; x: elt);
begin f.tail^.e := x; new(f.tail^.next); f.tail := f.tail^.next
end;
function front(f: fifoqueue): elt;
{ not to be called if the queue is empty }
begin return(f.head^.e) end;
procedure dequeue(var f: fifoqueue);
{ not to be called if the queue is empty }
begin f.head := f.head^.next end;
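The sentinel technique translates almost line by line into Python. In this sketch (class names Cell and FifoQueue are assumptions) a freshly created Cell with no element stands for the sentinel, and the identity test "is" corresponds to comparing the pointers 'head' and 'tail'.

```python
class Cell:
    def __init__(self):
        self.e = None       # element; unused while the cell is the sentinel
        self.next = None


class FifoQueue:
    def __init__(self):
        self.head = Cell()          # the sentinel
        self.tail = self.head       # empty queue: head = tail

    def empty(self):
        return self.head is self.tail

    def enqueue(self, x):
        self.tail.e = x             # the old sentinel receives the element
        self.tail.next = Cell()     # a new sentinel is appended
        self.tail = self.tail.next

    def front(self):
        assert not self.empty(), "not to be called if the queue is empty"
        return self.head.e

    def dequeue(self):
        assert not self.empty(), "not to be called if the queue is empty"
        self.head = self.head.next


f = FifoQueue()
was_empty = f.empty()
for x in ("a", "b", "c"):
    f.enqueue(x)
served = []
while not f.empty():
    served.append(f.front())
    f.dequeue()
```

Neither 'enqueue' nor 'dequeue' contains a test for the empty queue; the sentinel makes the first insertion and the last deletion look like the general case.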
Tree traversal
When we speak of trees in computer science, we usually mean rooted ordered trees: they have a
distinguished node called the root, and the subtrees of any node are ordered. Rooted, ordered trees are best defined
recursively: a tree T is either empty, or it is a tuple (N, T₁, … , Tₖ), where N is the root of the tree, and T₁, … , Tₖ is a
sequence of trees. Binary trees are the special case k = 2.
Trees are typically used to organize data or activities in a hierarchy: a top-level data set or activity is composed of
a next level of data or activities, and so on. When one wishes to gather or survey all of the data or activities, it is
necessary to traverse the tree, visiting (i.e. processing) the nodes in some systematic order. The visit at each node
might be as simple as printing its contents or as complicated as computing a function that depends on all nodes in
the tree. There are two major ways to traverse trees: breadth first and depth first.
Breadth-first traversal visits the nodes level by level. This is useful in heuristic search, where a node
represents a partial solution to a problem, with deeper levels representing more complete solutions. Before
pursuing any one solution to a great depth, it may be advantageous to assess all the partial solutions at the present
level, in order to pursue the most promising one. We do not discuss breadth-first traversal further; we merely
suggest the following:
Exercise: breadth-first traversal
Decide on a representation for trees where each node may have a variable number of children. Write a
procedure for breadth-first traversal of such a tree. Hint: use a fifo queue to organize the traversal. The node to be
visited is removed from the head of the queue, and its children are enqueued, in order, at the tail end.
Depth-first traversal always moves to the first unvisited node at the next deeper level, if there is one. It turns
out that depth-first better fits the recursive definition of trees than breadth-first does and orders nodes in ways that
are more often useful. We discuss depth-first for binary trees and leave the generalization to other trees to the
reader. Depth-first can generate three basic orders for traversing a binary tree: preorder, inorder, and
postorder, defined recursively as:
preorder   Visit root, traverse left subtree, traverse right subtree.
inorder    Traverse left subtree, visit root, traverse right subtree.
postorder  Traverse left subtree, traverse right subtree, visit root.
For the tree in Exhibit 21.6 we obtain the orders shown.
Exhibit 21.6: Standard orders defined on a binary tree
An arithmetic expression can be represented as a binary tree by assigning the operands to the leaves and the
operators to the internal nodes. The basic traversal orders correspond to different notations for representing
arithmetic expressions. By traversing the expression tree (Exhibit 21.7) in preorder, inorder, or postorder, we
obtain the prefix, infix, or suffix notation, respectively.
Exhibit 21.7: Standard traversal orders correspond to different notations for arithmetic expressions
A binary tree can be implemented as a list structure in many ways. The most common way uses an external
pointer 'root' to access the root of the tree and represents each node by a cell that contains a field for an element to
be stored, a pointer to the root of the left subtree, and a pointer to the root of the right subtree (Exhibit 21.8). An
empty left or right subtree may be represented by the pointer value 'nil', or by pointing at a sentinel, or, as we shall
see, by a pointer that points to the node itself.
type nptr = ^node;
  node = record e: elt; L, R: nptr end;
var root: nptr;
Exhibit 21.8: Straightforward implementation of a binary tree
The following procedure 'traverse' implements any or all of the three orders preorder, inorder, and postorder,
depending on how the procedures 'visit1', 'visit2', and 'visit3' process the data in the node referenced by the pointer p.
The root of the subtree to be traversed is passed through the formal parameter p. In the simplest case, a visit does
nothing or simply prints the contents of the node.
procedure traverse(p: nptr);
begin
  if p ≠ nil then begin
    visit1(p); { preorder }
    traverse(p^.L);
    visit2(p); { inorder }
    traverse(p^.R);
    visit3(p) { postorder }
  end
end;
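To see how the three visits produce the three notations for arithmetic expressions, the procedure can be sketched in Python. The Node class, the helper notation, and the sample expression (a - b) * c are assumptions chosen for this illustration; the expression is not necessarily the one shown in Exhibit 21.7. The three visit procedures are passed as functions, and exactly one of them records the node contents.

```python
class Node:
    def __init__(self, e, L=None, R=None):
        self.e, self.L, self.R = e, L, R   # None plays the role of 'nil'


def traverse(p, visit1, visit2, visit3):
    if p is not None:
        visit1(p)                              # preorder
        traverse(p.L, visit1, visit2, visit3)
        visit2(p)                              # inorder
        traverse(p.R, visit1, visit2, visit3)
        visit3(p)                              # postorder


def notation(root, which):
    """which = 0, 1, 2 selects the visit that records the node contents."""
    out = []
    hooks = [lambda p: None] * 3               # by default a visit does nothing
    hooks[which] = lambda p: out.append(p.e)
    traverse(root, *hooks)
    return " ".join(out)


# operands at the leaves, operators at the internal nodes
tree = Node("*", Node("-", Node("a"), Node("b")), Node("c"))
prefix = notation(tree, 0)   # "* - a b c"
infix = notation(tree, 1)    # "a - b * c"
suffix = notation(tree, 2)   # "a b - c *"
```

The infix result shows a known limitation: without parentheses or precedence rules it no longer determines the tree, whereas the prefix and suffix forms do.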
Traversing a tree involves both advancing from the root toward the leaves, and backing up from the leaves
toward the root. Recursive invocations of the procedure 'traverse' build up a stack whose entries contain references
to the nodes for which 'traverse' has been called. These entries provide a means of returning to a node after the
traversal of one of its subtrees has been finished. The bookkeeping done by a stack or equivalent auxiliary structure
can be avoided if the tree to be traversed may be modified temporarily.
The following triple-tree traversal algorithm provides an elegant and efficient way of traversing a binary tree
without using any auxiliary memory (i.e. no stack is used and it is not assumed that a node contains a pointer to its
parent node). The data structure is modified temporarily to retain the information needed to find the way back up
the tree and to restore each subtree to its initial condition after traversal. The triple-tree traversal algorithm
assumes that an empty subtree is encoded not by a 'nil' pointer, but rather by an L (left) or R (right) pointer that
points to the node itself, as shown in Exhibit 21.9.
Exhibit 21.9: Coding of a leaf used in procedure TTT
procedure TTT;
var o, p, q: nptr;
begin
  o := nil; p := root;
  while p ≠ nil do begin
    visit(p);
    q := p^.L;
    p^.L := p^.R; { rotate left pointer }
    p^.R := o; { rotate right pointer }
    o := p;
    p := q
  end
end;
In this procedure the pointers p ("present") and o ("old") serve as a two-pronged fork. The tree is being
traversed by the pointer p and the companion pointer o, which always lags one step behind p. The two pointers
form a two-pronged fork that runs around the tree, starting in the initial condition with p pointing to the root of the
tree, and o = nil. An auxiliary pointer q is needed temporarily to advance the fork. The while loop in 'TTT' is
executed as long as p points to a node in the tree and is terminated when p assumes the value 'nil'. The initial value
of the o pointer gets saved as a temporary value. First it is assigned to the R pointer of the root, later to the L
pointer. Finally, it gets assigned to p, the fork exits from the root of the tree, and the traversal of the tree is
complete. The correctness of this algorithm is proved by induction on the number of nodes in the tree.
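Before turning to the proof, it can help to run the algorithm on a small tree. The Python sketch below assumes a Node class whose L and R fields initially refer to the node itself, which is the encoding of an empty subtree described above; the function name ttt and the visit callback are likewise assumptions of this sketch.

```python
class Node:
    def __init__(self, e):
        self.e = e
        self.L = self   # a pointer to the node itself encodes an empty subtree
        self.R = self


def ttt(root, visit):
    """Triple-tree traversal: no stack, no parent pointers."""
    o, p = None, root
    while p is not None:
        visit(p)
        q = p.L
        p.L = p.R       # rotate left pointer
        p.R = o         # rotate right pointer
        o = p
        p = q           # the fork advances


# A root with two leaves as children.
a, b, c = Node("A"), Node("B"), Node("C")
a.L, a.R = b, c
visits = []
ttt(a, lambda node: visits.append(node.e))
```

Each of the three nodes is visited exactly three times, in the order A B B B A C C C A, and all L and R fields end up with their original values, so the traversal can be repeated on the same tree.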
Induction hypothesis H: if at the beginning of an iteration of the while loop, the fork pointer p points to the root
of a subtree with n > 0 nodes, and o has a value x that is different from any pointer value inside this subtree, then
after 3 · n iterations the subtree will have been traversed in triple order (visiting each node exactly three times), all
tree pointers in the subtree will have been restored to their original value, and the fork pointers will have been
reversed (i.e. p has the value x and o points to the root of the subtree).
Base of induction: H is true for n = 1.
Proof: The smallest tree we consider has exactly one node, the root alone. Before the while loop is executed for
this subtree, the fork and the tree are in the initial state shown in Exhibit 21.10. Exhibit 21.11 shows the state of the
fork and the tree after each iteration of the while loop. The node is visited in each iteration.
Exhibit 21.10: Initial configuration for traversing a tree consisting of a single node
Exhibit 21.11: Tracing procedure TTT while traversing the smallest tree
Induction step: If H is true for all n, 0 < n ≤ k, H is also true for k + 1.
Proof: Consider a tree T with k + 1 nodes. T consists of a root and k nodes shared among the left and right
subtrees of the root. Each of these subtrees has ≤ k nodes, so we apply the induction hypothesis to each of them.
The following is a highly compressed account of the proof of the induction step, illustrated by Exhibit 21.12.
Consider the tree with k + 1 nodes shown in state 1. The root is a node with three fields; the left and right subtrees
are shown as triangles. The figure shows the typical case when both subtrees are nonempty. If one of the two
subtrees is empty, the corresponding pointer points back to the root; these two cases can be handled similarly to the
case n = 1. The fork starts out with p pointing at the root and o pointing at anything outside the subtree being
traversed. We want to show that the initial state 1 is transformed in 3 · (k + 1) iterations into the final state 6. In the
final state the subtrees are shaded to indicate that they have been correctly traversed; the fork has exited from the
root, with p and o having exchanged values. To show that the algorithm correctly transforms state 1 into state 6, we
consider the intermediate states 2 to 5, and what happens in each transition.
1 → 2 One iteration through the while loop advances the fork into the left subtree and rotates the pointers of the
root.
2 → 3 H applied to the left subtree of the root says that this subtree will be correctly traversed, and the fork will
exit from the subtree with pointers reversed.
3 → 4 This is the second iteration through the while loop that visits the root. The fork advances into the right
subtree, and the pointers of the root rotate a second time.
4 → 5 H applied to the right subtree of the root says that this subtree will be correctly traversed, and the fork will
exit from the subtree with pointers reversed.
5 → 6 This is the third iteration through the while loop that visits the root. The fork moves out of the tree being
traversed; the pointers of the root rotate a third time and thereby assume their original values.
Exhibit 21.12: Trace of procedure TTT, invoking the induction hypothesis
Exercise: binary trees
Consider a binary tree declared as follows:
type nptr = ^node;
  node = record L, R: nptr end;
var root: nptr;
(a) If a node has no left or right subtree, the corresponding pointer has the value 'nil'. Prove that a binary tree
with n nodes, n > 0, has n + 1 'nil' pointers.
Algorithms and Data Structures 219 A ,lobal Te)t
21. 4ist structures
HbI :rite a %unction nodesH[I& integerL that returns the number o% nodes+ and a %unction depthH[I& integerL
that returns the depth o% a binary tree. The depth o% the root is de%ined to be 0L the depth o% any other node
is the depth o% its parent increased by 1. The depth o% the tree is the ma)imum depth o% its nodes.
.olution
HaI "ach node contains t5o pointers+ %or a total o% 2 K n pointers in the tree. There is e)actly one pointer that
points to each o% n U 1 nodes+ none points to the root. Thus 2 K n U Hn U 1I V n ] 1 pointers are ;nil;. This can
also be proved by induction on the number o% nodes in the tree.
*+ function nodes*p- nptr+- integer&
egin
if p . nil then
return*0+
else
return*nodes*p_.8+ 9 nodes*p_.5+ 9 "+
end&
function depth*p- nptr+- integer&
egin
if p . nil then return *0"+
else return*" 9 max*depth*p_.8+, depth*p_.5+++
end&
where ,max, is
function max*a, - integer+- integer&
egin if a K then return*a+ else return*+ end&
")ercise& list copying
"%%ective memory management sometimes makes it desirable or necessary to copy a list. /or e)ample+
performance may improve drastically if a list spread over several pages can be compressed into a single page. List
copying involves a traversal of the original concurrently with a traversal of the copy, as the latter is being built up.
(a) Consider binary trees built from nodes of type 'node' and pointers of type 'nptr'. A tree is accessed through
a pointer to the root, which is 'nil' for an empty tree.
  type nptr = ^node;
       node = record e: elt; L, R: nptr end;
Write a recursive
  function cptree(p: nptr): nptr;
to copy a tree given by a pointer p to its root, and return a pointer to the root of the copy.
(b) Consider arbitrary graphs built from nodes of a type similar to the nodes in (a), but they have an additional
pointer field cn, intended to point to the copy of a node:
  type node = record e: elt; L, R: nptr; cn: nptr end;
A graph is accessed through a pointer to a node called the origin, and we are only concerned with nodes that can
be reached from the origin; this access pointer is 'nil' for an empty graph. Write a recursive
  function cpgraph(p: nptr): nptr;
This book is licensed under a Creative Commons Attribution 3.0 License
to copy a graph given by a pointer p to its origin, and return a pointer to the origin of the copy. Use the field cn,
assuming that its initial value is 'nil' in every node of the original graph; set it to 'nil' in every node of the copy.
Solution
(a) function cptree(p: nptr): nptr;
    var cp: nptr;
    begin
      if p = nil then
        return(nil)
      else begin
        new(cp);
        cp^.e := p^.e; cp^.L := cptree(p^.L); cp^.R := cptree(p^.R);
        return(cp)
      end
    end;
(b) function cpgraph(p: nptr): nptr;
    var cp: nptr;
    begin
      if p = nil then
        return(nil)
      elsif p^.cn ≠ nil then { node has already been copied }
        return(p^.cn)
      else begin
        new(cp); p^.cn := cp; cp^.cn := nil;
        cp^.e := p^.e; cp^.L := cpgraph(p^.L); cp^.R := cpgraph(p^.R);
        return(cp)
      end
    end;
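The role of the field cn is easiest to see on a graph with cycles. The following Python sketch mirrors 'cpgraph' under the assumption that None stands for 'nil'; the class Node and the small two-node test graph are made up for illustration.

```python
class Node:
    def __init__(self, e):
        self.e = e
        self.L = None
        self.R = None
        self.cn = None           # pointer to this node's copy, initially 'nil'

def cpgraph(p):
    """Copy the graph reachable from p and return the origin of the copy."""
    if p is None:
        return None
    if p.cn is not None:         # node has already been copied
        return p.cn
    cp = Node(p.e)               # the copy starts with cn = None
    p.cn = cp                    # record the copy BEFORE recursing, so cycles terminate
    cp.L = cpgraph(p.L)
    cp.R = cpgraph(p.R)
    return cp

# Two nodes with a cycle a -> b -> a and a self-loop on a.
a, b = Node("a"), Node("b")
a.L, a.R = b, a
b.L = a
copy = cpgraph(a)
```

Setting p.cn before the recursive calls is what prevents endless recursion: when the traversal returns to a node through a cycle, it finds the copy already recorded and stops.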
")ercise& list copying 5ith constant au)iliary memory
Consider binary trees as in part (a) of the preceding exercise. Memory for the stack implied by the recursion can
be saved by writing an iterative tree copying procedure that uses only a constant amount of auxiliary memory. This
requires a trick, as any depth-first traversal must be able to back up from the leaves toward the root. In the
triple-tree traversal procedure, the return path is temporarily encoded in the tree being traversed. This idea can again be
used here, but there is a simpler solution: The return path is temporarily encoded in the R-fields of the copy; the
L-fields of certain nodes of the copy point back to the corresponding node in the original. Work out the details of a
tree-copying procedure that works with O(1) auxiliary memory.
Exercise: traversing a directed acyclic graph
A directed graph consists of nodes and directed arcs, where each arc leads from one node to another. A directed
graph is acyclic if the arcs form no cycles. One way to ensure that a graph is acyclic is to label nodes with distinct
integers and to draw each arc from a lower number to a higher number. Consider a binary directed acyclic graph,
where each node has two pointer fields, L and R, to represent at most two arcs that lead out of that node. An
example is shown in Exhibit 21.13.
Exhibit 21.13: A rooted acyclic graph.
Algorithms and Data Structures: A Global Text
21. List structures
(a) Write a program to visit every node in a directed acyclic graph reachable from a pointer called 'root'. You
are free to execute procedure 'visit' for each node as often as you like.
(b) Write a program similar to (a) where you are required to execute procedure 'visit' exactly once per node.
Hint: Nodes may need to have additional fields.
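One possible reading of the hint in (b), sketched in Python: each node carries an extra boolean field that records whether it has been visited. The field name visited, the class Node, and the four-node example graph are assumptions made for this sketch, not part of the exercise.

```python
class Node:
    def __init__(self, label):
        self.label = label
        self.L = None
        self.R = None
        self.visited = False     # hypothetical additional field, initially false

def traverse(p, visit):
    """Call visit exactly once for every node reachable from p."""
    if p is None or p.visited:
        return
    p.visited = True
    visit(p)
    traverse(p.L, visit)
    traverse(p.R, visit)

# Arcs lead from lower to higher labels: 1->2, 1->3, 2->4, 3->4.
n1, n2, n3, n4 = Node(1), Node(2), Node(3), Node(4)
n1.L, n1.R = n2, n3
n2.L = n4
n3.L = n4

order = []
traverse(n1, lambda node: order.append(node.label))
```

Without the test on visited, node 4 would be visited twice, once along each arc that leads to it; that simpler version is a valid answer to part (a).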
")ercise& counting nodes on a s0uare grid
Consider a network superimposed on a square grid: each node is connected to at most four neighbors in the
directions east, north, west, south (Exhibit 21.14):
  type nptr = ^node;
       node = record E, N, W, S: nptr; status: boolean end;
  var origin: nptr;
Exhibit 21.14: A graph embedded in a square grid.
A 'nil' pointer indicates the absence of a neighbor. Neighboring nodes are doubly linked: if a pointer in node p
points to node q, the reverse pointer of q points to p (e.g., p^.W = q and q^.E = p). The pointer 'origin' is 'nil' or
points to a node. Consider the problem of counting the number of nodes that can be reached from 'origin'. Assume
that the status field of all nodes is initially set to false. How do you use this field? Write a function nn(p: nptr):
integer; to count the number of nodes.
Solution
  function nn(p: nptr): integer;
  begin
    if p = nil cor p^.status then
      return(0)
    else begin
      p^.status := true;
      return(1 + nn(p^.E) + nn(p^.N) + nn(p^.W) + nn(p^.S))
    end
  end;
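The same idea in a minimal Python sketch, assuming None for 'nil'. Python's 'or' short-circuits like 'cor', so p.status is never evaluated for a missing neighbor. The helpers link_east and link_north and the 2-by-2 example grid are illustrative additions.

```python
class Node:
    def __init__(self):
        self.E = self.N = self.W = self.S = None
        self.status = False      # False: not yet counted

def link_east(p, q):
    """Doubly link q as the east neighbor of p."""
    p.E, q.W = q, p

def link_north(p, q):
    """Doubly link q as the north neighbor of p."""
    p.N, q.S = q, p

def nn(p):
    """Count the nodes reachable from p, marking each one as it is counted."""
    if p is None or p.status:    # 'or' is conditional, like 'cor'
        return 0
    p.status = True
    return 1 + nn(p.E) + nn(p.N) + nn(p.W) + nn(p.S)

# A 2-by-2 grid, plus one node that is not connected to it.
sw, se, nw, ne = Node(), Node(), Node(), Node()
link_east(sw, se)
link_east(nw, ne)
link_north(sw, nw)
link_north(se, ne)
isolated = Node()
count = nn(sw)
```

Because the marks are left set, a second call nn(sw) returns 0; a caller that needs to count again has to reset the status fields first.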
")ercise& counting nodes in an arbitrary net5ork
We generalize the problem above to arbitrary directed graphs, such as that of Exhibit 21.15, where each node
may have any number of neighbors. This graph is represented by a data structure defined by Exhibit 21.16 and the
type definitions below. Each node is linked to an arbitrary number of other nodes.
Exhibit 21.15: An arbitrary (cyclic) directed graph.
Exhibit 21.16: A possible implementation as a list structure.
  type nptr = ^node; cptr = ^cell;
       node = record status: boolean; np: nptr; cp: cptr end;
       cell = record np: nptr; cp: cptr end;
  var origin: nptr;
The pointer 'origin' has the value 'nil' or points to a node. Consider the problem of counting the number n of
nodes that can be reached from 'origin'. The status field of all nodes is initially set to false. How do you use it? Write
a function nn(p: nptr): integer; that returns n.
Binary search trees
A binary search tree is a binary tree T where each node N stores a data element e(N) from a domain X on
which a total order ≤ is defined, subject to the following order condition: For every node N in T, all elements in the
left subtree L(N) of N are < e(N), and all elements in the right subtree R(N) of N are > e(N). Let x_1, x_2, … , x_n be n
elements drawn from the domain X.
Definition: A binary search tree for x_1, x_2, … , x_n is a binary tree T with n nodes and a one-to-one mapping
between the n given elements and the n nodes, such that
  ∀ N in T, ∀ N' ∈ L(N), ∀ N'' ∈ R(N): e(N') < e(N) < e(N'')
Exercise
Show that the following statement is equivalent to this order condition: The inorder traversal of the nodes of T
coincides with the natural order < of the elements assigned to the nodes.
Remark: The order condition can be relaxed to e(N') ≤ e(N) < e(N'') to accommodate multiple occurrences of
the same value, with only minor modifications to the statements and algorithms presented in this section. For
simplicity's sake we assume that all values in a tree are distinct.
The order condition permits binary search and thus guarantees a worst-case search time O(h) for a tree of height
h. Trees that are well balanced (in an intuitive sense; see the next section for a definition), that have not
degenerated into linear lists, have a height h = O(log n) and thus support search in logarithmic time.
Basic operations on binary search trees are most easily implemented as recursive procedures. Consider a tree
represented as in the preceding section, with empty subtrees denoted by 'nil'. The following function 'find' searches
for an element x in a subtree pointed to by p. It returns a pointer to a node containing x if the search is successful,
and 'nil' if it is not.
  function find(x: elt; p: nptr): nptr;
  begin
    if p = nil then return(nil)
    elsif x < p^.e then return(find(x, p^.L))
    elsif x > p^.e then return(find(x, p^.R))
    else { x = p^.e } return(p)
  end;
The following procedure 'insert' leaves the tree alone if the element x to be inserted is already stored in the tree.
The parameter p initially points to the root of the subtree into which x is to be inserted.
  procedure insert(x: elt; var p: nptr);
  begin
    if p = nil then begin new(p); p^.e := x; p^.L := nil; p^.R := nil end
    elsif x < p^.e then insert(x, p^.L)
    elsif x > p^.e then insert(x, p^.R)
  end;
Initial call:
  insert(x, root);
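A minimal Python sketch of the two recursive operations follows. Python has no 'var' parameters, so this version of 'insert' returns the root of the updated subtree and the caller stores it back; that calling convention, the class Node, and the helper inorder are choices made for this sketch.

```python
class Node:
    def __init__(self, e):
        self.e = e
        self.L = None
        self.R = None

def find(x, p):
    """Return the node containing x in the subtree rooted at p, or None."""
    if p is None:
        return None
    if x < p.e:
        return find(x, p.L)
    if x > p.e:
        return find(x, p.R)
    return p                     # x == p.e

def insert(x, p):
    """Insert x into the subtree rooted at p; return the root of that subtree."""
    if p is None:
        return Node(x)
    if x < p.e:
        p.L = insert(x, p.L)
    elif x > p.e:
        p.R = insert(x, p.R)
    return p                     # x already stored: the tree is left alone

def inorder(p):
    """Elements of the subtree rooted at p in inorder."""
    return [] if p is None else inorder(p.L) + [p.e] + inorder(p.R)

root = None
for x in [6, 15, 4, 2, 7, 12, 5, 18, 7]:   # the second 7 is ignored
    root = insert(x, root)
```

The inorder traversal of the resulting tree lists the elements in increasing order, which is the order condition restated.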
To delete an element x, we first have to find the node N that stores x. If this node is a leaf or semileaf (a node
with only one subtree), it is easily deleted; but if it has two subtrees, it is more efficient to leave this node in place
and to replace its element x by an element found in a leaf or semileaf node, and delete the latter (Exhibit 21.17).
Thus we distinguish three cases:
1. If N has no child, remove N.
2. If N has exactly one child, replace N by this child node.
3. If N has two children, replace x by the largest element y in the left subtree, or by the smallest element z in
the right subtree of N. Either of these elements is stored in a node with at most one child, which is removed
as in case (1) or (2).
Exhibit 21.17: Element x is deleted while preserving its node N. Node N is
filled with a new value y, whose old node is easier to delete.
A sentinel is again the key to an elegant iterative implementation of binary search trees. In a node with no left or
right child, the corresponding pointer points to the sentinel. This sentinel is a node that contains no element; its left
pointer points to the root and its right pointer points to itself. The root, if it exists, can only be accessed through the
left pointer of the sentinel. The empty tree is represented by the sentinel alone (Exhibit 21.18). A typical tree is
shown in Exhibit 21.19.
Exhibit 21.18: The empty binary tree is represented by the sentinel which points to itself.
Exhibit 21.19: A binary tree implemented as a list structure with
sentinel.
The following implementation of a dictionary as a binary search tree uses a sentinel accessed via the variable d:
  type nptr = ^node;
       node = record e: elt; L, R: nptr end;
       dictionary = nptr;
  procedure create(var d: dictionary);
  begin { create sentinel } new(d); d^.L := d; d^.R := d end;
  function member(d: dictionary; x: elt): boolean;
  var p: nptr;
  begin
    d^.e := x; { initialize element in sentinel }
    p := d^.L; { point to root, if it exists }
    while x ≠ p^.e do
      if x < p^.e then p := p^.L else { x > p^.e } p := p^.R;
    return(p ≠ d)
  end;
Procedure 'find' searches for x. If found, p points to the node containing x, and q to its parent. If not found, p
points to the sentinel and q to the parent-to-be of a new node into which x will be inserted.
  procedure find(d: dictionary; x: elt; var p, q: nptr);
  begin
    d^.e := x; p := d^.L; q := d;
    while x ≠ p^.e do begin
      q := p;
      if x < p^.e then p := p^.L else { x > p^.e } p := p^.R
    end
  end;
  procedure insert(var d: dictionary; x: elt);
  var p, q: nptr;
  begin
    find(d, x, p, q);
    if p = d then begin { x is not yet in the tree }
      new(p); p^.e := x; p^.L := d; p^.R := d;
      { x = q^.e holds only if q is the sentinel, whose left pointer leads to the root }
      if x ≤ q^.e then q^.L := p else { x > q^.e } q^.R := p
    end
  end;
  procedure delete(var d: dictionary; x: elt);
  var p, q, t: nptr;
  begin
    find(d, x, p, q);
    if p ≠ d then { x has been found }
      if (p^.L ≠ d) and (p^.R ≠ d) then begin
        { p has left and right children; find largest element in left subtree }
        t := p^.L; q := p;
        while t^.R ≠ d do begin q := t; t := t^.R end;
        if t^.e < q^.e then q^.L := t^.L else { t^.e > q^.e } q^.R := t^.L;
        p^.e := t^.e
      end
      else begin { p has at most one child }
        if p^.L ≠ d then { left child only } p := p^.L
        elsif p^.R ≠ d then { right child only } p := p^.R
        else { p has no children } p := d;
        if x ≤ q^.e then q^.L := p else { x > q^.e } q^.R := p
      end
  end;
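The sentinel technique can be exercised with the following Python sketch. It is a transcription under stated assumptions rather than a definitive implementation: 'find' returns the pair (p, q) instead of using 'var' parameters, 'create' returns the sentinel, and the helper inorder is added only to inspect the result.

```python
class Node:
    def __init__(self):
        self.e = None
        self.L = None
        self.R = None

def create():
    """Return a new dictionary, represented by its sentinel."""
    d = Node()
    d.L = d.R = d
    return d

def find(d, x):
    """Return (p, q): p holds x or is the sentinel; q is p's parent (or parent-to-be)."""
    d.e = x                      # the search always stops at the sentinel at the latest
    p, q = d.L, d
    while x != p.e:
        q = p
        p = p.L if x < p.e else p.R
    return p, q

def member(d, x):
    p, _ = find(d, x)
    return p is not d

def insert(d, x):
    p, q = find(d, x)
    if p is d:                   # x is not yet in the tree
        p = Node()
        p.e, p.L, p.R = x, d, d
        if x <= q.e:             # equality only if q is the sentinel
            q.L = p
        else:
            q.R = p

def delete(d, x):
    p, q = find(d, x)
    if p is d:
        return                   # x is not in the tree
    if p.L is not d and p.R is not d:
        t, q = p.L, p            # find the largest element in the left subtree
        while t.R is not d:
            q, t = t, t.R
        if t.e < q.e:
            q.L = t.L
        else:
            q.R = t.L
        p.e = t.e
    else:                        # p has at most one child
        child = p.L if p.L is not d else p.R
        if x <= q.e:
            q.L = child
        else:
            q.R = child

def inorder(d):
    out = []
    def walk(p):
        if p is not d:
            walk(p.L); out.append(p.e); walk(p.R)
    walk(d.L)
    return out

d = create()
for x in [6, 15, 4, 2, 7, 12, 5, 18]:
    insert(d, x)
```

Storing x in the sentinel before the search removes the test for an empty subtree from the loop: every search path ends at the sentinel, where the comparison x = p.e succeeds.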
In the best case of a completely balanced binary search tree for n elements, all leaves are on levels ⌊log2 n⌋ or
⌊log2 n⌋ - 1, and the search tree has the height ⌊log2 n⌋. The cost for performing the 'member', 'insert', or 'delete'
operation is bounded by the longest path from the root to a leaf (i.e. the height of the tree) and is therefore O(log n).
Without any further provisions, a binary search tree can degenerate into a linear list in the worst case. Then the cost
for each of the operations would be O(n).
What is the expected average cost for the search operation in a randomly generated binary search tree?
"Randomly generated" means that each permutation of the n elements to be stored in the binary search tree has the
same probability of being chosen as the input sequence. Furthermore, we assume that the tree is generated by
insertions only. Therefore, each of the n elements is equally likely to be chosen as the root of the tree. Let p_n be the
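expected path length of a randomly generated binary search tree storing n elements. Then
$$p_n = n + \frac{1}{n} \sum_{i=1}^{n} \left( p_{i-1} + p_{n-i} \right), \qquad p_0 = 0.$$
As shown in chapter 16 in the section "Recurrence relations", this recurrence relation has the solution
$$p_n = (\ln 4) \cdot n \cdot \log_2 n + g(n) \quad \text{with} \quad g(n) \in O(n).$$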
Since the average search time in randomly generated binary search trees, measured in terms of the number of
nodes visited, is p_n / n and ln 4 ≈ 1.386, it follows that the cost is O(log n) and therefore only about 40 per cent
higher than in the case of completely balanced binary search trees.
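The random-permutation model can be checked by brute force for small n. The Python sketch below rests on an assumption about the definition: the path length of a tree counts, for every node, the number of nodes on the path from the root to that node, so that p_n / n is the average number of nodes visited in a successful search. Under that assumption the expected value satisfies p_n = n + (1/n) · Σ (p_{i-1} + p_{n-i}) with p_0 = 0, where i ranges over the n possible roots. The function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def expected_path_length(n_max):
    """p[n] = n + (1/n) * sum over roots i of (p[i-1] + p[n-i]), with p[0] = 0."""
    p = [Fraction(0)]
    for n in range(1, n_max + 1):
        total = sum(p[i - 1] + p[n - i] for i in range(1, n + 1))
        p.append(n + total / n)
    return p

def path_length(seq):
    """Nodes visited when every key of the tree built from seq is searched once."""
    left, right = {}, {}
    total = 1                      # the root is found after visiting one node
    for x in seq[1:]:
        p, visited = seq[0], 1
        while True:
            child = left if x < p else right
            if p not in child:
                child[p] = x       # x becomes a new leaf below p
                break
            p = child[p]
            visited += 1
        total += visited + 1       # nodes on the path, including x itself
    return total

def average_path_length(n):
    """Average of path_length over all n! insertion orders (n >= 1)."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(path_length(s) for s in perms), len(perms))

p = expected_path_length(6)
```

Exact fractions avoid rounding issues when the recurrence is compared with the brute-force average. The linear term g(n) is still significant for small n, so the ratio p_n / (n · log2 n) approaches ln 4 only slowly.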
Balanced trees: general definition
If insertions and deletions occurred at random, and the assumption of the preceding section was realistic, we
could let search trees grow and shrink as they please, incurring a modest increase of 40 per cent in search time over
completely balanced trees. But real data are not random: they are typically clustered, and long runs of
monotonically increasing or decreasing elements occur, often as the result of a previous processing step.
Unfortunately, such deviation from randomness degrades the performance of search trees.
To prevent search trees from degenerating into linear lists, we can monitor their shape and restructure them
into a more balanced shape whenever they have become too skewed. Several classes of balanced search trees
guarantee that each operation 'member', 'insert', and 'delete' can be performed in time O(log n) in the worst case.
Since the work to be done depends directly on the height of the tree, such a class B of search trees must satisfy the
following two conditions (h_T is the height of a tree T, n_T is the number of nodes in T):
Balance condition: ∃ c > 0 ∀ T ∈ B: h_T ≤ c · log2 n_T
Rebalancing condition: If an 'insert' or 'delete' operation, performed on a tree T ∈ B, yields a tree T' ∉ B, it
must be possible to rebalance T' in time O(log n) to yield a tree T'' ∈ B.
Example: almost complete trees
The class of almost complete binary search trees satisfies the balance condition but not the rebalancing
condition. In the worst case it takes time O(n) to restructure such a binary search tree (Exhibit 21.20), and if 'insert'
and 'delete' are defined to include any rebalancing that may be necessary, these operations cannot be guaranteed to
run in time O(log n).
Exhibit 21.20: Restructuring: worst case
In the next two sections we present several classes of balanced trees that meet both conditions: the
height-balanced or AVL-trees (G. Adel'son-Vel'skii and E. Landis, 1962) [AL 62] and various multiway trees, such as
B-trees [BM 72, Com 79] and their generalization, (a,b)-trees [Meh 84a].
AVL-trees, with their small nodes that hold a single data element, are used primarily for storing data in main
memory. Multiway trees, with potentially large nodes that hold many elements, are also useful for organizing data
on secondary storage devices, such as disks, that allow direct access to sizable physical data blocks. In this case, a
node is typically chosen to fill a physical data block, which is read or written in one access operation.
Height-balanced trees
Definition: A binary tree is height-balanced if, for each node, the heights of its two subtrees differ by at most
one. Height-balanced search trees are also called AVL-trees. Exhibit 21.21 to Exhibit 21.23 show various AVL-trees,
and one that is not.
Exhibit 21.21: Examples of height-balanced trees
Exhibit 21.22: Example of a tree not height-balanced; the marked node violates the balance condition.
A "most-skewed" AVL-tree T_h is an AVL-tree of height h with a minimal number of nodes. Starting with T_0 and
T_1 shown in Exhibit 21.23, T_h is obtained by attaching T_{h-1} and T_{h-2} as subtrees to a new root.
Exhibit 21.23: Most skewed AVL trees of heights h = 0 through h = 4
The number of nodes in a most-skewed AVL-tree of height h is given by the recurrence relation
  n_h = n_{h-1} + n_{h-2} + 1, n_0 = 1, n_1 = 2.
In the section on recurrence relations in the chapter entitled "The mathematics of algorithm analysis", it has
been shown that the recurrence relation
  m_h = m_{h-1} + m_{h-2}, m_0 = 0, m_1 = 1
has the solution
$$m_h = \frac{1}{\sqrt{5}} \left( \left( \frac{1 + \sqrt{5}}{2} \right)^{h} - \left( \frac{1 - \sqrt{5}}{2} \right)^{h} \right).$$
Since n_h = m_{h+3} - 1 we obtain
$$n_h = \frac{1}{\sqrt{5}} \left( \left( \frac{1 + \sqrt{5}}{2} \right)^{h+3} - \left( \frac{1 - \sqrt{5}}{2} \right)^{h+3} \right) - 1.$$
Since
$$\left| \frac{1 - \sqrt{5}}{2} \right| < 1$$
it follows that
$$\lim_{h \to \infty} \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{h+3} = 0$$
and therefore n_h behaves asymptotically as
$$\frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{h+3}.$$
Applying the logarithm results in
$$\log_2 n_h \approx (h + 3) \cdot \log_2 \frac{1 + \sqrt{5}}{2} - \log_2 \sqrt{5}.$$
Therefore, the height of a worst-case AVL-tree with n nodes is about 1.44 · log2 n. Thus the class of AVL-trees
satisfies the balance condition, and the 'member' operation can always be performed in time O(log n).
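The constant 1.44 can be checked numerically. The Python sketch below tabulates n_h from its recurrence, compares it with the sequence m_h (the Fibonacci numbers), and evaluates 1 / log2((1 + √5) / 2). The function and variable names are illustrative.

```python
from math import log2, sqrt

def min_nodes(h_max):
    """n[h] = number of nodes of a most-skewed AVL-tree of height h."""
    n = [1, 2]
    for h in range(2, h_max + 1):
        n.append(n[h - 1] + n[h - 2] + 1)
    return n[: h_max + 1]

def fib(k):
    """m_k with m_0 = 0, m_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

PHI = (1 + sqrt(5)) / 2
C = 1 / log2(PHI)                # the constant in h ≈ C · log2 n, about 1.44

n = min_nodes(30)
```

For every height h in this table, n[h] equals fib(h + 3) - 1, and h does not exceed C · log2 n[h].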
We now show that the class of AVL-trees also satisfies the rebalancing condition. Thus AVL-trees support
insertion and deletion in time O(log n). Each node N of an AVL-tree has one of the balance properties /
(left-leaning), \ (right-leaning), or – (horizontal), depending on the relative height of its two subtrees.
Two local tree operations, rotation and double rotation, allow the restructuring of height-balanced trees that
have been disturbed by an insertion or deletion. They split a tree into subtrees and rebuild it in a different way.
Exhibit 21.24 shows a node, marked black, that got out of balance, and how a local transformation builds an
equivalent tree (for the same elements, arranged in order) that is balanced. Each of these transformations has a
mirror image that is not shown. The algorithms for insertion and deletion use these rebalancing operations as
described below.
Exhibit 21.24: Two local rebalancing operations
Insertion
A new element is inserted as in the case of a binary search tree. The balance condition of the new node becomes
– (horizontal). Starting at the new node, we walk toward the root of the tree, passing along the message that the
height of the subtree rooted at the current node has increased by one. At each node encountered along this path, an
operation determined by the following rules is performed. These rules depend on the balance condition of the node
before the new element was inserted, and on the direction from which the node was entered (i.e. from its left or
right child).
Rule I1: If the current node has balance condition –, change it to / or \ depending on whether we entered from
the node's left or from its right child. If the current node is the root, terminate; if not, continue to follow the path
upward.
Rule I2: If the current node has balance condition / or \ and is entered from the subtree that was previously
shorter, change the balance condition to – and terminate (the height of the subtree rooted at the current node has
not changed).
Rule I3: If the current node has balance condition / or \ and is entered from the subtree that was previously
taller, the balance condition of the current node is violated and gets restored as follows:
(a) If the last two steps were in the same direction (both from left children, or both from right children), an
appropriate rotation restores all balances and the procedure terminates.
(b) If the last two steps were in opposite directions (one from a left child, the other from a right child), an
appropriate double rotation restores all balances and the procedure terminates.
The initial insertion travels along a path from the root to a leaf, and the rebalancing process travels back up
along the same path. Thus the cost of an insertion in an AVL-tree is O(h), or O(log n) in the worst case. Notice that
an insertion calls for at most one rotation or double rotation, as shown in the example in Exhibit 21.25.
Example
Insert 1, 2, 5, 3, 4, 6, 7 into an initially empty AVL-tree (Exhibit 21.25). The balance condition of a node is shown
below it. Boldfaced nodes violate the balance condition.
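The insertion sequence of this example can be replayed with the Python sketch below. It deliberately swaps in a simpler bookkeeping than rules I1 to I3: each node stores the height of its subtree instead of a balance mark /, \ or –, and the recursion rechecks every node on the way back up instead of terminating early. The rotations themselves are the ones described in the text; all names are illustrative.

```python
class Node:
    def __init__(self, e):
        self.e, self.L, self.R, self.h = e, None, None, 0

def height(p):
    return -1 if p is None else p.h

def update(p):
    p.h = 1 + max(height(p.L), height(p.R))

def rotate_right(p):
    """The left child q of p becomes the root of this subtree."""
    q = p.L
    p.L, q.R = q.R, p
    update(p); update(q)
    return q

def rotate_left(p):
    """Mirror image of rotate_right."""
    q = p.R
    p.R, q.L = q.L, p
    update(p); update(q)
    return q

def rebalance(p):
    update(p)
    balance = height(p.L) - height(p.R)
    if balance > 1:                              # left subtree too tall
        if height(p.L.L) < height(p.L.R):        # opposite directions:
            p.L = rotate_left(p.L)               # first half of a double rotation
        return rotate_right(p)
    if balance < -1:                             # right subtree too tall
        if height(p.R.R) < height(p.R.L):
            p.R = rotate_right(p.R)
        return rotate_left(p)
    return p

def insert(x, p):
    """Insert x below p and return the root of the rebalanced subtree."""
    if p is None:
        return Node(x)
    if x < p.e:
        p.L = insert(x, p.L)
    elif x > p.e:
        p.R = insert(x, p.R)
    return rebalance(p)

def inorder(p):
    return [] if p is None else inorder(p.L) + [p.e] + inorder(p.R)

def is_height_balanced(p):
    return p is None or (abs(height(p.L) - height(p.R)) <= 1
                         and is_height_balanced(p.L) and is_height_balanced(p.R))

root = None
for x in [1, 2, 5, 3, 4, 6, 7]:
    root = insert(x, root)
```

Storing heights costs more space per node than a three-valued balance mark, but it keeps the sketch short; an implementation that follows rules I1 to I3 literally would store only the mark.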
Exhibit 21.25: Trace of consecutive insertions and the rebalancings they trigger
Deletion
An element is deleted as in the case of a binary search tree. Starting at the parent of the deleted node, walk
towards the root, passing along the message that the height of the subtree rooted at the current node has decreased
by one. At each node encountered, perform an operation according to the following rules. These rules depend on
the balance condition of the node before the deletion and on the direction from which the current node and its child
were entered.
Rule D1: If the current node has balance condition –, change it to \ or / depending on whether we entered from
the node's left or from its right child, and terminate (the height of the subtree rooted at the current node has not
changed).
Rule D2: If the current node has balance condition / or \ and is entered from the subtree that was previously
taller, change the balance condition to – and continue upward, passing along the message that the subtree rooted at
the current node has been shortened.
Rule D3: If the current node has balance condition / or \ and is entered from the subtree that was previously
shorter, the balance condition is violated at the current node. We distinguish three subcases according to the
balance condition of the other child of the current node (consider also the mirror images of the following
illustrations):
(a) [Illustration: rotation on nodes a and b with subtrees X, Y, Z]
An appropriate rotation restores the balance of the current node without changing the height of the subtree
rooted at this node. Terminate.
(b) [Illustration: rotation on nodes a and b with subtrees X, Y, Z]
A rotation restores the balance of the current node. Continue upward, passing along the message that the
subtree rooted at the current node has been shortened.
(c) [Illustration: double rotation on nodes a, b, c with subtrees W, X, Y, Z]
A double rotation restores the balance of the current node. Continue upward, passing along the message that
the subtree rooted at the current node has been shortened. Similar transformations apply if either X or Y, but not
both, are one level shorter than shown in this figure. If so, the balance conditions of some nodes differ from those
shown, but this has no influence on the total height of the subtree. In contrast to insertion, deletion may require
more than one rotation or double rotation to restore all balances. Since the cost of a rotation or double rotation is
constant, the worst-case cost for rebalancing the tree depends only on the height of the tree, and thus the cost of a
deletion in an AVL-tree is O(log n) in the worst case.
Multiway trees
Nodes in a multiway tree may have a variable number of children. As we are interested in balanced trees, we add
two restrictions. First, we insist that all leaves (the nodes without children) occur at the same depth. Second, we
constrain the number of children of all internal nodes by a lower bound a and an upper bound b. Many varieties of
multiway trees are known; they differ in details, but all are based on similar ideas. For example, (2,3)-trees are
defined by the requirement that all internal nodes have either two or three children. We generalize this concept and
discuss (a,b)-trees.
Definition: Consider a domain X on which a total order ≤ is defined. Let a and b be integers with 2 ≤ a and
2 · a - 1 ≤ b. Let c(N) denote the number of children of node N. An (a,b)-tree is an ordered tree with the following
properties:
  All leaves are at the same level
  2 ≤ c(root) ≤ b
  For all internal nodes N except the root, a ≤ c(N) ≤ b
A node with k children contains k - 1 elements x_1 < x_2 < … < x_{k-1} drawn from X; the subtrees corresponding to
the k children are denoted by T_1, T_2, … , T_k. An (a,b)-tree supports "c(N) search" in the same way that a binary tree
supports binary search, thanks to the following order condition:
  y ≤ x_i for all elements y stored in subtrees T_1, … , T_i
  x_i < z for all elements z stored in subtrees T_{i+1}, … , T_k
Definition: (a,b)-trees with b = 2 · a - 1 are known as B-trees [BM 72, Com 79].
The algorithms we discuss operate on internal nodes, shown in white in Exhibit 21.26, and ignore the leaves,
shown in black. For the purpose of understanding search and update algorithms, leaves can be considered fictitious
entities used only for counting. In practice, however, things are different. The internal nodes merely constitute a
directory to a file that is stored in the leaves. A leaf is typically a physical storage unit, such as a disk block, that
holds all the records whose key values lie between two (adjacent) elements stored in internal nodes.
Exhibit 21.26: Example of a (3,5)-tree
The number n of elements stored in the internal nodes of an (a,b)-tree of height h (that is, with its leaves at depth h) is bounded by
$$2 \cdot a^{h-1} - 1 \le n \le b^{h} - 1$$
and thus
$$\log_b (n + 1) \le h \le \log_a \frac{n + 1}{2} + 1;$$
this shows that the class of (a,b)-trees satisfies the balance condition h = O(log n). We show that this class also
meets the rebalancing condition, namely, that (a,b)-trees support insertion and deletion in time O(log n).
Insertion
Insertion of a new element x begins with a search for x that terminates unsuccessfully at a leaf. Let N be the
parent node of this leaf. If N contained fewer than b - 1 elements before the insertion, insert x into N and
terminate. If N was full, we imagine b elements temporarily squeezed into the overflowing node N. Let m be the
median of these b elements, and use m to split N into two: a left node N_L populated by the ⌊(b - 1) / 2⌋ elements
smaller than m, and a right node N_R populated by the ⌈(b - 1) / 2⌉ elements larger than m. The condition 2 · a - 1 ≤ b
ensures that ⌊(b - 1) / 2⌋ ≥ a - 1, in other words, that each of the two new nodes contains at least a - 1 elements.
The median element m is pushed upward into the parent node, where it serves as a separator between the two
new nodes N_L and N_R that now take the place formerly inhabited by N. Thus the problem of insertion into a node at
a given level is replaced by the same problem one level higher in the tree. The new separator element may be
absorbed in a nonfull parent, but if the parent overflows, the splitting process described is repeated recursively. At
worst, the splitting process propagates to the root of the tree, where a new root that contains only the median
element is created. (a,b)-trees grow at the root, and this is the reason for allowing the root to have as few as two
children.
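The splitting process can be sketched in Python as follows. The representation is an assumption made for this sketch: a node holds a sorted list keys and a list kids with one more entry, the fictitious leaves are represented by None, and an empty tree by None. For even b the median is taken to be the element at index ⌊(b - 1) / 2⌋, so that the right node receives one element more than the left. The function check is an added helper that verifies the (a,b)-tree properties.

```python
import bisect

class Node:
    def __init__(self, keys, kids=None):
        self.keys = keys                                  # sorted elements
        self.kids = kids or [None] * (len(keys) + 1)      # None = fictitious leaf

def insert(root, x, b):
    """Insert x into the tree and return its (possibly new) root."""
    if root is None:
        return Node([x])
    split = _insert(root, x, b)
    if split is None:
        return root
    m, right = split
    return Node([m], [root, right])        # the tree grows at the root

def _insert(node, x, b):
    """Return None, or (median, new right node) if this node had to split."""
    i = bisect.bisect_left(node.keys, x)
    if i < len(node.keys) and node.keys[i] == x:
        return None                        # x is already stored
    if node.kids[i] is None:               # bottom level: the child is a leaf
        node.keys.insert(i, x)
        node.kids.insert(i, None)
    else:
        split = _insert(node.kids[i], x, b)
        if split is None:
            return None
        m, right = split
        node.keys.insert(i, m)             # absorb the separator pushed up
        node.kids.insert(i + 1, right)
    if len(node.keys) <= b - 1:
        return None                        # no overflow
    mid = (b - 1) // 2                     # index of the median of b elements
    m = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.kids[mid + 1:])
    node.keys, node.kids = node.keys[:mid], node.kids[:mid + 1]
    return m, right

def check(node, a, b, is_root=True):
    """Verify the (a,b)-tree properties; return (leaf depth, elements in order)."""
    k = len(node.kids)
    assert k == len(node.keys) + 1 and (2 if is_root else a) <= k <= b
    depths, out = set(), []
    for i, kid in enumerate(node.kids):
        if kid is None:
            depths.add(1)
        else:
            d, sub = check(kid, a, b, False)
            depths.add(d + 1)
            out += sub
        if i < len(node.keys):
            out.append(node.keys[i])
    assert len(depths) == 1                # all leaves at the same level
    return depths.pop(), out

root = None
for x in [(i * 37) % 101 for i in range(101)]:   # a permutation of 0 .. 100
    root = insert(root, x, b=5)
height, elements = check(root, a=3, b=5)
```

Only insertion is sketched; deletion with borrowing and merging needs access to siblings and is not covered here.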
Deletion
Deletion of an element x begins by searching for it. As in the case of binary search trees, deletion is easiest at the
bottom of the tree, at a node of maximal depth whose children are leaves. If x is found at a higher level of the tree,
in a node that has internal nodes as children, x is the separator between two subtrees T_L and T_R. We replace x by
another element z, either the largest element in T_L or the smallest element in T_R, both of which are stored in a node
at the bottom of the tree. After this exchange, the problem is reduced to deleting an element z from a node N at the
deepest level.
If deletion (of x or z) leaves N with at least a - 1 elements, we are done. If not, we try to restore N's occupancy
condition by stealing an element from an adjacent sibling node M. If there is no sibling M that can spare an
element, that is, if M is minimally occupied, M and N are merged into a single node L. L contains the a - 2 elements
of N, the a - 1 elements of M, and the separator between M and N which was stored in their parent node, for a total
of 2 · (a - 1) ≤ b - 1 elements. Since the parent (of the old nodes M and N, and of the new node L) lost an element in
this merger, the parent may underflow. As in the case of insertion, this underflow can propagate to the root and
may cause its deletion. Thus (a,b)-trees grow and shrink at the root.
Both insertion and deletion work along a single path from the root down to a leaf and (possibly) back up. Thus
their time is bounded by O(h), or equivalently, by O(log n): (a,b)-trees can be rebalanced in logarithmic time.
Amortized cost. The performance of (a,b)-trees is better than the worst-case analysis above suggests. It can be
shown that the total cost of any sequence of s insertions and deletions into an initially empty (a,b)-tree is linear in
the length s of the sequence: whereas the worst-case cost of a single operation is O(log n), the amortized cost per
operation is O(1) [Meh 84a]. Amortized cost is a complexity measure that involves both an average and a worst-case
consideration. The average is taken over all operations in a sequence; the worst case is taken over all sequences.
Although any one operation may take time O(log n), we are guaranteed that the total of all s operations in any
sequence of length s can be done in time O(s), as if each single operation were done in time O(1).
Exhibit 21.27: A slightly skewed (3,5)-tree.
Exercise: insertion and deletion in a (3,5)-tree
Starting with the (3,5)-tree shown in Exhibit 21.27, perform the sequence of operations: insert 38, delete 10,
delete 12, delete 50. Draw the tree after each operation.
Solution
Inserting 38 causes a leaf and its parent to split (Exhibit 21.28). Deleting 10 causes underflow, remedied by
borrowing an element from the left sibling (Exhibit 21.29). Deleting 12 causes underflow in both a leaf and its
parent, remedied by merging (Exhibit 21.30). Deleting 50 causes merging at the leaf level and borrowing at the
parent level (Exhibit 21.31).
Exhibit 21.28: Node splits propagate towards the root
Exhibit 21.29: A deletion is absorbed by borrowing
Exhibit 21.30: Another deletion propagates node merges towards the root
Exhibit 21.31: Node merges and borrowing combined
(2,3)-trees are the special case a = 2, b = 3: each node has two or three children. Exhibit 21.32 omits the leaves.
Starting with the tree in state 1 we insert the value 9: the rightmost node at the bottom level overflows and splits,
the median 8 moves up into the parent. The parent also overflows, and the median 6 generates a new root (state 2).
The deletion of 1 is absorbed without any rebalancing (state 3). The deletion of 2 causes a node to underflow,
remedied by stealing an element from a sibling: 2 is replaced by 3 and 3 is replaced by 4 (state 4). The deletion of 3
triggers the merger of the nodes assigned to 3 and 5; this causes an underflow in their parent, which in turn
propagates to the root and results in a tree of reduced height (state 5).
Exhibit 21.32: Tracing insertions and deletions in a (2,3)-tree
As mentioned earlier, multiway trees are particularly useful for managing data on a disk. If each node is
allocated to its own disk block, searching for a record triggers as many disk accesses as there are levels in the tree.
The depth of the tree is minimized if the maximal fan-out b is maximized. We can pack more elements into a node
by shrinking their size. As the records to be stored are normally much larger than their identifying keys, we store
keys only in the internal nodes and store entire records in the leaves (which we had considered to be empty until
now). Thus the internal nodes serve as an index that assigns to a key value the path to the corresponding leaf.
Exercises and programming projects
1. Design and implement a list structure for storing a sparse matrix. Your implementation should provide
procedures for inserting, deleting, changing, and reading matrix elements.
2. Implement a fifo queue by a circular list using only one external pointer f and a sentinel. f always points to
the sentinel and provides access to the head and tail of the queue.
3. Implement a double-ended queue (deque) by a doubly linked list.
4. Binary search trees and sorting: A binary search tree given by the following declarations is used to manage
a set of integers:
  type nptr = ^node;
       node = record L, R: nptr; x: integer end;
  var root: nptr;
The empty tree is represented as root = nil.
(a) Draw the result of inserting the sequence 6, 15, 4, 2, 7, 12, 5, 18 into the empty tree.
(b) Write a procedure smallest(var x: integer); which returns the smallest number stored in the tree, and a procedure removesmallest; which deletes it. If the tree is empty both procedures should call a procedure message('tree is empty');
(c) Write a procedure sort; that sorts the numbers stored in var a: array[1 .. n] of integer; by inserting the numbers into a binary search tree, then writing them back to the array in sorted order as it traverses the tree.
(d) Analyze the asymptotic time complexity of 'sort' in a typical and in the worst case.
(e) Does this approach lead to a sorting algorithm of time complexity Θ(n · log n) in the worst case?
5. Extend the implementation of a dictionary as a binary search tree in the "Binary search trees" section to support the operations 'succ' and 'pred' as defined in chapter 19 in the section "Dictionary".
6. Insertion and deletion in AVL-trees: Starting with an empty AVL-tree, insert 1, 2, 5, 6, 7, 8, 9, 3, 4, in this order. Draw the AVL-tree after each insertion. Now delete all elements in the opposite order of insertion (i.e. in last-in-first-out order). Does the AVL-tree go through the same states as during insertion but in reverse order?
7. Implement an AVL-tree supporting the dictionary operations 'insert', 'delete', 'member', 'pred', and 'succ'.
8. Explain how to find the smallest element in an (a,b)-tree and how to find the predecessor of a given element in an (a,b)-tree.
9. Implement a dictionary as a B-tree.
This book is licensed under a Creative Commons Attribution 3.0 License
22. Address computation
Learning objectives:
hashing
perfect hashing
collision resolution methods: separate chaining, coalesced chaining, open addressing (linear probing and double hashing)
deletions degrade performance of a hash table
performance does not depend on the number of data elements stored but on the load factor of the hash table
randomization: transform unknown distribution into a uniform distribution
extendible hashing uses a radix tree to adapt the address range dynamically to the contents to be stored; deletions do not degrade performance
order-preserving extendible hashing
Concepts and terminology
The term address computation (also hashing, hash coding, scatter storage, or key-to-address transformations) refers to many search techniques that aim to assign an address of a storage cell to any key value x by means of a formula that depends on x only. Assigning an address to x independently of the presence or absence of other key values leads to faster access than is possible with the comparative search techniques discussed in earlier chapters. Although this goal cannot always be achieved, address computation does provide the fastest access possible in many practical situations.
We use the following concepts and terminology (Exhibit 22.1). The home address a of x is obtained by means of a hash function h that maps the key domain X into the address space A [i.e. a = h(x)]. The address range is A = {0, 1, … , m − 1}, where m is the number of storage cells available. The storage cells are represented by an array T[0 .. m − 1], the hash table; T[a] is the cell addressed by a ∈ A. T[h(x)] is the cell where an element with key value x is preferentially stored, but alas, not necessarily.
Exhibit 22.1: The hash function h maps a (typically large) key domain X into a (much smaller) address space A.
Each cell has a capacity of b > 0 elements; b stands for bucket capacity. The number n of elements to be stored is therefore bounded by m · b. Two cases are usefully distinguished, depending on whether the hash table resides on disk or in central memory:
1. Disk or other secondary storage device: Considerations of efficiency suggest that a bucket be identified with a physical unit of transfer, typically a disk block. Such a unit is usually large compared to the size of an element, and thus b > 1.
2. Main memory: Cell size is less important, but the code is simplest if a cell can hold exactly one element (i.e. b = 1).
For simplicity of exposition we assume that b = 1 unless otherwise stated; the generalization to arbitrary b is straightforward.
The key domain X is normally much larger than the number n of elements to be stored and the number m of available cells T[a]. For example, a table used for storing a few thousand identifiers might have as its key domain the set of strings of length at most 10 over the alphabet {'a', 'b', … , 'z', '0', … , '9'}; its cardinality is close to 36^10. Thus in general the function h is many-to-one: Different key values map to the same address.
The content to be stored is a sample from the key domain: It is not under the programmer's control and is usually not even known when the hash function and table size are chosen. Thus we must expect collisions, that is, events where more than b elements to be stored are assigned the same address. Collision resolution methods are designed to handle this case by storing some of the colliding elements elsewhere. The more collisions that occur, the longer the search time. Since the number of collisions is a random event, the search time is a random variable. Hash tables are known for excellent average performance and for terrible worst-case performance, which, one hopes, will never occur.
Address computation techniques support the operations 'find' and 'insert' (and to a lesser extent also 'delete') in expected time O(1). This is a remarkable difference from all other data structures that we have discussed so far, in that the average time complexity does not depend on the number n of elements stored, but on the load factor λ = n / (m · b), or, for the special case b = 1: λ = n / m. Note that 0 ≤ λ ≤ 1.
Before we consider the typical case of a hash table, we illustrate these concepts in two special cases where everything is simple; these represent ideals rarely attainable.
The special case of small key domains
If the number of possible key values is less than or equal to the number of available storage cells, h can map X one-to-one into or onto A. Everything is simple and efficient because collisions never occur. Consider the following example:
X = {'a', 'b', … , 'z'}, A = {0, … , 25}
h(x) = ord(x) − ord('a'); that is,
h('a') = 0, h('b') = 1, h('c') = 2, … , h('z') = 25.
Since h is one-to-one, each key value x is implied by its address h(x). Thus we need not store the key values explicitly, as a single bit (present / absent) suffices:
var T: array[0 .. 25] of boolean;
function member(x): boolean;
begin return(T[h(x)]) end;
procedure insert(x);
begin T[h(x)] := true end;
procedure delete(x);
begin T[h(x)] := false end;
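The same bit table can be tried out directly. The following is a minimal Python sketch of the three procedures, assuming lowercase letters as keys; the Python rendering and the small demonstration at the end are our own additions, not part of the original program.

```python
# Direct-address table for the key domain 'a' .. 'z' (sketch, not the book's code).
T = [False] * 26              # one presence bit per possible key value

def h(x: str) -> int:
    """Home address of a lowercase letter: 'a' -> 0, ..., 'z' -> 25."""
    return ord(x) - ord('a')

def member(x: str) -> bool:
    return T[h(x)]

def insert(x: str) -> None:
    T[h(x)] = True

def delete(x: str) -> None:
    T[h(x)] = False

# Small demonstration: the key itself is never stored, only its presence bit.
insert('c')
insert('z')
delete('c')
print(member('c'), member('z'))   # False True
```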
The idea of collision-free address computation can be extended to large key domains through a combination of address computation and list processing techniques, as we will see in the chapter "Metric data structures".
The special case of perfect hashing: table contents known a priori
Certain common applications require storing a set of elements that never changes. The set of reserved words of a programming language is an example; when the lexical analyzer of a compiler extracts an identifier, the first issue to be determined is whether this is a reserved word such as 'begin' or 'while', or whether it is programmer defined. The special case where the table contents are known a priori, and no insertions or deletions occur, is handled more efficiently by special-purpose data structures than by a general dictionary.
If the elements x1, x2, … , xn to be stored are known before the hash table is designed, the underlying key domain is not as important as the set of actually occurring key values. We can usually find a table size m, not much larger than the number n of elements to be stored, and an easily evaluated hash function h that assigns to each xi a unique address from the address space {0, … , m − 1}. It takes some trial and error to find such a perfect hash function h for a given set of elements, but the benefit of avoiding collisions is well worth the effort: the code that implements a collision-free hash table is simple and fast. A perfect hash function works for a static table only: a single insertion, after h has been chosen, is likely to cause a collision and destroy the simplicity of the concept and efficiency of the implementation. Perfect hash functions should be generated automatically by a program.
The following unrealistically small example illustrates typical approaches to designing a perfect hash table. The task gets harder as the number m of available storage cells is reduced toward the minimum possible, that is, the number n of elements to be stored.
Example
In designing a perfect hash table for the elements 17, 20, 24, 38, and 51, we look for arithmetic patterns. These are most easily detected by considering the binary representations of the numbers to be stored:
x    5 4 3 2 1 0   bit position
17   0 1 0 0 0 1
20   0 1 0 1 0 0
24   0 1 1 0 0 0
38   1 0 0 1 1 0
51   1 1 0 0 1 1
We observe that the least significant three bits identify each element uniquely. Therefore, the hash function h(x) = x mod 8 maps these five elements collision-free into the address space A = {0, … , 6}, with m = 7 and two empty cells. An attempt to further economize space leads us to observe that the bits in positions 1, 2, and 3, with weights 2, 4, and 8 in the binary number representation, also identify each element uniquely, while ranging over the address space of minimal size A = {0, … , 4}. The function h(x) = (x div 2) mod 8 extracts these three bits and assigns the following addresses:
X:  17  20  24  38  51
A:   0   2   4   3   1
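Both candidate functions are easy to check mechanically. The following Python sketch, with helper names h1, h2, and is_perfect chosen by us for illustration, computes the addresses assigned to the five elements and confirms that no two of them collide.

```python
keys = [17, 20, 24, 38, 51]

def h1(x: int) -> int:
    """Least significant three bits."""
    return x % 8

def h2(x: int) -> int:
    """Bits in positions 1, 2, and 3 (x div 2, then mod 8)."""
    return (x // 2) % 8

def is_perfect(h, keys) -> bool:
    """True if h assigns a distinct address to every key."""
    return len({h(x) for x in keys}) == len(keys)

print([h1(x) for x in keys], is_perfect(h1, keys))   # [1, 4, 0, 6, 3] True
print([h2(x) for x in keys], is_perfect(h2, keys))   # [0, 2, 4, 3, 1] True
```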
A perfect hash table has to store each element explicitly, not just a bit (present/absent). In the example above, the elements 0, 1, 16, 17, 32, 33, … all map into address 0, but only 17 is present in the table. The access function 'member(x)' is implemented as a single statement:
return ((h(x) ≤ 4) cand (T[h(x)] = x));
The boolean operator 'cand' used here is understood to be the conditional and: Evaluation of the expression proceeds from left to right and stops as soon as its value is determined. In our example, h(x) > 4 suffices to assign 'false' to the expression (h(x) ≤ 4) and (T[h(x)] = x). Thus the 'cand' operator guarantees that the table declared as:
var T: array[0 .. 4] of element;
is accessed within its index bounds.
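Python's 'and' is a conditional and in exactly this sense, so the access function carries over almost literally. The sketch below assumes the five-cell table of the example and the hash function h(x) = (x div 2) mod 8; the way the table is filled is our own scaffolding.

```python
M = 5                                  # cells 0 .. 4

def h(x: int) -> int:
    return (x // 2) % 8                # may yield 5, 6, or 7 for keys not in the table

T = [None] * M
for x in (17, 20, 24, 38, 51):         # static table contents, known a priori
    T[h(x)] = x

def member(x: int) -> bool:
    a = h(x)
    # `and` stops at the first false operand, so T[a] is only read when a <= 4.
    return a <= 4 and T[a] == x

print(member(17), member(16), member(14))   # True False False
```

Here member(16) fails on the second operand (address 0 holds 17), whereas member(14) fails on the first (h(14) = 7) without ever touching the table.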
For table contents of realistic size it is impractical to construct a perfect hash function manually; we need a program to search exhaustively through the large space of functions. The more slack m − n we allow, the denser is the population of perfect functions and the quicker we will find one. [Meh 84a] presents analytical results on the complexity of finding perfect hash functions.
Exercise: perfect hash tables
Design several perfect hash tables for the content {3, 13, 57, 71, 82, 93}.
Solution
Designing a perfect hash table is like answering a question of the type: What is the next element in the sequence 1, 4, 9, … ? There are infinitely many answers, but some are more elegant than others. Consider:
h                                3   13   57   71   82   93   Address range
(x div 3) mod 7                  1    4    5    2    6    3   [1 .. 6]
x mod 13                         3    0    5    6    4    2   [0 .. 6]
(x div 4) mod 8                  0    3    6    1    4    7   [0 .. 7]
if x = 71 then 4 else x mod 7    3    6    1    4    5    2   [1 .. 6]
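The trial and error behind such a solution can be delegated to a program. The following Python sketch searches one small, hypothetical family of functions, h(x) = (x div s) mod m, trying the tightest table sizes first. The function name find_perfect_hash and the bounds max_shift and max_slack are assumptions made for this illustration; a realistic generator would explore a much larger space of functions.

```python
def find_perfect_hash(keys, max_shift=16, max_slack=8):
    """Search h(x) = (x // s) % m for a collision-free member.

    Table sizes m = n, n + 1, ... are tried in increasing order, so the
    first hit uses the least slack m - n within this family.
    Returns (s, m), or None if the family contains no perfect function.
    """
    n = len(keys)
    for m in range(n, n + max_slack + 1):
        for s in range(1, max_shift + 1):
            if len({(x // s) % m for x in keys}) == n:
                return s, m
    return None

keys = [3, 13, 57, 71, 82, 93]
s, m = find_perfect_hash(keys)
print(s, m, [(x // s) % m for x in keys])
```

For this content no member of the family works with m = 6, and the search stops at s = 3, m = 7, which is the first function listed in the table above.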
Conventional hash tables: collision resolution
In contrast to the special cases discussed, most applications of address computation present the data structure designer with greater uncertainties and less favorable conditions. Typically, the underlying key domain is much larger than the available address range, and not much is known about the elements to be stored. We may have an upper bound on n, and we may know the probability distribution that governs the random sample of elements to be stored. In setting up a customer list for a local business, for example, the number of customers may be bounded by the population of the town, and the distribution of last names can be obtained from the telephone directory: many names will start with H and S, hardly any with Q and Y. On the basis of such information, but in ignorance of the actual table contents to be stored, we must choose the size m of the hash table and design the hash function h that maps the key domain X into the address space A = {0, … , m − 1}. We will then have to live with the consequences of these decisions, at least until we decide to rehash: that is, resize the table, redesign the hash function, and reinsert all the elements that we have stored so far.
Later sections present some pragmatic advice on the choice of h; for now, let us assume that an appropriate hash function is available. Regardless of how smart a hash function we have designed, collisions (more than b elements share the same home address of a bucket of capacity b) are inevitable in practice. Thus hashing requires techniques for handling collisions. We present the three major collision resolution techniques in use: separate chaining, coalesced chaining, and open addressing. The two techniques called chaining call upon list processing techniques to organize overflowing elements. Separate chaining is used when these lists live in an overflow area distinct from the hash table proper; coalesced chaining when the lists live in unused parts of the table. Open addressing uses address computation to organize overflowing elements. Each of these three techniques comes in different variations; we illustrate one typical choice.
Separate chaining
The memory allocated to the table is split into a primary and an overflow area. Any overflowing cell or bucket in the primary area is the head of a list, called the overflow chain, that holds all elements that overflow from that bucket. Exhibit 22.2 shows six elements inserted in the order x1, x2, … . The first arrival resides at its home address; later ones get appended to the overflow chain.
Exhibit 22.2: Separate chaining handles collisions in a separate overflow area.
Separate chaining is easy to understand: insert, delete, and search operations are simple. In contrast to other collision handling techniques, this hybrid between address computation and list processing has two major advantages: (1) deletions do not degrade the performance of the hash table, and (2) regardless of the number m of home addresses, the hash table will not overflow until the entire memory is exhausted. The size m of the table has a critical influence on the performance. If m ≪ n, overflow chains are long and we have essentially a list processing technique that does not support direct access. If m ≫ n, overflow chains are short but we waste space in the table. Even for the practical choice m ≈ n, separate chaining has some disadvantages:
Two different accessing techniques are required.
Pointers take up space; this may be a significant overhead for small elements.
Memory is partitioned into two separate areas that do not share space: If the overflow area is full, the entire table is full, even if there is still space in the array of home cells. This consideration leads to the next technique.
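To make the division of labor between the primary area and the overflow chains concrete, here is a minimal Python sketch for bucket capacity b = 1. The class name SeparateChaining, the hash function x mod m, and the use of Python lists in place of linked overflow cells are all assumptions made for illustration; keys are taken to be non-negative integers.

```python
class SeparateChaining:
    """Hash table with a primary area of m home cells and one overflow chain per cell."""

    def __init__(self, m: int):
        self.m = m
        self.primary = [None] * m                 # home cells, capacity b = 1
        self.overflow = [[] for _ in range(m)]    # stands in for the overflow area

    def _h(self, x: int) -> int:
        return x % self.m                         # placeholder hash function

    def find(self, x: int) -> bool:
        a = self._h(x)
        return self.primary[a] == x or x in self.overflow[a]

    def insert(self, x: int) -> None:
        a = self._h(x)
        if self.find(x):
            return                                # already present
        if self.primary[a] is None:
            self.primary[a] = x                   # first arrival stays at home
        else:
            self.overflow[a].append(x)            # later arrivals go to the chain

    def delete(self, x: int) -> None:
        a = self._h(x)
        if self.primary[a] == x:
            chain = self.overflow[a]
            # Promote the head of the chain, so the home cell stays in use.
            self.primary[a] = chain.pop(0) if chain else None
        elif x in self.overflow[a]:
            self.overflow[a].remove(x)

t = SeparateChaining(7)
for x in (10, 17, 24, 3):                         # all four collide on address 3
    t.insert(x)
t.delete(10)
print(t.primary[3], t.overflow[3])                # 17 [24, 3]
```

Deleting an element simply shortens a chain, which is why deletions do not degrade performance in this scheme.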
Coalesced chaining
The chains that emanate from overflowing buckets are stored in the empty space in the hash table rather than in a separate overflow area (Exhibit 22.3). This has the advantage that all available space is utilized fully (except for the overhead of the pointers). However, managing the space shared between the two accessing techniques gets complicated.
Exhibit 22.3: Coalesced chaining handles collisions by building lists that share memory with the hash table.
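A minimal Python sketch of one common variant of coalesced chaining follows, without deletions. Each cell carries a link to the next cell of its chain, and unused cells are located by scanning downward from the top of the table. The class name, the link encoding (−1 for "end of chain"), the free-cell strategy, and the hash function x mod m are assumptions of this sketch rather than details given in the text.

```python
class CoalescedChaining:
    """Chains are threaded through unused cells of the hash table itself."""

    def __init__(self, m: int):
        self.m = m
        self.key = [None] * m
        self.link = [-1] * m          # -1 marks the end of a chain
        self.free = m - 1             # candidates for unused cells, scanned downward

    def _h(self, x: int) -> int:
        return x % self.m             # placeholder hash function

    def find(self, x: int) -> int:
        """Return the address of x, or -1 if x is not in the table."""
        a = self._h(x)
        while a != -1 and self.key[a] is not None:
            if self.key[a] == x:
                return a
            a = self.link[a]
        return -1

    def insert(self, x: int) -> int:
        a = self._h(x)
        if self.key[a] is None:       # home cell is free
            self.key[a] = x
            return a
        while True:                   # walk to the end of the chain through the home cell
            if self.key[a] == x:
                return a
            if self.link[a] == -1:
                break
            a = self.link[a]
        while self.free >= 0 and self.key[self.free] is not None:
            self.free -= 1            # skip cells that are already in use
        if self.free < 0:
            raise OverflowError("table is full")
        self.key[self.free] = x
        self.link[a] = self.free      # append the new cell to the chain
        return self.free

t = CoalescedChaining(7)
print(t.insert(10), t.insert(17), t.insert(6))   # 3 6 5
```

In the demonstration, 17 collides with 10 at address 3 and is placed in cell 6. When 6 arrives, its own home cell is taken by an element of a different chain, so the two chains coalesce: 3 → 6 → 5.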
The next technique has similar advantages (in addition, it incurs no overhead for pointers) and disadvantages; all things considered, it is probably the best collision resolution technique.
Open addressing
Assign to each element x ∈ X a probe sequence a0 = h(x), a1, a2, … of addresses that fills the entire address range A. The intention is to store x preferentially at a0, but if T[a0] is occupied then at a1, and so on, until the first empty cell is encountered along the probe sequence. The occupied cells along the probe sequence are called the collision path of x; note that the collision path is a prefix of the probe sequence. If we enforce the invariant:
If x is in the table at T[a] and if i precedes a in the probe sequence for x, then T[i] is occupied.
the following fast and simple loop that travels along the collision path can be used to search for x:
a := h(x);
while T[a] ≠ x and T[a] ≠ empty do
  a := (next address in probe sequence);
Let us work out the details so that this loop terminates correctly and the code is as concise and fast as we can make it.
The probe sequence is defined by formulas in the program (an example of an implicit data structure) rather than by pointers in the data as is the case in coalesced chaining.
Example: linear probing
a_{i+1} = (a_i + 1) mod m is the simplest possible formula. Its only disadvantage is a phenomenon called clustering. Clustering arises when the collision paths of many elements in the table overlap to a large extent, as is likely to happen in linear probing. Once elements have collided, linear probing will store them in consecutive cells. All elements that hash into this block of contiguous occupied cells travel along the same collision path, thus lengthening this block; this in turn increases the probability that future elements will hash into this block. Once this positive feedback loop gets started, the cluster keeps growing.
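A few insertions suffice to watch a cluster form. The Python sketch below uses a table of m = 11 cells and the hash function x mod m, both chosen by us for illustration; it assumes distinct keys and a table that is never completely full.

```python
EMPTY = None

def insert_linear(T, x):
    """Store x in the first empty cell along the probe sequence a, a+1, a+2, ... (mod m)."""
    m = len(T)
    a = x % m                      # home address
    while T[a] is not EMPTY:       # terminates as long as one cell is empty
        a = (a + 1) % m
    T[a] = x
    return a

T = [EMPTY] * 11
print([insert_linear(T, x) for x in (0, 11, 22, 1, 12)])   # [0, 1, 2, 3, 4]
```

The keys 0, 11, and 22 share home address 0 and fill cells 0 to 2. The key 1 has a different home address, yet it lands inside the block, must travel to its end, and lengthens it; the same happens to 12.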
Double hashing is a special type of open addressing designed to alleviate the clustering problem by letting different elements travel with steps of different size. The probe sequence is defined by the formulas
a0 = h(x), δ = g(x) > 0, a_{i+1} = (a_i + δ) mod m, m prime
g is a second hash function that maps the key space X into [1 .. m − 1].
Two important details must be solved:
The probe sequence of each element must span the entire address range A. This is achieved if m is relatively prime to every step size δ, and the easiest way to guarantee this condition is to choose m prime.
The termination condition of the search loop above is: T[a] = x or T[a] = empty. An unsuccessful search (x not in the table) can terminate only if an address a is generated with T[a] = empty. We have already insisted that each probe sequence generates all addresses in A. In addition, we must guarantee that the table contains at least one empty cell at all times; this serves as a sentinel to terminate the search loop.
The following declarations and procedures implement double hashing. We assume that the comparison operators = and ≠ are defined on X, and that X contains a special value 'empty', which differs from all values to be stored in the table. For example, a string of blanks might denote 'empty' in a table of identifiers. We choose to identify an unsuccessful search by simply returning the address of an empty cell.
const m = … ; { size of hash table - must be prime }
      empty = … ;
type key = … ; addr = 0 .. m - 1; step = 1 .. m - 1;
var T: array[addr] of key;
    n: integer; { number of elements currently stored in T }
function h(x: key): addr; { hash function for home address }
function g(x: key): step; { hash function for step }
procedure init;
var a: addr;
begin
  n := 0;
  for a := 0 to m - 1 do T[a] := empty
end;
function find(x: key): addr;
var a: addr; d: step;
begin
  a := h(x); d := g(x);
  while (T[a] ≠ x) and (T[a] ≠ empty) do a := (a + d) mod m;
  return(a)
end;
function insert(x: key): addr;
var a: addr; d: step;
begin
  a := h(x); d := g(x);
  while T[a] ≠ empty do begin
    if T[a] = x then return(a);
    a := (a + d) mod m
  end;
  { keep at least one cell empty as a sentinel for the search loop }
  if n < m - 1 then begin n := n + 1; T[a] := x end
  else err-msg('table is full');
  return(a)
end;
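For readers who want to run the procedures, here is a minimal Python sketch of the same scheme. The table size M = 11, integer keys, and the two hash functions h(x) = x mod M and g(x) = 1 + x mod (M − 1) are assumptions chosen for the demonstration; the program above leaves all of them open. Unlike the original, this sketch raises an exception instead of calling an error-message procedure.

```python
EMPTY = None
M = 11                         # table size, must be prime

T = [EMPTY] * M
n = 0                          # number of elements currently stored in T

def h(x: int) -> int:          # home address in 0 .. M-1
    return x % M

def g(x: int) -> int:          # step in 1 .. M-1, never 0
    return 1 + x % (M - 1)

def find(x: int) -> int:
    """Return the address of x, or of the empty cell that ends its collision path."""
    a, d = h(x), g(x)
    while T[a] != x and T[a] is not EMPTY:
        a = (a + d) % M
    return a

def insert(x: int) -> int:
    global n
    a = find(x)
    if T[a] is EMPTY:
        if n >= M - 1:         # keep one empty cell as a sentinel for find
            raise OverflowError("table is full")
        T[a] = x
        n += 1
    return a

print(insert(3), insert(14), insert(25))   # 3 8 9
```

All three keys have home address 3, but their step sizes differ (4, 5, and 6), so after the first collision they scatter instead of forming a contiguous block.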
Deletion of elements creates problems, as is the case in many types of hash tables. An element to be deleted cannot simply be replaced by 'empty', or else it might break the collision paths of other elements still in the table; recall the basic invariant on which the correctness of open addressing is based. The idea of rearranging elements in the table so as to refill a cell that was emptied but needs to remain full is quickly abandoned as too complicated: if deletions are numerous, the programmer ought to choose a data structure that fully supports deletions, such as balanced trees implemented as list structures. A limited number of deletions can be accommodated in an open address hash table by using the following technique.
At any time, a cell is in one of three states:
empty (was never occupied, the initial state of all cells)
occupied (currently)
deleted (used to be occupied but is currently free)
A cell in state 'empty' terminates the find loop; a cell in state 'empty' or in state 'deleted' terminates the insert loop. The state diagram shown in Exhibit 22.4 describes the transitions possible in the lifetime of a cell. Deletions degrade the performance of a hash table, because a cell, once occupied, never returns to the virgin state 'empty' which alone terminates an unsuccessful find. Even if an equal number of insertions and deletions keeps a hash table at a low load factor λ, unsuccessful finds will ultimately scan the entire table, as all cells drift into one of the states 'occupied' or 'deleted'. Before this occurs, the table ought to be rehashed; that is, the contents are inserted into a new, initially empty table.
Exhibit 22.4: This state diagram describes possible life cycles of a cell: Once occupied, a cell will never again be as useful as an empty cell.
Exercise: hash table with deletions
Modify the program above to implement double hashing with deletions.
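One possible shape of such a modification, sketched in Python rather than in the notation of the program above, is shown below. The sentinels EMPTY and DELETED, the table size M = 11, and the hash functions are our own assumptions. The sketch also makes two design choices that the text does not prescribe: the probe loops are bounded by M, so that a search still terminates once every cell is 'occupied' or 'deleted', and insert first searches for x so that an element sitting beyond a deleted cell is not stored twice.

```python
EMPTY, DELETED = object(), object()    # two markers that differ from every key
M = 11                                 # table size, must be prime
T = [EMPTY] * M

def h(x: int) -> int:
    return x % M

def g(x: int) -> int:
    return 1 + x % (M - 1)

def find(x: int) -> int:
    """Return the address of x, or -1. Only an EMPTY cell ends the search early."""
    a, d = h(x), g(x)
    for _ in range(M):                 # at most M probes, even with no EMPTY cell left
        if T[a] is EMPTY:
            return -1
        if T[a] == x:
            return a
        a = (a + d) % M                # DELETED cells are passed over
    return -1

def insert(x: int) -> int:
    a = find(x)
    if a != -1:
        return a                       # already present
    a, d = h(x), g(x)
    for _ in range(M):
        if T[a] is EMPTY or T[a] is DELETED:
            T[a] = x                   # a deleted cell may be reused
            return a
        a = (a + d) % M
    raise OverflowError("table is full")

def delete(x: int) -> None:
    a = find(x)
    if a != -1:
        T[a] = DELETED                 # never back to EMPTY

for x in (3, 14, 25):                  # all three have home address 3
    insert(x)
delete(3)
print(find(14), find(3), insert(36))   # 8 -1 3
```

After 3 is deleted, 14 is still found because the search passes over the deleted cell; the newly inserted 36 then reuses that cell.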
Choice of hash function: randomization
In conventional terminology, hashing is based on the concept of randomization. The purpose of randomizing is to transform an unknown distribution over the key domain X into a uniform distribution, and to turn consecutive samples that may be dependent into independent samples. This task appears to call for magic, and indeed, there is little or no mathematics that applies to the construction of hash functions; but there are commonsense observations worth remembering. These observations are primarily "don'ts". They stem from properties that sets of elements we wish to store frequently possess, and thus are based on some knowledge about the populations to be stored. If we assumed strictly nothing about these populations, there would be little to say about hash functions: an order-preserving proportional mapping of X into A would be as good as any other function. But in practice it is not, as the following examples show.
1. A Fortran compiler might use a hash table to store the set of identifiers it encounters in a program being compiled. The rules of the language and human habits conspire to make this set a highly biased sample from the set of legal Fortran identifiers. Example: Integer variables begin with I, J, K, L, M, N; this convention is likely to generate a cluster of identifiers that begin with one of these letters. Example: Successive identifiers encountered cannot be considered independent samples: If X and Y have occurred, there is a higher chance for Z to follow than for WRKHG. Example: Frequently, we see sequences of identifiers or statement numbers whose character codes form arithmetic progressions, such as A1, A2, A3, … or 10, 20, 30, … .
2. All file systems require or encourage the use of naming conventions, so that most file names begin or end with one of just a few prefixes or suffixes, such as ***.SYS, ***.BAK, ***.OBJ. An individual user, or a user community, is likely to generate additional conventions, so that most file names might begin, for example, with the initials of the names of the people involved. The files that store this text, for example, are structured according to 'part' and 'chapter', so we are currently in file P5 C22. In some directories, file names might be sorted alphabetically, so if they are inserted into a table in order, we process a monotonic sequence.
The purpose of a hash function is to break up all regularities that might be present in the set of elements to be stored. This is most reliably achieved by "hashing" the elements, a word for which the dictionary offers the following explanations: (1) from the French hache, "battle-ax"; (2) to chop into small pieces; (3) to confuse, to muddle. Thus, to approximate the elusive goal of randomization, a hash function destroys patterns, including, unfortunately, the order < defined on X. Hashing typically proceeds in two steps.
1. Convert the element x into a number #(x). In most cases #(x) is an integer; occasionally, it is a real number 0 ≤ #(x) < 1. Whenever possible, this conversion of x into #(x) involves no action at all: The representation of x, whatever type x may be, is reinterpreted as the representation of the number #(x). When x is a variable-length item, for example a string, the representation of x is partitioned into pieces of suitable length that are "folded" on top of each other. For example, the four-letter word x = 'hash' is encoded one letter per byte using the 7-bit ASCII code and a leading 0 as 01101000 01100001 01110011 01101000. It may be folded to form a 16-bit integer by exclusive-or of the leading pair of bytes with the trailing pair of bytes:
    0110100001100001
xor 0111001101101000
    0001101100001001 which represents #(x) = 27 · 2^8 + 9 = 6921.
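The folding step is easily reproduced. The following Python sketch folds an ASCII string into a 16-bit integer by exclusive-or of successive byte pairs; the function name fold16 and the zero padding of odd-length strings are assumptions of ours, since the text only treats the four-letter example.

```python
def fold16(s: str) -> int:
    """XOR-fold the bytes of an ASCII string into a 16-bit integer, two bytes at a time."""
    data = s.encode("ascii")
    if len(data) % 2:                   # assumption: pad odd-length input with a zero byte
        data += b"\x00"
    value = 0
    for i in range(0, len(data), 2):
        value ^= (data[i] << 8) | data[i + 1]   # one 16-bit piece per byte pair
    return value

print(fold16("hash"))                   # 6921
print(format(fold16("hash"), "016b"))   # 0001101100001001
```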
Such folding, by itself, is not hashing. Patterns in the representation of elements easily survive folding. For example, the leading 0 we have used to pad the 7-bit ASCII code to an 8-bit byte remains a zero regardless of x. If we had padded with a trailing zero, all #(x) would be even. Because #(x) often has the same representation as x, or a closely related one, we drop #() and use x slightly ambiguously to denote both the original element and its interpretation as a number.
2. Scramble x [more precisely, #(x)] to obtain h(x). Any scrambling technique is a sensible try, as long as it avoids fairly obvious pitfalls. Rules of thumb:
Each bit of an address h(x) should depend on all bits of the key value x. In particular, don't ignore any part of x in computing h(x). Thus h(x) = x mod 2^13 is suspect, as only the least significant 13 bits of x affect h(x).
Make sure that arithmetic progressions such as Ch1, Ch2, Ch3, … get broken up rather than being mapped into arithmetic progressions. Thus h(x) = x mod k, where k is significantly smaller than the table size m, is suspect.
Avoid any function that cannot produce a uniform distribution of addresses. Thus h(x) = x^2 is suspect; if x is uniformly distributed in [0, 1], the distribution of x^2 is highly skewed.
A hash function must be fast and simple. All of the desiderata above are obtained by a hash function of the type:
h(x) = x mod m
where m is the table size and a prime number, and x is the key value interpreted as an integer.
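The effect of a prime modulus on a regular key set is easy to observe. In the Python sketch below, the key set (an arithmetic progression with step 8) and the two table sizes 16 and 17 are illustrative choices of ours.

```python
def addresses(keys, m):
    """Sorted list of the distinct addresses x mod m that the keys occupy."""
    return sorted({x % m for x in keys})

keys = range(0, 160, 8)              # 0, 8, 16, ..., 152: twenty keys

print(addresses(keys, 16))           # [0, 8]
print(len(addresses(keys, 17)))      # 17
```

With m = 16 the twenty keys pile up on just two addresses, because the step 8 and the table size share a common factor. With the prime m = 17 the same keys reach every address of the table.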
No hash function is guaranteed to avoid the worst case of hashing, namely, that all elements to be stored collide on one address (this happens here if we store only multiples of the prime m). Thus a hash function must be judged in relation to the data it is being asked to store, and usually this is possible only after one has begun using it. Hashing provides a perfect example for the injunction that the programmer must think about the data, analyze its statistical properties, and adapt the program to the data if necessary.
Performance analysis
We analyze open addressing without deletions assuming that each address a_i of a probe sequence is chosen independently of all other addresses from a uniform distribution over A. This assumption is reasonable for double hashing and leads to the conclusion that the average cost for a search operation in a hash table is O(1) if we consider the load factor λ to be constant. We analyze the average number of probes executed as a function of λ in two cases: U(λ) for an unsuccessful search, and S(λ) for a successful search.
Let p_i denote the probability of using exactly i probes in an unsuccessful search. This event occurs if the first i − 1 probes hit occupied cells, and the i-th probe hits an empty cell: $p_i = \lambda^{i-1} (1 - \lambda)$. Let q_i denote the probability that at least i probes are used in an unsuccessful search; this occurs if the first i − 1 inspected cells are occupied: $q_i = \lambda^{i-1}$. q_i can also be expressed as the sum of the probabilities that we probe exactly j cells, for j running from i to m. Thus we obtain
\[ U(\lambda) = \sum_{i=1}^{m} i \, p_i = \sum_{i=1}^{m} \sum_{j=i}^{m} p_j = \sum_{i=1}^{m} q_i = \sum_{i=1}^{m} \lambda^{i-1} \approx \frac{1}{1 - \lambda}. \]
The number of probes executed in a successful search for an element x equals the number of probes in an unsuccessful search for the same element x before it is inserted into the hash table. [Note: This holds only when elements are never relocated or deleted.] Thus the average number of probes needed to search for the i-th element inserted into the hash table is U((i − 1) / m), and S(λ) can be computed as the average of U(μ), for μ increasing in discrete steps from 0 to λ. It is a reasonable approximation to let μ vary continuously in the range from 0 to λ:
\[ S(\lambda) \approx \frac{1}{\lambda} \int_0^{\lambda} U(\mu) \, d\mu = \frac{1}{\lambda} \int_0^{\lambda} \frac{d\mu}{1 - \mu} = \frac{1}{\lambda} \ln \frac{1}{1 - \lambda}. \]
Exhibit 22.5 suggests that a reasonable operating range for a hash table keeps the load factor λ between 0.25 and 0.75. If λ is much smaller, we waste space; if it is larger than 75 per cent, we get into a domain where the performance degrades rapidly. Note: If all searches are successful, a hash table performs well even if loaded up to 95 per cent; unsuccessful searching is the killer!
Table 22.1: The average number of probes per search grows rapidly as the load factor approaches 1.
λ       0.25   0.5   0.75   0.9    0.95   0.99
U(λ)    1.3    2.0   4.0    10.0   20.0   100.0
S(λ)    1.2    1.4   1.8    2.6    3.2    4.7
Exhibit 22.5: The average number of probes per search grows rapidly as the load factor approaches 1.
Thus the hash table designer should be able to estimate n within a factor of 2, not an easy task. An incorrect guess may waste memory or cause poor performance, even table overflow followed by a crash. If the programmer becomes aware that the load factor lies outside this range, she may rehash: change the size of the table, change the hash function, and reinsert all elements previously stored.
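The entries of Table 22.1 can be recomputed in a few lines. The Python sketch below assumes the usual closed forms for probing with independent, uniformly distributed addresses, U(λ) = 1 / (1 − λ) and S(λ) = (1 / λ) · ln(1 / (1 − λ)); the function names U and S follow the notation of this section.

```python
import math

def U(lam: float) -> float:
    """Expected probes in an unsuccessful search at load factor lam (0 <= lam < 1)."""
    return 1.0 / (1.0 - lam)

def S(lam: float) -> float:
    """Expected probes in a successful search at load factor lam (0 < lam < 1)."""
    return math.log(1.0 / (1.0 - lam)) / lam

for lam in (0.25, 0.5, 0.75, 0.9, 0.95, 0.99):
    print(f"{lam:5.2f}   U = {U(lam):6.1f}   S = {S(lam):4.1f}")
```

Rounded to one decimal, the printed values agree with the table: the cost of an unsuccessful search explodes as λ approaches 1, while that of a successful search grows only logarithmically.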
Extendible hashing
In contrast to standard hashing methods, extendible forms of hashing allow for the dynamic extension or shrinkage of the address range into which the hash function maps the keys. This has two major advantages: (1) Memory is allocated only as needed (it is unnecessary to determine the size of the address range a priori), and (2) deletion of elements does not degrade performance. As the address range changes, the hash function is changed in such a way that only a few elements are assigned a new address and need to be stored in a new bucket. The idea that makes this possible is to map the keys into a very large address space, of which only a portion is active at any given time.
Various extendible hashing methods differ in the way they represent and manage a smaller active address range of variable size that is a subrange of a larger virtual address range. In the following we describe the method of extendible hashing that is especially well suited for storing data on secondary storage devices; in this case an address points to a physical block of secondary storage that can contain more than one element. An address is a bit string of maximum length k; however, at any time only a prefix of d bits is used. If all bit strings of length k are represented by a so-called radix tree of height k, the active part of all bit strings is obtained by using only the upper d levels of the tree (i.e. by cutting the tree at level d). Exhibit 22.6 shows an example for d = 3.
Exhibit 22.6: Address space organized as a binary radix tree.
The radi) tree sho5n in ")hibit 22.> H5ithout the nodes that have been clippedI describes an active address
range 5ith addresses a00+ 010+ 011+ 1b that are considered as bit strings or binary numbers. To each active node
5ith address s there corresponds a bucket B that can store b records. $% a ne5 element has to be inserted into a %ull
bucket B+ then B is split& $nstead o% B 5e %ind t5o t5in buckets B0 and B1 5hich have a one bit longer address than B+
and the elements stored in B are distributed among B0 and B1 according to this bit. The ne5 radi) tree no5 has to
point to the t5o data buckets B0 and B1 instead o% BL that is+ the active address range must be e)tended locally Hby
moving the broken line in ")hibit 22.>I. $% the block 5ith address 00 over%lo5s+ t5o ne5 t5in blocks 5ith addresses
000 and 001 5ill be created 5hich are represented by the corresponding nodes in the tree. $% the over%lo5ing bucket
B has depth d+ then d is incremented by 1 and the radi) tree gro5s by one level.
$n e)tendible hashing the clipped radi) tree is represented by a directory that is implemented by an array. Let d
be the ma)imum number o% bits that are used in one o% the bit strings %or %orming an addressL in the e)ample above+
d V 3. Then the directory consists o% 2
d
entries. "ach entry in this directory corresponds to an address and points to
a physical data bucket 5hich contains all elements that have been assigned this address by the hash %unction h. The
directory %or the radi) tree in ")hibit 22.> looks as sho5n in ")hibit 22.7.
")hibit 22.7& The active address range o% the tree in ")hibit 22.> implemented as an array.
The bucket 5ith address 010 corresponds to a node on level 3 o% the radi) tree+ and there is only one entry in the
directory corresponding to this bucket. $% this bucket over%lo5s+ the directory and data buckets are reorgani6ed as
sho5n in ")hibit 22.D. T5o t5in buckets that 3ointly contain %e5er than b elements are merged into a single bucket.
This keeps the average bucket occupancy at a high 70 per cent even in the presence o% deletions+ as probabilistic
analysis predicts and simulation results con%irm. Bucket merging may lead to halving the directory. A %ormerly
large %ile that shrinks to a much smaller si6e 5ill have its directory shrink in proportion. Thus e)tendible hashing+
unlike conventional hashing+ su%%ers no permanent per%ormance degradation under deletions.
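The splitting and directory doubling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `Bucket`, `ExtendibleHash`, `member`, and `insert` are hypothetical, keys are assumed to be integers in the range 0 .. 2^k - 1, the identity is used as the address computation function, buckets live in memory rather than on disk, and merging of twin buckets is left out.

```python
class Bucket:
    def __init__(self, depth, capacity):
        self.depth = depth        # number of leading address bits shared by all keys here
        self.capacity = capacity  # b, the number of records a bucket can store
        self.keys = []


class ExtendibleHash:
    def __init__(self, k=4, capacity=2):
        self.k = k                               # maximum address length in bits
        self.d = 0                               # bits currently used by the directory
        self.directory = [Bucket(0, capacity)]   # always 2**d entries

    def _index(self, key):
        # Assumption: h is the identity; the leading d bits of the k-bit key select an entry.
        return key >> (self.k - self.d)

    def member(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if key in bucket.keys:
            return
        while len(bucket.keys) >= bucket.capacity:   # full: split, then look again
            self._split(bucket)
            bucket = self.directory[self._index(key)]
        bucket.keys.append(key)

    def _split(self, bucket):
        if bucket.depth == self.k:
            raise OverflowError("address bits exhausted")
        if bucket.depth == self.d:
            # Double the directory: old entry i becomes new entries 2i and 2i + 1.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.d += 1
        depth = bucket.depth + 1
        twins = (Bucket(depth, bucket.capacity), Bucket(depth, bucket.capacity))
        for key in bucket.keys:                      # distribute by the new address bit
            twins[(key >> (self.k - depth)) & 1].keys.append(key)
        for i, entry in enumerate(self.directory):   # redirect the affected entries
            if entry is bucket:
                self.directory[i] = twins[(i >> (self.d - depth)) & 1]


table = ExtendibleHash(k=4, capacity=2)
for key in (1, 2, 9, 5, 3):
    table.insert(key)
```

Several directory entries may point to the same bucket; only a bucket whose depth equals d forces the directory to double when it overflows.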
This book is licensed under a Creative Commons Attribution 3.0 License
Exhibit 22.8: An overflowing bucket may trigger doubling of the directory.
A virtual radix tree: order-preserving extendible hashing
Hashing, in the usual sense of the word, destroys structure and thus buys uniformity at the cost of order. Extendible hashing, on the other hand, is practical without randomization and thus need not accept its inevitable consequence, the destruction of order. A uniform distribution of elements is not nearly as important: Nonuniformity causes the directory to be deeper and thus larger than it would be for a uniform distribution, but it affects neither access time nor bucket occupancy. And the directory is only a small space overhead on top of the space required to store the data: It typically contains only one or a few pointers, say a dozen bytes, per data bucket of, say, 1k bytes; it adds perhaps a few percent to the total space requirement of the table, so its growth is not critical. Thus extendible hashing remains feasible when the identity is used as the address computation function h, in which case data is accessible and can be processed sequentially in the order ≤ defined on the domain X.
When h preserves order, the word hashing seems out of place. If the directory resides in central memory and the data buckets on disk, what we are implementing is a virtual memory organized in the form of a radix tree of unbounded size. In contrast to conventional virtual memory, whose address space grows only at one end, this address space can grow anywhere: It is a virtual radix tree.
As an example, consider the domain X of character strings up to length 32, say, and assume that elements to be stored are sampled according to the distribution of the first letter in English words. We obtain an approximate distribution by counting pages in a dictionary (Exhibit 22.9). Encode the blank as 00000, 'a' as 00001, up to 'z' as 11010, so that 'aah', for example, has the code 00001 00001 01000 00000 … (29 quintuples of zeros pad 'aah' to 32 letters). This address computation function h is almost an identity: It maps {' ', 'a', …, 'z'}^32 one-to-one into {0, 1}^160. Such an order-preserving address computation function supports many useful types of operations: for example, range queries such as "list in alphabetic order all the words stored from 'unix' to 'xinu'".
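A small Python sketch may make the order-preserving address computation concrete. It assumes lowercase words of at most 32 letters and the 5-bit code described above (blank = 0, 'a' = 1, and so on); the function names `h` and `range_words` are hypothetical, and the range query simply scans a list instead of walking a directory.

```python
WIDTH = 32  # maximum string length, as in the example above


def h(word):
    """Order-preserving address: 5 bits per character, blank-padded to WIDTH letters."""
    address = 0
    for ch in word.ljust(WIDTH):
        code = 0 if ch == " " else ord(ch) - ord("a") + 1
        address = (address << 5) | code
    return address


def range_words(words, low, high):
    """All stored words between low and high, listed in address (alphabetic) order."""
    hits = [w for w in words if h(low) <= h(w) <= h(high)]
    return sorted(hits, key=h)


stored = ["xinu", "unix", "aah", "zoo", "unit", "vax", "unixes"]
answer = range_words(stored, "unix", "xinu")
```

Because numeric order on the addresses coincides with alphabetic order on the blank-padded words, a range of words corresponds to a contiguous range of addresses.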
Exhibit 22.9: Relative frequency of words beginning with a given letter in Webster's dictionary.
If there is one page of words starting with X for 160 pages of words starting with S, this suggests that if our active address space is partitioned into equally sized intervals, some intervals may be populated 160 times more densely than others. This translates into a directory that may be 160 times larger than necessary for a uniform distribution, or, since directories grow as powers of 2, may be 128 or 256 times larger. This sounds like a lot but may well be bearable, as the following estimates show.
Assume that we store 10^5 records on disk, with an average occupancy of 100 records per bucket, requiring about 1000 buckets. A uniform distribution generates a directory with one entry per bucket, for a total of 1k entries, say 2k or 4k bytes. The nonuniform distribution above requires the same number of buckets, about 1000, but generates a directory of 256k entries. If a pointer requires 2 to 4 bytes, this amounts to 0.5 to 1 Mbyte. This is less of a memory requirement than many applications require on today's personal computers. If the application warrants it (e.g. for an on-line reservation system) 1 Mbyte of memory is a small price to pay.
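The estimate above can be retraced as a short calculation. The variable names are hypothetical, and the factor 256 is the power of 2 just above the 160-fold density ratio mentioned in the text.

```python
records = 10 ** 5
records_per_bucket = 100
buckets = records // records_per_bucket                    # about 1000 buckets

# Directories grow as powers of 2: the smallest power of 2 that covers all buckets.
uniform_entries = 1 << (buckets - 1).bit_length()          # 1k entries

skew_factor = 256                                          # 160 rounded up to a power of 2
skewed_entries = uniform_entries * skew_factor             # 256k entries

# Directory size in bytes for pointers of 2 and of 4 bytes.
directory_bytes = [skewed_entries * pointer_size for pointer_size in (2, 4)]
```

With 2-byte pointers the skewed directory occupies half a megabyte, with 4-byte pointers one megabyte, which matches the range given in the text.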
Thus we see that for large data sets, extendible hashing approximates the ideal characteristics of the special case we discussed in this chapter's section on "the special case of small key domains". All it takes is a disk and a central memory of a size that is standard today but was practically infeasible a decade ago, impossible two decades ago, and unthought of three decades ago.
Exercises and programming projects
1. Design a perfect hash table for the elements 1, 10, 14, 20, 25, and 26.
2. The six names AL, FL, GA, NC, SC and VA must be distinguished from all other ordered pairs of uppercase letters. To solve this problem, these names are stored in the array T such that they can easily be found by means of a hash function h.
type addr = 0 .. 7;
pair = record c1, c2: 'A' .. 'Z' end;
var T: array [addr] of pair;
(a) Write a
function h (name: pair): addr;
which maps the six names onto different addresses in the range 'addr'.
(b) Write a
procedure initTable;
which initializes the entries of the hash table T.
(c) Write a
function member (name: pair): boolean;
which returns for any pair of uppercase letters whether it is stored in T.
3. Consider the hash function h(x) = x mod 9 for a table having nine entries. Collisions in this hash table are resolved by coalesced chaining. Demonstrate the insertion of the elements 14, 19, 10, 6, 11, 42, 21, 8, and 1.
4. Consider inserting the keys 14, 1, 19, 10, 6, 11, 42, 21, 8, and 17 into a hash table of length m = 13 using open addressing with the hash function h(x) = x mod m. Show the result of inserting these elements using
(a) Linear probing.
(b) Double hashing with the second hash function g(x) = 1 + x mod (m+1).
5. Implement a dictionary supporting the operations 'insert', 'delete', and 'member' as a hash table with double hashing.
6. Implement a dictionary supporting the operations 'insert', 'delete', 'member', 'succ', and 'pred' by order-preserving extendible hashing.
23. Metric data structures
Learning objectives:
organizing the embedding space versus organizing its contents
quadtrees and octtrees, grid file, two-disk-access principle
simple geometric objects and their parameter spaces
region queries of arbitrary shape
approximation of complex objects by enclosing them in simple containers
Organizing the embedding space versus organizing its contents
Most of the data structures discussed so far organize the set of elements to be stored depending primarily, or even exclusively, on the relative values of these elements to each other and perhaps on their order of insertion into the data structure. Often, the only assumption made about these elements is that they are drawn from an ordered domain, and thus these structures support only comparative search techniques: the search argument is compared against stored elements. The shape of data structures based on comparative search varies dynamically with the set of elements currently stored; it does not depend on the static domain from which these elements are samples. These techniques organize the particular contents to be stored rather than the embedding space.
The data structures discussed in this chapter mirror and organize the domain from which the elements are drawn: much of their structure is determined before the first element is ever inserted. This is typically done on the basis of fixed points of reference which are independent of the current contents, as inch marks on a measuring scale are independent of what is being measured. For this reason we call data structures that organize the embedding space metric data structures. They are of increasing importance, in particular for spatial data, such as needed in computer-aided design or geographic data processing. Typically, these domains exhibit a much richer structure than a mere order: In two- or three-dimensional Euclidean space, for example, not only is order defined along any line (not just the coordinate axes), but also distance between any two points. Most queries about spatial data involve the absolute position of elements in space, not just their relative position among each other. A typical query in graphics, for example, asks for the first object intercepted by a given ray of light. Computing the answer involves absolute position (the location of the ray) and relative order (nearest along the ray). A data structure that supports direct access to objects according to their position in space can clearly be more efficient than one based merely on the relative position of elements.
The terms "organizing the embedding space" and "organizing its contents" suggest two extremes along a spectrum of possibilities. As we have seen in previous chapters, however, many data structures are hybrids that combine features from distinct types. This is particularly true of metric data structures: They always have aspects of address computation needed to locate elements in space, and they often use list processing techniques for efficient memory utilization.
Radix trees, tries
We have encountered binary radix trees, and a possible implementation, in chapter 22 in the section "Extendible hashing". Radix trees with a branching factor, or fan-out, greater than 2 are ubiquitous. The Dewey decimal classification used in libraries is a radix tree with a fan-out of 10. The hierarchical structure of many textbooks, including this one, can be seen as a radix tree with a fan-out determined by how many subsections at depth d + 1 are packed into a section at depth d.
As another example, consider tries, a type of radix tree that permits the retrieval of variable-length data. As we traverse the tree, we check whether or not the node we are visiting has any successors. Thus the trie can be very long along certain paths. As an example, consider a trie containing words in the English language. In Exhibit 23.1 below, the four words 'a', 'at', 'ate', and 'be' are shown explicitly. The letter 'a' is a word and is the first letter of other words. The field corresponding to 'a' contains the value 1, signaling that we have spelled a valid word, and there is a pointer to longer words beginning with 'a'. The letter 'b' is not a word, thus is marked by a 0, but it is the beginning of many words, all found by following its pointer. The string 'aa' is neither a word nor the beginning of a word, so its field contains 0 and its pointer is 'nil'.
Exhibit 23.1: A radix tree over the alphabet of letters stores (prefixes of) words.
Only a few words begin with 'ate', but among these there are some long ones, such as 'atelectasis'. It would be wasteful to introduce eight additional nodes, one for each of the characters in 'lectasis', just to record this word, without making significant use of the fan-out of 26 provided at each node. Thus tries typically use an "overflow technique" to handle long entries: The pointer field of the prefix 'ate' might point to a text field that contains '(ate-)lectasis' and '(ate-)lier'.
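The trie with its word flags and overflow text fields might be sketched as follows. The names `TrieNode`, `insert`, `member`, and `MAX_DEPTH` are hypothetical, and the rule "stop branching after three letters" is an assumption chosen only so that the 'ate' example from the text comes out; a real trie would pick its overflow policy differently.

```python
MAX_DEPTH = 3  # assumption: prefixes longer than this go into an overflow text field


class TrieNode:
    def __init__(self):
        self.is_word = False   # the 0/1 field: does the path to this node spell a word?
        self.children = {}     # the pointer field: letter -> TrieNode
        self.overflow = []     # text field holding the remainders of long words


def insert(root, word):
    node = root
    for i, ch in enumerate(word):
        if i == MAX_DEPTH:                       # too deep: store the rest as text
            if word[i:] not in node.overflow:
                node.overflow.append(word[i:])
            return
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True


def member(root, word):
    node = root
    for i, ch in enumerate(word):
        if i == MAX_DEPTH:
            return word[i:] in node.overflow
        node = node.children.get(ch)
        if node is None:                         # 'nil' pointer: no word has this prefix
            return False
    return node.is_word


root = TrieNode()
for w in ("a", "at", "ate", "be", "atelectasis", "atelier"):
    insert(root, w)
```

In this sketch the node reached by 'ate' carries both the flag for the word 'ate' and the text field with 'lectasis' and 'lier'.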
Quadtrees and octtrees
Consider a square recursively partitioned into quadrants. Exhibit 23.2 shows such a square partitioned to the depth of 4. There are 4 quadrants at depth 1, separated by the thickest lines; 4 × 4 (sub-)quadrants separated by slightly thinner lines; 4^3 (sub-sub-)quadrants separated by yet thinner lines; and finally, 4^4 = 256 leaf quadrants separated by the thinnest lines. The partitioning structure described is a quadtree, a particular type of radix tree of fan-out 4. The root corresponds to the entire square, its 4 children to the 4 quadrants at depth 1, and so on, as shown in Exhibit 23.2.
Exhibit 23.2: A quarter circle digitized on a 16 × 16 grid, and its representation as a 4-level quadtree.
A quadtree is the obvious two-dimensional analog of the one-dimensional binary radix tree we have seen. Accordingly, quadtrees are frequently used to represent, store, and process spatial data, such as images. The figure shows a quarter circle, digitized on a 16 × 16 grid of pixels. This image is most easily represented by a 16 × 16 array of bits. The quadtree provides an alternative representation that is advantageous for images digitized to a high level of resolution. Most graphic images in practice are digitized on rectangular grids of anywhere from hundreds to thousands of pixels on a side: for example, 512 × 512. In a quadtree, only the largest quadrants of constant color (black or white, in our example) are represented explicitly; their subquadrants are implicit.
The quadtree in Exhibit 23.2 is interpreted as follows. Of the four children of the root, the northwest quadrant, labeled 1, is simple: entirely white. This fact is recorded in the root. The other three children, labeled 0, 2, and 3, contain both black and white pixels. As their description is not simple, it is contained in three quadtrees, one for each quadrant. Pointers to these subquadtrees emanate from the corresponding fields of the root.
The southwestern quadrant labeled 2 in turn has four quadrants at depth 2. Three of these, labeled 2.0, 2.1, and 2.2, are entirely white; no pointers emanate from the corresponding fields in this node. Subquadrant 2.3 contains both black and white pixels; thus the corresponding field contains a pointer to a sub-subquadtree.
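The construction of a quadtree from a bit map can be sketched recursively. The function names `build` and `count_nodes` are hypothetical, the image is a made-up 4 × 4 example rather than the quarter circle of the exhibit, and the quadrant order 0 = northeast, 1 = northwest, 2 = southwest, 3 = southeast is an assumption that agrees with the two labels the text states (northwest = 1, southwest = 2).

```python
def build(image, row=0, col=0, size=None):
    """Return 'white', 'black', or a list of four fields for a square region."""
    if size is None:
        size = len(image)                      # the image side must be a power of 2
    first = image[row][col]
    if all(image[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return "black" if first else "white"   # constant color: no refinement needed
    half = size // 2
    # Assumed field order: 0 = NE, 1 = NW, 2 = SW, 3 = SE.
    offsets = [(0, half), (0, 0), (half, 0), (half, half)]
    return [build(image, row + dr, col + dc, half) for dr, dc in offsets]


def count_nodes(tree):
    """Count the nodes that are refined; constant quadrants live in their parent's field."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(count_nodes(field) for field in tree)


image = [
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
tree = build(image)
```

Each list of four fields plays the role of one quadtree node; a field either records a color or refers to a subquadtree.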
In this discussion we have introduced a notation to identify every quadrant at any depth of the quadtree. The root is identified by the null string; a quadrant at depth d is uniquely identified by a string of d radix-4 digits. This string can be interpreted in various ways as a number expressed in base 4. Thus accessing and processing a quadtree is readily reduced to arithmetic.
Breadth-first addressing
Label the root 0, its children 1, 2, 3, 4, its grandchildren 5 through 20, and so on, one generation after the other.
0
1 2 3 4
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Notice that the children of any node i are 4 × i + 1, 4 × i + 2, 4 × i + 3, 4 × i + 4. The parent of node i is (i - 1) div 4. This is similar to the address computation used in the heap of "Implicit data structures", a binary tree where each node i has children 2 × i and 2 × i + 1; and the parent of node i is obtained as i div 2.
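The two address formulas translate directly into code. The function names `children` and `parent` are hypothetical; the root, node 0, has no parent, so `parent` is only meant for i > 0.

```python
def children(i):
    """Breadth-first addresses of the four children of node i."""
    return [4 * i + j for j in (1, 2, 3, 4)]


def parent(i):
    """Breadth-first address of the parent of node i, for i > 0."""
    return (i - 1) // 4


grandchildren = [g for c in children(0) for g in children(c)]
```

The list `grandchildren` enumerates the third row of the labeling shown above.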
")ercise
The string of radix-4 digits along a path from the root to any node is called the path address of this node. Interpret the path address as an integer, most significant digit first. These integers label the nodes at depth d > 0 consecutively from 0 to 4^d - 1. Devise a formula that transforms the path address into the breadth-first address. This formula can be used to store a quadtree as a one-dimensional array.
Data compression
The representation of an image as a quadtree is sometimes much more compact than its representation as a bit map. Two conditions must hold for this to be true:
1. The image must be fairly large, typically hundreds of pixels on a side.
2. The image must have large areas of constant value (color).
The quadtree for the quarter circle above, for example, has only 14 nodes. A bit map of the same image requires 256 bits. Which representation requires more storage? Certainly the quadtree. If we store it as a list, each node must be able to hold four pointers, say 4 or 8 bytes. If a pointer has value 'nil', indicating that its quadrant needs no refinement, we need a bit to indicate the color of this quadrant (white or black), or a total of 4 bits. If we store the quadtree breadth-first, no pointers are needed as the node relationships are expressed by address computation; thus a node is reduced to four three-valued fields ('white', 'black', or 'refine'), conveniently stored in 8 bits, or 1 byte. This implicit data structure will leave many unused holes in memory. Thus quadtrees do not achieve data compression for small images.
Octtrees
Octtrees apply to three-dimensional space exactly the same idea that quadtrees apply to two-dimensional space: A cube is recursively partitioned into eight octants, using three orthogonal planes.
Spatial data structures: objectives and constraints
Metric data structures are used primarily for storing spatial data, such as points and simple geometric objects embedded in a multidimensional space. The most important objectives a spatial data structure must meet include:
1. Efficient handling of large, dynamically varying data sets in interactive applications
2. Fast access to objects identified in a fully specified query
3. Efficient processing of proximity queries and region queries of arbitrary shape
4. A uniformly high memory utilization
Achieving these objectives is subject to many constraints, and results in trade-offs.
Managing disks. By "large data set" we mean one that must be stored on disk; only a small fraction of the data can be kept in central memory at any one time. Many data structures can be used in central memory, but the choice is much more restricted when it comes to managing disks because of the well-known "memory speed gap"
phenomenon. Central memory is organized in small physical units (a byte, a word) with access times of approximately 1 microsecond, 10^-6 second. Disks are organized in large physical blocks (512 bytes to 5 kilobytes) with access times ranging from 10 to 100 milliseconds (10^-2 to 10^-1 second). Compared to central memory, a disk delivers data blocks typically 10^3 times larger with a delay 10^4 times greater. In terms of the data rate delivered to the central processing unit (the size of the data block read divided by the access time), the disk is a storage device whose effectiveness is within an order of magnitude of that of central memory. The large size of a physical disk block is a potential source of inefficiency that can easily reduce the useful data rate of a disk a hundredfold or a thousandfold. Accessing a couple of bytes on disk, say a pointer needed to traverse a list, takes about as long as accessing the entire disk block. Thus the game of managing disks is about minimizing the number of disk accesses.
Dynamically varying data. The majority of computer applications today are interactive. That means that insertions, deletions, and modifications of data are at least as frequent as operations that merely process fixed data. Data structures that entail a systematic degradation of performance with continued use (such as ever-lengthening overflow chains, or an ever-increasing number of cells marked "deleted" in a conventional hash table) are unsuitable. Only structures that automatically adapt their shape to accommodate ever-changing contents can provide uniform response times.
Instantaneous response. Interactive use of computers sets another major challenge for data management: the goal of providing "instantaneous response" to a fully specified query. "Fully specified" means that every attribute relevant for the search has been provided, and that at most one element satisfies the query. Imagine the user clicking an icon on the screen, and the object represented by the icon appears instantaneously. In human terms, "instantaneous" is a well-defined physiological quantity, namely, about 1/10 of a second, the limit of human time resolution. Ideally, an interactive system retrieves any single element fully specified in a query within 0.1 second.
Two-disk-access principle. We have already stated that in today's technology, a disk access typically takes tens of milliseconds. Thus the goal of retrieving any single element in 0.1 second translates into "retrieve any element in at most a few disk accesses". Fortunately, it turns out that useful data structures can be designed that access data in a two-step process: (1) access the correct portion of a directory, and (2) access the correct data bucket. Under the assumption that both data and directory are so large that they are stored on disk, we call this the two-disk-access principle.
Proximity queries and region queries of arbitrary shape. The simplest example of a proximity query is the operation 'next', which we have often encountered in one-dimensional data structure traversals: Given a pointer to an element, get the next element (the successor or the predecessor) according to the order defined on the domain. Another simple example is an interval or range query such as "get all x between 13 and 17". This generalizes directly to k-dimensional orthogonal range queries such as the two-dimensional query "get all (x1, x2) with 13 ≤ x1 < 17 and 3 ≤ x2 < 4". In geometric computation, for example, many other instances of proximity queries are important, such as the "nearest neighbor" (in any direction), or intersection queries among objects. Region queries of arbitrary shape (not just rectangular) are able to express a variety of geometric conditions.
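The meaning of a k-dimensional orthogonal range query can be pinned down with a brute-force sketch. The function name `range_query` is hypothetical, and the linear scan over all points is exactly what a spatial data structure is meant to avoid; the code only specifies which points qualify.

```python
def range_query(points, low, high):
    """All points p with low[i] <= p[i] < high[i] in every dimension i."""
    return [p for p in points
            if all(lo <= x < hi for lo, x, hi in zip(low, p, high))]


points = [(13, 3), (16, 3), (17, 3), (14, 4), (12, 3)]
hits = range_query(points, low=(13, 3), high=(17, 4))
```

The call corresponds to the two-dimensional query "get all (x1, x2) with 13 ≤ x1 < 17 and 3 ≤ x2 < 4" from the text.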
Uniformly high memory utilization. Any data structure that adapts its shape to dynamically changing contents is likely to leave "unused holes" in storage space: space that is currently unused, and that cannot conveniently be used for other purposes because it is fragmented. We have encountered this phenomenon in multiway trees such as B-trees and in hash tables. It is practically unavoidable that dynamic data structures use their allocated space to less than 100%, and an average space utilization of 50% is often tolerable. The danger to avoid is a built-in bias that drives space utilization toward 0 when the file shrinks: elements get deleted but their space is not relinquished. The grid file, to be discussed next, achieves an average memory utilization of about 70% regardless of the mix of insertions or deletions.
The grid file
The grid file is a metric data structure designed to store points and simple geometric objects in multidimensional space so as to achieve the objectives stated above. This section describes its architecture, access and update algorithms, and properties. More details can be found in [NHS 84] and [Hin 85].
Scales, directory, buckets
Consider as an example a two-dimensional domain: the Cartesian product X1 × X2, where X1 = 0 .. 1999 is a subrange of the integers, and X2 = a .. z is the ordered set of the 26 characters of the English alphabet. Pairs of the form (x1, x2), such as (1988, w), are elements from this domain.
The bit map is a natural data structure for storing a set S of elements from X1 × X2. It may be declared as
var T: array[X1, X2] of boolean;
with the convention that
T[x1, x2] = true ⇔ (x1, x2) ∈ S.
Basic set operations are performed by direct access to the array element corresponding to an element: find(x1, x2) is simply the boolean expression T[x1, x2]; insert(x1, x2) is equivalent to T[x1, x2] := 'true', delete(x1, x2) is equivalent to T[x1, x2] := 'false'. The bit map for our small domain requires an affordable 52k bits. Bit maps for realistic examples are rarely affordable, as the following reasoning shows. First, consider that x1 and x2 are just keys of records that hold additional data. If space is reserved in the array for this additional data, an array element is not a bit but as many bytes as are needed, and all the absent records, for elements (x1, x2) ∉ S, waste a lot of storage. Second, most domains are much larger than the example above: the three-dimensional Euclidean space, for example, with elements (x, y, z) taken as triples of 32-bit integers, or 64-bit floating-point numbers, requires bit maps of about 10^30 and 10^60 bits, respectively. For comparison's sake: a large disk has about 10^10 bits.
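A bit map for the small example domain can be sketched with a byte array. The class name `BitMap` and the row-by-row layout (26 bits per year) are assumptions made for illustration; the three operations mirror find, insert, and delete as defined above.

```python
class BitMap:
    YEARS, LETTERS = 2000, 26          # X1 = 0 .. 1999, X2 = 'a' .. 'z'

    def __init__(self):
        # 2000 * 26 = 52,000 bits, packed eight to a byte.
        self.bits = bytearray((self.YEARS * self.LETTERS + 7) // 8)

    def _position(self, x1, x2):
        # Assumed layout: one row of 26 bits per year.
        return x1 * self.LETTERS + (ord(x2) - ord("a"))

    def find(self, x1, x2):
        p = self._position(x1, x2)
        return bool(self.bits[p >> 3] >> (p & 7) & 1)

    def insert(self, x1, x2):
        p = self._position(x1, x2)
        self.bits[p >> 3] |= 1 << (p & 7)

    def delete(self, x1, x2):
        p = self._position(x1, x2)
        self.bits[p >> 3] &= ~(1 << (p & 7)) & 0xFF


T = BitMap()
T.insert(1988, "w")
```

Every operation touches exactly one bit, whose address is computed from the key; that direct access is the property the grid file tries to preserve for much larger domains.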
Since large bit maps are extremely sparsely populated, they are amenable to data compression. The grid file is best understood as a practical data compression technique that stores huge, sparsely populated bit maps so as to support direct access. Returning to our example, imagine a historical database indexed by the year of birth and the first letter of the name of scientists: thus we find 'John von Neumann' under (1903, v). Our database is pictured as a cloud of points in the domain shown in Exhibit 23.3; because we have more scientists (or at least, more records) in recent years, the density increases toward the right. Storing this database implies packing the records into buckets of fixed capacity to hold c (e.g. c = 3) records. The figure shows the domain partitioned by orthogonal hyperplanes into box-shaped grid cells, none of which contains more than c points.
Exhibit 23.3: Cells of a grid partition adapt their size so that no cell is populated by more than c points.
A grid file for this database contains the following components:
Linear scales show how the domain is currently partitioned.
The directory is an array whose elements are in one-to-one correspondence with the grid cells; each entry points to a data bucket that holds all the records of the corresponding grid cell.
Access to the record (1903, v) proceeds through three steps:
1. Scales transform key values to array indices: (1903, v) becomes (5, 4). Scales contain small amounts of data, which is kept in central memory; thus this step requires no disk access.
2. The index tuple (5, 4) provides direct access to the correct element of the directory. The directory may be large and occupy many pages on disk, but we can compute the address of the correct directory page and in one disk access retrieve the correct directory element.
3. The directory element contains a pointer (disk address) of the correct data bucket for (1903, v), and the second disk access retrieves the correct record: [(1903, v), John von Neumann …].
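The three access steps can be sketched as follows. Everything concrete here is hypothetical: the scale boundaries are invented so that (1903, v) maps to (5, 4) as in the text, the bucket names `B0` and `B17` are placeholders, and in-memory lists and dictionaries stand in for the directory pages and data buckets that would reside on disk.

```python
from bisect import bisect_right

# Hypothetical linear scales: each value is the lower boundary of one interval.
X1_SCALE = [0, 1000, 1500, 1750, 1875, 1900, 1950]
X2_SCALE = ["a", "f", "k", "p", "u"]


def cell(x1, x2):
    """Step 1: the scales (kept in central memory) turn key values into array indices."""
    return bisect_right(X1_SCALE, x1) - 1, bisect_right(X2_SCALE, x2) - 1


# Stand-ins for the directory and the data buckets on disk.
directory = [["B0"] * len(X2_SCALE) for _ in X1_SCALE]
directory[5][4] = "B17"
buckets = {"B0": [], "B17": [((1903, "v"), "John von Neumann")]}


def find(x1, x2):
    i, j = cell(x1, x2)
    bucket_id = directory[i][j]               # step 2: first disk access (directory)
    for key, record in buckets[bucket_id]:    # step 3: second disk access (bucket)
        if key == (x1, x2):
            return record
    return None
```

A binary search in each scale replaces the search in the data itself, so the number of disk accesses stays at two regardless of how many records are stored.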
Disk utilization
The grid file does not allocate a separate bucket to each grid cell; that would lead to an unacceptably low disk utilization. Exhibit 23.4 suggests, for example, that the two grid cells at the top right of the directory share the same bucket. How this bucket sharing comes about, and how it is maintained through splitting of overflowing buckets, and merging sparsely populated buckets, is shown in the following.
Exhibit 23.4: The search for a record with key values (1903, v) starts with the scales and proceeds via the directory to the correct data bucket on disk.
The dynamics of splitting and merging
The dynamic behavior of the grid file is best explained by tracing an example: we show the effect of repeated insertions in a two-dimensional file. Instead of showing the grid directory, whose elements are in one-to-one correspondence with the grid blocks, we draw the bucket pointers as originating directly from the grid blocks.
Initially, a single bucket A, of capacity c = 3 in our example, is assigned to the entire domain (Exhibit 23.5). When bucket A overflows, the domain is split, a new bucket B is made available, and those records that lie in one half of the space are moved from the old bucket to the new one (Exhibit 23.6). If bucket A overflows again, its grid block (i.e. the left half of the space) is split according to some splitting policy: We assume the simplest splitting policy of alternating directions. Those records of A that lie in the lower-left grid block of Exhibit 23.7 are moved to a new bucket C. Notice that as bucket B did not overflow, it is left alone: Its region now consists of two grid blocks. For effective memory utilization it is essential that in the process of refining the grid partition we need not necessarily split a bucket when its region is split.
Exhibit 23.5: A growing grid file starts with a single bucket allocated to the entire key space.
Exhibit 23.6: An overflowing bucket triggers a refinement of the space partition.
Exhibit 23.7: Bucket A has been split into A and C, but the contents of B remain unchanged.
Assuming that records keep arriving in the lower-left corner of the space, bucket C will overflow. This will trigger a further refinement of the grid partition as shown in Exhibit 23.8, and a splitting of bucket C into C and D. The history of repeated splitting can be represented in the form of a binary tree, which imposes on the set of buckets currently in use (and hence on the set of regions of these buckets) a twin system (also called a buddy system): Each bucket and its region have a unique twin from which it split off. In Exhibit 23.8, C and D are twins, the pair (C, D) is A's twin, and the pair (A, (C, D)) is B's twin.
Exhibit 23.8: Bucket regions that span several cells ensure high disk utilization.
Deletions trigger merging operations. In contrast to one-dimensional storage, where it is sufficient to merge buckets that split earlier, merging policies for multidimensional grid files need to be more general in order to maintain a high occupancy.
Simple geometric objects and their parameter spaces
Consider a class of simple spatial objects, such as aligned rectangles in the plane (i.e. with sides parallel to the axes). Within its class, each object is defined by a small number of parameters. For example, an aligned rectangle is determined by its center (cx, cy) and the half-length of each side, dx and dy.
An object defined within its class by k parameters can be considered to be a point in a k-dimensional parameter space. For example, an aligned rectangle becomes a point in four-dimensional space. All of the geometric and topological properties of an object can be deduced from the class it belongs to and from the coordinates of its corresponding point in parameter space.
(i%%erent choices o% the parameter space %or the same class o% ob3ects are appropriate+ depending on
characteristics o% the data to be processed. .ome considerations that may determine the choice o% parameters are&
1. Distinction between location parameters and extension parameters. For some classes of simple objects it is reasonable to distinguish location parameters, such as the center (cx, cy) of an aligned rectangle, from extension parameters, such as the half-sides dx and dy. This distinction is always possible for objects that can be described as Cartesian products of spheres of various dimensions. For example, a rectangle is the product of two one-dimensional spheres, a cylinder the product of a one-dimensional and a two-dimensional sphere. Whenever this distinction can be made, cone-shaped search regions generated by proximity queries as described in the next section have a simple intuitive interpretation: The subspace of the location parameters acts as a "mirror" that reflects a query.
2. Independence of parameters, uniform distribution. As an example, consider the class of all intervals on a straight line. If intervals are represented by their left and right endpoints, lx and rx, the constraint lx ≤ rx restricts all representations of these intervals by points (lx, rx) to the triangle above the diagonal. Any data structure that organizes the embedding space of the data points, as opposed to the particular set of points that must be stored, will pay some overhead for representing the unpopulated half of the embedding space. A coordinate transformation that distributes data all over the embedding space leads to more efficient storage. The phenomenon of nonuniform data distribution can be worse than this. In most applications, the building blocks from which complex objects are built are much smaller than the space in which they are embedded, as the size of a brick is small compared to the size of a house. If so, parameters such as lx and rx that locate boundaries of an object are highly dependent on each other. Exhibit 23.9 shows short intervals on a long line clustering along the diagonal, leaving large regions of a large embedding space unpopulated, whereas the same set of intervals represented by a location parameter cx and an extension parameter dx fills a smaller embedding space in a much more uniform way. With the assumption of bounded dx, this data distribution is easier to handle.
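The coordinate transformation mentioned in item 2 can be illustrated with a minimal Python sketch. The function names and the sample intervals are hypothetical and chosen only for this illustration; the text itself does not prescribe an implementation.

```python
def endpoints_to_center_extent(l, r):
    """Map an interval [l, r] with l <= r to (center, half_length)."""
    if l > r:
        raise ValueError("expected l <= r")
    return ((l + r) / 2, (r - l) / 2)


def center_extent_to_endpoints(c, d):
    """Inverse mapping: recover the endpoints from center and half-length."""
    return (c - d, c + d)


# Short intervals on a long line (made-up sample data).
intervals = [(0, 2), (10, 11), (40, 43), (97, 100)]

# In (l, r) space all four points hug the diagonal l = r of a 100 x 100 square.
# In (c, d) space they spread along the c-axis while d stays within a small bound.
points_cd = [endpoints_to_center_extent(l, r) for l, r in intervals]
largest_half_length = max(d for _, d in points_cd)
```

With bounded half-lengths, the populated part of the (c, d) space is a thin strip that can be stored without reserving room for an empty triangle.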
Exhibit 23.9: A set of intervals represented in two different parameter spaces.
Region queries of arbitrary shape
Intersection is a basic component of other proximity queries, and thus deserves special attention. CAD design rules, for example, often require different objects to be separated by some minimal distance. This is equivalent to requiring that objects surrounded by a rim do not intersect. Given a subset of a class of simple spatial objects with parameter space H, we consider two types of queries:
point query: Given a query point q, find all objects A for which q ∈ A.
point set query: Given a query set Q of points, find all objects A that intersect Q.
Point query. For a query point q, compute the region in H that contains all points representing objects in the given subset that overlap q.
1. Consider the class of intervals on a straight line. An interval given by its center cx and its half length dx overlaps a point q with coordinate qx if and only if cx - dx ≤ qx ≤ cx + dx.
2. The class of aligned rectangles in the plane (with parameters cx, cy, dx, dy) can be treated as the Cartesian product of two classes of intervals, one along the x-axis, the other along the y-axis (Exhibit 23.10). All rectangles that contain a given point q are represented by points in four-dimensional space that lie in the Cartesian product of two point-in-interval query regions. The region is shown by its projections onto the cx-dx plane and the cy-dy plane.
Exhibit 23.10: A set of aligned rectangles represented as a set of points in a four-dimensional parameter space. A point query is transformed into a cone-shaped region query.
3. Consider the class of circles in the plane. We represent a circle as a point in three-dimensional space by the coordinates of its center (cx, cy) and its radius r as parameters. All circles that overlap a point q are represented in the corresponding three-dimensional space by points that lie in the cone with vertex q shown in Exhibit 23.11. The axis of the cone is parallel to the r-axis (the extension parameter), and its vertex q is considered a point in the cx-cy plane (the subspace of the location parameters).
Exhibit 23.11: Search cone for a point query for circles in the plane.
Point set query. Given a query set Q of points, the region in H that contains all points representing objects A that intersect Q is the union of the regions in H that result from the point queries for each point q ∈ Q. The union of cones is a particularly simple region in H if the query set Q is a simple spatial object.
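The point-query regions described above, which a point set query combines by union, translate directly into membership tests on the parameters. The following Python sketch is an illustration only: the function names are hypothetical, and a circle is taken to mean the closed disk, which is an assumption about the text's use of "overlap".

```python
import math


def interval_contains(cx, dx, qx):
    """(cx, dx) lies in the search cone with vertex qx: |cx - qx| <= dx."""
    return cx - dx <= qx <= cx + dx


def rectangle_contains(cx, cy, dx, dy, qx, qy):
    """Cartesian product of two point-in-interval regions."""
    return interval_contains(cx, dx, qx) and interval_contains(cy, dy, qy)


def circle_contains(cx, cy, r, qx, qy):
    """(cx, cy, r) lies in the cone with vertex (qx, qy) and axis parallel to r."""
    return math.hypot(cx - qx, cy - qy) <= r
```

Each predicate describes a region of the parameter space, so a data structure for points can answer the query without ever inspecting the geometric shape of a stored object.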
1. Consider the class of intervals on a straight line. An interval i = (cx, dx) intersects a query interval Q = (cq, dq) if and only if its representing point lies in the shaded region shown in Exhibit 23.12; this region is given by the inequalities cx - dx ≤ cq + dq and cx + dx ≥ cq - dq.
Exhibit 23.12: An interval query, as a union of point queries, again gets transformed into a search cone.
2. The class of aligned rectangles in the plane is again treated as the Cartesian product of two classes of intervals, one along the x-axis, the other along the y-axis. If Q is also an aligned rectangle, all rectangles that intersect Q are represented by points in four-dimensional space lying in the Cartesian product of two interval intersection query regions.
3. Consider the class of circles in the plane. All circles that intersect a line segment L are represented by points lying in the cone-shaped solid shown in Exhibit 23.13. This solid is obtained by embedding L in the cx-cy plane, the subspace of the location parameters, and moving the cone with vertex at q along L.
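The three point set queries above can be sketched as predicates in Python. All names are hypothetical; circles are again treated as closed disks, and the segment test uses the standard distance from a point to a segment, which describes the same solid as sweeping the cone along L.

```python
import math


def intervals_intersect(cx, dx, cq, dq):
    """The two inequalities of item 1: cx - dx <= cq + dq and cx + dx >= cq - dq."""
    return cx - dx <= cq + dq and cx + dx >= cq - dq


def rectangles_intersect(rect, query):
    """Both arguments are (cx, cy, dx, dy); product of two interval tests."""
    cx, cy, dx, dy = rect
    qx, qy, ex, ey = query
    return intervals_intersect(cx, dx, qx, ex) and intervals_intersect(cy, dy, qy, ey)


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from (px, py) to the segment from (ax, ay) to (bx, by)."""
    abx, aby = bx - ax, by - ay
    length_sq = abx * abx + aby * aby
    if length_sq == 0:  # degenerate segment: a single point
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * abx + (py - ay) * aby) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * abx), py - (ay + t * aby))


def circle_intersects_segment(cx, cy, r, ax, ay, bx, by):
    """(cx, cy, r) lies in the union of cones whose vertices run along the segment."""
    return point_segment_distance(cx, cy, ax, ay, bx, by) <= r
```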
Exhibit 23.13: Search region as a union of cones.
Evaluating region queries with a grid file
We have seen that proximity queries on spatial objects lead to search regions significantly more complex than orthogonal range queries. The grid file allows the evaluation of irregularly shaped search regions in such a way that the complexity of the region affects CPU time but not disk accesses. Disk accesses, not CPU time, limit the performance of a database implementation. A query region Q is matched against the scales and converted into a set I of index tuples that refer to entries in the directory. Only after this preprocessing do we access disk to retrieve the correct pages of the directory and the correct data buckets whose regions intersect Q (Exhibit 23.14).
Exhibit 23.14: The cells of a grid partition that overlap an arbitrary query region Q are determined by merely looking up the scales.
Interaction between query processing and data access
The point of the two preceding sections was to show that in a metric data structure, intricate computations triggered by proximity queries can be preprocessed to a remarkable extent before the objects involved are retrieved. Query preprocessing may involve a significant amount of computation based on small amounts of auxiliary data (the scales and the query) that are kept in central memory. The final access of data from disk is highly selective: data retrieved has a high chance of being part of the answer.
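A minimal Python sketch of this preprocessing step follows. It assumes, purely for illustration, that each scale is a sorted list of boundaries whose first and last entries delimit the domain, and that the query region is supplied as a test against an axis-parallel box; the names `cells_overlapping` and `disk_meets_box` are hypothetical. The sketch enumerates every cell, which is simple but not how a tuned implementation would proceed.

```python
def cells_overlapping(scale_x, scale_y, region_meets_box):
    """Return the index tuples (i, j) of grid cells that may meet the query region.

    Only the scales are consulted; no directory page or bucket is touched.
    """
    hits = []
    for i in range(len(scale_x) - 1):
        for j in range(len(scale_y) - 1):
            box = (scale_x[i], scale_y[j], scale_x[i + 1], scale_y[j + 1])
            if region_meets_box(box):
                hits.append((i, j))
    return hits


def disk_meets_box(cx, cy, r):
    """Build a test for a disk-shaped query region (one possible irregular shape)."""
    def test(box):
        x0, y0, x1, y1 = box
        # Point of the closed box nearest to the disk center.
        nx = min(max(cx, x0), x1)
        ny = min(max(cy, y0), y1)
        return (nx - cx) ** 2 + (ny - cy) ** 2 <= r * r
    return test


# Made-up scales: three columns and two rows over the domain 0 .. 16.
scale_x = [0, 4, 8, 16]
scale_y = [0, 8, 16]
index_tuples = cells_overlapping(scale_x, scale_y, disk_meets_box(5, 7, 2))
```

The resulting index tuples are the only directory entries that need to be fetched from disk, however complicated the shape of the query region.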
Contrast this to an approach where an object can be accessed only by its name (e.g. the part number) because the geometric information about its location in space is only included in the record for this object but is not part of the accessing mechanism. In such a database, all objects might have to be retrieved in order to determine which ones answer the query. Given that disk access is the bottleneck in most database applications, it pays to preprocess queries as much as possible in order to save disk accesses.
The integration of query processing and accessing mechanism developed in the preceding sections was made possible by the assumption of simple objects, where each instance is described by a small number of parameters. What can we do when faced with a large number of irregularly shaped objects?
Complex, irregularly shaped spatial objects can be represented or approximated by simpler ones in a variety of ways, for example: decomposition, as in a quad tree tessellation of a figure into disjoint raster squares; representation as a cover of overlapping simple shapes; and enclosing each object in a container chosen from a class of simple shapes. The container technique allows efficient processing of proximity queries because it preserves the most important properties for proximity-based access to spatial objects, in particular: It does not break up the object into components that must be processed separately, and it eliminates many potential tests as unnecessary (if two containers don't intersect, the objects within won't either). As an example, consider finding all polygons that intersect a given query polygon, given that each of them is enclosed in a simple container such as a circle or an aligned rectangle. Testing two polygons for intersection is an expensive operation compared to testing their containers for intersection. The cheap container test excludes most of the polygons from an expensive, detailed intersection check.
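The container technique can be sketched in Python as a two-stage filter. The names are hypothetical, aligned rectangles are used as containers, and the detailed polygon test is deliberately left as a caller-supplied function `exact_test`, since the text does not specify one.

```python
def bounding_box(polygon):
    """Smallest aligned rectangle (xmin, ymin, xmax, ymax) enclosing the vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_intersect(a, b):
    """Cheap container test on two aligned rectangles."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intersecting(query, polygons, exact_test):
    """Indices of polygons that intersect the query polygon.

    exact_test(query, polygon) is the expensive detailed check; it is only
    called for polygons whose container meets the container of the query.
    """
    query_box = bounding_box(query)
    result = []
    for index, polygon in enumerate(polygons):
        if not boxes_intersect(query_box, bounding_box(polygon)):
            continue  # disjoint containers imply disjoint polygons
        if exact_test(query, polygon):
            result.append(index)
    return result
```

In a real system the bounding boxes would be computed once and stored as points in parameter space rather than recomputed for every query.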
Any approximation technique limits the primitive shapes that must be stored to one or a few types: for example, aligned rectangles or boxes. An instance of such a type is determined by a few parameters, such as coordinates of its center and its extension, and can be considered to be a point in a (higher-dimensional) parameter space. This transformation reduces object storage to point storage, increasing the dimensionality of the problem without loss of information. Combined with an efficient multi-dimensional data structure for point storage it is the basis for an effective implementation of databases for spatial objects.
Exercises
1. Draw three quadtrees, one for each of the 4 × 8 pixel rectangles A, B and C outlined in Exhibit 23.15.
Exhibit 23.15: The location of congruent objects greatly affects the complexity of a quadtree representation.
2. Consider a grid file that stores points lying in a two-dimensional domain: the Cartesian product X1 × X2, where X1 = 0 .. 15 and X2 = 0 .. 15 are subranges of the integers. Buckets have a capacity of two points.
(a) Insert the points (2, 3), (13, 14), (3, 5), (6, 9), (10, 13), (11, 5), (14, 9), (7, 3), (15, 11), (9, 9), and (11, 10) into the initially empty grid file and show the state of the scales, the directory, and the buckets after each insert operation. Buckets are split such that their shapes remain as close to square as possible.
(b) Delete the points (10, 13), (9, 9), (11, 10), and (14, 9) from the grid file obtained in (a) and show the state of the scales, the directory, and the buckets after each delete operation. Assume that after deleting a point in a bucket this bucket may be merged with a neighbor bucket if their joint occupancy does not exceed two points. Further, a boundary should be removed from its scale if there is no longer a bucket that is split with respect to this boundary.
(c) Without imposing further restrictions a deadlock situation may occur after a sequence of delete operations: No bucket can merge with any of its neighbors, since the resulting bucket region would no longer be rectangular. In the example shown in Exhibit 23.16 the shaded ovals represent bucket regions. Devise a merging policy that prevents such deadlocks from occurring in a two-dimensional grid file.
Exhibit 23.16: This example shows bucket regions that cannot be merged pairwise.
3. Consider the class of circles in the plane represented as points in three-dimensional parameter space as proposed in chapter 23 in the section "Region queries of arbitrary shape". Describe the search regions in the parameter space (a) for all the circles intersecting a given circle C, (b) for all the circles contained in C, and (c) for all the circles enclosing C.
Part VI: Interaction between algorithms and data structures: case studies in geometric computation
Organizing and processing Euclidean space
In Part III we presented a varied sample of algorithms that use simple, mostly static, data structures. Part V was dedicated to dynamic data structures, and we presented the corresponding access and update algorithms. In this final part we illustrate the use of these dynamic data structures by presenting algorithms whose efficiency depends crucially on them, in particular on priority queues and dictionaries. We choose these algorithms from computational geometry, a recently developed discipline of great practical importance with applications in computer graphics, computer-aided design, and geographic databases.
If data structures are tools for organizing sets of data and their relationships, geometric data processing poses one of the most challenging tests. The ability to organize data embedded in the Euclidean space in such a way as to reflect the rich relationships due to location (e.g. touching or intersecting, contained in, distance) is of utmost importance for the efficiency of algorithms for processing spatial data. Data structures developed for traditional commercial data processing were often based on the concept of one primary key and several subordinate secondary keys. This asymmetry fails to support the equal role played by the Cartesian coordinate axes x, y, z, … of Euclidean space. If one spatial axis, say x, is identified as the primary key, there is a danger that queries involving the other axes, say y and z, become inordinately cumbersome to process, and therefore slow. For the sake of simplicity we concentrate on two-dimensional geometric problems, and in particular on the highly successful class of plane-sweep algorithms. Sweep algorithms do a remarkably good job at processing two-dimensional space efficiently using two distinct one-dimensional data structures, one for organizing the x-axis, the other for the y-axis.
24. Sample problems and algorithms
Learning objectives:
The nature of geometric computation: three problems and algorithms chosen to illustrate the variety of issues encountered:
Convex hull yields to simple and efficient algorithms, straightforward to implement and analyze.
Objects with special properties, such as convexity, are often much simpler to process than are general objects.
Visibility problems are surprisingly complex; even if this complexity does not show in the design of an algorithm, it sneaks into its analysis.
Geometry and geometric computation
Classical geometry, shaped by the ancient Greeks, is more axiomatic than constructive: It emphasizes axioms, theorems, and proofs, rather than algorithms. The typical statement of Euclidean geometry is an assertion about all geometric configurations with certain properties (e.g. the theorem of Pythagoras: "In a right-angled triangle, the square on the hypotenuse c is equal to the sum of the squares on the two catheti a and b: c² = a² + b²") or an assertion of existence (e.g. the parallel axiom: "Given a line L and a point P not on L, there is exactly one line parallel to L passing through P"). Constructive solutions to problems do occur, but the theorems about the impossibility of constructive solutions steal the glory: "You cannot trisect an arbitrary angle using ruler and compass only," and the proverbial "It is impossible to square the circle."
Computational geometry, on the other hand, starts out with problems of construction so simple that, until the 1970s, they were dismissed as trivial: "Given n line segments in the plane, are they free of intersections? If not, compute (construct) all intersections." This problem is only trivial with respect to the existence of a constructive solution. As we will soon see, the question is far from trivial if interpreted as: How efficiently can we obtain the answer?
Computational geometry has some appealing features that make it ideal for learning about algorithms and data structures: (a) The problem statements are easily understood, intuitively meaningful, and mathematically rigorous; right away the student can try his own hand at solving them, without having to worry about hidden subtleties or a lot of required background knowledge. (b) Problem statement, solution, and every step of the construction have natural visual representations that support abstract thinking and help in detecting errors of reasoning. (c) These algorithms are practical; it is easy to come up with examples where they can be applied.
Appealing as geometric computation is, writing geometric programs is a demanding task. Two traps lie hiding behind the obvious combinatorial intricacies that must be mastered, and they are particularly dangerous when they occur together: (a) degenerate configurations, and (b) the pitfalls of numerical computation due to discretization and rounding errors. Degenerate configurations, such as those we discussed in "Straight lines and circles" on intersecting line segments, are special cases that often require special code. It is not always easy to envision all the kinds of degeneracies that may occur in a given problem. A configuration may be degenerate for a specific algorithm, whereas it may be nondegenerate for a different algorithm solving the same problem. Rounding errors tend to cause more obviously disastrous consequences in geometric computation than, say, in linear algebra or differential equations. Whereas the traditional analysis of rounding errors focuses on bounding their cumulative value, geometry is concerned primarily with a stringent all-or-nothing question: Have errors impaired the topological consistency of the data? (Remember the pathology of the braided straight lines.)
In this Part VI we aim to introduce the reader to some of the central ideas and techniques of computational geometry. For simplicity's sake we limit coverage to two-dimensional Euclidean geometry; most problems become a lot more complicated when we go from two- to three-dimensional configurations. We focus on a type of algorithm that is remarkably well suited for solving two-dimensional problems efficiently: sweep algorithms. To illustrate their generality and effectiveness, we use plane-sweep to solve several rather distinct problems. We will see that sweep algorithms for different problems can be assembled from the same building blocks: a skeleton sweep program that sweeps a line across the plane based on a queue of events to be processed, and transition procedures that update the data structures (a dictionary or table, and perhaps other structures) at each event and maintain a geometric invariant. Sweeps show convincingly how the dynamic data structures of Part V are essential for efficiency.
The problems and algorithms we discuss deal with very simple objects: points and line segments. Applications of geometric computation such as CAD, on the other hand, typically deal with very complex objects made up of thousands of polygons. The simplicity of these algorithms does not detract from their utility. Complex objects get processed by being broken into their primitive parts, such as points, line segments, and triangles. The algorithms we present are some of the most basic subroutines of geometric computation, which play a role analogous to that of a square root routine for numerical computation: As they are called untold times, they must be correct and efficient.
Convex hull: a multitude of algorithms
The problem of computing the convex hull H(S) of a set S consisting of n points in the plane serves as an example to demonstrate how the techniques of computational geometry yield the concise and elegant solution that we presented in "Algorithm animation". The convex hull of a set S of points in the plane is the smallest convex polygon that contains the points of S in its interior or on its boundary. Imagine a nail sticking out above each point and a tight rubber band surrounding the set of nails.
Many different algorithms solve this simple problem. Before we present in detail the algorithm that forms the basis of the program 'ConvexHull' of chapter 3, we briefly illustrate the main ideas behind three others. Most convex hull algorithms have an initialization step that uses the fact that we can easily identify two points of S that lie on the convex hull H(S): for example, two points Pmin and Pmax with minimal and maximal x-coordinate, respectively. Algorithms that grow convex hulls over increasing subsets can use the segment PminPmax as a (degenerate) convex hull to start with. Other algorithms use the segment PminPmax to partition S into an upper and a lower subset, and compute the upper and the lower part of the hull H(S) separately.
1. Jarvis's march [Jar 73] starts at a point on H(S), say Pmin, and 'walks around' by computing, at each point P, the next tangent PQ to S, characterized by the property that all points of S lie on the same side of PQ.
Exhibit 24.1: The "gift-wrapping" approach to building the convex hull.
2. Divide-and-conquer comes to mind: Sort the points of S according to their x-coordinate, use the median x-coordinate to partition S into a left half SL and a right half SR, apply this convex hull algorithm recursively to each half, and merge the two solutions H(SL) and H(SR) by computing the two common exterior tangents to H(SL) and H(SR) (Exhibit 24.2). Terminate the recursion when a set has at most three points.
Exhibit 24.2: Divide-and-conquer applies to many problems on spatial data.
3. Quickhull [Byk 78], [Edd 77], [GS 79] uses divide-and-conquer in a different way. We start with two points on the convex hull H(S), say Pmin and Pmax. In general, if we know ≥ 2 points on H(S), say P, Q, R in Exhibit 24.3, these define a convex polygon contained in H(S). (Draw the appropriate picture for just two points Pmin and Pmax on the convex hull.) There can be no points of S in the shaded sectors that extend outward from the vertices of the current polygon, PQR in the example. Any other points of S must lie either in the polygon PQR or in the regions extending outward from the sides.
Exhibit 24.3: Three points known to lie on the convex hull identify regions devoid of points.
For each side, such as PQ in Exhibit 24.4, let T be a point farthest from PQ among all those in the region extending outward from PQ, if there are any. T must lie on the convex hull, as is easily seen by considering the parallel to PQ that passes through T. Having processed the side PQ, we extend the convex polygon to include T, and we now must process 2 additional sides, PT and TQ. The reader will observe a formal analogy between quicksort ("Sorting and its complexity") and quickhull, which has given the latter its name.
Exhibit 24.4: The point T farthest from PQ identifies a new region of exclusion (shaded).
4. In an incremental scan or sweep we sort the points of S according to their x-coordinates, and use the segment PminPmax to partition S into an upper subset and a lower subset, as shown in Exhibit 24.5. For simplicity of presentation, we reduce the problem of computing H(S) to the two separate problems of computing the upper hull U(S) [i.e. the upper part of H(S)], shown in bold, and the lower hull L(S), drawn as a thin line. Our notation and pictures are chosen to describe U(S).
Exhibit 24.5: Separate computations for the upper hull and the lower hull.
Let P1, …, Pn be the points of S sorted by x-coordinate, and let Ui = U(P1, …, Pi) be the upper hull of the first i points. U1 = P1 may serve as an initialization. For i = 2 to n we compute Ui from Ui-1, as Exhibit 24.6 shows. Starting with the tentative tangent PiPi-1 shown as a thin dashed line, we retrace the upper hull Ui-1 until we reach the actual tangent: in our example, the bold dashed line PiP2. The tangent is characterized by the fact that for j = 1, …, i-1, it minimizes the angle Ai,j between PiPj and the vertical.
Exhibit 24.6: Extending the partial upper hull U(P1, …, Pi-1) to the next point Pi.
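The incremental computation of the upper hull can be sketched in Python as follows. This is an illustration under stated assumptions, not the book's program: the names are hypothetical, a list used as a stack stands in for the back pointers, and an orientation test (sign of a cross product) replaces the comparison of angles.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o, a, b; positive if o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Upper hull of a point set, listed from left to right."""
    hull = []
    for p in sorted(set(points)):  # process points by increasing x-coordinate
        # Retrace the hull toward the left while its last vertex lies on or
        # below the segment from its predecessor to the new point p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)  # p is the rightmost point so far, hence on the hull
    return hull


sample = [(0, 0), (1, 2), (2, 1), (3, 3), (4, 0)]
hull = upper_hull(sample)
```

After sorting, each point is appended once and removed at most once, so the scan itself takes linear time.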
The program 'ConvexHull' presented in "Algorithm animation" as an example for algorithm animation is written as an on-line algorithm: Rather than reading all the data before starting the computation, it accepts one point at a time, which must lie to the right of all previous ones, and immediately extends the hull Ui-1 to obtain Ui. Thanks to the input restriction that the points are entered in sorted order, 'ConvexHull' becomes simpler and runs in linear time. This explains the two-line main body:
  PointZero; { sets first point and initializes all necessary variables }
  while NextRight do ComputeTangent;
There remain a few programming details that are best explained by relating Exhibit 24.6 to the declarations:
  var x, y, dx, dy: array[0 .. nmax] of integer;
      b: array[0 .. nmax] of integer; { backpointer }
      n: integer; { number of points entered so far }
      px, py: integer; { new point }
The coordinates of the points Pi are stored in the arrays x and y. Rather than storing angles such as Ai,j, we store quantities proportional to cos(Ai,j) and sin(Ai,j) in the arrays dx and dy. The array b holds back pointers for retracing the upper hull back toward the left: b[i] = j implies that Pj is the predecessor of Pi in Ui. This explains the key procedure of the program:
  procedure ComputeTangent; { from Pn = (px, py) to Un-1 }
  var i: integer;
  begin
    i := b[n];
    while dy[n] * dx[i] > dy[i] * dx[n] do begin { dy[n]/dx[n] > dy[i]/dx[i] }
      i := b[i];
      dx[n] := x[n] - x[i]; dy[n] := y[n] - y[i];
      MoveTo(px, py); Line(-dx[n], -dy[n]);
      b[n] := i
    end;
    MoveTo(px, py); PenSize(2, 2); Line(-dx[n], -dy[n]); PenNormal
  end; { ComputeTangent }
The algorithm implemented by 'ConvexHull' is based on Graham's scan [Gra 72], where the points are ordered according to the angle as seen from a fixed internal point, and on [And 79].
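For comparison with the incremental scan, the quickhull idea of item 3 above can be sketched in Python. The names are hypothetical, and the sketch makes two simplifying choices of its own: it recurses on plain lists, and it excludes points that lie exactly on a hull edge.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o, a, b; positive if b lies left of o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_side(p, q, candidates):
    """Hull vertices strictly left of the directed side p -> q, ordered from p to q."""
    outside = [t for t in candidates if cross(p, q, t) > 0]
    if not outside:
        return []
    # The cross product is proportional to the distance from the line pq,
    # so its maximum identifies a point T farthest from the side.
    t = max(outside, key=lambda s: cross(p, q, s))
    return _hull_side(p, t, outside) + [t] + _hull_side(t, q, outside)


def quickhull(points):
    """Convex hull vertices in clockwise order, starting at the leftmost point."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    p_min, p_max = pts[0], pts[-1]
    upper = _hull_side(p_min, p_max, pts)  # region above the segment PminPmax
    lower = _hull_side(p_max, p_min, pts)  # region below it
    return [p_min] + upper + [p_max] + lower
```

As with quicksort, the running time depends on how evenly each farthest point splits the remaining candidates.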
The uses of convexity: basic operations on polygons
The convex hull of a set of points or objects (i.e. the smallest convex set that contains all objects) is a model problem in geometric computation, with many algorithms and applications. Why? As we stated in the introductory section, applications of geometric computation tend to deal with complex objects that often consist of thousands of primitive parts, such as points, line segments, and triangles. It is often effective to approximate a complex configuration by a simpler one, in particular, to package it in a container of simple shape. Many proximity queries can be answered by processing the container only. One of the most frequent queries in computer graphics, for example, asks what object, if any, is first struck by a given ray. If we find that the ray misses a container, we infer that it misses all objects in it without looking at them; only if the ray hits the container do we start the costly analysis of all the objects in it.
The convex hull is often a very effective container. Although not as simple as a rectangular box, say, convexity is such a strong geometric property that many algorithms that take time O(n) on an arbitrary polygon of n vertices require only time O(log n) on convex polygons. Let us list several such examples. We assume that a polygon G is given as a (cyclic) sequence of n vertices and/or n edges that trace a closed path in the plane. Polygons may be self-intersecting, whereas simple polygons may not. A simple polygon partitions the plane into two regions: the interior, which is simply connected, and the exterior, which has a hole.
Point-in-polygon test
Given a simple polygon G and a query point P (not on G), determine whether P lies inside or outside the polygon.
Two closely related algorithms that walk around the polygon solve this problem in time O(n). The first one computes the winding number of G around P. Imagine an observer at P looking at a vertex, say V, where the walk starts, and turning on her heels to keep watching the walker (Exhibit 24.7). The observer will make a first (positive) turn α, followed by a (negative) turn β, followed by …, until the walker returns to the starting vertex V. The sum α + β + … of all turning angles during one complete tour of G is 2π if P is inside G, and 0 if P is outside G.
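The winding-number test can be sketched in Python as a sum of signed angles. The function names are hypothetical. The sketch assumes that P does not lie on the polygon, and it uses the fact that a clockwise tour yields -2π instead of 2π, so only the magnitude of the sum is tested.

```python
import math


def winding_angle(polygon, p):
    """Sum of the signed turning angles seen from p during one tour of the polygon."""
    total = 0.0
    n = len(polygon)
    for k in range(n):
        # Vectors from the observer p to the current vertex and to the next one.
        ax, ay = polygon[k][0] - p[0], polygon[k][1] - p[1]
        bx, by = polygon[(k + 1) % n][0] - p[0], polygon[(k + 1) % n][1] - p[1]
        # atan2(cross, dot) is the signed angle from (ax, ay) to (bx, by).
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    return total


def inside_by_winding(polygon, p):
    """True if p is inside a simple polygon; pi separates the sums 0 and +-2*pi."""
    return abs(winding_angle(polygon, p)) > math.pi


square = [(0, 0), (4, 0), (4, 4), (0, 4)]  # counterclockwise tour
```

Comparing against π rather than testing for exactly 2π keeps the decision robust against floating-point rounding in the angle sum.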
Exhibit 24.7: Point-in-polygon test by adding up all turning angles.
The second algorithm computes the crossing number of G with respect to P. Draw a semi-infinite ray R from P in any direction (Exhibit 24.8). During the walk around the polygon G from an arbitrary starting vertex V back to V, keep track of whether the current oriented edge intersects R, and if so, whether the edge crosses R from below (+1) or from above (-1). The sum of all these numbers is +1 if P is inside G, and 0 if P is outside G.
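A minimal Python sketch of the crossing-number test follows. It fixes the ray R to point in the positive x-direction, which is one admissible choice, and uses hypothetical names. To treat vertices that lie exactly on the ray consistently, an edge counts as crossing only if one endpoint is at or below the ray and the other strictly above; this tie-breaking rule is an addition of the sketch, not part of the text.

```python
def signed_crossings(polygon, p):
    """Sum of +1 / -1 over the edges that cross the horizontal ray from p to the right."""
    px, py = p
    total = 0
    n = len(polygon)
    for k in range(n):
        (x0, y0), (x1, y1) = polygon[k], polygon[(k + 1) % n]
        if (y0 <= py) != (y1 <= py):  # edge straddles the line y = py
            # x-coordinate where the edge meets the line y = py.
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if x_cross > px:  # the crossing lies on the ray, right of p
                total += 1 if y1 > y0 else -1  # from below: +1, from above: -1
    return total


def inside_by_crossings(polygon, p):
    """True if p is inside a simple polygon (sum is +1 or -1 depending on orientation)."""
    return signed_crossings(polygon, p) != 0


square = [(0, 0), (4, 0), (4, 4), (0, 4)]  # counterclockwise tour
```

For a counterclockwise tour the sum is +1 for an interior point; a clockwise tour gives -1, which is why the wrapper only tests for a nonzero sum.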
Exhibit 24.8: Point-in-polygon test by adding up crossing numbers.
2oint#in#conve)#polygon test
/or a conve) polygon = 5e use binary search to per%orm a point#in#polygon test in time CHlog nI. Consider the
hierarchical decomposition o% = illustrated by the conve) 12#gon sho5n in ")hibit 2<.9. :e choose three
Happro)imatelyI e0uidistant vertices as the vertices o% an innermost core triangle+ painted black. E"0uidistantE here
re%ers not to any "uclidean distance+ but rather to the number o% vertices to be traversed by traveling along the
perimeter o% =. /or a 0uery point 2 5e %irst ask+ in time CH1I+ 5hich o% the seven regions de%ined by the e)tended
edges o% this triangular core contains 2. These seven regions sho5n in ")hibit 2<.10 are all EtrianglesE Halbeit si) o%
them e)tend to in%inityI+ in the sense that each one is de%ined as the intersection o% three hal%#spaces. /our o% these
regions provide a de%inite ans5er to the 0uery E$s 2 inside =+ or outside =?E Cne region Hsho5n hatched in ")hibit
2<.10I provides the ans5er ;$n;+ three the ans5er ;Cut;. The remaining three regions+ labeled ;*ncertain;+ lead
recursively to a ne5 point#in#conve)#polygon test+ %or the same 0uery point 2+ but a ne5 conve) polygon =; 5hich is
Algorithms and Data Structures 27D A ,lobal Te)t
24. $ample problems and algorithms
the intersection of Q with one of the uncertain regions. As Q' has only about n / 3 vertices, the depth of recursion is
O(log n). Actually, after the first comparison against the innermost triangular core of Q, we have no longer a general
point-in-convex-polygon problem, but one with additional information that makes all but the first test steps of a
binary search.
Exhibit 24.9: Hierarchical approximation of a convex 12-gon as a 3-level tree of triangles. The root is in
black, its children are in dark grey, grandchildren in light grey.
Exhibit 24.10: The plane partitioned into four regions of certainty and three of uncertainty.
The latter are processed recursively.
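The binary search behind the O(log n) bound can be sketched in Python. This sketch does not build the hierarchy of triangles described in the text; as a simplifying assumption it uses the closely related fan of triangles around vertex 0 and searches for the wedge that contains the query point. The names cross and inside_convex are illustrative, and the polygon is assumed to be strictly convex with its vertices listed in counterclockwise order.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Twice the signed area of triangle (o, a, b); > 0 if b is left of o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def inside_convex(poly: List[Point], p: Point) -> bool:
    """True if p lies inside or on the boundary of the convex polygon poly.

    Assumes len(poly) >= 3, strict convexity, counterclockwise vertex order.
    """
    n = len(poly)
    o = poly[0]
    # p must lie in the wedge at vertex 0 spanned by its two incident edges.
    if cross(o, poly[1], p) < 0 or cross(o, poly[n - 1], p) > 0:
        return False
    # Binary search for the fan triangle (o, poly[lo], poly[lo + 1]) whose
    # wedge contains p: O(log n) orientation tests.
    lo, hi = 1, n - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cross(o, poly[mid], p) >= 0:
            lo = mid
        else:
            hi = mid
    # One final test against the polygon edge that closes the wedge.
    return cross(poly[lo], poly[hi], p) >= 0
```

Each iteration halves the range of candidate vertices, which mirrors the remark in the text that all but the first test are steps of a binary search.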
Visibility in the plane: a simple algorithm whose analysis is not
Many computer graphics programs are dominated by visibility problems: Given a configuration of objects in
three-dimensional space, and given a point of view, what is visible? Dozens of algorithms for hidden-line or
hidden-surface elimination have been developed to solve this everyday problem that our visual system performs "at a
glance". In contrast to the problems discussed above, visibility is surprisingly complex. We give a hint of this
complexity by describing some of the details buried below the smooth surface of a "simple" version: computing the
visibility of line segments in the plane.
Problem: Given n line segments in the plane, compute the sequence of (sub)segments seen by an observer at
infinity (say, at y = -∞).
The complexity of this problem was unexpected until discovered in 1986 [WS 88]. Fortunately, this complexity
is revealed not by requiring complicated algorithms, but in the analysis of the inherent complexity of the geometric
problem. The example shown in Exhibit 24.11 illustrates the input data. The endpoints (P1, P10), (P2, P8), (P5, P12) of
the three line segments labeled 1, 2, 3 are given; other points are computed by the algorithm. The required result is
a list of visible segments, each segment described by its endpoints and by the identifier of the line of which it is a
part:
(P1, P3, 1), (P3, P4, 2), (P5, P6, 3), (P6, P8, 2), (P7, P9, 3), (P9, P10, 1), (P11, P12, 3)
Exhibit 24.11: Example: Three line segments seen from below generate seven visible subsegments.
In search of algorithms, the reader is encouraged to work out the details of the first idea that might come to
mind: For each of the n² ordered pairs (Li, Lj) of line segments, remove from Li the subsegment occluded by Lj.
Because Li can get cut into as many as n pieces, it must be managed as a sequence of subsegments. Finding the
endpoints of Lj in this sequence will take time O(log n), leading to an overall algorithm of time complexity
O(n² · log n).
After the reader has mastered the sweep algorithm for line intersection presented in "Plane-sweep: a
general-purpose algorithm for two-dimensional problems illustrated using line segment intersection", he will see that
its straightforward application to the line visibility problem requires time O((n + k) · log n), where k ∈ O(n²) is the
number of intersections. Thus plane-sweep appears to do all the work the brute-force algorithm above does,
organized in a systematic left-to-right fashion. It keeps track of all intersections, most of which may be invisible. It
has the potential to work in time O(n · log n) for many realistic data configurations characterized by k ∈ O(n), but
not in the worst case.
Divide-and-conquer yields a simple two-dimensional visibility algorithm with a better worst-case performance.
If n = 0 or 1, the problem is trivial. If n > 1, partition the set of n line segments into two (approximate) halves, solve
both subproblems, and merge the results. There is no constraint on how the set is halved, so the divide step is easy.
The conquer step is taken care of by recursion. Merging amounts to computing the minimum of two piecewise (not
necessarily continuous) linear functions, in time linear in the number of pieces. The example with n = 4 shown in
Exhibit 24.12 illustrates the algorithm. f12 is the visible front of segments 1 and 2, f34 of segments 3 and 4, min(f12,
f34) of all four segments (Exhibit 24.13).
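The divide, conquer, and merge steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the book's program: segments are given as ((x1, y1), (x2, y2)) with x1 < x2 (no vertical segments), a visible front is a left-to-right list of pieces (x_from, x_to, segment index), and the names y_at, emit, merge_fronts, and visible_front are hypothetical. The observer is at y = -∞, so the lower of two segments wins.

```python
from heapq import merge


def y_at(seg, x):
    """y-coordinate of the (non-vertical) segment seg at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def emit(front, a, b, sid):
    """Append piece (a, b, sid), fusing it with an adjacent piece of the same segment."""
    if front and front[-1][2] == sid and front[-1][1] == a:
        front[-1] = (front[-1][0], b, sid)
    else:
        front.append((a, b, sid))


def merge_fronts(segs, f, g):
    """Pointwise minimum of two fronts, in time linear in the number of pieces."""
    xs = []  # all breakpoints of f and g, merged and without duplicates
    for x in merge([x for a, b, _ in f for x in (a, b)],
                   [x for a, b, _ in g for x in (a, b)]):
        if not xs or xs[-1] != x:
            xs.append(x)
    out, i, j = [], 0, 0
    for a, b in zip(xs, xs[1:]):
        while i < len(f) and f[i][1] <= a:
            i += 1
        while j < len(g) and g[j][1] <= a:
            j += 1
        s = f[i][2] if i < len(f) and f[i][0] <= a else None
        t = g[j][2] if j < len(g) and g[j][0] <= a else None
        if s is None and t is None:
            continue  # a gap: nothing is visible here
        if s is None or t is None:
            emit(out, a, b, t if s is None else s)
            continue
        da = y_at(segs[s], a) - y_at(segs[t], a)
        db = y_at(segs[s], b) - y_at(segs[t], b)
        if da <= 0 and db <= 0:
            emit(out, a, b, s)
        elif da >= 0 and db >= 0:
            emit(out, a, b, t)
        else:  # the two segments cross inside (a, b)
            x = a + (b - a) * da / (da - db)
            first, second = (s, t) if da < 0 else (t, s)
            emit(out, a, x, first)
            emit(out, x, b, second)
    return out


def visible_front(segs, ids=None):
    """Visible front of the segments with indices ids (default: all of them)."""
    if ids is None:
        ids = list(range(len(segs)))
    if not ids:
        return []
    if len(ids) == 1:
        (x1, _), (x2, _) = segs[ids[0]]
        return [(x1, x2, ids[0])]
    half = len(ids) // 2
    return merge_fronts(segs, visible_front(segs, ids[:half]),
                        visible_front(segs, ids[half:]))
```

Because both fronts are already sorted from left to right, their breakpoints can be interleaved without sorting, which is what keeps the merge step linear. Floating-point coordinates are used as-is, so nearly coincident segments are not handled robustly.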
Exhibit 24.12: The four line segments will be partitioned into subsets {1, 2} and {3, 4}.
Exhibit 24.13: The min operation merges the solutions of this divide-and-conquer algorithm.
The time complexity of this divide-and-conquer algorithm is obtained as follows. Given that at each level of
recursion the relevant sets of line segments can be partitioned into (approximate) halves, the depth of recursion is
O(log n). A merge step that processes v visible subsegments takes linear time O(v). Together, all the merge steps at
a given depth process at most V subsegments, where V is the total number of visible subsegments. Thus the total
time is bounded by O(V · log n). How large can V be?
Surprising theoretical results
Let V(n) be the number of visible subsegments in a given configuration of n lines, i.e. the size of the output of the
visibility computation. For tiny n, the worst cases [V(2) = 4, V(3) = 8] are shown in Exhibit 24.14. An attempt to
find worst-case configurations for general n leads to examples such as that shown in Figure 24.15, with
V(n) = 5 · n - 8.
Exhibit 24.14: Configurations with the largest number of visible subsegments.
Figure 24.15: A family of configurations with 5 · n - 8 visible subsegments.
You will find it difficult to come up with a class of configurations for which V(n) grows faster. It is tempting to
conjecture that V(n) ∈ O(n), but this conjecture is very hard to prove, for the good reason that it is false, as was
discovered in [WS 88]. It turns out that V(n) ∈ Θ(n · α(n)), where α(n), the inverse of Ackermann's function (see
"Computability and complexity", Exercise 2), is a monotonically increasing function that grows so slowly that for
practical purposes it can be treated as a constant, call it α.
Let us present some of the steps of how this surprising result was arrived at. Occasionally, simple geometric
problems can be tied to deep results in other branches of mathematics. We transform the two-dimensional visibility
problem into a combinatorial string problem. By numbering the given line segments, walking along the x-axis from
left to right, and writing down the number of the line segment that is currently visible, we obtain a sequence of
numbers (Exhibit 24.16).
Exhibit 24.16: The Davenport-Schinzel sequence associated with a configuration of segments.
A geometric configuration gives rise to a sequence u_1, u_2, ... , u_m with the following properties:
1. 1 ≤ u_i ≤ n for 1 ≤ i ≤ m (numbers identify line segments).
2. u_i ≠ u_(i+1) for 1 ≤ i ≤ m - 1 (no two consecutive numbers are equal).
3. There are no five indices 1 ≤ a < b < c < d < e ≤ m such that u_a = u_c = u_e = r and u_b = u_d = s, r ≠ s. This
condition captures the geometric properties of two intersecting straight lines: If we ever see r, s, r, s
(possibly separated), we will never see r again, as this would imply that r and s intersect more than once
(Exhibit 24.17).
Exhibit 24.17: The subsequence r, s, r, s excludes further occurrences of r.
Example
The sequence for the example above that shows m ≥ 5 · n - 8 is
1, 2, 1, 3, 1, ... , 1, n-1, 1, n-1, n-2, n-3, ... , 3, 2, n, 2, n, 3, n, ... , n, n-2, n, n-1, n.
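Properties 1 to 3 and the example sequence can be checked mechanically. The following Python sketch is illustrative only; the function names is_davenport_schinzel and example_sequence are assumptions, and the checker favors clarity over speed.

```python
from typing import List


def is_davenport_schinzel(seq: List[int], n: int) -> bool:
    """Check properties 1 to 3 for a sequence over the numbers 1..n."""
    # Property 1: every number identifies one of the n segments.
    if any(not 1 <= u <= n for u in seq):
        return False
    # Property 2: no two consecutive numbers are equal.
    if any(a == b for a, b in zip(seq, seq[1:])):
        return False
    # Property 3: no alternating subsequence r, s, r, s, r with r != s.
    symbols = set(seq)
    for r in symbols:
        for s in symbols:
            if r == s:
                continue
            want, length = r, 0
            for u in seq:  # greedy scan finds the longest alternation r, s, r, ...
                if u == want:
                    length += 1
                    want = s if want == r else r
            if length >= 5:
                return False
    return True


def example_sequence(n: int) -> List[int]:
    """The sequence 1, 2, 1, 3, 1, ..., n-1, n of length 5 * n - 8 (n >= 2)."""
    first = [1]
    for k in range(2, n):
        first += [k, 1]               # 1, 2, 1, 3, 1, ..., n-1, 1
    second = list(range(n - 1, 1, -1))  # n-1, n-2, ..., 3, 2
    third = [n]
    for k in range(2, n):
        third += [k, n]               # n, 2, n, 3, n, ..., n-1, n
    return first + second + third
```

The three parts have lengths 2 · n - 3, n - 2, and 2 · n - 3, which add up to 5 · n - 8.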
Sequences with the properties 1 to 3, called Davenport-Schinzel sequences, have been studied in the context of
linear differential equations. The maximal length of a Davenport-Schinzel sequence is k · n · α(n), where k is a
constant and α(n) is the inverse of Ackermann's function (see "Computability and complexity", Exercise 2) [HS 86].
With increasing n, α(n) approaches infinity, albeit very slowly. This dampens the hope for a linear upper bound for
the visibility problem, but does not yet disprove the conjecture. For the latter, we need an inverse: For any given
Davenport-Schinzel sequence there exists a corresponding geometric configuration which yields this sequence. An
explicit construction is given in [WS 88]. This establishes an isomorphism between the two-dimensional visibility
problem and the Davenport-Schinzel sequences, and shows that the size of the output of the two-dimensional
visibility problem can be superlinear, a result that challenges our geometric intuition.
Exercises
1. Given a set of points S, prove that the pair of points farthest from each other must be vertices of the convex
hull H(S).
2. Assume a model of computation in which the operations addition, multiplication, and comparison are
available at unit cost. Prove that in such a model Ω(n · log n) is a lower bound for computing, in order, the
vertices of the convex hull H(S) of a set S of n points. Hint: Show that every algorithm which computes the
convex hull of n given points can be used to sort n numbers.
3. Complete the second algorithm for the point-in-polygon test in chapter 24 in the section "The uses of
convexity: basic operations on polygons" which computes the crossing number of the polygon G around
point P by addressing the special cases that arise when the semi-infinite ray R emanating from P intersects
a vertex of G or overlaps an edge of G.
4. Consider an arbitrary (not necessarily simple) polygon G (Exhibit 24.18). Provide an interpretation for the
winding number w(G, P) of G around an arbitrary point P not on G, and prove that w(G, P) / 2 · π of P is
always equal to the crossing number of P with respect to any ray R emanating from P.
Exhibit 24.18: Winding number and crossing number of a polygon G with respect to P.
5. Design an algorithm that computes the area of an n-vertex simple, but not necessarily convex polygon in
Θ(n) time.
6. We consider the problem of computing the intersection of two convex polygons which are given by their
lists of vertices in cyclic order.
(a) Show that the intersection is again a convex polygon.
(b) Design an algorithm that computes the intersection. What is the time complexity of your algorithm?
7. Intersection test for line L and (convex) polygon Q. If an (infinitely extended) line L intersects a polygon Q,
it must intersect one of Q's edges. Thus a test for intersection of a given line L with a polygon can be
reduced to repeated tests of L for intersection with [some of] Q's edges.
(a) Prove that, in general, a test for line-polygon intersection must check at least n - 2 of Q's edges. Hint:
Use an adversary argument. If two edges remain unchecked, they could be moved so as to invalidate
the answer.
(b) Design a test that works in time O(log n) for deciding whether a line L intersects a convex polygon Q.
8. Divide-and-conquer algorithms may divide the space in which the data is embedded, rather than the set of
data (the set of lines). Describe an algorithm for computing the sequence of visible segments that partitions
the space recursively into vertical stripes, until each stripe is "simple enough"; describe how you choose the
boundaries of the stripes; state advantages and disadvantages of this algorithm as compared to the one
described in chapter 24 in the section "Visibility in the plane: a simple algorithm whose analysis is not".
Analyze the asymptotic time complexity of this algorithm.
25. Plane-sweep: a general-purpose algorithm for two-dimensional problems illustrated using line segment
intersection
Learning objectives:
line segment intersection test
turning space dimensions into time dimensions
updating a y-table and detecting intersections
sweeping across an intersection
Plane-sweep is an algorithm schema for two-dimensional geometry of great generality and effectiveness, and
algorithm designers are well advised to try it first. It works for a surprisingly large set of problems, and when it
works, tends to be very efficient. Plane-sweep is easiest to understand under the assumption of nondegenerate
configurations. After explaining plane-sweep under this assumption, we remark on how degenerate cases can be
handled with plane-sweep.
The line segment intersection test
We present a plane-sweep algorithm [SH 76] for the line segment intersection test:
Given n line segments in the plane, determine whether any two intersect;
and if so, compute a witness (i.e. a pair of segments that intersect).
Bounds on the complexity of this problem are easily obtained. The literature on computational geometry (e.g.
[PS 85]) proves a lower bound Ω(n · log n). The obvious brute force approach of testing all n · (n - 1) / 2 pairs of
line segments requires Θ(n²) time. This wide gap between n · log n and n² is a challenge to the algorithm designer,
who strives for an optimal algorithm whose asymptotic running time O(n · log n) matches the lower bound.
Divide-and-conquer is often the first attempt to design an algorithm, and it comes in two variants illustrated in
Exhibit 25.1: (1) Divide the data, in this case the set of line segments, into two subsets of approximately equal size
(i.e. n / 2 line segments), or (2) divide the embedding space, which is easily cut in exact halves.
Exhibit 25.1: Two ways of applying divide-and-conquer to a set of objects embedded in the plane.
In the first case, we hope for a separation into subsets S1 and S2 that permits an efficient test whether any line
segment in S1 intersects some line segment in S2. Exhibit 25.1 shows the ideal case where S1 and S2 do not interact,
but of course this cannot always be achieved in a nontrivial way; and even if S can be separated as the figure
suggests, finding such a separating line looks like a more formidable problem than the original intersection
problem. Thus, in general, we have to test each line segment in S1 against every line segment in S2, a test that may
take Θ(n²) time.
The second approach of dividing the embedding space has the unfortunate consequence of effectively increasing
our data set. Every segment that straddles the dividing line gets "cut" (i.e. processed twice, once for each half
space). The two resulting subproblems will be of size n' and n'', respectively, with n' + n'' > n, in the worst case
n' + n'' = 2 · n. At recursion depth d we may have 2^d · n subsegments to process. No optimal algorithm is known that
uses this technique.
The key idea in designing an optimal algorithm is the observation that those line segments that intersect a
vertical line L at abscissa x are totally ordered: A segment s lies below segment t, written s <_L t, if both intersect L at
the current position x and the intersection of s with L lies below the intersection of t with L. With respect to this
order a line segment may have an upper and a lower neighbor, and Exhibit 25.2 shows that s and t are neighbors at
x.
Exhibit 25.2: The sweep line L totally orders the segments that intersect L.
We describe the intersection test algorithm under the assumption that the configuration is nondegenerate (i.e.
no three segments intersect in the same point). For simplicity's sake we also assume that no segment is vertical, so
every segment has a left endpoint and a right endpoint. The latter assumption entails no loss of generality: For a
vertical segment, we can arbitrarily define the lower endpoint to be the "left endpoint", thus imposing a
lexicographic (x, y)-order to refine the x-order. With the important assumption of non-degeneracy, two line
segments s and t can intersect at x0 only if there exists an abscissa x < x0 where s and t are neighbors. Thus it
suffices to test all segment pairs that become neighbors at some time during a left-to-right sweep of L, a number
that is usually significantly smaller than n · (n - 1) / 2.
As the sweep line L moves from left to right across the configuration, the order <_L among the line segments
intersecting L changes only at endpoints of a segment or at intersections of segments. As we intend to stop the
sweep as soon as we discover an intersection, we need to perform the intersection test only at the left and right
endpoints of segments. A segment t is tested at its left endpoint for intersection with its lower and upper neighbors.
At the right endpoint of t we test its lower and upper neighbor for intersection (Exhibit 25.3).
Exhibit 25.3: Three pairwise intersection tests charged to segment t.
The algorithm terminates as soon as we discover an intersecting pair of segments. Given n segments, each of
which may generate three intersection tests as shown in Exhibit 25.3 (two at its left, one at its right endpoint), we
perform the O(1) pairwise segment intersection test at most 3 · n times. This linear bound on the number of pairs
tested for intersection might raise the hope of finding a linear-time algorithm, but so far we have counted only the
geometric primitive: "Does a pair of segments intersect, yes or no?" Hiding in the background we find bookkeeping
operations such as "Find the upper and lower neighbor of a given segment", and these turn out to be costlier than
the geometric ones. We will find neighbors efficiently by maintaining the order <_L in a data structure called a
y-table during the entire sweep.
The skeleton: Turning a space dimension into a time dimension
The name plane-sweep is derived from the image of sweeping the plane from left to right with a vertical line
(front, or cross section), stopping at every transition point (event) of a geometric configuration to update the cross
section. All processing is done at this moving front, without any backtracking, with a look-ahead of only one point.
The events are stored in the x-queue, and the current cross section is maintained by the y-table. The skeleton of a
plane-sweep algorithm is as follows:
initX; initY;
while not emptyX do { e := nextX; transition(e) }
The procedures 'initX' and 'initY' initialize the x-queue and the y-table. 'nextX' returns the next event in the
x-queue, 'emptyX' tells us whether the x-queue is empty. The procedure 'transition', the advancing mechanism of the
sweep, embodies all the work to be done when a new event is encountered; it moves the front from the slice to the
left of an event e to the slice immediately to the right of e.
Data structures
For the line segment intersection test, the x-queue stores the left and right endpoints of the given line segments,
ordered by their x-coordinate, as events to be processed when updating the vertical cross section. Each endpoint
stores a reference to the corresponding line segment. We compare points by their x-coordinates when building the
x-queue. For simplicity of presentation we assume that no two endpoints of line segments have equal x- or
y-coordinates. The only operation to be performed on the x-queue is 'nextX': it returns the next event (i.e. the next left
or right endpoint of a line segment to be processed). The cost for initializing the x-queue is O(n · log n), the cost for
performing the 'nextX' operation is O(1).
The y-table contains those line segments that are currently intersected by the sweep line, ordered according to
<_L. In the slice between two events, this order does not change, and the y-table needs no updating (Exhibit 25.4).
The y-table is a dictionary that supports the operations 'insertY', 'deleteY', 'succY', and 'predY'. When entering the
left endpoint of a line segment s we find the place where s is to be inserted in the ordering of the y-table by
comparing s to other line segments t already stored in the y-table. We can determine whether s <_L t or t <_L s by
determining on which side of t the left endpoint of s lies. As we have seen in chapter 14 in the section "Intersection",
this tends to be more efficient than computing and comparing the intersection points of s and t with the sweep line.
If we implement the dictionary as a balanced tree (e.g. an AVL tree), the operations 'insertY' and 'deleteY' are
performed in O(log n) time, and 'succY' and 'predY' are performed in O(1) time if additional pointers in each node
of the tree point to the successor and predecessor of the line segment stored in this node. Since there are 2 · n
events in the x-queue and at most n line segments in the y-table the space complexity of this plane-sweep algorithm
is O(n).
Exhibit 25.4: The y-table records the varying state of the sweep line L.
Updating the y-table and detecting an intersection
The procedure 'transition' maintains the order <_L of the line segments intersecting the sweep line and performs
intersection tests. At a left endpoint of a segment t, t is inserted into the y-table and tested for intersection with its
lower and upper neighbors. At the right endpoint of t, t is deleted from the y-table and its two former neighbors are
tested. The algorithm terminates when an intersection has been found or all events in the x-queue have been
processed without finding an intersection:
procedure transition(e: event);
begin
  s := segment(e);
  if leftPoint(e) then begin
    insertY(s);
    if intersect(predY(s), s) or intersect(s, succY(s)) then
      terminate('intersection found')
  end
  else { e is right endpoint of s } begin
    if intersect(predY(s), succY(s)) then
      terminate('intersection found');
    deleteY(s)
  end
end;
With at most 2 · n events, and a call of 'transition' costing time O(log n), this plane-sweep algorithm needs
O(n · log n) time to perform the line segment intersection test.
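For experimentation, the intersection test can be sketched in Python as follows. This is a sketch under the nondegeneracy assumptions stated in this chapter (no vertical segments, no two endpoints with equal x-coordinates, only proper crossings), not the book's program. As a simplification, the y-table is a plain Python list rather than a balanced tree, so an insertion or deletion costs O(n) instead of O(log n). The names orient, intersect, and find_intersection are illustrative.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def orient(p: Point, q: Point, r: Point) -> float:
    """> 0 if r lies to the left of the directed line p -> q, < 0 if to the right."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def intersect(s: Segment, t: Segment) -> bool:
    """Proper crossing test; touching and collinear cases are excluded by assumption."""
    (a, b), (c, d) = s, t
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)


def find_intersection(segments: List[Segment]) -> Optional[Tuple[int, int]]:
    """Return the indices of a witness pair of intersecting segments, or None."""
    segs = [tuple(sorted(s)) for s in segments]   # left endpoint first
    events = []                                    # the x-queue
    for i, (left, right) in enumerate(segs):
        events.append((left[0], 0, i))             # 0 marks a left endpoint
        events.append((right[0], 1, i))            # 1 marks a right endpoint
    events.sort()
    table: List[int] = []                          # the y-table, bottom to top
    for _x, is_right, i in events:
        if not is_right:
            p = segs[i][0]
            lo, hi = 0, len(table)
            while lo < hi:                         # binary search using the side test
                mid = (lo + hi) // 2
                if orient(*segs[table[mid]], p) > 0:   # p lies above table[mid]
                    lo = mid + 1
                else:
                    hi = mid
            table.insert(lo, i)
            for k in (lo - 1, lo + 1):             # lower and upper neighbor
                if 0 <= k < len(table) and intersect(segs[table[k]], segs[i]):
                    return tuple(sorted((table[k], i)))
        else:
            k = table.index(i)
            if 0 < k < len(table) - 1 and intersect(segs[table[k - 1]],
                                                    segs[table[k + 1]]):
                return tuple(sorted((table[k - 1], table[k + 1])))
            del table[k]
    return None
```

The position of a new segment is found with the side test described in the section on data structures, so no intersection point with the sweep line is ever computed. Replacing the list by a balanced tree with successor and predecessor pointers would restore the O(n · log n) bound.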
Sweeping across intersections
The plane-sweep algorithm for the line segment intersection test is easily adapted to the following more general
problem [BO 79]:
Given n line segments, report all intersections.
In addition to the left and right endpoints, the x-queue now stores intersection points as events: any
intersection detected is inserted into the x-queue as an event to be processed. When the sweep line reaches an
intersection event the two participating line segments are swapped in the y-table (Exhibit 25.5). The major increase
in complexity as compared to the segment intersection test is that now we must process not only 2 · n events, but
2 · n + k events, where k is the number of intersections discovered as we sweep the plane. A configuration with n / 2
segments vertical and n / 2 horizontal shows that, in the worst case, k ∈ Θ(n²), which leads to an O(n² · log n)
algorithm, certainly no improvement over the brute-force comparison of all pairs. In most realistic configurations,
say engineering drawings, the number of intersections is much less than O(n²), and thus it is informative to
introduce the parameter k in order to get an output-sensitive bound on the complexity of this algorithm (i.e. a
bound that adapts to the amount of data needed to report the result of the computation).
Exhibit 25.5: Sweeping across an intersection.
Other changes are comparatively minor. The x-queue must be a priority queue that supports the operation
'insertX'; it can be implemented as a heap. The cost for initializing the x-queue remains O(n · log n). Without
further analysis one might presume that the storage requirement of the x-queue is O(n + k), which implies that the
cost for calling 'insertX' and 'nextX' remains O(log n), since k ∈ O(n²). A more detailed analysis [PS 91], however,
shows that the size of the x-queue never exceeds O(n · (log n)²). With a slight modification of the algorithm [Bro 81]
it can even be guaranteed that the size of the x-queue never exceeds O(n). The cost for exchanging two intersecting
line segments in the y-table is O(log n), the costs for the other operations on the y-table remain the same. Since
there are 2 · n left and right endpoints and k intersection events, the total cost for this algorithm is
O((n + k) · log n). As most realistic applications are characterized by k ∈ O(n), reporting all intersections often
remains an O(n · log n) algorithm in practice. A time-optimal algorithm that finds all intersecting pairs of line
segments in O(n · log n + k) time using O(n + k) storage space is described in [CE 92].
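The worst case k ∈ Θ(n²) mentioned in this section is easy to reproduce. The following Python sketch builds a grid of n / 2 horizontal and n / 2 vertical segments and counts the crossings by brute force; the helper names grid and count_intersections are assumptions made for this illustration, and the brute-force count is used only as a check, not as a substitute for the sweep.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def orient(p: Point, q: Point, r: Point) -> float:
    """> 0 if r lies to the left of the directed line p -> q, < 0 if to the right."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def intersect(s: Segment, t: Segment) -> bool:
    """Proper crossing test for two segments."""
    (a, b), (c, d) = s, t
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)


def grid(n: int) -> List[Segment]:
    """n // 2 horizontal and n // 2 vertical segments that pairwise cross."""
    m = n // 2
    horizontal = [((0.0, i + 0.5), (float(m), i + 0.5)) for i in range(m)]
    vertical = [((j + 0.5, 0.0), (j + 0.5, float(m))) for j in range(m)]
    return horizontal + vertical


def count_intersections(segs: List[Segment]) -> int:
    """Brute-force count over all n * (n - 1) / 2 pairs."""
    return sum(intersect(segs[i], segs[j])
               for i in range(len(segs))
               for j in range(i + 1, len(segs)))
```

Every horizontal segment crosses every vertical one, so the count is (n / 2)², which is n² / 4 and hence quadratic in n.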
Degenerate configurations, numerical errors, robustness
The discussion above is based on several assumptions of nondegeneracy, some of minor and some of major
importance. Let us examine one of each type.
Whenever we access the x-queue ('nextX'), we use an implicit assumption that no two events (endpoints or
intersections) have equal x-coordinates. The order of processing events of equal x-coordinate is irrelevant.
Assuming that no two events coincide at the same point in the plane, lexicographic (x, y)-ordering is a convenient
systematic way to define 'nextX'.
More serious forms of degeneracy arise when events coincide in the plane, such as more than two segments
intersecting in the same point. This type of degeneracy is particularly difficult to handle in the presence of
numerical errors, such as rounding errors. In the configuration shown in Exhibit 25.6 an endpoint of u lies exactly
or nearly on segment s. We may not care whether the intersection routine answers 'yes' or 'no' to the question "Do s
and u intersect?" but we certainly expect a 'yes' when asking "Do t and u intersect?" This example shows that the
slightest numerical inaccuracy can cause a serious error: The algorithm may fail to report the intersection of t and
u, which it would clearly see if it bothered to look; but the algorithm looks the other way and never asks the
question "Do t and u intersect?"
Exhibit 25.6: A degenerate configuration may lead to inconsistent results.
The trace of the plane-sweep for reporting intersections may look as follows:
1. s is inserted into the y-table
2. t is inserted above s into the y-table, and s and t are tested for intersection: No intersection is found
3. u is inserted below s in the y-table (since the evaluation of the function s(x) may conclude that the left
endpoint of u lies below s); s and u are tested for intersection, but the intersection routine may conclude
that s and u do not intersect: u remains below s
4. Delete u from the y-table
5. Delete s from the y-table
6. Delete t from the y-table
Notice the calamity that struck at the critical step 3. The evaluation of a linear expression s(x) and the
intersection routine for two segments both arrived at a result that, in isolation, is reasonable within the tolerance of
the underlying arithmetic. The two results together are inconsistent! If the evaluation of s(x) concludes that the left
endpoint of u lies below s, the intersection routine must conclude that s and u intersect! If these two geometric
primitives fail to coordinate their answers, catastrophe may strike. In our example, u and t never become neighbors
in the y-table, so their intersection gets lost.
Exercises
1. Show that there may be Θ(n²) intersections in a set of n line segments.
2. Design a plane-sweep algorithm that determines in O(n · log n) time whether two simple polygons with a
total of n vertices intersect.
3. Design a plane-sweep algorithm that determines in O(n · log n) time whether any two disks in a set of n
disks intersect.
4. Design a plane-sweep algorithm that solves the line visibility problem discussed in chapter 24 in the section
"Visibility in the plane: a simple algorithm whose analysis is not" in time O((n + k) · log n), where k ∈ O(n²)
is the number of intersections of the line segments.
5. Give a configuration with the smallest possible number of line segments for which the first intersection
point reported by the plane-sweep algorithm in chapter 25 in the section "Sweeping across intersections" is
not the leftmost intersection point.
6. Adapt the plane-sweep algorithm presented in chapter 25 in the section "Sweeping across intersections" to
detect all intersections among a given set of n horizontal or vertical line segments. You may assume that the
line segments do not overlap. What is the time complexity of this algorithm if the horizontal and vertical
line segments intersect in k points?
7. Design a plane-sweep algorithm that finds all intersections among a given set of n rectangles all of whose
sides are parallel to the coordinate axes. What is the time complexity of your algorithm?
26. The closest pair
Learning objectives:
Applying, implementing and analyzing plane sweep
Using plane sweep on three or more dimensions
Sweep algorithms solve many kinds of proximity problems efficiently. We present a simple sweep that solves the
two-dimensional closest pair problem elegantly in asymptotically optimal time. We explain why sweeping
generalizes easily, but not efficiently, to multidimensional closest pair problems.
The problem
We consider the two-dimensional closest pair problem: Given a set S of n points in the plane find a pair of
points whose distance is smallest (Exhibit 26.1). We measure distance using the metric d_k, for any k ≥ 1, or d_∞,
defined as:
d_k((x', y'), (x'', y'')) = (|x' - x''|^k + |y' - y''|^k)^(1/k)
d_∞((x', y'), (x'', y'')) = max(|x' - x''|, |y' - y''|)
Exhibit 26.1: Identify a closest pair among n points in the plane.
Special cases of interest include the "Manhattan metric" d_1, the "Euclidean metric" d_2, and the "maximum
metric" d_∞. Exhibit 26.2 shows the "circles" of radius 1 centered at a point p for some of these metrics.
Exhibit 26.2: The results of this chapter remain valid when distances are measured in various metrics.
The closest pair problem has a lower bound Ω(n · log n) in the algebraic decision tree model of computation [PS
85]. Its solution can be obtained in asymptotically optimal time O(n · log n) as a special case of more general
problems, such as 'all-nearest-neighbors' [HNS 92] (for each point, find a nearest neighbor), or constructing the
Voronoi diagram [SH 75]. These general approaches call on powerful techniques that make the resulting algorithms
harder to understand than one would expect for a simply stated problem such as "find a closest pair". The
divide-and-conquer algorithm presented in [BS 76] solves the closest pair problem directly in optimal worst-case time
complexity Θ(n · log n) using the Euclidean metric d_2. Whereas the recursive divide-and-conquer algorithm
involves an intricate argument for combining the solutions of two equally sized subsets, the iterative plane-sweep
algorithm [HNS 88] uses a simple incremental update: Starting with the empty set of points, keep adding a single
point until the final solution for the entire set is obtained. A similar plane-sweep algorithm solves the closest pair
problem for a set of convex objects [BH 92].
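The metrics named in this section can be written down directly. The following Python sketch assumes the standard definitions of d_k (the Minkowski metric of order k) and d_∞ (the maximum metric) for points given as (x, y) pairs; the function names d_k and d_inf are illustrative.

```python
from typing import Tuple

Point = Tuple[float, float]


def d_k(p: Point, q: Point, k: float) -> float:
    """Minkowski distance of order k >= 1 between p and q."""
    return (abs(p[0] - q[0]) ** k + abs(p[1] - q[1]) ** k) ** (1.0 / k)


def d_inf(p: Point, q: Point) -> float:
    """Maximum metric: the larger of the two coordinate differences."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))
```

With k = 1 this gives the Manhattan metric and with k = 2 the Euclidean metric; as k grows, d_k approaches d_inf.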
Plane-sweep applied to the closest pair problem
The skeleton of the general sweep algorithm presented in chapter 25 in the section "The skeleton: turning a
space dimension into a time dimension", with the data structures x-queue and y-table, is adapted to the closest pair
problem as shown in Exhibit 26.3. The x-queue stores the points of the set S, ordered by their x-coordinate, as
events to be processed when updating the vertical cross section. Two pointers into the x-queue, 'tail' and 'current',
partition S into four disjoint subsets:
1. The discarded points to the left of 'tail' are not accessed any longer
2. The active points between 'tail' (inclusive) and 'current' (exclusive) are being queried
3. The current transition point, p, is being processed
4. The future points have not yet been looked at
The y-table stores the active points only, ordered by their y-coordinate.
Exhibit 26.3: Updating the invariant as the next point p is processed.
We need to compare points by their x-coordinates when building the x-queue, and by their y-coordinates while
sweeping. For simplicity of presentation we assume that no two points have equal x- or y-coordinates. Points with
equal x- or y-coordinates are handled by imposing an arbitrary, but consistent, total order on the set of points. We
achieve this by defining two lexicographic orders: <x, to be used for the x-queue, and <y, for the y-table:
(x1, y1) <x (x2, y2)  iff  (x1 < x2) ∨ ((x1 = x2) ∧ (y1 < y2)),
(x1, y1) <y (x2, y2)  iff  (y1 < y2) ∨ ((y1 = y2) ∧ (x1 < x2)).
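The two lexicographic orders can be sketched as sort keys. This is only an illustration in Python, not the book's notation; it assumes that ties in the primary coordinate are broken by the other coordinate, and the names `key_x`, `key_y`, `x_queue` and `y_order` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y)


def key_x(p: Point) -> Tuple[float, float]:
    """Sort key for the order <x: compare x first, break ties by y."""
    return (p[0], p[1])


def key_y(p: Point) -> Tuple[float, float]:
    """Sort key for the order <y: compare y first, break ties by x."""
    return (p[1], p[0])


points: List[Point] = [(2.0, 1.0), (1.0, 3.0), (2.0, 0.5), (1.0, 1.0)]

# Order used when building the x-queue.
x_queue = sorted(points, key=key_x)

# Order in which the y-table would keep the same points.
y_order = sorted(points, key=key_y)
```

Because each key is a tuple, Python compares the primary coordinate first and falls back to the secondary one only on a tie, which yields a total order even when coordinates coincide.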
The program of the following section initializes the x-queue and y-table with the two leftmost points being
active, with δ equal to their distance, and starts the sweep with the third point.
The distinction between discarded and active points is motivated by the following argument. When a new point
p is encountered we wish to answer the question whether this point forms a closest pair with one of the points to its
left. We keep a pair of closest points seen so far, along with the corresponding minimal distance δ. Therefore, all
candidates that may form a new closest pair with the point p on the sweep line lie in a half circle centered at p, with
radius δ.
The key question to be answered in striving for efficiency is how to retrieve quickly all the points seen so far that
lie inside this half circle to the left of p, in order to compare their distance to p against the minimal distance δ seen
so far. We may use any helpful data structure that organizes the points seen so far, as long as we can update this
data structure efficiently across a transition. A circle (or half-circle) query is complex, at least when embedded in a
plane-sweep algorithm that organizes data according to an orthogonal coordinate system. A rectangle query can be
answered more efficiently. Thus we replace the half-circle query with a bounding rectangle query, accepting the fact
that we might include some extraneous points, such as q.
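The difference between the two queries can be made concrete with a small Python sketch. The function names and the sample points are hypothetical, the Euclidean metric is assumed, and the choice of strict inequalities at the boundaries is an assumption made for this illustration.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y)


def in_half_circle(p: Point, q: Point, delta: float) -> bool:
    """True if q is not to the right of p and closer to p than delta."""
    return q[0] <= p[0] and math.hypot(p[0] - q[0], p[1] - q[1]) < delta


def in_bounding_rectangle(p: Point, q: Point, delta: float) -> bool:
    """True if q lies in the delta-by-(2 * delta) rectangle to the left of p."""
    return 0 <= p[0] - q[0] < delta and abs(q[1] - p[1]) < delta


p = (0.0, 0.0)
delta = 1.0
near = (-0.5, 0.5)    # inside the half circle, hence also inside the rectangle
corner = (-0.9, 0.9)  # inside the rectangle only: an extraneous candidate
```

Every point accepted by `in_half_circle` is also accepted by `in_bounding_rectangle`, so the rectangle query never misses a candidate; it only admits a few extra points near the corners, which a final distance test rejects.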
The rectangle query in Exhibit 26.3 is implemented in two steps. First, we cut off all the points to the left at
distance ≥ δ from the sweep line. These points lie between 'tail' and 'current' in the x-queue and can be discarded
easily by advancing 'tail' and removing them from the y-table. Second, we consider only those points q in the δ-slice
whose vertical distance from p is less than δ: |q_y - p_y| < δ. These points can be found in the y-table by looking at
successors and predecessors starting at the y-coordinate of p. In other words, we maintain the following invariant
across a transition:
1. δ is the minimal distance between a pair of points seen so far (discarded or active).
2. The active points (found in the x-queue between 'tail' and 'current', and stored in the y-table ordered by
y-coordinates) are exactly those that lie in the interior of a δ-slice to the left of the sweep line.
Therefore, processing the transition point p involves three steps:
1. Delete all points q with q_x ≤ p_x - δ from the y-table. They are found by advancing 'tail' to the right.
2. Insert p into the y-table.
3. Find all points q in the y-table with |q_y - p_y| < δ by looking at the successors and predecessors of p. If such
a point q is found and its distance from p is smaller than δ, update δ and the closest pair found so far.
Implementation
In the following implementation the x-queue is realized by an array that contains all the points sorted by their
x-coordinate. 'closestLeft' and 'closestRight' describe the pair of closest points found so far, n is the number of points
under consideration, and t and c determine the positions of 'tail' and 'current':
  xQueue: array[1 .. maxN] of point;
  closestLeft, closestRight: point;
  t, c, n: 1 .. maxN;
The x-queue is initialized by
  procedure initX;
'initX' stores all the points into the x-queue, ordered by their x-coordinates.
The empty y-table is created by
  procedure initY;
A new point is inserted into the y-table by
  procedure insertY(p: point);
A point is deleted from the y-table by
  procedure deleteY(p: point);
The successor of a point in the y-table is returned by
  function succY(p: point): point;
The predecessor of a point in the y-table is returned by
  function predY(p: point): point;
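The y-table operations listed above can be sketched in Python on top of a sorted list. This is only a stand-in under stated assumptions: the class name `YTable` and its method names are hypothetical, points are ordered by (y, x), and `None` is returned when no successor or predecessor exists. Searching is logarithmic via `bisect`, but inserting into or deleting from a Python list is linear, so a balanced tree would be needed to support every operation in logarithmic time.

```python
import bisect
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y)


class YTable:
    """Active points ordered by (y, x); a sorted list stands in for a balanced tree."""

    def __init__(self) -> None:
        self._keys: List[Tuple[float, float]] = []  # (y, x) pairs, kept sorted

    def insert(self, p: Point) -> None:
        bisect.insort(self._keys, (p[1], p[0]))

    def delete(self, p: Point) -> None:
        key = (p[1], p[0])
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]

    def succ(self, p: Point) -> Optional[Point]:
        """Next point above p in the table, or None if p is the topmost."""
        i = bisect.bisect_right(self._keys, (p[1], p[0]))
        if i == len(self._keys):
            return None
        y, x = self._keys[i]
        return (x, y)

    def pred(self, p: Point) -> Optional[Point]:
        """Next point below p in the table, or None if p is the lowest."""
        i = bisect.bisect_left(self._keys, (p[1], p[0]))
        if i == 0:
            return None
        y, x = self._keys[i - 1]
        return (x, y)
```

A caller would create one `YTable`, insert the active points, and walk upward or downward from the transition point with repeated `succ` or `pred` calls.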
The initialization part of the plane-sweep is as follows:
  initX; initY;
  closestLeft := xQueue[1]; closestRight := xQueue[2];
  delta := distance(closestLeft, closestRight);
  insertY(closestLeft); insertY(closestRight);
  t := 1; { 'tail' starts at the leftmost point }
  c := 3;
The events are processed by the loop:
  while c ≤ n do begin
    transition; c := c + 1; { next event }
  end;
The procedure 'transition' encompasses all the work to be done for a new point:
  procedure transition;
  begin
    { step 1: remove points outside the δ-slice from the y-table }
    current := xQueue[c];
    while current.x - xQueue[t].x ≥ delta do begin
      deleteY(xQueue[t]); t := t + 1
    end;
    { step 2: insert the new point into the y-table }
    insertY(current);
    { step 3a: check the successors of the new point in the y-table }
    check := current;
    repeat
      check := succY(check);
      newDelta := distance(current, check);
      if newDelta < delta then begin
        delta := newDelta;
        closestLeft := check; closestRight := current;
      end;
    until check.y - current.y > delta;
    { step 3b: check the predecessors of the new point in the y-table }
    check := current;
    repeat
      check := predY(check);
      newDelta := distance(current, check);
      if newDelta < delta then begin
        delta := newDelta;
        closestLeft := check; closestRight := current;
      end;
    until current.y - check.y > delta;
  end; { transition }
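For readers who want to run the sweep, the following is a minimal Python sketch of the same three steps, not a transcription of the listing above. It makes several assumptions: the Euclidean metric, a sorted list with `bisect` in place of a balanced tree (so the stated logarithmic bounds do not hold for insertion and deletion), explicit bounds checks instead of `succY` and `predY` calls, and a hypothetical function name `closest_pair`.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y)


def closest_pair(points: Sequence[Point]) -> Tuple[float, Point, Point]:
    """Plane-sweep closest pair; returns (delta, left point, right point)."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    x_queue = sorted(points)  # ordered by x, ties broken by y
    left, right = x_queue[0], x_queue[1]
    delta = math.dist(left, right)
    # The y-table holds (y, x) keys of the active points, kept sorted.
    y_table: List[Tuple[float, float]] = sorted(
        [(left[1], left[0]), (right[1], right[0])]
    )
    tail = 0
    for c in range(2, len(x_queue)):
        current = x_queue[c]
        # Step 1: discard points at horizontal distance >= delta.
        while tail < c and current[0] - x_queue[tail][0] >= delta:
            old = x_queue[tail]
            del y_table[bisect.bisect_left(y_table, (old[1], old[0]))]
            tail += 1
        # Step 2: insert the new point into the y-table.
        key = (current[1], current[0])
        i = bisect.bisect_left(y_table, key)
        y_table.insert(i, key)
        # Step 3: scan successors (step = 1), then predecessors (step = -1).
        for step in (1, -1):
            j = i + step
            while 0 <= j < len(y_table) and abs(y_table[j][0] - current[1]) < delta:
                q = (y_table[j][1], y_table[j][0])
                d = math.dist(current, q)
                if d < delta:
                    delta, left, right = d, q, current
                j += step
    return delta, left, right
```

The guard `tail < c` keeps 'tail' from passing 'current' when two input points coincide and δ drops to zero, a case the text excludes by assumption.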
Analysis
We show that the algorithm described can be implemented so as to run in worst-case time O(n · log n) and space
O(n).
If the y-table is implemented by a balanced binary tree (e.g. an AVL-tree or a 2-3-tree) the operations 'insertY',
'deleteY', 'succY', and 'predY' can be performed in time O(log n). The space required is O(n).
'initX' builds the sorted x-queue in time O(n · log n) using space O(n). The procedure 'deleteY' is called at most
once for each point and thus accumulates to O(n · log n). Every point is inserted once into the y-table, thus the calls
of 'insertY' accumulate to O(n · log n).
There remains the problem of analyzing step 3. The loop in step 3a calls 'succY' once more than the number of
points in the upper half of the bounding box. Similarly, the loop in step 3b calls 'predY' once more than the number
of points in the lower half of the bounding box. A standard counting technique shows that the bounding box is
sparsely populated: For any metric d_k, the box contains no more than a small, constant number c_k of points, and for
any k, c_k ≤ 8. Thus 'succY' and 'predY' are called no more than 10 times, and step 3 costs time O(log n).
The key to this counting is the fact that no two points in the y-table can be closer than δ, and thus not many of
them can be packed into the bounding box with sides δ and 2 · δ. We partition this box into the eight pairwise
disjoint, mutually exhaustive regions shown in Exhibit 26.4. These regions are half circles of diameter δ in the
Manhattan metric d_1, and we first argue our case only when distances are measured in this metric. None of these
half-circles can contain more than one point. If a half-circle contained two points at distance δ, they would have to
be at opposite ends of the unique diameter of this half-circle. These endpoints lie on the left or the right boundary
of the bounding box, and these two boundary lines cannot contain any points, for the following reasons:
No active point can be located on the left boundary of the bounding box; such a point would have been
thrown out when the δ-slice was last updated.
No active point can exist on the right boundary, as that x-coordinate is preempted by the transition point p
being processed (remember our assumption of unequal x-coordinates).
Exhibit 26.4: Only few points at pairwise distance ≥ δ can populate a box of size 2 · δ by δ.
We have shown that the bounding box can hold no more than eight points at pairwise distance ≥ δ when using
the Manhattan metric d_1. It is well known that for any points p, q, and for any k ≥ 1: d_1(p, q) ≥ d_k(p, q) ≥ d_∞(p, q).
Thus the bounding box can hold no more than eight points at pairwise distance ≥ δ when using any distance d_k or d_∞.
Therefore, the calculation of the predecessors and successors of a transition point costs time O(log n) and
accumulates to a total of O(n · log n) for all transitions. Summing up all costs results in O(n · log n) time and O(n)
space complexity for this algorithm. Since Ω(n · log n) is a lower bound for the closest pair problem, we know that
this algorithm is optimal.
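The ordering of the metrics used in the counting argument can be checked numerically. The sketch below is an illustration only; the function name `d_k` mirrors the text's notation, while the sample points and the chosen values of k are arbitrary assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y)


def d_k(p: Point, q: Point, k: float) -> float:
    """Distance d_k for k >= 1; k = float('inf') gives the maximum metric."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    if k == float("inf"):
        return max(dx, dy)
    return (dx ** k + dy ** k) ** (1.0 / k)


p, q = (0.0, 0.0), (3.0, 4.0)

# Distances for k = 1 (Manhattan), 2 (Euclidean), 3, and infinity (maximum).
distances: List[float] = [d_k(p, q, k) for k in (1, 2, 3, float("inf"))]
```

For this pair the list starts at the Manhattan distance 7 and ends at the maximum-metric distance 4, with the values for k = 2 and k = 3 in between, matching d_1 ≥ d_k ≥ d_∞.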
Sweeping in three or more dimensions
To gain insight into the power and limitation of sweep algorithms, let us explore whether the algorithm
presented generalizes to higher-dimensional spaces. We illustrate our reasoning for three-dimensional space, but
the same conclusion holds for any number of dimensions greater than 2. All of the following steps generalize easily.
Sort all the points according to their x-coordinate into the x-queue. Sweep space with a y-z plane, and in
processing the current transition point p, assume that we know the closest pair among all the points to the left of p,
and their distance δ. Then to determine whether p forms a new closest pair, look at all the points inside a
half-sphere of radius δ centered at p, extending to the left of p. In the hope of implementing this sphere query
efficiently, we enclose this half sphere in a bounding box of side length 2 · δ in the y- and z-dimension, and δ in the
x-dimension. Inside this box there can be at most a small, constant number c_k of points at pairwise distance ≥ δ when
using any distance d_k or d_∞.
We implement this box query in two steps: (1) by cutting off all the points farther to the left of p than δ, which is
done by advancing 'tail' in the x-queue, and (2) by performing a square query among the points currently in the
y-z-table (which all lie in the δ-slice to the left of the sweep plane), as shown in Exhibit 26.5. Now we have reached the
only place where the three-dimensional algorithm differs substantially. In the two-dimensional case, the
corresponding one-dimensional interval query can be implemented efficiently in time O(log n) using find,
predecessor, and successor operations on a balanced tree, and using the knowledge that the size of the answer set is
bounded by a constant. In the three-dimensional case, the corresponding two-dimensional orthogonal range query
cannot in general be answered in time O(log n) (per retrieved point) using any of the known data structures.
Straightforward search requires time O(n), resulting in an overall time O(n²) for the space sweep. This is not an
interesting result for a problem that admits the trivial O(n²) algorithm of comparing every pair.
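The trivial algorithm of comparing every pair can be sketched in a few lines of Python. It assumes the Euclidean metric and works for points of any fixed dimension; the function name `closest_pair_brute_force` and the sample data are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Coords = Tuple[float, ...]  # a point in any fixed number of dimensions


def closest_pair_brute_force(
    points: Sequence[Coords],
) -> Tuple[float, Coords, Coords]:
    """Compare every pair of points: O(n^2) distance evaluations."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    return min(
        ((math.dist(p, q), p, q) for p, q in combinations(points, 2)),
        key=lambda entry: entry[0],
    )


pts = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 1.0, 1.5), (5.0, 5.0, 5.0)]
result = closest_pair_brute_force(pts)
```

Such a routine is also a convenient reference against which a sweep implementation can be tested on small inputs.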
Exhibit 26.5: Sweeping a plane across three-dimensional space. Ideas generalize, but efficiency does not.
Sweeping reduces the dimensionality of a geometric problem by one, by replacing one space dimension by a
"time dimension". Reducing a two-dimensional problem to a sequence of one-dimensional problems is often
efficient because the total order defined in one dimension allows logarithmic search times. In contrast, reducing a
three-dimensional problem to a sequence of two-dimensional problems rarely results in a gain in efficiency.
Exercises
1. Consider the following modification of the plane-sweep algorithm for solving the closest pair problem
[BH 92]. When encountering a transition point p do not process all points q in the y-table with |q_y - p_y| < δ, but
test only whether the distance of p to its successor or predecessor in the y-table is smaller than δ. When
deleting a point q with q_x ≤ p_x - δ from the y-table test whether the successor and predecessor of q in the
y-table are closer than δ. If a pair of points with a smaller distance than the current δ is found update δ and
the closest pair found so far. Prove that this modified algorithm finds a closest pair. What is the time
complexity of this algorithm?
2. Design a divide-and-conquer algorithm which solves the closest pair problem. What is the time complexity
of your algorithm? Hint: Partition the set of n points by a vertical line into two subsets of approximately
n / 2 points. Solve the closest pair problem recursively for both subsets. In the conquer step you should use the
fact that δ is the smallest distance between any pair of points both belonging to the same subset. A point
from the left subset can only have a distance smaller than δ to a point in the right subset if both points lie in
a 2 · δ-slice to the left and to the right of the partitioning line. Therefore, you only have to match points
lying in the left δ-slice against points lying in the right δ-slice.